Text classification model training method, text classification method, and related device

By filtering difficult-to-classify texts and using a pre-trained language model to recall similar texts to construct a dataset, the text classification model is trained, solving the inefficiency problem caused by manual annotation in existing technologies and achieving efficient text classification model training and improved accuracy.

CN122240830A · Pending · Publication Date: 2026-06-19 · SHENZHEN TENCENT COMP SYST CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHENZHEN TENCENT COMP SYST CO LTD
Filing Date: 2026-05-12
Publication Date: 2026-06-19

Patent Timeline

12 May 2026: Application
19 Jun 2026: Publication (CN122240830A)

IPC: G06F16/35; G06F16/353; G06F40/30; G06F18/21; G06F18/241; G06N5/04; G06N3/045; G06N3/08

AI Tagging

Application Domain

Semantic analysis; Inference methods


AI Technical Summary

Technical Problem

In existing technologies, the training of text classification models relies on manual annotation, which results in time-consuming and labor-intensive training phases and low training efficiency.

Method used

The method acquires multiple sample texts, filters out the texts that are difficult to classify, uses a pre-trained language model to classify those texts and infer the reasons for each classification, recalls similar texts to construct a classification dataset, and trains the text classification model on that dataset, achieving automated closed-loop optimization.

Benefits of technology

It improves the training efficiency and accuracy of text classification models, and automates the process from problem discovery to problem solving without human intervention, enabling rapid iteration.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a training method for a text classification model, a text classification method, and related apparatus. The method includes: inputting each first sample text into a first text classification model to obtain the confidence level of the first predicted category for each first sample text; filtering the multiple first sample texts based on those confidence levels to obtain difficult-to-classify texts; inputting classification prompt words and the difficult-to-classify texts into a pre-trained language model to obtain the second predicted category and the first classification reason for the difficult-to-classify texts; recalling similar texts from multiple texts to be recalled based on the first classification reason; constructing a first classification dataset based on the difficult-to-classify texts, the similar texts, and the second predicted category; and training the first text classification model based on the first classification dataset to obtain a second text classification model. This application can improve the training efficiency of text classification models and achieve rapid iteration.


Description

Technical Field

[0001] This application relates to the field of computer science, and in particular to a training method for a text classification model, a text classification method, and related apparatus.

Background Technology

[0002] Text classification, a core task of natural language processing, aims to automatically map input text to a predefined category system and is widely used in various application scenarios. For example, in game conversation scenarios, it is necessary to identify whether the conversation text is normal or malicious; if it is malicious, then actions such as blocking or muting should be taken to maintain the game environment.

[0003] With the development of deep learning technology, text classification models are now commonly used for text classification. To improve text classification accuracy, these models need to be trained beforehand. However, current solutions rely on manual labeling of text categories, making the training phase of text classification models extremely time-consuming and labor-intensive, resulting in low training efficiency.

Summary of the Invention

[0004] This application provides a training method for a text classification model, a text classification method, and related apparatus, which can improve the training efficiency of the text classification model.

[0005] To solve the above-mentioned technical problems, this application provides the following technical solution.

According to one aspect of this application, a method for training a text classification model is provided, comprising:
obtaining multiple first sample texts;
inputting each first sample text into a first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category;
filtering the multiple first sample texts based on the confidence level of each first sample text's first predicted category to obtain a difficult-to-classify text;
inputting classification prompt words and the difficult-to-classify text into a pre-trained language model to obtain a second predicted category of the difficult-to-classify text and a first classification reason for that second predicted category, wherein the classification prompt words guide the pre-trained language model to perform text classification and classification reason inference;
recalling, based on the first classification reason, similar texts that are similar to the difficult-to-classify text from a plurality of texts to be recalled, wherein the texts to be recalled are the first sample texts other than the difficult-to-classify text;
constructing a first classification sample based on the difficult-to-classify text and the second predicted category, constructing a second classification sample based on the similar texts and the second predicted category, and constructing a first classification dataset based on the first classification sample and the second classification sample; and
training the first text classification model based on the first classification dataset to obtain a second text classification model.
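The filtering and dataset-construction steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Prediction` type, the function names, the example texts, and the 0.6 threshold are all assumptions, and treating "low confidence" as the filtering criterion is one plausible reading of "filtering based on the confidence level".

```python
from typing import List, NamedTuple, Tuple

class Prediction(NamedTuple):
    text: str
    category: str      # first predicted category
    confidence: float  # confidence of that category, in [0, 1]

def filter_hard_texts(preds: List[Prediction], threshold: float = 0.6) -> List[str]:
    """Keep texts whose confidence is below the (assumed) threshold."""
    return [p.text for p in preds if p.confidence < threshold]

def build_dataset(hard_text: str, similar_texts: List[str],
                  llm_category: str) -> List[Tuple[str, str]]:
    """Label the hard text and its recalled neighbours with the
    second predicted category returned by the language model."""
    first_sample = (hard_text, llm_category)
    second_samples = [(t, llm_category) for t in similar_texts]
    return [first_sample] + second_samples

preds = [
    Prediction("gg well played", "normal", 0.97),
    Prediction("uninstall the game, you bot", "normal", 0.52),
    Prediction("nice shot", "normal", 0.95),
]
hard = filter_hard_texts(preds)
# The category and the recalled text are hard-coded here; in the described
# method they come from the pre-trained language model and the recall step.
dataset = build_dataset(hard[0], ["delete the game, bot"], "malicious")
```

The resulting list of (text, label) pairs plays the role of the first classification dataset used to fine-tune the first text classification model.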

[0006] According to one aspect of this application, a text classification method is provided, comprising: obtaining a text to be classified; and inputting the text to be classified into the second text classification model to obtain a fourth predicted category of the text to be classified, wherein the second text classification model is trained using the training method described above.

[0007] According to one aspect of this application, a training apparatus for a text classification model is provided, comprising:
a sample acquisition module, configured to acquire multiple first sample texts;
a first prediction module, configured to input each first sample text into the first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category;
a sample filtering module, configured to filter the multiple first sample texts according to the confidence level of each first sample text's first predicted category to obtain a difficult-to-classify text;
a second prediction module, configured to input classification prompt words and the difficult-to-classify text into a pre-trained language model to obtain a second predicted category of the difficult-to-classify text and a first classification reason for that second predicted category, wherein the classification prompt words guide the pre-trained language model to perform text classification and classification reason inference;
a sample recall module, configured to recall, based on the first classification reason, similar texts that are similar to the difficult-to-classify text from a plurality of texts to be recalled, wherein the texts to be recalled are the first sample texts other than the difficult-to-classify text;
a dataset construction module, configured to construct a first classification sample based on the difficult-to-classify text and the second predicted category, construct a second classification sample based on the similar texts and the second predicted category, and construct a first classification dataset based on the first classification sample and the second classification sample; and
a first training module, configured to train the first text classification model based on the first classification dataset to obtain the second text classification model.

[0008] Optionally, the second prediction module is further configured to:
input the classification prompt words and each text to be recalled into the pre-trained language model to obtain a third predicted category of each text to be recalled and a second classification reason for that third predicted category;
match the first classification reason of the difficult-to-classify text against the second classification reason of each text to be recalled to obtain a classification reason matching result between the difficult-to-classify text and each text to be recalled; and
recall, based on the classification reason matching results, similar texts that are similar to the difficult-to-classify text from the plurality of texts to be recalled.

[0009] Optionally, the second prediction module is further configured to:
determine a first semantic similarity between the difficult-to-classify text and each text to be recalled, based on first semantic features extracted from the difficult-to-classify text by the first text classification model and second semantic features extracted from each text to be recalled by the first text classification model;
update the first semantic similarity based on the classification reason matching result between the difficult-to-classify text and each text to be recalled, to obtain a second semantic similarity between the difficult-to-classify text and each text to be recalled; and
recall, based on the second semantic similarities, similar texts that are similar to the difficult-to-classify text from the plurality of texts to be recalled.

[0010] Optionally, the second prediction module is further configured to:
when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a successful match, amplify the first semantic similarity between them to obtain the second semantic similarity; and
when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a failed match, use the first semantic similarity between them, unchanged, as the second semantic similarity.
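The amplify-or-keep rule for the second semantic similarity can be illustrated with a short sketch. The multiplicative `boost` factor, the cap at 1.0, and the top-k selection are assumptions chosen for illustration; the text only states that the similarity is amplified on a successful match and left unchanged otherwise.

```python
from typing import List, Tuple

def second_similarity(first_similarity: float, reasons_match: bool,
                      boost: float = 1.5) -> float:
    """Amplify the first semantic similarity when the classification
    reasons match; otherwise return it unchanged."""
    if reasons_match:
        return min(1.0, first_similarity * boost)  # cap is an assumption
    return first_similarity

def recall_similar(candidates: List[Tuple[str, float, bool]],
                   top_k: int = 2) -> List[str]:
    """candidates: (text, first_similarity, reasons_match) triples.
    Returns the top_k texts ranked by second semantic similarity."""
    scored = [(second_similarity(sim, match), text)
              for text, sim, match in candidates]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

candidates = [
    ("text a", 0.5, True),   # boosted to 0.75
    ("text b", 0.7, False),  # stays at 0.7
    ("text c", 0.3, False),  # stays at 0.3
]
recalled = recall_similar(candidates)
```

With these numbers, the matching reason lifts "text a" above "text b" even though its raw semantic similarity is lower, which is the intended effect of the update.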

[0011] Optionally, the second prediction module is further configured to: The first classification reason of the second predicted category of the difficult-to-classify text is embedded to obtain the first classification reason vector corresponding to the difficult-to-classify text; The second classification reason of the third predicted category of each text to be recalled is embedded to obtain the second classification reason vector corresponding to each text to be recalled; Vector matching is performed on the first classification reason vector corresponding to the difficult-to-classify text and the second classification reason vector corresponding to each of the texts to be recalled to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled.
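A minimal sketch of vector matching between classification reasons follows. The bag-of-words `embed` function is a deliberately simple stand-in for whatever embedding model is actually used, and the cosine measure and the 0.5 threshold are assumptions.

```python
import math
from collections import Counter

def embed(reason: str) -> Counter:
    """Toy embedding: a word-count vector (stand-in for a real encoder)."""
    return Counter(reason.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def reasons_match(first_reason: str, second_reason: str,
                  threshold: float = 0.5) -> bool:
    """Matching succeeds when the reason vectors are close enough."""
    return cosine(embed(first_reason), embed(second_reason)) >= threshold

match_ok = reasons_match("insults another player",
                         "insults another player directly")
match_fail = reasons_match("insults another player", "friendly greeting")
```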

[0012] Optionally, the second prediction module is further configured to: The first classification reason of the second predicted category of the difficult-to-classify text and the second classification reason of the third predicted category of each of the texts to be recalled are subjected to literal matching processing to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled.

[0013] Optionally, the sample recall module is further configured to: input recall prompt words, the difficult-to-classify text, and the multiple texts to be recalled into the pre-trained language model to obtain similar texts that are similar to the difficult-to-classify text, wherein the recall prompt words guide the pre-trained language model to recall texts from the multiple texts to be recalled based on the difficult-to-classify text.
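As an illustration of prompt-based recall, the input to the language model might be assembled as below. The prompt wording and the function name are hypothetical; the text does not specify the actual recall prompt words, and the call to the language model itself is omitted.

```python
from typing import List

def build_recall_prompt(hard_text: str, candidates: List[str]) -> str:
    """Assemble a recall prompt from a reference text and numbered candidates.
    The instruction wording is illustrative only."""
    numbered = "\n".join(f"{i}. {text}"
                         for i, text in enumerate(candidates, start=1))
    return (
        "You are given a reference text and a numbered list of candidate texts.\n"
        "Return the numbers of the candidates that would be classified for the "
        "same reason as the reference text.\n\n"
        f"Reference text: {hard_text}\n\n"
        f"Candidates:\n{numbered}\n"
    )

prompt = build_recall_prompt("uninstall the game, you bot",
                             ["gg", "delete the game, bot"])
```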

[0014] Optionally, the first text classification model includes a backbone network and a classification head, wherein the backbone network includes N cascaded encoding layers, where N is an integer greater than 2, and the first training module is further configured to: train, based on the first classification dataset, the encoding layers to be trained and the classification head in the first text classification model to obtain the second text classification model, wherein the encoding layers to be trained are the nth encoding layer through the Nth encoding layer, where n is an integer greater than 1 and not exceeding N.
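The partial fine-tuning described above (training only encoding layers n through N plus the classification head) can be sketched in a framework-agnostic way. The layer naming scheme is hypothetical; in a real framework the returned names would be used to decide which parameters receive gradient updates.

```python
from typing import List

def trainable_components(num_layers: int, first_trainable: int) -> List[str]:
    """Return the names of the components to train: encoding layers
    first_trainable..num_layers (1-indexed) and the classification head.
    All other encoding layers stay frozen."""
    if num_layers <= 2:
        raise ValueError("N must be an integer greater than 2")
    if not 1 < first_trainable <= num_layers:
        raise ValueError("n must satisfy 1 < n <= N")
    layers = [f"encoder.layer.{i}"
              for i in range(first_trainable, num_layers + 1)]
    return layers + ["classification_head"]

# Example: a 12-layer backbone where only the top two layers are trained.
to_train = trainable_components(num_layers=12, first_trainable=11)
```

Because n must be greater than 1, at least the first encoding layer always stays frozen, which keeps the lowest-level features of the first text classification model intact during targeted training.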

[0015] Optionally, the training method of the text classification model is applied to a conversation scenario, and the training apparatus for the text classification model further includes a second training module, which is configured to:
select a first conversation text from the conversation scenario;
select, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text;
combine the first conversation text and the second conversation text into a second sample text, and obtain the label category corresponding to the second sample text;
construct a third classification sample based on the second sample text and the label category, and construct a second classification dataset based on the third classification sample; and
train a third text classification model based on the second classification dataset to obtain the first text classification model.

[0016] Optionally, the first conversation text belongs to a first conversation stream in the conversation scenario, and the second training module is further configured to: select, from the first conversation stream, an adjacent conversation text that is adjacent to the first conversation text; determine the sending time interval between the first conversation text and the adjacent conversation text; and, when the sending time interval is less than a time threshold, determine the adjacent conversation text to be the second conversation text that has a contextual relationship with the first conversation text.
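The time-threshold test for contextual relationship can be sketched as follows. The `Message` type, the 30-second threshold, and the choice to look only at the following message (rather than the preceding one) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    sender: str
    sent_at: float  # seconds since the start of the stream
    text: str

def contextual_neighbor(stream: List[Message], index: int,
                        max_gap_seconds: float = 30.0) -> Optional[Message]:
    """Return the message right after stream[index] if it was sent within
    max_gap_seconds of it; otherwise return None."""
    if index + 1 >= len(stream):
        return None
    current, following = stream[index], stream[index + 1]
    if following.sent_at - current.sent_at < max_gap_seconds:
        return following
    return None

stream = [
    Message("p1", 0.0, "push mid"),
    Message("p2", 5.0, "on my way"),
    Message("p1", 120.0, "gg"),
]
second_text = contextual_neighbor(stream, 0)
```

Here "on my way" qualifies as context for "push mid" because it arrives 5 seconds later, while "gg" arrives 115 seconds after "on my way" and is treated as unrelated.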

[0017] Optionally, the second training module is further configured to: Perform at least one of the following processes: Select a second conversation text from the conversation scenario that has a contextual relationship with the first conversation text and belongs to the same conversation party as the first conversation text; Select a second conversation text from the conversation scenario that has a contextual relationship with the first conversation text and belongs to a different conversation party than the first conversation text.

[0018] Optionally, the conversation scenario includes multiple conversation streams, and the second training module is further configured to: determine the conversation density corresponding to each of the multiple conversation streams; filter the multiple conversation streams based on their conversation densities to obtain the first conversation stream; and select the first conversation text from the first conversation stream.
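A possible reading of density-based stream filtering is sketched below. Defining density as messages per second over the stream's time span, keeping streams at or above a minimum density, and the 0.05 cutoff are all assumptions; this passage does not define how density is computed or which direction the filter goes.

```python
from typing import Dict, List

def density(timestamps: List[float]) -> float:
    """Messages per second over the stream's time span (assumed definition)."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    return len(timestamps) / span if span > 0 else float("inf")

def filter_streams(streams: Dict[str, List[float]],
                   min_density: float = 0.05) -> List[str]:
    """Keep the ids of streams whose density reaches min_density."""
    return [sid for sid, ts in streams.items() if density(ts) >= min_density]

streams = {
    "stream_a": [0.0, 10.0, 20.0, 30.0],  # 4 messages in 30 s
    "stream_b": [0.0, 600.0],             # 2 messages in 10 min
}
dense_streams = filter_streams(streams)
```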

[0019] Optionally, the second training module is further configured to: construct a mask prediction dataset based on the first conversation text; and jointly train the third text classification model based on the mask prediction dataset and the second classification dataset to obtain the first text classification model.

[0020] Optionally, the second training module is further configured to: determine, based on the mask prediction dataset, the mask prediction loss value of the third text classification model when performing the mask prediction task; determine, based on the second classification dataset, the classification loss value of the third text classification model when performing the classification task; fuse the mask prediction loss value and the classification loss value to obtain a fused loss value; and train the third text classification model based on the fused loss value to obtain the first text classification model.
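One common way to fuse two task losses is a weighted sum, sketched here. The weighted-sum form and the `mask_weight` parameter are assumptions; the text says only that the two loss values are fused.

```python
def fuse_losses(mask_loss: float, classification_loss: float,
                mask_weight: float = 0.5) -> float:
    """Weighted sum of the mask prediction loss and the classification loss.
    mask_weight in [0, 1] balances the two tasks (assumed fusion rule)."""
    if not 0.0 <= mask_weight <= 1.0:
        raise ValueError("mask_weight must be in [0, 1]")
    return mask_weight * mask_loss + (1.0 - mask_weight) * classification_loss

fused = fuse_losses(mask_loss=2.0, classification_loss=1.0)
```

Setting `mask_weight` to 0 reduces joint training to plain classification training, which makes the weight a convenient knob for ablation.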

[0021] Optionally, the second training module is further configured to:
select, from the conversation scenario, a third conversation text that has no contextual relationship with the first conversation text;
combine the first conversation text and the third conversation text into a third sample text;
construct a next sentence prediction positive sample based on the second sample text and a contextual relationship label, wherein the contextual relationship label indicates that the first conversation text and the second conversation text in the second sample text have a contextual relationship;
construct a next sentence prediction negative sample based on the third sample text and a non-contextual relationship label, wherein the non-contextual relationship label indicates that the first conversation text and the third conversation text in the third sample text have no contextual relationship;
construct a next sentence prediction dataset based on the next sentence prediction positive samples and the next sentence prediction negative samples; and
train a fourth text classification model based on the next sentence prediction dataset to obtain the third text classification model.

[0022] Optionally, the first conversation text belongs to a first conversation stream in the conversation scenario, and the second training module is further configured to perform at least one of the following processes:
select a random conversation text from a second conversation stream, and determine the random conversation text to be the third conversation text that has no contextual relationship with the first conversation text, wherein the second conversation stream is different from the first conversation stream;
select, from the first conversation stream, a non-adjacent conversation text that is not adjacent to the first conversation text, and determine the non-adjacent conversation text to be the third conversation text that has no contextual relationship with the first conversation text.
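Constructing next sentence prediction samples from conversation streams might look like the sketch below. It uses only the first negative-sampling option (a random text from a different stream) and treats every adjacent pair as contextual, ignoring the time threshold; both simplifications, and the 1/0 label encoding, are assumptions.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str, int]  # (first text, other text, label)

def build_nsp_dataset(stream: List[str], other_stream: List[str],
                      rng: random.Random) -> List[Sample]:
    """For each adjacent pair in `stream`, emit one positive sample
    (label 1) and one negative sample (label 0) whose second text is
    drawn at random from a different conversation stream."""
    samples: List[Sample] = []
    for first, second in zip(stream, stream[1:]):
        samples.append((first, second, 1))
        samples.append((first, rng.choice(other_stream), 0))
    return samples

first_stream = ["push mid", "on my way", "gg"]
second_stream = ["anyone selling skins?", "lol"]
nsp_samples = build_nsp_dataset(first_stream, second_stream, random.Random(0))
```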

[0023] Optionally, the second training module is further configured to: construct a mask prediction dataset based on the first conversation text; and jointly train the fourth text classification model based on the mask prediction dataset and the next sentence prediction dataset to obtain the third text classification model.

[0024] Optionally, the first prediction module is further configured to: Each first sample text is input into the input layer to obtain the representation vectors corresponding to multiple words in each first sample text; The representation vectors corresponding to the multiple words in each first sample text are input into the backbone network to obtain the semantic features corresponding to each first sample text. The semantic features corresponding to each first sample text are input into the classification head to obtain the first predicted category of each first sample text and the confidence level of the first predicted category of each first sample text.
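The last stage of this pipeline, turning the classification head's raw scores into a predicted category and a confidence level, is commonly done with a softmax. The sketch below assumes softmax normalization and hypothetical category names; the text does not say how the confidence level is computed.

```python
import math
from typing import List, Tuple

def predict(logits: List[float], categories: List[str]) -> Tuple[str, float]:
    """Return the top category and its softmax probability, used here
    as the confidence level (softmax is an assumption)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]

category, confidence = predict([2.0, 0.0], ["normal", "malicious"])
```

A confidence near 0.5 in a two-category setting signals that the model can barely separate the categories for that text, which is the kind of sample the filtering step is meant to surface.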

[0025] Optionally, the first prediction module is further configured to: Through the input layer, the following processing is performed for each word in each of the first sample texts: Each word is subjected to word embedding processing to obtain the word vector corresponding to each word; Perform position embedding processing on each word to obtain the position vector corresponding to each word; Each word is subjected to language embedding processing to obtain the language vector corresponding to each word; Each word is subjected to paragraph embedding processing to obtain a paragraph vector corresponding to each word; The word vector, position vector, language vector, and paragraph vector corresponding to each word are fused to obtain the representation vector corresponding to each word.
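A simple way to fuse the four per-word vectors is element-wise addition, as in BERT-style input layers. The sketch below assumes that fusion rule and equal vector dimensions; the text says only that the vectors are fused.

```python
from typing import List

def fuse_embeddings(word_vec: List[float], position_vec: List[float],
                    language_vec: List[float],
                    paragraph_vec: List[float]) -> List[float]:
    """Element-wise sum of the four embeddings for one word
    (summation is an assumed fusion rule)."""
    vectors = (word_vec, position_vec, language_vec, paragraph_vec)
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all embeddings must have the same dimension")
    return [sum(components) for components in zip(*vectors)]

representation = fuse_embeddings(
    [1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.25, 0.25]
)
```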

[0026] Optionally, the first prediction module is further configured to: Each word is queried in a shared vocabulary to obtain a word vector corresponding to each word. Multiple words in the shared vocabulary that have the same meaning but different languages correspond to the same word vector.
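The shared-vocabulary lookup can be illustrated with a two-level table: words map to a concept id, and each concept id maps to one vector. The example words, ids, and vector values are made up for illustration.

```python
from typing import Dict, List

# Words with the same meaning in different languages share one id.
SHARED_VOCAB: Dict[str, int] = {
    "hello": 0, "hola": 0, "你好": 0,
    "bye": 1, "adiós": 1,
}
# One embedding row per id (values are placeholders).
EMBEDDINGS: List[List[float]] = [
    [0.1, 0.2],
    [0.3, 0.4],
]

def word_vector(word: str) -> List[float]:
    """Look up the word's id in the shared vocabulary, then its vector."""
    return EMBEDDINGS[SHARED_VOCAB[word]]

same_vector = word_vector("hello") == word_vector("hola") == word_vector("你好")
```

Sharing one vector across languages means a training example in one language also updates the representation used for its translations.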

[0027] According to one aspect of this application, a text classification apparatus is provided, comprising: The text acquisition module is used to acquire the text to be classified. The third prediction module is used to input the text to be classified into the second text classification model to obtain the fourth predicted category of the text to be classified. The second text classification model is trained using the training method described above.

[0028] According to one aspect of this application, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program adapted for loading by a processor to execute the above-described text classification model training method or text classification method.

[0029] According to one aspect of this application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above-described text classification model training method or text classification method.

[0030] According to one aspect of this application, a computer program product is provided, comprising a computer program stored in a storage medium. A processor of a computer device reads the computer program from the storage medium and executes it to implement the above-described text classification model training method or text classification method.

[0031] In this application, multiple first sample texts are first obtained, and each first sample text is input into a first text classification model to obtain the first predicted category and the confidence level of the first predicted category for each first sample text. Then, based on the confidence level, the texts that the first text classification model finds difficult to classify are selected, which identifies where subsequent optimization of the first text classification model should focus. Next, a pre-trained language model is introduced, and classification prompt words and the difficult-to-classify texts are input into the pre-trained language model to obtain the second predicted category of each difficult-to-classify text and the first classification reason for that second predicted category. In this way, a second predicted category that fits the true semantics of the difficult-to-classify text can be obtained by relying on the semantic understanding ability of the pre-trained language model; that is, the second predicted category can be regarded as the label category of the difficult-to-classify text. The classification prompt words also guide the pre-trained language model to analyze and summarize the classification basis and semantic reasoning behind the difficult-to-classify text, which makes the second predicted category interpretable. Based on the first classification reason, similar texts that are similar to the difficult-to-classify text are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify texts. Recalling by classification reason retrieves texts that share the classification logic of the difficult-to-classify text, which improves the accuracy of recalling similar texts. Since the similar texts share that classification logic, the second predicted category can also be regarded as their label category. Finally, a first classification sample is constructed based on the difficult-to-classify text and the second predicted category, a second classification sample is constructed based on the similar texts and the second predicted category, a first classification dataset is constructed based on the first classification sample and the second classification sample, and the first text classification model is trained on the first classification dataset to obtain the second text classification model. This application can thus construct a first classification dataset made up of texts that the first text classification model finds difficult to classify and train the model specifically on them, effectively improving the accuracy of text classification. At the same time, the entire process forms an automated closed loop from problem identification to problem resolution, without manual intervention, improving the training efficiency of the text classification model and enabling rapid iteration.

[0032] Other features and advantages of this application will be set forth in the following description and will be apparent in part from the description or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the description, claims and drawings.

Description of the Drawings

[0033] The accompanying drawings are used to provide a further understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.
Figure 1 is a schematic diagram of the architecture of the training system for the text classification model provided in an embodiment of this application;
Figure 2 is a schematic diagram of a scenario in which the training method of the text classification model provided in this application is applied to a game conversation scenario;
Figure 3 is a schematic diagram of a scenario in which the training method of the text classification model provided in this application is applied to an instant messaging scenario;
Figure 4 is a schematic diagram of a scenario in which the training method of the text classification model provided in this application is applied to a social platform;
Figure 5 is a flowchart of a training method for a text classification model provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of the first text classification model provided in an embodiment of this application;
Figure 7 is a flowchart of the process of determining similar texts based on the classification reasons of difficult-to-classify texts and texts to be recalled, as provided in an embodiment of this application;
Figure 8 is a detailed flowchart of step 730 in Figure 7;
Figure 9 is a schematic diagram of the process for determining the second semantic similarity provided in an embodiment of this application;
Figure 10 is another schematic diagram of the process for determining the second semantic similarity provided in an embodiment of this application;
Figure 11 is a flowchart of the process of training a third text classification model to obtain a first text classification model, as provided in an embodiment of this application;
Figure 12 is a detailed flowchart of step 1120 in Figure 11;
Figure 13 is a schematic diagram of the process for determining adjacent conversation text provided in an embodiment of this application;
Figure 14 is another schematic diagram of the process for determining adjacent conversation text provided in an embodiment of this application;
Figure 15 is a detailed flowchart of step 1110 in Figure 11;
Figure 16 is a schematic diagram of the process for determining non-adjacent conversation text provided in an embodiment of this application;
Figure 17 is a flowchart of a text classification method provided in an embodiment of this application;
Figure 18 is another flowchart of the training method of the text classification model provided in an embodiment of this application;
Figure 19 is a schematic diagram of the structure of a training apparatus for a text classification model provided in an embodiment of this application;
Figure 20 is a schematic diagram of the structure of the text classification apparatus provided in an embodiment of this application;
Figure 21 is a schematic diagram of the structure of a terminal provided in an embodiment of this application;
Figure 22 is a schematic diagram of the structure of a server provided in an embodiment of this application.

Detailed Description

[0034] To enable those skilled in the art to better understand the solutions of this application, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0035] It should be noted that while some processes described in the specification, claims, and accompanying drawings include multiple steps appearing in a specific order, these steps need not be performed in the order in which they appear herein and may be performed in parallel. The step numbers are used only to distinguish different steps and do not themselves represent any execution order. Furthermore, terms such as "first," "second," or "target" in this document are used to distinguish similar objects and do not necessarily describe a specific order or sequence. "Multiple" in this document refers to at least two.

[0036] It is worth noting that in the specific embodiments of this application, text and other related data are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of related data must comply with relevant laws, regulations, and standards. For example, when an embodiment of this application needs to obtain text data, the user's separate permission or consent can be obtained through pop-up windows or redirection to a confirmation page. After obtaining the user's separate permission or consent, the necessary text data for enabling the embodiment of this application to operate normally can then be obtained.

[0037] Text classification, a core task of natural language processing, aims to automatically map input text to a predefined category system and is widely used in various application scenarios. For example, in game conversation scenarios, it is necessary to identify whether the conversation text is normal or malicious; if it is malicious, then actions such as blocking or muting should be taken to maintain the game environment.

[0038] With the development of deep learning technology, text classification models are now commonly used for text classification. To improve text classification accuracy, these models need to be trained beforehand. However, current solutions rely on manual labeling of text categories, making the training phase of text classification models extremely time-consuming and labor-intensive, resulting in low training efficiency.

[0039] In view of this, embodiments of this application provide a training method for a text classification model. First, multiple first sample texts are acquired, and each first sample text is input into a first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category. Then, based on the confidence levels, difficult-to-classify texts are selected, which identifies the core direction for subsequent optimization of the first text classification model. Next, a pre-trained language model is introduced, and classification prompts and the difficult-to-classify texts are input into the pre-trained language model to obtain the second predicted category and the first classification reason for the second predicted category. Relying on the semantic understanding capability of the pre-trained language model, a second predicted category that closely matches the true semantics of the difficult-to-classify text can thus be obtained; that is, the second predicted category can be regarded as the label category of the difficult-to-classify text. The classification prompts also guide the pre-trained language model to analyze and summarize the classification basis and semantic reasoning logic of the difficult-to-classify text, which makes the second predicted category interpretable. Based on the first classification reason, similar texts that are similar to the difficult-to-classify text are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify texts. Because recall is driven by the classification reason, it retrieves texts whose classification logic is similar to that of the difficult-to-classify texts, thereby improving the accuracy of recalling similar texts. 
Since similar texts are similar to difficult-to-classify texts, the same text classification logic can be applied; that is, the second predicted category can also be considered as the label category of similar texts. Finally, a first-classification sample is constructed based on the difficult-to-classify texts and the second predicted category, and a second-classification sample is constructed based on the similar texts and the second predicted category. A first-classification dataset is constructed based on the first-classification sample and the second-classification sample, and the first text classification model is trained on the first-classification dataset to obtain the second text classification model. This application can construct a first-classification dataset that is difficult for the first text classification model to classify, and can train the first text classification model accordingly, thereby effectively improving the accuracy of text classification. Simultaneously, the entire process can achieve an automated closed loop from problem discovery to problem resolution, without manual intervention, improving the training efficiency of the text classification model and enabling rapid iteration. An example will be provided below.

[0040] Referring to Figure 1, Figure 1 is a schematic diagram of a training system for a text classification model provided in an embodiment of this application. The system includes a terminal 140, the Internet 130, a gateway 120, a server 110, etc.

[0041] Terminal 140 is a device used to display a text input interface, allowing users to input first sample text or text to be classified. It can take various forms, including desktop computers, laptops, tablets, PDAs (personal digital assistants), mobile phones, in-vehicle terminals, smart TVs, and dedicated terminals. Furthermore, it can be a single device or a collection of multiple devices. For example, multiple desktop computers connected via a local area network, sharing a single monitor, can work collaboratively to form terminal 140. Terminal 140 can communicate with the Internet 130 via wired or wireless means to exchange data. Users can include engineers training text classification models, algorithm developers of text classification models, maintenance personnel of text classification models, and users of conversational applications or software (e.g., gamers, social media users, smart device users, or conversation participants).

[0042] Server 110 refers to a computer system capable of providing certain services to terminal 140. Compared to ordinary terminal 140, server 110 has higher requirements in terms of stability, security, and performance. Server 110 can be a single high-performance computer in a network platform, a cluster of multiple high-performance computers, a portion of a single high-performance computer (e.g., a virtual machine), or a combination of portions of multiple high-performance computers (e.g., virtual machines). Server 110 can also communicate with the Internet 130 via wired or wireless means to exchange data.

[0043] Gateway 120, also known as an internetwork connector or protocol converter, is a computer system or device that acts as a translator, enabling network interconnection at the transport layer. It bridges the gap between two systems using different communication protocols, data formats, languages, or even completely different architectures. Gateways can also provide filtering and security functions. Messages sent from terminal 140 to server 110 are forwarded to the corresponding server 110 via gateway 120. Messages sent from server 110 to terminal 140 are also forwarded to the corresponding terminal 140 via gateway 120.

[0044] The embodiments of this application can be applied in various scenarios, such as the game conversation scenario shown in Figure 2, the instant messaging scenario shown in Figure 3, and the social platform scenario shown in Figure 4.

[0045] (I) Game conversation scenario. Players of online games (e.g., MOBAs, turn-based games, open-world games) need to communicate via text for tactics, gameplay, and chat. In game conversations, many game terms (e.g., "team up," "tower push") carry meanings that differ significantly from their meanings in other contexts. Furthermore, contextual combinations (e.g., sending three separate sentences containing "teammate," "really," and "idiot") can express malice indirectly. It is therefore important to identify whether conversation text is normal or malicious. However, current solutions rely on manual labeling of text, making the training phase of text classification models for game conversations extremely time-consuming and inefficient. A training method that improves the efficiency of model training for game conversations is therefore needed.

[0046] The embodiments of this application can be applied to this scenario. For example, as shown in Figure 2, first, multiple texts from the game conversation (e.g., texts containing game terms such as "team up" and "tower push," and texts sent as contextual combinations such as "teammate," "really," and "idiot") are obtained from the game interface as multiple first sample texts. Each first sample text is input into a first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category. Then, based on the confidence levels of the first predicted categories of the multiple first sample texts, the multiple first sample texts are filtered to obtain difficult-to-classify texts. Classification prompts and the difficult-to-classify texts are then input into a pre-trained language model to obtain the second predicted category of the difficult-to-classify texts and the first classification reason for that second predicted category. The classification prompts are used to guide the pre-trained language model in text classification and classification reason inference. Next, based on the first classification reason, similar texts that are similar to the difficult-to-classify text are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify texts. A first classification sample is constructed based on the difficult-to-classify text and the second predicted category, and a second classification sample is constructed based on the similar texts and the second predicted category. A first classification dataset is then constructed based on the first classification sample and the second classification sample. Finally, the first text classification model is trained on the first classification dataset to obtain a second text classification model. The second text classification model is then used to identify whether the text to be classified in the game conversation is malicious. If the text to be classified is detected as malicious, actions such as hiding the text or issuing an alert are taken.

[0047] In this way, this application can construct a first classification dataset that is difficult for the first text classification model to classify in the context of game conversation scenarios, and conduct targeted training of the first text classification model based on the first classification dataset, thereby effectively improving the recognition accuracy of whether game conversation text is malicious text. Furthermore, the automated closed loop from discovering difficult-to-identify game conversation text to training the model for that game conversation text does not require manual intervention, which can improve the training efficiency of the text classification model in game conversation scenarios.

[0048] (II) Instant messaging scenario. Instant messaging (IM) refers to real-time online communication via the Internet. Users (i.e., objects) can send and receive text messages through devices (mobile phones, computers, tablets, etc.). Real-time text transmission takes forms such as one-on-one private chats, group chats, group communication, and conversational streams, with virtually no latency. During instant messaging, users may alter text content (e.g., using homophones, abbreviations, sentence splitting, or symbol interference) to disguise and send malicious semantics. Related text classification models struggle to classify such text accurately, and because the alteration methods are diverse and samples are few, it is difficult to train existing text classification models in a targeted manner.

[0049] The embodiments of this application can solve this problem. For example, as shown in Figure 3, first, multiple text messages sent by the object (e.g., the following text messages carrying malicious semantics: "disturbing the peace at midnight," "you really are," "noisier than a cuckoo," "loudspeaker," etc.) are obtained from the instant messaging interface and used as multiple first sample texts. Each first sample text is then input into a first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category. Next, based on the confidence levels of the first predicted categories of the multiple first sample texts, the multiple first sample texts are filtered to obtain difficult-to-classify texts. Classification prompts and the difficult-to-classify texts are then input into a pre-trained language model to obtain the second predicted category of the difficult-to-classify texts and the first classification reason for that second predicted category. The classification prompts are used to guide the pre-trained language model in text classification and classification reason inference. Next, based on the first classification reason, similar texts that are similar to the difficult-to-classify text are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify texts. A first classification sample is constructed based on the difficult-to-classify text and the second predicted category, and a second classification sample is constructed based on the similar texts and the second predicted category. A first classification dataset is constructed based on the first classification sample and the second classification sample. Finally, the first text classification model is trained on the first classification dataset to obtain a second text classification model. The second text classification model is then used to detect whether the text to be classified contains malicious semantics (e.g., the following text messages carrying malicious semantics: "not sleeping at midnight," "Are you," "loudspeaker," etc.). If the text to be classified is detected as malicious, the user who sent it is muted.

[0050] This application can collect real-time messages from users' instant messaging interfaces as samples to construct a first classification dataset that the first text classification model finds difficult to classify, and then train the first text classification model on it in a targeted manner, thereby effectively improving the accuracy of text classification. At the same time, the entire training process requires no human participation, realizing an automated closed loop from problem discovery to problem solving, thereby improving the training efficiency of the text classification model and enabling rapid iteration.

[0051] (III) Social platform scenario. Social platforms often feature content that invites interaction. For example, viewers can interact via text in the comments section of pre-recorded videos or articles. For live video streams, viewers can interact via text in real time through public chat, live chat, comments, and gifts. However, text interactions on these platforms can contain sarcastic or malicious text related to specific events within the content, or malicious text pieced together by multiple viewers. Existing text classification models struggle to identify such malicious semantics, especially when samples are few or contextual understanding is required.

[0052] The embodiments of this application can solve this problem. For example, as shown in Figure 4, for a live video stream, malicious text pieced together from the comments of multiple viewers is obtained as a first sample text (e.g., viewer 1 (i.e., the object) sends comment 1 "The streamer is," viewer 2 sends comment 2 "Stupid," and viewer 3 sends comment 3 "Egg"). After multiple first sample texts are obtained, each first sample text is input into a first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category. Based on the confidence levels of the first predicted categories of the multiple first sample texts, the multiple first sample texts are filtered to obtain difficult-to-classify texts. Then, classification prompts and the difficult-to-classify texts are input into a pre-trained language model to obtain the second predicted category of the difficult-to-classify texts and the first classification reason for that second predicted category. The classification prompts are used to guide the pre-trained language model in text classification and classification reason inference. Based on the first classification reason, similar texts that are similar to the difficult-to-classify texts are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify texts. Subsequently, a first classification sample is constructed based on the difficult-to-classify text and the second predicted category, and a second classification sample is constructed based on the similar text and the second predicted category. A first classification dataset is then constructed based on the first and second classification samples, and the first text classification model is trained on this dataset to obtain the second text classification model. Finally, text sent by viewers in another live video stream is used as the text to be classified and input into the second text classification model to obtain a detection result. If the detection result indicates that the text to be classified is malicious, actions such as retracting or hiding the text are taken.

[0053] For social media platforms, this application constructs a first-classification dataset that is difficult for the first text classification model to classify, and then trains the first text classification model specifically for this purpose. This effectively improves the accuracy of text classification and enables high-precision real-time detection of whether the text to be classified is malicious. Furthermore, the training process of the first text classification model is based on audience comments during actual social media platform applications, achieving an automated closed loop from problem discovery to problem resolution without human intervention. This improves the training efficiency of the text classification model and enables rapid iteration.

[0054] It is worth noting that the training system for the text classification model shown in Figure 1 and the scenarios shown in Figures 2 to 4 are intended to illustrate the technical solutions of the embodiments of this application more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of text classification model training technology and the emergence of new business scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0055] In this embodiment, the description will be from the perspective of the training device for the text classification model. Specifically, the training device for the text classification model can be integrated into a computer device that has a storage unit and is equipped with a microprocessor and has computing power. The computer device can be a server or a terminal, and there is no limitation on this.

[0056] Referring to Figure 5, Figure 5 is a flowchart of the training method of the text classification model provided in an embodiment of this application. The training method can be implemented by a server and/or a terminal. The training method shown in Figure 5 includes:
Step 510: Obtain multiple first sample texts;
Step 520: Input each first sample text into the first text classification model to obtain the first predicted category of each first sample text and the confidence level of that first predicted category;
Step 530: Based on the confidence levels of the first predicted categories of the multiple first sample texts, filter the multiple first sample texts to obtain the difficult-to-classify text;
Step 540: Input the classification prompt and the difficult-to-classify text into the pre-trained language model to obtain the second predicted category of the difficult-to-classify text and the first classification reason for that second predicted category, where the classification prompt is used to guide the pre-trained language model in text classification and classification reason inference;
Step 550: Based on the first classification reason for the second predicted category of the difficult-to-classify text, recall similar texts that are similar to the difficult-to-classify text from multiple texts to be recalled, where the texts to be recalled are the first sample texts other than the difficult-to-classify text;
Step 560: Construct a first classification sample based on the difficult-to-classify text and the second predicted category, construct a second classification sample based on the similar text and the second predicted category, and construct a first classification dataset based on the first classification sample and the second classification sample;
Step 570: Train the first text classification model based on the first classification dataset to obtain the second text classification model.
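As an illustration only, the data flow of steps 510 to 560 can be sketched as a small orchestration function. This is a minimal sketch, not the implementation of this application: the names `classify`, `select_hard`, `annotate`, and `recall` are hypothetical placeholders for the first text classification model, the screening of step 530, the pre-trained language model, and the recall of step 550, and step 570 (the actual training) is left out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces; the source does not prescribe any of these signatures.
Classifier = Callable[[str], Tuple[str, float]]        # text -> (first predicted category, confidence)
Selector = Callable[[Dict[str, float]], List[str]]     # confidences -> difficult-to-classify texts
Annotator = Callable[[str], Tuple[str, str]]           # text -> (second predicted category, reason)
Recaller = Callable[[str, str, List[str]], List[str]]  # (hard text, reason, pool) -> similar texts


def build_first_classification_dataset(
    sample_texts: List[str],
    classify: Classifier,
    select_hard: Selector,
    annotate: Annotator,
    recall: Recaller,
) -> List[Tuple[str, str]]:
    """Return (text, label category) pairs following steps 510 to 560."""
    # Step 520: confidence of the first predicted category for each sample text.
    confidences = {text: classify(text)[1] for text in sample_texts}
    # Step 530: screen out the difficult-to-classify texts.
    hard_texts = select_hard(confidences)
    # Texts to be recalled: first sample texts other than the difficult ones.
    pool = [text for text in sample_texts if text not in hard_texts]

    dataset: List[Tuple[str, str]] = []
    for hard in hard_texts:
        # Step 540: second predicted category and first classification reason.
        label, reason = annotate(hard)
        # Step 560: first classification sample.
        dataset.append((hard, label))
        # Steps 550 and 560: similar texts inherit the same label category.
        for similar in recall(hard, reason, pool):
            dataset.append((similar, label))
    return dataset
```

Passing the four components as callables keeps the sketch independent of any particular model or recall strategy; the resulting pairs would then be fed to whatever fine-tuning routine implements step 570.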

[0057] First, steps 510-570 above will be described in detail.

[0058] In step 510, the type of the first sample text is not limited; it can be text obtained from content sharing scenarios (such as content sharing platforms), office scenarios (such as office systems), conversation scenarios, etc. Taking a conversation scenario as an example, the first sample text can be conversation text obtained in the conversation scenario. The first sample text can include at least one conversation text for training the first text classification model. The first sample text can include, but is not limited to, any of the following: a single conversation text, multiple conversation texts with a contextual relationship, or multiple conversation texts without a contextual relationship. For example, a single conversation text can be: Conversation Text 1 "My friend is very good at playing games." Multiple conversation texts with a contextual relationship can be: Conversation Text 1 "My friend is very good at playing games," Conversation Text 2 "I asked him to help me fight monsters." Multiple conversation texts without a contextual relationship can be: Conversation Text 1 "My friend is very good at playing games," Conversation Text 3 "You guys continue, I'll go get the takeout." It is worth noting that the conversation text involved in this application embodiment includes one or multiple sentences sent consecutively. A sentence refers to a statement sent by the object in a single instance. For example, conversation texts 1-3 above each contain only one sentence, while conversation text 4, "The weather is nice today, but unfortunately it's raining. Also, my teammates are terrible at games, I won't play with them next time," contains multiple sentences sent consecutively, specifically including the sentences "The weather is nice today, but unfortunately it's raining" and "Also, my teammates are terrible at games, I won't play with them next time." 
The first sample text can be obtained in batches from application scenarios such as instant messaging, live streaming, and game dialogues. Furthermore, it can be determined by concatenating or rewriting the collected conversation text; this application does not impose specific limitations.

[0059] In step 520, the first text classification model is a model capable of classifying conversational text, possessing certain text classification and confidence output capabilities. The text classification involved in this application embodiment includes, but is not limited to, sentiment analysis, topic / domain classification, sensitive information identification (such as determining whether text contains personal privacy), and malicious semantic classification. Malicious semantic classification will be used as an example in the following description. The first text classification model can be, but is not limited to, any of the following: XLM-BERT, BERT, XLM-RoBERTa, etc. Taking XLM-BERT as an example, the structure of the first text classification model can include: an input layer, a backbone network, and an output layer. The input layer is used to encode features of the text input to the first text classification model. The specific feature encoding process of the input layer can be as follows: The input layer determines the word vectors (TokenEmbedding), position vectors (PositionEmbedding), language vectors (LanguageEmbedding), and segment vectors (SegmentEmbedding) of the text input to the first text classification model. Then, the word vectors, position vectors, language vectors, and segment vectors are summed to obtain the representation vectors corresponding to multiple words in each first sample text. These representation vectors can be input into the backbone network. For example, the tokens belonging to conversation text 1 are labeled as SegmentA, and the tokens belonging to conversation text 2 are labeled as SegmentB. Multiple encoding layers (e.g., multi-layer bidirectional Transformer encoding layers) can be used as the backbone network, and an output layer can be connected above the top output of the multi-layer bidirectional Transformer encoding layers. 
The output layer can contain at least one classifier to output the first predicted category of each first sample text and the confidence of that first predicted category. For example, as shown in Figure 6, a first text classification model may include an input layer, an output layer, and N Transformer encoding layers (e.g., N=8, 12, 64, etc.). Each Transformer encoding layer contains a self-attention layer based on a multi-head self-attention mechanism and a feedforward neural network. Through the self-attention mechanism, the self-attention layer can simultaneously capture the dependencies between each token and the other tokens in the text input to the first text classification model (e.g., the first sample text), thereby capturing long-distance dependency features and understanding the contextual semantics of the input text. Before performing text classification, the first text classification model can be pre-trained. The pre-training process may include, but is not limited to: selecting a first conversation text from a conversation scenario; selecting, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text; combining the first and second conversation texts into a second sample text and obtaining the label category corresponding to the second sample text; constructing a third classification sample based on the second sample text and the label category, and constructing a second classification dataset based on the third classification sample; and training a third text classification model based on the second classification dataset to obtain the first text classification model. The first predicted category is the result of the first text classification model classifying each first sample text. 
It can be binary classification, for example, selecting one from the candidate categories "normal" or "malicious" as the first predicted category; or it can be multi-class classification, for example, selecting one from the candidate categories "normal," "insult," "advertisement," "other malicious," etc. The confidence score of the first predicted category for each first sample text describes the reliability of each first sample text belonging to the first predicted category, and its value ranges from [0, 1]. Specifically, the closer the confidence score of the first predicted category for a first sample text is to 1, the more likely the first sample text belongs to the first predicted category. Conversely, the closer the confidence score of the first predicted category for a first sample text is to 0, the less likely the first sample text belongs to the first predicted category. After inputting a first sample text into the first text classification model, the model predicts the confidence scores of the first sample text belonging to various candidate categories and determines the candidate category with the highest confidence score as the first predicted category. The advantage of using the first text classification model for text classification is that it enables the first text classification model to quickly learn features for difficult samples and similar texts, thereby improving the incremental optimization capability of the first text classification model while ensuring its basic classification ability.
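The selection of the first predicted category described above can be sketched as follows. The text only states that the model predicts a confidence in [0, 1] for each candidate category and picks the highest one; turning raw classifier scores into confidences with a softmax is an assumption made here for illustration, and `predict_with_confidence` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Map raw classifier scores to confidences in [0, 1] that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(
    logits: List[float], categories: List[str]
) -> Tuple[str, float]:
    """Return the candidate category with the highest confidence and that confidence."""
    confidences = softmax(logits)
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return categories[best], confidences[best]


# Binary example with the candidate categories named in the text.
category, confidence = predict_with_confidence([2.0, 0.0], ["normal", "malicious"])
```

With two candidate categories and nearly equal scores, the returned confidence is close to 0.5, which is the kind of uncertain prediction that the screening in step 530 targets.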

[0060] In step 530, screening refers to filtering, sorting, or otherwise processing the first sample texts based on confidence to obtain difficult-to-classify texts. The screening methods may include, but are not limited to, at least one of the following: treating a first sample text whose confidence falls within a specific confidence interval (e.g., [0, 0.55], [0.5, 0.6), [0.45, 0.65], etc.) as a difficult-to-classify text, or sorting the first sample texts by confidence from low to high (or from high to low) and selecting the top k (e.g., 5, 10, 20, etc.) as difficult-to-classify texts. A difficult-to-classify text is a first sample text for which the first text classification model has difficulty determining whether it is malicious, and may be a homophone variant text, an interference symbol text, a semantically split text, an ambiguous text, a niche expression text, etc. A homophone variant text is obtained by replacing part of a text with a homophonic or abbreviated variant, for example, rewriting "永远的神" (forever god) as the initials abbreviation "yyds". An interference symbol text has symbols inserted within the sentence, such as "笨#蛋" (stupid # egg). A semantically split text is obtained by splitting a text with complete semantics into multiple texts, for example, splitting one sentence into eight: "这" (this), "把" (hold), "真" (really), "是" (be), "一" (one), "群" (group), "笨" (stupid), "蛋" (egg). An ambiguous text has multiple interpretations in itself or different meanings in different contexts. For example, "对线" (laning) may refer to confronting the enemy in a lane in a game scenario, while in a social platform scenario it may refer to arguing or debating with someone. A niche expression text is an expression used only within a specific group, vertical community, dialect area, or less commonly spoken language, and is therefore difficult to identify, for example, "憨憨" (silly) in Southwestern Mandarin.
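The two screening methods named in step 530 can be sketched as below. The function names and the sample confidences are hypothetical; the interval bounds are taken from the examples in the text, and a closed interval is assumed for simplicity (the text also gives a half-open example, [0.5, 0.6)).

```python
from typing import Dict, List


def select_by_interval(
    confidences: Dict[str, float], low: float, high: float
) -> List[str]:
    """Keep texts whose confidence lies in the closed interval [low, high]."""
    return [text for text, c in confidences.items() if low <= c <= high]


def select_lowest_k(confidences: Dict[str, float], k: int) -> List[str]:
    """Sort texts by confidence from low to high and keep the first k."""
    ranked = sorted(confidences, key=lambda text: confidences[text])
    return ranked[:k]


# Hypothetical confidences of the first predicted category for four sample texts.
scores = {"a": 0.98, "b": 0.52, "c": 0.61, "d": 0.47}
by_interval = select_by_interval(scores, 0.45, 0.65)
lowest_two = select_lowest_k(scores, 2)
```

The interval method yields a variable number of difficult-to-classify texts per batch, while the top-k method fixes the count, which may be preferable when the budget for calls to the pre-trained language model is limited.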

[0061] In step 540, the classification prompt is text that is input into the pre-trained language model, either by the user or based on a preset template, to guide the pre-trained language model in text classification and classification reason inference. The classification prompt may include, but is not limited to, at least one of the following: a task description, an input sample, a knowledge base, or an output requirement. For example, the classification prompt can be organized into the following parts:

Task description

[0062]

Input sample

Knowledge base

Output requirement
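A minimal sketch of assembling such a prompt from a preset template is given below. Only the four part names come from the text; the function name `build_classification_prompt`, the bracketed heading layout, and all of the part bodies in the example call are hypothetical and stand in for content that the text does not reproduce.

```python
from typing import List


def build_classification_prompt(
    task_description: str,
    input_sample: str,
    knowledge_base: List[str],
    output_requirement: str,
) -> str:
    """Join the four parts of a classification prompt into one string."""
    sections = [
        ("Task description", task_description),
        ("Input sample", input_sample),
        ("Knowledge base", "\n".join(f"- {entry}" for entry in knowledge_base)),
        ("Output requirement", output_requirement),
    ]
    # Each part gets a bracketed heading; parts are separated by a blank line.
    return "\n\n".join(f"[{title}]\n{body}" for title, body in sections)


# All bodies below are invented placeholders for illustration.
prompt = build_classification_prompt(
    task_description="Decide whether the conversation text is normal or malicious.",
    input_sample="you are such a bd",
    knowledge_base=["'laning' is a normal game term"],
    output_requirement="Output the category and briefly explain the judgment basis.",
)
```

The difficult-to-classify text is placed in the input sample part, so the same template can be reused for every text selected in step 530.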

[0063] In step 550, after the first classification reason is determined, texts similar to the difficult-to-classify text can be recalled from multiple texts to be recalled based on the first classification reason of the second predicted category of the difficult-to-classify text. For example, keywords belonging to the second predicted category are extracted from the difficult-to-classify text based on the first classification reason; keyword recognition is then performed on the texts to be recalled, and the texts to be recalled that contain the keywords are taken as similar texts. As another example, the classification prompts and the texts to be recalled are input into the pre-trained language model to obtain the third predicted category of each text to be recalled and the second classification reason for that third predicted category. The second classification reason is then matched against the first classification reason; if they match, the corresponding text to be recalled is taken as a similar text. The matching can be implemented through keyword recognition, semantic similarity calculation, etc. When the classification prompts ask for both descriptive text and attribution categories, they can be constructed as follows: “[Output Requirements] Please output in the following format: 1. Judgment Reason: [Briefly explain the judgment basis]; 2. Misjudgment Attribution Analysis: Optional attribution types: A. Homophonic variant attacks (e.g., bd → idiot); B. Disruptive symbol attacks (e.g., "idiot"); C. Semantic deconstruction attacks (e.g., splitting malicious words into multiple sentences); D. Ambiguity in context (e.g., a single sentence may seem normal, but multiple sentences combined may appear malicious); E. Misinterpretation of game slang (e.g., "laning" is a normal description in the game); F. Sparse data for less commonly spoken languages; G. Other Reasons: [Detailed Explanation]”. The “Judgment Reason” section outputs descriptive text. The texts to be recalled are the first sample texts excluding the difficult-to-classify text. Using these texts as the texts to be recalled expands the sample size within limited data resources, thereby improving the utilization of the data already collected. A similar text is a text to be recalled that is similar to the difficult-to-classify text. Recall is the operation of extracting similar texts from the texts to be recalled, which can be achieved through feature similarity calculation, attention mechanisms, etc. For example, the similarity between the features of each text to be recalled and the features of the difficult-to-classify text can be calculated, and the texts to be recalled with high similarity can be selected as recall samples; this method is typically used where recall samples must be determined quickly. Alternatively, a text to be recalled and the difficult-to-classify text can be input into a model built on an attention mechanism to determine whether the text to be recalled can be used as a recall sample; this method is typically used where recall samples must be determined accurately.
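
As an illustration of the keyword-based recall mentioned in this paragraph, the following is a minimal sketch and not the application's implementation. The function name `recall_by_keywords`, the case-insensitive substring test, and the sample texts are all assumptions made for the example; how keywords are extracted from the first classification reason is left out.

```python
def recall_by_keywords(keywords, candidates):
    """Return the candidates that contain at least one keyword (case-insensitive)."""
    lowered = [kw.lower() for kw in keywords if kw]
    return [text for text in candidates
            if any(kw in text.lower() for kw in lowered)]


# Hypothetical keywords, assumed to have been extracted from a
# difficult-to-classify text according to its first classification reason.
keywords = ["bd"]
texts_to_be_recalled = [
    "you are such a BD",
    "nice laning this game",
    "gg well played",
]
similar_texts = recall_by_keywords(keywords, texts_to_be_recalled)
```

Plain substring matching is fast but coarse; it is shown here only because the paragraph names keyword recognition as one option.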

[0064] In step 560, the first classification sample is the difficult-to-classify text after LLM annotation. The first classification sample can be constructed by annotating the difficult-to-classify text with the second predicted category. The second classification sample is the text obtained by annotating similar texts with the second predicted category of the difficult-to-classify text. The second classification sample can be constructed by annotating similar texts with the second predicted category of the difficult-to-classify text. The first classification dataset is a dataset containing both the first and second classification samples; that is, the second predicted category is used as the label category for the difficult-to-classify text, and the second predicted category is used as the label category for the similar texts.
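
The construction in step 560 can be sketched as follows. This is a hedged illustration only: the `ClassificationSample` record, its `source` field, and the function name are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationSample:
    text: str
    label: str   # second predicted category given by the pre-trained language model
    source: str  # "difficult" (first classification sample) or "similar" (second)


def build_first_classification_dataset(difficult_text, second_predicted_category, similar_texts):
    """Label the difficult text and its similar texts with the same category."""
    first_sample = ClassificationSample(difficult_text, second_predicted_category, "difficult")
    second_samples = [
        ClassificationSample(text, second_predicted_category, "similar")
        for text in similar_texts
    ]
    return [first_sample, *second_samples]


dataset = build_first_classification_dataset("text A", "malicious", ["text B", "text C"])
```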

[0065] In step 570, after constructing the first classification dataset, the first text classification model can be trained based on the first classification dataset to obtain the second text classification model. The second text classification model is a model obtained by training the first text classification model based on the first classification dataset, and it has the ability to classify texts that are difficult to classify.

[0066] The process of training the first text classification model based on the first classification dataset can involve adjusting all layers of the first text classification model, or it can employ a layered fine-tuning training strategy. A layered fine-tuning training strategy could involve freezing the parameters of the input layer and backbone network of the first text classification model and updating only the parameters of the output layer, or it could involve freezing the parameters of the input layer and multiple encoding layers of the first text classification model and updating the parameters of the output layer and at least one other encoding layer. This application does not impose any limitations on this approach.

[0067] In steps 510-570 above, multiple first sample texts are first obtained, and each first sample text is input into the first text classification model to obtain its first predicted category and the confidence level of that category. Then, based on the confidence level, the difficult-to-classify texts that the first text classification model struggles to classify are selected, which identifies the core direction for subsequently optimizing the first text classification model. Next, a pre-trained language model is introduced, and the classification prompts and the difficult-to-classify texts are input into it to obtain the second predicted category of each difficult-to-classify text and the first classification reason for that category. In this way, a second predicted category that fits the true semantics of the difficult-to-classify text can be obtained by relying on the semantic understanding ability of the pre-trained language model; that is, the second predicted category can be regarded as the label category of the difficult-to-classify text. The classification prompts also guide the pre-trained language model to analyze and summarize the classification basis and semantic reasoning logic of the difficult-to-classify text, making the second predicted category interpretable. Based on the first classification reason, similar texts are recalled from multiple texts to be recalled, where the texts to be recalled are the first sample texts excluding the difficult-to-classify texts. Because recall is driven by the classification reason, it retrieves texts whose classification logic resembles that of the difficult-to-classify text, thereby improving the accuracy of recalling similar texts. Since similar texts resemble the difficult-to-classify text, the same classification logic can be applied; that is, the second predicted category can also be taken as the label category of the similar texts. Finally, a first classification sample is constructed from the difficult-to-classify text and the second predicted category, a second classification sample is constructed from the similar texts and the second predicted category, a first classification dataset is constructed from the first and second classification samples, and the first text classification model is trained on the first classification dataset to obtain the second text classification model. This application can thus construct a first classification dataset made up of texts that the first text classification model finds difficult to classify and train the model on it in a targeted manner, effectively improving the accuracy of text classification. At the same time, the entire process forms an automated closed loop from problem identification to problem resolution without manual intervention, improving the training efficiency of the text classification model and enabling rapid iteration.

[0068] As shown in Figure 7, in some embodiments, recalling similar texts from a plurality of texts to be recalled based on the first classification reason of the second predicted category of the difficult-to-classify text includes a process 700 of determining similar texts based on the classification reasons of the difficult-to-classify text and the texts to be recalled. Process 700 includes: Step 710: Input the classification prompts and each text to be recalled into the pre-trained language model to obtain the third predicted category of each text to be recalled and the second classification reason of the third predicted category of each text to be recalled; Step 720: Match the first classification reason of the second predicted category of the difficult-to-classify text against the second classification reason of the third predicted category of each text to be recalled to obtain the classification reason matching result between the difficult-to-classify text and each text to be recalled; Step 730: Based on the classification reason matching results between the difficult-to-classify text and the multiple texts to be recalled, recall similar texts that are similar to the difficult-to-classify text from among the multiple texts to be recalled.

[0069] Steps 710-730 are described in detail below.

[0070] In step 710, the third predicted category is the result obtained by the pre-trained language model from classifying the text to be recalled, which is similar to the second predicted category. It can be binary or multi-class classification. For specific implementation, refer to the embodiments related to the second predicted category. The second classification reason refers to the classification basis and semantic reasoning logic given by the pre-trained language model when determining the third predicted category of the text to be recalled. It is used to describe the reason why the text to be recalled is determined to have the third predicted category. The classification prompt words and each text to be recalled are input into the pre-trained language model to obtain the third predicted category of each text to be recalled and the second classification reason of the third predicted category of each text to be recalled. The process is the same as the specific embodiments for determining the second predicted category and the first classification reason described above.

[0071] In step 720, matching refers to comparing the semantic features and/or classification logic of the first classification cause and the second classification cause to obtain the classification cause matching result between the difficult-to-classify text and each text to be recalled. Taking the case where the first and second classification causes are descriptive text as an example, the matching can proceed as follows. First, for the difficult-to-classify text, the semantic features of the first classification cause are extracted to obtain the first classification cause vector; for each text to be recalled, the semantic features of the second classification cause are extracted to obtain the corresponding second classification cause vector. Then, the similarity between the first classification cause vector and each second classification cause vector is calculated to obtain the classification cause similarity between the difficult-to-classify text and each text to be recalled. This classification cause similarity is then used as the classification cause matching result between the difficult-to-classify text and each text to be recalled. Taking classification logic as an example, the classification cause matching result can be "match" or "not match". When the first and second classification causes are attribution categories, the second classification causes whose attribution categories are the same as, or partially the same as, those of the first classification cause are selected, and the classification cause matching result between the difficult-to-classify text and the corresponding text to be recalled is set to "match". For second classification causes whose attribution categories are completely different from those of the first classification cause, the classification cause matching result is set to "not match". The first and second classification cause vectors are described in detail below.
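
The attribution-category case can be sketched as a set intersection. This is an illustrative sketch under the assumption that each classification cause has already been reduced to a set of attribution letters (A to G in the prompt quoted earlier); the function name `match_attribution` is hypothetical.

```python
def match_attribution(first_cause_categories, second_cause_categories):
    """Return 'match' if the two causes share at least one attribution category."""
    shared = set(first_cause_categories) & set(second_cause_categories)
    return "match" if shared else "not match"


# Partially identical attribution categories count as a match.
result_partial = match_attribution({"A", "D"}, {"D"})
# Completely different attribution categories do not.
result_none = match_attribution({"A"}, {"E", "F"})
```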

[0072] In step 730, after the classification cause matching result between the difficult-to-classify text and each text to be recalled is determined, similar texts can be recalled from among the multiple texts to be recalled based on those matching results. Taking classification cause similarity as an example, similar texts can be determined by combining the first semantic similarity between the difficult-to-classify text and the texts to be recalled with the classification cause similarity. Alternatively, texts to be recalled whose classification cause similarity is higher than a preset classification cause similarity threshold can be used directly as similar texts. Or, the texts to be recalled can be sorted from high to low by classification cause similarity to obtain a cause similarity sequence, and a specific number (e.g., 50, 100, etc.) of texts to be recalled can be selected in order from the start of the sequence as similar texts. Taking attribution categories as an example, texts whose classification cause matching results indicate a "match" can be used directly as similar texts. Alternatively, for texts whose classification cause matching results indicate a "match", the first semantic similarity between the difficult-to-classify text and the text to be recalled can be updated to obtain a second semantic similarity, and similar texts can be selected from the texts to be recalled based on the second semantic similarity. The first and second semantic similarities are described in detail below.

[0073] Steps 710-730 of this application match the first classification reason of the second predicted category of the difficult-to-classify text and the second classification reason of the third predicted category of each text to be recalled, to obtain the classification reason matching result between the difficult-to-classify text and each text to be recalled. This can perform deep matching between the difficult-to-classify text and each text to be recalled based on the classification logic and characteristics of malicious semantics, rather than traditional literal or shallow semantic matching. In this way, in the process of recalling similar texts based on the classification reason matching result, it can effectively filter texts to be recalled that are literally similar but have different classification logic, or have the same classification logic but have large literal differences, thereby improving the accuracy of similar text selection.

[0074] As shown in Figure 8, in some embodiments, step 730 includes: Step 810: Determine the first semantic similarity between the difficult-to-classify text and each text to be recalled based on the first semantic features extracted from the difficult-to-classify text by the first text classification model and the second semantic features extracted from each text to be recalled by the first text classification model; Step 820: Based on the classification cause matching result between the difficult-to-classify text and each text to be recalled, update the first semantic similarity between the difficult-to-classify text and each text to be recalled to obtain the second semantic similarity between the difficult-to-classify text and each text to be recalled; Step 830: Based on the second semantic similarity between the difficult-to-classify text and the multiple texts to be recalled, recall similar texts that are similar to the difficult-to-classify text from among the multiple texts to be recalled.

[0075] Steps 810-830 are described in detail below.

[0076] In step 810, the first semantic feature is a semantic feature extracted from the difficult-to-classify text, used to describe the deep semantic characteristics of the difficult-to-classify text. The process of extracting the first semantic feature and the second semantic feature can be implemented through a first text classification model. For example, the difficult-to-classify text is input into the first text classification model, semantic features are extracted through a multi-layer bidirectional Transformer encoding layer, and the top-level output of the multi-layer bidirectional Transformer encoding layer is used as the first semantic feature. The second semantic feature is a semantic feature extracted from the text to be recalled, used to describe the deep semantic characteristics of the text to be recalled. The extraction process of the second semantic feature is similar to the extraction process of the first semantic feature, and will not be described again in this application. The first semantic similarity is the result of calculating the similarity between the first semantic feature and the second semantic feature, and the value range can be [0,1]. The calculation method of the first semantic similarity can include, but is not limited to, any of the following: cosine similarity, Euclidean distance, Pearson correlation coefficient, etc., and this application does not impose any restrictions.
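
A minimal sketch of the similarity calculation in step 810, using cosine similarity as one of the options listed above. The rescaling `(cos + 1) / 2` onto [0,1] is an assumption made here so that the result fits the stated value range; the application does not specify how that range is obtained. The feature vectors are assumed to be plain lists of floats already produced by the encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_semantic_similarity(first_feature, second_feature):
    """Map cosine similarity from [-1, 1] onto [0, 1] (an assumed rescaling)."""
    cos = max(-1.0, min(1.0, cosine_similarity(first_feature, second_feature)))
    return (cos + 1.0) / 2.0
```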

[0077] In step 820, updating the first semantic similarity between the difficult-classified text and each text to be recalled involves modifying or weighting the first semantic similarity based on the classification cause matching result. For example, if the classification cause matching result indicates that the difficult-classified text matches a text to be recalled, the first semantic similarity between the two texts is increased. If the classification cause matching result indicates that the difficult-classified text does not match a text to be recalled, the first semantic similarity between the two texts is decreased or left unchanged. The second semantic similarity is obtained after updating the first semantic similarity. It takes into account both the semantic and classification cause similarities between the difficult-classified text and the text to be recalled, and can comprehensively reflect the similarity between them. The value range of the second semantic similarity can be [0,1], and its calculation method is similar to that of the first semantic similarity.

[0078] In step 830, similar texts similar to the difficult-classified text can be recalled from among the multiple texts to be recalled based on the second semantic similarity between the difficult-classified text and each of the multiple texts to be recalled. Specifically, the texts to be recalled can be sorted from high to low according to the second semantic similarity to obtain a sequence of texts to be recalled. Then, the texts ranked in the top k positions of the sequence are selected as similar texts, where k is a positive integer greater than 0 and less than the total number of texts to be recalled. The specific value of k can be a constant or a specific multiple of the number of difficult samples (e.g., 3-5 times). Alternatively, the specific implementation process can be: selecting texts to be recalled with a second semantic similarity greater than a preset similar text selection threshold as similar texts, where the preset similar text selection threshold is between 0 and 1, for example, any value between 0.75 and 0.85.
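
Both selection strategies in step 830 can be sketched as follows. The function names and the sample scores are hypothetical; the default threshold of 0.8 is simply one value inside the 0.75 to 0.85 range mentioned above.

```python
def recall_top_k(scored_texts, k):
    """scored_texts: list of (text, second_semantic_similarity) pairs."""
    if not 0 < k < len(scored_texts):
        raise ValueError("k must satisfy 0 < k < number of texts to be recalled")
    ranked = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]


def recall_above_threshold(scored_texts, threshold=0.8):
    """Keep every text whose second semantic similarity exceeds the threshold."""
    return [text for text, score in scored_texts if score > threshold]


scored = [("t1", 0.67), ("t2", 0.92), ("t3", 0.81), ("t4", 0.40)]
top_two = recall_top_k(scored, k=2)
above = recall_above_threshold(scored)
```

Top-k yields a fixed number of samples per difficult-to-classify text, whereas the threshold variant lets the number of recalled texts vary with the data.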

[0079] Steps 810-830 of this application, by fusing the semantic features between the difficult-classified text and the text to be recalled, as well as the classification reason matching results between the difficult-classified text and each text to be recalled, can combine the degree of similarity between the difficult-classified text and the text to be recalled in terms of semantics and classification reasons during the process of recalling similar texts. This enables the selection of similar texts to no longer rely solely on shallow semantic features, but also on the malicious semantic type to which the difficult-classified text belongs, ultimately improving the recall accuracy.

[0080] In some embodiments, step 820 includes: when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a successful match, amplifying the first semantic similarity between them to obtain the second semantic similarity between them. The training method for the text classification model further includes: when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a failed match, determining the first semantic similarity between them as the second semantic similarity between them.

[0081] A successful match between the classification cause matching result of the difficult-to-classify text and each text to be recalled means that the classification cause matching result indicates a "match". Augmentation processing refers to calculation operations such as weighting, amplifying, or increasing the first semantic similarity. Augmentation processing of the first semantic similarity can be achieved by: pre-setting a similarity augmentation value for the texts to be recalled whose classification cause matching result indicates a "match"; then, when it is necessary to augment the first similarity, adding or multiplying the first semantic similarity between the difficult-to-classify text and the texts to be recalled based on the similarity augmentation value to obtain the second semantic similarity between the difficult-to-classify text and the texts to be recalled; and selecting similar texts from the texts to be recalled based on the second semantic similarity. Specifically, if the similarity augmentation value and the first semantic similarity are multiplied, the similarity augmentation value is greater than 1.

[0082] A failed match between the classification cause of the difficult-to-classify text and that of a text to be recalled means that the classification cause matching result indicates "no match". In that case, the first semantic similarity between the difficult-to-classify text and the text to be recalled is directly determined as the second semantic similarity between them.

[0083] As shown in Figure 9, taking difficult-to-classify text 1, text to be recalled 1, and text to be recalled 2 as examples, the first semantic similarity 1 between difficult-to-classify text 1 and text to be recalled 1 is 0.77, and the first semantic similarity 2 between difficult-to-classify text 1 and text to be recalled 2 is 0.72. The classification reason matching result 1 between difficult-to-classify text 1 and text to be recalled 1 is "not matched", and the classification reason matching result 2 between difficult-to-classify text 1 and text to be recalled 2 is "matched". The similarity augmentation value is 0.2. The update adds the similarity augmentation value to the first semantic similarity 2 and leaves the first semantic similarity 1 unmodified. After the update, the second semantic similarity 2 between difficult-to-classify text 1 and text to be recalled 2 is 0.92, and the second semantic similarity 1 equals the first semantic similarity 1, namely 0.77. Finally, since the second semantic similarity 2 is the higher of the two, text to be recalled 2 is selected as the similar text.

[0084] In other embodiments, when the classification cause matching between the difficult-to-classify text and a text to be recalled fails, the first semantic similarity between them can be reduced to obtain the second semantic similarity between them. Reduction processing refers to calculation operations such as weighting, reducing, or decreasing the first semantic similarity. It can be implemented as follows: a similarity reduction value is preset for texts to be recalled whose classification cause matching result indicates "no match"; then, when the first semantic similarity needs to be reduced, the similarity reduction value is subtracted from, or multiplied with, the first semantic similarity between the difficult-to-classify text and the text to be recalled to obtain the second semantic similarity between them; similar texts are then selected from the texts to be recalled based on the second semantic similarity. If the similarity reduction value is multiplied with the first semantic similarity, the similarity reduction value is a positive number less than 1. In addition, if the result of updating the first semantic similarity with the similarity augmentation value is greater than 1, 1 is taken as the second semantic similarity; if the result of updating the first semantic similarity with the similarity reduction value is less than 0, 0 is taken as the second semantic similarity.

[0085] As shown in Figure 10, taking the aforementioned difficult-to-classify text 1, text to be recalled 1, and text to be recalled 2 as examples, the similarity reduction value is 0.1. The update adds the similarity augmentation value (0.2) to the first semantic similarity 2, whose classification reason matching result is "matched", and subtracts the similarity reduction value from the first semantic similarity 1, whose classification reason matching result is "not matched". After the update, the second semantic similarity 1 between difficult-to-classify text 1 and text to be recalled 1 is 0.67, and the second semantic similarity 2 between difficult-to-classify text 1 and text to be recalled 2 is 0.92. Finally, since the second semantic similarity 2 is higher, text to be recalled 2 is selected as the similar text.
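
The additive update used in the Figure 9 and Figure 10 examples, together with the clamping to [0,1] described in paragraph [0084], can be sketched as below. The function name `update_similarity` and the dictionary layout are hypothetical; only the additive variant is shown, not the multiplicative one.

```python
def update_similarity(first_similarity, matched, augmentation=0.2, reduction=0.0):
    """Additive update of the first semantic similarity, clamped to [0, 1]."""
    if matched:
        updated = first_similarity + augmentation
    else:
        updated = first_similarity - reduction
    return min(1.0, max(0.0, updated))


# (first semantic similarity, classification reason matched?) from the examples.
first = {
    "text to be recalled 1": (0.77, False),
    "text to be recalled 2": (0.72, True),
}

# Figure 9: matched texts are augmented, unmatched texts are left unchanged.
figure_9 = {name: update_similarity(s, m) for name, (s, m) in first.items()}
# Figure 10: unmatched texts are additionally reduced by 0.1.
figure_10 = {name: update_similarity(s, m, reduction=0.1) for name, (s, m) in first.items()}

selected = max(figure_10, key=figure_10.get)
```

Because the values are floating-point, the results should be compared after rounding (for example to two decimal places) rather than with exact equality.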

[0086] The advantage of this application's embodiments lies in that, by amplifying the first semantic similarity between the difficult-classified text and each text to be recalled, which successfully matches the difficult-classified text in terms of classification reasons, a second semantic similarity is obtained between the difficult-classified text and each text to be recalled. Furthermore, for texts to be recalled that fail to match the difficult-classified text in terms of classification reasons, the first semantic similarity between the difficult-classified text and each text to be recalled is determined as the second semantic similarity between the difficult-classified text and each text to be recalled. This amplification process strengthens the first semantic similarity of texts to be recalled that share the same classification logic as the difficult-classified text, thereby highlighting the correlation between similar samples. Moreover, by not strengthening the first semantic similarity of texts to be recalled that fail to match, it prevents texts of different semantic categories from being misjudged as similar texts, ultimately improving the accuracy and reliability of the recall process.

[0087] In some embodiments, step 720 includes: Embed the first classification reason of the second predicted category of the difficult-classified text to obtain the first classification reason vector corresponding to the difficult-classified text; The second classification reason of the third predicted category of each text to be recalled is embedded to obtain the second classification reason vector corresponding to each text to be recalled; Vector matching is performed on the first classification cause vector corresponding to the difficult-to-classify text and the second classification cause vector corresponding to each text to be recalled to obtain the classification cause matching result between the difficult-to-classify text and each text to be recalled.

[0088] Embedding processing refers to converting natural language text into low-dimensional dense vectors. The first classification cause of the second predicted category of the difficult-to-classify text can be embedded by inputting the first classification cause into a text encoder or text embedding model, such as a BERT encoder, an XLM-BERT cross-language text encoder, a Sentence-BERT (SBERT) sentence embedding model, a RoBERTa text encoder, or the encoding layer built into the first text classification model. The first classification cause vector is the vector obtained by embedding the first classification cause and represents, in the feature space, the cause of the second predicted category of the difficult-to-classify text. The second classification cause vector is the vector obtained by embedding the second classification cause and represents, in the feature space, the cause of the third predicted category of the text to be recalled. The second classification cause of each text to be recalled is embedded in the same way as the first classification cause, so the details are not repeated in this application. Vector matching processing refers to determining whether the first classification cause vector and the second classification cause vector match, which can be achieved by calculating vector similarity. Specifically, the similarity between the first and second classification cause vectors is calculated (e.g., cosine similarity, Euclidean distance, Pearson correlation coefficient, etc.) and compared with a preset vector matching threshold. If the similarity is higher than the vector matching threshold, the classification cause matching result is determined to be a successful match; otherwise, it is determined to be a failed match. The classification cause matching result describes whether the first and second classification cause vectors match successfully. It can take the form of a string or a number: for example, the string "match" or the number "1" indicates a successful match, and the string "no match" or the number "0" indicates a failed match.
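
The threshold-based vector matching can be sketched as follows. To keep the example self-contained, a bag-of-words count vector (`toy_embed`) stands in for the text encoders named above; it is an assumption for illustration only and is far cruder than a BERT-style embedding. The threshold of 0.5 and the sample cause texts are likewise hypothetical.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Stand-in for a real text encoder: word counts over a shared vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_cause_vectors(first_vector, second_vector, threshold):
    """Compare similarity with the preset vector matching threshold."""
    return "match" if cosine(first_vector, second_vector) > threshold else "no match"


first_cause = "homophonic variant replaces an insult"
second_cause_a = "homophonic variant replaces a slur"
second_cause_b = "normal game slang about laning"

vocabulary = sorted(set(" ".join([first_cause, second_cause_a, second_cause_b]).split()))
first_vec = toy_embed(first_cause, vocabulary)
result_a = match_cause_vectors(first_vec, toy_embed(second_cause_a, vocabulary), threshold=0.5)
result_b = match_cause_vectors(first_vec, toy_embed(second_cause_b, vocabulary), threshold=0.5)
```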

[0089] Furthermore, if the pre-trained language model first determines the first classification cause in vector form when predicting the second predicted category, and then converts it into textual form, the vector form of the first classification cause can be directly output from the pre-trained language model as the first classification cause vector. Similarly, the vector form of the second classification cause determined by the pre-trained language model when predicting the third predicted category can also be used as the second classification cause vector. Based on this, vector matching is directly performed on the first classification cause vector corresponding to the difficult-to-classify text and the second classification cause vector corresponding to each text to be recalled to obtain the classification cause matching results between the difficult-to-classify text and each text to be recalled.

[0090] The advantage of this application's embodiments is that by converting the first and second classification reasons into vectors and performing vector matching, the classification reasons of difficult-to-classify texts and all texts to be recalled can be represented in the same vector space. This enables classification reason matching based on deep semantics, thereby eliminating matching biases caused by differences in text expression, rather than simply matching classification reasons based on literal meaning. It can effectively identify texts to be recalled that have different expressions from difficult-to-classify texts but share the same deep-seated logic in their classification reasons, thus significantly improving the accuracy of recalling similar texts.

[0091] In some embodiments, step 720 includes: Literal matching is performed on the first classification reason of the second predicted category of the difficult-to-classify text and the second classification reason of the third predicted category of each text to be recalled to obtain the classification reason matching result between the difficult-to-classify text and each text to be recalled.

[0092] Literal matching refers to directly comparing literal features, such as characters, keywords, and sentence structures, between the text of the first classification reason and the text of the second classification reason. For example: do the first and second classification reasons use the same homophones or symbols? Do they both contain a key description such as "split to send malicious content" or "contextually implies attack intent"? Do they both contain the sentence structure "split... into..."? Furthermore, to facilitate literal matching, specific literal matching conditions for the texts to be recalled can be stored in a literal matching condition library. For example, the text of each second classification reason can be treated as a literal matching condition and stored in the library; common characters, keywords, and sentence structures can be extracted from the first classification reason of the difficult-to-classify text and the second classification reasons of similar texts and stored in the library as literal matching conditions; other methods can also be used to set the literal matching conditions in the library.
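As a rough illustration of literal matching against a condition library, the sketch below checks whether two classification reasons satisfy at least one common condition. The library layout, the names `CONDITION_LIBRARY`, `matched_conditions`, and `literal_match`, and the regular expression are hypothetical; only the two keyword phrases and the "split... into..." structure come from the text.

```python
import re

# Hypothetical literal matching condition library: keywords and sentence patterns.
CONDITION_LIBRARY = {
    "keywords": [
        "split to send malicious content",
        "contextually implies attack intent",
    ],
    "patterns": [re.compile(r"split .+ into .+")],
}

def matched_conditions(reason_text, library=CONDITION_LIBRARY):
    """Return the set of library conditions that the reason text satisfies."""
    text = reason_text.lower()
    hits = {kw for kw in library["keywords"] if kw in text}
    hits |= {p.pattern for p in library["patterns"] if p.search(text)}
    return hits

def literal_match(first_reason, second_reason, library=CONDITION_LIBRARY):
    """True when both reasons satisfy at least one common literal condition."""
    common = matched_conditions(first_reason, library) & matched_conditions(second_reason, library)
    return bool(common)

first = "The user split an insult into several short messages"
second = "Sender split the abusive word into single characters"
third = "Friendly greeting with no harmful intent"
```

With these inputs, `first` and `second` share the "split... into..." structure and match, while `third` satisfies no condition.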

[0093] The advantage of this application's embodiments lies in that, by performing literal-level matching of the classification reasons of the difficult-to-classify text and the text to be recalled, it is possible to quickly recall similar texts with the same malicious semantic type as the difficult-to-classify text, without requiring deep reasoning, thus reducing the consumption of computational resources. Simultaneously, the recall prompts can be flexibly adapted to different scenarios such as games, instant messaging, and social platforms, guiding the model to focus on scenario-based similarity standards and ensuring that the recalled similar texts are highly similar to the difficult-to-classify texts in terms of the logical similarity of their classification reasons.

[0094] In some embodiments, the training method for the text classification model further includes: inputting the recall prompt, the difficult-to-classify text, and the multiple texts to be recalled into the pre-trained language model to obtain similar texts that are similar to the difficult-to-classify text. The recall prompt is used to guide the pre-trained language model to recall texts from the multiple texts to be recalled based on the difficult-to-classify text.

[0095] Recall prompts are words used to guide a pre-trained language model to recall texts from multiple texts to be recalled based on difficult-to-classify texts. Recall prompts can be constructed as follows: "

Task Description

[0096] [Input Sample] Text to be recalled: {Text content of the text to be recalled}; Difficult-to-classify text: {Text content of the difficult-to-classify text}; Output Requirements Please output in the following format: 1. Similar text: [Text content of similar text]". Based on this, after the recall prompt, the difficult-to-classify text, and the multiple texts to be recalled are input into the pre-trained language model, the pre-trained language model performs deep feature extraction, association, and comparison on the difficult-to-classify text and each text to be recalled. It removes texts to be recalled that are logically unrelated to the malicious semantic type of the difficult-to-classify text, and selects and outputs similar texts of the same category as the difficult-to-classify text.

[0097] The advantages of this application's embodiments lie in the fact that, by leveraging the semantic understanding and logical reasoning capabilities of a pre-trained language model to recall similar texts, it can ensure that similar texts match in terms of malicious semantic types based on semantic understanding of the text content. This, in turn, ensures that the recalled similar texts are highly consistent with the core features of malicious semantic types and the logical reasons for classification of difficult-to-classify texts, thus improving the accuracy of the recall. Furthermore, the pre-trained language model enables automated batch recall of similar texts without manual intervention, thereby improving the efficiency and automation of the recall process.

[0098] It is worth noting that this embodiment can be implemented in conjunction with step 550 above. That is, the similar texts in step 560 include both the similar texts obtained by inputting the recall prompt, the difficult-to-classify text, and the multiple texts to be recalled into the pre-trained language model, and the similar texts recalled from the multiple texts to be recalled based on the first classification reason of the second predicted category of the difficult-to-classify text. In other words, taking the union of the similar texts recalled through the two methods helps to expand the size of the first classification dataset. Of course, in some embodiments, the intersection of the similar texts recalled through the two methods can be taken instead to further ensure the similarity between the final similar texts and the difficult-to-classify text.
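The trade-off between the union and the intersection of the two recall results can be shown with a short sketch. The function name `merge_recalls` and the `mode` parameter are hypothetical, and texts are represented by plain string ids for simplicity.

```python
def merge_recalls(prompt_recalled, reason_recalled, mode="union"):
    """Combine similar texts recalled by the two methods.

    mode="union" favours dataset size; mode="intersection" favours similarity.
    """
    a, b = set(prompt_recalled), set(reason_recalled)
    if mode == "union":
        return sorted(a | b)
    if mode == "intersection":
        return sorted(a & b)
    raise ValueError("mode must be 'union' or 'intersection'")

by_prompt = ["text_a", "text_b"]   # recalled via the recall prompt
by_reason = ["text_b", "text_c"]   # recalled via classification reason matching
```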

[0099] In some embodiments, the first text classification model includes an input layer, a backbone network, and an output layer. The output layer includes at least one classification head, and the backbone network includes N cascaded encoding layers, where N is an integer greater than 2. All modules in the first text classification model can be trained, or only some modules in the first text classification model can be incrementally fine-tuned. Step 570 includes: Based on the first classification dataset, the encoding layers to be trained and the classification head in the first text classification model are trained to obtain the second text classification model; the encoding layers to be trained include the nth encoding layer to the Nth encoding layer, where n is an integer greater than 1 and not exceeding N.

[0100] The input layer encodes the text input to the first text classification model, transforming it into a vector structure capable of semantic feature extraction. In this embodiment, the input layer processes the text input to the first text classification model as follows: First, it determines the token embedding and position embedding for each word in the text. Then, it sums the token embedding and position embedding to obtain the representation vector for each word. The representation vectors corresponding to multiple words in the text are then input into the backbone network. When calculating the representation vector for each word, in addition to referencing the token embedding and position embedding, it can also refer to the language embedding and / or segment embedding. The backbone network is the core feature extraction module of the first text classification model, used to transform the text input to the first text classification model into semantic features usable for classification. The backbone network consists of N cascaded encoding layers, where N is an integer greater than 2. The encoding layer is the basic building block of the backbone network. It can employ a Transformer structure to extract semantic features from the text input to the first text classification model layer by layer. The lower-level encoding layers extract basic semantics (such as character and lexical features), while the higher-level encoding layers extract deeper semantics (such as contextual relationships and the logic of malicious semantics). The encoding layers to be trained are those requiring parameter fine-tuning, including the nth to Nth encoding layers, where n is an integer greater than 1 and not exceeding N. The classification head is the unit in the first text classification model used to classify text based on semantic features and output the predicted category; it is located in the output layer of the first text classification model. 
Taking N=12 as an example, when training the encoding layers to be trained and the classification head of the first text classification model based on the first classification dataset to obtain the second text classification model, the last 6 Transformer encoding layers can be used as the encoding layers to be trained. The input layer and the first 6 Transformer encoding layers of the first text classification model are then frozen, and only the parameters of the classification head in the model's output layer and of the encoding layers to be trained are updated. During training, a mini-batch gradient descent optimization method can be used, with the batch size set to 32 to 64, the learning rate set to [missing information], and the number of training epochs set to 1 to 2.
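The freezing scheme in the N=12 example can be sketched as a simple plan that marks each module as trainable or frozen. The module names (`input_layer`, `encoder_layer_i`, `classification_head`) and the function `training_plan` are hypothetical. In a deep learning framework such a plan would typically be applied by disabling gradient updates for the frozen parameters (for instance, in PyTorch, by setting `requires_grad = False` on them).

```python
def training_plan(num_layers, first_trainable):
    """Map each module name to True (trainable) or False (frozen).

    Encoding layers are 1-indexed. Layers first_trainable..num_layers and the
    classification head are trained; the input layer and earlier layers are frozen.
    """
    if not (1 < first_trainable <= num_layers):
        raise ValueError("first_trainable must satisfy 1 < n <= N")
    plan = {"input_layer": False}
    for i in range(1, num_layers + 1):
        plan[f"encoder_layer_{i}"] = i >= first_trainable
    plan["classification_head"] = True
    return plan

# N = 12, n = 7: the last 6 encoding layers and the classification head are trained.
plan = training_plan(num_layers=12, first_trainable=7)
frozen = [name for name, trainable in plan.items() if not trainable]
trained = [name for name, trainable in plan.items() if trainable]
```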

[0101] The advantage of this embodiment is that by training only the encoding layers related to contextual and malicious semantic logic, together with the classification head in the output layer, the first text classification model can quickly learn the features of difficult-to-classify samples and similar texts. This improves the incremental optimization capability of the first text classification model while saving the time and computing resources required to train it. In this way, the model iteration cycle can be shortened from "weekly" to "daily" or even "hourly".

[0102] In some embodiments, text classification, as a core task of natural language processing, aims to automatically map input text to a preset category system and is widely used in various application scenarios. For example, in game conversation scenarios, it is necessary to identify whether the conversation text is normal or malicious; if it is malicious, then blocking or muting should be performed to maintain the game environment. With the development of deep learning technology, text classification models are now commonly used to implement text classification.

[0103] In the solutions provided by related technologies, text classification models typically classify the text of each sentence sent in a conversational scenario. However, in real-world applications, text expressions are diverse and semantic structures are complex. For example, in game conversations, situations such as "sentence-segmentation attacks" and "sentence-skipping insults" frequently occur. The solutions provided by related technologies are prone to misjudging these situations (e.g., misclassifying malicious intent as normal), resulting in low accuracy in text classification.

[0104] Based on this, as shown in Figure 11, before step 510, this application further includes a process 1100 of training a third text classification model to obtain the first text classification model. During the training of the third text classification model, all modules in the third text classification model can be trained. The process 1100 includes: Step 1110: Select a first conversation text from the conversation scenario; Step 1120: Select, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text; Step 1130: Combine the first conversation text and the second conversation text into a second sample text, and obtain the label category corresponding to the second sample text; Step 1140: Construct a third classification sample based on the second sample text and the label category, and construct a second classification dataset based on the third classification sample; Step 1150: Train the third text classification model based on the second classification dataset to obtain the first text classification model.

[0105] Steps 1110-1150 are described in detail below.

[0106] In step 1110, a conversation scenario refers to a scenario of conversational text interaction involving multiple rounds of interaction and contextual association. It is the application scenario in which the conversation text exists. For example, a conversation scenario could be an online game conversation, an instant messaging conversation, a video bullet screen, or a webpage comment section. The first conversation text is text within the conversation scenario; it can be a single sentence sent in the conversation scenario, or a combination of multiple sentences sent consecutively in the conversation scenario.

[0107] In step 1120, similar to the first conversation text, the second conversation text can be a single sentence sent in the conversation scenario, or a combination of multiple sentences sent consecutively in the conversation scenario. The connection between the second conversation text and the first conversation text lies in their contextual relationship, which refers to a logical association. For example, if the second and first conversation texts are sent by different objects, they can be consecutive in time. If they are sent by the same object, they are not necessarily consecutive in time, but they are consecutive among the texts sent by that object; for example, conversation texts sent by other objects may lie between the second and first conversation texts sent by the same object. The specific implementation of selecting the second conversation text from the conversation scenario is similar to that of selecting the first conversation text, and will not be repeated here.

[0108] In step 1130, the first and second conversation texts can be combined into a second sample text, and the label category corresponding to the second sample text can be obtained. The label category corresponding to the second sample text is used to represent the category to which the second sample text as a whole belongs. Taking malicious semantic classification as an example, the label category can be normal or malicious. The label category corresponding to the second sample text can be labeled manually or by a pre-trained language model. For example, the second sample text and context classification prompts can be input into a pre-trained language model to obtain the label category corresponding to the second sample text. The context classification prompts are such as "You are a conversation text semantic classification expert. Given conversation texts with contextual logical relationships, please combine the overall semantics of these conversation texts and the conversation scene context to make an overall classification judgment and output the label category. Optional categories include...". The label category corresponding to the second sample text can be represented by a string (e.g., "normal semantics") or a number. Specifically, the label category corresponding to the second sample text in numerical form can be represented by {text category}. For example, if the value of the text category is "1", it means that the second sample text represents normal semantic conversation text, and if the value of the text category is "0", it means that the second sample text represents malicious semantic conversation text.

[0109] In step 1140, a third classification sample is a sample composed of a second sample text and its corresponding label category. The second classification dataset is a dataset constructed from multiple third classification samples; it is used to train the third text classification model and improve its ability to classify text by incorporating contextual relationships.
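Steps 1130 and 1140 can be sketched as follows. The separator string, the `ClassificationSample` type, and the function `build_sample` are hypothetical; the label encoding (1 for normal, 0 for malicious) follows the numeric example given for the label category, and the conversation texts are assumed to be passed in sending order.

```python
from dataclasses import dataclass

@dataclass
class ClassificationSample:
    text: str    # second sample text: first and second conversation texts combined
    label: int   # label category: 1 = normal semantics, 0 = malicious semantics

def build_sample(first_text, second_texts, label, sep=" [SEP] "):
    """Combine conversation texts into one sample text and attach its label.

    The separator is an assumed convention, not one specified in the source.
    """
    return ClassificationSample(text=sep.join([first_text, *second_texts]), label=label)

# A second classification dataset is simply a collection of such samples.
dataset = [
    build_sample("But my teammates", ["They're really a bunch of idiots"], label=0),
    build_sample("Good game", ["Thanks, well played"], label=1),
]
```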

[0110] In step 1150, the third text classification model is trained based on the second classification dataset to obtain the first text classification model. Existing text classification models can only understand the context within a single sentence and have difficulty understanding context across multiple sentences, making it difficult to identify malicious semantic types such as "sentence-segmentation attacks," "sentence-skipping insults," and "conversation-flow malice." A "sentence-segmentation attack" splits malicious wording across multiple sentences, for example, sending "this," "put," "real," "is," "one," "group," "stupid," and "egg" as eight separate sentences. A "sentence-skipping insult" interleaves veiled malicious content with normal sentences, such as "This game had a terrible rhythm," "A teammate was AFK the whole time," and "The overall operation was completely terrible." Here, "This game had a terrible rhythm" and "The overall operation was completely terrible" carry malicious semantics, while "A teammate was AFK the whole time" is a normal semantic description of a teammate's performance. "Conversation-flow malice" gradually builds up malicious semantics over multiple rounds of dialogue, such as "I tried my best in this game," "But my teammates," "They're really a bunch of idiots," and "Idiots." Conversation texts with such contextual relationships can be used as second sample texts, and a second classification dataset can be constructed from them. The third text classification model can then be trained on this dataset to improve its ability to understand contextual relationships across multiple sentences and to classify text based on those relationships. 
The training process for the third text classification model can be as follows: input the third classification samples into the third text classification model, make the third text classification model perform a classification task, obtain the output training classification result, determine the loss value for training the third text classification model based on the training classification result and the label category corresponding to the second sample text, and train the third text classification model based on the classification loss value.

[0111] In some embodiments, training the third text classification model can be a full fine-tuning process, that is, training all modules in the third text classification model. This can significantly improve the model's basic capabilities.

[0112] In this embodiment, steps 1110-1150 first select a first conversational text from the conversational context. Then, based on the first conversational text, a second conversational text with a contextual relationship to the first conversational text is selected from the conversational context. The first and second conversational texts are combined to form a second sample text, and the label category corresponding to the second sample text is obtained. Next, a third classification sample is constructed based on the second sample text and its corresponding label category, and a second classification dataset is constructed based on the third classification sample. Finally, the third text classification model is trained on the second classification dataset to obtain the first text classification model. Therefore, the third text classification model can learn the ability to understand contextual relationships and classify text based on contextual relationships during the training process, based on the third classification samples in the second classification dataset.

[0113] As shown in Figure 12, in some embodiments, the first session text belongs to a first session stream in the session scenario. The first session stream is a sequence of consecutive session texts, ordered by sending time, within a selected dialogue channel. For example, in a game session scenario, a session stream is all the session texts in a single game match; in an instant messaging scenario, a session stream can be all the session texts between two accounts, or all the session texts in a group. Step 1120 includes: Step 1210: Select, from the first session stream, the adjacent session text that is adjacent to the first session text; Step 1220: Determine the transmission time interval between the first session text and the adjacent session text; Step 1230: When the transmission time interval is less than the time threshold, determine the adjacent session text as the second session text that has a contextual relationship with the first session text.

[0114] Steps 1210-1230 are described in detail below.

[0115] In step 1210, adjacent session texts adjacent to the first session text are selected from the first session stream. Adjacent session texts are session texts that are consecutive to the first session text in terms of transmission time. They can be session texts transmitted before the first session text, session texts transmitted after it, or a combination of both. The number of adjacent session texts is not limited; there can be one or more. Each session text in the first session stream is numbered according to the chronological order of its transmission time, and the adjacent session texts can be selected based on this numbering. For example, as shown in Figure 13, the first session stream 1 includes first session text 1, session text 2, session text 3, and session text 4. The sequence number of first session text 1 in the first session stream is 19, that of session text 2 is 10, that of session text 3 is 20, and that of session text 4 is 21. Session text 3, whose sequence number immediately follows that of first session text 1, is therefore selected as the adjacent session text of first session text 1.

[0116] In step 1220, each session text in the first session stream has a timestamp, thus the transmission time interval between the first session text and adjacent session texts can be determined. For example, if the timestamp of the first session text 1 is "January 3, 2026, 11:00:00" and the timestamp of session text 3 is "January 3, 2026, 11:00:08", then the transmission time interval between the first session text 1 and session text 3 is 8 seconds.

[0117] In step 1230, when the transmission time interval is less than a time threshold, the adjacent session text is identified as the second session text that has a contextual relationship with the first session text. The time threshold is used to determine whether the first session text and the adjacent session text have a contextual relationship; that is, the time threshold bounds the time interval within which two session texts can constitute a valid context. If the transmission time interval is less than the time threshold, the semantic logic between the first session text and the adjacent session text is considered coherent; otherwise, it is considered not coherent. This is because, in actual sessions, if the time interval between the first session text and the adjacent session text is too long, their content usually belongs to different topics and lacks coherent semantic logic, so the two do not constitute a valid context. Therefore, it is necessary to determine whether the first session text and the adjacent session text constitute a valid context based on the time threshold. For example, regarding the aforementioned conversation text 3 and conversation text 4, if the timestamp of conversation text 4 is "January 3, 2026, 11:10:08", then the transmission time interval between conversation text 3 and conversation text 4 is 10 minutes. If the time threshold is 45 seconds, this interval is greater than the time threshold, and conversation text 3 and conversation text 4 do not constitute a valid context. 
However, the transmission time interval between the first conversation text 1 and conversation text 3 is 8 seconds, which is less than the time threshold, so the first conversation text 1 and conversation text 3 constitute a valid context. Therefore, conversation text 3 is selected as the second conversation text.
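The time-threshold check of steps 1220 and 1230 can be reproduced with the timestamps from the example. This is a minimal sketch; the function name `has_context` is hypothetical, and the 45-second default is simply the threshold used in the example above.

```python
from datetime import datetime, timedelta

def has_context(ts_a, ts_b, threshold=timedelta(seconds=45)):
    """True when two session texts were sent less than `threshold` apart."""
    return abs(ts_b - ts_a) < threshold

# Timestamps taken from the example in the text.
t_first = datetime(2026, 1, 3, 11, 0, 0)    # first conversation text 1
t_text3 = datetime(2026, 1, 3, 11, 0, 8)    # conversation text 3
t_text4 = datetime(2026, 1, 3, 11, 10, 8)   # conversation text 4

interval_1_3 = t_text3 - t_first   # 8 seconds
interval_3_4 = t_text4 - t_text3   # 10 minutes
```

With a 45-second threshold, `has_context(t_first, t_text3)` holds and `has_context(t_text3, t_text4)` does not, which matches the conclusion that conversation text 3 is selected as the second conversation text.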

[0118] In this embodiment, through steps 1210-1230, adjacent session texts adjacent to the first session text are first selected from the first session stream, and the transmission time interval between the first session text and the adjacent session text is determined. Then, if the transmission time interval is less than a time threshold, the adjacent session text is determined as a second session text that has a contextual relationship with the first session text. By setting a time threshold, adjacent session texts can be filtered according to their time interval from the first session text, ensuring that the selected second session text and the first session text discuss the same topic, that is, that the two have a valid contextual relationship.

[0119] In some embodiments, since malicious semantics can be sent to a single object or multiple objects in the same session scenario, the second session text can be selected based on the single session party corresponding to a single object and the session parties corresponding to multiple objects, respectively. Therefore, step 1120 includes: Perform at least one of the following processes: Select a second conversation text from the conversation context that has a contextual relationship with the first conversation text and belongs to the same conversation party as the first conversation text; Select a second conversation text from the conversation context that has a contextual relationship with the first conversation text and belongs to a different conversation party than the first conversation text.

[0120] A session party refers to an independent message-sending entity within a session, distinguishable by session object ID, device ID, role ID, etc. Session texts from the same session party share at least one of the same session object ID, device ID, or role ID; different session parties have different session object IDs, device IDs, or role IDs. A second session text that has a contextual relationship with the first session text and belongs to the same session party can be selected as follows: first, from the first session stream within the session scenario, the session texts sent by the same session party as the first session text are retained; then, the retained session texts are sorted, and the session text adjacent to the first session text in the re-sorted sequence is selected as the second session text. A second session text that has a contextual relationship with the first session text and belongs to a different session party can be determined directly using the methods described in steps 1210-1230. For example, as shown in Figure 14, in the first session stream 1, for first session text 2 from the first session party: among session text 5 and session text 7, also from the first session party, only session text 5 is a second session text of first session text 2; among session text 6 and session text 7 from the second session party and session text 8 from the third session party, only session text 6 is a second session text of first session text 2.
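The two selection modes can be sketched as below. This is an illustrative simplification: the `Message` type, the field names, and both function names are hypothetical, parties are identified by a single string rather than by several IDs, and the time-threshold check of step 1230 is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    seq: int     # sequence number by sending time within the session stream
    party: str   # hypothetical session party identifier
    text: str

def neighbours_same_party(stream, first):
    """Adjacent texts after restricting the stream to the first text's sender."""
    own = sorted((m for m in stream if m.party == first.party), key=lambda m: m.seq)
    i = own.index(first)
    return own[max(i - 1, 0):i] + own[i + 1:i + 2]

def neighbours_other_party(stream, first):
    """Texts directly adjacent in the full stream that were sent by another party."""
    ordered = sorted(stream, key=lambda m: m.seq)
    i = ordered.index(first)
    adjacent = ordered[max(i - 1, 0):i] + ordered[i + 1:i + 2]
    return [m for m in adjacent if m.party != first.party]

stream = [
    Message(1, "A", "a1"), Message(2, "B", "b1"), Message(3, "A", "a2"),
    Message(4, "C", "c1"), Message(5, "A", "a3"),
]
first = stream[2]  # sent by party A, sequence number 3
```

For the same party, the neighbours of `first` are the texts with sequence numbers 1 and 5, even though texts from other parties lie in between; for different parties, they are the texts with sequence numbers 2 and 4.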

[0121] The advantage of this application's embodiments lies in that, for the same conversational party, a second conversational text that has a contextual relationship with the first conversational text and belongs to the same conversational party is selected from the conversational scenario; for multiple conversational parties, a second conversational text that has a contextual relationship with the first conversational text and belongs to different conversational parties is selected from the conversational scenario. Therefore, the selection of the second conversational text can cover scenarios such as continuous expressions from the same conversational party and interactions between multiple conversational parties. This allows the third text classification model to comprehensively learn malicious semantic logic such as sentence segmentation attacks or expression splitting from the same conversational party, as well as malicious semantic logic such as relay insults and conversational malice across conversational parties, thereby improving the third text classification model's ability to understand complex contextual semantic patterns.

[0122] As shown in Figure 15, in some embodiments, the session scenario includes multiple session streams, and step 1110 includes: Step 1510: Determine the session density corresponding to each of the multiple session streams; Step 1520: Based on the session density corresponding to each session stream, filter the multiple session streams to obtain the first session stream; Step 1530: Select the first session text from the first session stream.

[0123] Steps 1510-1530 are described in detail below.

[0124] In step 1510, conversation density refers to the numerical value obtained by counting the number of sentences within a fixed time period (such as the most recent hour) in a conversation stream, used to describe the density of conversations within a conversation stream. Compared to a conversation stream with low conversation density, a conversation stream with high conversation density can select more first and second conversation texts. Therefore, a first conversation stream can be selected from multiple conversation streams based on conversation density.

[0125] In step 1520, the multiple session streams are filtered according to their respective session densities to obtain a first session stream. Specifically, this can be achieved by selecting the session stream with the highest session density as the first session stream, or by selecting the session stream with a session density higher than a preset session density threshold as the first session stream.

[0126] In step 1530, a first session text can be selected from the first session stream. Specifically, this can be achieved by randomly selecting a session text from the first session stream as the first session text.
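Steps 1510 to 1530 can be sketched as follows, assuming each session stream is represented by the list of sending timestamps of its sentences. The function names, the stream ids, and the one-hour window default (taken from the "most recent hour" example) are illustrative assumptions.

```python
import random
from datetime import datetime, timedelta

def session_density(timestamps, now, window=timedelta(hours=1)):
    """Step 1510: number of sentences sent within the window ending at `now`."""
    return sum(1 for ts in timestamps if now - window <= ts <= now)

def select_first_streams(streams, now, density_threshold=None):
    """Step 1520: keep the densest stream, or all streams above a threshold."""
    densities = {sid: session_density(ts, now) for sid, ts in streams.items()}
    if density_threshold is None:
        return [max(densities, key=densities.get)]
    return [sid for sid, d in densities.items() if d > density_threshold]

def pick_first_text(texts, rng=None):
    """Step 1530: randomly select one session text from the chosen stream."""
    rng = rng if rng is not None else random.Random()
    return rng.choice(texts)

now = datetime(2026, 1, 3, 12, 0, 0)
streams = {
    "match_1": [now - timedelta(minutes=m) for m in (1, 5, 20, 90)],
    "match_2": [now - timedelta(minutes=m) for m in (10, 70)],
}
densest = select_first_streams(streams, now)
```

In this toy data, `match_1` has three sentences in the last hour and `match_2` has one, so `match_1` is chosen when no threshold is given.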

[0127] In this embodiment, steps 1510-1530 first determine the session density corresponding to multiple session streams, and then filter the multiple session streams according to their respective session densities to obtain a first session stream. Finally, a first session text is selected from the first session stream. Therefore, this embodiment can select a high-density session stream as the first session stream based on session density, thereby selecting session text with a larger amount of information from the same session stream as the first session text, i.e., selecting high-quality session text, thus improving the quality of sample construction. Moreover, a high-density session stream contains more session text, which expands the selection range of the first session text and thus increases the data volume of the second sample text.

[0128] In some embodiments, step 1150 includes: Construct a mask prediction dataset based on the first session text; Based on the mask prediction dataset and the second classification dataset, the third text classification model is jointly trained to obtain the first text classification model.

[0129] A mask prediction dataset is a dataset, constructed for training contextual semantic reasoning, in which a portion of the characters, subwords, or words in the first conversation text is masked. After the mask prediction dataset and the second classification dataset are determined, the third text classification model can be jointly trained on them to obtain the first text classification model. The construction of the mask prediction dataset may include: randomly masking 15% of the tokens in the first conversation text to obtain a masked conversation text; using the first conversation text together with its corresponding masked conversation text as a masked conversation sample; and constructing the mask prediction dataset from multiple masked conversation samples. Joint training can be achieved by setting multiple classification heads in the output layer. For example, a masked language model classification head (MLMHead) is set to determine, based on the mask prediction dataset, the mask prediction loss value of the third text classification model when performing the masked language model classification task. A text classification head is set to determine, based on the second classification dataset, the classification loss value of the third text classification model when performing the text classification task. Finally, the mask prediction loss value and the classification loss value are fused, and the third text classification model is trained based on the fused loss value.
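The random 15% masking can be sketched as below. The `[MASK]` placeholder string, the function name `mask_tokens`, the rule of masking at least one token, and the use of pre-split tokens are assumptions for illustration; the source allows masking at the character, subword, or word level and does not prescribe a tokenizer.

```python
import random

MASK = "[MASK]"  # assumed placeholder token

def mask_tokens(tokens, ratio=0.15, rng=None):
    """Randomly mask about `ratio` of the tokens (at least one, by assumption).

    Returns the masked token list and a dict mapping each masked position
    to the original token, which serves as the prediction label.
    """
    if not tokens:
        return [], {}
    rng = rng if rng is not None else random.Random()
    k = max(1, round(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), k))
    masked = list(tokens)
    labels = {}
    for p in positions:
        labels[p] = masked[p]
        masked[p] = MASK
    return masked, labels

tokens = [f"w{i}" for i in range(20)]
masked, labels = mask_tokens(tokens, rng=random.Random(0))
# One masked conversation sample pairs the original text with its masked version.
sample = {"original": tokens, "masked": masked, "labels": labels}
```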

[0130] The advantage of this application's embodiments lies in that, by constructing a mask prediction dataset, it provides the model with a mask language model classification task, guiding the third text classification model to deeply learn the contextual semantic relationships, lexical collocations, and contextual logic of conversational text. This enables the third text classification model to more accurately parse text with incomplete semantics and ambiguous expressions, thereby improving the contextual understanding ability of the third text classification model. Moreover, by jointly training the third text classification model based on the mask prediction dataset and the second classification dataset, a first text classification model is obtained, which can combine the mask language model classification task and the text classification task, simultaneously improving the contextual understanding ability and text classification ability of the third text classification model.

[0131] In some embodiments, a third text classification model is jointly trained based on a mask prediction dataset and a second classification dataset to obtain a first text classification model, including: Based on the mask prediction dataset, determine the mask prediction loss value of the third text classification model when performing the mask prediction task; Based on the second classification dataset, determine the classification loss value of the third text classification model when performing the classification task; The mask prediction loss and classification loss are fused together to obtain the fused loss value. The third text classification model is trained based on the fusion loss value to obtain the first text classification model.

[0132] Performing the mask prediction task (MLM) refers to the third text classification model receiving masked conversation samples from the mask prediction dataset and predicting the original semantic units (words, characters, etc.) at the masked positions in those samples. The mask prediction loss value is a quantitative measure of the deviation between the prediction result and the original unmasked text (the label) when performing the mask prediction task. The value ranges from 0 to 1. A mask prediction loss value closer to 0 indicates higher mask prediction accuracy and stronger semantic understanding ability of the third text classification model; a value closer to 1 indicates lower mask prediction accuracy and weaker semantic understanding ability. The classification task refers to the third text classification model receiving samples from the second classification dataset and determining the category (e.g., malicious / normal) of those samples. The classification loss value is a quantitative measure of the deviation between the prediction result of the third text classification model and the labels in the second classification dataset when performing the classification task. The value ranges from 0 to 1. A classification loss value closer to 0 indicates higher classification accuracy and stronger context discrimination ability of the third text classification model; a value closer to 1 indicates greater classification bias and insufficient understanding of the context. In the first text classification model obtained after training, only the classification task of determining whether samples contain malicious semantics is retained. The mask prediction loss value and the classification loss value are fused to obtain the fused loss value. 
This can be achieved by directly adding the classification loss value and the mask prediction loss value, which can be expressed by the following formula: $L_1 = L_{MLM} + L_C$, where $L_1$ is the fused loss value, $L_{MLM}$ is the loss function value of the MLM task (Loss(MLM)), and $L_C$ is the loss function value of the task of determining whether a sample contains malicious semantics (Loss(Classification)).

[0133] In addition, the classification loss value and the mask prediction loss value can be weighted and summed, which can be expressed by the following formula: $L_1 = a \cdot L_{MLM} + (1 - a) \cdot L_C$, where $a$ is the weighting coefficient; optionally, $a = 0.7$.
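
The two fusion options (direct addition and weighted summation) can be illustrated with a small helper. This is a sketch under stated assumptions: the function name `fuse_losses` is hypothetical, and the loss values are plain floats here, whereas in a real training loop they would typically be tensors produced by the two heads.

```python
def fuse_losses(mlm_loss, cls_loss, a=None):
    """Fuse the mask prediction loss and the classification loss.

    a=None -> direct addition:  L1 = L_MLM + L_C
    a in [0, 1] -> weighted sum: L1 = a * L_MLM + (1 - a) * L_C
    """
    if a is None:
        return mlm_loss + cls_loss
    if not 0.0 <= a <= 1.0:
        raise ValueError("weighting coefficient a must lie in [0, 1]")
    return a * mlm_loss + (1.0 - a) * cls_loss
```

With the optional value a = 0.7 from the text, the mask prediction loss contributes 70% and the classification loss 30% of the fused value.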

[0134] The advantage of this embodiment is that the third text classification model is jointly trained using the fused loss value, obtained by fusing the mask prediction loss value and the classification loss value, to obtain the first text classification model. This combines the mask language model classification task and the text classification task so that the third text classification model is trained on both tasks simultaneously, thus improving the efficiency of training the third text classification model.

[0135] In some embodiments, before the third text classification model is trained based on the fused loss value, a fourth text classification model may perform a next sentence prediction (NSP) task and be trained based on the result of that task to obtain the third text classification model, including: selecting, from the conversation scenario, a third conversation text that has no contextual relationship with the first conversation text; combining the first conversation text and the third conversation text into a third sample text; constructing a next sentence prediction positive sample based on the second sample text and a contextual relationship label, where the contextual relationship label indicates that there is a contextual relationship between the first conversation text and the second conversation text in the second sample text; constructing a next sentence prediction negative sample based on the third sample text and a non-contextual relationship label, where the non-contextual relationship label indicates that there is no contextual relationship between the first conversation text and the third conversation text in the third sample text; constructing a next sentence prediction dataset based on the next sentence prediction positive samples and negative samples; and training the fourth text classification model using the next sentence prediction dataset to obtain the third text classification model.

[0136] Similar to the second conversation text, the third conversation text can be a single sentence sent within a conversation, or a combination of multiple sentences sent consecutively within a conversation. The third conversation text is related to the first conversation text by a non-contextual relationship. For example, the third and first conversation texts may not be sequential in time and have no semantic connection, or they may be located in different conversations. The third sample text is a sample composed of the first and third conversation texts, which have no contextual relationship. It is used together with the non-contextual relationship label to construct next sentence prediction negative samples, thereby enabling the fourth text classification model to learn the non-contextual semantic relationships between conversation texts. The contextual relationship label indicates that the first and second conversation texts in the second sample text have a contextual relationship. A next sentence prediction positive sample is constructed from the second sample text and the contextual relationship label. It is used during the training of the fourth text classification model to allow the model to learn the semantic and logical characteristics of first and second conversation texts that have a contextual relationship. The non-contextual relationship label indicates that the first and third conversation texts in the third sample text do not have a contextual relationship. A next sentence prediction negative sample is constructed from the third sample text and the non-contextual relationship label. It is used during the training of the fourth text classification model to allow the model to learn the semantic and logical characteristics of first and third conversation texts that lack a contextual relationship. 
Subsequently, a next-sentence prediction dataset can be constructed based on the next-sentence prediction positive and negative samples. The fourth text classification model is then trained using this dataset to obtain the third text classification model. Specifically, the training process can be as follows: based on the next-sentence prediction dataset, determine the next-sentence prediction loss value (NSP loss value) for the fourth text classification model when performing the NSP task, and then train the fourth text classification model based on the NSP loss value. This training effectively enhances the fourth text classification model's ability to understand the context of conversational text.
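
The assembly of the next sentence prediction dataset from positive and negative samples can be sketched as follows. The record layout, the function name `build_nsp_dataset`, and the encoding of the contextual relationship label as 1 and the non-contextual relationship label as 0 are assumptions made for illustration only.

```python
CONTEXTUAL = 1      # assumed encoding of the contextual relationship label
NON_CONTEXTUAL = 0  # assumed encoding of the non-contextual relationship label


def build_nsp_dataset(second_sample_texts, third_sample_texts):
    """Build NSP samples.

    second_sample_texts: (first_text, second_text) pairs that have a contextual relationship.
    third_sample_texts:  (first_text, third_text) pairs that have no contextual relationship.
    """
    dataset = []
    for first, second in second_sample_texts:
        dataset.append({"sentence_a": first, "sentence_b": second, "label": CONTEXTUAL})
    for first, third in third_sample_texts:
        dataset.append({"sentence_a": first, "sentence_b": third, "label": NON_CONTEXTUAL})
    return dataset
```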

[0137] In some embodiments, training the fourth text classification model can be full fine-tuning, that is, training all modules in the fourth text classification model.

[0138] The advantage of this embodiment is that, firstly, based on the second sample text and contextual relationship labels, a positive sample for next sentence prediction is constructed. The contextual relationship labels indicate that there is a contextual relationship between the first and second conversational texts in the second sample text. Secondly, based on the third sample text and non-contextual relationship labels, a negative sample for next sentence prediction is constructed. The non-contextual relationship labels indicate that there is no contextual relationship between the first and third conversational texts in the third sample text. Then, based on the positive and negative samples for next sentence prediction, a next sentence prediction dataset is constructed. The fourth text classification model is then trained using this dataset to obtain the third text classification model. Therefore, this embodiment can pre-train and initialize the contextual capabilities of the fourth text classification model through the next sentence prediction task, enabling the fourth text classification model to possess context awareness and coherence discrimination capabilities in advance. This fundamentally improves the ability of the trained third text classification model to understand conversational texts containing malicious semantics, such as those involving sentence-segment attacks, abusive language, and split expressions.

[0139] In some embodiments, the first session text belongs to the first session stream in the session scenario, and step 1130 includes: Perform at least one of the following processes: Random conversation text is selected from the second conversation stream and identified as a third conversation text that has no contextual relationship with the first conversation text. The second conversation stream is different from the first conversation stream. Select non-adjacent session texts from the first session stream that are not adjacent to the first session text, and identify the non-adjacent session texts as third session texts that have no contextual relationship with the first session text.

[0140] The second conversation stream is a sequence of continuous conversation texts that belongs to the same conversation scenario as the first conversation stream but is distributed in a different conversation channel. A random conversation text is selected at random from the second conversation stream; it has no fixed position within the second conversation stream and is unrelated to the first conversation text. Non-adjacent conversation texts are those not adjacent to the first conversation text, and can be selected from both the first and second conversation streams. For example, as shown in Figure 16, for the first conversation text 1 mentioned above, there is also a random conversation text selected from the second conversation stream 1 in the conversation scenario: conversation text 9. The sequence number of conversation text 9 in the second conversation stream is 1. Therefore, conversation text 2, conversation text 4, and conversation text 9 can all be used as non-adjacent conversation texts of the first conversation text 1.
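
The two ways of selecting a third conversation text can be sketched as below. This is a simplified illustration: each stream is modeled as an ordered list, and "adjacent" is assumed to mean an index distance of exactly 1 within the first conversation stream, which the source does not define precisely. The function name `select_third_text_candidates` is hypothetical.

```python
import random


def select_third_text_candidates(first_stream, first_index, second_stream, rng=None):
    """Return candidate third conversation texts for first_stream[first_index]."""
    rng = rng or random.Random()
    candidates = []
    # Option 1: a random conversation text from a different (second) stream.
    if second_stream:
        candidates.append(rng.choice(second_stream))
    # Option 2: texts in the same stream that are not adjacent to the first text
    # (assumption: index distance greater than 1; the first text itself is excluded).
    for i, text in enumerate(first_stream):
        if abs(i - first_index) > 1:
            candidates.append(text)
    return candidates
```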

[0141] The advantage of this embodiment is that, firstly, random conversation text is selected from the second conversation stream, and this random conversation text is identified as a third conversation text that has no contextual relationship with the first conversation text. The second conversation stream is distinct from the first conversation stream. Then, non-adjacent conversation text that is not adjacent to the first conversation text is selected from the first conversation stream, and this non-adjacent conversation text is identified as a third conversation text that has no contextual relationship with the first conversation text. Therefore, this embodiment can construct third conversation samples based on non-adjacent conversation text, thereby enabling the fourth text classification model to learn the semantic and logical relationships of conversation texts without contextual relationships during training. This allows the third text classification model obtained after training to accurately distinguish whether the input conversation text is contextually coherent, thereby improving the contextual relationship discrimination ability of the first text classification model.

[0142] In some embodiments, a third text classification model is obtained by training a fourth text classification model based on a next-sentence prediction dataset, including: Construct a mask prediction dataset based on the first session text; The fourth text classification model is jointly trained using the mask prediction dataset and the next sentence prediction dataset to obtain the third text classification model.

[0143] The fourth text classification model is jointly trained using the mask prediction dataset and the next sentence prediction dataset to obtain the third text classification model. Specifically, this can be achieved by: determining, based on the mask prediction dataset, the mask prediction loss value of the fourth text classification model when performing the mask prediction task; determining, based on the next sentence prediction dataset, the NSP loss value of the fourth text classification model when performing the NSP task; fusing the mask prediction loss value and the NSP loss value to obtain the next task fusion loss value; and training the fourth text classification model using the next task fusion loss value to obtain the third text classification model. The next task fusion loss value can be determined through simple addition or weighted addition. Directly adding the NSP loss value and the mask prediction loss value can be expressed by the following formula: $L_2 = L_{MLM} + L_{NSP}$, where $L_2$ is the next task fusion loss value, $L_{MLM}$ is the loss function value Loss(MLM) for the MLM task, and $L_{NSP}$ is the loss function value Loss(NSP) for the NSP task.

[0144] In addition, the NSP loss value and the mask prediction loss value can be weighted and summed, which can be expressed by the following formula: $L_2 = a \cdot L_{MLM} + (1 - a) \cdot L_{NSP}$, where $a$ is the weighting coefficient; optionally, $a = 0.7$.

[0145] The advantage of this embodiment is that it constructs a mask prediction dataset based on the first conversation text, and jointly trains the fourth text classification model based on the mask prediction dataset and the next sentence prediction dataset to obtain the third text classification model. Because the fourth text classification model is trained on both datasets simultaneously, it learns to understand semantic context more efficiently.

[0146] In some embodiments, the first text classification model includes an input layer, a backbone network, and a classification head. Step 520 includes: Each first sample text is input into the input layer to obtain the representation vectors corresponding to multiple words in each first sample text; The representation vectors corresponding to multiple words in each first sample text are input into the backbone network to obtain the semantic features corresponding to each first sample text. The semantic features corresponding to each first sample text are input into the classification head to obtain the first predicted category of each first sample text and the confidence level of the first predicted category of each first sample text.

[0147] A word is the smallest semantic unit obtained after word segmentation of the first sample text, including vocabulary from various languages, letter abbreviations, symbol combinations, gaming slang, etc. The representation vectors corresponding to multiple words are continuous vectors of fixed dimension used to represent the semantic information of words, obtained after the input layer embeds each word in the first sample text. Semantic features are high-dimensional feature vectors used to characterize the overall semantics of the text, obtained after the backbone network deeply encodes and fuses multiple representation vectors of the first sample text. By inputting the semantic features corresponding to each first sample text into the classification head, the first predicted category and the confidence level of each first sample text's first predicted category can be obtained. The classification head is used to classify the high-dimensional feature vectors and can perform binary classification. The classification head can include the following three types: MLM classification head, NSP classification head, and text classification head. The NSP classification head and text classification head can use the same structure or different structures. The MLM classification head, NSP classification head, and text classification head are used to train the third and fourth text classification models, while only the text classification head is retained in the first text classification model. The text classification head is used to determine the first predicted category for each first sample text, as well as the confidence level of the first predicted category for each first sample text.
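
How a text classification head might turn semantic features into a predicted category and a confidence level can be sketched as follows. This assumes a single fully connected layer followed by softmax, with the confidence level taken as the highest softmax probability; the weights, the label names, and the function names are illustrative assumptions rather than details given in this application.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias, labels=("normal", "malicious")):
    """Apply one fully connected layer and softmax; return (category, confidence)."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Assumption: the confidence level is the probability of the predicted category.
    return labels[best], probs[best]
```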

[0148] The advantages of this embodiment are as follows: First, by inputting each first sample text into the input layer, representation vectors corresponding to multiple words in each first sample text are obtained. This embodiment can transform the first sample text from "symbolic form" to "vector form" through the input layer, solving the problem that the first sample text cannot be directly calculated and processed by the first text classification model. At the same time, through precise word embedding encoding, it ensures that the semantic information of each word can be accurately quantified in the feature space, thereby improving the accuracy of semantic extraction and classification of the first sample text. Then, by inputting the representation vectors corresponding to multiple words in each first sample text into the backbone network, semantic features corresponding to each first sample text are obtained. Through multi-layer encoding processing of the backbone network, deep extraction and integration of semantics in the first sample text can be achieved, transforming the scattered word representation vectors into semantic features that can reflect the overall intent of the text, thereby improving the semantic understanding ability of the first text classification model for ambiguous text and adversarial text (such as pinyin abbreviations and slang). Finally, by inputting the semantic features corresponding to each first sample text into the classification head, the first predicted category of each first sample text and the confidence level of the first predicted category of each first sample text are obtained. This application can achieve accurate classification and classification reliability quantification of the first sample text.

[0149] In some embodiments, each first sample text is input to the input layer to obtain representation vectors corresponding to multiple words in each first sample text, including: Through the input layer, the following processing is performed on each word in each first sample text: Each word is embedded to obtain its corresponding word vector; Perform position embedding processing on each word to obtain the position vector corresponding to each word; Each word is processed by language embedding to obtain the language vector corresponding to each word; Each word is processed by paragraph embedding to obtain the paragraph vector corresponding to each word; The word vector, position vector, language vector, and paragraph vector corresponding to each word are fused to obtain the representation vector corresponding to each word.

[0150] Word vectors (TokenIDs) represent the tokens obtained by segmenting the text input to the first text classification model into a token sequence that covers the multiple words of the text. Position vectors encode the location of each token within the input text, enabling the first text classification model to perceive sequence order. Language vectors assign specific language encoding identifiers (EmbeddingIDs) to different languages (e.g., Chinese, English, Thai), allowing the first text classification model to share cross-language knowledge and learn shared game semantic features across different languages (e.g., making the semantic space mapping of "Gank" similar across different language vectors). Paragraph vectors describe whether multiple conversation texts exist in the first sample text. If multiple conversation texts exist, tokens belonging to the same conversation text are assigned the same label, while tokens belonging to different conversation texts are assigned different labels. The word vector, position vector, language vector, and paragraph vector corresponding to each word are fused, which can be achieved through element-wise vector addition or weighted addition.
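
The fusion of the four vectors by element-wise addition or weighted addition can be illustrated with plain Python lists. The function name `fuse_embeddings` and the per-vector `weights` parameter are assumptions; a real model would perform the same operation on tensors.

```python
def fuse_embeddings(word_vec, position_vec, language_vec, paragraph_vec, weights=None):
    """Fuse four equally sized vectors into one representation vector.

    weights=None -> plain element-wise addition.
    weights=[w1, w2, w3, w4] -> weighted element-wise addition.
    """
    vectors = [word_vec, position_vec, language_vec, paragraph_vec]
    dim = len(word_vec)
    if any(len(v) != dim for v in vectors):
        raise ValueError("all four vectors must have the same dimension")
    if weights is None:
        weights = [1.0] * len(vectors)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
```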

[0151] The advantage of this application's embodiments is that by performing word embedding, position embedding, language embedding, and paragraph embedding on words respectively, it is possible to determine the information mapping of words at the word, position, language, and paragraph levels to the feature space. Furthermore, by fusing word vectors, position vectors, language vectors, and paragraph vectors, the representation vector corresponding to each word can accurately and comprehensively describe the word at the aforementioned multiple levels, thereby improving the completeness of the information description of the representation vector corresponding to each word.

[0152] In some embodiments, word embedding processing is performed on each word to obtain the word vector corresponding to each word, including: Each word is queried in the shared vocabulary to obtain a word vector. Multiple words with the same meaning but different languages in the shared vocabulary correspond to the same word vector.

[0153] A shared vocabulary is a pre-built vocabulary that contains words from multiple languages and has been semantically mapped, so that multiple words with the same meaning but from different languages correspond to the same word vector. Query processing refers to finding, for each word, the corresponding word vector in the shared vocabulary. The shared vocabulary can be constructed and used as follows: first, create a shared vocabulary (VocabularySize approximately 250,000) covering all target languages (Chinese, English, Thai, Vietnamese, etc.); then, use a tokenizer, such as the SentencePiece tokenizer, to segment the first sample text into multiple words, and look up the corresponding word vector in the shared vocabulary for each word.
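
The query processing against a shared vocabulary can be sketched with a toy lookup table. Everything concrete here is hypothetical: the vocabulary entries, the two-dimensional vectors, the lowercasing step, and the `[UNK]` fallback for unknown words are assumptions added for illustration, not details given in this application.

```python
# Hypothetical toy vocabulary: words with the same meaning share one token ID.
SHARED_VOCAB = {"buff": 0, "gain effect": 0, "bug": 1, "carry": 2, "[UNK]": 3}

# Hypothetical embedding table indexed by token ID.
EMBEDDING_TABLE = [
    [0.1, 0.2],  # ID 0: "buff" / "gain effect"
    [0.3, 0.4],  # ID 1: "bug"
    [0.5, 0.6],  # ID 2: "carry"
    [0.0, 0.0],  # ID 3: unknown words (assumed fallback)
]


def lookup_word_vector(word, vocab=SHARED_VOCAB, table=EMBEDDING_TABLE):
    """Return the word vector for a word, falling back to the [UNK] entry."""
    token_id = vocab.get(word.lower(), vocab["[UNK]"])
    return table[token_id]
```

Because "buff" and "gain effect" map to the same ID, both queries return the same word vector.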

[0154] The advantages of this application's embodiments are that, based on the query processing of each word in the shared vocabulary, word vectors corresponding to each word are obtained. Multiple words with the same meaning but different languages in the shared vocabulary correspond to the same word vector, which can map words from different languages to the same word vector. On the one hand, this simplifies the word embedding process, improves the efficiency of word vector generation, and ensures that each word can quickly match the corresponding semantic vector. On the other hand, by designing that words with the same semantic meaning in multiple languages correspond to the same word vector, a unified representation of multilingual semantics is achieved, solving the semantic bias problem of words in multiple languages in traditional word embedding, improving the model's adaptability to multilingual conversational text, and solving the problem of inaccurate semantic representation of word vectors caused by mixing multiple languages in the same conversational scenario. This enhances the generalization ability of the first text classification model in multilingual conversational scenarios.

[0155] As shown in Figure 17, in some embodiments, this application also provides a text classification method, including: Step 1710: Obtain the text to be classified; Step 1720: Input the text to be classified into the second text classification model to obtain the fourth predicted category of the text to be classified; the second text classification model is trained according to the training method of the text classification model described above.

[0156] Steps 1710-1720 are described in detail below.

[0157] In step 1710, the type of text to be classified is not limited; it can be text obtained from content sharing scenarios (e.g., content sharing platforms), office scenarios (e.g., office systems), conversation scenarios, etc. Taking a conversation scenario as an example, the text to be classified can be conversation text obtained from a conversation scenario, which needs to be classified to determine whether it contains malicious semantics. The text to be classified can be a single sentence or multiple sentences sent consecutively. For example, in a game scenario, each sentence can be tested as a text to be classified, or N sentences sent consecutively can be concatenated into a single text to be classified and tested again, thereby improving the comprehensiveness of the detection.

[0158] In some embodiments, obtaining the text to be classified includes: selecting a fourth conversation text from the conversation scenario; selecting, from the conversation scenario, a fifth conversation text that has a contextual relationship with the fourth conversation text; and combining the fourth conversation text and the fifth conversation text into the text to be classified.

[0159] For example, in game conversation scenarios, situations such as "sentence-segmenting attacks" and "insults across sentences" frequently occur. To address these situations, a fourth conversation text can be selected from the conversation scenario, along with a fifth conversation text that has a contextual relationship with the fourth. Then, the fourth and fifth conversation texts are combined into a single text to be classified. This allows for a more accurate overall text classification, avoiding the bias that can result from classifying only the fourth or fifth conversation text.

[0160] In some embodiments, the fourth session text belongs to the third session stream in the session scenario. Selecting a fifth session text that has a contextual relationship with the fourth session text from the session scenario includes: selecting, from the third session stream, an adjacent session text that is adjacent to the fourth session text; determining the transmission time interval between the fourth session text and the adjacent session text; and when the transmission time interval between the fourth session text and the adjacent session text is less than a time threshold, determining the adjacent session text as the fifth session text that has a contextual relationship with the fourth session text. Thus, the time threshold constraint helps ensure that the fourth session text and the fifth session text discuss the same topic, that is, that they have a valid contextual relationship.
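
The time-threshold check can be sketched as follows. This illustration assumes that the stream is a time-ordered list of (timestamp in seconds, text) pairs, that the adjacent session text is the next entry in the stream, and that the two texts are joined with a space; none of these details are specified in this application, and the function names are hypothetical.

```python
def select_fifth_text(stream, index, time_threshold):
    """Return the next text in the stream if it was sent within the threshold, else None."""
    if index + 1 >= len(stream):
        return None
    sent_at, _ = stream[index]
    next_sent_at, next_text = stream[index + 1]
    # The adjacent text qualifies only if the interval is strictly below the threshold.
    if next_sent_at - sent_at < time_threshold:
        return next_text
    return None


def build_text_to_classify(stream, index, time_threshold, sep=" "):
    """Combine the fourth text with a qualifying fifth text, if one exists."""
    fourth_text = stream[index][1]
    fifth_text = select_fifth_text(stream, index, time_threshold)
    if fifth_text is None:
        return fourth_text
    return fourth_text + sep + fifth_text
```

When no adjacent text falls within the threshold, this sketch falls back to classifying the fourth text alone.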

[0161] In some embodiments, selecting a fifth conversational text that has a contextual relationship with the fourth conversational text from the conversational context includes performing at least one of the following processes: selecting a fifth conversational text that has a contextual relationship with the fourth conversational text and belongs to the same conversational party as the fourth conversational text; or selecting a fifth conversational text that has a contextual relationship with the fourth conversational text and belongs to different conversational parties than the fourth conversational text. This approach can cover scenarios such as continuous expressions from the same conversational party and interactions between multiple conversational parties, thereby accurately identifying situations such as continuous abuse from the same conversational party and relay abuse across conversational parties, avoiding missed detections.

[0162] In step 1720, the text to be classified is input into the second text classification model to obtain the fourth predicted category of the text to be classified, where the second text classification model is trained according to the training method of the text classification model described above. The fourth predicted category is a binary category describing whether the text to be classified contains malicious semantics; for its specific implementation, refer to the embodiment of the first predicted category. The process by which the second text classification model classifies the text to be classified is similar to the process by which the first text classification model classifies the first sample text; for details, refer to the specific implementation of step 520.

[0163] In this embodiment, through steps 1710-1720, the text to be classified is first obtained, and then the text to be classified is input into a second text classification model to obtain a fourth predicted category for the text to be classified. The second text classification model is trained using the aforementioned training method for text classification models. Therefore, this embodiment can classify the text to be classified based on a second text classification model with improved text classification capabilities for difficult-to-classify texts, thereby improving the accuracy of text classification.

[0164] It is worth noting that steps 510-570 shown in Figure 5 can be performed after the first text classification model is launched, that is, the first sample text can be regarded as the text to be classified; the second text classification model, obtained by training the first text classification model based on the first classification dataset, is equivalent to the model being launched again.

[0165] An exemplary application of the embodiments of this application in a real-world scenario is described below, as shown in Figure 18. In a scenario of text classification for game conversations, the embodiments of this application propose an end-to-end adversarial iterative system based on collaboration between large and small models. This system trains the text classification model through three interdependent modules that form a closed data-flow loop (cross-language foundation, LLM active learning, and inference feature retrieval). The coordination and collaborative workflow of the three modules is as follows:
(a) Construction of Module 1
1. Selection of a cross-language base model
A unified foundational model based on XLM-BERT (Cross-lingual Language Model - BERT, i.e., a third text classification model) is constructed to replace the traditional architecture of independent models for each language. The construction covers four aspects: network architecture design, input representation construction, shared vocabulary strategy, and output layer adaptation. In the network architecture design, the XLM-BERT model comprises an input layer, a backbone network, and an output layer. A multi-layer bidirectional Transformer encoder is used as the backbone network. The model contains N TransformerBlock layers (N=12), each including a multi-head self-attention mechanism and a feedforward neural network. Through the self-attention mechanism, the XLM-BERT model can simultaneously capture the dependencies between each token and the other tokens in the text, thereby gaining a deeper understanding of long-distance dependencies and contextual semantics in the game context. 
Input representation construction: to support the game-logic-aware next sentence prediction (NSP) task proposed in this application embodiment, and to handle the multilingual nature of game conversations, the input layer of the XLM-BERT model sums the following four vectors: TokenEmbedding (word vector, obtained by segmenting the input game text into a subword sequence and determining the token sequence from it); PositionEmbedding (position vector, which encodes the position of each token in the sentence and gives the XLM-BERT model awareness of sequence order); LanguageEmbedding (language vector, which assigns a specific EmbeddingID to sentences, i.e., conversation texts, in each language, such as Chinese, English, and Thai); and SegmentEmbedding (segment vector, used to support next sentence prediction). The next sentence prediction task distinguishes the input "Sentence A" (i.e., the first conversation text) from "Sentence B" (the second or third conversation text). For example, tokens belonging to the current sentence ("Sentence A") are labeled Segment A, and tokens belonging to the next sentence ("Sentence B") are labeled Segment B. LanguageEmbedding is key to the XLM-BERT model's cross-language knowledge sharing: it enables the model to learn game semantic features shared across languages (e.g., "Gank" maps to similar regions of the semantic space under different language vectors).
Shared vocabulary strategy: a unified SentencePiece tokenizer is constructed, with a shared vocabulary (VocabularySize of approximately 250,000 entries) covering all target languages (Chinese, English, Thai, Vietnamese, etc.). The advantage is that, with a shared vocabulary, the XLM-BERT model can map words that express the same meaning in different languages to the same TokenID (e.g., the game terms "Buff", "Bug", and "Carry"; or "Buff" and "gain effect"), thereby enabling zero-shot cross-language transfer and alleviating data sparsity in less commonly spoken languages.
Output layer adaptation: an output layer is placed on top of the top-level output of the multi-layer bidirectional Transformer encoder. The output layer includes two independently constructed classification heads: the masked language model classification head (MLMHead) and the NSP / ClassificationHead classification head. The MLMHead outputs the probability distribution of the masked words. Within the NSP / ClassificationHead classification head, the NSP output head performs the NSP task and the ClassificationHead output head performs the classification task. The two output heads can be of the same type, differing only in parameters, to perform different tasks. Specifically, when performing the NSP task, the NSP output head outputs the binary classification probability of "whether the context is coherent." The ClassificationHead output head takes the vector representation of the leading [CLS] token, performs the classification task, and outputs the binary classification probability of "whether the input text is malicious." The NSP / ClassificationHead classification head includes a fully connected layer and a Softmax activation function.
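The four-way sum that forms the input representation can be illustrated with a minimal sketch. The table sizes, the hidden size `DIM`, the random initial values, and the helper names `make_table` and `embed` are hypothetical and chosen only for illustration; a real model would learn these tables during training.

```python
import random

def make_table(num_rows, dim, seed):
    """Create a toy embedding table with deterministic pseudo-random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

DIM = 4  # toy hidden size (hypothetical)
token_table = make_table(10, DIM, seed=0)    # toy vocabulary of 10 subwords
position_table = make_table(8, DIM, seed=1)  # toy maximum sequence length of 8
language_table = make_table(3, DIM, seed=2)  # e.g. 0=Chinese, 1=English, 2=Thai
segment_table = make_table(2, DIM, seed=3)   # 0=Sentence A, 1=Sentence B

def embed(token_ids, language_id, segment_ids):
    """Return one vector per token: token + position + language + segment."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        rows = (token_table[tok], position_table[pos],
                language_table[language_id], segment_table[seg])
        # Element-wise sum of the four embedding rows.
        vectors.append([sum(values) for values in zip(*rows)])
    return vectors

# Two tokens from Sentence A followed by two tokens from Sentence B, in English.
inputs = embed(token_ids=[1, 5, 7, 2], language_id=1, segment_ids=[0, 0, 1, 1])
```

Because the language vector is added to every token, the same TokenID can receive a slightly different input representation depending on the language of the sentence.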

[0166] 2. Sample construction. Multilingual unlabeled text data is collected from various game scenarios, including but not limited to public chat records, private chat records, in-game emails, game forum posts, and game customer service records. The unlabeled text data covers at least 10 major languages (Chinese, English, Thai, Vietnamese, Indonesian, Portuguese, Spanish, Japanese, Korean, Arabic, etc.).

[0167] 3. Training task design. The training task design comprises three parts: a masked language modeling (MLM) task, a game-logic-aware next sentence prediction (NSP) task, and a classification task. In the MLM task, the segment vector values are set to 0 so that the XLM-BERT model takes a single game sentence (i.e., the first conversation text) as a sample. 15% of the tokens in this sentence are randomly masked, and the masked sentence is input into the XLM-BERT model, which is trained to predict the masked tokens from context and thereby learns multilingual semantic representations. In the NSP task, the original unlabeled text is first structured into log data, which completes data preprocessing and prepares for conversation stream construction. The log data is then grouped by game match ID (Match_ID) and, within each group, sorted in ascending order by sentence timestamp (Timestamp), forming several independent ordered conversation stream sequences S={M1,M2,……,Mn}, where M1 is the earliest conversation text in the group, M2 is the second, and Mn is the nth; this achieves sorting and indexing of the log data. Next, to ensure training quality, game matches whose conversation density reaches a preset threshold (e.g., >5 messages per minute) are selected first, and game matches with no dialogue or very little dialogue are removed as noise. Finally, construction methods for positive samples (next sentence prediction positive samples, labeled IsNext) and negative samples (next sentence prediction negative samples, labeled NotNext), as well as a multi-task joint training method, are designed.
The positive sample construction method aims to teach the XLM-BERT model true contextual coherence, both across context-related conversation texts of the same user and across context-related conversation texts of different users. It includes: 1) Extraction logic: within the ordered sequence of one Match_ID group, select the statement T_i (the ith statement of the sequence) as sentence A (i.e., the first conversation text), and select the adjacent statement T_{i+1}, which has a contextual relationship with sentence A, as sentence B (i.e., the second conversation text). When the same player splits malicious words across multiple messages, such as "this", "hand", "real", "is", "one", "group", "stupid", and "egg", these messages are combined in order into sentence A and sentence B; for example, sentence A is "This hand is really" and sentence B is "A group of stupid eggs". A sentence qualifies as the second conversation text only if both of the following conditions hold: game consistency, i.e., sentence A and sentence B belong to the same Match_ID; and temporal continuity, i.e., the difference between the transmission time Time_B of sentence B and the transmission time Time_A of sentence A is less than a preset time threshold Δt (e.g., Δt <= 45 seconds). If the interval exceeds the threshold, there may be a logical break even though sentence A and sentence B are adjacent, so they are not suitable for constructing a positive sample.

[0168] 2) Scenario-specific logic, covering sentence-splitting attack scenarios and interactive scenarios. In sentence-splitting attack scenarios, User_A = User_B is allowed (i.e., two consecutive sentences sent by the same user), which helps the model learn malicious semantics that have been split across messages (e.g., sentence A: "stupid", sentence B: "egg"). In interactive scenarios, User_A ≠ User_B is allowed (i.e., dialogue between different users), which is used to learn malicious semantics that arise from exchanged insults or depend on context.
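The grouping, sorting, and positive sample rules described above can be sketched as follows. The log row layout (`match_id`, `user`, `timestamp` in seconds, `text`) and the function names are assumptions made for illustration; the 45-second threshold is the example value from the text. Pairs are accepted whether or not the two statements come from the same user, which mirrors both scenarios in item 2).

```python
from collections import defaultdict

MAX_GAP_SECONDS = 45  # example time threshold from the text

def build_streams(log_rows):
    """Group log rows by match_id and sort each group by timestamp."""
    streams = defaultdict(list)
    for row in log_rows:
        streams[row["match_id"]].append(row)
    for rows in streams.values():
        rows.sort(key=lambda r: r["timestamp"])
    return dict(streams)

def positive_pairs(stream, max_gap=MAX_GAP_SECONDS):
    """Pair each statement T_i with T_{i+1} when the time gap is below max_gap."""
    pairs = []
    for a, b in zip(stream, stream[1:]):
        if b["timestamp"] - a["timestamp"] < max_gap:
            pairs.append((a["text"], b["text"], 1))  # label 1 = IsNext
    return pairs

logs = [
    {"match_id": "m1", "user": "u1", "timestamp": 12, "text": "egg"},
    {"match_id": "m1", "user": "u1", "timestamp": 10, "text": "stupid"},
    {"match_id": "m1", "user": "u2", "timestamp": 200, "text": "push mid"},
    {"match_id": "m2", "user": "u3", "timestamp": 5, "text": "gg"},
]
streams = build_streams(logs)
pairs = positive_pairs(streams["m1"])
```

In this toy log, "stupid" and "egg" are paired because they are adjacent and 2 seconds apart, while "egg" and "push mid" are adjacent but 188 seconds apart and are therefore skipped.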

[0169] To strengthen the XLM-BERT model's ability to recognize semantic incoherence, a hierarchical sampling strategy is used to construct negative samples. The negative sample construction method includes: 1) Specific extraction methods and order. Strategy 1: random negative samples across conversation streams (approximately 50%). A Match_ID different from that of sentence A is randomly selected, and a sentence from that Match_ID is randomly selected as sentence C (i.e., the third conversation text). Because the Match_IDs differ, sentence C cannot be a logical successor of sentence A. Strategy 1 helps the model distinguish topic differences across contexts.

[0170] Strategy 2: restricted-jump negative samples within the same stream (approximately 50%). In the same Match_ID sequence as sentence A, the statement at index T_{i+k} is extracted as sentence C, where k is the jump step size with k >= 2 or k < -1. Although sentences A and C discuss the same game (and may share terms like "tower pushing" or "set"), they lack direct logical coherence because other statements lie between them. The purpose of these negative samples is to force the XLM-BERT model to understand not only the "topic" (all input texts to the XLM-BERT model are game-related) but also the "logical flow" (whether the input texts are adjacent and have a contextual relationship).

[0171] The construction of positive and negative samples follows a specific extraction order and set of rules. First, during the training data generation phase, the preprocessed conversation stream is traversed to determine sentence A, which establishes the anchor point. Then, for each sentence A, a positive sample is constructed with 50% probability and a negative sample with 50% probability (probabilistic routing). If a negative sample is to be constructed, either "Strategy 1 (cross-conversation)" or "Strategy 2 (same-conversation skipping)" is randomly selected in a 1:1 ratio. Mixing negative samples from both strategies prevents the XLM-BERT model from taking shortcuts (i.e., distinguishing positive from negative samples solely by checking whether they contain the same names / terms). Positive and negative samples are labeled as follows: if sentences A and B satisfy the conditions in the "positive sample construction method," the label Label=1 (IsNext, i.e., contextual relationship label) is assigned; if sentences A and C satisfy the logic in the "negative sample construction method" (i.e., different games, or the same game but skipped statements), the label Label=0 (NotNext, i.e., non-contextual relationship label) is assigned. The above training task design thus specifies how sentences A, B, and C are selected: without first grouping and then sorting by time, contextual relationships cannot be learned. The time threshold (Δt) makes the selection of sentences A, B, and C more consistent with how objects actually converse in real games, because two sentences separated by 10 minutes in a game usually have no contextual relationship.
Furthermore, this application embodiment specifies how negative samples are constructed: a general BERT model only constructs cross-document negative samples, whereas this application embodiment adds difficult-to-classify "same game but not consecutive" samples. The XLM-BERT model can therefore identify "sentence breaks," because it is trained to distinguish "adjacent" positive samples from "same game but not adjacent" negative samples. The embodiments of this application also specify the "same object / different object" sample selection logic: positive samples explicitly allow User_A=User_B, which is uncommon in traditional NLP dialogue tasks (where a dialogue usually alternates between A and B), but this is precisely the key to detecting sentence-splitting attacks by a single user.
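The probabilistic routing between positive samples, Strategy 1, and Strategy 2 can be sketched as below. This is an illustration under assumptions: the function name `build_nsp_sample` and the data layout are hypothetical, the jump range (k >= 2 or k < -1) is one reading of the text, and the time-threshold check on positive samples is omitted for brevity. When an anchor has no successor, the sketch falls back to a negative sample, which shifts the ratio slightly away from 50/50.

```python
import random

def build_nsp_sample(streams, match_id, i, rng):
    """Build one (sentence_a, other, label) triple for the anchor at index i.

    streams maps Match_ID -> ordered list of statements (assumed non-empty).
    """
    stream = streams[match_id]
    sentence_a = stream[i]
    # Probabilistic routing: positive sample with probability 0.5.
    if i + 1 < len(stream) and rng.random() < 0.5:
        return sentence_a, stream[i + 1], 1  # Label = 1 (IsNext)
    # Strategy 2 candidates: same stream, jump step k >= 2 or k < -1.
    jumps = [j for j in range(len(stream)) if j - i >= 2 or j - i < -1]
    # Strategy 1 candidates: any other Match_ID.
    others = [m for m in streams if m != match_id]
    # Choose between the two strategies in a 1:1 ratio when both are available.
    if others and (not jumps or rng.random() < 0.5):
        other_stream = streams[rng.choice(others)]
        return sentence_a, rng.choice(other_stream), 0  # Label = 0 (NotNext)
    if jumps:
        return sentence_a, stream[rng.choice(jumps)], 0  # Label = 0 (NotNext)
    return None  # no valid sample can be built for this anchor

# Statements are (match_id, index) tuples so their origin is easy to inspect.
streams = {
    "m1": [("m1", j) for j in range(6)],
    "m2": [("m2", j) for j in range(4)],
}
rng = random.Random(0)
samples = [build_nsp_sample(streams, "m1", 2, rng) for _ in range(200)]
```

Every negative sample produced for the anchor `("m1", 2)` comes either from match `m2` or from a non-adjacent position in `m1`, so the model cannot separate the classes by topic alone.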

[0172] Finally, by training on the positive and negative samples, the XLM-BERT model learns the contextual coherence of game dialogues, the logical connections within a conversation flow, and the semantic consistency of multi-turn dialogues. This can be achieved through multi-task joint training, with the loss function of the XLM-BERT model set to L = α·L_MLM + (1-α)·L_NSP, where α is the weight coefficient (e.g., α=0.7), L_MLM is the loss value Loss(MLM) of the MLM task, and L_NSP is the loss value Loss(NSP) of the NSP task. The Adam optimizer is used with a linear warmup learning rate schedule, and the XLM-BERT model is trained for 3-5 epochs.

[0173] Then, the XLM-BERT model can be trained on the second sample texts (each comprising a first and a second conversation text) and their corresponding label categories. This can be achieved through multi-task joint training, with the loss function of the XLM-BERT model set to L = α·L_MLM + (1-α)·L_C, where α is the weight coefficient (e.g., α=0.7), L_MLM is the loss value Loss(MLM) of the MLM task, and L_C is the loss value of the classification task. The Adam optimizer is used with a linear warmup learning rate schedule, and the XLM-BERT model is trained for 3-5 epochs.
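The weighted loss and the warmup schedule can be written out as a small sketch. The function names, `base_lr`, and `warmup_steps` are hypothetical, and the text does not say what happens to the learning rate after warmup; the sketch assumes it is simply held at `base_lr`.

```python
def joint_loss(loss_mlm, loss_aux, alpha=0.7):
    """Weighted sum L = alpha * L_MLM + (1 - alpha) * L_aux.

    loss_aux stands for either the NSP loss or the classification loss.
    """
    return alpha * loss_mlm + (1.0 - alpha) * loss_aux

def linear_warmup_lr(step, base_lr, warmup_steps):
    """Scale the learning rate linearly from 0 to base_lr over warmup_steps."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr  # assumption: constant learning rate after warmup
    return base_lr * step / warmup_steps

total = joint_loss(loss_mlm=2.0, loss_aux=1.0)  # 0.7 * 2.0 + 0.3 * 1.0
halfway_lr = linear_warmup_lr(step=50, base_lr=1e-4, warmup_steps=100)
```

With α=0.7 the MLM term dominates, so the auxiliary task (NSP or classification) shapes the representation without overriding the language modeling objective.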

[0174] (II) Cold start and execution. In this stage, the multilingual unified foundation model (XLM-BERT) built in Module 1 is initialized and deployed. The XLM-BERT model serves as the system's "online inference engine" and processes massive amounts of real-time game conversation data. It must output not only the text classification label (normal / malicious) but also the prediction confidence score, which is the key signal that triggers Module 2.

[0175] (III) Construction of Module 2. Module 2 implements a closed active learning loop driven by a large model and includes an uncertainty sampling module, an LLM expert judgment module, an automatic misjudgment analysis and similar text recall module, and an incremental fine-tuning module.

[0176] 1. Uncertainty sampling module. The XLM-BERT model is deployed using the model prediction service of the big data platform (allblue). The platform uploads the offline model files to the server and automatically completes background deployment of the service, so that external machines can access the model service and obtain prediction results via HTTP requests. Inference by the XLM-BERT model consists of vectorizing the text and predicting class probability values through the forward propagation of the neural network. First, samples are preprocessed: depending on the error type to be identified, statements made during the game are either concatenated (e.g., multiple statements are merged into sentence A and sentence B to form a single sample) or treated individually as single samples, and the samples are placed into an unlabeled data pool. Then, the deployed XLM-BERT model runs inference on each sample in the unlabeled data pool to obtain its predicted label and confidence score. Next, samples with confidence scores below a threshold (e.g., 0.6) are selected as difficult-to-classify texts and sorted by confidence score in ascending order, and the Top-K difficult-to-classify texts (e.g., K=1000) are selected as samples for training the XLM-BERT model.
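The selection step can be sketched in a few lines. The tuple layout `(text, label, confidence)` and the function name `select_hard_texts` are assumptions made for illustration; the threshold 0.6 and K=1000 are the example values from the text.

```python
def select_hard_texts(predictions, threshold=0.6, top_k=1000):
    """Return the top_k least confident predictions below the threshold.

    predictions is a list of (text, predicted_label, confidence) tuples.
    """
    low_confidence = [p for p in predictions if p[2] < threshold]
    low_confidence.sort(key=lambda p: p[2])  # ascending confidence
    return low_confidence[:top_k]

preds = [
    ("a", "normal", 0.95),
    ("b", "malicious", 0.55),
    ("c", "normal", 0.30),
    ("d", "malicious", 0.59),
]
hard = select_hard_texts(preds, threshold=0.6, top_k=2)
```

Here "c" and "b" are selected: both fall below the threshold and are the two least confident samples, while "d" is cut off by `top_k` and "a" is above the threshold.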

[0177] 2. LLM expert judgment module. In this module, classification prompts are first constructed according to the prompt design method. A general pre-trained language model (such as GPT-4, Gemini Pro, or Claude-3) is then selected, and the classification prompts together with all samples from the unlabeled data pool are input into it. The labels, confidence scores, rationales, and attribution analyses output by the pre-trained language model are then obtained. The classification prompts include the following elements:

Task Description

[0178] [Input Sample] Text: {Text content to be judged} Language: {The language of the text}

knowledge base

[0179] 3. Automatic misjudgment analysis and similar text recall module. This module performs inference feature extraction and similar text recall based on the inference features. Inference feature extraction includes: first, extracting the optional attribution types from the attribution analysis results of the pre-trained language model as structured error type labels; then, for each sample in the unlabeled data pool, constructing an error feature vector from the structured error type labels: [homophonic attack, symbol interference, semantic decomposition, contextual ambiguity, gaming slang, minority language], where each element indicates whether the corresponding error type applies. Similar text recall based on inference features includes: first, for the samples in the unlabeled data pool (i.e., the texts to be recalled), using the XLM-BERT model to convert their text content into semantic representation vectors (i.e., second classification cause vectors), and, for the difficult-to-classify texts, using the XLM-BERT model to convert their text content into semantic representation vectors (i.e., first classification cause vectors). Then, for each difficult-to-classify text, the semantic similarity between it and each sample in the unlabeled data pool is calculated from the first and second classification cause vectors, samples with high similarity are selected, and their semantic similarity is weighted. Next, samples whose weighted semantic similarity meets the semantic similarity threshold are selected as similar texts, which completes similar text recall. The semantic similarity threshold can be chosen in the range 0.75-0.85, and the number of similar texts is 3-5 times the number of difficult-to-classify texts.
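A threshold-based recall over semantic vectors might look like the sketch below. The text does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the toy two-dimensional vectors; the weighting step is left out to keep the example short.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_similar(hard_vec, pool, threshold=0.8, max_results=None):
    """Return pool texts with similarity >= threshold, most similar first.

    pool maps text -> semantic vector; max_results caps the recall size,
    for example at 3-5 times the number of difficult-to-classify texts.
    """
    scored = [(cosine(hard_vec, vec), text) for text, vec in pool.items()]
    hits = sorted([st for st in scored if st[0] >= threshold], reverse=True)
    return [text for _, text in hits][:max_results]

pool = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
similar = recall_similar([1.0, 0.0], pool, threshold=0.8)
```

With a threshold of 0.8 (inside the 0.75-0.85 range given above), texts "a" and "b" are recalled and the orthogonal text "c" is not.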

[0180] 4. Incremental fine-tuning module. This module incrementally fine-tunes the XLM-BERT model on a constructed dataset. The dataset consists of the difficult-to-classify texts corrected by the pre-trained language model (the labels output by the pre-trained language model are used as gold labels) and the recalled similar texts (which inherit the labels of the corresponding difficult-to-classify texts). The incremental fine-tuning strategy can be to freeze the bottom Transformer encoder layers of the XLM-BERT model (e.g., the first 6 layers) and fine-tune only the top-level classification head and some of the higher Transformer encoder layers. The learning rate is set as follows: … to …; the number of training epochs is set to 1 to 2, and the batch size is set to 32 to 64.
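One way to express the freezing rule is as a predicate over parameter names. The naming scheme below (`encoder.layer.<index>.` for encoder layers and a `classifier.` prefix for the head) is an assumption modeled on common Transformer implementations, not something the text specifies; in a framework such as PyTorch, the result would typically be used to set each parameter's `requires_grad` flag.

```python
import re

# Hypothetical parameter naming: "encoder.layer.<index>.<rest>", 0-indexed.
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")

def is_trainable(param_name, num_frozen_layers=6):
    """Return True if the parameter should be updated during fine-tuning."""
    match = LAYER_PATTERN.search(param_name)
    if match:
        # Layers 0 .. num_frozen_layers-1 are frozen; higher layers are trained.
        return int(match.group(1)) >= num_frozen_layers
    # Outside the encoder, only the classification head is trained;
    # everything else (e.g. embeddings) stays frozen.
    return param_name.startswith("classifier.")

names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.5.attention.self.query.weight",
    "encoder.layer.6.attention.self.query.weight",
    "classifier.weight",
]
trainable = [n for n in names if is_trainable(n)]
```

Freezing the lower layers keeps the general multilingual representations intact while the upper layers and the head adapt to the newly discovered attack patterns.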

[0181] (IV) Diagnosis and teaching. When the XLM-BERT model built in Module 1 encounters low-confidence samples (i.e., difficult-to-classify texts, such as novel variant attacks), the system automatically intercepts them and passes them to the uncertainty sampling module in Module 2. Module 2 introduces a general pre-trained language model (LLM) as a "teacher." The LLM not only labels the difficult-to-classify texts accurately (correcting the errors of Module 1) but, more importantly, outputs a misclassification attribution analysis through classification prompt engineering (e.g., "This is because 'bd' was used as the pinyin initials of 'stupid'"). This step turns the "black box" error attribution process into "interpretable" structured features, which serve as input to Module 3.

[0182] (V) Construction of Module 3. Module 3 performs similar text retrieval based on inference features and comprises three parts: error feature extraction, feature-driven recall, and incremental fine-tuning. The error feature extraction part extracts "inference features" from the discrimination reasons output by the pre-trained language model, such as the semantic features "malicious content split across messages" and "attack intent implied by context." The feature-driven recall part uses the extracted inference features to match and retrieve samples in the unlabeled pool. It can be implemented in the following ways. Method A: use sentence embedding to vectorize the inference features of the difficult-to-classify texts and of the unlabeled pool samples, then use the inference feature vector of each difficult-to-classify text to perform semantic retrieval over the inference feature vectors of the unlabeled pool samples, obtaining the unlabeled pool samples similar to that text. Method B: have the pre-trained language model label the features of the unlabeled samples, and select unlabeled samples with similar inference features as similar texts. Method C: construct a feature rule base, and recall similar samples by matching the inference features of the difficult-to-classify texts against those of the samples in the unlabeled pool. The rules can be an enumeration of inference features, rules built from the commonalities of inference features, one rule per inference feature, matching by sentence syntax, etc.
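Method C, in its simplest form where each inference feature acts as one rule, can be sketched as a set-overlap check. The feature tag strings, the `min_shared` parameter, and the function name are hypothetical and serve only to illustrate the matching idea.

```python
def recall_by_features(hard_features, pool_features, min_shared=1):
    """Recall pool texts that share at least min_shared inference features.

    pool_features maps text -> set of inference feature tags.
    """
    hard = set(hard_features)
    return [text for text, feats in pool_features.items()
            if len(hard & set(feats)) >= min_shared]

pool_features = {
    "t1": {"homophonic_attack"},
    "t2": {"gaming_slang"},
    "t3": {"homophonic_attack", "symbol_interference"},
}
matches = recall_by_features({"homophonic_attack"}, pool_features)
```

Raising `min_shared` makes the rule stricter: requiring two shared features with the set `{"homophonic_attack", "symbol_interference"}` would recall only "t3".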

[0183] The incremental fine-tuning part is used to incrementally fine-tune the XLM-BERT model. The specific implementation process is as follows: First, the difficult-classified texts corrected by the pre-trained language model and the recalled similar texts are merged to construct an incremental training set. Then, the XLM-BERT model is incrementally fine-tuned based on the incremental training set (without destroying its original capabilities). Finally, the XLM-BERT model is evaluated based on the validation set to determine whether it can now classify difficult-classified texts, and the XLM-BERT model that passes the evaluation is deployed online.

[0184] Based on the recall scheme provided in Module 3 above, the XLM-BERT model can be improved to achieve the following technical effects: 1) Expand from single adversarial examples (i.e., difficult-to-classify texts) to batch discovery of similar examples; 2) Improve recall accuracy; 3) Achieve automatic mining of adversarial examples and automatic iteration of the XLM-BERT model.

[0185] (V) Discovery and feedback. Module 3 uses the "misjudgment attribution features" output by Module 2 to perform feature-driven similar text recall over a massive unlabeled data pool. Traditional methods can only recall samples that are "literally similar"; by incorporating the analysis from Module 2, Module 3 can also recall samples that are "logically similar" (e.g., all samples matching "pinyin initial abbreviation + offensive intent," even if the characters are completely different).

[0186] (VI) Closed-loop iteration. The "difficult-to-classify texts corrected in Module 2" and the "similar samples recalled in Module 3" are merged into a high-value incremental training set, which is used to incrementally fine-tune the XLM-BERT model of Module 1. The updated Module 1 model, having learned to recognize the new type of attack, is then relaunched, completing an automated closed loop from "problem discovery" to "problem resolution" without human intervention.

[0187] To facilitate better implementation of the text classification model training method provided in this application embodiment, this application embodiment also provides an apparatus based on the above-described text classification model training method. The meanings of the terms used are the same as in the above-described text classification model training method, and specific implementation details can be found in the description of the method embodiment.

[0188] Please refer to Figure 19, which is a schematic structural diagram of a text classification model training device 1900 provided in an embodiment of this application. The text classification model training device 1900 is applied to a computer device and includes: The sample acquisition module 1910 is used to acquire multiple first sample texts; The first prediction module 1920 is used to input each first sample text into the first text classification model to obtain the first predicted category of each first sample text and the confidence level of the first predicted category of each first sample text; The sample filtering module 1930 is used to filter the multiple first sample texts based on the confidence of their respective first predicted categories to obtain difficult-to-classify texts; The second prediction module 1940 is used to input classification prompt words and the difficult-to-classify text into the pre-trained language model to obtain the second predicted category of the difficult-to-classify text and the first classification reason of the second predicted category of the difficult-to-classify text, where the classification prompt words are used to guide the pre-trained language model to perform text classification and classification reason inference; The sample recall module 1950 is used to recall similar texts that are similar to the difficult-to-classify text from multiple texts to be recalled based on the first classification reason of the second predicted category of the difficult-to-classify text, where the texts to be recalled are the first sample texts other than the difficult-to-classify text;
The dataset construction module 1960 is used to construct first classification samples based on difficult-to-classify text and second predicted category, construct second classification samples based on similar text and second predicted category, and construct first classification dataset based on first classification samples and second classification samples. The first training module 1970 is used to train the first text classification model based on the first classification dataset to obtain the second text classification model.

[0189] In some embodiments, the second prediction module 1940 is further configured to: The classification prompts and each text to be recalled are input into the pre-trained language model to obtain the third predicted category of each text to be recalled and the second classification reason of the third predicted category of each text to be recalled. The first classification reason for the second predicted category of the difficult-to-classify text and the second classification reason for the third predicted category of each text to be recalled are matched to obtain the classification reason matching result between the difficult-to-classify text and each text to be recalled. Based on the classification cause matching results between the difficult-classified text and multiple texts to be recalled, similar texts that are similar to the difficult-classified text are recalled from the multiple texts to be recalled.

[0190] In some embodiments, the second prediction module 1940 is further configured to: Based on the first semantic features extracted from the difficult-classified text by the first text classification model and the second semantic features extracted from each text to be recalled by the first text classification model, the first semantic similarity between the difficult-classified text and each text to be recalled is determined. Based on the classification cause matching results between the difficult-classified text and each text to be recalled, the first semantic similarity between the difficult-classified text and each text to be recalled is updated to obtain the second semantic similarity between the difficult-classified text and each text to be recalled. Based on the second semantic similarity between the difficult-classified text and multiple texts to be recalled, similar texts that are similar to the difficult-classified text are recalled from the multiple texts to be recalled.

[0191] In some embodiments, the second prediction module 1940 is further configured to: When the classification reason matching result between the difficult classification text and each text to be recalled is a successful match, the first semantic similarity between the difficult classification text and each text to be recalled is amplified to obtain the second semantic similarity between the difficult classification text and each text to be recalled. When the classification reason matching result between the difficult classification text and each text to be recalled fails, the first semantic similarity between the difficult classification text and each text to be recalled is determined as the second semantic similarity between the difficult classification text and each text to be recalled.
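The update from the first to the second semantic similarity can be sketched as below. The text only states that the similarity is "amplified" on a successful match; the multiplicative factor `boost=1.2` and the cap at 1.0 are assumptions made for this illustration.

```python
def second_similarity(first_similarity, reasons_match, boost=1.2):
    """Amplify the first similarity when the classification reasons match.

    boost and the upper cap of 1.0 are hypothetical choices.
    """
    if reasons_match:
        return min(1.0, first_similarity * boost)
    return first_similarity  # unchanged when the reason match fails

matched = second_similarity(0.7, reasons_match=True)
unmatched = second_similarity(0.7, reasons_match=False)
```

A text whose first similarity of 0.7 would fall below a 0.8 threshold is lifted to about 0.84 when its classification reason matches, so reason agreement can decide whether a borderline text is recalled.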

[0192] In some embodiments, the second prediction module 1940 is further configured to: Embed the first classification reason of the second predicted category of the difficult-classified text to obtain the first classification reason vector corresponding to the difficult-classified text; The second classification reason of the third predicted category of each text to be recalled is embedded to obtain the second classification reason vector corresponding to each text to be recalled; Vector matching is performed on the first classification cause vector corresponding to the difficult-to-classify text and the second classification cause vector corresponding to each text to be recalled to obtain the classification cause matching result between the difficult-to-classify text and each text to be recalled.

[0193] In some embodiments, the second prediction module 1940 is further configured to: Literal matching is performed on the first classification reason of the second predicted category of the difficult-to-classify text and the second classification reason of the third predicted category of each text to be recalled to obtain the classification reason matching result between the difficult-to-classify text and each text to be recalled.

[0194] In some embodiments, the sample recall module 1950 is further configured to: The recall prompts, difficult-classified texts, and multiple texts to be recalled are input into a pre-trained language model to obtain similar texts that are similar to the difficult-classified texts. Among them, recall prompts are used to guide the pre-trained language model to recall texts from multiple texts to be recalled based on the difficult-to-classify texts.

[0195] In some embodiments, the first text classification model includes a backbone network and a classification head. The backbone network includes N cascaded encoding layers, where N is an integer greater than 2. The first training module 1970 is further used for: Based on the first classification dataset, the encoding layers to be trained and the classification head in the first text classification model are trained to obtain the second text classification model; The encoding layers to be trained include the nth encoding layer to the Nth encoding layer, where n is an integer greater than 1 and not exceeding N.

[0196] In some embodiments, the training method for the text classification model is applied to a conversation scenario. The training apparatus for the text classification model further includes a second training module (not shown), which is used for: Select the first conversation text from the conversation context; Select a second conversation text that has a contextual relationship with the first conversation text from the conversation context; Combine the first and second conversation texts into a second sample text, and obtain the label category corresponding to the second sample text; Construct a third classification sample based on the second sample text and the label category, and construct a second classification dataset based on the third classification sample; The third text classification model is trained based on the second classification dataset to obtain the first text classification model.

[0197] In some embodiments, the first conversation text belongs to a first conversation stream in the conversation scenario, and the second training module is further configured to: select, from the first conversation stream, an adjacent conversation text that is adjacent to the first conversation text; determine the transmission time interval between the first conversation text and the adjacent conversation text; and, when the transmission time interval is less than a time threshold, determine the adjacent conversation text to be the second conversation text that has a contextual relationship with the first conversation text.
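The time-threshold rule above can be sketched as follows. The `ConversationText` type, its field names, and the 60-second threshold are assumptions for illustration. The sketch also assumes the stream is already sorted by send time and only looks at the following neighbour, although a preceding neighbour could be handled the same way.

```python
from dataclasses import dataclass


@dataclass
class ConversationText:
    sender: str
    sent_at: float  # send time in seconds (hypothetical field)
    text: str


def select_contextual_pairs(stream, time_threshold):
    """Pair each text with its adjacent successor when the gap is below the threshold."""
    pairs = []
    for first, adjacent in zip(stream, stream[1:]):
        interval = adjacent.sent_at - first.sent_at
        if interval < time_threshold:  # "less than" excludes the threshold itself
            pairs.append((first, adjacent))
    return pairs


stream = [
    ConversationText("user", 0.0, "My order has not arrived"),
    ConversationText("agent", 20.0, "Let me check the tracking number"),
    ConversationText("user", 400.0, "Also, how do I change my address?"),
]
pairs = select_contextual_pairs(stream, time_threshold=60.0)
```

Only the first two texts are paired here, because the third was sent 380 seconds after its neighbour.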

[0198] In some embodiments, the second training module is further configured to perform at least one of the following: select, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text and belongs to the same conversation party as the first conversation text; or select, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text and belongs to a different conversation party than the first conversation text.

[0199] In some embodiments, the conversation scenario includes multiple conversation streams, and the second training module is further configured to: determine the conversation density corresponding to each of the multiple conversation streams; filter the multiple conversation streams based on their respective conversation densities to obtain the first conversation stream; and select the first conversation text from the first conversation stream.
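A density-based filter might be sketched as below. This passage does not define the density, nor say whether dense or sparse streams are kept. The sketch therefore assumes that density is the number of texts per second over the stream's time span and that streams at or above a hypothetical minimum density are retained.

```python
def conversation_density(timestamps):
    """Texts per second over the stream's time span (an assumed definition)."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    return len(timestamps) / span if span > 0 else float("inf")


def filter_streams(streams, min_density):
    """Keep the ids of streams whose density reaches the minimum (assumed rule)."""
    return [
        stream_id
        for stream_id, timestamps in streams.items()
        if conversation_density(timestamps) >= min_density
    ]


# Send times in seconds for each conversation stream (made-up values).
streams = {
    "stream_a": [0.0, 10.0, 20.0, 30.0],  # 4 texts in 30 s
    "stream_b": [0.0, 3600.0],            # 2 texts in 1 h
}
selected = filter_streams(streams, min_density=0.05)
```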

[0200] In some embodiments, the second training module is further configured to: construct a mask prediction dataset based on the first conversation text; and jointly train the third text classification model based on the mask prediction dataset and the second classification dataset to obtain the first text classification model.

[0201] In some embodiments, the second training module is further configured to: determine, based on the mask prediction dataset, the mask prediction loss value of the third text classification model when performing the mask prediction task; determine, based on the second classification dataset, the classification loss value of the third text classification model when performing the classification task; fuse the mask prediction loss value and the classification loss value to obtain a fused loss value; and train the third text classification model based on the fused loss value to obtain the first text classification model.
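One simple way to fuse the two loss values is a weighted sum, sketched below. The application only states that the losses are fused, so the weighted-sum form, the cross-entropy helper, and the weight values are assumptions.

```python
import math


def cross_entropy(probabilities, target_index):
    """Negative log-likelihood of the target under a probability distribution."""
    return -math.log(probabilities[target_index])


def fused_loss(mask_loss, classification_loss, mask_weight=0.5, classification_weight=0.5):
    """Weighted sum of the two task losses (assumed fusion rule)."""
    return mask_weight * mask_loss + classification_weight * classification_loss


# Toy model outputs for one masked token and one classification sample.
mask_loss = cross_entropy([0.1, 0.7, 0.2], target_index=1)
classification_loss = cross_entropy([0.25, 0.75], target_index=1)
total = fused_loss(mask_loss, classification_loss, mask_weight=0.3, classification_weight=0.7)
```

During joint training, `total` would be the value that is back-propagated, so both tasks shape the same backbone.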

[0202] In some embodiments, the second training module is further configured to: select, from the conversation scenario, a third conversation text that has no contextual relationship with the first conversation text; combine the first conversation text and the third conversation text into a third sample text; construct a next sentence prediction positive sample based on the second sample text and a contextual relationship label, where the contextual relationship label indicates that the first conversation text and the second conversation text in the second sample text have a contextual relationship; construct a next sentence prediction negative sample based on the third sample text and a non-contextual relationship label, where the non-contextual relationship label indicates that the first conversation text and the third conversation text in the third sample text have no contextual relationship; construct a next sentence prediction dataset based on the next sentence prediction positive sample and the next sentence prediction negative sample; and train the fourth text classification model based on the next sentence prediction dataset to obtain the third text classification model.

[0203] In some embodiments, the first conversation text belongs to a first conversation stream in the conversation scenario, and the second training module is further configured to perform at least one of the following: select a random conversation text from a second conversation stream, which is different from the first conversation stream, and determine the random conversation text to be the third conversation text that has no contextual relationship with the first conversation text; or select, from the first conversation stream, a non-adjacent conversation text that is not adjacent to the first conversation text, and determine the non-adjacent conversation text to be the third conversation text that has no contextual relationship with the first conversation text.
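The construction of next sentence prediction samples could be sketched as follows. For brevity the sketch treats the immediately following text as the contextual one and omits the time-interval check. The function name, the tuple layout `(first, other, label)`, and the label values 1 and 0 are illustrative assumptions.

```python
import random


def build_nsp_samples(first_stream, second_stream, index, rng):
    """Build one positive and up to two negative samples for the text at `index`."""
    first = first_stream[index]
    second = first_stream[index + 1]  # adjacent text, treated as contextual
    samples = [(first, second, 1)]    # label 1: contextual relationship

    # Negative option 1: a random text from a different conversation stream.
    samples.append((first, rng.choice(second_stream), 0))

    # Negative option 2: a non-adjacent text from the same conversation stream.
    non_adjacent = [t for j, t in enumerate(first_stream) if abs(j - index) > 1]
    if non_adjacent:
        samples.append((first, rng.choice(non_adjacent), 0))
    return samples


first_stream = [
    "Hi, my order is late",
    "Sorry, let me check",
    "Thanks",
    "It ships tomorrow",
]
second_stream = ["Is the store open today?"]
samples = build_nsp_samples(first_stream, second_stream, index=0, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sampled negatives reproducible across runs.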

[0204] In some embodiments, the second training module is further configured to: construct a mask prediction dataset based on the first conversation text; and jointly train the fourth text classification model based on the mask prediction dataset and the next sentence prediction dataset to obtain the third text classification model.

[0205] In some embodiments, the first prediction module 1920 is further configured to: input each first sample text into the input layer to obtain the representation vectors corresponding to the multiple words in each first sample text; input the representation vectors corresponding to the multiple words in each first sample text into the backbone network to obtain the semantic features corresponding to each first sample text; and input the semantic features corresponding to each first sample text into the classification head to obtain the first predicted category of each first sample text and the confidence level of the first predicted category of each first sample text.
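To illustrate how a classification head's output can yield a predicted category and a confidence level, and how low-confidence texts can then be singled out as difficult to classify, a minimal sketch follows. It assumes that the confidence is the largest softmax probability and that texts below a hypothetical threshold of 0.6 are treated as difficult. Neither choice is stated in this passage.

```python
import math


def predict_with_confidence(logits):
    """Return (category index, confidence) using a numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probabilities = [e / total for e in exps]
    category = max(range(len(probabilities)), key=probabilities.__getitem__)
    return category, probabilities[category]


def select_difficult_texts(texts, logits_per_text, threshold=0.6):
    """Keep texts whose top-class confidence falls below the threshold."""
    difficult = []
    for text, logits in zip(texts, logits_per_text):
        _, confidence = predict_with_confidence(logits)
        if confidence < threshold:
            difficult.append(text)
    return difficult


texts = ["text a", "text b"]
logits_per_text = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]]  # made-up head outputs
difficult = select_difficult_texts(texts, logits_per_text)
```

The first text has one dominant logit and is kept out of the difficult set, while the second has nearly uniform logits and is selected.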

[0206] In some embodiments, the first prediction module 1920 is further configured to perform, through the input layer, the following processing on each word in each first sample text: perform word embedding on the word to obtain its word vector; perform position embedding on the word to obtain its position vector; perform language embedding on the word to obtain its language vector; perform paragraph embedding on the word to obtain its paragraph vector; and fuse the word vector, position vector, language vector, and paragraph vector of the word to obtain the representation vector corresponding to the word.
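The fusion of the four embeddings can be sketched as an element-wise sum, which is a common choice in Transformer-style input layers. The application only says the vectors are fused, so the summation and the function name `fuse_embeddings` are assumptions.

```python
def fuse_embeddings(word_vec, position_vec, language_vec, paragraph_vec):
    """Element-wise sum of four same-sized embedding vectors (assumed fusion rule)."""
    vectors = [word_vec, position_vec, language_vec, paragraph_vec]
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all embeddings must share one dimension")
    return [sum(components) for components in zip(*vectors)]


# Toy 2-dimensional embeddings for a single word.
representation = fuse_embeddings(
    word_vec=[1.0, 2.0],
    position_vec=[0.5, 0.5],
    language_vec=[0.0, 1.0],
    paragraph_vec=[1.0, 0.0],
)
```

Summation keeps the representation vector at the same dimension as each input embedding, whereas concatenation would multiply the dimension by four.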

[0207] In some embodiments, the first prediction module 1920 is further configured to: look up each word in a shared vocabulary to obtain its word vector, where multiple words in the shared vocabulary that have the same meaning but are in different languages correspond to the same word vector.
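A shared vocabulary of this kind can be sketched as a two-step lookup: each surface word maps to a shared entry id, and each id maps to one vector. The example words, ids, and vector values below are made up for illustration.

```python
# Words with the same meaning in different languages share one entry id.
shared_vocabulary = {
    "refund": 0, "退款": 0, "reembolso": 0,
    "hello": 1, "你好": 1, "hola": 1,
}

# One word vector per shared entry id (toy 2-dimensional values).
embedding_table = [
    [0.1, 0.2],  # id 0: the "refund" concept
    [0.3, 0.4],  # id 1: the "hello" concept
]


def lookup_word_vector(word):
    """Return the word vector shared by all translations of `word`."""
    return embedding_table[shared_vocabulary[word]]


english_vector = lookup_word_vector("refund")
chinese_vector = lookup_word_vector("退款")
```

Because both lookups resolve to the same table row, a classifier trained on one language can reuse what it learned for equivalent words in another.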

[0208] According to one aspect of this application, a text classification device (not shown) is also provided, comprising: a text acquisition module (not shown), configured to acquire a text to be classified; and a third prediction module (not shown), configured to input the text to be classified into the second text classification model to obtain a fourth predicted category of the text to be classified. The second text classification model is trained using the training method for the text classification model described above.

[0209] For details on the implementation of each of the above modules, please refer to the previous examples, which will not be repeated here.

[0210] To facilitate better implementation of the text classification method provided in this application, this application also provides an apparatus based on the above-described text classification method. The meanings of the terms used are the same as in the text classification method described above, and specific implementation details can be found in the descriptions within the method embodiments.

[0211] Referring to Figure 20, which is a schematic diagram of the structure of a text classification device 2000 provided in an embodiment of this application, the text classification device 2000 is applied to a computer device and includes: a text acquisition module 2010, configured to acquire a text to be classified; and a third prediction module 2020, configured to input the text to be classified into the second text classification model to obtain a fourth predicted category of the text to be classified. The second text classification model is trained using the training method for the text classification model described above.

[0212] For details on the implementation of each of the above modules, please refer to the previous examples, which will not be repeated here.

[0213] Referring to Figure 21, which is a structural block diagram of a portion of the terminal 140 implementing an embodiment of this application, the terminal 140 includes: a radio frequency (RF) circuit 2110, a memory 2115, an input unit 2130, a display unit 2140, a sensor 2150, an audio circuit 2160, a wireless fidelity (WiFi) module 2170, a processor 2180, and a power supply 2190, among other components. Those skilled in the art will understand that the structure of the terminal 140 shown in Figure 21 does not constitute a limitation on a mobile phone or computer, and that the terminal may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

[0214] The RF circuit 2110 can be used to receive and transmit signals during information transmission or calls. In particular, it receives downlink information from the base station and processes it with the processor 2180; in addition, it transmits uplink data to the base station.

[0215] The memory 2115 can be used to store software programs and modules. The processor 2180 executes various functional applications and related processing of the terminal by running the software programs and modules stored in the memory 2115.

[0216] The input unit 2130 can be used to receive input numeric or character information, and to generate key signal inputs related to the settings and function control of the terminal. Specifically, the input unit 2130 may include a touch panel 2131 and other input devices 2132.

[0217] Display unit 2140 can be used to display input or provided information, as well as various menus of the terminal. Display unit 2140 may include display panel 2141.

[0218] The audio circuit 2160, a speaker 2161, and a microphone 2162 can provide an audio interface.

[0219] In this embodiment, the processor 2180 included in the terminal 140 can execute the text classification model training method or text classification method of the previous embodiment.

[0220] The terminal 140 in this application embodiment includes, but is not limited to, mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle terminals, and aircraft. This application embodiment can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, and assisted driving.

[0221] Figure 22 is a partial structural block diagram of a server 110 implementing an embodiment of this application. The server 110 can vary significantly due to different configurations or performance, and may include one or more central processing units (CPUs) 2222 (e.g., one or more processors), a memory 2232, and one or more storage media 2230 (e.g., one or more mass storage devices) for storing application programs 2242 or data 2244. The memory 2232 and storage media 2230 can be temporary or persistent storage. The program stored in the storage media 2230 may include one or more modules (not shown in the figure), each module including a series of instruction operations on the server 110. Furthermore, the CPU 2222 may be configured to communicate with the storage media 2230 and execute, on the server 110, the series of instruction operations in the storage media 2230.

[0222] Server 110 may also include one or more power supplies 2226, one or more wired or wireless network interfaces 2250, one or more input / output interfaces 2258, and / or one or more operating systems 2241, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0223] The central processing unit 2222 in the server 110 can be used to execute the training method for the text classification model or the text classification method in the embodiments of this application.

[0224] This application also provides a computer-readable storage medium for storing program code, where the program code is used to execute the training method for the text classification model or the text classification method in the foregoing embodiments.

[0225] This application also provides a computer program product, which includes a computer program. A processor of a computer device reads and executes the computer program, causing the computer device to perform the above-described training method for the text classification model or the above-described text classification method.

[0226] Furthermore, the terms “comprising” and “including”, and any variations thereof, are intended to cover non-exclusive inclusion, such that a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such process, method, product, or apparatus.

[0227] It should be understood that in this application, "at least one (item)" means one or more, and "a plurality" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0228] It should be understood that in the description of the embodiments of this application, "multiple" means two or more, "greater than", "less than", "exceeding" etc. are understood to exclude the number itself, and "above", "below", "within" etc. are understood to include the number itself.

[0229] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.

[0230] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments of this application, depending on actual needs.

[0231] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0232] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0233] It should also be understood that the various implementation methods provided in this application can be combined arbitrarily to achieve different technical effects.

[0234] In the embodiments of this application, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0235] The above is a detailed description of the embodiments of this application. However, this application is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of this application. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A training method for a text classification model, characterized in that it comprises: obtaining multiple first sample texts; inputting each of the first sample texts into a first text classification model to obtain a first predicted category of each first sample text and a confidence level of the first predicted category of each first sample text; filtering the multiple first sample texts based on the confidence level of the first predicted category of each first sample text to obtain a difficult-to-classify text; inputting classification prompt words and the difficult-to-classify text into a pre-trained language model to obtain a second predicted category of the difficult-to-classify text and a first classification reason of the second predicted category of the difficult-to-classify text, wherein the classification prompt words are used to guide the pre-trained language model to perform text classification and classification reason inference; recalling, based on the first classification reason of the second predicted category of the difficult-to-classify text, similar texts that are similar to the difficult-to-classify text from a plurality of texts to be recalled, wherein the texts to be recalled are the first sample texts other than the difficult-to-classify text; constructing a first classification sample based on the difficult-to-classify text and the second predicted category, constructing a second classification sample based on the similar text and the second predicted category, and constructing a first classification dataset based on the first classification sample and the second classification sample; and training the first text classification model based on the first classification dataset to obtain a second text classification model.

2. The training method for the text classification model according to claim 1, characterized in that the step of recalling, based on the first classification reason of the second predicted category of the difficult-to-classify text, similar texts that are similar to the difficult-to-classify text from a plurality of texts to be recalled includes: inputting the classification prompt words and each of the texts to be recalled into the pre-trained language model to obtain a third predicted category of each text to be recalled and a second classification reason of the third predicted category of each text to be recalled; matching the first classification reason of the second predicted category of the difficult-to-classify text against the second classification reason of the third predicted category of each of the texts to be recalled to obtain a classification reason matching result between the difficult-to-classify text and each of the texts to be recalled; and recalling, based on the classification reason matching results between the difficult-to-classify text and the plurality of texts to be recalled, similar texts that are similar to the difficult-to-classify text from the plurality of texts to be recalled.

3. The training method for the text classification model according to claim 2, characterized in that the step of recalling, based on the classification reason matching results between the difficult-to-classify text and the plurality of texts to be recalled, similar texts that are similar to the difficult-to-classify text from the plurality of texts to be recalled includes: determining a first semantic similarity between the difficult-to-classify text and each of the texts to be recalled, based on first semantic features extracted from the difficult-to-classify text by the first text classification model and second semantic features extracted from each of the texts to be recalled by the first text classification model; updating, based on the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled, the first semantic similarity between the difficult-to-classify text and each of the texts to be recalled to obtain a second semantic similarity between the difficult-to-classify text and each of the texts to be recalled; and recalling, based on the second semantic similarities between the difficult-to-classify text and the plurality of texts to be recalled, similar texts that are similar to the difficult-to-classify text from the plurality of texts to be recalled.

4. The training method for the text classification model according to claim 3, characterized in that the step of updating, based on the classification reason matching result between the difficult-to-classify text and each text to be recalled, the first semantic similarity between the difficult-to-classify text and each text to be recalled to obtain the second semantic similarity between the difficult-to-classify text and each text to be recalled includes: when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a successful match, amplifying the first semantic similarity between the difficult-to-classify text and that text to be recalled to obtain the second semantic similarity between the difficult-to-classify text and that text to be recalled. The training method for the text classification model further includes: when the classification reason matching result between the difficult-to-classify text and a text to be recalled is a failed match, determining the first semantic similarity between the difficult-to-classify text and that text to be recalled as the second semantic similarity between the difficult-to-classify text and that text to be recalled.

5. The training method for the text classification model according to claim 2, characterized in that, The process of matching the first classification reason of the second predicted category of the difficult-to-classify text with the second classification reason of the third predicted category of each of the texts to be recalled, to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled, includes: The first classification reason of the second predicted category of the difficult-to-classify text is embedded to obtain the first classification reason vector corresponding to the difficult-to-classify text; The second classification reason of the third predicted category of each text to be recalled is embedded to obtain the second classification reason vector corresponding to each text to be recalled; Vector matching is performed on the first classification reason vector corresponding to the difficult-to-classify text and the second classification reason vector corresponding to each of the texts to be recalled to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled.

6. The training method for the text classification model according to claim 2, characterized in that, The process of matching the first classification reason of the second predicted category of the difficult-to-classify text with the second classification reason of the third predicted category of each of the texts to be recalled, to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled, includes: The first classification reason of the second predicted category of the difficult-to-classify text and the second classification reason of the third predicted category of each of the texts to be recalled are subjected to literal matching processing to obtain the classification reason matching result between the difficult-to-classify text and each of the texts to be recalled.

7. The training method for the text classification model according to claim 1, characterized in that the training method for the text classification model further includes: inputting recall prompt words, the difficult-to-classify text, and the plurality of texts to be recalled into the pre-trained language model to obtain similar texts that are similar to the difficult-to-classify text, wherein the recall prompt words are used to guide the pre-trained language model to recall texts from the plurality of texts to be recalled based on the difficult-to-classify text.

8. The training method for the text classification model according to any one of claims 1 to 7, characterized in that the training method for the text classification model is applied to a conversation scenario, and before inputting each first sample text into the first text classification model, the training method for the text classification model further includes: selecting a first conversation text from the conversation scenario; selecting, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text; combining the first conversation text and the second conversation text into a second sample text, and obtaining the label category corresponding to the second sample text; constructing a third classification sample based on the second sample text and the label category, and constructing a second classification dataset based on the third classification sample; and training a third text classification model based on the second classification dataset to obtain the first text classification model.

9. The training method for the text classification model according to claim 8, characterized in that the first conversation text belongs to a first conversation stream in the conversation scenario, and the step of selecting, from the conversation scenario, a second conversation text that has a contextual relationship with the first conversation text includes: selecting, from the first conversation stream, an adjacent conversation text that is adjacent to the first conversation text; determining the transmission time interval between the first conversation text and the adjacent conversation text; and, when the transmission time interval is less than a time threshold, determining the adjacent conversation text to be the second conversation text that has a contextual relationship with the first conversation text.

10. The training method for the text classification model according to claim 8, characterized in that, Selecting a second conversation text from the conversation scenario that has a contextual relationship with the first conversation text includes: Perform at least one of the following processes: Select a second conversation text from the conversation scenario that has a contextual relationship with the first conversation text and belongs to the same conversation party as the first conversation text; Select a second conversation text from the conversation scenario that has a contextual relationship with the first conversation text and belongs to a different conversation party than the first conversation text.

11. The training method for the text classification model according to claim 8, characterized in that the step of training the third text classification model based on the second classification dataset to obtain the first text classification model includes: constructing a mask prediction dataset based on the first conversation text; and jointly training the third text classification model based on the mask prediction dataset and the second classification dataset to obtain the first text classification model.

12. The training method for the text classification model according to claim 8, characterized in that before training the third text classification model based on the second classification dataset, the training method for the text classification model further includes: selecting, from the conversation scenario, a third conversation text that has no contextual relationship with the first conversation text; combining the first conversation text and the third conversation text into a third sample text; constructing a next sentence prediction positive sample based on the second sample text and a contextual relationship label, wherein the contextual relationship label indicates that the first conversation text and the second conversation text in the second sample text have a contextual relationship; constructing a next sentence prediction negative sample based on the third sample text and a non-contextual relationship label, wherein the non-contextual relationship label indicates that the first conversation text and the third conversation text in the third sample text have no contextual relationship; constructing a next sentence prediction dataset based on the next sentence prediction positive sample and the next sentence prediction negative sample; and training a fourth text classification model based on the next sentence prediction dataset to obtain the third text classification model.

13. The training method for the text classification model according to claim 12, characterized in that the first conversation text belongs to a first conversation stream in the conversation scenario, and selecting, from the conversation scenario, a third conversation text that has no contextual relationship with the first conversation text includes performing at least one of the following: selecting a random conversation text from a second conversation stream, which is different from the first conversation stream, and determining the random conversation text to be the third conversation text that has no contextual relationship with the first conversation text; or selecting, from the first conversation stream, a non-adjacent conversation text that is not adjacent to the first conversation text, and determining the non-adjacent conversation text to be the third conversation text that has no contextual relationship with the first conversation text.

14. A text classification method, characterized in that it comprises: obtaining a text to be classified; and inputting the text to be classified into a second text classification model to obtain a fourth predicted category of the text to be classified, wherein the second text classification model is trained using the training method for the text classification model according to any one of claims 1 to 13.

15. A training device for a text classification model, characterized in that it comprises: a sample acquisition module, configured to acquire multiple first sample texts; a first prediction module, configured to input each first sample text into a first text classification model to obtain a first predicted category of each first sample text and a confidence level of the first predicted category of each first sample text; a sample filtering module, configured to filter the multiple first sample texts according to the confidence level of the first predicted category of each first sample text to obtain a difficult-to-classify text; a second prediction module, configured to input classification prompt words and the difficult-to-classify text into a pre-trained language model to obtain a second predicted category of the difficult-to-classify text and a first classification reason of the second predicted category of the difficult-to-classify text, wherein the classification prompt words are used to guide the pre-trained language model to perform text classification and classification reason inference; a sample recall module, configured to recall, based on the first classification reason of the second predicted category of the difficult-to-classify text, similar texts that are similar to the difficult-to-classify text from a plurality of texts to be recalled, wherein the texts to be recalled are the first sample texts other than the difficult-to-classify text; a dataset construction module, configured to construct a first classification sample based on the difficult-to-classify text and the second predicted category, construct a second classification sample based on the similar text and the second predicted category, and construct a first classification dataset based on the first classification sample and the second classification sample; and a first training module, configured to train the first text classification model based on the first classification dataset to obtain a second text classification model.

16. A text classification device, characterized in that it comprises: a text acquisition module, configured to acquire a text to be classified; and a third prediction module, configured to input the text to be classified into a second text classification model to obtain a fourth predicted category of the text to be classified, wherein the second text classification model is trained using the training method for the text classification model according to any one of claims 1 to 13.

17. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program adapted for loading by a processor to execute the training method of the text classification model according to any one of claims 1 to 13, or the text classification method according to claim 14.

18. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the training method of the text classification model according to any one of claims 1 to 13, or the text classification method according to claim 14.

19. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, it implements the training method of the text classification model according to any one of claims 1 to 13, or the text classification method according to claim 14.