Lightweight graph convolutional network-based hierarchical text classification method and system
By combining a lightweight graph convolutional network with a BERT encoder, the problems of high computational cost and model complexity in hierarchical text classification are solved, achieving efficient and accurate text classification, which is particularly suitable for text data with complex label systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN
- Filing Date
- 2024-05-27
- Publication Date
- 2026-06-26
AI Technical Summary
Existing deep learning models suffer from high computational costs, difficulty in capturing hierarchical relationships in hierarchical text classification, and a tendency to overfit, resulting in inadequate classification performance.
We employ a lightweight graph convolutional network, combined with a BERT encoder and attention mechanism. Through a skip link mechanism and a label-aware positive sample generation strategy, we simplify the network structure, reduce computational resource requirements, and enhance the model's generalization ability and classification accuracy for new data.
It significantly improves the efficiency and accuracy of hierarchical text classification, reduces computational costs, enhances the model's ability to operate in resource-constrained environments, and improves adaptability to new data and classification accuracy.
Description
Technical Field
[0001] This invention relates to the fields of machine learning and natural language processing, and in particular to a hierarchical text classification method and system based on lightweight graph convolutional networks.
Background Technology
[0002] In the field of Natural Language Processing (NLP), text classification is a fundamental task involving assigning text documents to one or more predefined categories. It is crucial for various applications such as information retrieval, content organization, and sentiment analysis. With the rapid growth of digital information, especially on the internet and in enterprise databases, the demand for efficient and accurate text classification technologies is increasing daily.
[0003] Hierarchical text classification is a special form of text classification that deals with labels that have a hierarchical structure. For example, in news classification, an article can be categorized as "sports," and further subdivided into "basketball" or "football." This form of classification is more complex than traditional flat classification because it requires the model to recognize and understand the hierarchical relationships between labels.
[0004] In recent years, although deep learning technology has been widely applied to text classification tasks and has achieved remarkable results in many scenarios, hierarchical text classification still faces multiple challenges:
[0005] 1. High computational cost: Existing deep learning models, especially graph-based neural networks, typically have many parameters and require a large amount of computational resources, which is particularly evident when dealing with large-scale datasets.
[0006] 2. Model complexity: The complexity of hierarchical text classification requires the model to capture and utilize the hierarchical relationship between labels, but traditional models often struggle to effectively represent this complex structure, resulting in insufficient classification performance.
[0007] 3. Overfitting and generalization issues: Deep neural network models are prone to overfitting on training data, especially in hierarchical structures with very fine labels, and their performance degrades when generalizing to new data.
Summary of the Invention
[0008] To address the shortcomings of existing technologies, this invention provides a hierarchical text classification method and system based on lightweight graph convolutional networks. This method and system can handle complex hierarchical label systems with lower computational cost, while improving classification efficiency and accuracy.
[0009] In one aspect, a hierarchical text classification method based on lightweight graph convolutional networks is provided, including:
[0010] Obtain the news text data to be classified, and the corresponding hierarchical tags for the news text data;
[0011] The acquired data is input into the trained text classification network, which outputs the text classification results.
[0012] The trained text classification network employs a first BERT encoder to encode the news text data to be classified, obtaining the text feature representation; a lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation; an attention mechanism layer is used to process the text representation and the label text representation, obtaining the label-aware positive sample; a second BERT encoder is used to process the label-aware positive sample, obtaining the label-aware positive sample representation; and a classifier is used to classify the text feature representation and the label-aware positive sample representation, obtaining the text classification result.
[0013] In another aspect, a hierarchical text classification system based on lightweight graph convolutional networks is provided, including:
[0014] The acquisition module is configured to acquire the news text data to be classified and the corresponding hierarchical tags of the news text data.
[0015] The classification module is configured to: input the acquired data into the trained text classification network and output the text classification result;
[0016] The trained text classification network employs a first BERT encoder to encode the news text data to be classified, obtaining the text feature representation; a lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation; an attention mechanism layer is used to process the text representation and the label text representation, obtaining the label-aware positive sample; a second BERT encoder is used to process the label-aware positive sample, obtaining the label-aware positive sample representation; and a classifier is used to classify the text feature representation and the label-aware positive sample representation, obtaining the text classification result.
[0017] Furthermore, an electronic device is also provided, including:
[0018] Memory, used for non-transitory storage of computer-readable instructions; and
[0019] Processor, for executing the computer-readable instructions,
[0020] When the computer-readable instructions are executed by the processor, they perform the method described in the first aspect above.
[0021] In another aspect, a storage medium is also provided for non-transitory storage of computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, the instructions of the method described in the first aspect are executed.
[0022] In another aspect, a computer program product is also provided, including a computer program that, when run on one or more processors, is used to implement the method described in the first aspect above.
[0023] The above technical solution has the following advantages or beneficial effects:
[0024] This invention significantly improves processing speed and reduces computational resource requirements through a lightweight graph convolutional network design. By eliminating weight matrix multiplication operations and nonlinear activation functions, the simplified network structure can quickly process large-scale datasets at a lower computational cost, enabling the model to run efficiently even in resource-constrained environments.
[0025] This invention introduces a skip link mechanism to help prevent the oversmoothing problem common in deep networks, enabling the model to maintain the diversity of features among nodes in each layer, thereby enhancing the model's ability to generalize to new and unseen data. This is particularly important for practical applications, improving the model's adaptability and robustness to variable real-world data.
[0026] The label-aware positive sample generation strategy of this invention further optimizes the model's understanding of each category, especially in multi-label classification scenarios, enabling more accurate text recognition and classification. By finely adjusting the relevance between labels and text, the model can not only capture explicit label features but also discern implicit semantic relationships, thereby improving overall classification accuracy.
[0027] This invention's technical solution is applicable to various scenarios, such as news classification, legal document analysis, and medical literature classification, and is particularly suitable for processing text data with complex structures and rich tag levels. Enterprises and research institutions can utilize this technology to improve the automation level of text data processing, reduce labor costs, and increase decision-making efficiency.
[0028] In summary, the hierarchical text classification method and system based on lightweight graph convolutional networks of the present invention greatly improve the efficiency and accuracy of hierarchical text classification, and provide new solutions and practical tools for processing complex text data.
Description of the Drawings
[0029] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0030] Figure 1 is a flowchart of the method in Embodiment 1;
[0031] Figure 2 is a schematic diagram of the text classification network structure in Embodiment 1;
[0032] Figure 3 is a schematic diagram of the internal structure of the lightweight graph convolutional network in Embodiment 1.
Detailed Description
[0033] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0034] Embodiment 1
[0035] This embodiment provides a hierarchical text classification method based on lightweight graph convolutional networks.
[0036] As shown in Figure 1, a hierarchical text classification method based on lightweight graph convolutional networks includes:
[0037] S101: Obtain the news text data to be classified, and the corresponding hierarchical tags of the news text data;
[0038] S102: Input the acquired data into the trained text classification network and output the text classification result;
[0039] The trained text classification network employs a first BERT encoder to encode the news text data to be classified, obtaining the text feature representation; a lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation; an attention mechanism layer is used to process the text representation and the label text representation, obtaining the label-aware positive sample; a second BERT encoder is used to process the label-aware positive sample, obtaining the label-aware positive sample representation; and a classifier is used to classify the text feature representation and the label-aware positive sample representation, obtaining the text classification result.
[0040] Further, in step S101, the news text data to be classified and the corresponding hierarchical tags are obtained by using a crawling algorithm to crawl the news text data and the hierarchical tags from the Internet.
[0041] Furthermore, the hierarchical tags include: primary tags and secondary tags for the news text; the content represented by the secondary tags belongs to the content represented by the primary tags. For example, the primary tag is "sports," and the secondary tags are "football" or "basketball."
[0042] Furthermore, as shown in Figure 2, in step S102 the acquired data is input into the trained text classification network, and the text classification result is output. The trained text classification network includes:
[0043] First BERT encoder, lightweight graph convolutional network, attention mechanism layer, second BERT encoder and classifier;
[0044] In this system, the output of the first BERT encoder and the output of the lightweight graph convolutional network are both connected to the input of the attention mechanism layer. The output of the attention mechanism layer serves as the input of the second BERT encoder. The outputs of both the first and second BERT encoders are connected to the input of the classifier.
[0045] Furthermore, the first BERT encoder is used to encode the news text data to be classified, resulting in a text feature representation, including:
[0046] The input text sequence is encoded using the pre-trained first BERT encoder, and the text sequence is represented as:
[0047] $x = \{[\mathrm{CLS}], x_1, x_2, \ldots, x_{n-2}, [\mathrm{SEP}]\}$;
[0048] Where n represents the length of the sequence, and [CLS] and [SEP] are markers used in the first BERT encoder to indicate the start and end of the sequence.
[0049] The first BERT encoder converts each token in the sequence into a corresponding hidden state vector, thus obtaining the hidden representation of the entire sequence:
[0050] H = BERT(x);
[0051] The hidden state $h_{[\mathrm{CLS}]}$ of the [CLS] token at the beginning of the sequence captures the global semantic information of the entire input sequence. Therefore, the hidden state $h_{[\mathrm{CLS}]}$ of this first token is used as the feature vector representing the entire sequence:
[0052] $h_x = h_{[\mathrm{CLS}]}$.
[0053] Furthermore, as shown in Figure 3, the lightweight graph convolutional network has the following structure:
[0054] Input layer: The input includes node features and adjacency matrix. Node features represent the nodes in the graph, and the adjacency matrix represents the connection relationships between nodes.
[0055] The lightweight graph convolutional network includes multiple parallel layers (e.g., H1, H2, H3); each layer contains multiple skip connections (e.g., jump1, jump2, jump3), each of which propagates node features over a different distance, captures node information within a different range, and generates a feature representation;
[0056] Finally, the feature representations of each skip connection are aggregated by summation to obtain the aggregated feature representations.
[0057] Furthermore, a lightweight graph convolutional network is used to extract features from the hierarchical labels to obtain the label text representations, including:
[0058] Both the primary and the secondary labels of the hierarchical tags are treated as nodes in a graph structure. If there is a relationship between two labels, an edge connects the corresponding nodes; otherwise, no edge connects them.
[0059] The original adjacency matrix A is defined to represent the connection relationships between nodes in the graph structure. If there is an edge between node i and node j, then $A_{ij} = 1$; otherwise $A_{ij} = 0$.
[0060] To ensure that each node can receive its own information and to maintain the influence of its own features during information propagation, the identity matrix I is added to A to form a new adjacency matrix $\tilde{A}$:
[0061] $\tilde{A} = A + I$;
[0062] Normalization is performed using the inverse square root of the degree matrix D to address the scaling issue during information aggregation. The degree matrix D is a diagonal matrix, where $D_{ii}$ equals the degree of node i (including self-loops). This processing ensures that all nodes are treated equally during information propagation, preventing high-degree nodes from excessively influencing information aggregation. The normalized adjacency matrix S is calculated as follows:
[0063] $S = D^{-1/2} \tilde{A} D^{-1/2}$;
[0064] where $D^{-1/2}$ is the inverse square root of the degree matrix D.
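By way of non-limiting illustration, the construction of the normalized adjacency matrix described above can be sketched in plain Python. The function name `normalized_adjacency`, the three-node toy hierarchy, and the treatment of label relations as undirected edges are assumptions made for illustration only and are not part of the original disclosure.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Return S = D^(-1/2) (A + I) D^(-1/2) as a list of rows."""
    # Original adjacency matrix A: entry (i, j) is 1 when labels i and j are related.
    a_tilde = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        a_tilde[i][j] = 1.0
        a_tilde[j][i] = 1.0  # assumption: label relations are undirected
    # Add the identity matrix so that every node keeps its own information.
    for i in range(num_nodes):
        a_tilde[i][i] += 1.0
    # Degree of each node, self-loop included (the diagonal of D).
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalization: S[i][j] = A~[i][j] / sqrt(D[i][i] * D[j][j]).
    return [
        [a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]

# Toy hierarchy: node 0 = "sports", node 1 = "football", node 2 = "basketball".
S = normalized_adjacency(3, [(0, 1), (0, 2)])
for row in S:
    print([round(v, 4) for v in row])
```

In this toy graph the primary label has degree 3 and each secondary label has degree 2, so the entries linking a primary and a secondary label are scaled by 1/sqrt(6) rather than copied through unchanged.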
[0065] The lightweight graph convolutional network removes the weight matrix multiplication and non-linear activation functions of traditional graph convolutional networks, and directly multiplies the normalized adjacency matrix S by the node feature matrix $Y^{(k-1)}$ to update the feature representation and obtain the next-layer feature representation $Y^{(k)}$:
[0066] $Y^{(k)} = S Y^{(k-1)}$;
[0067] where $Y^{(k-1)}$ is the output feature matrix of the previous graph convolution layer, containing the node features after processing through the first k-1 layers, and $Y^{(0)}$ is the initial node feature matrix, that is, the original features of each node in the graph before any graph convolution operation is performed.
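As a minimal sketch of the weight-free propagation rule, the following plain-Python example applies the normalized adjacency matrix repeatedly to a node feature matrix. The helper names `matmul` and `propagate`, the toy matrix `S` (a three-node star graph with self-loops), and the two-dimensional toy features are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def propagate(s, y0, num_layers):
    """Apply Y(k) = S Y(k-1) for num_layers steps: no weights, no activation."""
    y = y0
    for _ in range(num_layers):
        y = matmul(s, y)
    return y

# Toy normalized adjacency of a 3-node star graph with self-loops (assumed example).
r = 1.0 / math.sqrt(6)
S = [[1.0 / 3.0, r, r],
     [r, 0.5, 0.0],
     [r, 0.0, 0.5]]
# Initial node features Y(0): one row per label node.
Y0 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.0, 0.0]]

Y1 = propagate(S, Y0, 1)
Y2 = propagate(S, Y0, 2)
```

Because each layer is a single matrix product, stacking layers adds no trainable parameters, which is where the reduction in computational cost comes from.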
[0068] Furthermore, a lightweight graph convolutional network is used to extract features from the hierarchical labels to obtain the label text representations, including:
[0069] A skip link mechanism is introduced to overcome the oversmoothing problem in deep graph convolutional networks and enhance the model's feature learning ability. This mechanism allows information to be passed directly from the input layer or any intermediate layer to the output layer to preserve sufficient feature diversity. Specifically, the skip link mechanism multiplies the input feature matrix by the sum of several powers of the normalized adjacency matrix, so as to retain more of the original feature information, as expressed below:
[0070] $Y^{(k)} = S Y^{(k-1)} + S^{2} Y^{(k-1)} + S^{3} Y^{(k-1)} = (S + S^{2} + S^{3}) Y^{(k-1)}$;
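The skip link update can be illustrated with a short plain-Python sketch that forms S + S^2 + S^3 once and applies it to the features, and also checks that this equals the term-by-term sum. The helper names and the two-node toy graph are assumptions for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def matadd(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def skip_link_update(s, y):
    """Compute (S + S^2 + S^3) Y, combining 1-hop, 2-hop and 3-hop propagation."""
    s2 = matmul(s, s)
    s3 = matmul(s2, s)
    return matmul(matadd(matadd(s, s2), s3), y)

# Normalized adjacency of two connected nodes with self-loops: (A + I) / 2.
S = [[0.5, 0.5],
     [0.5, 0.5]]
Y0 = [[1.0, 0.0],
      [0.0, 1.0]]

Y1 = skip_link_update(S, Y0)

# Term-by-term form of the same update: S Y + S^2 Y + S^3 Y.
SY = matmul(S, Y0)
S2Y = matmul(S, SY)
S3Y = matmul(S, S2Y)
Y1_termwise = matadd(matadd(SY, S2Y), S3Y)
```

For this particular toy matrix S^2 equals S, so the update reduces to 3S applied to the features; in general the three powers differ and each contributes information from a different neighborhood radius.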
[0071] Furthermore, an attention mechanism layer is employed to process the text representation of the main text and the text representation of the labels, resulting in label-aware positive samples, including:
[0072] Given an input text sequence $x = \{[\mathrm{CLS}], x_1, x_2, \ldots, x_{n-2}, [\mathrm{SEP}]\}$, the first BERT encoder transforms the input token sequence x into a series of embedding vectors $\{e_1, e_2, \ldots, e_n\}$:
[0073] $\{e_1, e_2, \ldots, e_n\} = \mathrm{BERT}(x)$;
[0074] First, the attention weights between the token embeddings and the label features are calculated to determine the importance of each token to each label, providing guidance for the subsequent generation of label-aware positive samples:
[0075] $q_i = e_i W_Q, \quad k_j = y_j W_K, \quad A_{ij} = \dfrac{q_i k_j^{\top}}{\sqrt{d_h}}$;
[0076] where $e_i$ is the embedding vector of the i-th token in the input sequence; $W_Q$ is the weight matrix for the query vectors; $q_i$ is the query vector obtained by applying the weights to the token embedding; $y_j$ is the embedding vector of the label features; $W_K$ is the weight matrix for the key vectors; $k_j$ is the key vector obtained by applying the weights to the label embedding; $A_{ij}$ is the attention weight between the i-th token and the j-th label; and $d_h$ is the vector dimension. The two weight matrices $W_Q$ and $W_K$ map the queries and keys to a common space, and each has dimensions $d_h \times d_h$.
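A minimal plain-Python sketch of the token-to-label attention scores follows. It assumes the usual scaled dot-product form, in which the query and key are multiplied and divided by the square root of the dimension d_h; that exact form, the helper names, and the toy embeddings and identity weight matrices are assumptions for illustration and are not confirmed by the original text.

```python
import math

def vec_mat(v, w):
    """Row vector v (length d) times matrix w (d x d)."""
    return [sum(v[t] * w[t][c] for t in range(len(v))) for c in range(len(w[0]))]

def attention_scores(token_emb, label_emb, w_q, w_k):
    """Return A with A[i][j] = (e_i W_Q) . (y_j W_K) / sqrt(d_h)."""
    d_h = len(w_q[0])
    queries = [vec_mat(e, w_q) for e in token_emb]  # q_i = e_i W_Q
    keys = [vec_mat(y, w_k) for y in label_emb]     # k_j = y_j W_K
    return [
        [sum(q[t] * k[t] for t in range(d_h)) / math.sqrt(d_h) for k in keys]
        for q in queries
    ]

# Toy data with d_h = 2 (assumed): two token embeddings and two label embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 2.0]]
labels = [[1.0, 1.0], [0.0, 1.0]]

A = attention_scores(tokens, labels, identity, identity)
```

In a trained network the two weight matrices would be learned; identity matrices are used here only so that the scores can be checked by hand.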
[0077] The Gumbel-Softmax function is applied to $A_{ij}$. The Gumbel-Softmax function incorporates Gumbel noise to simulate the process of sampling from a discrete distribution, while maintaining the differentiability of the operation:
[0078] $P_{ij} = \mathrm{gumbel\_softmax}(A_{i1}, A_{i2}, \ldots, A_{ik})_j$;
[0079] where $P_{ij}$ is the probability, after the Gumbel-Softmax operation, that token $e_i$ is classified to label $y_j$; $A_{i1}, A_{i2}, \ldots, A_{ik}$ are the raw attention scores (similarity scores) of token $e_i$ for all possible labels; and j is the label index.
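The Gumbel-Softmax step can be sketched in plain Python as follows: Gumbel noise is added to each raw score and a softmax is taken over the labels. The temperature parameter `tau`, the function name, and the toy scores are assumptions for illustration; the original text does not state a temperature.

```python
import math
import random

def gumbel_softmax(scores, tau=1.0, rng=random):
    """Perturb the scores with Gumbel(0, 1) noise, then apply a softmax."""
    noisy = []
    for a in scores:
        u = max(rng.random(), 1e-12)   # keep u away from 0 so the logs are defined
        g = -math.log(-math.log(u))    # Gumbel(0, 1) sample
        noisy.append((a + g) / tau)    # tau is an assumed temperature
    m = max(noisy)                     # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [v / total for v in exps]

# Toy raw attention scores for two tokens over three labels (assumed values).
A = [[0.7, 0.0, 0.1],
     [1.4, 1.4, 0.2]]
rng = random.Random(0)                 # fixed seed so the run is repeatable
P = [gumbel_softmax(row, tau=1.0, rng=rng) for row in A]
```

Each row of `P` is a probability distribution over the labels for one token; the values vary with the sampled noise, which is what lets the operation imitate discrete sampling while remaining differentiable in a framework with automatic differentiation.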
[0080] Next, for multi-label classification problems, a token is associated with multiple labels. The probabilities of the token with all its real labels are summed to obtain the overall probability of the token with respect to the set of real labels, as shown in the following formula:
[0081] $P_i = \sum_{j \in y} P_{ij}$;
[0082] where $P_i$ is the overall probability of a given token $e_i$ with respect to its true label set y. It is obtained by traversing the true label set y of token $e_i$ and adding together the probabilities $P_{ij}$ that $e_i$ belongs to each label $y_j$; j is the label index, y is the true label set of the token, and $P_{ij}$ is the probability that token $e_i$ is classified to label $y_j$.
[0083] Summing the probabilities gives a comprehensive measure of the relevance of token $e_i$ to the true label set y. This measure is used to make finer-grained decisions in subsequent processing, determining which tokens to keep and which to ignore; the decision is based on the following rule:
[0084] $\hat{x} = \{\, x_i \ \text{if} \ P_i > \gamma \ \text{else} \ 0 \,\}$;
[0085] If the overall probability $P_i$ of token $e_i$ exceeds the threshold γ, the token is retained; otherwise, it is replaced with token 0, which has an all-zero embedding and is used to maintain the position information of the key tokens. The resulting sequence is the generated label-aware positive sample.
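The selection rule can be sketched in plain Python: for each token, the probabilities over its true labels are summed and compared with the threshold γ. The function name `select_tokens`, the token ids, the probability table, and the toy threshold of 0.5 are hypothetical values chosen so that the result can be checked by hand.

```python
def select_tokens(token_ids, probs, true_labels, gamma, pad_id=0):
    """Keep a token if its summed probability over the true labels exceeds gamma."""
    selected = []
    for token_id, row in zip(token_ids, probs):
        p_i = sum(row[j] for j in true_labels)  # P_i = sum over j in y of P_ij
        # Tokens at or below the threshold become token 0, which preserves positions.
        selected.append(token_id if p_i > gamma else pad_id)
    return selected

# Hypothetical token ids and per-token label probabilities over three labels.
token_ids = [101, 2374, 2003, 102]
probs = [[0.2, 0.2, 0.6],
         [0.5, 0.4, 0.1],
         [0.1, 0.1, 0.8],
         [0.3, 0.3, 0.4]]
true_labels = [0, 1]   # indices of the true labels of this text

positive_sample = select_tokens(token_ids, probs, true_labels, gamma=0.5)
print(positive_sample)  # [0, 2374, 0, 102]
```

The output has the same length as the input, so the positions of the retained tokens are unchanged when the sample is passed to the second encoder.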
[0086] The generated label-aware positive sample $\hat{x}$ is then fed to the second BERT encoder for further processing and learning, denoted as:
[0087] $\hat{h}_x = \mathrm{BERT}(\hat{x})$;
[0088] where $\hat{h}_x$ is the feature representation of the label-aware positive sample.
[0089] Specifically, the entire process places particular emphasis on the tokens that are highly relevant to the labels when generating the positive sample. The method not only uses label information to guide the overall generation of the sample but also considers which specific parts of the text are most closely related to particular labels, so that these parts are retained and emphasized during generation.
[0090] Furthermore, the classifier is used to classify the text feature representation and the label-aware positive sample representation to obtain the text classification result, including:
[0091] A linear transformation layer is used as the classifier. It maps the high-dimensional text features output by the first and second BERT encoders to an output space whose size corresponds to the number of labels.
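As a non-limiting sketch, a linear classifier over a feature vector can be written in plain Python as one weight row and one bias per label. The sigmoid applied to the outputs is an assumption made here so that each label receives an independent probability, as a binary cross-entropy loss requires; the function name and toy weights are likewise illustrative.

```python
import math

def linear_classifier(h, weight, bias):
    """Map a feature vector to one probability per label."""
    # One logit per label: z_j = w_j . h + b_j.
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]
    # Assumed sigmoid: independent per-label probabilities for multi-label output.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy feature vector of dimension 2 and a classifier for 3 labels (assumed values).
h_x = [1.0, -1.0]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

p = linear_classifier(h_x, W, b)
```

The same function can be applied to the feature representation of the original text and to that of the label-aware positive sample, since both have the same dimension.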
[0092] Further, in step S102: the acquired data is input into the trained text classification network, and the text classification result is output. The training process of the trained text classification network includes:
[0093] Construct a training set, which consists of news text with known news text tags;
[0094] The training set is input into the text classification network to train the network. Training is stopped when the total loss function value of the network no longer decreases, or when the number of iterations reaches the set number, and the trained text classification network is obtained.
[0095] Furthermore, the total loss function of the network uses the binary cross-entropy loss function (BCELoss) to measure the difference between the model output and the true label.
[0096] For text i and label j, the loss is calculated as follows:
[0097] $L^{C}_{ij} = -y_{ij} \log(p_{ij}) - (1 - y_{ij}) \log(1 - p_{ij})$;
[0098] $L^{C} = \sum_{i} \sum_{j} L^{C}_{ij}$;
[0099] where $y_{ij}$ is the actual value of label j on text i, and $p_{ij}$ is the probability, predicted by the model, that text i belongs to label j. In this way, the loss function penalizes cases where the model's predictions are inconsistent with the true labels.
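The binary cross-entropy computation can be sketched in plain Python as below. Summing over all texts and labels is an assumption of this sketch (PyTorch's `BCELoss` averages by default), and the clamping constant `eps` is added only to keep the logarithms finite.

```python
import math

def bce_loss(y_true, p_pred, eps=1e-12):
    """Sum of -y*log(p) - (1-y)*log(1-p) over all texts i and labels j."""
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        for y, p in zip(y_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)
    return total

# One text with two labels: the first label is true, the second is not.
y_true = [[1.0, 0.0]]
uncertain = bce_loss(y_true, [[0.5, 0.5]])  # both predictions at chance level
confident = bce_loss(y_true, [[0.9, 0.1]])  # predictions close to the truth
```

The uncertain prediction costs 2·ln 2 in total, while the confident and correct prediction costs far less, which is the penalty behavior described above.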
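[0100] To enhance the model's ability to understand text content highly relevant to labels, the generated label-aware positive samples are processed using the same predefined classifier, and the same binary cross-entropy loss function is applied to the resulting predicted probabilities $\hat{p}_{ij}$:
[0101] $\hat{L}^{C}_{ij} = -y_{ij} \log(\hat{p}_{ij}) - (1 - y_{ij}) \log(1 - \hat{p}_{ij})$;
[0102] $\hat{L}^{C} = \sum_{i} \sum_{j} \hat{L}^{C}_{ij}$;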
[0103] The final objective function combines the classification loss $L^{C}$ for the original data with the classification loss $\hat{L}^{C}$ for the label-aware positive samples, and can be formally represented as:
[0104]
[0105] Here, λ is an adjustable weight coefficient. This combined loss function is designed to optimize the model's understanding of the original text and its related text content, thereby achieving higher accuracy and robustness in multi-label classification tasks.
[0106] The model is implemented in the PyTorch framework, and the optimizer chosen is Adam. The Adam optimizer has the characteristic of adaptive learning rate adjustment, which can automatically adjust the learning rate according to different gradient conditions, which helps to improve the training efficiency and convergence speed of the model.
[0107] To prevent overfitting, training is terminated early if performance does not improve after 6 consecutive epochs.
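The early stopping rule can be sketched in plain Python by replaying a sequence of per-epoch development-set scores. The function name, the use of a single scalar score per epoch, and the toy score sequence are assumptions for illustration; the original text does not state which metric triggers the stop.

```python
def early_stopping_epoch(dev_scores, patience=6):
    """Return (stop_epoch, best_score) for a sequence of per-epoch dev scores."""
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best_score:
            best_score = score
            epochs_without_improvement = 0  # improvement resets the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_score    # stop after `patience` stale epochs
    return len(dev_scores), best_score      # ran out of epochs without stopping

# Toy dev scores: improvement at epoch 2, then six epochs without improvement.
scores = [0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.7]
stop_epoch, best = early_stopping_epoch(scores, patience=6)
print(stop_epoch, best)  # 8 0.6
```

In the toy run the later score of 0.7 is never reached, which shows the trade-off of the rule: it limits overfitting and training time at the risk of stopping before a late improvement.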
[0108] For the threshold parameter γ and loss weight λ in the graph encoder, a grid search is used to select them on the development set.
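A grid search over the threshold γ and the loss weight λ can be sketched as follows. The candidate values, the function names, and the stand-in scoring function `toy_dev_score` are hypothetical; in practice the scoring function would train the network with the given pair and return a development-set metric.

```python
import itertools

def grid_search(gammas, lambdas, evaluate):
    """Try every (gamma, lambda) pair and keep the one with the best dev score."""
    best_params, best_score = None, float("-inf")
    for gamma, lam in itertools.product(gammas, lambdas):
        score = evaluate(gamma, lam)
        if score > best_score:
            best_params, best_score = (gamma, lam), score
    return best_params, best_score

def toy_dev_score(gamma, lam):
    # Hypothetical stand-in for "train with these values, then score on the dev set".
    return 1.0 - (gamma - 0.03) ** 2 - (lam - 0.3) ** 2

best_params, best_score = grid_search(
    gammas=[0.01, 0.03, 0.05],   # assumed candidate thresholds
    lambdas=[0.1, 0.3, 0.5],     # assumed candidate loss weights
    evaluate=toy_dev_score,
)
```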
[0109] The model is trained using the training set. At the end of each training epoch, the model's performance is evaluated on the development set using macro-F1 score and micro-F1 score as evaluation metrics.
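For reference, the two evaluation metrics can be computed for multi-label predictions as in the plain-Python sketch below: macro-F1 averages the per-label F1 scores, while micro-F1 pools the counts over all labels first. The function names and the toy label matrices are illustrative assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for binary label matrices of equal shape."""
    num_labels = len(y_true[0])
    tp, fp, fn = [0] * num_labels, [0] * num_labels, [0] * num_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t == 1 and p == 1:
                tp[j] += 1
            elif t == 0 and p == 1:
                fp[j] += 1
            elif t == 1 and p == 0:
                fn[j] += 1
    # Macro: average of per-label F1, so rare labels count as much as frequent ones.
    macro = sum(f1_from_counts(tp[j], fp[j], fn[j]) for j in range(num_labels)) / num_labels
    # Micro: F1 of the pooled counts, so frequent labels dominate.
    micro = f1_from_counts(sum(tp), sum(fp), sum(fn))
    return macro, micro

# Toy data: three texts, two labels (rows are texts, columns are labels).
y_true = [[1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 1], [0, 0]]
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
```

In this toy case the frequent label is predicted perfectly and the rare label is missed entirely, giving a macro-F1 of 0.5 and a micro-F1 of about 0.667; the gap illustrates why both metrics are reported for hierarchies with many fine-grained labels.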
[0110] For example, the publicly available Web of Science (WOS) dataset is used, which includes research paper abstracts from various disciplines. A pre-trained BERT encoder is used to perform deep text encoding on the paper abstracts to extract key semantic features. The maximum sequence length of BERT is set to 512, and the batch size is 16. A lightweight graph convolutional network is applied for hierarchical processing, with a threshold γ of 0.03 on the WOS dataset. The PyTorch framework is used, with Adam as the optimizer, an initial learning rate of 3e-5, and a loss weight λ of 0.3. To prevent overfitting, training is terminated early if performance does not improve after 6 consecutive epochs. The final model achieves a macro F1 score of 87.47 and a micro F1 score of 81.45 on the WOS dataset. Compared to traditional BERT classification models, the model improves the F1 score by approximately 1%, demonstrating its effectiveness in handling hierarchical text classification. The model also shows a significant improvement in data processing speed: in tasks involving the same amount of data, it is 20% faster than other graph encoders such as GCN and GAT, making it more efficient when processing large-scale datasets.
[0111] Furthermore, the RCV1-V2 news classification corpus was used, containing a wide range of news articles covering multiple topics and events, suitable for complex hierarchical classification. A pre-trained BERT encoder was used to perform deep text encoding on the news articles, extracting key semantic features. The maximum sequence length of BERT was set to 512, and the batch size to 16 to accommodate the length and complexity of news texts. A lightweight graph convolutional network was applied to handle the hierarchical structure, with particular attention paid to setting a threshold γ of 0.05 suitable for news data to ensure the model can effectively handle fine-grained differences in news categories. The model was trained using the PyTorch framework, with Adam selected as the optimizer. The initial learning rate was set to 3e-5, and the loss weight λ was set to 0.4 on both datasets. Training was terminated early if performance did not improve after 6 consecutive epochs to prevent overfitting. On the RCV1-V2 dataset, the model achieved a macro F1 score of 87.49 and a micro F1 score of 67.89, demonstrating excellent classification ability. Compared to the traditional BERT classification model, our model improves the F1 score by approximately 2%, validating the effectiveness of lightweight graph convolutional networks in hierarchical classification of news text. It is also about 25% faster than traditional models, enabling the model to process and classify large volumes of news streams more quickly, making it particularly suitable for real-time news classification and monitoring.
[0112] This invention focuses on improving the efficiency and accuracy of processing complex hierarchical labeling systems, addressing the high computational cost, model complexity, and overfitting / generalization issues in hierarchical text classification. First, a pre-trained BERT encoder is used for deep text encoding. Second, a lightweight graph convolutional network (GCNN) with weight matrix multiplication and non-linear activation functions removed is employed to handle the hierarchical structure of the text. Then, an integrated skip link mechanism enhances the model's information flow and feature learning capabilities within the deep network. Furthermore, a label-aware positive sample generation strategy enhances the model's accuracy in identifying and classifying specific categories. Finally, a linear classification layer is added, employing a binary cross-entropy loss function, comprehensively considering both the original classification loss and the positive sample classification loss to optimize model performance. Overall, this invention not only significantly improves classification processing speed but also effectively enhances the model's adaptability to new and unseen data and overall classification accuracy by introducing skip links and positive sample generation strategies. This technical solution is applicable to various applications requiring efficient text classification, such as news classification, legal document analysis, and medical literature classification, and is particularly suitable for processing large-scale and structurally complex text datasets.
[0113] Embodiment 2
[0114] This embodiment provides a hierarchical text classification system based on lightweight graph convolutional networks, including:
[0115] The acquisition module is configured to acquire the news text data to be classified and the corresponding hierarchical tags of the news text data.
[0116] The classification module is configured to: input the acquired data into the trained text classification network and output the text classification result;
[0117] The trained text classification network employs a first BERT encoder to encode the news text data to be classified, obtaining the text feature representation; a lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation; an attention mechanism layer is used to process the text representation and the label text representation, obtaining the label-aware positive sample; a second BERT encoder is used to process the label-aware positive sample, obtaining the label-aware positive sample representation; and a classifier is used to classify the text feature representation and the label-aware positive sample representation, obtaining the text classification result.
[0118] It should be noted that the acquisition module and classification module described above correspond to steps S101 to S102 in Embodiment 1. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that these modules, as part of the system, can be executed in a computer system, for example as a set of computer-executable instructions.
[0119] Each of the above embodiments is described with a different emphasis. For parts not described in detail in one embodiment, please refer to the relevant descriptions in the other embodiments.
[0120] The proposed system can be implemented in other ways. For example, the system embodiments described above are merely illustrative, and the division of modules described above is only a logical functional division. In actual implementation, there may be other division methods. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed.
[0121] Embodiment 3
[0122] This embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, the processor is connected to the memory, and the one or more computer programs are stored in the memory. When the electronic device is running, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the method described in Embodiment 1.
[0123] It should be understood that in this embodiment, the processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor.
[0124] Memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of memory may also include non-volatile random access memory. For example, memory may also store information about the device type.
[0125] In the implementation process, each step of the above method can be completed by the integrated logic circuits in the processor hardware or by software instructions.
[0126] The method in Embodiment 1 can be implemented directly by a hardware processor, or by a combination of hardware and software modules within the processor. The software modules can reside in storage media that are well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, a detailed description is not provided here.
[0127] Those skilled in the art will recognize that the units and algorithm steps described in connection with the various examples of this embodiment can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention.
[0128] Embodiment 4
[0129] This embodiment also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method described in Embodiment 1.
[0130] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A hierarchical text classification method based on lightweight graph convolutional networks, characterized in that it comprises:
obtaining the news text data to be classified and the hierarchical labels corresponding to the news text data;
inputting the acquired data into a trained text classification network, which outputs the text classification result;
wherein the trained text classification network comprises a first BERT encoder, a lightweight graph convolutional network, an attention mechanism layer, a second BERT encoder, and a classifier; the outputs of the first BERT encoder and of the lightweight graph convolutional network are connected to the input of the attention mechanism layer; the output of the attention mechanism layer serves as the input of the second BERT encoder; and the outputs of both the first and second BERT encoders are connected to the input of the classifier;
the trained text classification network employs the first BERT encoder to encode the news text data to be classified, obtaining the text feature representation;
the lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation, including:
treating both first-level and second-level labels as nodes in a graph structure, where, if two labels have a membership relationship, there is an edge connecting the corresponding nodes, and otherwise there is no edge connecting the corresponding nodes;
defining the original adjacency matrix $A$ to represent the connection relationships between nodes in the graph structure, where $A_{ij} = 1$ if there is an edge between node $i$ and node $j$, and $A_{ij} = 0$ otherwise;
adding the identity matrix $I$ to $A$ to form a new adjacency matrix $\tilde{A}$:
$$\tilde{A} = A + I;$$
performing normalization using the inverse square root of the degree matrix $D$ to address the scaling issue during information aggregation, where the degree matrix $D$ is a diagonal matrix and $D_{ii}$ is the degree of node $i$; the normalized adjacency matrix $S$ is calculated as follows:
$$S = D^{-1/2} \tilde{A} D^{-1/2},$$
where $D^{-1/2}$ is the normalized form of the degree matrix $D$;
the lightweight graph convolutional network introduces a skip link mechanism, which sums the products of multiple powers of the adjacency matrix with the input feature matrix in order to retain more of the original feature information; the expression is:
$$H^{(k+1)} = S H^{(k)} + H^{(0)},$$
where $H^{(k)}$ is the output feature matrix of the previous graph convolution layer, containing the node features after processing by the first $k$ layers, and $H^{(0)}$ is the initial node feature matrix, that is, the original feature of each node in the graph before any graph convolution operation is performed;
the attention mechanism layer is used to process the text feature representation and the label text representation to obtain the label-aware positive sample, including:
given an input text sequence $x = \{x_1, x_2, \ldots, x_n\}$, the first BERT encoder transforms the input token sequence into a series of embedding vectors $\{e_1, e_2, \ldots, e_n\}$:
$$\{e_1, e_2, \ldots, e_n\} = \mathrm{BERT}(x);$$
$$q_i = e_i W_Q;$$
$$k_j = l_j W_K;$$
$$A_{ij} = \frac{q_i k_j^{\top}}{\sqrt{d}};$$
where $e_i$ is the embedding vector of the $i$-th token in the input sequence; $W_Q$ is the query weight matrix; $l_j$ is the embedding vector representing the features of the $j$-th label; $W_K$ is the key weight matrix; $q_i$ and $k_j$ are the resulting query and key vectors; $A_{ij}$, in this part of the claim, is the attention weight between the $i$-th token and the $j$-th label; $d$ is the vector dimension; the two weight matrices $W_Q$ and $W_K$ map queries and keys to a common space, and the dimensions of both weight matrices are $d \times d$;
processing $A_{ij}$ with the Gumbel-Softmax function:
$$P_{ij} = \mathrm{gumbel\_softmax}(A_{i1}, A_{i2}, \ldots, A_{iC})_j;$$
where $P_{ij}$ is the probability, after the Gumbel-Softmax operation, that token $x_i$ is assigned to label $l_j$; $A_{i1}, \ldots, A_{iC}$ are the raw attention scores of token $x_i$ for all $C$ possible labels; and $j$ is the label index;
next, because in a multi-label classification problem a token is associated with multiple labels, the probabilities of the token for all of its true labels are summed to obtain the overall probability of the token with respect to the true label set, as shown in the following formula:
$$P_i = \sum_{j \in y} P_{ij};$$
where $P_i$ is the overall probability of the given token $x_i$ with respect to its true label set $y$, obtained by traversing the true label set $y$ and adding up the probabilities $P_{ij}$ that token $x_i$ belongs to each label; $j$ is the label index and $y$ is the true label set of the token; $P_{ij}$ is the probability that token $x_i$ is assigned to label $l_j$;
the summed probability is a comprehensive measure of the relevance of the token to the true label set $y$ and is used to make decisions in subsequent processing according to the following rule:
$$\hat{x}_i = \begin{cases} x_i, & P_i > \gamma \\ 0, & \text{otherwise;} \end{cases}$$
if the overall probability $P_i$ of token $x_i$ exceeds a predefined threshold $\gamma$, the token is retained; otherwise, the token is replaced with token 0, which has an all-zero embedding representation and is used to maintain the position information of the key tokens; the resulting sequence $\{\hat{x}_i\}$ is the generated label-aware positive sample;
the second BERT encoder is used to process the label-aware positive sample to obtain the label-aware positive sample representation; and
the classifier is used to classify the text feature representation and the label-aware positive sample representation to obtain the text classification result.
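The label-graph steps of claim 1 (adding self-loops, symmetric normalization by the inverse square root of the degree matrix, and propagation with a skip link) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the skip link is assumed to take the recursive form H^(k+1) = S H^(k) + H^(0), which unrolls into a sum of powers of S applied to the input features; degrees are assumed to be computed on the self-looped graph; and the toy hierarchy, the one-hot features, and all function names are hypothetical.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Build S = D^{-1/2} (A + I) D^{-1/2} for an undirected label graph."""
    # A + I: start from the identity (self-loops), then add membership edges.
    a_tilde = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
               for i in range(num_nodes)]
    for i, j in edges:
        a_tilde[i][j] = 1.0
        a_tilde[j][i] = 1.0
    # Assumption: degrees are taken from the self-looped graph, so every
    # degree is at least 1 and the inverse square root is always defined.
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(num_nodes)] for i in range(num_nodes)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def propagate(s, h0, num_layers):
    """Weight-free propagation with a skip link back to the input features.

    Assumed update: H^(k+1) = S H^(k) + H^(0).
    """
    h = h0
    for _ in range(num_layers):
        sh = matmul(s, h)
        h = [[sh[i][j] + h0[i][j] for j in range(len(h0[0]))]
             for i in range(len(h0))]
    return h

# Hypothetical toy hierarchy: first-level labels 0 and 1,
# second-level labels 2 and 3 under label 0, and label 4 under label 1.
edges = [(0, 2), (0, 3), (1, 4)]
S = normalized_adjacency(5, edges)
H0 = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]  # one-hot features
H2 = propagate(S, H0, num_layers=2)
```

With one-hot input features, two propagation steps give S² + S + I, so each label's representation mixes in its parent and siblings while labels in a different branch of the hierarchy contribute nothing.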
2. The hierarchical text classification method based on lightweight graph convolutional networks as described in claim 1, characterized in that using the classifier to classify the text feature representation and the label-aware positive sample representation to obtain the text classification result includes: using a linear transformation layer as the classifier to map the high-dimensional features output by the first and second BERT encoders to an output space corresponding to the number of labels.
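A linear transformation layer of this kind can be sketched as a single matrix-vector product plus a bias. The example below is a hedged illustration in plain Python: the weight values, the 4-dimensional feature vectors, the three-label output, the use of a sigmoid for multi-label probabilities, and the choice to apply the same layer to each encoder output separately are all assumptions made for the example, not details stated in the claims.

```python
import math

def linear_classifier(features, weight, bias):
    """Map a feature vector to one logit per label: z = W h + b."""
    return [sum(w * h for w, h in zip(row, features)) + b
            for row, b in zip(weight, bias)]

def sigmoid(z):
    """Turn a logit into an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sizes: 4-dimensional encoder output, 3 labels.
W = [[0.5, -0.2, 0.0, 0.1],
     [0.0, 0.3, -0.4, 0.2],
     [-0.1, 0.0, 0.6, 0.0]]
b = [0.0, 0.1, -0.1]
h_text = [1.0, 0.0, 2.0, -1.0]      # stand-in for the first encoder's output
h_positive = [1.0, 0.0, 0.0, -1.0]  # stand-in for the second encoder's output

logits_text = linear_classifier(h_text, W, b)
logits_positive = linear_classifier(h_positive, W, b)
probs_text = [sigmoid(z) for z in logits_text]
probs_positive = [sigmoid(z) for z in logits_positive]
```

The output dimension equals the number of rows of `W`, which is how the layer ties the size of the output space to the number of labels.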
3. A hierarchical text classification system based on lightweight graph convolutional networks, employing the hierarchical text classification method based on lightweight graph convolutional networks as described in any one of claims 1-2, characterized in that it comprises: an acquisition module configured to acquire the news text data to be classified and the hierarchical labels corresponding to the news text data; and a classification module configured to input the acquired data into the trained text classification network and output the text classification result; wherein the trained text classification network employs a first BERT encoder to encode the news text data to be classified, obtaining the text feature representation; a lightweight graph convolutional network is used to extract features from the hierarchical labels, obtaining the label text representation; an attention mechanism layer is used to process the text feature representation and the label text representation, obtaining the label-aware positive sample; a second BERT encoder is used to process the label-aware positive sample, obtaining the label-aware positive sample representation; and a classifier is used to classify the text feature representation and the label-aware positive sample representation, obtaining the text classification result.
4. An electronic device, characterized in that it comprises: a memory for storing computer-readable instructions in a non-transitory manner; and a processor for executing the computer-readable instructions, wherein, when the computer-readable instructions are executed by the processor, the method described in any one of claims 1-2 is performed.
5. A non-transitory storage medium, characterized in that it stores computer-readable instructions, wherein, when the computer-readable instructions are executed by a computer, the method according to any one of claims 1-2 is performed.