A Chinese sentence classification method that integrates dependency syntax and global co-occurrence information
By integrating dependency syntax and global co-occurrence information, this method addresses the shortcomings in modeling syntactic structure and global statistical knowledge in existing Chinese sentence classification technologies, achieving a more efficient Chinese sentence classification effect.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN UNIV OF TECH
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-30
AI Technical Summary
Existing Chinese sentence classification methods are ineffective in handling ambiguous word segmentation boundaries and lexical constraints under specific part-of-speech conditions, and are unable to fully model syntactic structure information and global statistical knowledge.
A Chinese sentence classification method that integrates dependency syntax and global co-occurrence information is proposed. By using a pre-trained language model, sub-word-level features are aggregated into word-level contextual semantic features. Dependency relation graphs and word-part-of-speech co-occurrence graphs are constructed. Dual graph modeling is performed using a relation graph attention network and a graph convolutional network. Feature fusion is achieved through a gating mechanism and a Transformer block.
It effectively integrates the semantic information of subwords, accurately captures the internal dependency relationships and global semantic associations of sentences, improves the accuracy of Chinese sentence classification and the dimension of feature representation, and realizes the deep organic integration of multi-source features.
Figure CN122309738A_ABST
Description
Technical Field
[0001] This invention belongs to the field of natural language processing, and specifically relates to a Chinese sentence classification method that integrates dependency syntax and global co-occurrence information.
Background Technology
[0002] The rapid development of the internet and the widespread adoption of smart devices have driven explosive growth in textual information, making the extraction of valuable information from massive amounts of text a key research focus in natural language processing. Text classification, a core technology in this area, assigns predefined labels to text and comprises two main stages: feature extraction and classification. Short text classification, however, often suffers from a sparse feature space because each text contains so few characters. Existing text classification research largely focuses on topic classification and sentiment analysis. Chinese single-sentence structure classification is a special text classification task that identifies specific sentence types, such as subject-predicate predicate sentences, pivotal sentences, and existential sentences. It can provide structured linguistic information for higher-level semantic understanding and for various application systems, and therefore has significant application value; however, related research still has many technical shortcomings.
[0003] Current text classification methods are mostly based on deep learning. The emergence of pre-trained language models has significantly improved text classification accuracy: models such as BERT, ELECTRA, and DeBERTaV3 learn rich contextual semantic representations from large-scale corpora. Existing research often combines these pre-trained models with RNNs or CNNs, or introduces graph neural networks to model the graph structure of sentences and extract dependencies between nodes, thereby improving the ability to mine text features. However, when these methods are applied to Chinese sentence structure classification, several problems remain unsolved, and the methods do not match the actual needs of the task. Specifically:
[0004] First, existing methods mostly use the sub-word-level contextual representations output by pre-trained models directly as features, without accounting for the ambiguous boundaries of Chinese word segmentation. Second, existing methods generally model either dependency syntax or lexical co-occurrence alone, without considering lexical constraints under specific part-of-speech conditions, so they struggle to model syntactic structure information and global statistical knowledge fully.
Summary of the Invention
[0005] The purpose of this invention is to provide a Chinese sentence classification method that integrates dependency syntax and global co-occurrence information, which solves the problem that existing technologies mainly rely on pre-trained language model context representations and are difficult to fully model syntactic structure information and global statistical knowledge.
[0006] The technical solution adopted in this invention is a Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information. First, text data is collected for preprocessing and syntactic analysis. Then, a pre-trained language model aggregates sub-word-level features into word-level contextual semantic features. Based on the syntactic analysis results, a dependency graph and a word-part-of-speech co-occurrence graph are constructed. Word-level features are used as node features and input into a graph attention network and a graph convolutional network respectively to complete dual-graph modeling, obtaining syntactic enhancement features and global co-occurrence enhancement features. Subsequently, an adaptive weighted fusion of the two types of enhancement features is performed through a gating mechanism. The fused features are concatenated with the original word-level features and input into a Transformer block to complete global self-attention interaction. Finally, pooling and linear mapping are performed on the interacted features, and the label with the highest probability is selected to achieve Chinese sentence structure classification.
[0007] The invention is further characterized in that it includes the following steps:
Step 1: Collect text data and preprocess it to build a training set. Use natural language processing tools to perform word segmentation, part-of-speech tagging, and dependency parsing on the preprocessed text, obtaining processed text data that contains word segmentation results, part-of-speech tags, and dependency relation types.
Step 2: Input the processed text from Step 1 into the pre-trained language model to obtain sub-word-level context vector representations, establish the word-to-sub-word index mapping from the word segmentation results, and obtain word-level features through attention-weighted aggregation. These features serve as input for the subsequent graph modeling.
Step 3: Based on the word segmentation, part-of-speech tagging, and dependency parsing results from Step 1, construct a dependency graph and a word-part-of-speech co-occurrence graph.
Step 4: Use the word-level features from Step 2 as node features for the dependency graph and the word-part-of-speech co-occurrence graph constructed in Step 3. A relational graph attention network (Rel-GAT) models the inter-word dependency relations and their types within the text, yielding word-level representations enhanced with dependency information. In parallel, a graph convolutional network (GCN) injects the global word-part-of-speech co-occurrence statistics hierarchically into the word-level representations, yielding word-level representations enhanced with global statistics.
Step 5: Adaptively weight the syntax-enhanced representation and the global-statistics-enhanced representation from Step 4 through a gating mechanism to obtain the fused features.
Step 6: Concatenate the fused features from Step 5 with the original word-level features from Step 2, input them into the Transformer block, and model the global interaction of the multi-source features through self-attention to obtain the multi-source fused feature vectors.
Step 7: Apply masked average pooling to the feature vectors from Step 6 to obtain a sentence-level semantic vector. Map this vector to the sentence label space through a dropout layer and a linear transformation, calculate the probability distribution over the labels, and take the label with the highest probability as the final sentence classification result.
[0008] In Step 1, the Chinese text data is collected from the Chinese Treebank, an academic literature database, and a legal database. Preprocessing includes text cleaning, long-sentence segmentation, stop-word removal, and sentence category labeling. After preprocessing, the data is stored in a unified JSON format. Each sample contains the original sentence, the sentence category label, the word segmentation result, the part-of-speech tags, the dependency head index of each word, and the dependency relation types.
[0009] In Step 2, the pre-trained language model is any one of BERT, ELECTRA, or DeBERTaV3. The attention-weighted aggregation proceeds as follows: first, the attention weight of each sub-word with respect to its corresponding word is calculated and normalized with the softmax function; then the sub-word vectors are summed, weighted by the normalized attention weights, via matrix multiplication to obtain the word-level contextual semantic representation of the corresponding word.
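The aggregation described above can be illustrated with a small pure-Python sketch. It assumes a single learnable scoring vector `w` and a scaling factor of the square root of the feature dimension; the function name `aggregate_subwords` and the toy vectors are hypothetical and are not taken from the patent.

```python
import math

def aggregate_subwords(subword_vecs, word_to_subwords, w):
    """Pool sub-word vectors into one vector per word with softmax attention."""
    d = len(w)
    scale = math.sqrt(d)  # assumed scaling factor
    word_vecs = []
    for idxs in word_to_subwords:
        # Scaled score of every sub-word that belongs to this word.
        scores = [sum(wk * hk for wk, hk in zip(w, subword_vecs[j])) / scale
                  for j in idxs]
        # Numerically stable softmax over the sub-words of the word.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the sub-word vectors.
        word_vecs.append([sum(a * subword_vecs[j][k] for a, j in zip(alphas, idxs))
                          for k in range(d)])
    return word_vecs

# Toy input: three sub-word vectors of dimension 2.
# Word 0 covers sub-words 0 and 1; word 1 covers sub-word 2.
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mapping = [[0, 1], [2]]
w = [0.0, 0.0]  # a zero scoring vector gives uniform attention
words = aggregate_subwords(H, mapping, w)
```

With a zero scoring vector the weights are uniform, so the first word is the plain mean of its two sub-word vectors; a trained `w` would shift weight toward the more informative sub-words.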
[0010] In Step 3, a dependency graph and a word-part-of-speech co-occurrence graph are constructed. First, a dependency graph $G_d = (V_d, E_d)$ is constructed for each text: each word in the segmentation result is a node, and dependency relation types label the directed edges. The node set is $V_d = \{v_1, v_2, \dots, v_n\}$ and the edge set is $E_d = \{(v_i, v_j, r_{ij})\}$, where $n$ is the length of the word sequence and $r_{ij}$ is the dependency relation type between nodes $v_i$ and $v_j$.
Second, a weighted undirected co-occurrence graph of word-part-of-speech units is constructed from the co-occurrence frequencies counted over the corpus, stratified by part-of-speech function. A word-part-of-speech unit is a term under a specific part-of-speech condition. Parts of speech are first divided into three categories: function words, predicate words, and content words. NPMI values are then used as the weights of the undirected edges to quantify the strength of co-occurrence association between two word-part-of-speech units. An undirected edge is established only when two units co-occur in the context window and belong to the same functional layer. The resulting graph is $G_c = (V_c, E_c)$, where the node set $V_c$ contains all word-part-of-speech units in the global corpus and $E_c$ is the set of undirected edges weighted by NPMI. The NPMI value is calculated as:
$$\mathrm{NPMI}(x, y) = \frac{\ln \dfrac{p(x, y)}{p(x)\,p(y)}}{-\ln p(x, y)}$$
where $p(x)$ and $p(y)$ are the probabilities that terms $x$ and $y$, each under its specific part-of-speech condition, appear in all documents, and $p(x, y)$ is the probability that $x$ and $y$ appear together.
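As an illustration of the dependency graph described above, the following sketch turns parser output into a node list and typed directed edges. It assumes 1-based head indices with 0 marking the root (the CoNLL-U convention) and edges that point from head to dependent; both choices, the function name `build_dependency_graph`, and the hand-written parse are assumptions made for this example.

```python
def build_dependency_graph(words, heads, deprels):
    """Return (nodes, edges); each edge is (head_idx, dependent_idx, relation)."""
    nodes = list(range(len(words)))
    edges = []
    for dep_idx, (head, rel) in enumerate(zip(heads, deprels)):
        if head == 0:
            # The root word has no governing word, so it gets no incoming edge.
            continue
        edges.append((head - 1, dep_idx, rel))  # convert to 0-based indices
    return nodes, edges

# Hand-written parse of a three-word sentence (not produced by a parser).
words = ["我", "喜欢", "苹果"]
heads = [2, 0, 2]
deprels = ["nsubj", "root", "obj"]
nodes, edges = build_dependency_graph(words, heads, deprels)
```

Each edge carries its relation type, which is the information a relation-aware attention layer needs in addition to plain connectivity.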
[0011] In Step 4, the dependency graph is modeled as follows: the dependency relation indices are first converted into relation embedding vectors. A multi-head attention mechanism then models the different dependency types separately, with attention weights determined by the dependency type. The output features of the attention heads are concatenated and linearly mapped. Finally, after a residual connection and normalization, the word-level representation enhanced with dependency information, $H^{syn}$, is output.
The word-part-of-speech co-occurrence graph is modeled as follows: the word-level features are first mapped into the word-part-of-speech node space to obtain initial node features. Message propagation and feature aggregation are then performed in that node space, with the multiple undirected weighted adjacency matrices acting as structural constraints. Finally, the updated node representations are mapped back to the word positions of the original sentence, and the word-level representation enhanced with global statistics, $H^{glob}$, is output.
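The relation-typed attention described above can be sketched in a heavily simplified form. The version below is a single attention head with no linear projections and no normalization layer, so it is not the Rel-GAT of the invention; it only shows attention scores that depend on the relation type, followed by a residual connection. The names `rel_attention_layer`, `rel_emb`, and `a` are hypothetical, and the toy edges point from the dependents to the head so that one node has two neighbours.

```python
import math

def rel_attention_layer(h, edges, rel_emb, a):
    """One relation-typed attention step; edges are (src, dst, relation)."""
    n, d = len(h), len(h[0])
    incoming = {i: [] for i in range(n)}
    for src, dst, rel in edges:
        incoming[dst].append((src, rel))
    out = []
    for i in range(n):
        if not incoming[i]:
            out.append(list(h[i]))  # no neighbours: keep the input feature
            continue
        # The score of a neighbour depends only on the relation embedding.
        scores = [sum(x * y for x, y in zip(a, rel_emb[rel]))
                  for _, rel in incoming[i]]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        msg = [0.0] * d
        for (src, _), e in zip(incoming[i], exps):
            for k in range(d):
                msg[k] += (e / total) * h[src][k]
        out.append([h[i][k] + msg[k] for k in range(d)])  # residual connection
    return out

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1, "nsubj"), (2, 1, "obj")]
rel_emb = {"nsubj": [1.0, 0.0], "obj": [1.0, 0.0]}  # equal embeddings, equal weights
a = [1.0, 0.0]
out = rel_attention_layer(h, edges, rel_emb, a)
```

Because both relation embeddings are equal here, node 1 averages its two neighbours; giving `nsubj` and `obj` different embeddings would make the layer weight them differently.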
[0012] In Step 5, the dependency-enhanced features $H^{syn}$ and the global co-occurrence-enhanced features $H^{glob}$ are fused adaptively through a gating mechanism:
$$g = \sigma\left(W_g\,[H^{syn}; H^{glob}] + b_g\right)$$
$$H^{fuse} = g \odot H^{syn} + (1 - g) \odot H^{glob}$$
where $W_g$ and $b_g$ are gating parameters, $\sigma$ is the activation function, $[\cdot\,;\cdot]$ denotes concatenation, $\odot$ denotes element-wise multiplication, and $g$ is the gating weight.
[0013] In Step 7, after a linear transformation of the sentence-level semantic vector $s$, the probability distribution over the sentence structure categories is calculated with the softmax function:
$$P(y \mid s) = \mathrm{softmax}(W_c\,s + b_c)$$
where $W_c$ and $b_c$ are the parameters of the fully connected layer and $y$ is a sentence structure category label.
[0014] The beneficial effects of this invention are: (1) The Chinese sentence classification method of the present invention integrates dependency syntax and global co-occurrence information. By using attention weighting, the sub-word-level context features output by the pre-trained model are aggregated into word-level features, and a precise “word-sub-word” index mapping relationship is established. This effectively integrates the semantic information of sub-words and avoids the feature representation bias caused by directly using sub-word-level features. This lays a reliable word-level feature foundation for subsequent graph modeling and feature mining.
[0015] Meanwhile, by employing a dual-graph neural network to collaboratively mine multi-dimensional features, the shortcomings of existing single-modeling techniques are overcome. Modeling the dependency graph using Rel-GAT accurately captures the dependency relationship types between words within a sentence and local syntactic structure constraints. Modeling the "word_part-of-speech" co-occurrence graph hierarchically by part-of-speech function using GCN, and introducing NPMI to quantify the strength of lexical co-occurrence associations under part-of-speech constraints, fully mines cross-sentence and cross-document semantic associations at the global corpus level. This achieves comprehensive modeling of syntactic structure information and global statistical knowledge, while effectively mining lexical association patterns under specific part-of-speech conditions, enriching the dimensions of feature representation.
[0016] (2) This invention achieves deep fusion of multi-source features by combining gated dynamic fusion with Transformer self-attention. The gating mechanism adaptively adjusts the fusion ratio between the syntactic enhancement features and the global co-occurrence enhancement features according to the features themselves; the Transformer block then models the fused features together with the original word-level features through self-attention, capturing the global dependencies among the multi-source features. This exploits the complementary roles of pre-trained features, syntactic features, and global co-occurrence features, and improves the semantic representation ability of the model.
Attached Figure Description
[0017] Figure 1 is a schematic flowchart of the Chinese sentence classification method that integrates dependency syntax and global co-occurrence information according to the present invention. Figure 2 is a comparative schematic diagram in Example 6 of the present invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0019] Example 1. This invention provides a Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information. As shown in Figure 1, text data is first collected for preprocessing and syntactic analysis. Then, a pre-trained language model is used to aggregate sub-word-level features into word-level contextual semantic features. Based on the syntactic analysis results, a dependency graph and a word-part-of-speech co-occurrence graph are constructed. Word-level features are used as node features and input into a graph attention network and a graph convolutional network respectively to complete dual-graph modeling, yielding syntactic enhancement features and global co-occurrence enhancement features. Subsequently, the two types of enhancement features are adaptively weighted and fused through a gating mechanism. The fused features are concatenated with the original word-level features and input into a Transformer block to complete global self-attention interaction. Finally, the resulting features are pooled and linearly mapped, and the label with the highest probability is taken as the Chinese sentence classification result.
[0020] Specifically, the following steps are included:
Step 1: Collect text data and preprocess it to build a training set. Use natural language processing tools (such as Stanza or LTP) to perform word segmentation, part-of-speech tagging, and dependency parsing on the preprocessed text, obtaining processed text data that contains word segmentation results, part-of-speech tags, and dependency relation types.
Step 2: Input the processed text from Step 1 into the pre-trained language model to obtain sub-word-level context vector representations, establish the word-to-sub-word index mapping from the word segmentation results, and obtain word-level features through attention-weighted aggregation. These features serve as input for the subsequent graph modeling.
Step 3: This step has two parts, dependency graph construction and word-part-of-speech co-occurrence graph construction. Based on the word segmentation, part-of-speech tagging, and dependency parsing results from Step 1, a dependency graph is constructed for each input text, with each word as a node and dependency relation types as directed edges. An undirected word-part-of-speech (word_pos) co-occurrence graph is constructed from the corpus. The word_pos units are divided into three functional layers according to part-of-speech function: the function-word layer, the predicate layer, and the content-word layer. Each node represents a word under a specific part-of-speech condition, i.e. a "word_pos" unit. If two nodes co-occur in the context window and belong to the same functional layer, an undirected edge is established between them.
Step 4: Use the word-level features from Step 2 as node features for the dependency graph and the word-part-of-speech co-occurrence graph constructed in Step 3. A relational graph attention network (Rel-GAT) models the inter-word dependency relations and their types within the text, yielding word-level representations enhanced with dependency information. In parallel, a graph convolutional network (GCN) injects the global word-part-of-speech co-occurrence statistics hierarchically into the word-level representations, yielding word-level representations enhanced with global statistics.
Step 5: Adaptively weight the syntax-enhanced representation and the global-statistics-enhanced representation from Step 4 through a gating mechanism to obtain the fused features.
Step 6: Concatenate the fused features from Step 5 with the original word-level features from Step 2, input them into the Transformer block, and model the global interaction of the multi-source features through self-attention to obtain the multi-source fused feature vectors.
Step 7: Apply masked average pooling to the feature vectors from Step 6 to obtain a sentence-level semantic vector. Map this vector to the sentence label space through a dropout layer and a linear transformation, calculate the probability distribution over the labels, and take the label with the highest probability as the final sentence classification result.
[0021] Example 2. Based on Example 1 above, in Step 1 of this invention the text data is collected from multiple sources, such as Chinese open-source datasets (e.g. the Chinese Treebank CTB, the CNKI academic literature database, and the national laws and regulations database), to ensure that the corpus covers different fields and different sentence types. Preprocessing includes text cleaning, long-sentence segmentation, stop-word removal, and sentence category labeling. After preprocessing, the data is stored in a unified JSON format. Each sample contains the original sentence, the sentence category label, the word segmentation result, the part-of-speech tags, the dependency head index of each word, and the dependency relation types. The word segmentation, part-of-speech tagging, and dependency parsing are performed with Stanza, Stanford's official Python package for natural language processing.
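A sample in the unified JSON format might be assembled as in the sketch below. The field names (`sentence`, `label`, `words`, `pos`, `heads`, `deprels`) and the helper `to_record` are hypothetical, since the patent lists the fields but not their keys, and the parse is written by hand rather than produced by a parser. Stanza word objects expose `text`, `upos`, `head`, and `deprel` attributes, so tuples of this shape could be filled from its output.

```python
import json

def to_record(sentence, label, parsed_words):
    """Build one sample; parsed_words holds (text, pos, head, deprel) tuples.

    head is a 1-based index into the word list, with 0 marking the root.
    """
    return {
        "sentence": sentence,
        "label": label,
        "words": [w[0] for w in parsed_words],
        "pos": [w[1] for w in parsed_words],
        "heads": [w[2] for w in parsed_words],
        "deprels": [w[3] for w in parsed_words],
    }

# Hand-written illustrative parse of an existential sentence.
parsed = [("桌上", "NOUN", 2, "nsubj"), ("有", "VERB", 0, "root"), ("书", "NOUN", 2, "obj")]
record = to_record("桌上有书", "existential", parsed)
line = json.dumps(record, ensure_ascii=False)  # one JSON object per sample
```

Keeping the four per-word lists aligned by position makes it straightforward to rebuild the dependency graph and the word_pos units later.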
[0022] Example 3. Based on Example 1 above, the pre-trained language model in Step 2 is any one of BERT, ELECTRA, or DeBERTaV3. The attention-weighted aggregation proceeds as follows. First, the word-to-sub-word index mapping is established from the word segmentation results, so that each complete word corresponds to a set of sub-word indices. Second, the attention weight of each sub-word with respect to its word is calculated and normalized with the softmax function, as in formula (1):
$$\alpha_{i,j} = \frac{\exp\left(w^{\top} h_{i,j} / \sqrt{d}\right)}{\sum_{k=1}^{m_i} \exp\left(w^{\top} h_{i,k} / \sqrt{d}\right)} \quad (1)$$
where $h_{i,j}$ is the semantic vector of the $j$-th sub-word of the $i$-th word, $w$ is the attention weight parameter, and $\sqrt{d}$ is the scaling factor.
Next, the sub-word vectors are aggregated, weighted by the normalized attention weights, via matrix multiplication to obtain the word-level contextual semantic representation, as in formula (2):
$$h_i = \sum_{j=1}^{m_i} \alpha_{i,j}\, h_{i,j} \quad (2)$$
where $m_i$ is the number of sub-words contained in the $i$-th word.
Example 4. Based on Example 1 above, Step 3 of this example includes the following steps:
Step 3.1: Dependency graph construction. Using the words of each text as graph nodes (each node corresponding to one word) and the dependency relation types as directed edges, a dependency graph $G_d = (V_d, E_d)$ is constructed for each text. The node set is $V_d = \{v_1, v_2, \dots, v_n\}$ and the edge set is $E_d = \{(v_i, v_j, r_{ij})\}$, where $n$ is the length of the word sequence and $r_{ij}$ is the dependency relation type between nodes $v_i$ and $v_j$.
Step 3.2: Construction of the "word_part-of-speech" co-occurrence graph. The co-occurrence frequency, within the context window, of terms under specific part-of-speech conditions (denoted word_pos) is counted over the corpus, and the NPMI of word_pos units is used as the edge weight to measure the association between two such terms. The value lies in [-1, 1]. A value of 0 means the two terms are independent; a value greater than 0 means they are positively associated, and the larger the value, the stronger the association; a value less than 0 means they co-occur less often than would be expected by chance, with -1 meaning they never co-occur. The calculation is given by formula (3):
$$\mathrm{NPMI}(x, y) = \frac{\ln \dfrac{p(x, y)}{p(x)\,p(y)}}{-\ln p(x, y)} \quad (3)$$
where $p(x)$ and $p(y)$ are the probabilities that terms $x$ and $y$, each under its specific part-of-speech condition, appear in all documents, and $p(x, y)$ is the probability that $x$ and $y$ appear together.
Furthermore, parts of speech are categorized by function. Stanza's part-of-speech tag set is based on the PKU standard and is divided by function into function words (prepositions ADP, conjunctions CCONJ, pronouns PRON, etc.), predicates (verbs VERB, etc.), and content words (nouns NOUN, numerals NUM, etc.). On this basis a global undirected co-occurrence graph $G_c = (V_c, E_c)$ is constructed. The node set $V_c$ contains all word_pos units in the global corpus. If two nodes $x$ and $y$ co-occur in the context window and belong to the same functional layer, an undirected edge with weight $\mathrm{NPMI}(x, y)$ is created between them. Each part-of-speech layer corresponds to one undirected weighted adjacency matrix, and the multiple adjacency matrices together form a multi-graph structure.
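The construction in Step 3.2 can be sketched as follows. The sketch makes several assumptions that the patent does not fix: probabilities are estimated as the fraction of sliding windows that contain a unit or a pair, the window size is 3, the `LAYER` table covers only the tags named above, and pairs that co-occur in every window receive the limiting value 1.0. The names `npmi` and `build_cooccurrence_edges` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Part-of-speech tag -> functional layer (only the tags named in the text).
LAYER = {"ADP": "function", "CCONJ": "function", "PRON": "function",
         "VERB": "predicate", "NOUN": "content", "NUM": "content"}

def npmi(p_x, p_y, p_xy):
    """Normalised PMI in [-1, 1]; requires 0 < p_xy <= 1."""
    if p_xy >= 1.0:
        return 1.0  # the pair occurs in every window (limiting case)
    return math.log(p_xy / (p_x * p_y)) / (-math.log(p_xy))

def build_cooccurrence_edges(sentences, window=3):
    """sentences: lists of (word, pos) tuples. Returns {(x, y): weight}."""
    unit_counts, pair_counts, n_windows = Counter(), Counter(), 0
    for sent in sentences:
        for start in range(max(1, len(sent) - window + 1)):
            units = sorted(set(sent[start:start + window]))
            n_windows += 1
            unit_counts.update(units)
            pair_counts.update(combinations(units, 2))
    edges = {}
    for (x, y), c_xy in pair_counts.items():
        layer_x, layer_y = LAYER.get(x[1]), LAYER.get(y[1])
        if layer_x is None or layer_x != layer_y:
            continue  # link only units in the same functional layer
        weight = npmi(unit_counts[x] / n_windows,
                      unit_counts[y] / n_windows,
                      c_xy / n_windows)
        edges[(f"{x[0]}_{x[1]}", f"{y[0]}_{y[1]}")] = weight
    return edges

corpus = [
    [("书", "NOUN"), ("桌子", "NOUN"), ("有", "VERB")],
    [("我", "PRON"), ("看", "VERB"), ("书", "NOUN")],
]
edges = build_cooccurrence_edges(corpus)
```

In this two-sentence toy corpus only the two nouns share a layer and a window, so a single edge is produced; its weight is 0 because one of the nouns appears in every window, which makes the pair statistically independent under these counts.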
[0023] Example 5. Based on Example 4 above, when the dependency graph is modeled in Step 4, the word-level feature vectors obtained in Step 2 are first used as the node features of the dependency graph and input into the relational graph attention network (Rel-GAT). The dependency relation indices are converted into relation embedding vectors. A multi-head attention mechanism then models each dependency relation type, with attention weights determined solely by the dependency relation type. The output features of the attention heads are concatenated and linearly mapped. Finally, the output features of each Rel-GAT layer are added to the input features through a residual connection and normalized, and dropout is applied in every layer except the last, yielding the feature vector enhanced by dependency syntax, $H^{syn}$.
When the word-part-of-speech co-occurrence graph is modeled, the word-level features obtained in Step 2 are mapped into the node space to obtain the initial feature representations of the word nodes. These node features are then input into a graph convolutional network. With the multiple undirected adjacency matrices acting as structural constraints, message propagation and feature aggregation are performed in the word node space, realizing information interaction between related words within the same part-of-speech layer and modeling global statistical semantic relationships. Finally, the updated word node representations are mapped back to the word positions of the current sentence, yielding the word-level feature representation that integrates global lexical statistical relationships, $H^{glob}$.
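The message propagation on the co-occurrence graph can be illustrated with one graph-convolution step. The sketch below uses the common symmetric normalization with self-loops and a ReLU activation, which is an assumption, since the patent does not state which GCN variant is used; it also handles a single adjacency matrix with non-negative weights rather than one matrix per part-of-speech layer. The names `gcn_layer` and `word_to_node` are hypothetical.

```python
import math

def gcn_layer(adj, h, weight):
    """One step of ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense Python lists."""
    n = len(adj)
    # Add self-loops so that every node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * h[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]

adj = [[0.0, 1.0], [1.0, 0.0]]   # two word_pos nodes joined by one edge
h = [[1.0, 0.0], [0.0, 1.0]]     # initial node features
w = [[1.0, 0.0], [0.0, 1.0]]     # identity weight for readability
out = gcn_layer(adj, h, w)

# Map the updated node features back to the word positions of a sentence.
word_to_node = [1, 0]
sentence_feats = [out[idx] for idx in word_to_node]
```

The final gather step corresponds to remapping node representations to word positions: each word in the sentence looks up the feature of its word_pos node.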
[0024] Furthermore, the gating mechanism in Step 5 is computed as follows: first, the syntactic enhancement representation $H^{syn}$ and the global statistical enhancement representation $H^{glob}$ undergo a linear transformation with the gating parameters $W_g$ and $b_g$, and the gating weight $g$ is obtained through an activation function. Then the two enhanced representations are weighted element-wise by the gating weight to obtain the fused features $H^{fuse}$.
[0025] The calculation expressions are:
$$g = \sigma\left(W_g\,[H^{syn}; H^{glob}] + b_g\right) \quad (4)$$
$$H^{fuse} = g \odot H^{syn} + (1 - g) \odot H^{glob} \quad (5)$$
where $W_g$ and $b_g$ are gating parameters, $\sigma$ is the activation function, $[\cdot\,;\cdot]$ denotes concatenation, $\odot$ denotes element-wise multiplication, and $g$ is the gating weight.
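A gated fusion of this kind can be sketched for a single word position as follows. The sketch assumes a sigmoid activation and a gate computed from the concatenation of the two representations, with the fused feature formed as a convex combination of the two inputs; the exact form of the gate in the invention may differ. The names `gated_fusion`, `w_g`, and `b_g` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_syn, h_glob, w_g, b_g):
    """Return (fused, gate) with gate = sigmoid(W_g [h_syn; h_glob] + b_g)."""
    concat = h_syn + h_glob  # list concatenation of the two feature vectors
    gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(w_g, b_g)]
    # Element-wise convex combination of the two representations.
    fused = [g * s + (1.0 - g) * t for g, s, t in zip(gate, h_syn, h_glob)]
    return fused, gate

h_syn = [1.0, 2.0]               # syntax-enhanced feature of one word
h_glob = [3.0, 4.0]              # global-statistics-enhanced feature
w_g = [[0.0] * 4, [0.0] * 4]     # zero weights give a gate of 0.5
b_g = [0.0, 0.0]
fused, gate = gated_fusion(h_syn, h_glob, w_g, b_g)
```

With zero gate parameters each dimension is an equal mix of the two sources; after training, the gate can lean toward syntax or toward global statistics per dimension and per word.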
[0026] Example 6. Based on Example 5 above, in Step 7 of this invention, after a linear transformation of the sentence-level semantic vector $s$, the probability distribution over the sentence structure categories is calculated with the softmax function, as in formula (6):
$$P(y \mid s) = \mathrm{softmax}(W_c\,s + b_c) \quad (6)$$
where $W_c$ and $b_c$ are the parameters of the fully connected layer and $y$ is a sentence structure category label.
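The pooling and classification of Step 7 can be sketched at inference time, where dropout is inactive and therefore omitted. The function names `masked_mean_pool` and `classify`, the toy weights, and the two-label set are illustrative assumptions.

```python
import math

def masked_mean_pool(token_vecs, mask):
    """Average only the positions whose mask entry is 1 (real words, not padding)."""
    kept = [v for v, m in zip(token_vecs, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]

def classify(sentence_vec, weight, bias, labels):
    """Linear map followed by softmax; returns (best_label, probabilities)."""
    logits = [sum(w * x for w, x in zip(row, sentence_vec)) + b
              for row, b in zip(weight, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

token_vecs = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]                        # the third position is padding
pooled = masked_mean_pool(token_vecs, mask)

weight = [[1.0, 0.0], [0.0, 1.0]]       # one row per label
bias = [0.0, 0.0]
labels = ["pivotal", "existential"]
label, probs = classify(pooled, weight, bias, labels)
```

Masking matters in batched processing: without it, padding vectors would be averaged into the sentence vector and distort the prediction.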
[0027] In one embodiment of the present invention, to verify the effectiveness of the proposed method, comparative experiments were conducted on multiple models in a unified simulation environment. The experiments used a single NVIDIA GeForce RTX 4090 graphics card with 24GB of video memory and CUDA version 12.4; the operating system was Linux; the deep learning algorithm was implemented based on the PyTorch framework, using Python 3.8 or later to ensure code compatibility and operational stability. All models were trained and evaluated under the same training, validation, and test set partitioning conditions, and the same training strategy was employed to ensure the fairness of the experimental results.
[0028] To verify the effectiveness of the method of this invention, BERT-series models were selected for comparison: BERT, DeBERTaV3, ELECTRA, ModernBERT, RoBERTa, BERT-GCN, and BERT-RGAT. BERT-GCN adds a graph convolutional network based on lexical co-occurrence relations on top of the BERT encoding results. BERT-RGAT adds a relational graph attention network based on dependency syntax structure on top of the BERT encoding results. The method of this invention, built on a pre-trained language model, introduces both a relational graph attention network based on dependency syntax structure and a graph convolutional network over a global undirected lexical graph built from the training corpus. Figure 2 compares the classification accuracy of the models on the test set. The classification accuracy of the method of this invention is higher than that of the other methods, which indicates that introducing global lexical co-occurrence relations and syntactic structure information effectively compensates for the shortcomings of relying solely on contextual semantic information and yields better performance in text classification tasks.
[0029] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0030] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information, characterized in that: first, text data is collected for preprocessing and syntactic analysis; then, sub-word-level features are aggregated into word-level contextual semantic features through a pre-trained language model; based on the syntactic analysis results, a dependency graph and a word-part-of-speech co-occurrence graph are constructed; word-level features are used as node features and input into a graph attention network and a graph convolutional network respectively to complete dual-graph modeling, yielding syntactic enhancement features and global co-occurrence enhancement features; subsequently, the two types of enhancement features are adaptively weighted and fused through a gating mechanism; the fused features are then concatenated with the original word-level features and input into a Transformer block to complete global self-attention interaction; finally, the resulting features are pooled and linearly mapped, and the label with the highest probability is taken as the Chinese sentence classification result.
2. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 1, characterized in that it includes the following steps:
Step 1: Collect text data and preprocess it to build a training set; at the same time, use natural language processing tools to perform word segmentation, part-of-speech tagging, and dependency parsing on the preprocessed text, obtaining processed text data that contains word segmentation results, part-of-speech tags, and dependency relation types;
Step 2: Input the processed text obtained in Step 1 into the pre-trained language model to obtain sub-word-level contextual vector representations, then establish a word-to-sub-word index mapping based on the word segmentation results, and obtain word-level features through attention-weighted aggregation; these word-level features serve as the input features for subsequent graph modeling;
Step 3: Based on the word segmentation, part-of-speech tagging, and dependency parsing results from Step 1, construct a dependency graph and a word-part-of-speech co-occurrence graph, respectively;
Step 4: Use the word-level features obtained in Step 2 as node features for the dependency graph and the word-part-of-speech co-occurrence graph constructed in Step 3; use a Relation Graph Attention Network (Rel-GAT) to model the inter-word dependency relations and their type information within the text, obtaining a word-level representation enhanced with dependency information; at the same time, inject the global word-part-of-speech co-occurrence statistics hierarchically into the word-level representation through a Graph Convolutional Network (GCN), obtaining a word-level representation enhanced with global statistics;
Step 5: Adaptively weight the two enhanced word-level representations obtained in Step 4, namely the syntax-enhanced representation and the global-statistics-enhanced representation, through a gating mechanism to obtain the fused features;
Step 6: Concatenate the fused features obtained in Step 5 with the original word-level features obtained in Step 2, input them into the Transformer block, and complete the global interaction modeling of multi-source features through the self-attention mechanism to obtain the multi-source fused feature vectors;
Step 7: Perform masked average pooling on the feature vectors obtained in Step 6 to obtain a sentence-level semantic vector; map this vector to the sentence label space through a dropout layer and a linear transformation; calculate the probability distribution over the labels and take the label with the highest probability as the final sentence classification result.
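The masked average pooling named in Step 7 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `masked_mean_pool` and the convention that a mask value of 1 marks a real word and 0 marks padding are assumptions.

```python
def masked_mean_pool(token_vecs, mask):
    """Average only the vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_vecs[0])
    count = sum(mask)
    if count == 0:
        # No real tokens: return a zero vector rather than divide by zero.
        return [0.0] * dim
    return [
        sum(vec[k] for vec, m in zip(token_vecs, mask) if m) / count
        for k in range(dim)
    ]


# Two real words followed by one padding position.
token_vecs = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = masked_mean_pool(token_vecs, mask)
```

Dividing by the number of unmasked positions, rather than by the padded length, keeps sentence vectors comparable across sentences of different lengths within a batch.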
3. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that the text data in step 1 is collected from Chinese treebanks, academic literature databases, and legal databases; the preprocessing includes text cleaning, long-sentence segmentation, stop-word removal, and sentence category labeling; after preprocessing, the data is unified into JSON format, and each sample contains the following fields: original sentence, sentence category label, word segmentation result, part-of-speech tags, dependency head index, and dependency relation type.
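A sample in the unified JSON format might look like the sketch below. The claim lists the fields but not their key names, so every key (`sentence`, `label`, `words`, `pos`, `head`, `deprel`), the label value, the tag set, and the 1-based head convention with 0 as root are assumptions made for illustration.

```python
import json

# Hypothetical key names and tag values; only the list of fields comes from the claim.
sample = {
    "sentence": "我爱北京",
    "label": "class_a",                   # placeholder sentence category label
    "words": ["我", "爱", "北京"],          # word segmentation result
    "pos": ["PN", "VV", "NR"],            # one part-of-speech tag per word
    "head": [2, 0, 2],                    # 1-based head index per word; 0 marks the root
    "deprel": ["nsubj", "root", "dobj"],  # dependency relation type per word
}


def validate(s):
    """Check that all per-word fields align and exactly one word is the root."""
    n = len(s["words"])
    same_length = all(len(s[key]) == n for key in ("pos", "head", "deprel"))
    heads_in_range = all(0 <= h <= n for h in s["head"])
    return same_length and heads_in_range and s["head"].count(0) == 1


# ensure_ascii=False keeps the Chinese characters readable in the output file.
line = json.dumps(sample, ensure_ascii=False)
```

Storing one such object per line makes it straightforward to stream samples during training and to check field alignment before graph construction.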
4. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that the pre-trained language model in step 2 is any one of BERT, ELECTRA, or DeBERTaV3; the attention-weighted aggregation proceeds as follows: first, the attention weight of each sub-word with respect to its corresponding word is calculated and normalized using the softmax function; then, the sub-word vectors are summed according to the normalized attention weights using matrix multiplication, yielding the word-level contextual semantic representation of the corresponding word.
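The aggregation in this claim can be sketched as below. The claim does not state how the raw attention score of a sub-word is computed, so the dot product with a scoring vector `q` is an assumption, as are the names `aggregate_words` and `word_to_subwords`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_words(subword_vecs, word_to_subwords, q):
    """Pool sub-word vectors into one vector per word.

    subword_vecs: list of d-dimensional sub-word vectors from the encoder
    word_to_subwords: for each word, the indices of its sub-words
    q: scoring vector (assumed; the claim does not fix the scoring function)
    """
    dim = len(q)
    word_vecs = []
    for idxs in word_to_subwords:
        scores = [sum(a * b for a, b in zip(subword_vecs[i], q)) for i in idxs]
        weights = softmax(scores)
        pooled = [
            sum(w * subword_vecs[i][k] for w, i in zip(weights, idxs))
            for k in range(dim)
        ]
        word_vecs.append(pooled)
    return word_vecs


# Word 0 spans sub-words 0 and 1; word 1 is the single sub-word 2.
subword_vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
word_to_subwords = [[0, 1], [2]]
q = [0.0, 0.0]  # zero scores give uniform weights, i.e. plain averaging
word_vecs = aggregate_words(subword_vecs, word_to_subwords, q)
```

With a trained `q`, sub-words that carry more of the word's meaning receive larger weights than in plain averaging.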
5. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that, in step 3, the dependency graph and the word-part-of-speech co-occurrence graph are constructed as follows:
First, a dependency graph is constructed for each text: each word in the word segmentation result is a graph node, so that each node corresponds to one word, and each dependency relation type labels a directed edge, giving the dependency graph $G_{dep} = (V_{dep}, E_{dep})$ with node set $V_{dep} = \{w_1, w_2, \ldots, w_n\}$ and edge set $E_{dep} = \{(w_i, w_j, r_{ij})\}$, where $n$ is the length of the word sequence, $w_i$ represents a node, and $r_{ij}$ is the dependency relation type;
Second, based on the co-occurrence frequencies of word-part-of-speech units counted over the corpus, a weighted undirected co-occurrence graph of word-part-of-speech units is constructed after stratifying by part-of-speech function. Specifically, parts of speech are first divided into three categories: function words, predicate words, and content words; a word-part-of-speech unit is a term under a specific part-of-speech condition. NPMI values are then introduced as the weights of the undirected edges to quantify the co-occurrence association strength between two word-part-of-speech units. An undirected edge is established only when the two word-part-of-speech units co-occur in the context window and belong to the same part-of-speech functional category. The resulting word-part-of-speech co-occurrence graph is $G_{co} = (V_{co}, E_{co})$, where the node set $V_{co}$ contains all word-part-of-speech units in the global corpus and the edge set $E_{co}$ is the set of undirected edges with NPMI weights. The NPMI value is calculated as follows:
$$\mathrm{NPMI}(x, y) = \frac{\ln \dfrac{p(x, y)}{p(x)\,p(y)}}{-\ln p(x, y)};$$
in the formula, $p(x)$ and $p(y)$ are the probabilities that terms $x$ and $y$, respectively, appear in all documents under specific part-of-speech conditions, and $p(x, y)$ is the probability that terms $x$ and $y$ appear simultaneously in all documents.
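The NPMI edge weighting can be illustrated with the sketch below, which uses the standard NPMI definition. It assumes that each inner list is one co-occurrence unit (a document or a context window), that probabilities are estimated as document frequencies, and that `layer_of` maps a part-of-speech tag to its functional category; the function name, the tag set, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(docs, layer_of):
    """Return {(unit_x, unit_y): NPMI} for same-category co-occurring units.

    docs: list of lists of (word, pos) units, one inner list per co-occurrence unit
    layer_of: maps a part-of-speech tag to its functional category (assumed)
    """
    n = len(docs)
    unit_df = Counter()
    pair_df = Counter()
    for doc in docs:
        units = sorted(set(doc))              # count each unit once per document
        unit_df.update(units)
        pair_df.update(combinations(units, 2))
    edges = {}
    for (x, y), count in pair_df.items():
        if layer_of[x[1]] != layer_of[y[1]]:
            continue                          # edges only within one category
        p_xy = count / n
        if p_xy == 1.0:
            continue                          # -ln p(x, y) is 0, NPMI is undefined
        p_x, p_y = unit_df[x] / n, unit_df[y] / n
        edges[(x, y)] = math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
    return edges


docs = [
    [("银行", "NN"), ("贷款", "NN"), ("的", "DEG")],
    [("银行", "NN"), ("贷款", "NN")],
    [("银行", "NN"), ("的", "DEG")],
    [("河岸", "NN")],
]
layer_of = {"NN": "content", "DEG": "function"}
edges = npmi_edges(docs, layer_of)
```

NPMI lies in [-1, 1], so edge weights from frequent and rare pairs are on a comparable scale, which plain PMI does not guarantee.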
6. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that, in step 4, when modeling the dependency graph, the dependency relation indices are first converted into dependency relation embedding vectors; then, a multi-head attention mechanism is used to model the different dependency types separately, with the attention weights determined by the dependency type; the output features of the attention heads are concatenated and linearly mapped; finally, through residual connection and normalization, the word-level representation enhanced with dependency information is output; when modeling the word-part-of-speech co-occurrence graph, the word-level features are first mapped to the word-part-of-speech node space to obtain initial node features; then, using multiple undirected weighted adjacency matrices as structural constraints, message propagation and feature aggregation are performed in the word-part-of-speech node space; finally, the updated node representations are mapped back to the word-level positions of the original sentence, and the word-level representation enhanced with global statistics is output.
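The message propagation step over one weighted adjacency matrix can be sketched as a single graph convolution. The claim does not give the propagation rule, so this sketch assumes the common symmetric normalization with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 H W), and non-negative edge weights; the mapping between word positions and word-part-of-speech nodes is omitted.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: symmetric weighted adjacency matrix (assumed non-negative)
    feats: node feature matrix H, one row per node
    weight: layer weight matrix W
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Nodes 0 and 1 share an edge of weight 0.5; node 2 is isolated.
adj = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, feats, weight)
```

With several adjacency matrices, one per part-of-speech functional category, the same step could be applied per matrix and the results combined; how they are combined is not specified in the claim.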
7. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that the feature fusion process in step 5 adaptively fuses, through a gating mechanism, the dependency-enhanced features $H_{syn}$ and the global co-occurrence-enhanced features $H_{glob}$; the calculation expressions are shown below:
$$g = \sigma\left(W_g\,[H_{syn}; H_{glob}] + b_g\right);$$
$$H_{fuse} = g \odot H_{syn} + (1 - g) \odot H_{glob};$$
in the formulas, $W_g$ and $b_g$ are gating parameters, $\odot$ denotes element-wise multiplication, and $g$ is the gating weight.
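A gated fusion of this kind can be sketched for a single word position as follows. The sigmoid gate over the concatenated features and the convex combination are a common formulation adopted here as an assumption; the parameter shapes and the names `gated_fusion`, `w_g`, and `b_g` are likewise illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_syn, h_glob, w_g, b_g):
    """Fuse two d-dimensional vectors with an element-wise gate.

    w_g: d x 2d matrix, b_g: d-dimensional bias (assumed parameter shapes)
    Returns the fused vector and the gate values.
    """
    concat = h_syn + h_glob  # list concatenation, length 2d
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(w_g, b_g)
    ]
    fused = [g * s + (1.0 - g) * c for g, s, c in zip(gate, h_syn, h_glob)]
    return fused, gate


h_syn = [1.0, 2.0]   # syntax-enhanced features of one word
h_glob = [3.0, 4.0]  # global co-occurrence-enhanced features of the same word
w_g = [[0.0] * 4, [0.0] * 4]
b_g = [0.0, 0.0]     # zero parameters give a gate of 0.5 everywhere
fused, gate = gated_fusion(h_syn, h_glob, w_g, b_g)
```

Because the gate is computed per dimension, the model can lean on syntactic evidence in some feature dimensions and on global statistics in others.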
8. The Chinese sentence structure classification method that integrates dependency syntax and global co-occurrence information according to claim 2, characterized in that, in step 7, after a linear transformation is applied to the sentence-level semantic vector $s$, the probability distribution over the sentence structure categories is calculated using the Softmax function; the formula for the probability distribution is shown below:
$$P(y \mid s) = \mathrm{Softmax}(W_c\, s + b_c);$$
in the formula, $W_c$ and $b_c$ are parameters of the fully connected layer, and $y$ is a sentence structure category label.
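The final classification step can be sketched as a linear map followed by Softmax and an argmax. The label names and parameter values below are placeholders, and the dropout layer from step 7 is omitted because it is inactive at inference time.

```python
import math


def classify(sentence_vec, weight, bias, labels):
    """Return the most probable label and the full probability distribution."""
    logits = [
        sum(w * x for w, x in zip(row, sentence_vec)) + b
        for row, b in zip(weight, bias)
    ]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["class_a", "class_b", "class_c"]  # placeholder category names
sentence_vec = [1.0, 0.0]
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0]
label, probs = classify(sentence_vec, weight, bias, labels)
```

Since Softmax is monotonic, taking the argmax of the logits would select the same label; the probabilities are needed mainly for the training loss and for confidence estimates.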