Semantic-based text classification method and device, computer device and storage medium
By combining knowledge graphs and capsule network models in a text classification method, the problem of low accuracy in existing text classification technologies is solved, and efficient recognition and feature extraction of multi-labeled text are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2024-01-12
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, text classification methods struggle to accurately identify the features of highly overlapping objects, resulting in low accuracy in multi-label text classification.
We employ a semantic-based text classification method. By acquiring a classified text dataset and a knowledge graph, we utilize a knowledge-enhanced language model and a capsule network model for feature extraction and classification computation. This includes processing of the knowledge layer, embedding layer, visibility layer, and encoding layer, as well as operations on the convolutional layer, capsule layer, and classification layer.
It improves the accuracy and efficiency of text classification, effectively identifies multi-labeled text, and enhances the ability to extract text features.

Figure CN117874234B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the fields of artificial intelligence and financial technology, and in particular to a semantic-based text classification method, apparatus, computer device and storage medium.
Background Technology
[0002] With the development of the big data era, massive amounts of text data have been generated and accumulated. This text data spans a wide variety of categories, and extracting useful information from such vast amounts of unstructured text has become an increasingly urgent need. In general domains, classifying this text data greatly benefits big data processing.
[0003] Text classification is a crucial component of text mining applications such as question classification, sentiment analysis, and topic classification. Current text classification methods often use general-domain models such as BERT and RoBERTa for text representation, and they neglect the influence of external knowledge on text semantics. In the text feature extraction stage, they typically use only a single fully connected layer, or a CNN (Convolutional Neural Network) or RNN (Recurrent Neural Network), for classification. These methods cannot accurately identify the features of highly overlapping objects, which weakens their feature understanding and makes them unsuitable for multi-label text. As a result, they cannot reach high classification accuracy.
Summary of the Invention
[0004] The purpose of this application is to propose a semantic-based text classification method, apparatus, computer device, and storage medium to solve the technical problem that existing text classification methods cannot accurately identify the features of highly overlapping objects, are not conducive to multi-label text classification, and are difficult to obtain high text classification accuracy.
[0005] To address the aforementioned technical problems, this application provides a semantic-based text classification method, employing the following technical solution:
[0006] Obtain a categorized text dataset and its corresponding knowledge graph. Divide the categorized text dataset into a training sample set and a test sample set. The categorized text dataset includes multiple categorized texts and a category label corresponding to each categorized text.
[0007] The training sample set and the knowledge graph are input into a pre-constructed knowledge-enhanced language model to obtain knowledge-enhanced text semantic feature vectors.
[0008] The text semantic feature vector is input into a pre-constructed capsule network model for classification calculation, and the classification prediction result is output.
[0009] The loss value between the predicted classification result and the classification label is calculated according to a preset loss function;
[0010] The model parameters of the knowledge-enhanced language model and the capsule network model are adjusted based on the loss value. Iterative training continues until convergence, which yields the final target model parameters. The model to be verified is then output based on the target model parameters.
[0011] The test sample set is input into the model to be verified to obtain the verification result. When the verification result meets the preset conditions, the model to be verified is determined to be a text semantic classification model.
[0012] Obtain the text to be classified, input the text to be classified into the text semantic classification model, and obtain the text classification result.
[0013] Furthermore, the knowledge-enhanced language model includes a knowledge layer, an embedding layer, a visibility layer, and an encoding layer; the step of inputting the training sample set and the knowledge graph into the pre-constructed knowledge-enhanced language model to obtain knowledge-enhanced text semantic feature vectors includes:
[0014] The knowledge in the knowledge graph is injected into the text sentences of the training sample set through the knowledge layer to form a sentence tree, and the sentence tree is then input into the embedding layer and the visibility layer respectively.
[0015] The sentence tree is embedded using the embedding layer to obtain a text position encoding vector;
[0016] The text visibility matrix of the sentence tree is constructed through the visibility layer;
[0017] The text position encoding vector and the text visibility matrix are input into the encoding layer for attention calculation, and the text semantic feature vector is output.
[0018] Furthermore, the step of injecting knowledge from the knowledge graph into the text sentences of the training sample set through the knowledge layer to form a sentence tree includes:
[0019] The knowledge query function of the knowledge layer is invoked to identify all entities corresponding to each text sentence in the training sample set, and to query the triples corresponding to each entity in the knowledge graph.
[0020] The knowledge injection function of the knowledge layer is called to embed the triples into the corresponding positions in the text sentence, thereby obtaining a sentence tree.
[0021] Furthermore, the step of performing position embedding on the sentence tree through the embedding layer to obtain the text position encoding vector includes:
[0022] The sentence tree is input into the embedding layer to perform segment embedding, soft position embedding and word embedding operations respectively, to obtain the corresponding sentence encoding vector, position encoding vector and word encoding vector;
[0023] The sentence encoding vector, the position encoding vector, and the word encoding vector are summed to obtain the text position encoding vector.
[0024] Furthermore, the step of inputting the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation and outputting the text semantic feature vector includes:
[0025] Determine the query vector parameter matrix, key vector parameter matrix, and value vector parameter matrix of the encoding layer;
[0026] Self-attention is calculated based on the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position encoding vector, and the text visibility matrix.
[0027] Multi-head attention calculation is performed based on the self-attention to obtain the text semantic feature vector.
[0028] Furthermore, the capsule network model includes convolutional layers, capsule layers, and classification layers; the step of inputting the text semantic feature vector into the pre-constructed capsule network model for classification calculation and outputting the classification prediction result includes:
[0029] The text semantic feature vector is input into the convolutional layer for convolutional feature extraction to obtain the convolutional feature vector;
[0030] Text aggregation is performed on the convolutional feature vectors through the capsule layer to obtain a global semantic vector containing contextual semantics;
[0031] The global semantic vector is input into the classification layer for classification prediction, and the classification prediction result is output.
[0032] Furthermore, the capsule layer includes a main capsule layer and a digital capsule layer; the step of performing text aggregation on the convolutional feature vectors through the capsule layer to obtain a global semantic vector containing contextual semantics includes:
[0033] The convolutional feature vector is input into the main capsule layer for a one-dimensional convolution operation to obtain a vector capsule.
[0034] The vector capsule is input into the digital capsule layer, and a dynamic routing algorithm is used to map the vector capsule to obtain a global semantic vector.
[0035] To address the aforementioned technical problems, this application also provides a semantic-based text classification device, which employs the following technical solution:
[0036] The acquisition module is used to acquire a categorized text dataset and a corresponding knowledge graph, and to divide the categorized text dataset into a training sample set and a test sample set. The categorized text dataset includes multiple categorized texts and a category label corresponding to each categorized text.
[0037] The text enhancement module is used to input the training sample set and the knowledge graph into a pre-constructed knowledge-enhanced language model to obtain a knowledge-enhanced text semantic feature vector.
[0038] The classification prediction module is used to input the text semantic feature vector into a pre-constructed capsule network model for classification calculation and output the classification prediction result;
[0039] The loss calculation module is used to calculate the loss value between the predicted classification result and the classification label according to a preset loss function;
[0040] The adjustment module is used to adjust the model parameters of the knowledge-enhanced language model and the capsule network model based on the loss value, continue iterative training until convergence, obtain the final target model parameters, and output the model to be verified based on the target model parameters.
[0041] The verification module is used to input the test sample set into the model to be verified, obtain the verification result, and determine the model to be verified as a text semantic classification model when the verification result meets the preset conditions.
[0042] The classification module is used to obtain the text to be classified, input the text to be classified into the text semantic classification model, and obtain the text classification result.
[0043] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:
[0044] The computer device includes a memory and a processor, the memory storing computer-readable instructions, and the processor executing the computer-readable instructions to implement the steps of the semantic-based text classification method as described above.
[0045] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:
[0046] The computer-readable storage medium stores computer-readable instructions, which, when executed by a processor, implement the steps of the semantic-based text classification method as described above.
[0047] Compared with the prior art, the embodiments of this application have the following main advantages:
[0048] This application introduces knowledge graphs into a knowledge-enhanced language model. The knowledge graph is combined with the training sample set, and features are extracted with the knowledge-enhanced language model. This produces text semantic feature vectors that carry rich knowledge information and strengthens the feature expression of the text. These knowledge-enhanced vectors are then fed into a capsule network model for classification. The capsule network further captures the semantic relationships between words, improves the extraction of important text features, and effectively identifies multi-label text. Together, these steps improve the efficiency and accuracy of text classification.
Attached Figure Description
[0049] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0050] Figure 1 is an exemplary system architecture diagram to which this application can be applied;
[0051] Figure 2 is a flowchart of an embodiment of the semantic-based text classification method according to this application;
[0052] Figure 3 is a flowchart of a specific implementation of step S202 in Figure 2;
[0053] Figure 4 is a schematic structural diagram of an embodiment of the semantic-based text classification device according to this application;
[0054] Figure 5 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed Implementation
[0055] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0056] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0057] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0058] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0059] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0060] The semantic-based text classification method provided by this application can be applied, for example, to the system architecture 100 shown in Figure 1. The system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 provides the communication links between the terminal devices 101, 102, and 103 and the server 105. It may include various connection types, such as wired or wireless communication links or fiber optic cables.
[0061] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0062] Terminal devices 101, 102, and 103 can be various electronic devices that have displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers.
[0063] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.
[0064] It should be noted that the semantic-based text classification method provided in the embodiments of this application is generally executed by a server or terminal device. Correspondingly, the semantic-based text classification device is generally deployed in the server or terminal device.
[0065] It should be understood that the numbers of terminal devices, networks, and servers in Figure 1 are merely illustrative. Any number of terminal devices, networks, and servers may be included according to implementation needs.
[0066] Continuing to refer to Figure 2, which shows a flowchart of an embodiment of the semantic-based text classification method according to this application, the method includes the following steps:
[0067] Step S201: Obtain the categorized text dataset and the corresponding knowledge graph. Divide the categorized text dataset into a training sample set and a test sample set. The categorized text dataset includes multiple categorized texts and the categorization label corresponding to each categorized text.
[0068] Categorized text datasets can be obtained based on business scenarios. These scenarios can include insurance business scenarios such as insurance topic classification, customer request classification, and insurance scenario classification, as well as sentiment analysis of comment texts from e-commerce platforms and social media platforms. For example, if it's insurance topic classification, then obtain the insurance topic text dataset; if it's customer request classification, then obtain the customer request text dataset; if it's insurance scenario classification, then obtain the insurance scenario text dataset; and if it's sentiment analysis, then obtain the relevant comment text data containing positive and negative sentiments from relevant e-commerce platforms or social media platforms.
[0069] A knowledge graph is a knowledge base where data is integrated through a graph-structured data model or topology. It typically represents a semantic relationship graph of entities and relations, stored as triples of <head entity, relation, tail entity>. The head and tail entities represent specific things that exist in the real world, and the relation expresses a semantic connection between entities. For example, in the triple <China, capital, Beijing>, China is the head entity, capital is the relation term, and Beijing is the tail entity.
[0070] Different business scenarios correspond to different knowledge graphs. In this embodiment, the corresponding knowledge graph can be selected according to the business scenario in which the classified text dataset is located.
[0071] In some embodiments, after obtaining the categorized text dataset, the dataset is preprocessed, including deduplication, handling missing values, handling outliers, and correcting erroneous values. The preprocessed categorized text dataset is then randomly divided into a training sample set and a test sample set according to a preset ratio, for example, a training sample set: test sample set ratio of 8:2.
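The preprocessing and 8:2 split in [0071] can be sketched as follows. This is a minimal illustration that assumes each sample is a (text, label) pair. The helper names `preprocess` and `train_test_split` are hypothetical. Only missing-value removal and deduplication are shown, because outlier handling and error correction depend on the business data.

```python
import random
from typing import List, Optional, Tuple

Sample = Tuple[Optional[str], Optional[int]]


def preprocess(samples: List[Sample]) -> List[Tuple[str, int]]:
    """Drop samples with a missing text or label, then drop exact duplicates.

    Outlier handling and error correction are business-specific and omitted here.
    """
    seen = set()
    cleaned = []
    for text, label in samples:
        if text is None or not text.strip() or label is None:  # missing value
            continue
        key = (text.strip(), label)
        if key in seen:  # duplicate
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


def train_test_split(samples, train_ratio: float = 0.8, seed: int = 42):
    """Shuffle with a fixed seed and cut at train_ratio (8:2 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


data = [(f"sample text {i}", i % 3) for i in range(10)]
data += [data[0], ("", 1), ("orphan text", None)]  # one duplicate and two missing values
clean = preprocess(data)
train_set, test_set = train_test_split(clean)
print(len(clean), len(train_set), len(test_set))  # 10 8 2
```

A fixed seed keeps the split reproducible across training runs. A stratified split would be a reasonable alternative when the category labels are imbalanced.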
[0072] It should be understood that a categorized text dataset includes multiple categorized texts and their corresponding categorization labels, where the categorization labels represent the true categories of the categorized texts.
[0073] It should be emphasized that, to further ensure the privacy and security of the categorized text dataset, the aforementioned categorized text dataset can also be stored in a blockchain node.
[0074] The blockchain referred to in this application is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
[0075] Step S202: Input the training sample set and knowledge graph into the pre-built knowledge-enhanced language model to obtain the knowledge-enhanced text semantic feature vector.
[0076] Here, the knowledge-enhanced language model is the K-BERT model, which adds knowledge graph representations on top of the original BERT model. After K-BERT processes the input text, the text carries domain knowledge that was absent from the original, and the output is a feature vector rich in domain knowledge.
[0077] In this embodiment, the pre-built knowledge-enhanced language model includes a knowledge layer, an embedding layer, a visibility layer, and an encoding layer.
[0078] In some optional implementations of this embodiment, the steps of inputting the training sample set and knowledge graph into a pre-built knowledge-enhanced language model to obtain knowledge-enhanced text semantic feature vectors include:
[0079] Step S301: Inject knowledge from the knowledge graph into the text sentences of the training sample set through the knowledge layer to form a sentence tree, and input the sentence tree into the embedding layer and the visibility layer respectively.
[0080] The knowledge layer is responsible for injecting knowledge from the knowledge graph into the text sentences to form a sentence tree. For each input sentence s = {w_1, w_2, w_3, ..., w_n}, named entities are first extracted from the sentence. The relations and values (tail entities) of these entities are then queried in the knowledge graph, forming triples <head entity, relation, value>. Each triple is inserted at the position of its head entity in the sentence, which produces a tree-like structure called a sentence tree. The sentence tree supplements the sentence with contextual knowledge. This addresses the problem of word vectors drifting from the core semantics when a single sentence lacks contextual knowledge.
[0081] In this embodiment, forming a sentence tree involves two steps: knowledge query (K-Query) and knowledge injection (K-Inject).
[0082] Furthermore, the knowledge query function of the knowledge layer is called to identify all entities corresponding to each text sentence in the training sample set, and to query the triples corresponding to each entity in the knowledge graph; the knowledge injection function of the knowledge layer is called to embed the triples into the corresponding positions in the text sentences to obtain the sentence tree.
[0083] K-Query is responsible for retrieving the relations and values (i.e., triples) corresponding to each entity in each text sentence from the knowledge graph. The specific query process is as follows:
[0084] E = K_Query(s, K);
[0085] Here, the function K_Query represents querying the knowledge graph K with the text sentence s to obtain the set of triples E = {(w_i, r_i0, w_i0), ..., (w_i, r_ik, w_ik)}.
[0086] K-Inject is responsible for embedding the set of triples E into the corresponding positions of the text sentence s to form a sentence tree, in which each triple constitutes a branch. The sentence tree output by the knowledge layer is therefore: t = {w_1, w_2, w_3, ..., w_i{(r_i0, w_i0), ..., (r_ik, w_ik)}, ..., w_n}
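The following is a minimal sketch of K-Query and K-Inject. It assumes the knowledge graph is a dictionary that maps a head entity to its (relation, tail) pairs. It also assumes entity recognition reduces to an exact token lookup, whereas a real system would use a named-entity recognizer. The sentence tree is returned in flattened form together with soft positions. Each branch continues counting from its entity, and the trunk resumes from the entity's next position. The function names mirror K_Query and K-Inject above, but their signatures are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge-graph layout: head entity -> [(relation, tail entity), ...]
KnowledgeGraph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]


def k_query(sentence: List[str], kg: KnowledgeGraph) -> List[Triple]:
    """K-Query: return the (entity, relation, tail) triples of every entity in the sentence."""
    # dict.fromkeys removes repeated words while keeping their order.
    return [(w, r, t) for w in dict.fromkeys(sentence) for (r, t) in kg.get(w, [])]


def k_inject(sentence: List[str], triples: List[Triple]):
    """K-Inject: hang each triple off its entity as a branch and flatten the tree.

    Returns four parallel lists:
      tokens   - flattened sentence-tree tokens
      soft_pos - soft position of each token
      branch   - -1 for tokens of the original sentence, else the branch id
      anchor   - flattened index of the entity a branch token is attached to (-1 for trunk)
    """
    by_head: Dict[str, List[Tuple[str, str]]] = {}
    for head, rel, tail in triples:
        by_head.setdefault(head, []).append((rel, tail))

    tokens, soft_pos, branch, anchor = [], [], [], []

    def add(tok: str, pos: int, br: int, anc: int) -> None:
        tokens.append(tok)
        soft_pos.append(pos)
        branch.append(br)
        anchor.append(anc)

    n_branches = 0
    for pos, word in enumerate(sentence):
        entity_idx = len(tokens)
        add(word, pos, -1, -1)
        for rel, tail in by_head.get(word, []):
            # Branch tokens keep counting from the entity's soft position;
            # the trunk resumes at pos + 1 on the next loop iteration.
            add(rel, pos + 1, n_branches, entity_idx)
            add(tail, pos + 2, n_branches, entity_idx)
            n_branches += 1
    return tokens, soft_pos, branch, anchor


kg = {"Cook": [("CEO", "Apple")], "Beijing": [("capital", "China"), ("is_a", "City")]}
sentence = ["Tim", "Cook", "is", "visiting", "Beijing", "now"]
tokens, soft_pos, branch, anchor = k_inject(sentence, k_query(sentence, kg))
print(list(zip(tokens, soft_pos)))
```

In this example, "is" on the trunk and "CEO" on a branch share soft position 2. Keeping such tokens from interfering with each other is the job of the visibility layer.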
[0087] Step S302: The sentence tree is embedded through an embedding layer to obtain the text position encoding vector.
[0088] Sentence trees cannot be directly used as sequence input; they are converted into text sequences through an embedding layer.
[0089] The embedding layer includes a token embedding layer, a soft-position embedding layer, and a segment embedding layer. The token embedding layer maps each token in the sentence to a vector representation of dimension H. Each sentence begins with a special token, [CLS], primarily for sentence classification. The soft-position embedding layer uses soft positions to encode the embedded values and relationships, distinguishing them from the positional encoding of entities. The segment embedding layer distinguishes between two sentence segments.
[0090] Specifically, the sentence tree is input into the embedding layer and segment embedding, soft position embedding, and word embedding are performed respectively to obtain the corresponding sentence encoding vector, position encoding vector, and word encoding vector; the sentence encoding vector, position encoding vector, and word encoding vector are summed to obtain the text position encoding vector.
[0091] The token embedding layer embeds the sentence tree to obtain the word encoding vector. The soft-position embedding layer embeds each token of the sentence tree to obtain the position encoding vector, which retains the main structural information of the sentence. The segment embedding layer embeds each token of the sentence tree to obtain the sentence encoding vector. The sentence encoding vector, position encoding vector, and word encoding vector are then summed to obtain the text position encoding vector. This vector retains the structural information of the tree, which strengthens the feature representation and better captures the semantic features of the text.
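The summation in [0090] and [0091] can be illustrated with toy lookup tables. This sketch assumes a hidden size of 4 and randomly initialised tables, whereas a trained model would learn them. `make_table` and `embed` are hypothetical helpers. Because soft positions repeat, tokens on different branches that share a soft position receive the same position encoding.

```python
import random
from typing import Dict, Hashable, Iterable, List

H = 4  # toy hidden size (an assumption); BERT-base uses 768


def make_table(keys: Iterable[Hashable], dim: int, seed: int) -> Dict[Hashable, List[float]]:
    """Randomly initialised embedding table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def embed(tokens, soft_pos, segments, tok_table, pos_table, seg_table) -> List[List[float]]:
    """Text position encoding = word + soft-position + segment encoding, per token."""
    return [
        [w + p + s for w, p, s in zip(tok_table[t], pos_table[q], seg_table[g])]
        for t, q, g in zip(tokens, soft_pos, segments)
    ]


tokens = ["[CLS]", "Cook", "CEO", "Apple", "is"]
soft_pos = [0, 1, 2, 3, 2]
segments = [0] * len(tokens)  # a single-sentence input uses segment 0 throughout

tok_table = make_table(sorted(set(tokens)), H, seed=0)
pos_table = make_table(range(8), H, seed=1)
seg_table = make_table([0, 1], H, seed=2)
x = embed(tokens, soft_pos, segments, tok_table, pos_table, seg_table)
print(len(x), len(x[0]))  # 5 4
```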
[0092] Step S303: Construct the text visibility matrix of the sentence tree through the visibility layer.
[0093] Because the triples in the sentence tree may alter the meaning of the original text sentence, a text visibility matrix M is constructed to keep knowledge noise from affecting the sentence. This matrix restricts each token to seeing only the context and knowledge related to itself.
[0094] For two tokens w_i and w_j, M_ij = 0 indicates that w_i is visible to w_j in the text visibility matrix, and M_ij = -∞ indicates that w_i is invisible to w_j.
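The following is a minimal sketch of the visibility layer. It assumes a flattened sentence tree in which each token carries a branch id (-1 for the original sentence) and the index of the entity its branch is attached to. The rule encoded here follows the visibility idea described above, but it is stated as an assumption: trunk tokens see each other, and an injected token sees only its own branch and its entity. `visibility_matrix` is a hypothetical helper.

```python
import math
from typing import List


def visibility_matrix(branch: List[int], anchor: List[int]) -> List[List[float]]:
    """Return M with M[i][j] = 0.0 if tokens i and j can see each other, else -inf.

    branch[k] = -1 for tokens of the original sentence, otherwise the branch id;
    anchor[k] = index of the entity a branch token is attached to (-1 for trunk).
    Assumed rule: trunk tokens see each other; a branch token sees its own
    branch and its entity; an entity also sees its own branch tokens.
    """
    n = len(branch)
    m = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if branch[i] == -1 and branch[j] == -1:
                visible = True
            elif branch[i] != -1 and branch[j] != -1:
                visible = branch[i] == branch[j]
            elif branch[i] != -1:  # i on a branch, j on the trunk
                visible = anchor[i] == j
            else:  # i on the trunk, j on a branch
                visible = anchor[j] == i
            m[i][j] = 0.0 if visible else -math.inf
    return m


# Tokens: Cook, CEO, Apple, is ("CEO" and "Apple" form a branch attached to "Cook")
M = visibility_matrix(branch=[-1, 0, 0, -1], anchor=[-1, 0, 0, -1])
for row in M:
    print(row)
```

Under this rule the matrix is symmetric, so the direction of visibility in [0094] does not change the result.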
[0095] Step S304: Input the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation, and output the text semantic feature vector.
[0096] In this embodiment, the use of an encoding layer can limit the visible area of the self-attention mechanism, thereby capturing the deep bidirectional structure in the text sentence. The encoding layer is composed of multiple mask-self-attention layers stacked together, and a text visibility matrix M is added on top of the mask-self-attention to perform attention calculation on the text position encoding vector.
[0097] Furthermore, the steps described above, which involve inputting the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation and outputting the text semantic feature vector, include:
[0098] Determine the query vector parameter matrix, key vector parameter matrix, and value vector parameter matrix of the encoding layer;
[0099] Calculate self-attention based on the query vector parameter matrix, key vector parameter matrix, value vector parameter matrix, text position encoding vector, and text visibility matrix;
[0100] Multi-head attention computation is performed based on self-attention to obtain the text semantic feature vector.
[0101] Assuming the number of mask-self-attention layers is L and the number of heads is A, determine the query vector parameter matrix W_q, the key vector parameter matrix W_k, and the value vector parameter matrix W_v of each mask-self-attention layer. The self-attention of each mask-self-attention layer is calculated as follows:
[0102] $$H_i = \mathrm{softmax}\left(\frac{Q_i K_i^{\top} + M}{\sqrt{d_k}}\right) V_i$$
[0103] Here, Q_i = H_{i-1} W_q, K_i = H_{i-1} W_k, and V_i = H_{i-1} W_v. Q_i, K_i, and V_i denote the query, key, and value vectors of the i-th layer; H_i denotes the mask-self-attention output of the i-th layer; M is the text visibility matrix; and d_k denotes the dimension of the input text position encoding vector.
[0104] The formula for calculating multi-head attention is as follows:
[0105] $$\mathrm{MultiHead} = \mathrm{Concat}(head_1, head_2, \ldots, head_A)\, W_0$$
[0106] Here, Concat denotes the matrix concatenation function, and W_0 denotes the parameter matrix that compresses the concatenated outputs of the self-attention heads. In this embodiment, the output of multi-head attention is the text semantic feature vector.
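The attention computation in [0101] to [0106] can be sketched in plain Python. This is a minimal illustration, not the trained encoder. It assumes the per-head projection matrices are given as nested lists and adds the visibility matrix M to the scores before the softmax. It scales by the square root of each head's key dimension, which equals the input dimension d_k when a single head uses square projections. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # the diagonal is always visible, so m is finite
    exps = [math.exp(x - m) for x in row]  # exp(-inf) == 0.0, so masked keys get no weight
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(h_prev: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix, mask: Matrix) -> Matrix:
    """One head: softmax((Q K^T + M) / sqrt(d_k)) V with Q, K, V = H W_q, H W_k, H W_v."""
    q, k, v = matmul(h_prev, w_q), matmul(h_prev, w_k), matmul(h_prev, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([(s + m) / math.sqrt(d_k) for s, m in zip(row, mrow)])
               for row, mrow in zip(scores, mask)]
    return matmul(weights, v)


def multi_head(h_prev: Matrix, heads: Sequence[Tuple[Matrix, Matrix, Matrix]],
               w_0: Matrix, mask: Matrix) -> Matrix:
    """MultiHead = Concat(head_1, ..., head_A) W_0; `heads` holds (W_q, W_k, W_v) per head."""
    outs = [masked_self_attention(h_prev, *w, mask) for w in heads]
    concat = [sum((o[t] for o in outs), []) for t in range(len(h_prev))]
    return matmul(concat, w_0)


NEG = float("-inf")
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy text position encodings, dimension 2
mask = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [NEG, 0.0, 0.0]]  # tokens 0 and 2 cannot see each other
col0 = [[1.0], [0.0]]  # head 1 projects onto dimension 0
col1 = [[0.0], [1.0]]  # head 2 projects onto dimension 1
w_0 = [[1.0, 0.0], [0.0, 1.0]]
out = multi_head(h, [(col0, col0, col0), (col1, col1, col1)], w_0, mask)
print(out)
```

Because masked scores become -inf, their softmax weight is exactly zero. An injected branch therefore cannot leak into unrelated tokens, while the rest of the attention computation is unchanged.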
[0107] The text visibility matrix is derived from the sentence tree, and attention is computed under this matrix. Knowledge is therefore embedded without introducing noise, which ensures the accuracy of the output text semantic feature vector.
[0108] In this embodiment, a knowledge-enhanced language model is used to add domain knowledge to the classified text, enriching the text semantics and avoiding the problems of inconsistent encoding spaces of diverse word vectors and deviation of sentences from the core semantics; at the same time, due to the integration of knowledge graphs, it can be used for text classification in professional fields.
[0109] Step S203: Input the text semantic feature vector into the pre-constructed capsule network model for classification calculation and output the classification prediction result.
[0110] The pre-built capsule network model includes convolutional layers, capsule layers, and classification layers. The convolutional layers use N-gram convolutional layers to extract features from the text semantic feature vectors and encapsulate the extracted features into vectors of spatial information. The capsule layers are used to extract and encode high-level abstract semantic features in sentences. The classification layers are used to classify the semantic features extracted by the capsule layers.
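The N-gram convolutional layer can be illustrated as a one-dimensional convolution whose window spans N consecutive token vectors. This sketch assumes a ReLU activation, no padding, and filters given as nested lists. The document does not fix these details, and the names `ngram_conv`, `kernels`, and `bias` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ngram_conv(x: Matrix, kernels: List[Matrix], bias: List[float], n: int) -> Matrix:
    """Apply F filters over every window of n consecutive token vectors (no padding).

    x: T x H token vectors; kernels: F filters, each n x H; bias: F values.
    Returns a (T - n + 1) x F matrix of ReLU-activated features.
    """
    out = []
    for start in range(len(x) - n + 1):
        window = x[start:start + n]
        feats = []
        for w, b in zip(kernels, bias):
            s = b + sum(wv * xv for w_row, x_row in zip(w, window) for wv, xv in zip(w_row, x_row))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 tokens, H = 2
ones = [[1.0, 1.0], [1.0, 1.0]]                      # a bigram (n = 2) filter
neg = [[-1.0, -1.0], [-1.0, -1.0]]
print(ngram_conv(x, [ones, neg], [0.0, 0.0], n=2))  # [[2.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
```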
[0111] In some optional implementations of this embodiment, the steps of inputting the text semantic feature vector into a pre-built capsule network model for classification calculation and outputting the classification prediction result include:
[0112] The text semantic feature vector is input into the convolutional layer for convolutional feature extraction, resulting in a convolutional feature vector.
[0113] By performing text aggregation on the convolutional feature vectors through capsule layers, a global semantic vector containing contextual semantics is obtained;
[0114] The global semantic vector is input into the classification layer for classification prediction, and the classification prediction result is output.
[0115] The basic idea of the N-gram approach is to slide a window of N bytes over the text content, forming a sequence of N-byte segments, each called a gram. The frequency of every gram is counted, and grams whose frequency falls below a set threshold are filtered out. The remaining grams form a keyword gram list, which is the feature vector space of the text. In this embodiment, N-grams are used to extract features from the text semantic feature vector, yielding a convolutional feature vector that carries spatial information.
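A minimal sketch of the gram counting and threshold filtering described in [0115], assuming character-level grams over Python strings (the text speaks of bytes) and a hypothetical `min_freq` threshold:

```python
from collections import Counter

def ngram_keywords(texts, n=2, min_freq=2):
    """Slide an n-wide window over each text, count grams, keep those with count >= min_freq."""
    counts = Counter()
    for text in texts:
        # Every contiguous n-character segment is one gram.
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    # Grams below the frequency threshold are filtered out; the rest form the keyword gram list.
    return sorted(gram for gram, c in counts.items() if c >= min_freq)
```

For example, `ngram_keywords(["abcab", "abd"], n=2, min_freq=2)` keeps only `"ab"`, the one bigram that occurs at least twice.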
[0116] The capsule layer consists of a main capsule layer and a digital capsule layer. The main capsule layer converts scalar neurons into vector neurons (capsules), and a dynamic routing algorithm further encodes the convolutional feature vectors and transfers vectors between the main capsule layer and the digital capsule layer, which improves the model's recognition efficiency and lets the model converge quickly and smoothly. The digital capsule layer contains multiple capsules, and the length of each capsule's activity vector gives the predicted probability that the capsule belongs to its category.
[0117] In some alternative implementations, the steps described above for performing text aggregation on convolutional feature vectors through capsule layers to obtain a global semantic vector containing contextual semantics include:
[0118] The convolutional feature vector is input into the main capsule layer for one-dimensional convolution operation to obtain the capsule vector;
[0119] The capsule vectors are input into the digital capsule layer, and the vector capsules are mapped using a dynamic routing algorithm to obtain the global semantic vector.
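The main-capsule step of [0118] might look roughly like the following NumPy sketch. It assumes a single valid one-dimensional convolution (kernel shape `(k, C_in, C_out)`), reshapes the output channels into capsules of hypothetical size `caps_dim`, and applies the standard CapsNet squash so that each capsule length lies in (0, 1); none of these sizes come from the source.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Shrink vector length into (0, 1) while keeping its direction.
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def primary_capsules(features, kernel, caps_dim):
    """features: (L, C_in) convolutional feature vectors; kernel: (k, C_in, C_out).

    C_out must be divisible by caps_dim. Returns (num_capsules, caps_dim).
    """
    k = kernel.shape[0]
    L = features.shape[0]
    # Valid 1D convolution: each output position contracts a k-wide window with the kernel.
    out = np.stack([
        np.tensordot(features[t:t + k], kernel, axes=([0, 1], [0, 1]))
        for t in range(L - k + 1)
    ])  # (L - k + 1, C_out)
    # Group scalar channels into vector neurons (capsules), then squash each one.
    return squash(out.reshape(-1, caps_dim))
```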
[0120] During training of the capsule network model, the vectors in the main capsule layer and those in the digital capsule layer are fully connected. Let $u_i$ be the i-th capsule vector of the main capsule layer, $v_j$ the j-th output vector of the digital capsule layer to which it connects, $W_{ij}$ the transformation matrix between them, $c_{ij}$ the coupling coefficient, and $\hat{u}_{j|i}$ the prediction vector. The prediction vector is calculated as follows:
[0121] $$\hat{u}_{j|i} = W_{ij}\,u_i$$
[0122] Routing iterations are performed over the main capsule layer, and the coupling coefficients of the dynamic routing algorithm are calculated as:
[0123] $$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})}$$
[0124] where $b_{ij}$ is the unweighted initial coupling coefficient, and $c_{ij}$ is the coupling coefficient determined by the dynamic routing algorithm, that is, the coefficient obtained by applying softmax weighting to the initial coupling coefficients.
[0125] Based on the coupling coefficients $c_{ij}$, the weighted sum $s_j$ is calculated as:
[0126] $$s_j = \sum_{i} c_{ij}\,\hat{u}_{j|i}$$
[0127] where $s_j$ is the weighted sum of the prediction vectors from all capsules of the main capsule layer and is passed as the coupled result to the j-th capsule of the digital capsule layer.
[0128] The Squash function ensures that the length of the final output vector $v_j$ lies between 0 and 1, and is calculated as:
[0129] $$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\,\frac{s_j}{\|s_j\|}$$
[0130] The inner product of the prediction vector $\hat{u}_{j|i}$ and the output vector $v_j$ of the digital capsule layer is used to update $b_{ij}$, which in turn updates the coupling coefficient $c_{ij}$:
[0131] $$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j$$
[0132] Based on the updated coupling coefficients $c_{ij}$, the transformation matrices $W_{ij}$ are updated by the backpropagation algorithm. The final output vector $v_j$ is the global semantic vector, and the length of $v_j$ represents the probability that its corresponding category exists.
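The routing procedure of [0120] to [0132] can be condensed into the following NumPy sketch of standard dynamic routing by agreement. It is illustrative only: it covers the forward routing pass, assumes three routing iterations by default, and leaves the backpropagation update of the transformation matrices to a training framework. The array layouts and function names are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # v_j = (|s_j|^2 / (1 + |s_j|^2)) * s_j / |s_j|
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u, W, num_iters=3):
    """u: (I, d_in) main capsules; W: (I, J, d_out, d_in) transformation matrices W_ij.

    Returns v: (J, d_out) digital-capsule outputs (the global semantic vectors).
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    u_hat = np.einsum('ijab,ib->ija', W, u)        # prediction vectors u_hat_{j|i} = W_ij u_i
    b = np.zeros(u_hat.shape[:2])                  # initial (unweighted) coefficients b_ij
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # c_ij: softmax of b_ij over digital capsules j
        s = np.einsum('ij,ija->ja', c, u_hat)      # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                              # v_j with length in (0, 1)
        b = b + np.einsum('ija,ja->ij', u_hat, v)  # agreement update: b_ij += u_hat_{j|i} . v_j
    return v
```

The lengths `np.linalg.norm(v, axis=1)` can then be read as the per-category probabilities.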
[0133] The loss function formula for the capsule network model is as follows:
[0134] $$L_k = T_k\max(0,\, m^{+} - \|v_k\|)^2 + \lambda(1-T_k)\max(0,\, \|v_k\| - m^{-})^2 + \|W\|$$
[0135] Where k indexes the categories; $T_k$ indicates whether class k exists (1 if it does, 0 otherwise); $m^{+}$ is the upper bound, set to 0.9; $m^{-}$ is the lower bound, set to 0.1; λ is a proportionality constant, which can be set to 0.5; $\|v_k\|$ represents the probability that the capsule unit belongs to class k; and $\|W\|$ represents the regularization loss of the weight parameters.
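The margin loss of [0134] and [0135] translates directly into a per-class NumPy function. In this sketch the ‖W‖ term is passed in as a precomputed scalar (`weight_norm`, a hypothetical parameter), since the source does not say which norm is used.

```python
import numpy as np

def capsule_margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5, weight_norm=0.0):
    """Per-class L_k. lengths: ||v_k|| per class; targets: T_k in {0, 1} per class."""
    lengths = np.asarray(lengths, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Penalize present classes whose capsule is shorter than m+.
    present = targets * np.maximum(0.0, m_pos - lengths) ** 2
    # Penalize absent classes whose capsule is longer than m-, down-weighted by lambda.
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_neg) ** 2
    # weight_norm stands in for the ||W|| regularization term, supplied by the caller.
    return present + absent + weight_norm
```

With capsule lengths `[0.5, 0.5]` and targets `[1, 0]`, the per-class losses are 0.16 and 0.08.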
[0136] This embodiment effectively reduces redundant information and improves the training efficiency of the model by using a dynamic routing algorithm to transfer information between the main capsule layer and the digital capsule layer.
[0137] The global semantic vector is input into the classification layer to calculate the probability of the classified text in each category, thus obtaining the classification prediction result.
[0138] This application combines multi-layer capsules with a dynamic routing mechanism to capture and effectively encode high-level features of various aspects of a sentence, thereby improving classification accuracy.
[0139] Step S204: Calculate the loss value between the predicted classification result and the classification label according to the preset loss function.
[0140] The preset loss function is calculated as follows:
[0141] $$\mathrm{Loss} = -\left[y\log y' + (1-y)\log(1-y')\right]$$
[0142] Where y represents the true classification label; y' represents the predicted classification result.
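A sketch of the preset loss from [0141]. It assumes the loss is applied per label and averaged over labels for multi-label targets (the averaging is an assumption), and it clips predictions to avoid log(0):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy: -[y log y' + (1 - y) log(1 - y')], averaged over labels."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log never receives exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_label = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(per_label.mean())
```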
[0143] Step S205: Adjust the model parameters of the knowledge-enhanced language model and capsule network model based on the loss value, continue iterative training until convergence, obtain the final target model parameters, and output the model to be verified based on the target model parameters.
[0144] The model parameters of the knowledge-enhanced language model and the capsule network model are adjusted according to the loss value. Convergence is reached when either the loss value no longer changes significantly or the number of iterations reaches a preset number.
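The two convergence conditions of [0144] can be checked with a helper such as the hypothetical `has_converged` below. The tolerance `tol`, which decides what counts as "not changing significantly", is an assumed value, not one given in the source.

```python
def has_converged(loss_history, tol=1e-4, max_iters=1000):
    """Return True when the iteration budget is used up or the last loss change is below tol."""
    # Condition 2: the number of iterations has reached the preset number.
    if len(loss_history) >= max_iters:
        return True
    # Need at least two losses to measure a change.
    if len(loss_history) < 2:
        return False
    # Condition 1: the loss value no longer changes significantly.
    return abs(loss_history[-1] - loss_history[-2]) < tol
```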
[0145] Upon convergence, the model parameters of the knowledge-enhanced language model and the capsule network model are used as the target model parameters. The model to be verified is obtained based on the target model parameters. That is, the model to be verified consists of the knowledge-enhanced language model and the capsule network model.
[0146] Since the capsule network model has already been optimized with its own loss function $L_k$ during its training, in some optional embodiments only the model parameters of the knowledge-enhanced language model are adjusted based on the loss value.
[0147] Step S206: Input the test sample set into the model to be verified, obtain the verification result, and determine the model to be verified as a text semantic classification model when the verification result meets the preset conditions.
[0148] The test sample set is input into the model to be verified to obtain classification verification results. The prediction accuracy of the model is calculated from these results and used as the verification result.
[0149] The formula for calculating prediction accuracy is as follows:
[0150] $$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}(y_i' = y_i)$$
[0151] Where N is the number of samples in the test sample set, $y_i'$ is the classification verification result of the i-th sample, and $y_i$ is its actual category label; $\mathbb{1}(y_i' = y_i)$ is an indicator that counts the sample as 1 when the predicted result equals the actual label, and 0 otherwise.
[0152] If the prediction accuracy is greater than or equal to the preset threshold, the model to be verified is output as the final text semantic classification model. If the prediction accuracy is below the preset threshold, the model is not accurate enough, and the number of samples must be increased or the model parameters modified before retraining to improve the prediction accuracy.
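The accuracy formula of [0150] and the threshold test of [0152] fit in a few lines of Python; the function names and the default threshold of 0.9 are illustrative assumptions only.

```python
def prediction_accuracy(predicted, actual):
    """(1/N) * sum of 1(y_i' == y_i) over the test sample set."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)

def accept_model(predicted, actual, threshold=0.9):
    """Accept the model to be verified when accuracy reaches the preset threshold."""
    return prediction_accuracy(predicted, actual) >= threshold
```

A model that rejects here would, per [0152], be retrained with more samples or modified parameters.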
[0153] Step S207: Obtain the text to be classified, input the text to be classified into the text semantic classification model, and obtain the text classification result.
[0154] The trained text semantic classification model can be applied to relevant business scenarios for text classification. By acquiring the text to be classified and using the text semantic classification model, classification efficiency and accuracy are improved, bringing new technological advancements to the business.
[0155] This application introduces a knowledge graph into the knowledge-enhanced language model. By combining the knowledge graph with the training sample set and extracting features through the knowledge-enhanced language model, text semantic feature vectors containing rich knowledge information are obtained, thereby enhancing the feature expression of the text. Inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation further captures the semantic relationships between words, improves the ability to extract important text features, effectively identifies multi-label text, and thus improves the efficiency and accuracy of text classification.
[0156] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0157] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware through computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When the program is executed, it can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).
[0158] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0159] With further reference to Figure 4, as an implementation of the method shown in Figure 2, this application provides an embodiment of a semantic-based text classification device. The device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0160] As shown in Figure 4, the semantic-based text classification device 400 described in this embodiment includes: an acquisition module 401, a text enhancement module 402, a classification prediction module 403, a loss calculation module 404, an adjustment module 405, a verification module 406, and a classification module 407, wherein:
[0161] The acquisition module 401 is used to acquire a classified text dataset and a corresponding knowledge graph, and to divide the classified text dataset into a training sample set and a test sample set, wherein the classified text dataset includes multiple classified texts and a classification label corresponding to each classified text;
[0162] The text enhancement module 402 is used to input the training sample set and the knowledge graph into a pre-constructed knowledge-enhanced language model to obtain a knowledge-enhanced text semantic feature vector;
[0163] The classification prediction module 403 is used to input the text semantic feature vector into a pre-constructed capsule network model for classification calculation and output the classification prediction result;
[0164] The loss calculation module 404 is used to calculate the loss value between the predicted classification result and the classification label according to a preset loss function;
[0165] The adjustment module 405 is used to adjust the model parameters of the knowledge-enhanced language model and the capsule network model based on the loss value, continue iterative training until convergence, obtain the final target model parameters, and output the model to be verified according to the target model parameters;
[0166] The verification module 406 is used to input the test sample set into the model to be verified, obtain the verification result, and determine the model to be verified as a text semantic classification model when the verification result meets the preset conditions.
[0167] The classification module 407 is used to obtain the text to be classified, input the text to be classified into the text semantic classification model, and obtain the text classification result.
[0168] It should be emphasized that, to further ensure the privacy and security of the categorized text dataset, the aforementioned categorized text dataset can also be stored in a blockchain node.
[0169] Based on the aforementioned semantic-based text classification device 400, by introducing a knowledge graph into the knowledge-enhanced language model and extracting features from the knowledge graph in combination with the training sample set, it is possible to obtain text semantic feature vectors containing rich knowledge information, thereby enhancing the feature expression of the text. Inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation can further obtain the semantic information relationship between words, improve the ability to extract important text features, effectively identify multi-label text, and thus improve the efficiency and accuracy of text classification.
[0170] In some optional implementations, the knowledge-enhanced language model includes a knowledge layer, an embedding layer, a visibility layer, and an encoding layer, and the text enhancement module 402 includes:
[0171] The knowledge injection submodule is used to inject knowledge from the knowledge graph into the text sentences of the training sample set through the knowledge layer to form a sentence tree, and input the sentence tree into the embedding layer and the visibility layer respectively;
[0172] An embedding submodule is used to perform position embedding on the sentence tree through the embedding layer to obtain a text position encoding vector;
[0173] A matrix construction submodule is used to construct the text visibility matrix of the sentence tree through the visibility layer;
[0174] The encoding submodule is used to input the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation and output the text semantic feature vector.
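One way the visibility layer could build the text visibility matrix is sketched below, assuming K-BERT-style visibility rules: tokens of the original sentence see each other, tokens of an injected triple see each other and the entity they attach to, and nothing else is visible. The `branch` and `anchor` encodings are hypothetical.

```python
import numpy as np

def visibility_matrix(branch, anchor):
    """branch[i]: id of the injected triple token i belongs to (-1 for original-sentence tokens).
    anchor[i]: index of the entity a knowledge token hangs from (-1 for sentence tokens).
    Returns an (n, n) 0/1 matrix, 1 meaning token i may attend to token j.
    """
    n = len(branch)
    vis = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if branch[i] == -1 and branch[j] == -1:
                vis[i, j] = 1  # sentence tokens see each other
            elif branch[i] != -1 and branch[i] == branch[j]:
                vis[i, j] = 1  # tokens of the same injected triple see each other
            elif branch[i] != -1 and anchor[i] == j:
                vis[i, j] = 1  # a knowledge token sees the entity it is attached to
            elif branch[j] != -1 and anchor[j] == i:
                vis[i, j] = 1  # the entity sees its own knowledge tokens
    return vis
```

For the flattened tree `Paris capital_of France is nice`, the tokens "is" and "nice" cannot see the injected "capital_of France", so that knowledge cannot disturb their representations.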
[0175] Using the knowledge-enhanced language model to add domain knowledge to the classified texts enriches their semantics and avoids both inconsistent encoding spaces across diverse word vectors and sentences deviating from their core semantics; moreover, because knowledge graphs are integrated, the model can be used for text classification in specialized domains.
[0176] In this embodiment, the knowledge injection submodule includes:
[0177] The knowledge query unit is used to call the knowledge query function of the knowledge layer to identify all entities in each text sentence of the training sample set, and to query the triples corresponding to each entity in the knowledge graph.
[0178] The knowledge injection unit is used to call the knowledge injection function of the knowledge layer to embed the triples into the corresponding positions in the text sentence to obtain a sentence tree.
[0179] By using sentence trees, the background information of sentences can be completed, which solves the problem of word vectors deviating from the core semantics due to the lack of knowledge background in a single sentence.
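The knowledge query and injection units of [0177] and [0178] can be sketched as follows. This is an assumption-laden illustration: exact token lookup in a toy dictionary stands in for entity recognition, `max_triples` is a hypothetical cap, and soft positions follow the K-BERT convention in which each branch continues counting from its entity while the sentence resumes right after the entity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: entity -> list of (relation, tail entity) pairs.
KnowledgeGraph = Dict[str, List[Tuple[str, str]]]

def build_sentence_tree(tokens: List[str], kg: KnowledgeGraph, max_triples: int = 2):
    """Return (token, soft_position, branch_id, anchor_index) tuples for the flattened tree.

    branch_id and anchor_index are -1 for tokens of the original sentence (the trunk).
    """
    tree = []
    soft_pos = 0
    branch_id = 0
    for tok in tokens:
        anchor_index = len(tree)
        tree.append((tok, soft_pos, -1, -1))
        # Knowledge query: exact token match stands in for entity recognition (assumption).
        for relation, tail in kg.get(tok, [])[:max_triples]:
            # Knowledge injection: the branch continues counting from the entity's position.
            for offset, word in enumerate((relation, tail), start=1):
                tree.append((word, soft_pos + offset, branch_id, anchor_index))
            branch_id += 1
        soft_pos += 1
    return tree
```

For `["Paris", "is", "nice"]` with `{"Paris": [("capital_of", "France")]}`, the flattened tokens are `Paris capital_of France is nice` with soft positions 0, 1, 2, 1, 2.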
[0180] In some optional implementations of this embodiment, the embedding submodule includes:
[0181] The embedding unit is used to input the sentence tree into the embedding layer to perform segment embedding, soft position embedding and word embedding operations respectively, to obtain the corresponding sentence encoding vector, position encoding vector and word encoding vector;
[0182] The summation unit is used to sum the sentence encoding vector, the position encoding vector, and the word encoding vector to obtain the text position encoding vector.
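Given soft positions from the sentence tree, the summation unit of [0182] reduces to adding three embedding lookups. In the NumPy sketch below, the embedding tables are stand-ins for learned parameters and the argument names are hypothetical.

```python
import numpy as np

def embed_sentence_tree(word_ids, soft_positions, segment_ids, word_table, pos_table, seg_table):
    """Text position encoding vector = word + soft-position + segment embedding, per token."""
    word_ids = np.asarray(word_ids)
    soft_positions = np.asarray(soft_positions)
    segment_ids = np.asarray(segment_ids)
    # Row-wise lookups followed by element-wise summation.
    return word_table[word_ids] + pos_table[soft_positions] + seg_table[segment_ids]
```

Tokens that share a soft position (for example an injected relation and the next sentence token) receive the same position embedding, which is how the tree structure survives flattening.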
[0183] The text position encoding vector obtained through embedding operations retains the structural information of the tree structure, which can enhance feature representation and thus better obtain the semantic features of the text.
[0184] In this embodiment, the encoding submodule includes:
[0185] A determining unit is used to determine the query vector parameter matrix, key vector parameter matrix, and value vector parameter matrix of the encoding layer;
[0186] The self-attention calculation unit is used to calculate self-attention based on the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position encoding vector, and the text visibility matrix;
[0187] The multi-head attention calculation unit is used to perform multi-head attention calculation based on the self-attention to obtain the text semantic feature vector.
[0188] The text visibility matrix represents the structure of the sentence tree, and attention is calculated under this matrix. Knowledge is therefore embedded without adding noise to the original sentence, which ensures the accuracy of the output text semantic feature vector.
[0189] In some optional implementations, the capsule network model includes convolutional layers, capsule layers, and classification layers, and the classification prediction module 403 includes:
[0190] The convolutional submodule is used to input the text semantic feature vector into the convolutional layer to extract convolutional features and obtain a convolutional feature vector.
[0191] The capsule submodule is used to perform text aggregation on the convolutional feature vector through the capsule layer to obtain a global semantic vector containing contextual semantics;
[0192] The prediction submodule is used to input the global semantic vector into the classification layer for classification prediction and output the classification prediction result.
[0193] By combining multi-layer capsules with a dynamic routing mechanism, high-level features of various aspects of a sentence can be captured and effectively encoded, thereby improving classification accuracy.
[0194] In this embodiment, the capsule layer includes a main capsule layer and a digital capsule layer, and the capsule sub-module includes:
[0195] A one-dimensional convolutional unit is used to input the convolutional feature vector into the main capsule layer to perform a one-dimensional convolution operation to obtain a vector capsule.
[0196] The dynamic routing unit is used to input the vector capsule into the digital capsule layer and map the vector capsule through a dynamic routing algorithm to obtain a global semantic vector.
[0197] By using a dynamic routing algorithm to transfer information between the main capsule layer and the digital capsule layer, redundant information can be effectively reduced, thus improving the training efficiency of the model.
[0198] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 5, which is a basic structural block diagram of the computer device in this embodiment.
[0199] The computer device 5 includes a memory 51, a processor 52, and a network interface 53 that are interconnected via a system bus. It should be noted that only the computer device 5 with components 51-53 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0200] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0201] The memory 51 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as the hard disk or memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 5. Of course, the memory 51 may also include both the internal storage unit of the computer device 5 and its external storage device. In this embodiment, the memory 51 is typically used to store the operating system and various application software installed on the computer device 5, such as the computer-readable instructions of the semantic-based text classification method. In addition, the memory 51 can also be used to temporarily store various types of data that have been output or will be output.
[0202] In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip. The processor 52 is typically used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is used to execute computer-readable instructions stored in the memory 51 or to process data, for example, to execute computer-readable instructions for the semantic-based text classification method.
[0203] The network interface 53 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 5 and other electronic devices.
[0204] In this embodiment, the processor executes the computer-readable instructions stored in the memory to implement the steps of the semantic-based text classification method described above. By combining the knowledge graph with the training sample set and extracting features through the knowledge-enhanced language model, text semantic feature vectors containing rich knowledge information are obtained, thereby strengthening the feature expression of the text. Inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation further captures the semantic relationships between words, improves the ability to extract important text features, effectively identifies multi-label text, and thus improves the efficiency and accuracy of text classification.
[0205] This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to perform the steps of the semantic-based text classification method described above. By combining the knowledge graph with the training sample set and extracting features through the knowledge-enhanced language model, text semantic feature vectors containing rich knowledge information are obtained, thereby strengthening the feature expression of the text. Inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation further captures the semantic relationships between words, improves the ability to extract important text features, effectively identifies multi-label text, and thus improves the efficiency and accuracy of text classification.
[0206] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0207] Obviously, the embodiments described above are only some, not all, of the embodiments of this application. The accompanying drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; the purpose of providing these embodiments is to make the understanding of the disclosure of this application more thorough and comprehensive. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent substitutions for some of their technical features. Any equivalent structure made using the content of this application's specification and drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.
Claims
1. A semantic-based text classification method, characterized by comprising the following steps: obtaining a categorized text dataset and its corresponding knowledge graph, and dividing the categorized text dataset into a training sample set and a test sample set, wherein the categorized text dataset includes multiple categorized texts and a category label corresponding to each categorized text; inputting the training sample set and the knowledge graph into a pre-constructed knowledge-enhanced language model to obtain knowledge-enhanced text semantic feature vectors; inputting the text semantic feature vectors into a pre-constructed capsule network model for classification calculation, and outputting a classification prediction result; calculating a loss value between the classification prediction result and the category label according to a preset loss function; adjusting the model parameters of the knowledge-enhanced language model and the capsule network model based on the loss value, continuing iterative training until convergence to obtain final target model parameters, and outputting a model to be verified based on the target model parameters; inputting the test sample set into the model to be verified to obtain a verification result, and determining the model to be verified as a text semantic classification model when the verification result meets preset conditions; and obtaining text to be classified, and inputting the text to be classified into the text semantic classification model to obtain a text classification result.
2. The semantic-based text classification method according to claim 1, characterized in that the knowledge-enhanced language model includes a knowledge layer, an embedding layer, a visibility layer, and an encoding layer; and the step of inputting the training sample set and the knowledge graph into the pre-constructed knowledge-enhanced language model to obtain knowledge-enhanced text semantic feature vectors includes: injecting the knowledge in the knowledge graph into the text sentences of the training sample set through the knowledge layer to form a sentence tree, and inputting the sentence tree into the embedding layer and the visibility layer respectively; performing position embedding on the sentence tree through the embedding layer to obtain a text position encoding vector; constructing a text visibility matrix of the sentence tree through the visibility layer; and inputting the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation, and outputting the text semantic feature vector.
3. The semantic-based text classification method according to claim 2, characterized in that the step of injecting knowledge from the knowledge graph into the text sentences of the training sample set through the knowledge layer to form a sentence tree includes: invoking the knowledge query function of the knowledge layer to identify all entities in each text sentence of the training sample set, and querying the triples corresponding to each entity in the knowledge graph; and invoking the knowledge injection function of the knowledge layer to embed the triples into the corresponding positions in the text sentence, thereby obtaining the sentence tree.
4. The semantic-based text classification method according to claim 2, characterized in that the step of performing position embedding on the sentence tree through the embedding layer to obtain the text position encoding vector includes: inputting the sentence tree into the embedding layer to perform segment embedding, soft position embedding, and word embedding operations respectively, to obtain the corresponding sentence encoding vector, position encoding vector, and word encoding vector; and summing the sentence encoding vector, the position encoding vector, and the word encoding vector to obtain the text position encoding vector.
5. The semantic-based text classification method according to claim 2, characterized in that the step of inputting the text position encoding vector and the text visibility matrix into the encoding layer for attention calculation and outputting the text semantic feature vector includes: determining the query vector parameter matrix, key vector parameter matrix, and value vector parameter matrix of the encoding layer; calculating self-attention based on the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position encoding vector, and the text visibility matrix; and performing multi-head attention calculation based on the self-attention to obtain the text semantic feature vector.
6. The semantic-based text classification method according to claim 1, characterized in that the capsule network model includes a convolutional layer, a capsule layer and a classification layer; and the step of inputting the text semantic feature vector into the pre-constructed capsule network model for classification calculation and outputting the classification prediction result includes: inputting the text semantic feature vector into the convolutional layer for convolutional feature extraction to obtain a convolutional feature vector; performing text aggregation on the convolutional feature vector through the capsule layer to obtain a global semantic vector containing contextual semantics; and inputting the global semantic vector into the classification layer for classification prediction and outputting the classification prediction result.
7. The semantic-based text classification method according to claim 6, characterized in that the capsule layer includes a primary capsule layer and a digit capsule layer; and the step of performing text aggregation on the convolutional feature vector through the capsule layer to obtain a global semantic vector containing contextual semantics includes: inputting the convolutional feature vector into the primary capsule layer for a one-dimensional convolution operation to obtain vector capsules; and inputting the vector capsules into the digit capsule layer and mapping them with a dynamic routing algorithm to obtain the global semantic vector.
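Claim 7's two capsule stages can be sketched with the squash non-linearity and routing-by-agreement from the original capsule network paper (Sabour et al., 2017, "Dynamic Routing Between Capsules"). Using that particular algorithm is an assumption, since the claim names a dynamic routing algorithm without specifying it. For brevity the primary capsule layer here uses a width-1 convolution (a per-position linear map) whose output is cut into `caps_dim`-sized capsules. The transformation matrices `W[i][j]`, the three routing iterations and all function names are illustrative.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def squash(s: Vector) -> Vector:
    """Shrink a vector's length into [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def primary_capsules(features: Matrix, weight: Matrix, caps_dim: int) -> List[Vector]:
    """Width-1 1-D convolution: map each position, split into capsules, squash each."""
    capsules: List[Vector] = []
    for row in features:
        out = [sum(w * x for w, x in zip(col, row)) for col in zip(*weight)]
        for i in range(0, len(out), caps_dim):
            capsules.append(squash(out[i:i + caps_dim]))
    return capsules


def dynamic_routing(u: List[Vector], W: List[List[Matrix]],
                    iterations: int = 3) -> List[Vector]:
    """Route input capsules u[i] to output capsules via W[i][j] (out_dim x in_dim)."""
    n_in, n_out = len(u), len(W[0])
    # prediction vectors u_hat[i][j] = W[i][j] @ u[i]
    u_hat = [[[sum(a * b for a, b in zip(row, u[i])) for row in W[i][j]]
              for j in range(n_out)] for i in range(n_in)]
    logits = [[0.0] * n_out for _ in range(n_in)]
    v: List[Vector] = []
    for _ in range(iterations):
        coupling = []
        for b in logits:                       # softmax over output capsules
            m = max(b)
            e = [math.exp(x - m) for x in b]
            total = sum(e)
            coupling.append([x / total for x in e])
        v = []
        for j in range(n_out):                 # weighted sum, then squash
            dim = len(u_hat[0][j])
            s = [sum(coupling[i][j] * u_hat[i][j][k] for i in range(n_in))
                 for k in range(dim)]
            v.append(squash(s))
        for i in range(n_in):                  # agreement update
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v
```

Each output vector has length below 1 because of `squash`, so its length can be read directly as the strength of the corresponding class, which is how a length-based classification layer would consume the resulting global semantic vector.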
8. A semantic-based text classification device, characterized in that it comprises: an acquisition module, configured to acquire a categorized text dataset and a corresponding knowledge graph, and to divide the categorized text dataset into a training sample set and a test sample set, the categorized text dataset including multiple categorized texts and a category label corresponding to each categorized text; a text enhancement module, configured to input the training sample set and the knowledge graph into a pre-constructed knowledge-enhanced language model to obtain a knowledge-enhanced text semantic feature vector; a classification prediction module, configured to input the text semantic feature vector into a pre-constructed capsule network model for classification calculation and to output a classification prediction result; a loss calculation module, configured to calculate a loss value between the classification prediction result and the category label according to a preset loss function; an adjustment module, configured to adjust the model parameters of the knowledge-enhanced language model and the capsule network model based on the loss value, to continue iterative training until convergence to obtain the final target model parameters, and to output a model to be verified based on the target model parameters; a verification module, configured to input the test sample set into the model to be verified to obtain a verification result, and to determine the model to be verified as a text semantic classification model when the verification result meets a preset condition; and a classification module, configured to obtain a text to be classified and input it into the text semantic classification model to obtain a text classification result.
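The loss calculation and verification modules of claim 8 leave both the loss function and the "preset condition" open. As an assumption, the sketch below uses the capsule-network margin loss over per-label capsule lengths, with m+ = 0.9, m- = 0.1 and λ = 0.5 (the values used by Sabour et al.), and treats verification as passing when micro-averaged F1 on the test sample set reaches a hypothetical threshold `min_f1`. Function names and thresholds are illustrative, not taken from the patent.

```python
from typing import List, Sequence, Set


def margin_loss(lengths: Sequence[float], targets: Sequence[int],
                m_pos: float = 0.9, m_neg: float = 0.1, lam: float = 0.5) -> float:
    """Capsule margin loss for one sample; targets[k] is 1 if label k applies."""
    loss = 0.0
    for length, t in zip(lengths, targets):
        if t:
            loss += max(0.0, m_pos - length) ** 2        # present label: capsule should be long
        else:
            loss += lam * max(0.0, length - m_neg) ** 2  # absent label: capsule should be short
    return loss


def micro_f1(predicted: List[Set[str]], gold: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (sample, label) decisions."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def passes_verification(predicted: List[Set[str]], gold: List[Set[str]],
                        min_f1: float = 0.8) -> bool:
    """Hypothetical 'preset condition': accept the model once micro-F1 reaches min_f1."""
    return micro_f1(predicted, gold) >= min_f1
```

For example, if the model predicts `{auto}` and `{health, property}` for two test texts whose labels are `{auto}` and `{health}`, there are two true positives and one false positive, giving a micro-F1 of 0.8.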
9. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the semantic-based text classification method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the semantic-based text classification method according to any one of claims 1 to 7.