A tag-based document classification method, system, device and storage medium

A document classification and labeling technology, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses the problems that existing methods classify poorly, struggle to keep pace with the explosive growth of text classification requirements in the Internet field, and use overly simple label dimensions. It achieves a more distinguishable document representation and better classification performance.

Active Publication Date: 2022-03-25
广州锋网信息科技有限公司

AI Technical Summary

Problems solved by technology

[0003] 1. Manual labeling works well for small amounts of data with low dimensionality, but labeling efficiency is low, the cost is high, and the label dimensions are too simple, so it cannot keep pace with the explosive growth of text classification requirements in the Internet field;
[0004] 2. Traditional machine learning methods, such as naive Bayes and SVM, give good results but rely on handcrafted rules or features. The workload is heavy, and information is easily omitted or duplicated, which degrades classification performance;
[0005] 3. Existing deep learning methods, such as fastText and TextCNN, can automatically learn useful features, but they usually extract features only from the content of the text and ignore the inherent writing structure of the document.

Embodiment Construction

[0046] Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the figures are exemplary, serve only to explain the present invention, and should not be construed as limiting it. The step numbers in the following embodiments are provided only for convenience of illustration and description and do not limit the order of the steps in any way. The execution order of the steps in the embodiments may be adjusted as appropriate by those skilled in the art.

[0047] First, the technical terms used in the technical solution of this application are explained:

[0048] A bidirectional gated recurrent unit (Bi-GRU) is a neural network model composed of a GRU that is unidirectiona...
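The definition above is cut off in this listing, so the following is only a minimal sketch of the generally known Bi-GRU idea, not the patent's own implementation: one GRU reads the sequence left to right, a second GRU reads it right to left, and their hidden states are concatenated at each position. All names (`make_params`, `gru_step`, `run_gru`, `bi_gru`), the dimensions, and the random initial weights are assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def make_params(in_dim, hid_dim, rng):
    """Random (untrained) weights for the update (z), reset (r) and candidate (n) gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), [0.0] * hid_dim)
            for g in ("z", "r", "n")}

def gru_step(params, x, h):
    """One GRU update: gates decide how much of the old state h to keep."""
    def pre(gate, h_in):
        W, U, b = params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h_in), b)]
    z = [sigmoid(v) for v in pre("z", h)]                # update gate
    r = [sigmoid(v) for v in pre("r", h)]                # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in pre("n", reset_h)]        # candidate state
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def run_gru(params, seq, hid_dim):
    h, outputs = [0.0] * hid_dim, []
    for x in seq:
        h = gru_step(params, x, h)
        outputs.append(h)
    return outputs

def bi_gru(fwd_params, bwd_params, seq, hid_dim):
    """Concatenate forward and backward hidden states at every position."""
    forward = run_gru(fwd_params, seq, hid_dim)
    backward = run_gru(bwd_params, seq[::-1], hid_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
in_dim, hid_dim = 3, 2
seq = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(4)]
out = bi_gru(make_params(in_dim, hid_dim, rng), make_params(in_dim, hid_dim, rng), seq, hid_dim)
print(len(out), len(out[0]))
```

Each position in the output therefore carries context from both the preceding and the following items, which is why the output width is twice the hidden size.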

Abstract

The present invention provides a tag-based document classification method, system, device and storage medium. The method includes the following steps: obtaining the text to be classified and the labels; performing word segmentation on the text to be classified to obtain word embedding vectors; performing word segmentation on the labels to obtain label embedding vectors; merging the word embedding vectors to determine a first embedding sequence; obtaining sentence embedding vectors according to the label embedding vectors and the word weight coefficients in the first embedding sequence; merging the sentence embedding vectors to determine a second embedding sequence; obtaining a text embedding vector according to the label embedding vectors and the sentence weight coefficients in the second embedding sequence; and determining the classification probability of the text to be classified according to the text embedding vector, and classifying according to that probability. The document representation vector used by the method is more distinguishable, which yields better classification performance, and the method can be widely applied in the field of natural language processing.
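The two-level pooling described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how the word and sentence weight coefficients are computed, so this sketch assumes each item is scored by its best dot-product match against any label embedding and the scores are normalized with a softmax. Word segmentation, embedding lookup, and any sequence encoder are omitted, and the vectors are random placeholders. The names `label_attention`, `classify`, `W`, and `b` are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def label_attention(seq, label_embs):
    """Pool a list of vectors into one vector using label-guided weights.

    Assumed weighting: score = best dot-product match with any label embedding,
    then softmax over the positions in the sequence.
    """
    scores = [max(dot(vec, lab) for lab in label_embs) for vec in seq]
    weights = softmax(scores)
    dim = len(seq[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, seq)) for i in range(dim)]
    return pooled, weights

def classify(doc, label_embs, W, b):
    # Word level: each sentence (a first embedding sequence) -> sentence embedding.
    sentence_vecs = [label_attention(sent, label_embs)[0] for sent in doc]
    # Sentence level: the second embedding sequence -> text embedding.
    text_vec, _ = label_attention(sentence_vecs, label_embs)
    # Linear layer plus softmax gives one probability per class.
    logits = [dot(row, text_vec) + bias for row, bias in zip(W, b)]
    return softmax(logits)

rng = random.Random(0)
dim, num_labels = 4, 3

def rand_vec():
    return [rng.uniform(-1, 1) for _ in range(dim)]

label_embs = [rand_vec() for _ in range(num_labels)]
doc = [[rand_vec() for _ in range(5)], [rand_vec() for _ in range(3)]]  # 2 sentences
W = [rand_vec() for _ in range(num_labels)]
b = [0.0] * num_labels

probs = classify(doc, label_embs, W, b)
predicted = max(range(num_labels), key=lambda k: probs[k])
print(probs, predicted)
```

In a trained system the embeddings and the classifier weights would be learned jointly; here they are random, so only the shapes and the flow of data are meaningful.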

Description

Technical Field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a tag-based document classification method, system and storage medium.

Background

[0002] With the development of computing technology and the spread of the Internet, network information resources are growing explosively. Faced with massive and disorganized network information, the problem for users is no longer how to obtain information, but how to find, efficiently and accurately, the information that meets their needs within large-scale information resources. This calls for intelligent and efficient classification of massive network information, accurate identification of user interests, and personalized information recommendation. The labeling and classification of information is therefore of great significance. Existing text classification methods fall into the following categories:

[0003] 1. The method of manual la...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289
CPC: G06F16/35, G06F40/289
Inventors: 尹龙, 姬旭光, 王苏洲
Owner: 广州锋网信息科技有限公司