Unsupervised text classification system and method

An unsupervised text classification technology, applied in semantic analysis, instruments, character and pattern recognition, and related fields. It addresses the growing difficulty and low efficiency of manual labeling, and thereby avoids high labor costs.

Pending Publication Date: 2020-03-17
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

Text classification assigns a specific label to a text. At present, a model is usually trained with supervised machine learning methods; classification based on such a model reaches a certain accuracy, but it also has drawbacks.
For example, supervised methods require a great deal of manpower for corpus labeling. With hundreds or thousands of classification labels, manual labeling becomes much more difficult and correspondingly less efficient.

Embodiment Construction

[0026] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without making creative efforts belong to the protection scope of the present invention.

[0027] Referring to Figure 1, the unsupervised text classif...

Abstract

The invention relates to an unsupervised text classification method and system. The method comprises the following steps: setting seed keywords for each classification label; expanding the seed keywords with semantically similar words, using pre-trained word vectors, to obtain expanded keywords; encoding the seed keywords and the expanded keywords into word vectors; converting the texts to be classified into text vectors; and classifying the text vectors based on the word vectors. Because no manual annotation is needed when the texts are classified, the method and system reduce annotation cost and improve text classification efficiency.
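The five steps in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the abstract does not say how the text vector is built or how the final comparison is made, so the sketch assumes that a text vector is the average of its word vectors and that a text gets the label whose averaged keyword vector has the highest cosine similarity. The tiny `WORD_VECTORS` table, the names `expand_keywords` and `classify`, and the `top_k` and `min_sim` parameters are all hypothetical stand-ins; a real system would load pre-trained word vectors instead.

```python
import math

# Hypothetical toy table standing in for pre-trained word vectors.
WORD_VECTORS = {
    "football": [0.90, 0.10, 0.00],
    "soccer":   [0.85, 0.15, 0.00],
    "goal":     [0.80, 0.20, 0.10],
    "stock":    [0.10, 0.90, 0.00],
    "shares":   [0.15, 0.85, 0.05],
    "market":   [0.20, 0.80, 0.10],
    "the":      [0.30, 0.30, 0.30],
}

# Step 1: one or more seed keywords per classification label.
LABEL_SEEDS = {"sports": ["football"], "finance": ["stock"]}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def expand_keywords(seeds, top_k=2, min_sim=0.95):
    """Step 2: add the nearest neighbours of each seed as expanded keywords."""
    expanded = []
    for seed in seeds:
        if seed not in WORD_VECTORS:
            continue
        scored = sorted(
            ((cosine(WORD_VECTORS[seed], vec), word)
             for word, vec in WORD_VECTORS.items() if word not in seeds),
            reverse=True,
        )
        for sim, word in scored[:top_k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


def classify(text, label_seeds):
    """Steps 3-5: encode keywords and text, then pick the most similar label."""
    text_vec = mean_vector(text.lower().split())  # assumed: averaged word vectors
    if text_vec is None:
        return None
    best_label, best_score = None, -1.0
    for label, seeds in label_seeds.items():
        label_vec = mean_vector(list(seeds) + expand_keywords(seeds))
        if label_vec is None:
            continue
        score = cosine(text_vec, label_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(expand_keywords(["football"]))
print(classify("the market shares fell", LABEL_SEEDS))
```

No labeled training texts appear anywhere in the sketch: the only human input is the seed keyword list per label, which is the point the abstract makes about avoiding manual annotation.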

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and in particular to an unsupervised text classification system and method.

Background

[0002] Natural language processing (NLP) is an important direction in computer science and artificial intelligence, and usually includes branches such as sentence classification, text classification, and information extraction. Text classification assigns a specific label to a text. At present, a model is usually trained with supervised machine learning methods; classification based on such a model reaches a certain accuracy, but it also has drawbacks. For example, supervised methods require a great deal of manpower for corpus labeling. With hundreds or thousands of classification labels, manual labeling becomes much more difficult and correspondingly less efficient.

Summary of the invention

[0003] The...

Application Information

IPC(8): G06K9/62, G06F40/30
CPC: G06F18/24147
Inventor: 张发展, 刘世林, 罗镇权, 李焕
Owner: 成都数联铭品科技有限公司