Short text classification method based on semantic enhancement

A short text classification technology, applied in semantic analysis, natural language data processing, special data processing applications, etc. It addresses the problems of overly complex corpus expansion, low-quality expanded corpora, and imprecise word vector training sets, and achieves shorter training time and improved classification performance.

Active Publication Date: 2018-07-13
中国人民解放军军事科学院军事科学信息研究中心
5 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the shortcomings of existing short text corpus expansion methods, which are overly complicated, yield low-quality expanded corpora, and rely on imprecise word vector training sets. A relatively simple, high-quality corpus expansion method is proposed, so that the semantic representation of short texts is enhanced through high-quality corpus expansion and through word vector training on a precise corpus.

Embodiment Construction

[0024] The present invention will now be further described with reference to the accompanying drawings.

[0025] Referring to figure 1, the short text classification method based on semantic enhancement of the present invention comprises two parts: a corpus expansion method and a word vector training method. The high-quality, field-related corpus obtained by the corpus expansion method is used as a new training set, and the word vectors with precise semantic relationships obtained by the word vector training method are used as auxiliary information. The two are used jointly to train the text classifier, so as to obtain the optimal classification effect.
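As a rough illustration of how an expanded training set and word vectors could be combined, the following minimal sketch represents each text by the average of its word vectors and classifies with a nearest-centroid rule. The nearest-centroid classifier, the two-dimensional toy vectors in `TOY_VECTORS`, and all function names are assumptions made for illustration; the source does not specify the classifier or the embedding dimension.

```python
from typing import Dict, List

Vector = List[float]

def embed(text: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words; return a zero vector if none are known."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def train_centroids(texts: List[str], labels: List[str],
                    word_vectors: Dict[str, Vector], dim: int) -> Dict[str, Vector]:
    """Compute one mean embedding (centroid) per class label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for text, label in zip(texts, labels):
        vec = embed(text, word_vectors, dim)
        acc = sums.setdefault(label, [0.0] * dim)
        for i in range(dim):
            acc[i] += vec[i]
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(text: str, centroids: Dict[str, Vector],
             word_vectors: Dict[str, Vector], dim: int) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = embed(text, word_vectors, dim)
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(vec, centroids[label]))

# Hypothetical toy word vectors; real ones would come from word vector training.
TOY_VECTORS: Dict[str, Vector] = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9],
}

# Hypothetical (already expanded) training texts with their labels.
CENTROIDS = train_centroids(["goal match", "stock market"],
                            ["sports", "finance"], TOY_VECTORS, dim=2)
```

Calling `classify("late goal", CENTROIDS, TOY_VECTORS, dim=2)` ignores the unknown word "late" and assigns the text to the class whose centroid lies nearest to the vector for "goal".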

[0026] For the specific process, refer to figure 2. First, each short text in the short text training set is used as the query keyword input to an Internet search engine, which will generally return multiple retrieval results. Due to the built-in sorting algorithm of the search en...
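The retrieval step described here might be sketched as follows. The search engine is replaced by a hypothetical in-memory stub (`toy_search`), and keeping only the first `top_k` ranked results is an assumption, since the text is cut off before it states how results are selected.

```python
from typing import Callable, Dict, List

# A search function maps a query string to a ranked list of result snippets.
SearchFn = Callable[[str], List[str]]

def expand_short_text(text: str, search: SearchFn, top_k: int = 3) -> str:
    """Use the short text as the query and append the first top_k snippets to it."""
    results = search(text)  # assumed to be ordered best match first
    return " ".join([text] + results[:top_k])

def expand_corpus(texts: List[str], search: SearchFn, top_k: int = 3) -> List[str]:
    """Expand every short text in a training set."""
    return [expand_short_text(t, search, top_k) for t in texts]

# Hypothetical stand-in for a real Internet search engine.
_TOY_INDEX: Dict[str, List[str]] = {
    "new phone release": [
        "the handset ships next month",
        "reviewers praise the camera",
        "unrelated advert",
    ],
}

def toy_search(query: str) -> List[str]:
    """Return canned snippets for known queries, an empty list otherwise."""
    return _TOY_INDEX.get(query, [])
```

With `top_k=2`, the query "new phone release" is extended by its first two snippets only, and a query with no results is returned unchanged.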

Abstract

The invention discloses a short text classification method based on semantic enhancement. The method comprises the following steps: 1, constructing a short text classifier by obtaining a field-relevant short text training set from Internet resources, expanding the corpus and training word vectors for each short text, and training the short text classifier; 2, after expanding the corpus and training word vectors for each short text to be classified, inputting the short texts to be classified into the short text classifier obtained in step 1 to obtain a classification result. The method enhances the semantics of short texts and then classifies them. Because short texts carry little information and have sparse semantics, high-quality corpus expansion and high-precision word vector training are used to build a semantically enhanced representation of the short texts. Meanwhile, an efficient text classification algorithm is used, so that the limited features of the texts are captured to the greatest extent and the training time of the classifier is effectively shortened.

Description

Technical field

[0001] The invention relates to the field of computational linguistics, more specifically to the field of natural language processing, and in particular to a short text classification method based on semantic enhancement.

Background technique

[0002] At present, with the rapid development of the electronic technology industry, large numbers of short texts such as Weibo posts, comments, and WeChat messages are delivered to our mobile terminals over the network every day, and the volume of these short text messages is growing explosively. Text classification technology emerged to cope with this rapidly growing amount of information. Short texts contain little textual information and have sparse features, so automatic classification of short texts is more challenging than that of long texts. Facing this challenge, researchers have expanded the corpus of short texts to compensate for their short content and sparse features, and then used exi...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/353, G06F16/355, G06F40/289, G06F40/30
Inventor: 尹忠博, 罗威, 罗准辰, 谭玉珊, 武帅, 牛海波, 毛彬, 田昌海, 叶宇铭
Owner: 中国人民解放军军事科学院军事科学信息研究中心