An Improved Classification Method of Expanding Feature Vectors of Short Text Words

A feature-vector technique applied to text database clustering/classification, text database querying, and unstructured text data retrieval. It addresses the scarcity of feature vectors in short texts, improves classification performance, and alleviates the degree of bias.

Active Publication Date: 2022-05-03
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0007] The object of the present invention is to address the defects of the prior art by proposing an improved classification method that extends short text word feature vectors. It uses the word2vec technique from neural probabilistic language models to perform word embedding, training word vectors that are then used to extend the short text, and thereby solves the technical problem that short texts have few feature vectors.



Examples


Embodiment Construction

[0048] In order to make the purpose, implementation and advantages of the present invention clearer, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings:

[0049] The flow of the classification improvement method for extended short text word feature vectors based on the Word2vec model provided by the present invention is shown in Figure 1. It specifically includes the following steps:

[0050] Step 1. Collect a corpus to serve as the short text training set and test set. For the short text training set, use a news corpus that has already been sorted and classified. The data set includes news headlines and news content. The original news headline data set is used as the short text data set, and the content data set is used as the background corpus data set.
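Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record keys `title`, `content`, and `label`, the function name `build_datasets`, the 80/20 split, and the choice to build the background corpus from all news bodies are assumptions made for the example.

```python
import random

def build_datasets(news_items, test_ratio=0.2, seed=0):
    """Split labeled news items into short-text train/test sets and a background corpus.

    Each item is a dict with the hypothetical keys 'title', 'content', 'label'.
    """
    rng = random.Random(seed)
    items = list(news_items)
    rng.shuffle(items)

    # Assumed split ratio; the source does not state one.
    n_test = max(1, int(len(items) * test_ratio))
    test_items, train_items = items[:n_test], items[n_test:]

    # Headlines serve as the short texts to be classified.
    train = [(it["title"], it["label"]) for it in train_items]
    test = [(it["title"], it["label"]) for it in test_items]

    # News bodies form the background corpus (assumption: all bodies are used).
    background = [it["content"] for it in items]
    return train, test, background


# Toy records standing in for a real classified news corpus.
news = [
    {"title": f"headline {i}", "content": f"body text {i}", "label": i % 2}
    for i in range(10)
]
train_set, test_set, background_corpus = build_datasets(news)
```

The background corpus is kept separate from the labeled headlines because it is only used to train word vectors, not to train the classifier.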

[0051] Step 2. Preprocess the corpora, namely the short text training set, the background corpus, and the short text test set. Preprocessing includes the Chinese ...


Abstract

The present invention relates to an improved classification method for extended short text word feature vectors. The method introduces the Word2vec language model to perform word embedding on short texts and expands their feature vectors to address short text sparsity, and it converts word vectors into probabilistic semantic distributions to measure semantic relevance. For the extended feature vector of a short text, an improved feature weight algorithm that incorporates semantic relevance is used to process the expanded word feature vector. The method can distinguish the importance of words in the expanded short text and obtain more accurate semantic relevance, which effectively improves short text classification. According to the reported experiments, the method preserves the accuracy of the feature vectors mined from short texts while greatly improving classification accuracy. It can be applied to decision support in fields such as hot topic classification and mining and public opinion monitoring, and has strong practical value.
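The expansion idea in the abstract can be illustrated with a small sketch. It is hedged in several ways: the source text here does not disclose the improved feature weight algorithm or how the probabilistic semantic distribution is computed, so plain cosine similarity between word vectors is used as a stand-in relevance measure. The function names, the toy three-dimensional vectors, and the `top_k` and `min_sim` parameters are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (stand-in relevance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_short_text(words, vectors, top_k=2, min_sim=0.5):
    """Return {word: weight} for a short text expanded with similar words.

    Original words keep weight 1.0; each added neighbor is weighted by its
    similarity to the word that introduced it (an assumed weighting scheme).
    """
    weights = {w: 1.0 for w in words}
    for w in words:
        if w not in vectors:
            continue  # no embedding available for this word
        # Rank every vocabulary word not already in the text by similarity to w.
        neighbors = sorted(
            ((cosine(vectors[w], vec), cand)
             for cand, vec in vectors.items() if cand not in words),
            reverse=True,
        )[:top_k]
        for sim, cand in neighbors:
            if sim >= min_sim:
                weights[cand] = max(weights.get(cand, 0.0), sim)
    return weights


# Toy vectors standing in for embeddings trained on the background corpus.
vectors = {
    "stock": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "shares": [0.85, 0.15, 0.05],
    "football": [0.0, 0.1, 0.9],
}
weights = expand_short_text(["stock"], vectors)
```

In this toy run the one-word text gains "shares" and "market" as weighted features, while "football" falls below the similarity threshold and is left out. Weighting added words below the original ones is one simple way to keep the expansion from drowning out the text's own vocabulary.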

Description

Technical field

[0001] The invention relates to an improved classification method for extended short text word feature vectors, in particular to an improved classification method that performs word embedding and extends short text word feature vectors based on a Word2vec model, and belongs to the technical field.

Background technique

[0002] With the rapid development of social networks and e-commerce, short text forms such as Weibo, Twitter, product reviews, and real-time news pushes have become the mainstream content of the Internet. Short text is generally defined as text of short length, typically 10 to 140 words. Research on classifying and mining hot topics in short texts and on monitoring online public opinion has important application prospects for decision-making in various fields. How to mine short texts efficiently and correctly has therefore become a hot research direction.

[0003] For conventional text classification, most of them use the t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/33, G06F16/335
Inventor: 王诚, 孟涛
Owner: NANJING UNIV OF POSTS & TELECOMM