Short text feature extraction method

A short text feature extraction technology, applied in special data processing applications, instruments, calculations, etc., that can address problems such as sparse features and unclear topics in short texts.

Active Publication Date: 2015-10-21
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0006] A feature extraction method for short texts. The method extracts features from short texts based on a knowledge base and syntactic analysis. It calculates the weight of each topic and uses the topic vector as the final feature vector of the short text, addressing the sparse features and unclear topics of short texts. The method includes a model training process and a feature extraction process.

Embodiment Construction

[0055] The invention is further described below through embodiments and with reference to the accompanying drawings, without limiting the scope of the invention in any way.

[0056] The present invention provides a short text feature extraction method that extracts short text features based on a knowledge base and a syntactic analysis method, calculates the weight of each topic, and uses the topic vector as the final feature vector of the short text. This addresses the sparse features and unclear topics of short texts. The method includes a model training process and a feature extraction process.

[0057] The short text data can be divided into training set data, validation set data and test set data. The short text feature extraction method specifically includes the following steps:

[0058] 1. Model training process: train on the training set data; validate on the validation set data, and obtain the weight group W corresponding to the highes...
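The selection step in the model training process can be illustrated with a minimal sketch. It assumes a small, fixed list of candidate weight groups and a toy weighted-sum classifier; the names `select_best`, `train_fn`, `accuracy_fn`, the threshold 0.5 and the data are all hypothetical, since the text shown here does not say how candidate weight groups are generated or which model is trained.

```python
from typing import Any, Callable, Iterable, Tuple

Weights = Tuple[float, float]

def select_best(candidates: Iterable[Weights],
                train_fn: Callable[[Weights], Any],
                accuracy_fn: Callable[[Any], float]):
    """Return (W, M, accuracy) for the candidate with the highest validation accuracy."""
    best_w, best_m, best_acc = None, None, -1.0
    for w in candidates:
        model = train_fn(w)          # train a model under this weight group
        acc = accuracy_fn(model)     # score it on the validation set
        if acc > best_acc:
            best_w, best_m, best_acc = w, model, acc
    return best_w, best_m, best_acc

# Toy validation data (assumed): two feature scores per text, plus a binary label.
VALIDATION = [((0.9, 0.3), 1), ((0.2, 0.6), 0), ((0.7, 0.1), 1), ((0.1, 0.7), 0)]

def train_fn(w: Weights):
    # Stand-in for real training: the "model" just predicts 1 when the
    # weighted sum of the two scores exceeds 0.5. Nothing is fitted here.
    return lambda x: 1 if w[0] * x[0] + w[1] * x[1] > 0.5 else 0

def accuracy_fn(model) -> float:
    correct = sum(1 for x, y in VALIDATION if model(x) == y)
    return correct / len(VALIDATION)

best_w, best_m, best_acc = select_best(
    [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)], train_fn, accuracy_fn
)
print(best_w, best_acc)
```

The loop keeps the first candidate that reaches the best accuracy, so ties are resolved by candidate order; a real implementation might prefer a different tie-breaking rule.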

Abstract

The invention discloses a short text feature extraction method that extracts features from a short text based on a knowledge base and a syntactic analysis method. The method comprises a model training process and a feature extraction process: training on training set data; validating on validation set data to obtain the weight set W and the training model M that correspond to the highest accuracy rate; processing the test set data in the feature extraction process and then assigning the weight set W to each category; mapping the short text into a concept space by using the ESA algorithm, thereby obtaining an interpretation vector of the short text; and obtaining a topic vector through LDA and using that vector as the final feature vector of the short text. The method addresses the sparse features and unclear topics of short texts, reduces the difficulty of short text feature extraction, improves the extraction results, and improves the accuracy of text classification.
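The mapping of a short text into a concept space can be sketched as follows, assuming ESA here refers to Explicit Semantic Analysis, in which each knowledge-base article is a concept and a text is represented by its weighted association with every concept. The three-article knowledge base, the function names and the smoothed TF-IDF formula are illustrative assumptions, not details taken from the patent; the LDA step is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical miniature knowledge base: concept name -> article text.
KNOWLEDGE_BASE = {
    "Basketball": "basketball player team score court game",
    "Stock market": "stock market price investor trade share",
    "Smartphone": "phone screen battery app camera price",
}

def build_inverted_index(kb):
    """Map each word to {concept: TF-IDF weight of the word in that concept's article}."""
    docs = {concept: Counter(text.split()) for concept, text in kb.items()}
    n_docs = len(docs)
    doc_freq = Counter(word for counts in docs.values() for word in counts)
    index = defaultdict(dict)
    for concept, counts in docs.items():
        total = sum(counts.values())
        for word, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[word]) + 1.0  # smoothed IDF (assumed variant)
            index[word][concept] = tf * idf
    return index

def interpretation_vector(text, index, concepts):
    """Sum the concept vectors of the words in the text, weighted by word count."""
    vec = dict.fromkeys(concepts, 0.0)
    for word, count in Counter(text.lower().split()).items():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += count * weight
    return vec

index = build_inverted_index(KNOWLEDGE_BASE)
vec = interpretation_vector("stock price falls", index, list(KNOWLEDGE_BASE))
top_concept = max(vec, key=vec.get)
print(top_concept, vec)
```

Even though the input has only three words, the resulting vector has one entry per concept, which is how this kind of mapping can offset the sparsity of a short text. Words missing from the knowledge base, such as "falls" here, contribute nothing.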

Description

Technical field

[0001] The invention relates to text feature extraction and text classification methods, and in particular to a short text feature extraction method.

Background

[0002] With the development of applications such as Weibo, social networking sites, and hotlines, more and more information is presented in the form of short texts, and its volume is growing explosively. Text mining technology can help people obtain key information from massive data quickly and effectively, and text feature extraction is a key step in text mining.

[0003] Most existing text feature extraction methods are based on the bag-of-words model. This approach usually achieves good results on long texts but often performs poorly on short texts. The main reason is that, compared with long texts, short texts have sparse features and unclear topics. First of all, due to the limitation of the length of the short text, t...
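The sparsity problem described in the background can be made concrete with a small bag-of-words sketch. The two example texts, the whitespace tokenizer and the helper names are assumptions chosen for illustration only.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def nonzero_fraction(vector):
    """Share of vector entries that are non-zero."""
    return sum(1 for value in vector if value) / len(vector)

long_text = ("text mining helps people obtain key information from massive data "
             "and feature extraction is a key step in text mining")
short_text = "feature extraction for short text"

# Shared vocabulary built from both texts (in practice, from a whole corpus).
vocabulary = sorted(set((long_text + " " + short_text).split()))

long_vec = bag_of_words(long_text, vocabulary)
short_vec = bag_of_words(short_text, vocabulary)

print(f"long:  {nonzero_fraction(long_vec):.2f} of entries non-zero")
print(f"short: {nonzero_fraction(short_vec):.2f} of entries non-zero")
```

With a realistic corpus vocabulary of tens of thousands of words, a short text would still fill only a handful of entries, so the gap would be far larger than in this toy case.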

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 童云海, 叶少强, 关平胤, 李凡丁, 刘文一, 何晓宇
Owner: PEKING UNIV