
A Short Text Feature Extraction Method

A short text feature extraction technology, applied in special data processing applications, instruments, calculations, etc. It addresses the problems of sparse short text features and unclear short text topics, reducing processing difficulty, improving extraction results, and improving accuracy.

Active Publication Date: 2018-03-30
PEKING UNIV
Cites: 9 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] A feature extraction method for short texts. The method extracts features from short texts using a knowledge base and syntactic analysis. It calculates the weight of each topic and uses the topic vector as the final feature vector of the short text, addressing the problems of sparse short text features and unclear short text topics. The method includes a model training process and a feature extraction process.



Examples


Detailed Description of the Embodiments

[0055] The present invention is further described below through embodiments, in conjunction with the accompanying drawings, without limiting the scope of the invention in any way.

[0056] The present invention provides a short text feature extraction method. The method extracts short text features using a knowledge base and syntactic analysis, calculates the weight of each topic, and uses the topic vector as the final feature vector of the short text. This addresses the problems of sparse features and unclear topics in short texts. The method includes a model training process and a feature extraction process.

[0057] The short text data can be divided into training set data, verification set data and test set data. The short text feature extraction method specifically includes the following steps:

[0058] 1. Model training process: train on the training set data; use the verification set data to verify, and obtain the weight group W corresponding to the highes...
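The selection step described in [0058] can be sketched roughly as follows. This is a minimal illustration, not the patent's actual procedure: `select_best_weights`, `toy_train_and_score`, and the candidate weight groups are hypothetical names and values introduced here, and the toy scoring function merely stands in for real training plus evaluation on the verification set.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple


def select_best_weights(
    candidates: Iterable[Sequence[float]],
    train_and_score: Callable[[Sequence[float]], Tuple[Any, float]],
) -> Tuple[Sequence[float], Any, float]:
    """Return (W, M, accuracy) for the candidate with the highest accuracy."""
    best = None
    for weights in candidates:
        # Train with this weight group and score it on the verification set.
        model, accuracy = train_and_score(weights)
        if best is None or accuracy > best[2]:
            best = (weights, model, accuracy)
    if best is None:
        raise ValueError("no candidate weight groups supplied")
    return best


def toy_train_and_score(weights):
    # Hypothetical stand-in for training and verification:
    # the returned accuracy peaks when the weights equal (0.6, 0.4).
    target = (0.6, 0.4)
    error = sum(abs(w - t) for w, t in zip(weights, target))
    return {"weights": tuple(weights)}, 1.0 - error


candidate_groups = [(0.5, 0.5), (0.6, 0.4), (0.9, 0.1)]
W, M, acc = select_best_weights(candidate_groups, toy_train_and_score)
```

Here `W` is the weight group with the highest verification accuracy and `M` is the model trained with it; how the candidate groups are generated is left open, since the source text does not specify it.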


Abstract

The invention discloses a short text feature extraction method that performs feature extraction on short texts using a knowledge base and syntactic analysis. The method includes a model training process and a feature extraction process. In the model training process, training is performed on the training set data, yielding the weight group W and the trained model M that correspond to the highest accuracy. In the feature extraction process, the test set data is processed and each category is assigned the weight group W; the ESA algorithm is used to map the short text to the concept space, giving the explanation vector of the short text; a topic vector is then obtained through LDA and used as the final feature vector of the short text. The method addresses the problems of sparse short text features and unclear short text topics, reduces the processing difficulty of short text feature extraction, improves the extraction results, and improves the accuracy of text classification.
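The ESA mapping mentioned in the abstract can be illustrated with a small sketch. It assumes a hypothetical three-concept knowledge base (real ESA setups typically use a large encyclopedia) and a simple smoothed TF-IDF weighting; `KNOWLEDGE_BASE`, `build_inverted_index`, and `explanation_vector` are names made up for this example and are not taken from the patent.

```python
import math
from collections import Counter

# Hypothetical miniature knowledge base: concept name -> describing text.
KNOWLEDGE_BASE = {
    "Basketball": "basketball player team score court game",
    "Finance": "stock market bank price investor trade",
    "Weather": "rain storm temperature forecast wind cloud",
}


def build_inverted_index(kb):
    """Map each word to {concept: TF-IDF weight of the word in that concept}."""
    term_counts = {c: Counter(text.split()) for c, text in kb.items()}
    doc_freq = Counter(w for counts in term_counts.values() for w in counts)
    n = len(kb)
    index = {}
    for concept, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            idf = math.log(n / doc_freq[word]) + 1.0  # smoothed so idf > 0
            index.setdefault(word, {})[concept] = (count / total) * idf
    return index


def explanation_vector(text, index, concepts):
    """Sum the concept weights of every known word in the short text."""
    scores = dict.fromkeys(concepts, 0.0)
    for word, tf in Counter(text.lower().split()).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += tf * weight
    return [scores[c] for c in concepts]


concepts = sorted(KNOWLEDGE_BASE)  # ["Basketball", "Finance", "Weather"]
index = build_inverted_index(KNOWLEDGE_BASE)
vector = explanation_vector("Bank raises stock price forecast", index, concepts)
```

In this toy run the short text scores highest on the "Finance" concept even though it has only five words, which is the effect the concept-space mapping is meant to provide. The subsequent LDA step that produces the topic vector is not shown.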

Description

Technical field

[0001] The invention relates to text feature extraction and text classification methods, in particular to a short text feature extraction method.

Background

[0002] With the development of applications such as Weibo, social networking sites, and hotlines, more and more information is presented in the form of short texts, and its volume is growing explosively. Text mining technology helps people obtain key information from massive data quickly and effectively, and text feature extraction is a key step in text mining.

[0003] Most existing text feature extraction methods are based on the bag-of-words model. These methods usually achieve good results on long texts but often perform poorly on short texts. The main reason is that, compared with long texts, short texts have sparse features and unclear topics. First of all, due to the limitation of the length of the short text, t...
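The sparsity issue raised in [0003] can be seen in a small sketch. The two example texts are made up for illustration, and `bag_of_words` and `cosine` are hypothetical helpers using plain whitespace tokenization rather than any particular library.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Return (vocabulary, count vectors) for whitespace-tokenized texts."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two made-up short texts about the same topic that share no words.
texts = ["lakers win playoff opener", "basketball finals start tonight"]
vocab, (a, b) = bag_of_words(texts)
density = sum(1 for x in a if x) / len(vocab)  # share of non-zero entries
similarity = cosine(a, b)
```

Both texts concern basketball, yet their bag-of-words vectors have zero cosine similarity because no word is shared. With a realistic vocabulary of tens of thousands of words, the share of non-zero entries per short text would be far smaller than in this eight-word example.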


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
Inventor: 童云海叶少强关平胤李凡丁刘文一何晓宇
Owner: PEKING UNIV