Method for expanding feature space of short text

A short text feature space expansion method, applied in the field of short text mining, addresses the insufficient feature space of short texts and improves the quality of short text processing.

Inactive Publication Date: 2010-07-07
WUHAN UNIV OF TECH
Cites: 0; Cited by: 29

AI Technical Summary

Problems solved by technology

[0004] It is therefore necessary to provide a method for expanding the feature space of short text, to compensate for the insufficient feature space of the short text itself.



Examples


Detailed Description of the Embodiments

[0013] Embodiments of the present invention will now be described with reference to the drawings, in which like reference numerals represent like elements.

[0014] As shown in Figure 1, the short text feature space expansion method of this embodiment includes the following steps:

[0015] Step S1, selecting the expansion source of the short text feature space;

[0016] Step S2, performing text preprocessing on the text data from the expansion source, and obtaining a document-term matrix as a training set;

[0017] Step S3, establishing a Latent Dirichlet Allocation (LDA) topic model on the document-term matrix of the training set;

[0018] Step S4, representing each short text as a term vector: [term1, term2, ..., termx];

[0019] Step S5, using the term vector of the short text as the input of the Latent Dirichlet Allocation (LDA) topic model, and outputting the latent topic probability distribution related to the short te...
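Steps S2 and S4 can be illustrated with a minimal sketch. It assumes English text, a simple regular-expression tokenizer, and a tiny stop-word list; none of these choices come from the patent, and the names `preprocess`, `build_document_term_matrix`, and `STOP_WORDS` are hypothetical. The two sample documents stand in for whatever expansion source is selected in step S1.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[d][v] counts vocabulary[v] in document d."""
    tokenized = [preprocess(doc) for doc in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    index = {term: i for i, term in enumerate(vocabulary)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix

# Step S2: toy expansion-source documents (illustrative data only).
expansion_source = [
    "The stock market fell as investors sold shares.",
    "The football team won the match in the final minute.",
]
vocabulary, dtm = build_document_term_matrix(expansion_source)

# Step S4: a short text represented as a term vector [term1, term2, ..., termx].
short_text_terms = preprocess("Market shares fell")
```

In practice the document-term matrix would be built from a much larger corpus and passed to an LDA implementation in step S3; the matrix layout here (one row per document, one column per term) is the conventional one.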


Abstract

The invention discloses a method for expanding the feature space of a short text, comprising the following steps: (1) selecting an expansion source for the feature space of the short text; (2) preprocessing the text data of the expansion source to obtain a document-term matrix used as a training set; (3) establishing a Latent Dirichlet Allocation (LDA) topic model on the document-term matrix of the training set; (4) representing each short text as a term vector; (5) using the term vector of the short text as the input of the LDA topic model and outputting the latent topic probability distribution related to the short text; (6) representing the latent topics as a topic vector; and (7) combining the topic vector with the term vector to form the short text with an expanded feature space. By combining the topic vector determined by the latent topics with the term vector determined by the short text itself, the invention expands the feature space of the short text and can effectively improve the quality of short text information processing.
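Steps (5) to (7) can be sketched as follows, under several loudly stated assumptions. The topic-word probabilities in `TOPIC_WORD` are hand-written toy values standing in for a trained LDA model. The inference routine is a simplified expectation-maximization over topic proportions with the topic-word probabilities held fixed; it ignores the Dirichlet prior and is not the inference procedure of any particular LDA library. The abstract does not say how the topic vector is built, so the sketch assumes it lists the labels of topics whose probability reaches a hypothetical `threshold`.

```python
def infer_topic_distribution(terms, topic_word, iterations=50):
    """Estimate P(topic | short text) with topic-word probabilities held fixed."""
    num_topics = len(topic_word)
    # Terms unseen by every topic carry no information and are skipped.
    known = [t for t in terms if any(t in tw for tw in topic_word)]
    theta = [1.0 / num_topics] * num_topics
    if not known:
        return theta
    for _ in range(iterations):
        totals = [0.0] * num_topics
        for term in known:
            # Responsibility of each topic for this term (tiny floor avoids zeros).
            weights = [theta[k] * topic_word[k].get(term, 1e-9)
                       for k in range(num_topics)]
            norm = sum(weights)
            for k in range(num_topics):
                totals[k] += weights[k] / norm
        theta = [t / len(known) for t in totals]
    return theta

def expand_short_text(terms, topic_word, threshold=0.3):
    """Append labels of sufficiently probable topics to the term vector."""
    theta = infer_topic_distribution(terms, topic_word)
    topic_vector = [f"topic{k}" for k, p in enumerate(theta) if p >= threshold]
    return terms + topic_vector

# Toy stand-in for a trained model: P(term | topic) for two topics.
TOPIC_WORD = [
    {"market": 0.4, "shares": 0.3, "stock": 0.2, "team": 0.1},    # topic0
    {"team": 0.4, "match": 0.3, "football": 0.2, "market": 0.1},  # topic1
]

theta = infer_topic_distribution(["market", "shares", "fell"], TOPIC_WORD)
expanded = expand_short_text(["market", "shares", "fell"], TOPIC_WORD)
# expanded == ["market", "shares", "fell", "topic0"]
```

The expanded representation keeps the original terms and adds topic features, so two short texts that share no words can still overlap through a common topic label.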

Description

Technical field

[0001] The invention relates to the field of short text mining, and in particular to a short text feature space expansion method.

Background technique

[0002] As a novel communication medium, the Internet has, after only a few decades of development, incorporated information on many aspects of life such as culture, history, and society. With the rapid development of network applications such as news comments, BBS, blogs, chat rooms, and aggregated news (RSS), many forms of short text (text data of relatively short length) have emerged: mobile phone short messages, instant messages from instant messaging software, chat records in online chat rooms, BBS titles, blog comments, news comments, and others. The amount of short text data is growing day by day, and text mining of short texts has broad application prospects in fields such as topic tracking and discovery, buzzword analysis, and public opinion early warning.

[0003] However, the processing...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 李琳, 钟珞, 胡燕, 刘东飞
Owner: WUHAN UNIV OF TECH