Dynamic short text cluster searching method

A dynamic short-text technology, applied in the fields of unstructured text data retrieval, text database clustering/classification, and special data processing applications, to improve retrieval performance and efficiency

Active Publication Date: 2018-05-04
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] To address the problem of accurate retrieval over massive short text streams in social media, the present invention proposes a clustering retrieval method based on a dynamic multinomial mixture topic model.
By building a dynamic topic model, the invention provides keyword retrieval that changes over time. The multinomial mixture topic model addresses the sparsity and lack of information in short text data, improving the efficiency and performance of information retrieval.



Examples


Embodiment Construction

[0022] The implementation of the technical scheme is described in further detail below with reference to the accompanying drawings:

[0023] The dynamic short text stream clustering retrieval algorithm of the present invention is described in further detail with reference to the flow chart and an implementation case.

[0024] As shown in Figure 1, the method includes the following steps:

[0025] Step 1) Apply word segmentation to the short text data stream in the experiment to obtain the word sequence of each document, then filter out stop words and noise tokens such as punctuation marks to obtain the corresponding feature word sequence, which serves as the input data of the experiment.
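
The preprocessing in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a regular expression stands in for the word segmentation technology (a real system for Chinese text would use a proper segmenter), and `STOP_WORDS`, `feature_words`, and `preprocess_stream` are hypothetical names.

```python
import re

# Hypothetical stop-word list; a real system would load a complete one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "rt"}


def feature_words(text, stop_words=STOP_WORDS):
    """Return the feature-word sequence of one short text."""
    # Stand-in for a word segmenter (assumption): lowercase the text and
    # keep alphanumeric runs, which also discards punctuation marks.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words; the remaining tokens are the feature words.
    return [t for t in tokens if t not in stop_words]


def preprocess_stream(texts):
    """Map every document in the stream to its feature-word sequence."""
    return [feature_words(t) for t in texts]
```

For example, `feature_words("The storm is coming to Nanjing!")` keeps only `storm`, `coming`, and `nanjing`.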

[0026] Step 2) Carry out short-term topic modeling on the short text stream in each time period, and obtain a topic-based distribution model of documents and a topic-based distribution model of feature words.
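
One way to picture Step 2 is sketched below, assuming fixed-length time periods and hard topic labels for each document. Both are assumptions made for illustration: the text above does not state how periods are delimited or how topics are inferred, and `split_by_period` and `topic_distributions` are hypothetical helpers.

```python
from collections import Counter, defaultdict


def split_by_period(stream, period_seconds):
    """Group (timestamp, words) pairs into consecutive time periods.

    Periods that contain no documents are skipped.
    """
    buckets = defaultdict(list)
    for timestamp, words in stream:
        buckets[int(timestamp // period_seconds)].append(words)
    return [buckets[key] for key in sorted(buckets)]


def topic_distributions(docs, labels, num_topics):
    """Estimate P(topic) and P(word | topic) from hard topic labels."""
    doc_counts = Counter(labels)
    word_counts = [Counter() for _ in range(num_topics)]
    for words, z in zip(docs, labels):
        word_counts[z].update(words)
    # Share of documents assigned to each topic.
    p_topic = [doc_counts[z] / len(docs) for z in range(num_topics)]
    # Relative word frequencies inside each topic (empty dict if no words).
    p_word = []
    for counts in word_counts:
        total = sum(counts.values())
        p_word.append({w: c / total for w, c in counts.items()} if total else {})
    return p_topic, p_word
```

Each period returned by `split_by_period` would be modeled on its own, giving one pair of distributions per period.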

[0027] Step 201), such as fig...


Abstract

The invention discloses a dynamic short text cluster searching method. A short-term topic model is built from short text stream data, and a long-term historical topic model is combined with it to amend the short-term topic model within the data stream, yielding the probability distributions of topics and feature words. Texts are clustered by the conditional probability of the text and the topics, which enables dynamic, accurate keyword searching. Building a dynamic topic model provides keyword searching that changes over time, and the multinomial mixture topic model addresses problems such as the sparsity of short text data and information loss, improving the efficiency and performance of information searching.
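
The abstract does not state how the long-term historical model amends the short-term one. The sketch below assumes a simple linear interpolation of the two word distributions of a single topic, followed by a keyword lookup that picks the topic giving the keyword the highest probability. The mixing coefficient `weight` and both function names are hypothetical and are not taken from the patent.

```python
def amend_with_history(short_term, long_term, weight=0.7):
    """Blend short-term and long-term P(word | topic) for one topic.

    `weight` is a hypothetical mixing coefficient (assumption): 1.0 keeps
    only the short-term model, 0.0 keeps only the historical model.
    """
    vocab = set(short_term) | set(long_term)
    return {
        w: weight * short_term.get(w, 0.0) + (1 - weight) * long_term.get(w, 0.0)
        for w in vocab
    }


def best_topic_for_keyword(keyword, topics):
    """Return the index of the topic with the largest P(keyword | topic).

    Returns None when no topic assigns the keyword a positive probability.
    """
    if not topics:
        return None
    scores = [topic.get(keyword, 0.0) for topic in topics]
    best = max(range(len(topics)), key=scores.__getitem__)
    return best if scores[best] > 0 else None
```

Because both inputs are probability distributions and the weights sum to one, the blended result is again a distribution over the combined vocabulary.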

Description

Technical field

[0001] The invention belongs to the field of document retrieval and specifically relates to a clustering retrieval algorithm based on a dynamic multinomial mixture topic model.

Background technique

[0002] The multinomial mixture topic model is an important method for topic detection in the field of text mining. Among topic models, latent Dirichlet allocation (LDA) is one of the best known and simplest; it uses the probability distribution of words in documents to decompose the input document set into a set of latent topics. The multinomial mixture topic model assumes that each document is generated by a mixture model in which each mixture component corresponds one-to-one to a cluster. It iteratively traverses all documents in the text stream, calculates the conditional probability of each document and each cluster, and re-partitions the documents to obtain the final topic model.

[0003] Clustering algorithm i...
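
The iterative re-partitioning described in [0002] can be illustrated with a hard-assignment multinomial mixture, sketched below. This is a simplified stand-in, not the inference procedure of the invention: the smoothing parameters `alpha` and `beta`, the round-robin initialization, and the function name are assumptions, and a hard-assignment loop of this kind can stop in a poor local optimum, so sampling-based inference is a common alternative.

```python
import math
from collections import Counter


def cluster_multinomial_mixture(docs, num_clusters, iterations=10,
                                alpha=1.0, beta=0.1):
    """Assign each document (a list of words) to one mixture component."""
    vocab_size = len({w for doc in docs for w in doc})
    # Deterministic round-robin start (assumption made for repeatability).
    labels = [i % num_clusters for i in range(len(docs))]
    for _ in range(iterations):
        # Re-estimate cluster sizes and per-cluster word counts.
        doc_counts = Counter(labels)
        word_counts = [Counter() for _ in range(num_clusters)]
        for words, z in zip(docs, labels):
            word_counts[z].update(words)
        totals = [sum(c.values()) for c in word_counts]
        new_labels = []
        for words in docs:
            scores = []
            for z in range(num_clusters):
                # Smoothed log prior plus smoothed multinomial log-likelihood.
                score = math.log(doc_counts[z] + alpha)
                denom = totals[z] + beta * vocab_size
                for w in words:
                    score += math.log((word_counts[z][w] + beta) / denom)
                scores.append(score)
            # Re-partition: move the document to its most probable cluster.
            new_labels.append(max(range(num_clusters), key=scores.__getitem__))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

Each pass traverses all documents, scores every document against every cluster, and reassigns it, which mirrors the traverse, score, and re-partition loop described above.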


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3346, G06F16/35
Inventor: 马廷淮, 赵雨薇, 周宏豪, 曹杰
Owner: NANJING UNIV OF INFORMATION SCI & TECH