Dynamic topic model-based dynamic text cluster device and method

Topic model and text clustering technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses polysemy, data sparsity, and the curse of dimensionality in text features, with the effects of reducing large overhead, guaranteeing comparability, and improving clustering results.

Inactive Publication Date: 2013-02-06
人民搜索网络股份公司

AI Technical Summary

Problems solved by technology

[0004] In view of this, the main purpose of the present invention is to provide a dynamic text clustering device and method based on a dynamic topic model that clusters dynamic text data sets effectively. By combining a dynamic topic model with a dynamic clustering algorithm, it addresses problems inherent in traditional term-based text features, such as polysemy (one word with multiple meanings), synonymy (one meaning expressed by multiple words), data sparsity, and the curse of dimensionality.

Examples

Embodiment Construction

[0037] The device and method of the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments of the present invention.

[0038] Figure 1 is a schematic diagram of the composition of a dynamic text clustering device based on a dynamic topic model according to an embodiment of the present invention. As shown in Figure 1, the device mainly includes a news collection module, a news initial feature extraction module, a dynamic feature transformation module, and a dynamic clustering module, where:

[0039] The news collection module is used for collecting news data on the Internet.

[0040] The news initial feature extraction module is used to perform initial feature extraction on the collected news data.

[0041] The dynamic feature transformation module (reduce) is used to perform dynamic feature transformation on the extracted initial features.

[0042] The dynamic clustering module is used to dynamically cluster...
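The four modules above can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the patented implementation: the hard-coded headlines, the `TOPIC_WORDS` table (a stand-in for a trained topic model), the cosine threshold, and the single-pass clustering rule are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical topic-word weights standing in for a trained topic model.
TOPIC_WORDS = {
    "finance": {"stock": 1.0, "shares": 1.0, "market": 0.8, "bank": 0.6},
    "sports": {"match": 1.0, "game": 1.0, "team": 0.8, "score": 0.6},
}

def collect_news():
    """News collection module (stub): returns hard-coded headlines."""
    return [
        "team wins the match",
        "stock market rallies",
        "late score decides game",
        "bank shares fall",
    ]

def extract_initial_features(text):
    """Initial feature extraction: raw term counts."""
    return Counter(text.lower().split())

def transform_features(term_counts):
    """Feature transformation: project term counts onto topic weights."""
    vec = [
        sum(TOPIC_WORDS[topic].get(word, 0.0) * count
            for word, count in term_counts.items())
        for topic in sorted(TOPIC_WORDS)
    ]
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class DynamicClusterer:
    """Single-pass clustering: join the nearest centroid or open a new cluster."""

    def __init__(self, threshold=0.8):  # threshold is an assumed value
        self.threshold = threshold
        self.centroids = []
        self.sizes = []

    def add(self, vec):
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            n = self.sizes[best]
            # Update the running mean of the matched cluster.
            self.centroids[best] = [
                (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vec)
            ]
            self.sizes[best] = n + 1
            return best
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1

def run_pipeline():
    """Run all four stages and return one cluster id per news item."""
    clusterer = DynamicClusterer()
    return [
        clusterer.add(transform_features(extract_initial_features(text)))
        for text in collect_news()
    ]
```

Because items are assigned one at a time, new documents can be added without re-clustering the whole set, which is one plausible reading of "dynamic" here; the source text does not specify the actual clustering algorithm.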

Abstract

The invention discloses a dynamic text clustering device and method based on a dynamic topic model. The device comprises a news collection module, a news initial feature extraction module, a dynamic feature transformation module, and a dynamic clustering module. The news collection module collects news data on the Internet; the news initial feature extraction module extracts initial features from the collected news data; the dynamic feature transformation module performs dynamic feature transformation on the extracted initial features; and the dynamic clustering module dynamically clusters the news data set based on the transformed features. The device and method enable effective clustering of dynamic text data sets by combining a dynamic topic model with a dynamic clustering algorithm, and address problems inherent in term-based text features, such as polysemy, synonymy (multiple words for one meaning), data sparsity, and the curse of dimensionality.

Description

Technical field

[0001] The invention relates to machine learning and pattern recognition technology, in particular to a dynamic text clustering device and method based on a dynamic topic model.

Background

[0002] With the explosive development of information technology, the volume of text to be processed by computers is increasing rapidly. Text clustering is a common text processing method that groups texts that are similar from a certain point of view. The result may serve as direct output or as the basis for further processing of the text, so it is of considerable importance.

[0003] Most texts are written in natural language. To perform clustering, features must be extracted and converted into quantitative descriptions. However, conventional feature extraction methods can hardly avoid problems such as data sparsity, high feature dimensionality, polysemy (one word with multiple meanings), and synonymy (one meaning expressed by multiple words), which will have a great adverse effect on...
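The problems listed in the background can be seen in a small example. The sketch below is illustrative only and assumes plain whitespace tokenization with raw term counts; the four sample sentences are made up for the demonstration.

```python
from collections import Counter
from math import sqrt

def term_vector(text):
    """Term-based feature: raw word counts after whitespace tokenization."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "automobile prices rise",  # same meaning as the next sentence
    "car costs increase",
    "river bank floods",       # "bank" as a riverside
    "bank raises rates",       # "bank" as a financial institution
]
vectors = [term_vector(d) for d in docs]

# Synonymy: the two sentences share no terms, so similarity is zero.
synonym_similarity = cosine(vectors[0], vectors[1])

# Polysemy: unrelated sentences look similar because they share "bank".
polysemy_similarity = cosine(vectors[2], vectors[3])

# Sparsity and dimensionality: each document uses few of the vocabulary terms.
vocabulary = set().union(*vectors)
density = len(vectors[0]) / len(vocabulary)
```

Here the paraphrased pair scores lower than the unrelated pair, which is the kind of distortion that motivates transforming term features before clustering.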

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 李德聪, 杨青
Owner: 人民搜索网络股份公司