Topic mining method and apparatus

A matrix-based topic mining technique, applied in the field of topic mining methods and devices, that addresses the large amount of calculation and low efficiency of existing topic mining.

Active Publication Date: 2016-02-17
XFUSION DIGITAL TECH CO LTD

AI Technical Summary

Problems solved by technology

The BP algorithm involves a large number of iterative calculations. In each iteration, every non-zero element of the word-document matrix is calculated according to the current document-topic matrix and the current word-topic matrix of the LDA model to obtain the message vector of that element, and the current document-topic matrix and the current word-topic matrix are then updated according to all of these message vectors. This process is repeated until the message vector,...
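The full iteration described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the excerpt does not give the message update formula, so the smoothed form used here, the hyperparameters `alpha` and `beta`, and the helper names `normalize`, `accumulate`, and `full_sweep` are all assumptions.

```python
def normalize(vec):
    """Scale a list of positive scores so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]


def accumulate(counts, messages, n_docs, n_words, n_topics):
    """Build the document-topic and word-topic matrices from ALL message vectors."""
    theta = [[0.0] * n_topics for _ in range(n_docs)]
    phi = [[0.0] * n_topics for _ in range(n_words)]
    for (w, d), x in counts.items():
        for k, mu in enumerate(messages[(w, d)]):
            theta[d][k] += x * mu
            phi[w][k] += x * mu
    return theta, phi


def full_sweep(counts, messages, n_docs, n_words, n_topics, alpha=0.1, beta=0.01):
    """One baseline iteration: recompute the message of every non-zero element.

    counts maps (word, doc) to the non-zero entry of the word-document matrix.
    The update formula below is an assumed, commonly used smoothed form.
    """
    theta, phi = accumulate(counts, messages, n_docs, n_words, n_topics)
    totals = [sum(phi[w][k] for w in range(n_words)) for k in range(n_topics)]
    updated = {}
    for (w, d), x in counts.items():
        old = messages[(w, d)]
        scores = []
        for k in range(n_topics):
            own = x * old[k]  # exclude this element's own contribution
            scores.append(
                (theta[d][k] - own + alpha)
                * (phi[w][k] - own + beta)
                / (totals[k] - own + n_words * beta)
            )
        updated[(w, d)] = normalize(scores)
    return updated


counts = {(0, 0): 2, (1, 0): 1, (1, 1): 3}
messages = {(0, 0): [0.7, 0.3], (1, 0): [0.4, 0.6], (1, 1): [0.2, 0.8]}
new_messages = full_sweep(counts, messages, n_docs=2, n_words=2, n_topics=2)
```

Every call touches all non-zero elements and rebuilds both matrices from all message vectors, which is the per-iteration cost the passage above identifies as the problem.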



Embodiment Construction

[0032] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

[0033] FIG. 1 is a schematic flowchart of a topic mining method provided by an embodiment of the present invention. As shown in FIG. 1, this embodiment may include:

[0034] 101. According to the current document-topic matrix and the current word-topic matrix of the Latent Dirichlet Allocation (LDA) model, the non-zero elements of the word-document matrix ...


Abstract

The invention provides a topic mining method and apparatus. In each iteration, target message vectors are determined from the message vectors according to the residuals of the message vectors; the current document-topic matrix and the current word-topic matrix are updated according to the target message vectors only; and only the target elements of the word-document matrix that correspond to the target message vectors are then calculated according to the current document-topic matrix and the current word-topic matrix. This avoids calculating all non-zero elements of the word-document matrix in every iteration and avoids updating the current document-topic matrix and the current word-topic matrix according to all message vectors, which greatly reduces the amount of calculation, speeds up topic mining, and improves topic mining efficiency.
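The selection step in the abstract can be illustrated with a short sketch. It is hedged throughout: the abstract does not define the residual or the selection rule, so the count-weighted L1 residual, the "top fraction" rule, and the names `residual`, `select_targets`, and `apply_targets` are assumptions made for illustration.

```python
def residual(old, new, count):
    """Assumed residual: count-weighted L1 change between two message vectors."""
    return count * sum(abs(n - o) for o, n in zip(old, new))


def select_targets(residuals, fraction):
    """Return the keys of the largest residuals (assumed top-fraction rule)."""
    keep = max(1, int(len(residuals) * fraction))
    return sorted(residuals, key=residuals.get, reverse=True)[:keep]


def apply_targets(theta, phi, counts, old_msgs, new_msgs, targets):
    """Update both matrices in place using ONLY the target message vectors."""
    for (w, d) in targets:
        x = counts[(w, d)]
        for k, (o, n) in enumerate(zip(old_msgs[(w, d)], new_msgs[(w, d)])):
            delta = x * (n - o)
            theta[d][k] += delta  # document-topic matrix
            phi[w][k] += delta    # word-topic matrix
        old_msgs[(w, d)] = new_msgs[(w, d)]


# Non-zero elements of a toy word-document matrix, keyed by (word, doc).
counts = {(0, 0): 2, (1, 0): 1, (0, 1): 3}
old_msgs = {key: [0.5, 0.5] for key in counts}
new_msgs = {(0, 0): [0.6, 0.4], (1, 0): [0.9, 0.1], (0, 1): [0.5, 0.5]}

residuals = {
    key: residual(old_msgs[key], new_msgs[key], counts[key]) for key in counts
}
targets = select_targets(residuals, fraction=0.34)

theta = [[0.0, 0.0], [0.0, 0.0]]  # rows: documents, columns: topics
phi = [[0.0, 0.0], [0.0, 0.0]]    # rows: words, columns: topics
apply_targets(theta, phi, counts, old_msgs, new_msgs, targets)
```

In the next iteration only the elements listed in `targets` would be recalculated, so the work per iteration scales with the number of selected message vectors rather than with all non-zero elements.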

Description

Technical field

[0001] Embodiments of the present invention relate to information technology, and in particular, to a topic mining method and device.

Background

[0002] Topic mining is the process of clustering semantically related words in a large-scale document set by using the Latent Dirichlet Allocation (LDA) machine learning model, so as to obtain, in the form of a probability distribution, the topic of each document in the large-scale document set, that is, the theme the author expresses through the document.

[0003] Topic mining in the prior art first trains the LDA model on training documents using the Belief Propagation (BP) algorithm and determines the model parameters of the trained LDA model, namely the word-topic matrix Φ and the document-topic matrix θ. The word-document matrix of the document to be tested is then input into the trained LDA model for topic mining, so as to obtain the document-topic matrix θ'...


Application Information

IPC(8): G06F17/30, G06N20/00
CPC: G06N20/00, G06F40/30, G06N7/01, G06F16/2465, G06F16/93, G06F16/2237, G06F17/16
Inventors: 曾嘉, 袁明轩, 张世明
Owner: XFUSION DIGITAL TECH CO LTD