Document topic mining method and apparatus

A document topic mining method and apparatus, applied in the field of information processing. It addresses the poor correlation of mined document topic content and the insufficiently comprehensive and accurate topic mining process, making mining more comprehensive and accurate and improving the correlation of document topic content.

Active Publication Date: 2016-01-13
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, the above-mentioned process of PLSA’s semantic mining of documents only considers the relevance of words appearing in the context, and uses multinomial distributions on the vocabulary to re



Detailed Description of Embodiments

[0018] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are intended to explain the present application, and should not be construed as limiting the present application.

[0019] The following describes the document topic mining method and device according to the embodiments of the present application with reference to the accompanying drawings.

[0020] Figure 1 is a flow chart of a document topic mining method according to an embodiment of the present application.

[0021] As shown in Figure 1, the document topic mining method includes:

[0022] Step 101: according to the preset number of topics to mine, use the probabilistic latent semantic analysis model to perform cyclic and iter...


Abstract

The present application proposes a document topic mining method and apparatus. The method comprises: according to a preset number of topics to mine, performing loop iteration processing on the information in at least one received document based on a probabilistic latent semantic analysis model, and acquiring a posterior estimate of each topic implied by each sentence in each document; according to the posterior estimate of each topic, acquiring a membership weight of each word of each sentence in each topic; and generating as many topic sets as the preset number of topics, wherein each topic set comprises the words related to its topic, screened out according to the membership weight of each word of the sentences in that topic. With the document topic mining method and apparatus provided by the present application, document topics are mined more comprehensively and accurately based on the PLSA (Probabilistic Latent Semantic Analysis) algorithm, and the correlation of document topic content is improved, thereby bringing search engine results closer to the semantic information of the document.
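The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not give the update formulas, so everything below is an assumption made for illustration rather than the method as claimed: each sentence is assumed to carry a single topic, the posterior is taken as P(z|d,s) ∝ P(z|d) · ∏ P(w|z) over the words of the sentence, a word's membership weight in a topic is taken as the sum of the posteriors of the sentences containing it, and screening keeps the `top_n` heaviest words. The names `mine_topics`, `top_n`, `smoothing` and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def mine_topics(docs, num_topics, iterations=50, top_n=3, seed=0):
    """docs: list of documents, each a list of sentences, each a list of words.

    Returns (topic_sets, posteriors). Illustrative sentence-level PLSA only;
    the update rules are assumptions, not taken from the patent text.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for sent in doc for w in sent})
    smoothing = 1e-6  # assumed constant that keeps every probability positive

    # Random initial P(w|z), normalised per topic; uniform initial P(z|d).
    p_w_z = []
    for _ in range(num_topics):
        raw = {w: rng.random() + 0.1 for w in vocab}
        total = sum(raw.values())
        p_w_z.append({w: v / total for w, v in raw.items()})
    p_z_d = [[1.0 / num_topics] * num_topics for _ in docs]

    posteriors = []
    for _ in range(iterations):
        # E-step: posterior of each topic for each sentence, computed in log space.
        posteriors = []
        for d, doc in enumerate(docs):
            doc_post = []
            for sent in doc:
                log_scores = [
                    math.log(p_z_d[d][z]) + sum(math.log(p_w_z[z][w]) for w in sent)
                    for z in range(num_topics)
                ]
                peak = max(log_scores)
                scores = [math.exp(s - peak) for s in log_scores]
                norm = sum(scores)
                doc_post.append([s / norm for s in scores])
            posteriors.append(doc_post)

        # M-step: re-estimate P(z|d) and P(w|z) from the sentence posteriors.
        counts = [defaultdict(float) for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            topic_mass = [smoothing] * num_topics
            for sent, post in zip(doc, posteriors[d]):
                for z in range(num_topics):
                    topic_mass[z] += post[z]
                    for w in sent:
                        counts[z][w] += post[z]
            total = sum(topic_mass)
            p_z_d[d] = [m / total for m in topic_mass]
        for z in range(num_topics):
            total = sum(counts[z].values()) + smoothing * len(vocab)
            p_w_z[z] = {w: (counts[z][w] + smoothing) / total for w in vocab}

    # Membership weight of each word in each topic (assumed definition):
    # sum of the final posteriors of the sentences in which the word occurs.
    membership = [defaultdict(float) for _ in range(num_topics)]
    for d, doc in enumerate(docs):
        for sent, post in zip(doc, posteriors[d]):
            for z in range(num_topics):
                for w in sent:
                    membership[z][w] += post[z]

    # Screening: keep the top_n words per topic, ties broken alphabetically.
    topic_sets = [
        [w for w, _ in sorted(membership[z].items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for z in range(num_topics)
    ]
    return topic_sets, posteriors
```

Calling `mine_topics(docs, num_topics=2)` on a small tokenised corpus returns one word list per topic plus the per-sentence posteriors. Which words land in which topic depends on the random initialisation, so the sketch makes no claim about a specific grouping.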

Description

Technical field

[0001] The present application relates to the technical field of information processing, and in particular to a document topic mining method and device.

Background

[0002] At present, people obtain information on the Internet mainly through search engines. The results of traditional document retrieval depend largely on literal matching against documents and cannot handle the hidden semantic information of documents well.

[0003] Therefore, in the prior art, topic models are used to perform semantic mining on documents, and a commonly used topic model algorithm is the PLSA (Probabilistic Latent Semantic Analysis) algorithm. Based on the topic model algorithm, the search engine can automatically obtain the topic distribution behind a document, so that its results are closer to the semantic information of the document, thereby reducing the cost for users to obtain information. It ...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/313, G06F40/216
Inventors: 姜迪, 石磊
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD