
Method and apparatus for determining the topic distribution of a given text

A technique for determining the topic distribution of a text, applied in the Internet field, that improves extraction efficiency.

Active Publication Date: 2017-03-29
BEIJING QIHOO TECH CO LTD
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Existing schemes leave considerable room for improving the speed at which a text's topic distribution is extracted.




Detailed Description of Embodiments

[0023] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

[0024] In the prior art, the topic distribution of a text is generally extracted with the expectation-maximization (EM) method:

[0025] The training samples include the texts $D_1, D_2, \ldots, D_n, \ldots$. First, training a topic model on these texts yields:

[0026] the words $w_1, w_2, \ldots, w_j, \ldots$ and the topics $z_1, z_2, \ldots, z_i, \ldots$ contained in the training samples; and

[0027] $p(w \mid z)$: the word distribution...
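The prior-art EM procedure is cut off above. As a hedged illustration only, the sketch below assumes the common PLSA-style "fold-in" form: the trained word distributions $p(w \mid z)$ are held fixed, and EM iterates only the new text's topic distribution $p(z \mid d)$. The function name `em_fold_in`, the uniform initialization and the fixed iteration count are assumptions, not details taken from the patent.

```python
import numpy as np


def em_fold_in(word_counts, p_w_given_z, n_iter=50):
    """Estimate p(z|d) for one new text by EM with the trained p(w|z) held fixed.

    word_counts: shape (V,), n(d, w) = count of each vocabulary word in the text.
    p_w_given_z: shape (K, V), row k is the trained word distribution p(w|z_k);
                 assumed smoothed so no word has zero probability under every topic.
    """
    K = p_w_given_z.shape[0]
    theta = np.full(K, 1.0 / K)            # initial p(z|d): uniform (an assumption)
    idx = np.nonzero(word_counts)[0]       # only words present in the text matter
    n = word_counts[idx]                   # their counts, shape (V_d,)
    phi = p_w_given_z[:, idx]              # their p(w|z) columns, shape (K, V_d)
    for _ in range(n_iter):
        # E-step: p(z|d,w) ∝ p(z|d) p(w|z); a fresh K x V_d matrix every iteration
        joint = theta[:, None] * phi
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step: p(z|d) ∝ sum_w n(d,w) p(z|d,w), then renormalize
        theta = (resp * n).sum(axis=1)
        theta /= theta.sum()
    return theta
```

Each iteration allocates a K x V_d responsibility matrix and repeats until convergence. This iterative bookkeeping is plausibly the kind of intermediate-variable overhead the abstract refers to, although the text shown here does not state that explicitly.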



Abstract

Disclosed are a method and apparatus for determining the topic distribution of a given text. The method comprises: determining a specific word appearing in the given text and the frequency with which it appears there, the specific word belonging to the word set contained in the training sample; obtaining the topic distribution of the specific word from the result of topic model training on the training sample; and determining the topic distribution of the given text from the frequency of the specific word in the given text and the topic distribution of the specific word. The method and apparatus increase the efficiency of extracting a text's topic distribution, and also reduce the extra overhead on memory, the CPU (Central Processing Unit) and other system resources that is caused by introducing a large number of intermediate variables during extraction.
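The abstract gives no formulas. The following is a minimal sketch under two stated assumptions: (1) the "topic distribution of a specific word" is $p(z \mid w)$, obtained once after training from $p(w \mid z)$ and a topic prior $p(z)$ by Bayes' rule; and (2) "according to the frequency" means a frequency-weighted average of those per-word distributions. The helper names `word_topic_distribution` and `text_topic_distribution` and the `vocab_index` mapping are hypothetical.

```python
from collections import Counter

import numpy as np


def word_topic_distribution(p_w_given_z, p_z):
    """Return p(z|w) for every vocabulary word, shape (V, K).

    Assumption: Bayes' rule p(z|w) ∝ p(w|z) p(z); every word must have
    nonzero probability under at least one topic.
    """
    joint = p_w_given_z * p_z[:, None]                   # (K, V): p(w|z) p(z)
    return (joint / joint.sum(axis=0, keepdims=True)).T  # normalize over topics


def text_topic_distribution(tokens, vocab_index, p_z_given_w):
    """Frequency-weighted average of p(z|w) over the training-set words in the text."""
    # Step 1: words of the text that belong to the training vocabulary, with frequencies.
    counts = Counter(t for t in tokens if t in vocab_index)
    if not counts:
        raise ValueError("text contains no word from the training vocabulary")
    # Steps 2 and 3: look up each word's topic distribution and weight it by frequency.
    theta = np.zeros(p_z_given_w.shape[1])
    for word, freq in counts.items():
        theta += freq * p_z_given_w[vocab_index[word]]
    return theta / theta.sum()  # normalize so the result is a probability distribution
```

With $p(z \mid w)$ precomputed once, each new text needs a single pass over its distinct known words (about $O(V_d K)$ work) and no iteration. Under these assumptions that is consistent with the efficiency claim, but it is an illustration rather than the patented procedure.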

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a method and a device for determining the topic distribution of a given text.

Background

[0002] A topic model is a statistical model used to discover the abstract topics in a collection of texts. A text usually contains several topics, each in a different proportion, and a topic model attempts to characterize this topic distribution within a mathematical framework. The topic model automatically analyzes each text, counts the words in it, and from these statistics determines which topics the text contains and in what proportion.

[0003] Topic models are not only a popular research subject in machine learning and data mining but have also been applied in practice in many fields. For example, in search engines, the correlation between query words (Query) and web pages involves te...


Application Information

Patent Type & Authority: Patent (China)
IPC (8th ed.): G06F17/30; G06F17/27
CPC: G06F16/313; G06F16/353; G06F16/36
Inventor: Hu Deyong (胡德勇)
Owner: BEIJING QIHOO TECH CO LTD