Method and device for judging document relevance based on the PLSA algorithm

A document relevance judgment technology, applied in the field of network communication, that addresses the problem of long computing time by reducing the amount of computation and improving computing efficiency.

Active Publication Date: 2016-10-26
BEIJING QIHOO TECH CO LTD

AI Technical Summary

Problems solved by technology

In the prior art, the calculation takes a very long time, often a whole day when the amount of data is large.

Detailed Description of Embodiments

[0022] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0023] Embodiments of the present invention provide a method and device for judging document relevance based on the PLSA algorithm, so as to solve the problem of long calculation time in the prior art.

[0024] Figure 1 shows a flowchart of a method for judging document relevance based on the PLSA algorithm provided by an embodiment of the present invention. As shown in Figure 1, the method starts at step S110. In step...

Abstract

The invention relates to a method and device for judging document relevance based on the PLSA algorithm. The method comprises: determining the elements to be processed and the categories corresponding to the elements according to the information of a document; taking the number N of elements as the number of columns of the calculation array used in the PLSA algorithm and the number M of categories as its number of rows, where the array data in the calculation array represent the correspondence between the elements and the categories, and N and M are both natural numbers; clustering the N elements to be processed into the M categories by a hard clustering algorithm, thereby obtaining the membership value of each element in every category; initializing the array data in the calculation array according to those membership values; performing iterative operations on the initialized array data according to the PLSA algorithm; and judging whether the document is relevant according to the result of the operations. The problem of long calculation time in the prior art is thereby solved.
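
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the patented implementation: the abstract does not name a hard clustering algorithm, so plain k-means stands in for it; the elements are taken to be words described by their per-document counts; a small `eps` replaces the zero memberships so that the iteration can still move mass between categories; and relevance is scored as the cosine similarity of the per-document category mixtures. All function and parameter names (`hard_cluster`, `init_array`, `plsa`, `relevance`, `eps`) are hypothetical.

```python
import math
import random

def hard_cluster(vectors, m, iters=10, seed=0):
    """Plain k-means: assigns each element to exactly one of m categories."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, m)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        for k in range(m):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def init_array(labels, m, eps=0.01):
    """Build the M x N calculation array from hard memberships.

    Entry [k][n] is 1 if element n was clustered into category k, else eps
    (eps is an assumption of this sketch); each row is then normalized to sum to 1.
    """
    rows = [[1.0 if lab == k else eps for lab in labels] for k in range(m)]
    return [[x / sum(row) for x in row] for row in rows]

def plsa(counts, p_w_z, iters=30):
    """Standard PLSA EM iterations, starting from the given M x N array."""
    n_docs, n, m = len(counts), len(counts[0]), len(p_w_z)
    p_z_d = [[1.0 / m] * m for _ in range(n_docs)]
    for _ in range(iters):
        new_wz = [[0.0] * n for _ in range(m)]
        new_zd = [[0.0] * m for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over categories for this (document, element) pair
                joint = [p_z_d[d][k] * p_w_z[k][w] for k in range(m)]
                total = sum(joint) or 1.0
                for k in range(m):
                    share = counts[d][w] * joint[k] / total
                    new_wz[k][w] += share
                    new_zd[d][k] += share
        # M-step: renormalize the accumulated expected counts
        p_w_z = [[x / (sum(row) or 1.0) for x in row] for row in new_wz]
        p_z_d = [[x / (sum(row) or 1.0) for x in row] for row in new_zd]
    return p_z_d, p_w_z

def relevance(p, q):
    """Cosine similarity between two category mixtures (an assumed scoring rule)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Toy data: 4 documents (rows) x N = 6 elements (columns), M = 2 categories.
counts = [
    [2, 1, 0, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 0, 1, 2],
]
m = 2
word_vectors = [list(col) for col in zip(*counts)]  # one vector per element
labels = hard_cluster(word_vectors, m)
p_z_d, p_w_z = plsa(counts, init_array(labels, m))
score = relevance(p_z_d[0], p_z_d[1])
print(labels, round(score, 3))
```

Starting the iteration from cluster memberships rather than from random values is the part that corresponds to the abstract; how much this shortens the computation on real data is not something this toy example can show.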

Description

Technical field

[0001] The invention relates to the technical field of network communication, and in particular to a method and device for judging document relevance based on the PLSA algorithm.

Background

[0002] At present, the traditional way to judge the relevance of two documents is to count the words that appear in both. For example, algorithms such as TF-IDF (term frequency–inverse document frequency) can be used. However, this approach relies on simple vocabulary matching and does not take into account the semantic association behind the words. Sometimes two documents share few or no words, yet they are substantially similar because their semantics are related. Therefore, to improve the accuracy of the judgment, it is also necessary to consider the semantic association of vocabulary when judging the relevan...
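
The limitation of vocabulary matching described in the background can be illustrated with a short sketch. It assumes a simplified TF-IDF variant (relative term frequency multiplied by log(N/df)) and whitespace tokenization; the example sentences and the helper names `tf_idf_vectors` and `cosine` are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Simplified TF-IDF: tf = relative frequency, idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

docs = [
    "the physician treated the patient",
    "a doctor cured an ill person",      # same meaning as docs[0], no shared words
    "the physician met the patient",     # shares words with docs[0]
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # no shared vocabulary, so the score is 0.0
print(cosine(vecs[0], vecs[2]))  # shared vocabulary, so the score is positive
```

The first pair is semantically close but scores zero because no term overlaps, which is the gap that a topic model such as PLSA is meant to close.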

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/30
CPC: G06F16/334, G06F16/35
Inventors: 何锐邦, 唐会军
Owner: BEIJING QIHOO TECH CO LTD