A method and device for acquiring page similarity

A page similarity technology, applied in the computer field, that addresses the high misjudgment rate and low accuracy of page similarity judgments that ignore the differing weights of different page blocks, and thereby achieves higher accuracy.

Active Publication Date: 2018-05-01
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

The defect of the prior art is that it does not consider the different weights of different page blocks within the whole page. When one of the two pages being compared contains, for example, a message block, the keyword overlap of the two pages may be low even though the content of their other page blocks is similar. This causes a high misjudgment rate, which lowers the accuracy of page similarity judgments and, in turn, the accuracy of duplicate web page filtering.




Embodiment Construction

[0023] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0024] Figure 1 shows a schematic diagram of a device for acquiring page similarity according to one aspect of the present invention. The acquisition device 1 includes a first similarity determination means 111 and a second similarity determination means 112.

[0025] Here, the acquisition device 1 is a network device, which includes but is not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. The cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.

[0026] With reference to Figure 1, the following describes in detail the process by which the acquisition device 1 obtains page s...


Abstract

The purpose of the present invention is to provide a method and device for acquiring page similarity. The invention first determines the block similarity between one or more page blocks in one page and one or more page blocks in another page. It then determines the page similarity of the two pages by weighting, according to the weight of each page block in the two pages and its block similarity. The weight of each page block is thus introduced into the page similarity criterion, and accurate weighting of different page blocks reflects their difference in value. This yields a more accurate page similarity judgment, which in turn supports more accurate filtering of duplicate web pages.
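The two-step scheme in the abstract can be illustrated with a minimal sketch. The excerpt does not say how block similarity is computed, how blocks of the two pages are paired, or how weights are combined, so the following choices are all assumptions made for illustration: blocks are paired by a shared name, block similarity is the Jaccard overlap of token sets, and page similarity is the weight-normalized sum of block similarities. The names `block_similarity`, `page_similarity`, `body`, and `messages` are hypothetical.

```python
from typing import Dict, Set

Page = Dict[str, Set[str]]  # block name -> tokens in that block (assumed representation)


def block_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; an assumed block-level measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def page_similarity(page_a: Page, page_b: Page, weights: Dict[str, float]) -> float:
    """Weight-normalized sum of block similarities over the named blocks."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        sim = block_similarity(page_a.get(name, set()), page_b.get(name, set()))
        total += weight * sim
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Two pages with the same main content but unrelated message blocks.
page_a = {"body": {"page", "block", "weight", "similarity"}, "messages": {"nice", "post"}}
page_b = {"body": {"page", "block", "weight", "similarity"}, "messages": {"spam", "link"}}

# Assumed weights: the main body matters far more than the message block.
weights = {"body": 0.9, "messages": 0.1}
score = page_similarity(page_a, page_b, weights)
```

With these assumed weights the differing message blocks barely lower the score, which is the behavior the abstract attributes to block weighting.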

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a technology for acquiring page similarity.

Background technique

[0002] In the prior art, the similarity of web pages is generally determined from web page keywords. For example, a web page can be parsed to extract its keywords, other web pages containing all or most of those keywords can then be queried, and the keyword overlap of the two web pages can be calculated to determine their page similarity. The defect of the prior art is that it does not take into account the different weights of different page blocks within the entire page. When one of the two pages being compared contains, for example, a message block, the keyword overlap of the two pages may be low even though the content of their other page blocks is similar. This causes a high misjudgment rate, resulting in low accuracy in judging page similarity, further reducing the ...
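The defect described in the background can be seen in a small sketch of the keyword-overlap approach. The excerpt does not define how overlap is measured, so Jaccard overlap of keyword sets is assumed here, and the function name `keyword_overlap` and the sample keywords are hypothetical.

```python
from typing import Set


def keyword_overlap(keywords_a: Set[str], keywords_b: Set[str]) -> float:
    """Jaccard overlap of two pages' keyword sets (an assumed measure)."""
    union = keywords_a | keywords_b
    return len(keywords_a & keywords_b) / len(union) if union else 0.0


# Both pages carry the same article; only one also has a message block.
article = {"page", "similarity", "block", "weight"}
message_block = {"thanks", "great", "post", "spam", "link", "reply"}

page_a = article
page_b = article | message_block

# 4 shared keywords out of 10 distinct ones, although the articles are identical.
overlap = keyword_overlap(page_a, page_b)
```

Because every keyword counts equally, the message block alone pulls the overlap well below what the identical article content would suggest.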


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 胡蓉赵枫孙立波
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD