Web page similarity calculation method and web page similarity calculation device

A webpage similarity calculation technology, applied in the field of computer networks, that addresses the problem of inaccurate judgment when comparing two web pages and improves the accuracy of the comparison.

Active Publication Date: 2014-11-05
HARBIN INST OF TECH AT WEIHAI

AI Technical Summary

Problems solved by technology

[0006] The embodiment of the present invention provides a webpage similarity calculation method and device to solve the existing problem of inaccurate judgment when comparing the similarity of two webpages.


Embodiment Construction

[0049] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention.

[0050] Figure 2 is a schematic flowchart of a method for calculating webpage similarity provided by an embodiment of the present invention. The method is mainly used to judge how similar different pages are as actually displayed in the browser, and is usually performed by a webpage similarity calculation device. As shown in Figure 2, the method includes the following steps:

[0051] 10. Generate a visual structure-based first block feature vector corresponding to the webpage to be tested, wherein the first block feature vector includes a first block position feature vector and a first block content feature vector.

[0052] Wherein, the webpage...

Abstract

The embodiment of the invention discloses a webpage similarity calculation method and a webpage similarity calculation device, which are applied in the field of computer networks and can solve the prior-art problem of inaccurate judgment when comparing the similarity of two webpages. The method comprises the following steps: a visual structure-based first block feature vector corresponding to the webpage to be tested is generated, wherein the first block feature vector comprises a first block position feature vector and a first block content feature vector; and the first block feature vector is compared with a visual structure-based second block feature vector corresponding to a preset webpage, to obtain the similarity of the webpage to be tested. The method and the device provided by the embodiment of the invention are applied to the similarity comparison of webpages.
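The comparison outlined in the abstract can be illustrated with a minimal Python sketch. The text shown here does not say how the block feature vectors are built or compared, so everything below is an assumption made for illustration: the `Block` layout, the use of cosine similarity, the weight `w_pos`, and the best-match averaging in `page_similarity` are hypothetical choices, not the patented method.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass
class Block:
    # Hypothetical position feature: (x, y, width, height), normalized to the page size.
    position: Tuple[float, float, float, float]
    # Hypothetical content feature: a fixed-length numeric vector,
    # for example counts of text characters, links, and images in the block.
    content: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def block_similarity(p: Block, q: Block, w_pos: float = 0.5) -> float:
    """Weighted mix of position similarity and content similarity (assumed scheme)."""
    return w_pos * cosine(p.position, q.position) + (1 - w_pos) * cosine(p.content, q.content)


def page_similarity(tested: Sequence[Block], preset: Sequence[Block], w_pos: float = 0.5) -> float:
    """Average, over blocks of the tested page, of the best match in the preset page."""
    if not tested or not preset:
        return 0.0
    best = [max(block_similarity(p, q, w_pos) for q in preset) for p in tested]
    return sum(best) / len(best)
```

Note that this averaging is asymmetric: it scores how well the tested page is covered by the preset page, which fits the abstract's framing of comparing a page under test against a preset reference.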

Description

Technical Field

[0001] The invention relates to the field of computer networks, in particular to a method and device for calculating the similarity of webpages.

Background

[0002] Originally, webpage similarity referred to the percentage of code bytes in the identical parts of two webpages relative to the total bytes of the two webpages. Through similarity comparison, pages can be filtered by content. Reducing webpage similarity is an important step in website optimization. With the development of Internet technology and the emergence of new detection requirements, the calculation of webpage similarity is no longer limited to bytecode comparison. Since the advent of Web 2.0, web development has moved toward a clear separation between front end and back end. The front end focuses on the display form of page content, which is implemented on the browser side through scripts and layouts, while the back end focuses on business logic and provid...
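The byte-level definition in [0002] can be sketched in Python. This is one possible reading of that definition, not a formula given in the source: it treats the "identical parts" as the matching byte runs found by the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns 2M/T, where M is the number of matched bytes and T is the combined length of both inputs. The function name `byte_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def byte_similarity(page_a: bytes, page_b: bytes) -> float:
    """Share of matching bytes in the combined byte length of two pages (2M/T)."""
    # autojunk=False disables difflib's popularity heuristic, which would
    # otherwise ignore very frequent bytes in long inputs.
    matcher = SequenceMatcher(None, page_a, page_b, autojunk=False)
    return matcher.ratio()
```

A measure of this kind sees only the source bytes, so two pages that render alike but are generated by different scripts can score low, which is the limitation the background goes on to describe.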

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/95
Inventor: 魏玉良, 吕芳, 邹新一, 王佰玲, 黄俊恒, 刘扬
Owner: HARBIN INST OF TECH AT WEIHAI