Web useless link filtering method based on content relevancy

A method for filtering invalid links, applied in the field of Internet search. It addresses the problem that an unfiltered link structure cannot correctly reflect the relationships between web pages, so that ranking results are no longer true and effective. It achieves a more reasonable assumption of link relevance and improved effectiveness.

Inactive Publication Date: 2011-11-09
GUANGDONG UCAP INTERNET INFORMATION TECH
3 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

If useless links are not filtered out, the constructed link structure graph cannot correctly reflect the relationships between web pages.



Examples


Embodiment Construction

[0031] The Web invalid link filtering method proposed by the present invention can be roughly divided into two parts. The first part uses the position of text within a web page, together with a statistical method, to remove links such as irrelevant advertisements and navigation. The second part, building on the first, analyzes the correlation between the content of the web page and the content of each page its links point to, and removes the links whose content is irrelevant. Each part is described in detail below.
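The second part, content correlation analysis, can be illustrated with a minimal Python sketch. The visible text does not say which relevance measure the invention uses, so cosine similarity over word-count vectors is an assumption made here for illustration only. The names `term_vector`, `cosine_similarity`, `filter_links_by_relevance` and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts (an assumed, simplified text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_links_by_relevance(page_text, linked_pages, threshold=0.1):
    """Keep the URLs whose target text is similar enough to the source page.

    linked_pages maps each link URL to the text of the page it points to.
    The threshold is an illustrative value, not one taken from the patent.
    """
    source = term_vector(page_text)
    return [
        url
        for url, text in linked_pages.items()
        if cosine_similarity(source, term_vector(text)) >= threshold
    ]
```

In practice the threshold would have to be tuned on real pages, and a weighting scheme such as TF-IDF might serve better than raw counts. Both choices are outside what the text above specifies.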

[0032] 1. Filtering based on text position

[0033] At present, most web pages are created from a unified template, and in a typical web page the creator places topic-related links below the main text. This part of the filtering work is based on that assumption. The filtering work includes first converting the HTML document into a DOM tree structure, and the...
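The assumption that topic-related links sit below the main text can be sketched in Python. The description is cut off before it gives the actual statistical rule, so the heuristic here is an assumption: the longest text block is treated as the main text, and only the links that follow it in document order are kept. The sketch uses the standard library's event-based `html.parser` instead of building a full DOM tree. The names `LinkPositionParser` and `links_after_main_text` are hypothetical.

```python
from html.parser import HTMLParser


class LinkPositionParser(HTMLParser):
    """Records text blocks and link targets in document order."""

    def __init__(self):
        super().__init__()
        self.events = []  # ("text", str) or ("link", href), in document order
        self._skip = 0    # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.events.append(("link", href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.events.append(("text", text))


def links_after_main_text(html):
    """Return hrefs that appear after the longest text block (assumed main text)."""
    parser = LinkPositionParser()
    parser.feed(html)
    parser.close()
    text_positions = [
        i for i, (kind, _) in enumerate(parser.events) if kind == "text"
    ]
    if not text_positions:
        return []
    main = max(text_positions, key=lambda i: len(parser.events[i][1]))
    return [value for kind, value in parser.events[main + 1:] if kind == "link"]
```

Under this heuristic, navigation and advertisement links placed before the main text are dropped, while a list of related links after it is kept.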



Abstract

The invention discloses a Web useless link filtering method based on content relevancy. The method comprises the following steps: removing irrelevant advertisement links and navigation links in a page by a statistical method that uses text position information in the page; and carrying out relevancy analysis between the content of the page and the content of the linked pages, and removing useless links whose content is irrelevant. The method removes useless links more effectively. Computing page rank on the purified link structure graph improves the page rank result: the quality of pages with a high page rank improves and more high-value websites are brought in, among other benefits.
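To illustrate computing page rank on a purified link structure graph, here is a minimal Python sketch of the standard power-iteration PageRank. It is a textbook formulation, not code from the patent. The function name `pagerank`, the damping factor of 0.85, the iteration count and the example graph are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to its out-links.

    Pages with no out-links spread their rank evenly over all pages.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical example: the same two pages before and after an
# advertisement link ("ad") has been filtered out.
raw_graph = {"a": ["b", "ad"], "b": ["a"], "ad": []}
purified_graph = {"a": ["b"], "b": ["a"]}
raw_ranks = pagerank(raw_graph)
purified_ranks = pagerank(purified_graph)
```

In the raw graph part of the rank flows to the advertisement page. In the purified graph it stays with the two content pages.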

Description

Technical field

[0001] The invention relates to a method for filtering useless links in Web pages, in particular to a method for filtering useless links in Web pages based on content correlation analysis, and belongs to the technical field of Internet search.

Background

[0002] With the rapid development of the Internet, search engines for Internet information queries are playing an increasingly important role. The main task of a search engine is to find relevant web pages and return them to users in order of page importance. As the number of Web pages grows, page content becomes richer and page links become more varied, search engines are becoming more and more "powerless". There are many reasons for this, the most important of which is the increasing proliferation of invalid links in Web pages.

[0003] After analysis, links in web pages can be divided into the following four categories:

[0004] Artificially generated links: Most of these links a...


Application Information

IPC(8): G06F17/30
Inventor 汪敏刘轩山
Owner GUANGDONG UCAP INTERNET INFORMATION TECH