Web useless link filtering method based on content relevancy
A method for filtering invalid Web links, applied in the field of Internet search. It addresses the problem that invalid links do not correctly reflect the relationships between web pages, so that ranking results are no longer accurate or reliable. Filtering these links makes the assumption of link relevance more reasonable and improves the effectiveness of the results.
Embodiment Construction
[0031] The Web invalid-link filtering method proposed by the present invention can be roughly divided into two parts. The first part uses the position of text within the web page and a statistical method to remove irrelevant links such as advertisements and navigation. The second part, building on the first, analyzes the correlation between the content of the web page and the content of the page each link points to, and removes links whose content is irrelevant. Each part is described in detail below.
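The second part can be illustrated with a minimal sketch. The text above does not say which relevance measure is used, so this example assumes cosine similarity over word-count vectors. The function names, the tokenizer, and the threshold of 0.1 are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercase the text and count word tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def filter_by_relevance(page_text, linked_pages, threshold=0.1):
    """Keep links whose target text is similar enough to the source page.

    linked_pages maps URL -> text of the page that the link points to.
    The threshold is an illustrative assumption, not a value from the source.
    """
    source = term_vector(page_text)
    return [
        url
        for url, target_text in linked_pages.items()
        if cosine_similarity(source, term_vector(target_text)) >= threshold
    ]
```

In practice the input to `filter_by_relevance` would be the links that survive the position-based first part, so that target pages are fetched and compared only for links that are not obvious navigation or advertising.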
[0032] 1. Filtering based on text position
[0033] At present, most web pages are created from a unified template, and on typical pages the creator places topic-related links below the main text of the page. This part of the filtering is based on that assumption. The filtering first converts the HTML document into a DOM tree structure, and the...
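The position assumption can be sketched as follows. This is a simplified illustration, not the patented procedure: instead of building a full DOM tree it walks the HTML in document order with Python's standard `html.parser`, treats the longest text block as the main text (an assumed heuristic), and keeps only the links that appear after it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkPositionParser(HTMLParser):
    """Record text blocks and link targets in document order."""

    def __init__(self):
        super().__init__()
        self.events = []         # ("text", str) or ("link", href), in order
        self._skip_depth = 0     # > 0 while inside <script> or <style>
        self._anchor_depth = 0   # > 0 while inside <a>, so anchor text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._anchor_depth += 1
            href = dict(attrs).get("href")
            if href:
                self.events.append(("link", href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth and not self._anchor_depth:
            self.events.append(("text", text))


def links_below_main_text(html: str) -> list:
    """Return hrefs that occur after the longest text block in the page."""
    parser = LinkPositionParser()
    parser.feed(html)
    parser.close()

    # Assumed heuristic: the longest non-anchor text block is the main text.
    main_index, longest = None, 0
    for i, (kind, value) in enumerate(parser.events):
        if kind == "text" and len(value) > longest:
            main_index, longest = i, len(value)
    if main_index is None:
        return []
    return [v for kind, v in parser.events[main_index + 1:] if kind == "link"]
```

On a template-style page with a navigation bar above the article and a related-links list below it, this keeps the related links and drops the navigation links. A real implementation would need a more robust, statistics-based way to locate the main text than picking the single longest block.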