Topic Grabbing Method Based on Anchor Text Context and Link Analysis

Applied in the field of Internet search, this technique combines link analysis with anchor text context and addresses problems such as poor crawler scalability, high iteration cost, and insufficient transfer of link quality.

Active Publication Date: 2017-02-15
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] Search engines such as Google and Baidu are the main gateways through which people obtain network resources, but research shows that general-purpose search engines have certain limitations:
[0018] Guan et al. pointed out the limitations of this algorithm: it must store the web graph of the downloaded pages and must iteratively access and modify that graph. As the crawler runs, the web graph grows larger and the cost of each iteration becomes very large, seriously affecting the scalability of the crawler.
[0020] However, this method does not explain why recrawling only the highest-quality pages is enough to solve the problem of insufficient link quality transfer during dynamic discovery of the network. In addition, this method uses a window around the link to predict the link's relevance to the topic, and its definition of the window is unreasonable.


Examples

Detailed Description of the Embodiments

[0080] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments:

[0081] (1) Calculation of host priority and calculation of link local priority

[0082] These two calculations are carried out in parallel and are described separately below.

[0083] (1) Calculation of host priority.

[0084] The hosts on the network form a directed graph: each host is a node in the graph, and the links between hosts form its edges. The priority of a host is calculated by a cash (money) transfer method. The basic idea is as follows: give each seed host a certain initial cash value (the seed hosts are the starting points chosen for traversing the graph), then begin traversing the hosts; during the traversal, the cash value of the current host is distributed to the hosts that it links to. Finally, the priority of the host is judged according to the cas...
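The cash transfer idea described above can be illustrated with a minimal Python sketch. The source text is truncated before it states how cash is split or how the final priority is read off, so several details here are assumptions rather than the patent's method: cash is split equally among the linked hosts, the traversal is breadth-first and processes each host once, and hosts are scored by the total cash they have received. The names `host_cash_priority`, `host_links`, and `seed_hosts` are hypothetical.

```python
from collections import defaultdict, deque


def host_cash_priority(host_links, seed_hosts, initial_cash=1.0):
    """Propagate cash from seed hosts across a directed host graph.

    host_links maps a host name to the list of hosts it links to.
    Returns a dict of host -> total cash received (assumed ranking score).
    """
    cash = defaultdict(float)      # cash currently held by each host
    received = defaultdict(float)  # total cash ever received (assumption)

    # Give every seed host the same initial cash value.
    for host in seed_hosts:
        cash[host] += initial_cash
        received[host] += initial_cash

    queue = deque(seed_hosts)
    visited = set(seed_hosts)

    while queue:
        host = queue.popleft()
        targets = host_links.get(host, [])
        if not targets:
            continue  # a host with no out-links keeps its cash

        # Assumption: the current host's cash is split equally.
        share = cash[host] / len(targets)
        cash[host] = 0.0
        for target in targets:
            cash[target] += share
            received[target] += share
            if target not in visited:
                visited.add(target)
                queue.append(target)

    return dict(received)


# Hypothetical three-host graph used only for illustration.
example_graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
}
scores = host_cash_priority(example_graph, ["a.example"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because each host is processed once, a single pass costs time proportional to the number of host-level links, which is far smaller than the page-level web graph.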

Abstract

The invention relates to Internet search technology and aims to provide a topic grabbing method based on anchor text context and link analysis. The method includes: computing the global priority of a link, computing the local priority of the link, and computing the final priority of the link. Its advantages are that web page quality can be estimated quickly, the context of a link can be acquired, and the accuracy of predicting the link's topic relevance can be increased according to that context.

Description

Technical field

[0001] The invention relates to Internet search technology, in particular to a topic grabbing method based on anchor text context and link analysis.

Background technique

[0002] With the rapid development of the Internet, Internet products of all kinds keep emerging, such as social networking, instant messaging, online shopping, personal blogs, and vertical communities. These products have changed the way people obtain information. Especially with the advent of the Web 2.0 era, everyone is a producer of information. According to the "31st Statistical Report on Internet Development in China" released by CNNIC, China's Internet penetration rate exceeds 40%, and the number of Internet users has reached 564 million.

[0003] The information on the Internet is also growing explosively. According to research, at the beginning of the 21st century, Google indexed 2 billion web pages, while the size of the Internet at that time was about 4 billion to 10 bill...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 郑小林, 陈德人, 林臻, 郭华
Owner: ZHEJIANG UNIV