PageRank method based on quick similarity

A fast-similarity technique, applied in the fields of Web structure mining and information retrieval, that addresses problems such as increased algorithm complexity and reduced applicability.

Status: Inactive. Publication date: 2011-11-23
NANJING UNIV OF INFORMATION SCI & TECH
Cites: 1; cited by: 19

AI Technical Summary

Problems solved by technology

However, the vector space model algorithm involves a large number of multiplication operations, which further increases the complexity of the algorithm and reduces its applicability.
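To illustrate the cost difference described above, the sketch below contrasts a vector space model cosine similarity, which needs multiplications for the dot product and for each norm, with a Hamming distance on bit vectors, which needs only an XOR and a bit count. The bitmask encoding and the function names are illustrative assumptions, not taken from the patent.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Vector space model similarity: every dimension costs multiplications."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def hamming_distance(a: int, b: int) -> int:
    """Count of differing bits: one XOR plus a bit count, no multiplications."""
    return bin(a ^ b).count("1")

def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Normalize the distance to a similarity in [0, 1]."""
    return 1.0 - hamming_distance(a, b) / n_bits
```

For the vectors `[1, 0, 1, 1]` and `[1, 0, 0, 1]` (bitmasks `0b1011` and `0b1001`), the Hamming distance is 1 and the Hamming similarity is 0.75, while the cosine similarity is about 0.816.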

Detailed Description of the Embodiments

[0035] To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0036] The PageRank algorithm based on fast similarity comprises the following two improvements:

[0037] 1) Improve search precision: use the Hamming distance similarity method to calculate the similarity between the title of a web page and the search term, and use this value as one of the criteria for ranking the search result pages. In other words, the Hamming distance similarity algorithm is combined with the PageRank algorithm to mitigate the topic drift problem of PageRank.

[0038] 2) Improve search recall: web pages containing synonyms of the search term may also be relevant to the search topic, so searching for synonyms of the search term can be added to the search process. For this reason, it is necessary to improve the Hamming distanc...
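The two improvements can be pictured with a small sketch. The source does not give the exact encoding of titles and search terms, the synonym source, or the final combination formula, so everything here is an assumption for illustration: titles and terms are split into words and encoded as bit vectors over a fixed vocabulary, the Hamming similarity is taken as 1 - d/n (d is the Hamming distance, n the vector length), synonyms come from a hypothetical table `SYNONYMS`, the best match over the term and its synonyms is kept, and the result is blended with a PageRank value through a hypothetical weight `alpha`.

```python
# Hypothetical synonym table; the patent does not specify its synonym source.
SYNONYMS: dict[str, list[str]] = {"car": ["automobile", "vehicle"]}

def to_bits(tokens: set[str], vocabulary: list[str]) -> int:
    """Encode a token set as a bitmask over a fixed vocabulary (assumed encoding)."""
    bits = 0
    for i, word in enumerate(vocabulary):
        if word in tokens:
            bits |= 1 << i
    return bits

def hamming_similarity(a: set[str], b: set[str], vocabulary: list[str]) -> float:
    """1 - d/n, where d is the Hamming distance and n the vector length."""
    distance = bin(to_bits(a, vocabulary) ^ to_bits(b, vocabulary)).count("1")
    return 1.0 - distance / len(vocabulary)

def title_similarity(title: str, term: str, vocabulary: list[str],
                     synonyms: dict[str, list[str]]) -> float:
    """Improvement 2 (sketch): best similarity over the term and its synonyms."""
    title_tokens = set(title.lower().split())
    variants = [term] + synonyms.get(term, [])
    return max(hamming_similarity(title_tokens, {v}, vocabulary) for v in variants)

def rank(pages: list[tuple[str, str, float]], term: str, vocabulary: list[str],
         synonyms: dict[str, list[str]], alpha: float = 0.5) -> list[str]:
    """Improvement 1 (sketch): blend title similarity with PageRank.

    pages holds (url, title, pagerank) tuples; pagerank is assumed to be
    normalized to [0, 1]. The weight alpha is a hypothetical parameter.
    """
    def score(page: tuple[str, str, float]) -> float:
        _, title, pr = page
        sim = title_similarity(title, term, vocabulary, synonyms)
        return alpha * sim + (1 - alpha) * pr

    return [url for url, _, _ in sorted(pages, key=score, reverse=True)]
```

With the vocabulary `["car", "automobile", "vehicle", "price", "review"]` and the term `"car"`, the title `"automobile review"` scores 0.4 without synonyms but 0.8 once `"automobile"` is tried as a synonym. For the pages `("a.html", "automobile review", 0.3)` and `("b.html", "price list", 0.4)`, this flips the ranking from `b.html` first to `a.html` first, which is the kind of recall gain the second improvement aims at.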


Abstract

The invention discloses a PageRank method based on quick similarity. By combining the PageRank algorithm with an improved Hamming distance similarity algorithm, the invention provides a novel web page ranking method. The PageRank algorithm analyzes only the links between web pages and is therefore prone to topic drift. To address this problem, the algorithm makes two improvements: (1) the Hamming distance similarity algorithm is used to calculate the similarity between the search term and the web page text, which improves search precision; (2) to improve search recall, the Hamming distance similarity algorithm is extended to also search for synonyms of the search term, which enlarges the search scope. From these two improvements, a calculation formula for the quick-similarity PageRank algorithm is obtained, so that search requirements are met in terms of both recall and precision.

Description

Technical field

[0001] The invention is a PageRank method based on fast similarity and belongs to the field of Web structure mining and information retrieval. Related knowledge includes computer technology, database technology, statistics, coding theory, etc.

Background

[0002] The PageRank algorithm was proposed by S. Brin, L. Page, et al. in 1998. It is a web page ranking algorithm that takes the link relationships in the network as its research object. … possible to meet the user's search needs. The Google search engine combines complex text matching algorithms with the PageRank algorithm, and the successful application of PageRank in Google shows that the algorithm is very effective in search engines. The PageRank algorithm iteratively calculates a PageRank value for each web page; this value represents the authority of the page in the network. The higher the value, the higher the authority, and the hi...
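The iterative PageRank computation described in the background can be sketched with power iteration. This sketch uses the common normalized form PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)) over pages q linking to p, with damping factor d = 0.85, and spreads the rank of pages without outlinks evenly over all pages. These are conventional choices, not details taken from the patent.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank over an adjacency list {page: [pages it links to]}."""
    # Collect every page, including pages that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in ordered}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in ordered if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in ordered}
        for p in ordered:
            out = links.get(p, [])
            if out:
                share = d * rank[p] / len(out)  # each outlink passes an equal share
                for q in out:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in ordered)
        rank = new
        if delta < tol:  # stop once the total (L1) change is negligible
            break
    return rank
```

For `pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})`, page `c` has no inbound links and keeps only the base value of about 0.05, while `a` (linked from both `b` and `c`) ends up at roughly 0.486, ahead of `b` at roughly 0.464. Redistributing dangling rank keeps the values summing to 1, so they can be read as the authority scores the background describes.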

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
Inventors: 毕硕本马燕乔文文汪大
Owner: NANJING UNIV OF INFORMATION SCI & TECH