
Rapid retrieval method and system for mass website basic information

This rapid retrieval method and system for the basic information of massive numbers of websites addresses the low efficiency and high result-repetition rate of existing retrieval approaches, and reduces IO operations.

Pending Publication Date: 2015-10-28
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
2 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0008] To address the deficiencies of the prior art, the present invention provides a method and system for fast retrieval of basic information on massive numbers of websites, overcoming the low efficiency of traditional database retrieval methods and the high repetition rate of their retrieval results.


Examples


Embodiment

[0045] Embodiment: Thicker books usually have a keyword index table at the back (for example, Beijing: pages 12, 34; Shanghai: pages 3, 77; ...) to help readers find the page numbers of relevant content more quickly.
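The book-index analogy can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the page texts and the `build_book_index` helper are made up here so that the result matches the Beijing and Shanghai example above.

```python
from collections import defaultdict

def build_book_index(pages):
    """Map each keyword to the sorted list of page numbers where it appears."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}

# Hypothetical page contents, chosen to reproduce the example index entries.
pages = {
    3: "Shanghai harbor",
    12: "Beijing history",
    34: "Beijing transport",
    77: "Shanghai finance",
}
book_index = build_book_index(pages)
print(book_index["beijing"])
print(book_index["shanghai"])
```

Looking up a keyword is then a single dictionary access instead of a pass over every page.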

[0046]

[0047] The biggest difference between full-text search and database search is that full-text search returns the most relevant results first: the 100 most relevant results meet the needs of more than 98% of users.
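A minimal sketch of returning only the most relevant results follows. The source does not say how relevance is scored, so the term-count score, the `top_k` function, and the sample documents are assumptions made purely for illustration.

```python
import heapq

def top_k(documents, query_terms, k=100):
    """Return up to k document IDs, highest score first.

    The score is a plain count of query-term occurrences (an assumed,
    deliberately simple relevance measure). Ties go to the lower ID.
    """
    scored = []
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    best = heapq.nlargest(k, scored, key=lambda pair: (pair[0], -pair[1]))
    return [doc_id for _score, doc_id in best]

# Hypothetical documents keyed by integer ID.
documents = {
    1: "nginx nginx php",
    2: "php",
    3: "nginx",
    4: "apache",
}
print(top_k(documents, ["nginx"], k=1))
print(top_k(documents, ["nginx"]))
```

Capping the result list at k keeps the work of sorting and returning results bounded, however many documents match.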

[0048] A database index is not designed for full-text indexing. For example, a predicate such as LIKE "%keyword%" cannot use the database index, so the search degenerates into a traversal, much like flipping through every page of a book. For database services that run fuzzy queries, LIKE is therefore extremely harmful to performance. Fuzzy matching on multiple keywords, such as LIKE "%keyword1%" AND LIKE "%keyword2%", is extremely inefficient.
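The contrast can be sketched in Python under stated assumptions. `scan_search` mimics the LIKE-style traversal by testing every record against every keyword, while `index_search` intersects posting sets from an inverted index. The records and both function names are hypothetical. The two also differ slightly in meaning: the scan matches substrings, whereas the index matches whole tokens.

```python
from collections import defaultdict

def scan_search(records, keywords):
    """LIKE-style search: every record is tested against every keyword."""
    return [record_id for record_id, text in records.items()
            if all(keyword in text for keyword in keywords)]

def build_inverted_index(records):
    """Map each whitespace-separated token to the set of record IDs containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in text.split():
            index[token].add(record_id)
    return index

def index_search(index, keywords):
    """Intersect posting sets; only records listed under the keywords are touched."""
    postings = [index.get(keyword, set()) for keyword in keywords]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical website records: ID -> space-separated component names.
records = {
    1: "nginx php wordpress",
    2: "apache php drupal",
    3: "nginx python django",
}
index = build_inverted_index(records)
print(scan_search(records, ["nginx", "php"]))
print(index_search(index, ["nginx", "php"]))
```

The scan costs work proportional to the number of records times the number of keywords. The index lookup costs work proportional to the sizes of the posting sets involved.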

[0049] The inverted index system is maintained using a B-tree structure.
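Python's standard library has no B-tree, so the sketch below swaps in a sorted list maintained with `bisect`. Like a B-tree, it keeps terms in order and supports logarithmic lookup of a position, although list insertion is linear rather than logarithmic. The `SortedTermDictionary` class and its methods are illustrative names, not anything defined by the patent.

```python
import bisect

class SortedTermDictionary:
    """Ordered term dictionary; a sorted-list stand-in for a B-tree."""

    def __init__(self):
        self._terms = []      # all terms, kept in sorted order
        self._postings = {}   # term -> sorted list of document IDs

    def add(self, term, doc_id):
        """Register doc_id under term, keeping both structures sorted."""
        if term not in self._postings:
            bisect.insort(self._terms, term)
            self._postings[term] = []
        ids = self._postings[term]
        pos = bisect.bisect_left(ids, doc_id)
        if pos == len(ids) or ids[pos] != doc_id:
            ids.insert(pos, doc_id)

    def lookup(self, term):
        """Return the document IDs recorded for an exact term."""
        return list(self._postings.get(term, []))

    def prefix_terms(self, prefix):
        """Return all terms starting with prefix, found by binary search."""
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches

terms = SortedTermDictionary()
terms.add("nginx", 3)
terms.add("nginx", 1)
terms.add("apache", 2)
terms.add("nodejs", 4)
print(terms.lookup("nginx"))
print(terms.prefix_terms("n"))
```

Keeping the terms ordered is what makes range and prefix lookups cheap. A real B-tree adds the same ordering with efficient on-disk inserts.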

[0050] Th...


Abstract

The invention relates to a rapid retrieval method and system for the basic information of massive numbers of websites. The method comprises the steps of collecting website basic information and converting it into a JSON-formatted file with fixed fields to serve as the retrieval object; constructing an inverted index system; retrieving rapidly by means of a full-text retrieval method; and maintaining the inverted index system with a B-tree structure. The method saves working time and greatly improves retrieval efficiency.
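The conversion step might look like the following Python sketch. The abstract does not list the fixed fields, so `FIXED_FIELDS`, the `to_fixed_record` helper, and the sample input are all hypothetical.

```python
import json

# Hypothetical fixed field set; the source does not name the actual fields.
FIXED_FIELDS = ("domain", "title", "server", "language")

def to_fixed_record(raw):
    """Keep only the fixed fields; fields missing from raw become empty strings."""
    return {field: str(raw.get(field, "")) for field in FIXED_FIELDS}

# Hypothetical collected data with one field outside the fixed set.
raw = {"domain": "example.com", "server": "nginx", "extra_header": "ignored"}
record = to_fixed_record(raw)
line = json.dumps(record, ensure_ascii=False, sort_keys=True)
print(line)
```

Giving every record the same fields means the indexer can treat all of them uniformly, whatever the collector happened to gather for a given site.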

Description

Technical field

[0001] The invention relates to a retrieval method and system, in particular to a rapid retrieval method and system for the basic information of massive numbers of websites.

Background art

[0002] As the number of domain names around the world continues to grow, knowing the basic information of Internet sites will become extremely valuable. On the one hand, it reveals the development direction of the technologies and components used by Internet websites. On the other hand, statistical analysis of website basic information can be correlated with the vulnerabilities of the websites themselves to support early warning and situation analysis.

[0003] Because computing resources are limited, there is a lack of technical support and resources for collecting basic information and performing comprehensive situation analysis on national and even global Internet sites, so this area of research has remained blank at home and abroad.

[0004] With the rise of var...


Application Information

IPC(8): G06F17/30
CPC: G06F16/319, G06F16/334, G06F16/951
Inventor: 胡俊高胜何世平徐原赵慧金皓党向磊李世淙徐晓燕刘婧饶毓赵宸陈阳
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT