Internet topic file search method, crawler system and search engine

A search method and crawler system, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the incomplete collection and low efficiency of searching for Internet topic files and thereby improves search efficiency.

Active Publication Date: 2008-01-30
SHENZHEN SHI JI GUANG SU INFORMATION TECH
0 Cites, 42 Cited by

AI Technical Summary

Problems solved by technology

[0017] The present invention provides a search method for Internet topic files that addresses the low efficiency and incomplete collection of Internet topic files in the prior art.


Examples

Embodiment Construction

[0062] FIG. 3 is a schematic structural diagram of the crawler system 1 provided by the present invention. It comprises a webpage and file download module 11, a webpage parsing module 12, a URL filtering module 13, a collection control module 14, and a URL queue storage module 15.

[0063] The functions of each module are described in detail below.

[0064] Webpage and file download module 11: downloads web pages or files over the HTTP and FTP protocols, submits each downloaded web page to the webpage parsing module 12, and submits each downloaded file to the indexing system of the search engine to build the index database.
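The routing performed by the download module can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how a web page is told apart from a file, so the check on the HTML media type, the function name `route_download`, the returned labels, and the inclusion of `https` are all hypothetical.

```python
from urllib.parse import urlparse

# The text names HTTP and FTP; accepting https here is an assumption.
SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def route_download(url: str, content_type: str) -> str:
    """Return the name of the component that should receive a downloaded resource."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported protocol: {scheme!r}")

    # Assumption: a web page is recognised by an HTML media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "webpage_parsing_module"  # module 12 extracts further URLs

    # Everything else is treated as a file and handed to the indexing system.
    return "indexing_system"
```

For example, `route_download("ftp://example.com/song.mp3", "audio/mpeg")` would hand the resource to the indexing system, while an HTML response would go to the webpage parsing module.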

[0065] When the crawler system 1 first starts running, some seed URLs are set and placed in the highest-priority URL queue of the URL queue storage module 15 (the corresponding URL topic score is set to a default initial value). Seeds are typically common directory navigation webpages such as www.hao123.com. The webpage and file download module 11 obtains the...
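Seeding the highest-priority queue can be sketched as below. This is a minimal illustration, not the patented implementation: the class name `URLQueueStore`, the number of priority levels, the default score of 0.0, and the duplicate check are all assumptions introduced for the example.

```python
from collections import deque


class URLQueueStore:
    """Hypothetical stand-in for the URL queue storage module 15."""

    def __init__(self, levels: int = 3, default_topic_score: float = 0.0):
        # Index 0 is the highest-priority queue (assumed layout).
        self.queues = [deque() for _ in range(levels)]
        self.default_topic_score = default_topic_score
        self.seen = set()  # duplicate filtering is an assumption

    def put(self, url, level, topic_score=None):
        """Queue a URL at the given level; return False if already queued."""
        if url in self.seen:
            return False
        self.seen.add(url)
        score = self.default_topic_score if topic_score is None else topic_score
        self.queues[level].append((url, score))
        return True

    def seed(self, urls):
        """Place seed URLs in the highest-priority queue with the default score."""
        for url in urls:
            self.put(url, level=0)

    def next_url(self):
        """Return the next (url, topic_score) pair, highest priority first."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


store = URLQueueStore()
store.put("http://example.com/late", level=2, topic_score=0.9)
store.seed(["http://www.hao123.com/"])
first = store.next_url()  # the seed, even though it was queued second
```

The seed is returned before the earlier low-priority entry because queues are always drained from the highest level down.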


Abstract

The invention discloses a search method for Internet topic files, which includes: parsing a downloaded web page and extracting the uniform resource locators (URLs) in the web page; determining the corresponding priority of each URL; and collecting the URLs in order of priority from high to low and building an index for searching Internet topic files. The invention also discloses a crawler system and a search engine for Internet topic files. The crawler system provided by the invention comprises at least a URL queue storage module, a webpage and file download module, a webpage parsing module, and a collection control module. The invention can improve the efficiency of searching for Internet topic files.
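The parse, prioritize, and collect steps of the abstract can be sketched as follows. The listing does not show how the priority of a URL is determined, so the scoring rule here (counting topic words in the anchor text) is purely an assumed placeholder, and `LinkExtractor`, `prioritize`, and `topic_words` are hypothetical names.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from a downloaded page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # link being read: [url, text so far]

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = [urljoin(self.base_url, href), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None


def prioritize(html, base_url, topic_words):
    """Return the page's URLs ordered from highest to lowest priority."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    heap = []
    for order, (url, text) in enumerate(parser.links):
        # Assumed scoring rule: number of topic words in the anchor text.
        score = sum(word in text.lower() for word in topic_words)
        # Negate the score so the min-heap pops the highest score first;
        # `order` keeps page order among equal scores.
        heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


page = '<a href="/news">News</a><a href="/mp3/1">Free MP3 music</a>'
ordered = prioritize(page, "http://example.com/", ["mp3", "music"])
```

With the sample page, the link whose anchor text mentions the topic words is collected before the unrelated one.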

Description

Technical field

[0001] The invention relates to Internet file search, in particular to an Internet topic file search method and a corresponding crawler system and search engine.

Background technique

[0002] The Internet has become the most popular technology in the computer field. Its popularization enables people to break through restrictions of space and geography and share information resources conveniently. The World Wide Web (WWW) is the most important and most widely used information service on the Internet. Since its birth it has developed rapidly into a huge information database that stores a large amount of valuable information, where people can find all kinds of content that interests them. In actual use, however, the huge amount of data on the Web makes information queries very difficult for users. Various information retrieval services have emerged in response, and full-text retrieval technology is an important i...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 余祥鑫, 杨卫
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH