Implementing method for web crawler based on search engines

A web crawler technology and implementation method, applied in the field of web crawler implementation, that addresses problems such as the inability to meet the demand for personalized services and achieves the effect of updating data in real time.

Status: Inactive. Publication Date: 2014-01-15
上海博腾信息科技(集团)有限公司

AI Technical Summary

Problems solved by technology

With the exponential growth and dynamic change of network information resources, the information retrieval services provided by traditional search engines can no longer meet people's growing demand for personalized services and face major challenges.

Embodiment Construction

[0014] The technical solution of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0015] In the example shown in Figure 1, the ...

[0016] The method disclosed by the present invention comprises five modules: a socket function module (1), an HTTP function module (2), a regular expression function module (3), a depth-first search function module (4), and a breadth-first search function module (5).
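
As an illustration of how modules (3) to (5) could fit together, the following Python sketch extracts links with a regular expression and visits pages either depth-first or breadth-first. It is not the patent's implementation: the names `HREF_RE`, `extract_links`, `crawl`, and the in-memory `PAGES` dictionary (standing in for pages fetched over the network) are assumptions made for this example.

```python
import re
from collections import deque

# Hypothetical pattern: captures the target of href="..." or href='...'
# inside an <a> tag. A regex is a simplification and will miss some
# valid HTML (for example, unquoted attribute values).
HREF_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the link targets found in an HTML string, in document order."""
    return HREF_RE.findall(html)


def crawl(start, fetch, strategy="breadth", limit=100):
    """Visit pages reachable from `start` and return them in visit order.

    `fetch` is any callable mapping a URL to HTML text, or None if the
    page is unavailable. `strategy` is "breadth" (FIFO queue) or
    "depth" (LIFO stack). `limit` caps the number of pages visited.
    """
    if strategy not in ("breadth", "depth"):
        raise ValueError("strategy must be 'breadth' or 'depth'")
    frontier = deque([start])
    visited = set()
    order = []
    while frontier and len(order) < limit:
        # The two strategies differ only in which end of the frontier is popped.
        url = frontier.popleft() if strategy == "breadth" else frontier.pop()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        links = extract_links(html)
        if strategy == "depth":
            # Push in reverse so the first link on the page is explored first.
            links = links[::-1]
        frontier.extend(link for link in links if link not in visited)
    return order


# Toy in-memory "web" used instead of real network fetches (assumption).
PAGES = {
    "/a": '<a href="/b">b</a> <a href="/c">c</a>',
    "/b": '<a href="/d">d</a>',
    "/c": '<a href="/d">d</a>',
    "/d": "no links here",
}

if __name__ == "__main__":
    print(crawl("/a", PAGES.get, "breadth"))  # ['/a', '/b', '/c', '/d']
    print(crawl("/a", PAGES.get, "depth"))    # ['/a', '/b', '/d', '/c']
```

Keeping the traversal strategy as a parameter, rather than hard-coding one, mirrors the stated goal of avoiding a fixed search strategy; a real crawler would additionally need URL normalization and politeness controls, which this sketch omits.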

[0017] The socket function module (1) provides the underlying facility that the web crawler relies on and exists within the structure of the knowledge management system; the client establishes a connection with the server through a socket;
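
The client-side connection step described in [0017] might look like the following minimal sketch, which uses Python's standard `socket` module. The function name `open_connection` and the default timeout are assumptions made for illustration, and the demonstration connects to a throwaway listener on the loopback interface rather than a real web server.

```python
import socket


def open_connection(host, port, timeout=5.0):
    """Open a TCP connection to (host, port) and return the connected socket.

    socket.create_connection resolves the host name and tries each
    resulting address in turn, raising OSError if none can be reached.
    """
    return socket.create_connection((host, port), timeout=timeout)


if __name__ == "__main__":
    # Demonstrate against a temporary local listener so the sketch runs
    # without Internet access.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with open_connection("127.0.0.1", port) as client:
            peer, _addr = server.accept()
            with peer:
                print("connected:", client.getpeername() == ("127.0.0.1", port))
```

A crawler would pass the host and port of the target web server instead of the loopback address, and would typically wrap the call in error handling for unreachable hosts.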

[0018] In the HTTP function module (2), the client must define a set of URLs that determine the addresses to be browsed. Once the client has established a connection with the server, it sends a request to the server. If the server receives the request, it returns the corresponding response information. ...
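
To make the request and response exchange in [0018] concrete, here is a hedged Python sketch that turns a URL into the bytes of a plain HTTP/1.1 GET request and reads the status line of a raw response. The helper names `build_get_request` and `parse_status_line`, the example URLs, and the canned response are assumptions for illustration; the source does not specify the request format it uses.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of `url`."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("only absolute http:// URLs are handled in this sketch")
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when it is not the default.
    host_header = parts.hostname if port == 80 else f"{parts.hostname}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, port, request.encode("ascii")


def parse_status_line(response):
    """Return (status_code, reason) from the first line of raw response bytes."""
    first_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, _, rest = first_line.partition(" ")
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP response")
    code, _, reason = rest.partition(" ")
    return int(code), reason


if __name__ == "__main__":
    # A small set of placeholder URLs to browse (assumption).
    for url in ["http://example.com/", "http://example.com:8080/search?q=crawler"]:
        host, port, request = build_get_request(url)
        print(host, port, request.decode("ascii").splitlines()[0])

    # A canned response stands in for bytes read from the server socket.
    canned = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
    print(parse_status_line(canned))  # (200, 'OK')
```

In a full crawler the request bytes would be written to the connected socket and the response read back from it; the sketch keeps the two steps as pure functions so they can be exercised without a network.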

Abstract

The invention provides a method for implementing a web crawler based on search engines. The web crawler comprises a socket function module, an HTTP function module, a regular expression function module, a depth-first search function module and a breadth-first search function module. Traditional web crawler technology adopts a fixed search strategy and lacks adaptability; to overcome this defect, the method is designed to meet the various requirements of customers, thereby achieving a web crawler technology that enables data to be updated in real time.

Description

Technical field

[0001] The invention relates to information collection technology, and in particular to a method for implementing a web crawler based on a search engine.

Background technique

[0002] With the development of the Internet, traditional ways of obtaining information are gradually being replaced by the Internet. In the early days of Internet development, people mainly obtained the information they needed by browsing portal websites, but with the rapid development of the Web, it became more and more difficult to find the needed information in this way. At present, most people obtain useful information through search engines. Therefore, the development of search engine technology directly affects the speed and quality of people's access to the information they need.

[0003] In 1994, the world's first web search tool, WebCrawler, came out. At present, the more popular search engines include Baidu, Google, Yahoo, Infoseek, Inktomi, Teoma, Live Search, etc. Due to the cons...

Application Information

Patent Type and Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 蒋志勇
Owner: 上海博腾信息科技(集团)有限公司