
Dynamic web crawler based on a client/server architecture

A dynamic web crawler technology, applied in the field of web crawlers. It addresses the inability of existing crawlers to automatically identify web pages, to update web pages dynamically and automatically, and to warn about web pages that contain virus code, thereby improving update and security quality and avoiding dead links.

Inactive Publication Date: 2008-07-02
SHANGHAI XINSHENG ELECTRONICS TECH
Cites: 0; Cited by: 12

AI Technical Summary

Problems solved by technology

[0003] Existing web crawlers cannot automatically identify web pages, cannot automatically update web pages dynamically, and cannot give early warning about web pages that contain virus code. To overcome these shortcomings, the present invention provides a new construction method for web crawlers: a client communicates with the server and automatically notifies the search engine's server when a web page is updated, and an embedded anti-virus program is used to delete illegal web pages.



Embodiment Construction

[0006] The invention restructures the web crawler that traditionally runs on the local server in a search engine's back end, dividing it into web crawler clients (Web Crawler Clients) and web crawler servers (Web Crawler Servers). The server is installed on the search engine's local server. Like a traditional web crawler, it crawls and analyzes web pages, but it adds virus identification and stores only safe web pages in the local database. The client is bundled and installed on key nodes, such as the web content provider or the proxy server that users use to access the Internet. It detects updates to a web page's URL and content promptly and automatically sends the updated URL to the server through a message mechanism.
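The client/server split described in [0006] can be illustrated with a minimal sketch. It is not the patent's implementation: the class names `CrawlerClient`, `CrawlerServer`, and `UpdateMessage`, the use of a SHA-256 content hash to detect changes, the byte-signature check standing in for virus identification, and the in-memory dictionary standing in for the local database are all assumptions made for illustration. Page fetching is injected as a function so the example runs without network access.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical fetcher type: maps a URL to the raw page bytes.
Fetch = Callable[[str], bytes]


@dataclass
class UpdateMessage:
    """Message a client sends to the server when a page changes (assumed shape)."""
    url: str
    digest: str


class CrawlerServer:
    """Server side: re-crawls notified URLs and stores only pages judged safe."""

    def __init__(self, fetch: Fetch, signatures: Iterable[bytes]):
        self.fetch = fetch
        self.signatures = list(signatures)   # assumed: known malicious byte patterns
        self.store: Dict[str, bytes] = {}    # stand-in for the local database
        self.rejected: List[str] = []

    def is_safe(self, body: bytes) -> bool:
        # Assumption: virus identification reduced to a simple signature match.
        return not any(sig in body for sig in self.signatures)

    def on_update(self, msg: UpdateMessage) -> bool:
        body = self.fetch(msg.url)
        if not self.is_safe(body):
            self.store.pop(msg.url, None)    # drop any previously stored copy
            self.rejected.append(msg.url)
            return False
        self.store[msg.url] = body
        return True


class CrawlerClient:
    """Client side: watches pages on its node and notifies the server of changes."""

    def __init__(self, fetch: Fetch, notify: Callable[[UpdateMessage], bool]):
        self.fetch = fetch
        self.notify = notify
        self.seen: Dict[str, str] = {}       # last known digest per URL

    def check(self, urls: Iterable[str]) -> List[str]:
        changed = []
        for url in urls:
            digest = hashlib.sha256(self.fetch(url)).hexdigest()
            if self.seen.get(url) != digest:  # new URL or changed content
                self.seen[url] = digest
                self.notify(UpdateMessage(url, digest))
                changed.append(url)
        return changed


def demo():
    # Hypothetical pages held in memory instead of fetched over HTTP.
    pages = {
        "http://example.com/a": b"<html>hello</html>",
        "http://example.com/b": b"<html>EVIL_PAYLOAD</html>",
    }
    fetch = pages.__getitem__
    server = CrawlerServer(fetch, [b"EVIL_PAYLOAD"])
    client = CrawlerClient(fetch, server.on_update)

    first = client.check(list(pages))    # both pages are new
    second = client.check(list(pages))   # nothing has changed
    pages["http://example.com/a"] = b"<html>hello v2</html>"
    third = client.check(list(pages))    # only page "a" changed
    return first, second, third, server
```

In this sketch the server never polls: it re-crawls a page only when a client reports a change, which is the behavior the paragraph attributes to the message mechanism. A real deployment would replace the direct `notify` call with a network message and the signature match with an actual anti-virus engine.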

[0007] The specific implementation method is as follows:

[0008] Web crawler server:

[0009] (1) Establish a database table structure to store the searched web pages. It mainly includ...



Abstract

The invention restructures the web crawler that runs on the local server in the back end of a traditional search engine, dividing it into a web crawler client and a web crawler server. The server is installed on the search engine's local server and, like a traditional web crawler, crawls and analyzes web pages; it also adds a virus identification function, so that only safe web pages are saved to the local database. The client is bundled and installed on key nodes, such as a web content provider or the proxy server that users use to access the Internet, so that it can detect updates to web page URLs and content promptly and automatically send the updated URLs to the server through a message mechanism. The beneficial technical effects are as follows: the invention more effectively improves the update quality and security of the web pages held in a search engine's local database, so that users searching with the engine avoid dead links, missing links, and infection from virus-carrying web pages, which resolves the shortcomings of prior web crawlers.

Description

Technical field

[0001] The invention relates to the field of web crawlers (also known as web spiders or web robots) for computer search engines, and in particular to a technical proposal for a web crawler capable of intelligently discriminating and selecting texts and promptly notifying a local database to update web pages.

Background technique

[0002] A web crawler is a background program of a search engine used to discover, explore, and detect web content on the World Wide Web (WWW). The World Wide Web is an associative collection of Hypertext Markup Language (HTML) pages distributed across many hosts on the Internet. The pages are linked and accessed through Uniform Resource Locators (URLs), which are the addresses of HTML pages. At present, the web crawler technology of a traditional search engine works as follows: web crawling (downloading of web pages) is done by many centralized or distributed crawlers. A URL server sends the list of URLs to the crawler. The cap...


Application Information

IPC(8): G06F17/30
Inventor: 蔡阳波, 陈勇
Owner: SHANGHAI XINSHENG ELECTRONICS TECH