Dynamic network content grabbing method and dynamic network content crawler system

A dynamic content grabbing technology, applied in the field of web crawlers, that addresses problems such as limited search strategies, failure to capture dynamic content such as scrolling news in a timely manner, and difficulty in achieving a truly focused crawl.

CN102880607A | Inactive | Publication Date: 2013-01-16 | 舆情(香港)有限公司

Embodiment Construction

[0075] The invention introduces a new crawler working mode based on WYSIWYG ("what you see is what you get") grabbing of web page content. This crawler technology departs from the working principles of earlier approaches (traditional search engines, vertical search engines, and focused crawlers) and also avoids the disadvantages of network information collection tools. It is a crawler driven by business strategy that can be used directly in large-scale industrial production and works closely with Google search equipment. It is easy to use, highly configurable, and highly scalable.

[0076] The schematic implementations of the network dynamic content crawling method and the network dynamic content crawler system of the present invention will be described below with reference to the accompanying drawings. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of claimed subject matter....

Abstract

The invention relates to a dynamic network content grabbing method and a dynamic network content crawler system. The method comprises the following steps: submitting an access request to a target network and acquiring a target webpage that contains one or more items of dynamic content; extracting the dynamic content located in a specific area of the acquired webpage; checking whether each extracted item of dynamic content already exists in a cache, skipping the item if it does and otherwise proceeding to the next step to grab it; rendering the dynamic content locally to solidify it into static content corresponding to its current state; and parsing the static content, extracting the target content, saving the target content locally, and storing the dynamic content in the cache. With this technology, content in a specific area of a webpage can be grabbed in a customized way, scrolling news and other dynamically scrolling content can be grabbed in a timely manner, and the crawler can serve as a content provider for search engines and other external applications.
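
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`extract_region_links`, `crawl_once`, the `fetch`, `render`, and `extract_target` callables, the `region_id` parameter) is hypothetical, and it assumes that each item of dynamic content is identified by a link inside one `<div>` and that the cache is keyed by that link. Rendering is passed in as a callable because the abstract does not say which rendering engine is used.

```python
import re
from typing import Callable, Dict, List, Set


def extract_region_links(page_html: str, region_id: str) -> List[str]:
    """Return the hrefs inside the <div> whose id equals region_id.

    Assumption: the region is a flat, non-nested <div>. A real crawler
    would use a proper HTML parser instead of regular expressions.
    """
    region = re.search(
        r'<div id="%s">(.*?)</div>' % re.escape(region_id), page_html, re.S
    )
    if region is None:
        return []
    return re.findall(r'href="([^"]+)"', region.group(1))


def crawl_once(
    url: str,
    fetch: Callable[[str], str],           # hypothetical: returns the target page HTML
    render: Callable[[str], str],          # hypothetical: renders one item to static HTML
    extract_target: Callable[[str], str],  # hypothetical: pulls target content from static HTML
    region_id: str,
    cache: Set[str],
    store: Dict[str, str],
) -> List[str]:
    """Run one pass of the abstract's steps and return the newly grabbed links."""
    page = fetch(url)                       # step 1: request the target page
    grabbed = []
    for link in extract_region_links(page, region_id):  # step 2: specific area only
        if link in cache:                   # step 3: already seen, do nothing
            continue
        static_html = render(link)          # step 4: solidify dynamic content
        store[link] = extract_target(static_html)  # step 5: parse and save locally
        cache.add(link)                     # step 5: remember the item
        grabbed.append(link)
    return grabbed
```

Calling `crawl_once` repeatedly with the same `cache` grabs only items that were not present on an earlier pass, which is how a cache check of this kind lets a crawler poll a scrolling news area without reprocessing old entries.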

Description

Technical Field

[0001] The invention relates to web crawler technology, in particular to a web dynamic content grabbing method and a web dynamic content crawler system capable of grabbing specific content in webpages.

Background Art

[0002] The rapid development of the network has made the Internet a carrier of a large amount of important information. How to effectively extract and utilize this information has become a huge challenge.

[0003] Currently, the means that can help people access Internet information mainly include traditional search engines, vertical search engines, and focused crawlers. However, they all have certain limitations, and the scope of application is not focused enough to meet the business needs of actual production fields such as news editing and network content monitoring, which are mainly reflected in the following aspects.

[0004] Limitations of traditional search engines:

[0005] 1. The returned results contain a large number of web ...

Application Information

Publication: CN102880607A (16 Jan 2013)
IPC: G06F17/30
Inventors: 张振辉