Dynamic network content grabbing method and dynamic network content crawler system

A dynamic content grabbing technology, applied in the field of web crawlers, that addresses problems such as limited search strategies, failure to capture dynamic content such as scrolling news in a timely manner, and difficulty in truly achieving a focused crawl.

Inactive Publication Date: 2013-01-16
舆情(香港)有限公司
3 Cites, 58 Cited by

AI Technical Summary

Problems solved by technology

[0006] 2. The contradiction between limited search engine server resources and unlimited network data resources.
[0011] 1. Although the capture target can be described and defined to a certain extent, the granularity of the filtered content is not fine enough;
[0012] 2. It stops at searching and matching URLs and cannot go deep into the content of the page itself;
[0013] 3. It is difficult to truly achieve the focusing effect, because it is limited by its own search strategy; and
[0015] In addition, with the development of technologies such as online news, blogs, and microblogs, dynamic content in web pages has shown explosive growth on the Internet. However, existing search engines and crawlers that are oriented to pages and URLs or driven by keywords cannot grab the content in a specified area of a webpage in a customized manner, and cannot capture updates to dynamic content such as scrolling news in a timely manner.



Examples


Embodiment Construction

[0075] The invention opens up a new crawler working mode through a WYSIWYG approach to grabbing web page content. This new type of crawler technology not only departs from the working principles of previous tools (such as traditional search engines, vertical search engines, and focused crawlers), but also avoids the disadvantages of network information collection tools. It is a business-strategy-based crawler that can be used directly in large-scale industrial production and works closely with Google search equipment. It is easy to use, highly configurable, and highly scalable.

[0076] The schematic implementations of the network dynamic content crawling method and the network dynamic content crawler system of the present invention will be described below with reference to the accompanying drawings. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of claimed subject matter....



Abstract

The invention relates to a dynamic network content grabbing method and a dynamic network content crawler system. The method comprises the following steps: submitting an access request to a target network and acquiring a target webpage containing one or more items of dynamic content; extracting the dynamic content in a specific area of the acquired target webpage; determining whether each extracted item of dynamic content already exists in a cache and, if so, skipping it, otherwise proceeding to the next step to grab it; solidifying the dynamic content locally through rendering to generate static content that corresponds to its current state; and parsing the static content, extracting the target content, saving the target content locally, and recording the dynamic content in the cache. With this technology, the content in a specific area of a webpage can be grabbed in a customized manner, scrolling news and other dynamically updated content can be captured in a timely manner, and the crawler can serve as a content provider for search engines and other external applications.
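The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each item of dynamic content is identified by a link inside an element with a known `id`, that `fetch` and `render` are caller-supplied functions (in practice `render` might drive a headless browser), and that an in-memory set and dict stand in for the cache and local storage. All names here are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set

# Tags that never have a closing tag; they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RegionLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside the element with a given id."""

    def __init__(self, region_id: str) -> None:
        super().__init__()
        self.region_id = region_id
        self.depth = 0  # > 0 while the parser is inside the target region
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.depth > 0:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attr_map.get("href"):
                self.links.append(attr_map["href"])
        elif attr_map.get("id") == self.region_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


class TextExtractor(HTMLParser):
    """Gather the visible text of a static page (stand-in for target extraction)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(static_html: str) -> str:
    parser = TextExtractor()
    parser.feed(static_html)
    parser.close()
    return " ".join(parser.chunks)


def crawl_once(fetch: Callable[[str], str],
               render: Callable[[str], str],
               page_url: str,
               region_id: str,
               cache: Set[str],
               store: Dict[str, str]) -> List[str]:
    """Run one pass of the pipeline and return the newly grabbed items."""
    page_html = fetch(page_url)                 # step 1: acquire the target page
    region = RegionLinkParser(region_id)
    region.feed(page_html)                      # step 2: extract items in the region
    region.close()

    grabbed: List[str] = []
    for link in region.links:
        key = hashlib.sha256(link.encode("utf-8")).hexdigest()
        if key in cache:                        # step 3: skip items already cached
            continue
        static_html = render(link)              # step 4: solidify into static content
        store[link] = extract_text(static_html)  # step 5: extract and save target
        cache.add(key)                          # ...then record the item in the cache
        grabbed.append(link)
    return grabbed
```

Calling `crawl_once` repeatedly on a schedule with the same `cache` would pick up only newly appearing items, which is one way to follow a scrolling news area. Keying the cache on the link alone is an assumption of this sketch; content that changes in place under the same link would need a different key, such as a hash of the rendered content.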

Description

Technical field

[0001] The invention relates to web crawler technology, in particular to a web dynamic content grabbing method and a web dynamic content crawler system capable of grabbing specific content in webpages.

Background

[0002] The rapid development of the network has made the Internet a carrier of a large amount of important information. How to effectively extract and utilize this information has become a huge challenge.

[0003] Currently, the means that can help people access Internet information mainly include traditional search engines, vertical search engines, and focused crawlers. However, they all have certain limitations, and the scope of application is not focused enough to meet the business needs of actual production fields such as news editing and network content monitoring, which are mainly reflected in the following aspects.

[0004] Limitations of traditional search engines:

[0005] 1. The returned results contain a large number of web ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 张振辉
Owner: 舆情(香港)有限公司