Data collection method based on intelligent grasping system for science and technology service information

A technology for crawling science and technology service information and collecting system data. It addresses heavy consumption of network and server hardware resources, the inability to obtain useful information, and crawl failures, and it makes the system convenient to extend.

Active Publication Date: 2017-08-08
山东辰华科技信息有限公司 (Shandong Chenhua Science and Technology Information Co., Ltd.)
Cites: 5 | Cited by: 22

AI Technical Summary

Problems solved by technology

[0003] Crawling systems that use manually generated crawl wrappers extract science and technology information accurately, but they require a wrapper to be generated, updated and maintained for each of thousands of websites on the Internet. Ordinary vertical crawlers cannot do this well and must rely on substantial human involvement.
[0004] Safe and efficient real-time crawling: when highly real-time crawling is required, connection and download requests must be sent frequently to the target website's server. This puts heavy pressure on that server, and its operator may respond with blocking policies such as denying access in order to keep the server running normally, which causes crawls to fail. At the same time, highly real-time crawling consumes substantial network, server and other hardware resources, which increases costs.
[0005] With the continuing spread of AJAX and the emergence of single-page application frameworks such as AngularJS, more and more pages are now rendered by JavaScript. Such pages are troublesome for crawlers: extracting only the HTML content often fails to yield useful information.

Embodiment Construction

[0025] The technical solutions of the present invention will be clearly and completely described below in conjunction with the accompanying drawings of the present invention.

[0026] The data collection method based on the intelligent capture system for science and technology service information includes the following steps:

[0027] ① Data capture (crawler configuration): the user publishes a capture task through the client's configuration module and startup module, and sets the websites to be captured and the corresponding rules, including encoding, capture interval, timeout period, number of retries, etc.;
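The rule fields listed in step ① map naturally onto a small configuration record. The following is a minimal sketch that assumes a plain standard-library download via `urllib`. `CrawlTask`, `fetch_with_retries` and the linear backoff are hypothetical illustrations, not the patent's implementation. It shows how the encoding, timeout and retry settings might govern a single page download. The capture interval would be used by the scheduler in step ②.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrawlTask:
    """Hypothetical crawl task holding the rule fields named in step 1."""
    name: str
    seed_urls: list[str]
    encoding: str = "utf-8"   # character encoding used to decode pages
    interval_s: float = 60.0  # capture interval between scheduled runs
    timeout_s: float = 10.0   # per-request timeout
    retries: int = 3          # extra attempts after the first failure


def default_fetch(url: str, timeout: float) -> bytes:
    # Plain stdlib download; a real system might add headers, proxies, etc.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retries(task: CrawlTask, url: str,
                       fetch: Callable[[str, float], bytes] = default_fetch,
                       backoff_s: float = 1.0) -> Optional[str]:
    """Download url, retrying up to task.retries times; return text or None."""
    for attempt in range(task.retries + 1):
        try:
            raw = fetch(url, task.timeout_s)
            return raw.decode(task.encoding, errors="replace")
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all subclass OSError
            if attempt < task.retries:
                time.sleep(backoff_s * (attempt + 1))  # hypothetical linear backoff
    return None
```

Passing `fetch` as a parameter keeps the retry logic testable without network access. It also leaves room for a different downloader, such as a headless browser for the JavaScript-rendered pages mentioned in [0005].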

[0028] ② Loading scheduled capture tasks: tasks published by users are dynamically loaded into the scheduled capture task list;
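Step ② does not say how the scheduled task list is organized. One common choice, assumed here purely for illustration, is a priority queue ordered by next run time. The hypothetical `TaskScheduler` below accepts new tasks at any moment (dynamic loading) and reschedules each due task one capture interval later.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class ScheduledTask:
    name: str
    interval_s: float  # capture interval configured in step 1


class TaskScheduler:
    """Hypothetical timed task list that accepts new tasks while running."""

    def __init__(self) -> None:
        # Heap entries: (next run time, insertion counter, task)
        self._heap: list[tuple[float, int, ScheduledTask]] = []
        self._seq = itertools.count()  # tie-breaker so tasks are never compared

    def add_task(self, task: ScheduledTask, now: float) -> None:
        """Dynamically load a newly published task; it is due immediately."""
        heapq.heappush(self._heap, (now, next(self._seq), task))

    def pop_due(self, now: float) -> list[ScheduledTask]:
        """Return all tasks due at `now` and reschedule each one interval later."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, task = heapq.heappop(self._heap)
            due.append(task)
        for task in due:
            heapq.heappush(self._heap, (now + task.interval_s, next(self._seq), task))
        return due
```

A driver loop would call `pop_due(time.monotonic())` periodically and hand each returned task to the downloader. The sketch reschedules from `now` rather than from the previous run time, so a delayed scheduler does not fire a burst of catch-up runs.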

[0029] ③ Page download: according to the crawl rules and crawl process set by the user, a breadth-first and depth-first crawling algorithm starts traversing the target web pages and downloads the captured pages, and put the...
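The text names breadth-first and depth-first crawling without giving details. The minimal sketch below assumes a caller-supplied `get_links` function (hypothetical) that downloads a page and returns its outgoing URLs. The same frontier serves both orders. It is popped as a FIFO queue for breadth-first traversal or as a LIFO stack for depth-first traversal, with a depth limit and a seen-set to avoid revisiting pages.

```python
from collections import deque
from typing import Callable, Iterable


def traverse(seeds: Iterable[str],
             get_links: Callable[[str], Iterable[str]],
             max_depth: int = 2,
             breadth_first: bool = True,
             max_pages: int = 1000) -> list[str]:
    """Visit pages from the seed URLs up to max_depth; return URLs in visit order.

    breadth_first=True pops the frontier as a FIFO queue (breadth-first);
    False pops it as a LIFO stack (depth-first).
    """
    frontier: deque[tuple[str, int]] = deque()
    seen: set[str] = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append((url, 0))
    order: list[str] = []
    while frontier and len(order) < max_pages:
        url, depth = frontier.popleft() if breadth_first else frontier.pop()
        order.append(url)  # a real crawler would download and queue the page here
        if depth >= max_depth:
            continue  # do not expand links beyond the configured depth
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order
```

For example, with `graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}` and `get_links=lambda u: graph.get(u, [])`, the breadth-first order from `"a"` is `a, b, c, d, e`, and the depth-first order is `a, c, e, b, d`.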

Abstract

The invention relates to a data collection method based on an intelligent crawling system for science and technology service information. The method comprises the following steps:

1. Data crawling: to configure the crawler, a user publishes a crawling task through the configuration module and startup module of a client, and sets the webpages to be crawled and the corresponding rules.
2. Loading scheduled crawling tasks: the task published by the user is dynamically loaded into a scheduled crawling task list.
3. Page downloading.
4. Page parsing: the pages in a queue are parsed.
5. Adding new URLs to the queue of URLs to be crawled.
6. Data processing and storage: page data is parsed and extracted, and the extracted two-dimensional structured data is stored.

The method meets both the generality requirements of a crawler and the crawling requirements of a science and technology service system, and supports convenient extension and plug-in development. Information can be crawled and collected intelligently by adding the following to the specific service logic: parser rule configuration, the breadth and depth of crawled pages, crawling threads, and database or index configuration.
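Steps 4 to 6 of the abstract, together with its emphasis on plug-in development and parser rule configuration, can be illustrated with a per-site parser registry. This is a hedged sketch. The decorator `register_parser`, the `records` table and the regex-based title extraction are all hypothetical, and SQLite stands in for whatever database the system actually configures. Each parsed page yields one row of two-dimensional (tabular) data plus new URLs appended to the to-be-crawled queue.

```python
import re
import sqlite3
from typing import Callable
from urllib.parse import urljoin, urlparse

# Hypothetical plug-in signature: (url, html) -> (one record as a dict, outgoing links)
Parser = Callable[[str, str], tuple[dict, list[str]]]
PARSERS: dict[str, Parser] = {}


def register_parser(host: str):
    """Decorator that plugs in a site-specific parser for one host name."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[host] = fn
        return fn
    return wrap


@register_parser("example.org")
def parse_example(url: str, html: str) -> tuple[dict, list[str]]:
    # Regexes keep the sketch short; a real parser would use an HTML parser.
    m = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
    links = [urljoin(url, h) for h in re.findall(r'href="([^"]+)"', html)]
    return {"url": url, "title": m.group(1).strip() if m else ""}, links


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) store holding one row per parsed page."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (url TEXT PRIMARY KEY, title TEXT)")
    return conn


def process_page(conn: sqlite3.Connection, url: str, html: str,
                 to_crawl: list[str]) -> bool:
    """Parse one queued page, append its links to to_crawl, and store one row."""
    parser = PARSERS.get(urlparse(url).hostname or "")
    if parser is None:
        return False  # no plug-in registered for this site
    record, links = parser(url, html)
    to_crawl.extend(links)  # step 5: new URLs join the to-be-crawled queue
    conn.execute("INSERT OR REPLACE INTO records (url, title) VALUES (:url, :title)", record)
    conn.commit()
    return True
```

Supporting a new site then means registering one more parser function, without touching `process_page`. This is one plausible reading of "plug-in type development", not necessarily the inventors' design.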

Description

Technical Field
[0001] The invention relates to an intelligent capture method, in particular to a data collection method for an intelligent capture system based on science and technology service information.

Background Art
[0002] A crawler, also called a spider, is not an insect but a computer program. Starting from customized entry URLs, it continuously extracts links from web pages on the Internet, then follows those links to crawl further and extract deeper, previously unknown links. Because this behavior resembles a crawling action, such a program is called a crawler. A crawler is a program that automatically obtains web content and is an important component of a search engine.
[0003] Crawling systems that use manually generated crawl wrappers extract science and technology information accurately, but they require a wrapper to be generated, updated and maintained for each of thousands of websites on the Intern...
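The background's definition of a crawler (start from entry URLs, extract links, follow them deeper) rests on link extraction. A minimal standard-library sketch follows. `LinkExtractor` and `extract_links` are hypothetical names. Keeping only http(s) links, with fragments removed and duplicates dropped, is an assumption about which links a crawler would want to follow.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) links from <a href=...> tags, in page order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then drop "#fragment".
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")) and absolute not in self._seen:
                    self._seen.add(absolute)
                    self.links.append(absolute)


def extract_links(base_url: str, html: str) -> list[str]:
    """Return the de-duplicated outgoing links of one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding each newly found link back in as the next page's `base_url` produces the "deeper, previously unknown links" described in [0002].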

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/951; G06F16/955
Inventors: 陈文海, 霍英霞, 丁平, 黄美珍, 陈劲峰, 姚蕴, 佘文文, 马晓, 贾旭, 闫斌斌, 柏道菲, 张军, 成华娟
Owner: 山东辰华科技信息有限公司