An efficient data collection method and system based on deep web crawler

A data collection and crawler technology, applied to network data indexing, network data retrieval, and network data query. It addresses unstable collection output, the inability to obtain the complete set of data, and low efficiency, thereby shortening data collection time and improving collection efficiency.

Active Publication Date: 2019-08-16
SHENZHEN AUDAQUE DATA TECH

AI Technical Summary

Problems solved by technology

[0005] One method uses relevant names as keywords to obtain the complete set of information. Its disadvantage is that the names of all the items to be collected are not necessarily known, so it is practically impossible to obtain the complete set, and processing efficiency is low.
[0006] The other method uses the relevant serial number information to traverse and search one by one to obtain the complete set of information. Its disadvantage is that serial numbers are discontinuous (for example, reserved ranges, discarded entries, or ranges not yet in use), so the collection output is unstable and the traversal cannot collect data efficiently.
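
The cost of the serial-number approach in [0006] can be illustrated with a small sketch. The function name and the sample numbers below are hypothetical and are not taken from the patent; the sketch only counts how many probes a blind one-by-one traversal spends compared with how many numbers actually exist.

```python
def naive_traversal_stats(valid_numbers, start, end):
    """Probe every serial number in [start, end] one by one.

    Returns (probes, hits): how many numbers were tried and how many
    of them actually correspond to an existing record.
    """
    valid = set(valid_numbers)
    probes = end - start + 1
    hits = sum(1 for n in range(start, end + 1) if n in valid)
    return probes, hits


# Hypothetical, sparsely allocated serial numbers with large unused ranges.
existing = [1000, 1001, 1002, 5000, 5001, 9000]
probes, hits = naive_traversal_stats(existing, 1000, 9000)
print(f"{hits} hits out of {probes} probes")
```

With numbers this sparse, almost every probe is wasted, which is the instability and inefficiency the paragraph describes.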

Examples

Detailed Description of the Embodiments

[0031] The present invention provides embodiments of an efficient data collection method and system based on deep web crawlers. To help those skilled in the art better understand the technical solutions in the embodiments, and to make the above purposes, features, and advantages clearer, the technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

[0032] As shown in Figure 1, the present invention first provides an embodiment of an efficient data collection method based on a deep web crawler, including:

[0033] S101: Use known names as keywords to perform a preliminary search and sampling to obtain the corresponding numbering information and numbering rules, wherein the numbering information is coded data that follows a certain numbering rule and can be traversed one by one; wherein, the numbering informatio...
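
Step S101 can be sketched roughly as follows. The patent text available here does not say what form a numbering rule takes, so this sketch assumes a hypothetical one: a fixed prefix followed by a fixed-width trailing counter. The names `infer_rule` and `sample_rules` and the sample numbers are illustrative only.

```python
import re
from collections import defaultdict

# Assumed rule shape (not from the patent): any prefix, then a trailing run of digits.
_TRAILING_COUNTER = re.compile(r"^(.*?)(\d+)$")


def infer_rule(number):
    """Return (prefix, counter_width) for a sampled number, or None if it has no trailing digits."""
    match = _TRAILING_COUNTER.match(number)
    if match is None:
        return None
    prefix, counter = match.groups()
    return (prefix, len(counter))


def sample_rules(sampled_numbers):
    """Map each inferred rule to the sorted, de-duplicated counters seen under it."""
    by_rule = defaultdict(set)
    for number in sampled_numbers:
        rule = infer_rule(number)
        if rule is not None:
            prefix, _width = rule
            by_rule[rule].add(int(number[len(prefix):]))
    return {rule: sorted(values) for rule, values in by_rule.items()}


# Numbers as they might come back from keyword searches on known names (made up).
print(sample_rules(["SZ000120", "SZ000007", "BK0042", "SZ000120", "n/a"]))
```

A real site may encode dates, regions, or check digits in its numbers, in which case the rule inference would need to be adapted accordingly.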

Abstract

The invention discloses an efficient data collection method and system based on deep web crawlers. The method comprises the following steps: using known names as keywords to perform a preliminary search and sampling, thereby obtaining the corresponding numbering information and numbering rules; grouping all numbering information according to the numbering rules and sorting it in ascending order, with a data interval established between every two adjacent pieces of numbering information; and traversing and searching the data in ascending order according to the numbering information and the data intervals, thereby obtaining the complete set of data. The method and system can capture information from various industries, including but not limited to enterprise business information, book information, commodity information, and trial documents. They can capture deep web data on industry websites, accurately obtain the complete set of relevant data, and complete the collection of a large amount of valid data in a short time.
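
The grouping, sorting, interval, and traversal steps in the abstract can be sketched as below. This is one possible reading, not the patented implementation: `fetch` stands in for a hypothetical lookup against the target site, and `max_gap` is an assumption added here to skip intervals that look like reserved or unused ranges.

```python
from collections import defaultdict


def group_and_sort(numbered):
    """Group (rule, number) pairs by rule and sort each group in ascending order."""
    groups = defaultdict(set)
    for rule, number in numbered:
        groups[rule].add(number)
    return {rule: sorted(numbers) for rule, numbers in groups.items()}


def build_intervals(sorted_numbers):
    """Pair every two adjacent numbers into a (low, high) data interval."""
    return list(zip(sorted_numbers, sorted_numbers[1:]))


def collect(groups, fetch, max_gap=None):
    """Traverse each group in ascending order, probing the numbers inside each interval.

    fetch(rule, number) is a hypothetical callable returning a record or None.
    max_gap is an assumed heuristic: intervals wider than this are not filled in.
    """
    found = {}
    for rule, numbers in groups.items():
        if not numbers:
            continue
        candidates = []
        for low, high in build_intervals(numbers):
            if max_gap is not None and high - low > max_gap:
                candidates.append(low)  # keep the known number, skip the gap
            else:
                candidates.extend(range(low, high))
        candidates.append(numbers[-1])
        for number in candidates:
            record = fetch(rule, number)
            if record is not None:
                found[(rule, number)] = record
    return found


# Made-up site contents and sampled numbers for illustration.
site = {("A", 1): "r1", ("A", 2): "r2", ("A", 3): "r3", ("A", 5): "r5", ("B", 10): "x"}
groups = group_and_sort([("A", 5), ("A", 1), ("B", 10), ("A", 1)])
print(collect(groups, lambda rule, n: site.get((rule, n))))
```

Because probing is bounded by the sampled numbers of each group, the traversal avoids sweeping the whole number space, at the cost of missing records that lie outside the sampled range.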

Description

Technical field

[0001] The present invention relates to the technical field of network data collection, in particular to an efficient data collection method and system based on deep web crawlers, which can be applied to capture data such as enterprise business information, book information, commodity information, and trial documents.

Background

[0002] The concept of the deep web is defined relative to the surface web and refers to content that ordinary search engines cannot obtain. The amount of data hidden in the deep web is huge, but traditional search engines ignore this high-quality data hidden behind search forms. It is therefore important to design an engine that can obtain deep web data, since making use of this high-quality data can be highly valuable.

[0003] However, because the data in the deep web is hidden behind various search interfaces and cannot be accessed directly through hyperlinks, they must be viewed by using some keyword ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/951, G06F16/953, G06F16/9535, G06F16/955
CPC: G06F16/951, G06F16/955
Inventor: 张军, 贾西贝, 钟志强
Owner: SHENZHEN AUDAQUE DATA TECH