Data capture system and method

A data capture technology, applied in network data retrieval, network data indexing, electronic digital data processing, and related fields. It addresses the lack of a targeted solution for real-time capture, high-frequency invocation, and high availability, as well as unclear separation of businesses, and it makes the capture system easy to scale.

Status: Inactive
Publication Date: 2017-09-19
ADMASTER TECH BEIJING LTD
3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

Current industry methods for capturing massive amounts of data do not clearly distinguish between businesses, and they do not specifically address the problems of real-time capture, high-frequency calls, and high availability.

Examples

Embodiment 1

[0082] The video business sends tasks through the API to capture webpages A1 (a website homepage) and A2, and the public opinion business sends tasks through the API to capture webpages B1 (a website homepage), B2, B3, and B4. The deduplication module receives tasks A1, A2, B1, B2, B3, and B4 and queries the historical task database. It finds a historical task identical to B4, so it discards task B4, and the remaining tasks enter the task queue module.
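The deduplication step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the historical task database is modeled as an in-memory set, and the function names, the URL-hash fingerprint, and the example URLs are all assumptions made for the sketch.

```python
import hashlib


def task_fingerprint(url: str) -> str:
    """Hash the URL so identical URLs map to the same key (hypothetical scheme)."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def deduplicate(tasks, history):
    """Return the tasks not yet in `history` and record them in `history`."""
    accepted = []
    for task in tasks:
        key = task_fingerprint(task["url"])
        if key in history:
            continue  # an identical historical task exists: discard this one
        history.add(key)
        accepted.append(task)
    return accepted


# Assume B4 was captured earlier, so its fingerprint is already stored.
history = {task_fingerprint("http://b.example/4")}
incoming = [
    {"name": "A1", "url": "http://a.example/"},
    {"name": "A2", "url": "http://a.example/2"},
    {"name": "B1", "url": "http://b.example/"},
    {"name": "B2", "url": "http://b.example/2"},
    {"name": "B3", "url": "http://b.example/3"},
    {"name": "B4", "url": "http://b.example/4"},
]
remaining = [task["name"] for task in deduplicate(incoming, history)]
print(remaining)
```

Only the five tasks that survive the lookup would be passed on to the task queue module.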

[0083] In the task queue module, Redis is used to hold the list of tasks to be captured.
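As a rough sketch of how a Redis list can serve as the pending-task list, the example below uses a small in-memory stand-in so that it runs without a server. The class and the JSON task format are assumptions. With a real Redis server, the corresponding commands would be LPUSH to add a task and RPOP to take the oldest one.

```python
import json
from collections import deque


class InMemoryList:
    """Stand-in for a single Redis list; mimics only LPUSH and RPOP."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value: str) -> int:
        """Insert at the head and return the new length, as LPUSH does."""
        self._items.appendleft(value)
        return len(self._items)

    def rpop(self):
        """Remove and return the tail element, or None if the list is empty."""
        return self._items.pop() if self._items else None


pending = InMemoryList()
for name in ["A1", "A2", "B1", "B2", "B3"]:
    pending.lpush(json.dumps({"name": name}))  # tasks serialized as JSON strings

# Pushing at the head and popping at the tail gives first-in, first-out order.
first = json.loads(pending.rpop())
print(first["name"])
```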

[0084] The task scheduling module reads the list of tasks to be captured and uses a double weighted polling algorithm to calculate each task's priority from the business line information and webpage type information carried in the task header.

[0085] The public opinion business has a higher weight than the video business, so tasks B1, B2, and B3 are picked up by the scheduling module sooner. In the video service, since A1 is th...
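One possible reading of the double weighted polling described above is a two-level weighted round-robin: an outer poll over business lines and an inner poll over webpage types. The sketch below follows that reading. The weight values, the assumption that homepages outrank other pages, and all names are hypothetical; the source states only that the public opinion business outweighs the video business.

```python
from collections import deque

# Hypothetical weights; the source only says public opinion outranks video.
BUSINESS_WEIGHTS = {"public_opinion": 2, "video": 1}
# Assumed for illustration: homepages outrank other pages.
PAGE_TYPE_WEIGHTS = {"homepage": 2, "other": 1}


def weighted_round(weights):
    """Expand {key: weight} into one polling round, repeating each key `weight` times."""
    return [key for key, weight in weights.items() for _ in range(weight)]


def pop_next(page_queues, page_round, start):
    """Inner poll: scan one full round from `start`; return (task, next_start)."""
    size = len(page_round)
    for offset in range(size):
        slot = (start + offset) % size
        pending = page_queues[page_round[slot]]
        if pending:
            return pending.popleft(), (slot + 1) % size
    return None, start


def schedule(tasks):
    """Order tasks by polling business lines, then page types within each line."""
    queues = {b: {p: deque() for p in PAGE_TYPE_WEIGHTS} for b in BUSINESS_WEIGHTS}
    for task in tasks:
        queues[task["business"]][task["page_type"]].append(task)

    business_round = weighted_round(BUSINESS_WEIGHTS)
    page_round = weighted_round(PAGE_TYPE_WEIGHTS)
    position = {b: 0 for b in BUSINESS_WEIGHTS}
    ordered = []
    while len(ordered) < len(tasks):
        for business in business_round:  # outer poll over business lines
            task, position[business] = pop_next(
                queues[business], page_round, position[business]
            )
            if task is not None:
                ordered.append(task["name"])
    return ordered


tasks = [
    {"name": "A1", "business": "video", "page_type": "homepage"},
    {"name": "A2", "business": "video", "page_type": "other"},
    {"name": "B1", "business": "public_opinion", "page_type": "homepage"},
    {"name": "B2", "business": "public_opinion", "page_type": "other"},
    {"name": "B3", "business": "public_opinion", "page_type": "other"},
]
print(schedule(tasks))
```

With these assumed weights, the public opinion tasks reach the front of the order sooner than the video tasks, which matches the behavior described in the embodiment.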

Embodiment 2

[0089] The process is basically the same as that of Embodiment 1, except that when a historical task identical to B4 is found, the result of that historical task is returned directly to the public opinion business.
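The variation in this embodiment can be sketched as a lookup that returns a stored result on a hit instead of discarding the task. The dictionary keyed by URL, the function name, and the sample data are assumptions made for illustration.

```python
def submit(task, history, pending):
    """Return the stored result on a hit; otherwise enqueue the task and return None."""
    cached = history.get(task["url"])
    if cached is not None:
        return cached  # reuse the historical result, skipping a new capture
    pending.append(task)
    return None


# Hypothetical historical task database: URL -> previously captured result.
history = {"http://b.example/4": "<html>B4 snapshot</html>"}
pending = []

hit = submit({"name": "B4", "url": "http://b.example/4"}, history, pending)
miss = submit({"name": "B3", "url": "http://b.example/3"}, history, pending)
print(hit, miss, [task["name"] for task in pending])
```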

Embodiment 3

[0091] The process is basically the same as that of Embodiment 1, except that after the data is captured, the data capture module sends the captured data to the result queue module, and at the same time notifies the task scheduling module that the capture task is completed.
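A minimal sketch of this hand-off is shown below, assuming a scheduler that tracks in-flight tasks. The `Scheduler` class, the `capture` function, and the stand-in fetch function are hypothetical, and the two actions run one after the other here rather than truly at the same time.

```python
import queue


class Scheduler:
    """Hypothetical scheduler that tracks which tasks are still being captured."""

    def __init__(self):
        self.in_flight = set()

    def dispatch(self, task_name):
        self.in_flight.add(task_name)

    def on_complete(self, task_name):
        self.in_flight.discard(task_name)


def capture(task_name, fetch, result_queue, scheduler):
    """Fetch the data, push it to the result queue, then notify the scheduler."""
    data = fetch(task_name)
    result_queue.put((task_name, data))
    scheduler.on_complete(task_name)


scheduler = Scheduler()
results = queue.Queue()
scheduler.dispatch("B1")
# The lambda stands in for a real crawler request.
capture("B1", lambda name: f"content of {name}", results, scheduler)
print(results.qsize(), scheduler.in_flight)
```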

Abstract

The invention relates to a data capture system comprising a task deduplication module, a task queue module, a task scheduling module, a data capture module, and a result queue module. The invention further relates to a data capture method comprising the steps of: receiving data capture tasks sent by business lines and deduplicating them; forming a task queue from the deduplicated tasks; calculating task priority based on a double polling algorithm, scheduling the tasks by priority, and allocating them to crawler nodes; capturing data from the Internet using a crawler; and returning the captured data, forming a result queue, and sending the result queue to the business lines.

Description

Technical field

[0001] The present invention relates to the fields of computer application and information technology. Specifically, it relates to a system and method for data capture.

Background

[0002] With the large-scale development of social networks and the mobile Internet, people can more conveniently obtain information, express opinions, and communicate through mobile phones. Since the rise of social networks in particular, every netizen can create information, which has led to explosive growth in the amount of information on the network. Text information comes from many sources: Weibo, news, forums, blogs, Q&A sites, and comments (including video, e-commerce, and O2O comments), which can be collectively referred to as public opinion data. Brand advertisers and government departments all hope to understand the public opinion of netizens. For brand advertisers, they hope to obtain users' attitudes towards the brand, as well as users' interests and preferences from th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 杨博宋兵强张成白荣东
Owner: ADMASTER TECH BEIJING LTD