Webpage data acquisition method and system based on Google browser plug-in

A technology for acquiring webpage data through a Google browser plug-in, applied in the fields of network data retrieval, other database retrieval, and electronic digital data processing. It addresses the difficulty of obtaining data from websites with strict anti-crawling strategies, reduces the probability of being identified as a crawler, and has good popularization and application value.

Status: Inactive
Publication Date: 2019-09-24
浪潮卓数大数据产业发展有限公司

AI Technical Summary

Problems solved by technology

With the rapid development of the Internet, the Internet has become the carrier of a large amount of information, but users in different fields and with different backgrounds have different data needs. Obtaining the required data from massive data requires web crawlers, but the actual owners of Internet data (website managers) find ways to identify web crawlers in order to protect their data or websites, which starts a contest between data crawling and anti-crawling.
At the same time, some websites have strict anti-crawling strategies. Some data is visible only after logging in, and continuous access triggers verification methods such as slider verification codes. The desired data is difficult to obtain by ordinary crawling methods, so targeted, customized data acquisition methods are needed.



Examples


Embodiment

[0053] The web page data acquisition method based on Google browser plug-in of the present invention comprises the following steps:

[0054] S1. Write a Google browser plug-in.

[0055] The plug-in is written to conform to the Google Chrome plug-in template and includes manifest.json, .project, HTML pages, JS files, and image files.

[0056] S2. Fill in the corresponding configuration in the written Google browser plug-in to ensure the normal operation of the plug-in.

[0057] Fill in the corresponding configuration in the manifest.json and .project files to ensure the normal operation of the plug-in.
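
As an illustration of the configuration step, the sketch below writes a manifest as a JavaScript object and checks that the keys Chrome requires in every manifest (manifest_version, name, version) are present. The patent does not list the contents of its manifest.json, so every value here (the extension name, the example.com host pattern, the background.js and content.js file names) is a hypothetical placeholder, and the Manifest V3 layout is an assumption.

```javascript
// Hypothetical manifest for illustration only; the source does not
// specify any of these values. Layout assumes Manifest V3.
const exampleManifest = {
  manifest_version: 3,
  name: "Page Data Collector (example)",
  version: "0.1.0",
  permissions: ["tabs", "scripting"],
  host_permissions: ["https://example.com/*"],
  background: { service_worker: "background.js" },
  content_scripts: [
    { matches: ["https://example.com/*"], js: ["content.js"] },
  ],
};

// Keys that Chrome requires in every extension manifest.
const REQUIRED_KEYS = ["manifest_version", "name", "version"];

// Returns the required keys that are absent from the given manifest.
function missingManifestKeys(manifest) {
  return REQUIRED_KEYS.filter((key) => manifest[key] === undefined);
}

// Serialize to the text that would be saved as manifest.json.
const manifestJson = JSON.stringify(exampleManifest, null, 2);
```

Running missingManifestKeys before packaging is one cheap way to catch an incomplete configuration; it does not replace Chrome's own validation when the plug-in is loaded.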

[0058] S3. Automatically obtain the link.

[0059] The plug-in sends a GET request to a REST service, which returns a link stored in Redis, so that links are obtained automatically.
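
The link retrieval step can be sketched as follows. The source only states that a GET request to a REST service returns a link held in Redis; the endpoint URL, the JSON response shape ({ "url": "..." }), and the function name fetchNextLink are assumptions made for this example.

```javascript
// Minimal sketch: ask a REST service for the next link to visit.
// Assumed response body: { "url": "https://..." }, or a missing or
// empty "url" when no link is queued. This shape is hypothetical.
async function fetchNextLink(endpoint, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(endpoint, { method: "GET" });
  if (!response.ok) {
    throw new Error(`Link service returned HTTP ${response.status}`);
  }
  const payload = await response.json();
  // Treat anything other than a non-empty string as "no link available".
  return typeof payload.url === "string" && payload.url.length > 0
    ? payload.url
    : null;
}
```

In a plug-in this might be called from the background script as fetchNextLink("http://localhost:8080/links/next"), where the address is again only a placeholder. Passing fetchImpl explicitly makes the function easy to exercise without a running service.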

[0060] S4. Acquire webpage data: first determine whether the webpage has finished loading; once loading is complete, obtain the loaded webpage data.

[0061] Use chrome.tabs.query({'active':tru...
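
One possible shape for step S4 is sketched below. The chrome.tabs.query call in the source is truncated, so the query filter used here ({ active: true, currentWindow: true }), the use of chrome.tabs.onUpdated to detect completion, and the use of chrome.scripting.executeScript (Manifest V3) to read the page are all assumptions rather than the patent's own method. The chromeApi parameter and both function names are hypothetical; in a real plug-in the global chrome object would be passed in, and the manifest would need the tabs and scripting permissions plus host access for the target site.

```javascript
// Resolves once the given tab reports status "complete".
function waitForTabComplete(chromeApi, tabId) {
  return new Promise((resolve) => {
    const finish = () => {
      chromeApi.tabs.onUpdated.removeListener(listener);
      resolve();
    };
    const listener = (updatedTabId, changeInfo) => {
      if (updatedTabId === tabId && changeInfo.status === "complete") finish();
    };
    chromeApi.tabs.onUpdated.addListener(listener);
    // Re-check after subscribing, in case the tab finished loading
    // before the listener was attached.
    chromeApi.tabs.get(tabId).then((tab) => {
      if (tab.status === "complete") finish();
    });
  });
}

// Returns the HTML of the active tab after it has finished loading,
// or null when there is no active tab.
async function getActivePageHtml(chromeApi) {
  const [tab] = await chromeApi.tabs.query({ active: true, currentWindow: true });
  if (!tab) return null;
  if (tab.status !== "complete") {
    await waitForTabComplete(chromeApi, tab.id);
  }
  const [injection] = await chromeApi.scripting.executeScript({
    target: { tabId: tab.id },
    // Runs inside the page, where `document` is the loaded webpage.
    func: () => document.documentElement.outerHTML,
  });
  return injection.result;
}
```

Injecting the API object instead of reading the global keeps the load-detection logic testable outside the browser.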


Abstract

The invention discloses a webpage data acquisition method and system based on a Google browser plug-in, belonging to the technical field of Internet data acquisition. The method comprises the following steps: S1, writing the Google browser plug-in; S2, filling in the corresponding configuration in the written plug-in to ensure its normal operation; S3, automatically obtaining a link; S4, acquiring webpage data; S5, automatically turning pages; S6, automatically dragging the slider; S7, performing page operations; and S8, performing data processing: obtaining the required data from the webpage or text, and formatting or processing the webpage or text data. The method can reduce the probability of being identified as a crawler by a website and has good popularization and application value.

Description

Technical field

[0001] The invention relates to the technical field of Internet data acquisition, and specifically provides a method and system for acquiring webpage data based on a Google browser plug-in.

Background art

[0002] With the continuous development of society and the economy, the technical level of society has greatly improved. With the rapid development of the Internet, the Internet has become the carrier of a large amount of information, but users in different fields and with different backgrounds have different data needs. Obtaining the required data from massive data requires web crawlers, but the actual owners of Internet data (website managers) find ways to identify web crawlers in order to protect their data or websites, which starts a contest between data crawling and anti-crawling.

[0003] At the same time, some websites have strict anti-crawling strategies. Some data is visible only after logging in. Continu...


Application Information

IPC(8): G06F16/958, G06F3/0485, G06F9/445
CPC: G06F3/0485, G06F9/44526, G06F16/972
Inventor 姜敬超徐宏伟单震宋设杨照通
Owner 浪潮卓数大数据产业发展有限公司