System and method for automatically acquiring webpage data

A technology for collecting specified data from web pages, applied in network data retrieval, network data indexing, digital data authentication, etc. It addresses the inability of general crawlers to accurately collect and process data from specific pages, and allows both the collection process and the collected content to be customized.

Active Publication Date: 2019-11-22
云帐房网络科技有限公司

AI Technical Summary

Problems solved by technology

However, because a crawler's data crawling strategy is generic, it cannot accurately process the data of specific web pages or apply special handling to them. This is especially true for the data of tax websites.



Examples


Embodiment 1

[0024] Embodiment 1: Please see Figure 1, which is a structural diagram of a system for automatically collecting webpage data according to Embodiment 1 of the present invention. The system includes an embedded browser 1, an API interface 2, a script engine module 3, and a process control module 4. The API interface 2, the script engine module 3, and the process control module 4 are each embedded in the embedded browser 1. The system combines the script engine module 3 and the process control module 4 to jointly realize access to a specified webpage and collection of specified data.

[0025] Preferably, the script engine module 3 is used to load a JS script; the JS script contains a custom JS function for operating a webpage, and the execution of the webpage requires the interpretation and execution of...
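As a rough illustration of the idea that the script engine holds custom JS functions for particular pages, the following Python sketch keeps a registry of JS source strings keyed by URL pattern. Everything in it is an assumption made for illustration: the `ScriptEngine` class, the `register` and `script_for` methods, the example URL, and the JS snippet are hypothetical and do not come from the patent text, which does not say how scripts are selected.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, Optional


@dataclass
class ScriptEngine:
    """Hypothetical stand-in for script engine module 3 (not the patent's code)."""

    # Maps a URL pattern to the JS source intended for matching pages.
    scripts: Dict[str, str] = field(default_factory=dict)

    def register(self, url_pattern: str, js_source: str) -> None:
        """Associate a custom JS function (as source text) with a URL pattern."""
        self.scripts[url_pattern] = js_source

    def script_for(self, url: str) -> Optional[str]:
        """Return the first registered script whose pattern matches the URL."""
        for pattern, source in self.scripts.items():
            if fnmatchcase(url, pattern):
                return source
        return None


engine = ScriptEngine()
# Assumed example: a login page that needs a form-filling helper.
engine.register(
    "https://tax.example.com/login*",
    "function fillLogin(user) { document.querySelector('#user').value = user; }",
)
```

Calling `engine.script_for("https://tax.example.com/login?next=home")` returns the registered source, while a URL with no matching pattern yields `None`, so a caller can skip pages that have no custom handling.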

Embodiment 2

[0028] Embodiment 2: According to another aspect of the present invention, a method for automatically collecting webpage data is also provided. Please refer to Figure 2, which is a flowchart of a method for automatically collecting webpage data according to Embodiment 2 of the present invention. The method includes the following steps:

[0029] Step S10: the platform database issues a designated data collection request;

[0030] Step S20: Log in to the website to be collected: the embedded browser 1 receives the specified data collection request and visits the specified website to be collected, receives the page load event after the visit is successful, and obtains the memory address after the page is loaded;

[0031] Step S30: Load the JS script: the script engine module 3 loads the JS script for the current page, and executes the custom JS function in the memory address of the c...
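The visible steps S10 to S30 can be sketched as an event-driven flow. This is a minimal simulation under stated assumptions, not the patented implementation: `FakeEmbeddedBrowser`, `collect`, the request dictionary, and the JS string are all hypothetical, no real page is fetched, and the "memory address of the current page" is modelled only as the point at which the load handler runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeEmbeddedBrowser:
    """Hypothetical stand-in for embedded browser 1; performs no network I/O."""

    on_load: List[Callable[[str], None]] = field(default_factory=list)
    executed: List[str] = field(default_factory=list)

    def visit(self, url: str) -> None:
        # A real embedded browser would navigate and fire this event only
        # after the page has finished loading.
        for handler in self.on_load:
            handler(url)

    def run_js(self, source: str) -> None:
        # Placeholder for executing JS in the context of the loaded page.
        self.executed.append(source)


def collect(request: Dict[str, str], browser: FakeEmbeddedBrowser,
            scripts: Dict[str, str]) -> List[str]:
    # S30: when the page-load event arrives, run the script for that page.
    def handle_load(url: str) -> None:
        source = scripts.get(url)
        if source is not None:
            browser.run_js(source)

    browser.on_load.append(handle_load)
    # S20: the browser receives the request and visits the target site.
    browser.visit(request["url"])
    return browser.executed


# S10: the platform database issues a collection request (modelled as a dict).
request = {"url": "https://tax.example.com/home", "data": "basic_info"}
scripts = {"https://tax.example.com/home": "collectBasicInfo();"}
result = collect(request, FakeEmbeddedBrowser(), scripts)
```

Registering the handler before calling `visit` mirrors the ordering in the steps: the script runs only in response to the load event, never before the page exists.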

Embodiment 3

[0035] Embodiment 3: The system and method for automatically collecting webpage data of the present invention have a wide range of application scenarios. For example, they can be applied to collect webpage data from taxation websites in order to provide customers with intelligent fiscal and taxation services. Account information provided by customers is used to log in to the tax bureau website and collect the customers' basic information and financial information. This provides data support for intelligent fiscal and taxation services and enables various value-added services for customers, such as automated tax filing and risk assessment.

[0036] Next, data collection from a tax website is used as an example to introduce the workflow of the application.

[0037] The first step: The embedded browser visits the tax website. After the visit is successful, the page loading event is received, and the memory address after the page is loaded is also ob...


Abstract

The invention provides a system and a method for automatically acquiring webpage data. The system comprises an embedded browser, an API, a script engine module and a process control module. The script engine module and the process control module are combined to jointly realize access to a specified webpage and acquisition of specified data. The script engine module gives the system the capability of executing a custom JS function in the memory address of the current page. The memory address of the current page can be obtained after the webpage is loaded, a JS script is used to simulate various click operations of a user, and the process control module can customize the content acquired on a specific page. The method is suitable for accurately processing data of a specific page or specially processing a specific page, and in particular can accurately acquire data of a tax website. Both the collection process and the collected content can be customized.

Description

Technical field

[0001] The invention relates to the technical field of website data collection, in particular to a system and method for automatically collecting web page data.

Background technique

[0002] At present, webpage data on the Internet is mainly captured by a scheduler (crawler) that downloads webpages and enters them into a database. The database information is then collected, summarized, and classified according to a specific calculation method, which is either depth-first or breadth-first. For example, Baidu's spider crawler uses this method of crawling webpage data. This method can automatically obtain data from webpages in large quantities. However, due to the universality of the crawler's data crawling strategy, it is impossible to accurately process the data of specific webpages or perform special processing for specific webpages, especially the data of taxation websites cannot be accuratel...
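To make the depth-first versus breadth-first distinction in the background concrete, here is a small sketch that visits a toy link graph in both orders. The `LINKS` graph and the `crawl_order` function are hypothetical and only illustrate the general traversal strategies; they are not taken from the patent or from any particular crawler.

```python
from collections import deque
from typing import Dict, List

# Assumed toy site: each page maps to the pages it links to.
LINKS: Dict[str, List[str]] = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def crawl_order(start: str, breadth_first: bool) -> List[str]:
    """Return the order in which pages are visited from `start`."""
    frontier = deque([start])
    seen = {start}
    order: List[str] = []
    while frontier:
        # A queue (popleft) gives breadth-first; a stack (pop) gives depth-first.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        links = LINKS.get(page, [])
        # Reverse for the stack so links are still followed left to right.
        for link in (links if breadth_first else reversed(links)):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


bfs = crawl_order("home", breadth_first=True)
dfs = crawl_order("home", breadth_first=False)
```

On this graph the breadth-first order covers every page one click from `home` before going deeper, while the depth-first order follows the `a` branch to its leaves before touching `b`. Either way the strategy is the same for every page, which is the generality the background contrasts with page-specific collection.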


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/958, G06F16/951, G06F9/445, G06F21/31
CPC: G06F16/958, G06F16/951, G06F9/44521, G06F21/31, Y02D10/00
Inventor: 李沁, 李娜
Owner: 云帐房网络科技有限公司