Webpage data analysis method and device and computer readable storage medium

A webpage data parsing method, applied in the field of data processing. It addresses the high coupling of existing approaches, their lack of a data parsing engine with configurable data extraction and format conversion, and their poor handling of sequence and data association between web pages. The method achieves high cohesion and reduces excessive dependence on the structure of the source website.

Pending Publication Date: 2019-06-21
重庆金融资产交易所有限责任公司

AI Technical Summary

Problems solved by technology

The first existing method cannot properly handle sequence and data association between web pages during centralized parsing. The second relies too heavily on the structure and format of the source website; if an exception occurs, the web page must be re-crawled, so coupling is too high.
In addition, both of these webpage data parsing methods lack a data parsing engine and configurable data extraction and format conversion.

Examples

Detailed Description of the Embodiments

[0039] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0040] The invention provides a method for parsing web page data. Figure 1 is a schematic flowchart of a web page data parsing method provided by an embodiment of the present invention. The method may be performed by a device, and the device may be implemented in software and/or hardware.

[0041] In the embodiment shown in Figure 1, the web page data parsing method includes:

[0042] Step S10: when capturing data, analyze the network pages where the data to be captured is located and obtain the number of pages on which that data is located.

[0043] In the embodiment of the present invention, in the process of crawling web pages for data capture, according to the actual situation, a preliminary analysis is performed on the network pages where the data to b...
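
Step S10 can be illustrated with a short Python sketch. The text available here does not say how the number of pages is obtained, so the approach below is an assumption made purely for illustration: it reads the pagination links on the first page and takes the largest page number it finds. The names `PageLinkCollector` and `count_pages`, and the `page` query parameter, are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class PageLinkCollector(HTMLParser):
    """Collects page numbers referenced by pagination links (assumed layout)."""

    def __init__(self, page_param="page"):
        super().__init__()
        self.page_param = page_param  # hypothetical query parameter name
        self.page_numbers = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Pull the page number out of links such as /list?page=3.
        values = parse_qs(urlparse(href).query).get(self.page_param, [])
        for value in values:
            if value.isdecimal():
                self.page_numbers.add(int(value))


def count_pages(first_page_html, page_param="page"):
    """Return the highest page number linked from the first page, or 1 if none."""
    collector = PageLinkCollector(page_param)
    collector.feed(first_page_html)
    return max(collector.page_numbers, default=1)
```

A page without pagination links is treated as a single page, which lets a later step choose between single-page and multi-page capture.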

Abstract

The invention relates to the technical field of data acquisition and discloses a webpage data analysis method comprising the following steps: when capturing data, analyzing the network page where the data to be captured is located to obtain the number of pages on which that data is located; capturing the data in a capture mode matched to that number of pages to obtain captured page data; and processing the captured page data to generate the required structured data and storing it. The invention further provides a webpage data analysis device and a computer-readable storage medium. The method performs preliminary web page capture analysis and structured data format conversion separately, which reduces excessive dependence of webpage data analysis on the structure of the source website. Because preliminary capture analysis and structured information extraction with format conversion run independently, the method achieves high cohesion and low coupling.
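
The separation of capture from format conversion described in the abstract can be sketched as two independent stages. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `?page=N` URL pattern, the injected `fetch` and `extract` callables, and the JSON storage format are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def capture_pages(base_url: str, page_count: int,
                  fetch: Callable[[str], str]) -> List[str]:
    """Stage 1: capture raw page data in a mode matched to the page count."""
    if page_count <= 1:
        # Single-page mode: one request is enough.
        return [fetch(base_url)]
    # Multi-page mode: request pages in order so their sequence is preserved.
    # The ?page=N pattern is an assumed URL scheme.
    return [fetch(f"{base_url}?page={n}") for n in range(1, page_count + 1)]


def to_structured(raw_pages: List[str],
                  extract: Callable[[str], List[Dict[str, str]]]
                  ) -> List[Dict[str, str]]:
    """Stage 2: convert already-captured pages into structured records."""
    records = []
    for page_number, raw in enumerate(raw_pages, start=1):
        for record in extract(raw):
            # Tag each record with the page it came from.
            records.append({"page": str(page_number), **record})
    return records


def store(records: List[Dict[str, str]], path: str) -> None:
    """Persist the structured records as JSON (assumed storage format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

Because `to_structured` works only on data that `capture_pages` has already returned, the extraction rules can in principle be changed and rerun without fetching the source pages again, which is one way to read the low-coupling goal stated above.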

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a web page data parsing method, device and computer-readable storage medium.

Background

[0002] With the continuous advancement of Internet information technology and the diversification of Internet applications, network technology has increasingly profoundly changed people's work, study and lifestyle, and has even affected the course of society as a whole. The rapid development of the Internet has accelerated the arrival of the era of big data. Companies worldwide are enthusiastic about big data, and big data analysis and processing have emerged in response. The big data processing process mainly includes data acquisition, data storage integration, data preprocessing, data mining analysis, data presentation and application, etc. When traditional industries develop big data, the first thing they face is how to obtain Internet data that is not based ...

Application Information

IPC(8): G06F16/25, G06F16/951
Inventor: 檀传华, 冉梦龙, 孟文斌, 李祖光, 陈锦韬
Owner: 重庆金融资产交易所有限责任公司