
Method and device for analyzing webpage data crawled by crawler

A technique in the field of parsing webpage data crawled by crawlers. It addresses repetitive parsing code and low parsing efficiency, with the stated effects of avoiding parsing-code failure and improving parsing efficiency.

Active Publication Date: 2017-12-12
SHENZHEN AUDAQUE DATA TECH
Cites: 4; Cited by: 5

AI Technical Summary

Problems solved by technology

After a web crawler fetches a page, the conventional approach traverses the tr and td elements of the page's table tags and parses the data cell by cell, row by row and column by column. Parsing efficiency is low and much of the code is repetitive.

Moreover, if the webpage is revised, the parsing code must be rewritten.
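To make the problem concrete, here is a minimal Python sketch of the conventional positional approach, assuming the tr/td traversal has already produced a list of cell texts per row with the header row first. The function `parse_positional`, the field names and the sample rows are hypothetical. Because each field is tied to a hard-coded column index, swapping two columns silently writes the wrong values into the wrong fields.

```python
def parse_positional(rows):
    """Conventional tr/td parsing: each field's meaning is tied to a column index."""
    records = []
    for row in rows[1:]:                        # skip the header row
        records.append({"name": row[0],         # hard-coded: column 0 is the name
                        "id_number": row[1]})   # hard-coded: column 1 is the ID
    return records


# Hypothetical cell texts per <tr> (header first), before and after a page
# revision that swaps the two columns.
ORIGINAL = [["Name", "ID"], ["Zhang San", "A001"]]
REVISED = [["ID", "Name"], ["A001", "Zhang San"]]

print(parse_positional(ORIGINAL))  # [{'name': 'Zhang San', 'id_number': 'A001'}]
print(parse_positional(REVISED))   # [{'name': 'A001', 'id_number': 'Zhang San'}]
```

No error is raised for the revised page; the output is simply wrong, which is why every layout change forces the parsing code to be rewritten.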

Method used

Webpage data is extracted according to the table tags in the webpage's HTML structure, and the extracted data is then mapped into predefined class fields according to a predetermined mapping between class field names and webpage data.


Embodiment Construction

[0031] Embodiments of the technical solution of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and should not be used to limit the scope of protection of the present invention.

[0032] It should be noted that, unless otherwise specified, technical and scientific terms used in this application have the ordinary meanings understood by those skilled in the art to which the present invention belongs.

[0033] In a first aspect, an embodiment of the present invention provides a method for parsing webpage data crawled by a crawler. As shown in Figure 1, the method includes:

[0034] Step S1: Extract webpage data according to the table tags in the HTML structure of the webpage.
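A minimal sketch of Step S1 using Python's standard-library `html.parser`, assuming a single table whose first `<tr>` holds the header labels. The class `TableExtractor` and the function `extract_table` are hypothetical names, not the patent's implementation. Each data row is returned as a dict keyed by header text rather than by column position.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical Step S1 helper: collects cell text grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False            # an unclosed cell ends at a new row
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                    # text inside nested tags (e.g. <b>) is kept
            self.rows[-1][-1] += data


def extract_table(html):
    """Return data rows as dicts keyed by the header cell text (first <tr>)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rows = [[cell.strip() for cell in row] for row in parser.rows if row]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    page = ("<table><tr><th>Name</th><th>ID</th></tr>"
            "<tr><td>Zhang San</td><td>A001</td></tr></table>")  # hypothetical page
    print(extract_table(page))  # [{'Name': 'Zhang San', 'ID': 'A001'}]
```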

[0035] Step S2: According to the predetermined mapping relationship between class field names and...
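The text of Step S2 is truncated above, so the following is only a sketch of the idea as stated in the abstract: mapping extracted data into predefined class fields. It assumes the rows arrive as dicts keyed by header text (as a Step S1 extractor might produce) and that the predetermined mapping is a dict from class field name to header label. `SocialSecurityRecord`, `FIELD_MAPPING` and `map_rows` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SocialSecurityRecord:
    """Hypothetical predefined class; its field names are illustrative only."""
    name: str
    id_number: str


# Hypothetical predetermined mapping: class field name -> header label on the page.
FIELD_MAPPING = {"name": "Name", "id_number": "ID"}


def map_rows(rows, cls, mapping):
    """Fill each field of `cls` from the cell under the header mapped to it.

    `rows` is a list of dicts keyed by header text. A missing header raises
    KeyError, so a renamed column is reported rather than silently skipped.
    """
    records = []
    for row in rows:
        values = {f.name: row[mapping[f.name]] for f in fields(cls)}
        records.append(cls(**values))
    return records


if __name__ == "__main__":
    extracted = [{"Name": "Zhang San", "ID": "A001"}]  # hypothetical sample
    print(map_rows(extracted, SocialSecurityRecord, FIELD_MAPPING))
    # [SocialSecurityRecord(name='Zhang San', id_number='A001')]
```

Because the class and the mapping are parameters, the same function can serve any table-shaped page in this sketch; supporting another page means declaring another class and mapping rather than writing another parser.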


Abstract

The present invention belongs to the technical field of data processing and provides a method and device for parsing webpage data crawled by a crawler. The method comprises: extracting webpage data according to the table tags in a webpage's HTML structure; and mapping the extracted webpage data into predefined class fields according to a predetermined mapping between class field names and webpage data. The method and device improve the efficiency of parsing webpage data crawled by a crawler without requiring parsing code to be written repeatedly.

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a method and device for parsing webpage data crawled by crawlers.

Background

[0002] The HTML structure of most webpages is built on table tags. After a web crawler fetches a page, it traverses the tr and td elements of the table and parses the data cell by cell, row by row and column by column. Parsing efficiency is low and much of the code is repetitive.

[0003] Moreover, if the webpage is revised, the parsing code must be rewritten. For example, if a revision of a social security information webpage moves the name column from the first column to the second, the original crawler parsing code becomes invalid and must be rewritten.

[0004] How to improve the parsing efficiency of webpage data crawled by crawlers without repeatedly writing parsing code is an urgent problem to be solved by those skilled in the ...
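The social security example can be illustrated with a minimal sketch, assuming the predetermined mapping pairs each output field with a header label. Columns are located by header text once per table, so the revised page, where the name moves from the first column to the second, yields the same records with no code change. `FIELD_MAPPING`, `parse_by_header` and the sample rows are hypothetical.

```python
# Hypothetical predetermined mapping: output field -> column header label.
FIELD_MAPPING = {"name": "Name", "id_number": "ID"}


def parse_by_header(rows, mapping):
    """Resolve each field's column from the header row, then read the data rows.

    `rows` holds the cell texts of each <tr>, header first. list.index raises
    ValueError if a mapped label is missing from the header.
    """
    header = rows[0]
    column_of = {field: header.index(label) for field, label in mapping.items()}
    return [{field: row[i] for field, i in column_of.items()} for row in rows[1:]]


# Hypothetical rows before and after a revision that moves the name column
# from the first position to the second.
ORIGINAL = [["Name", "ID"], ["Zhang San", "A001"]]
REVISED = [["ID", "Name"], ["A001", "Zhang San"]]

print(parse_by_header(ORIGINAL, FIELD_MAPPING))  # [{'name': 'Zhang San', 'id_number': 'A001'}]
print(parse_by_header(REVISED, FIELD_MAPPING))   # [{'name': 'Zhang San', 'id_number': 'A001'}]
```

If a revision renames a header rather than moving it, only the mapping entry needs updating in this sketch; the `ValueError` from `header.index` surfaces the change instead of silently misreading data.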


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 颜龙武 (Yan Longwu), 贾西贝 (Jia Xibei)
Owner: SHENZHEN AUDAQUE DATA TECH