Automatic webpage table data extraction method and device
An automatic table data extraction technology, applied in the fields of electronic digital data processing, special data processing applications, and semi-structured data retrieval, that achieves high practical value.
Examples
Embodiment 1
[0058] The automatic extraction method for webpage table data is explained below, taking the list of tutors of a School of Astronautics as an example.
[0059] Table 1 A list of tutors in the School of Astronautics of a certain school
[0060]
[0061] The source code of the webpage shows that the table is built from two nested layers of table tags. Referring to Figure 1, the table data extraction process is described in detail below:
[0062] Step 1: Obtain the webpage content containing the table tag through jsoup or other webpage parsers, and parse the webpage content into a DOM tree structure.
[0063] Step 2: Layer the table data containing the Table tag in the DOM tree structure, and then filter layer by layer until the table data that needs to be processed is obtained; the specific process includes the following:
[0064] Step 2.1: use the outermost Table tag in the DOM tree structure as the first layer, use the nested Table tag in the first layer as the second layer...
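Steps 1 and 2.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it uses Python's standard html.parser in place of jsoup, and the class name TableLayerParser, the function table_layers, and the sample markup are all hypothetical. The sketch assigns layer 1 to an outermost table tag and layer 2 to a table tag nested inside it.

```python
from html.parser import HTMLParser


class TableLayerParser(HTMLParser):
    """Records the nesting layer of every <table> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # number of <table> elements currently open
        self.layers = []   # (layer, id attribute or None), in document order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.layers.append((self.depth, dict(attrs).get("id")))

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def table_layers(html):
    """Parse the page and return the layer number of each table tag."""
    parser = TableLayerParser()
    parser.feed(html)
    parser.close()
    return parser.layers


# Assumed sample page: one outer table wrapping one nested table.
SAMPLE = """
<table id="outer"><tr><td>
  <table id="inner"><tr><td>cell</td></tr></table>
</td></tr></table>
"""

layers = table_layers(SAMPLE)
for layer, table_id in layers:
    print(f"layer {layer}: table id={table_id}")
```

A depth counter is enough here because only the layer number is needed; a full tree would be required if later steps had to navigate from a table to its nested tables.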
Embodiment 2
[0098] A list of people on a webpage is now taken as an example to illustrate the automatic extraction method for webpage table data.
[0099] Table 2 List of people
[0100]
Name    tom   Kity    Lucy    Tomas  Rome  Bloom
Age     30    23      34      37     35    31
Gender  male  female  female  male   male  male
[0101] As can be seen from the source code of the web page, the table has only one table tag. The following is a detailed description of the table data extraction process:
[0102] Step 1: Obtain the webpage content containing the table tag through an HTML parser or other webpage parser, and parse the webpage content into a DOM tree structure.
[0103] Step 2: Layer the table data containing the Table tag in the DOM tree structure, and then filter layer by layer until the table data that needs to be processed is obtained; the specific process includes the following:
[0104] Step 2.1: Use the outermost Table tag in the DOM t...
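For a page with a single table tag, reading the cells might look like the sketch below. It rests on several assumptions: the markup in SAMPLE is invented (the page source is not reproduced here), the first cell of each row is assumed to be the field name, and RowCollector and to_records are hypothetical names. Python's standard html.parser is used as a stand-in for the webpage parser.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def to_records(rows):
    """Assumes row-wise headers: the first cell of each row names a field."""
    names = [row[0] for row in rows]
    columns = zip(*(row[1:] for row in rows))
    return [dict(zip(names, column)) for column in columns]


# Assumed markup for Table 2; the real page source may differ.
SAMPLE = """
<table>
<tr><td>Name</td><td>tom</td><td>Kity</td><td>Lucy</td><td>Tomas</td><td>Rome</td><td>Bloom</td></tr>
<tr><td>Age</td><td>30</td><td>23</td><td>34</td><td>37</td><td>35</td><td>31</td></tr>
<tr><td>Gender</td><td>male</td><td>female</td><td>female</td><td>male</td><td>male</td><td>male</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(SAMPLE)
collector.close()
records = to_records(collector.rows)
print(records[0])
```

If the headers ran across the top row instead, to_records would pair the first row with each later row rather than transposing.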
Embodiment 3
[0142] A land bidding table on a webpage is now taken as an example to explain the automatic extraction method for webpage table data.
[0143] Table 3 Bidding table for a certain plot of land
[0144]
[0145] As can be seen from the source code, the table contains table tags. The following is a detailed description of the table data extraction process:
[0146] Step 1: Obtain the webpage content containing the table tag through a webpage parser, and parse the webpage content into a DOM tree structure.
[0147] Step 2: Layer the table data containing the Table tag in the DOM tree structure, and then filter layer by layer until the table data that needs to be processed is obtained; the specific process includes the following:
[0148] Step 2.1: Use the outermost Table tag in the DOM tree structure as the first layer, the nested Table tag in the first layer as the second layer, and so on;
[0149] Step 2.2: Filter the table data containing the Table tag layer by layer from the outsid...
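The outside-in traversal of Steps 2.1 and 2.2 can be sketched as a layer-by-layer walk over the nested table tags. The filtering criterion itself is not given in the text above, so the sketch leaves it as a caller-supplied predicate; the predicate shown (a table with no nested tables) is purely an assumption for illustration. TableNode, TableTreeParser, filter_outside_in, and the sample markup are hypothetical, and Python's standard html.parser stands in for the webpage parser.

```python
from html.parser import HTMLParser


class TableNode:
    """One <table> element with its layer number and directly nested tables."""

    def __init__(self, layer, attrs):
        self.layer = layer
        self.attrs = attrs
        self.children = []


class TableTreeParser(HTMLParser):
    """Builds a tree of <table> elements; other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.roots = []    # layer-1 (outermost) tables
        self._stack = []   # tables currently open, outermost first

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            node = TableNode(len(self._stack) + 1, dict(attrs))
            siblings = self._stack[-1].children if self._stack else self.roots
            siblings.append(node)
            self._stack.append(node)

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self._stack.pop()


def filter_outside_in(roots, wanted):
    """Check layer 1, then layer 2, and so on; return the first match or None."""
    layer = list(roots)
    while layer:
        for node in layer:
            if wanted(node):
                return node
        layer = [child for node in layer for child in node.children]
    return None


# Assumed markup: a layout table wrapping the data table.
SAMPLE = """
<table id="layout"><tr><td>
  <table id="bids"><tr><td>plot</td><td>price</td></tr></table>
</td></tr></table>
"""

tree = TableTreeParser()
tree.feed(SAMPLE)
tree.close()

# Assumed criterion: stop at the first table that nests no further tables.
found = filter_outside_in(tree.roots, lambda node: not node.children)
print(found.layer, found.attrs.get("id"))
```

Passing the criterion in as a function keeps the traversal separate from whatever rule decides which table holds the data to be processed.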