Structure recognition based Web table information extraction method
A structure-recognition-based technique for extracting Web table information, applied in the field of Web information extraction. It reduces the number of string-matching operations, recognizes table structure quickly, and reduces redundant data in the extraction result.
Embodiment Construction
[0035] The present invention proposes a method for extracting Web table information based on structure recognition. By first identifying the table structure quickly and accurately, the method can extract the table information correctly and effectively reduce redundant data in the extraction result. The complete process of the method is shown in Figure 5.
[0036] The operation of this method includes the following steps:
[0037] 1. Web table structure recognition
[0038] ① Heuristic rules (given a Web table)
[0039] Get the number of columns in the table: Get_Table.column.size();
[0040] If Table.column.size() is 2 or 3, and Table.row.size() is much larger than the number of columns (usually more than 2 times), the first column of the table holds the attribute cells;
[0041] // The same rule applies, transposed, to tables whose number of columns is much larger than the number of rows; in that case the first row of the table is the attribute row.
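The heuristic in [0039] to [0041] can be sketched in Python as follows. This is only an illustration, not the patent's implementation: the function name `attribute_position`, the list-of-rows table representation, and the `ratio` parameter (set to 2 to match "usually more than 2 times") are assumptions. The transposed rule is read here as "2 or 3 rows and more than twice as many columns", which is one interpretation of [0041]. Tables matching neither rule return `None`, since the rule for those cases is not covered by the text above.

```python
from typing import List, Optional


def attribute_position(table: List[List[str]], ratio: float = 2.0) -> Optional[str]:
    """Guess where the attribute cells of a table are (hypothetical helper).

    `table` is a list of rows, each row a list of cell strings.
    Returns "first_column", "first_row", or None when neither rule applies.
    """
    n_rows = len(table)
    # Use the widest row as the column count, so ragged rows are tolerated.
    n_cols = max((len(row) for row in table), default=0)

    # [0040]: 2 or 3 columns and far more rows than columns
    # -> the first column holds the attribute cells.
    if n_cols in (2, 3) and n_rows > ratio * n_cols:
        return "first_column"

    # [0041], as interpreted here: the transposed case
    # -> the first row is the attribute row.
    if n_rows in (2, 3) and n_cols > ratio * n_rows:
        return "first_row"

    # Neither heuristic fires; other rules would have to decide.
    return None


# A tall, two-column table of attribute/value pairs.
tall = [["Name", "X"], ["Price", "10"], ["Color", "red"],
        ["Size", "M"], ["Brand", "Y"]]
print(attribute_position(tall))
```

Only the row and column counts are inspected, so the check is cheap and needs no string matching on cell contents.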
[0042] For the form that does ...