System, method and program for extracting web page core content based on web page layout
A technique for extracting the core content of web pages, applied in the fields of instruments, computing, and electrical digital data processing, that addresses the problem of unsatisfactory extraction of web page core content.
Embodiment Construction
[0079] The following detailed description of specific embodiments of the present invention uses a number of terms. To make the content disclosed in this application easier to understand, these terms are explained together here:
[0080] 1) Tags related to tables
[0081] "Table-related tags" (HTML tags) include <table>, <tbody>, <tfoot>, <tr>, <th>, <td>, and the like.
[0085] Among them, <table> is used to create a data table, <tbody> represents the body of the table, <tfoot> denotes the footer (footnote) of the table, <tr> represents a data row of the table, <th> defines a header cell, and <td> creates a data cell.
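As an illustration of how table-related tags might be recognized in a page, the following is a minimal sketch using Python's standard html.parser module. It assumes the table-related tags are the standard HTML table elements <table>, <tbody>, <tfoot>, <tr>, <th> and <td>; the names TABLE_TAGS, TableTagCounter and count_table_tags are hypothetical and do not come from the patent, and the sketch is not the extraction method the patent claims.

```python
from html.parser import HTMLParser

# Assumption: the "table-related tags" are the standard HTML table elements.
TABLE_TAGS = {"table", "tbody", "tfoot", "tr", "th", "td"}


class TableTagCounter(HTMLParser):
    """Counts opening tags whose name is in TABLE_TAGS (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in TABLE_TAGS:
            self.counts[tag] = self.counts.get(tag, 0) + 1


def count_table_tags(html):
    """Return a dict mapping each table-related tag to its number of opening tags."""
    parser = TableTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = (
    "<table><tbody>"
    "<tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>Pen</td><td>2</td></tr>"
    "</tbody><tfoot><tr><td colspan='2'>Prices in USD</td></tr></tfoot></table>"
    "<p>Not a table tag</p>"
)
counts = count_table_tags(sample)
print(counts)
```

A count like this only shows which table-related tags a page contains. Deciding which of those tables carry core content would need the layout analysis described in the rest of the specification.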
[0082] 2) Basic structure
[0083] "Basic structure" refers to the HTML tags included in the and , or the HTML tag pair and information items within. The information items mentioned here may be images, text/image links, plain text, table structures, and the like. One basic structure can be nested inside another.
3) Table stru...