Web page body text extraction method and apparatus
A webpage text extraction method and apparatus, applied in the field of webpage technology. The invention addresses the time-consuming and labor-intensive nature of webpage information extraction, and achieves rapid and accurate extraction with good versatility.
Example Embodiment
[0034] The embodiments of the present invention will be described in detail below in conjunction with the drawings.
[0035] Figure 1 shows a flowchart of a method for extracting webpage text provided by an embodiment of the present invention. Referring to Figure 1, the method includes:
[0036] S101: Extract the text in the title tag and the text in the h tag from the HTML source code of a webpage.
[0037] Specifically, the text in the title tag of some web pages describes the website itself and has nothing to do with the main text, so it is necessary to first determine whether the text in the title tag is related to the actual main text. To that end, the text in the title tag can be extracted from the HTML source code of the web page and denoted, for example, as Title 1, and the text in the h tag can be extracted from the same source code and denoted, for example, as Title 2.
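As an illustration only, step S101 could be sketched in Python with the standard library's html.parser module. This is not the patent's implementation. The names TitleCandidateParser and extract_title_candidates are hypothetical, and treating "the h tag" as any of h1 through h6 is an assumption, since the text does not say which heading levels are meant.

```python
from html.parser import HTMLParser

# Assumption: "the h tag" is read here as any heading level h1..h6.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TitleCandidateParser(HTMLParser):
    """Collects the <title> text (Title 1) and heading texts (Title 2)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []         # text fragments found inside <title>
        self.headings = []            # one normalized string per heading
        self._in_title = False
        self._current_heading = None  # fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in HEADING_TAGS:
            self._current_heading = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in HEADING_TAGS and self._current_heading is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._current_heading).split())
            if text:
                self.headings.append(text)
            self._current_heading = None

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._current_heading is not None:
            # Also captures text nested in inline tags such as <b> or <a>.
            self._current_heading.append(data)


def extract_title_candidates(html):
    """Return (title_1, title_2_list) for the given HTML source string."""
    parser = TitleCandidateParser()
    parser.feed(html)
    parser.close()
    title_1 = " ".join("".join(parser.title_parts).split())
    return title_1, parser.headings
```

Calling extract_title_candidates(page_source) returns the title text together with a list of heading texts. How Title 1 and Title 2 are then compared belongs to the later steps, which are not reproduced here.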
[0038] S102: Determine...