Aggregated text density based webpage body text extraction method and apparatus
A technique for extracting the body text of web pages, applicable to website content management, network data retrieval, specialized data processing, and similar applications. It aims to extract body text accurately and efficiently using a simple method.
Embodiment Construction
[0059] Tags are parsed and stored as units; paragraphs are then clustered using a text clustering algorithm, and the body text is finally generated. Existing problem: this approach overcomplicates a simple task, which makes body text extraction cumbersome and is not conducive to wide application.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide a method and device for extracting webpage body text based on aggregated text density, in order to solve the technical problems of the prior art described in the background above. The ...