Web page text extraction method and device based on aggregated text density
A webpage text extraction technology, applied in fields such as text database indexing, unstructured text data retrieval, and website content management. It addresses the problems that existing text extraction is cumbersome, overcomplicates a simple problem, and is therefore hard to apply widely. Its aims are accurate and efficient extraction, avoidance of low efficiency, and strong versatility.
Embodiment Construction
[0059] Tags are parsed and stored as units; paragraphs are clustered using a text clustering algorithm, and the text is finally generated. Existing problem: this approach overcomplicates a simple problem, making text extraction cumbersome, which is not conducive to wide application.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide a method and device for extracting webpage text based on aggregated text density, in order to solve the technical problems of the prior art described in the background above. The ...