Text structure analysis method based on text semantics
A technique in the fields of structural analysis and text processing, applied in semantic analysis, word processing, special data processing applications, etc., that provides a general method framework with wide applicability.
Embodiment Construction
[0029] The present invention will be described in detail below in conjunction with the examples.
[0030] 1. Data acquisition
[0031] 101 Plain text data. Obtain plain-text TXT data from source documents in formats that are not directly machine-readable, such as PDF files and images. Documents to be processed can be converted to machine-readable TXT format using open-source tools. For example, use PDFBOX to parse PDF documents into TXT documents, or use OCR technology to convert scanned files in JPEG format into TXT documents.
[0032] 2. Text extraction
[0033] 102 Noise content filtering. Filter out content that is noise for the structure extraction task, such as blank lines, headers and footers, and table content. Headers and footers can be filtered by detecting the information repeated on every page, or, for specific types of documents, by rules. Table content may interfere with the judgment of the hierarchical structure, so tables need to be identified and removed.
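The repetition-based header and footer filtering described in step 102 can be illustrated with a minimal Python sketch. It is not the patent's implementation. The function names `normalize` and `filter_noise`, the parameters `edge_lines` and `min_ratio`, and the representation of a document as a list of pages (each a list of text lines) are all assumptions made for illustration. Table identification and rule-based filtering are not covered.

```python
import re
from collections import Counter


def normalize(line):
    # Collapse digit runs so that "Page 3" and "Page 12" count as the same
    # repeated line (an assumed heuristic, not taken from the source).
    return re.sub(r"\d+", "#", line.strip())


def filter_noise(pages, edge_lines=2, min_ratio=0.6):
    """Remove blank lines and repeated header/footer lines.

    pages      -- list of pages, each a list of text lines (assumed input shape)
    edge_lines -- how many lines at the top and bottom of a page are
                  considered header/footer candidates (hypothetical parameter)
    min_ratio  -- fraction of pages a candidate must appear on to be treated
                  as a header or footer (hypothetical parameter)
    """
    # Step 1: drop blank lines.
    cleaned = [[ln for ln in page if ln.strip()] for page in pages]

    # Step 2: count, per page, which normalized lines occur near the page edges.
    counts = Counter()
    for page in cleaned:
        edge = page[:edge_lines] + page[-edge_lines:]
        counts.update({normalize(ln) for ln in edge})

    # A candidate must repeat on at least two pages and on min_ratio of all pages.
    threshold = max(2, min_ratio * len(cleaned))
    repeated = {key for key, n in counts.items() if n >= threshold}

    # Step 3: remove edge lines whose normalized form is repeated across pages.
    result = []
    for page in cleaned:
        n = len(page)
        kept = [
            ln
            for i, ln in enumerate(page)
            if not ((i < edge_lines or i >= n - edge_lines)
                    and normalize(ln) in repeated)
        ]
        result.append(kept)
    return result
```

Only lines near the top and bottom of a page are considered, so body text that happens to recur in the middle of several pages is left untouched. A heading that repeats at a page edge on most pages would still be removed, which is one reason the source also allows rule-based filtering for specific document types.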
[0034] 103 dir...