File content extraction method, device and apparatus, and computer readable storage medium
A technology for extracting methods and files, applied in the field of testing, which can solve problems such as inefficiency
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
Embodiment Construction
[0056] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.
[0057]The main solution of the embodiment of the present invention is to convert the PDF file into HTML data, and further extract different types of content data from the HTML data to generate corresponding content files, thereby extracting the content of the PDF file.
[0058] Since the extraction of PDF file content in the prior art mainly relies on manual screening and comparison, especially when a large number of PDF files need to be processed in batches, the processing efficiency of PDF files will be very low.
[0059] The present invention provides a solution, by converting PDF files into HTML data, extracting different types of content data from HTML data, and generating corresponding content files, so as to realize the automatic extraction of PDF file content and improve the efficiency of PDF files. Processing...
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com