Complex PDF structure analysis method and device based on neural network
A neural-network-based structure analysis technique, applied in the computer field, that addresses problems such as poor generalization ability and analysis rules that are difficult to design, and achieves strong generalization.
Examples
Embodiment 1
[0047] Figure 1 is a schematic flowchart of a neural-network-based complex PDF structure analysis method in an embodiment of the present invention. As shown in Figure 1, the method is applied to a neural-network-based complex PDF structure analysis device, which includes an input device and a display device. The input device contains a document input module, a document processing module, a memory, and a signal input module, and can be connected to a device that generates an output signal, such as a printer or a scanner. The display device is connected to the input device and may be a display screen or similar device on which the documents processed by the input device are displayed. The method includes steps S101-S104.
[0048] S101: Obtain feature information of the PDF document;
[0049] Further, the obtaining feature information of the PDF document i...
Embodiment 2
[0060] Based on the same inventive concept as the neural-network-based complex PDF structure analysis method in the foregoing embodiments, the present invention also provides a neural-network-based complex PDF structure analysis device, as shown in Figure 3, including:
[0061] A first obtaining unit 11, configured to obtain feature information of the PDF document;
[0062] A second obtaining unit 12, configured to coarsely divide the feature information of the PDF document according to a maximum entropy model and obtain the hierarchical paragraphs of the PDF document;
[0063] A third obtaining unit 13, configured to convert the hierarchical paragraphs of the PDF document into paragraph word vectors using a two-layer bidirectional language model trained on a corpus, and to compress the paragraph word vectors into paragraph semantic vectors;
[0064] The fourth obtaining unit 14 is configured to input the paragraph semantic vector into a multi-layer b...
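The work of the second and third obtaining units can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent text: the layout features (font_size, is_bold), the hand-set weights, the two labels, and the use of mean pooling as the compression step. A maximum entropy model is represented here in its usual form, a softmax over linear feature scores; a trained model would learn the weights from labeled documents, and the word vectors would come from the two-layer bidirectional language model rather than being supplied by hand.

```python
import math

# Hypothetical hand-set weights for two labels; a real maximum entropy
# model would learn these from labeled PDF lines.
WEIGHTS = {
    "heading": {"bias": -5.0, "font_size": 0.3, "is_bold": 2.0},
    "body": {"bias": 1.0, "font_size": -0.1, "is_bold": -0.5},
}


def maxent_probs(features):
    """Return P(label | features) as a softmax over linear scores."""
    feats = dict(features, bias=1.0)
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in feats.items())
        for label, w in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def coarse_divide(lines):
    """Group (text, features) lines into paragraphs, opening a new
    paragraph at each line the model labels as a heading."""
    paragraphs = []
    for text, features in lines:
        probs = maxent_probs(features)
        label = max(probs, key=probs.get)
        if label == "heading" or not paragraphs:
            heading = text if label == "heading" else None
            paragraphs.append({"heading": heading, "lines": []})
        if label == "body":
            paragraphs[-1]["lines"].append(text)
    return paragraphs


def mean_pool(word_vectors):
    """Compress equal-length word vectors into one vector by averaging
    each dimension (one assumed choice of compression)."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


if __name__ == "__main__":
    page = [
        ("1 Introduction", {"font_size": 16, "is_bold": 1}),
        ("First sentence.", {"font_size": 10, "is_bold": 0}),
        ("2 Method", {"font_size": 16, "is_bold": 1}),
        ("Details.", {"font_size": 10, "is_bold": 0}),
    ]
    for paragraph in coarse_divide(page):
        print(paragraph)
    print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Mean pooling is used only because it is the simplest way to turn a variable number of word vectors into one fixed-length vector; the patent text shown here does not say which compression it uses.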
Embodiment 3
[0080] Based on the same inventive concept as the neural-network-based complex PDF structure analysis method in the first embodiment, the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the steps of any of the neural-network-based complex PDF structure analysis methods described above.
[0081] In Figure 4, the bus architecture is represented by bus 300. The bus 300 may include any number of interconnected buses and bridges, and it links together various circuits, including one or more processors represented by processor 302 and memory represented by memory 304. The bus 300 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and thus will not be further described herein. The bus interface 306 provides an interface between the bus 300 and ...