Complex PDF structure analysis method and device based on neural network
A neural-network-based structure analysis technology, applied in the computer field, that addresses problems such as poor generalization ability and the difficulty of designing analysis rules, and achieves strong generalization.
Example Embodiment
[0046] Example one
[0047] Figure 1 is a schematic flowchart of a method for analyzing a complex PDF structure based on a neural network according to an embodiment of the present invention. As shown in Figure 1, the method is applied to a neural-network-based complex PDF structure analysis device, which includes an input device and a display device. The input device has a document input module, a document processing module, a memory, and a signal input module, and can be connected to a device that generates output signals, such as a printer or scanner. The display device is connected to the input device and comprises a display screen or other equipment on which documents processed from input devices such as the printer or scanner are displayed. The method includes steps S101 to S104.
[0048] S101: Obtain feature information of the PDF document;
[0049] Further, obtaining the feature information of the ...
[0059] Example two
[0060] Based on the same inventive concept as the neural-network-based complex PDF structure analysis method in the foregoing embodiment, the present invention also provides a neural-network-based complex PDF structure analysis device which, as shown in Figure 3, includes:
[0061] The first obtaining unit 11 is configured to obtain feature information of the PDF document;
[0062] The second obtaining unit 12 is configured to perform coarse-grained division of the feature information of the PDF document according to the maximum entropy model to obtain hierarchical paragraphs of the PDF document;
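A maximum entropy model is equivalent to multinomial logistic regression over feature functions. The sketch below is one possible illustration, not the patented implementation. Several details are assumptions rather than anything the text above specifies: each line of the PDF is a hypothetical dict with `text`, `font_size` and `bold` keys; the features and the 14 pt threshold are invented; and there are only two labels, `heading` and `body`. The `segment` helper is likewise hypothetical. It performs the coarse-grained division by opening a new section at each predicted heading.

```python
import math
from collections import defaultdict


def features(line):
    """Map a text line to a set of binary feature names.

    `line` is a hypothetical dict with "text", "font_size" and "bold" keys;
    the features and the 14 pt threshold are illustrative assumptions.
    """
    feats = {"bias"}
    if line["font_size"] >= 14:
        feats.add("large_font")
    if line["bold"]:
        feats.add("bold")
    if line["text"][:1].isdigit():
        feats.add("starts_with_digit")
    if line["text"].rstrip().endswith("."):
        feats.add("ends_with_period")
    return feats


class MaxEnt:
    """Maximum entropy classifier: P(y|x) = exp(w_y . f(x)) / Z(x)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (label, feature) pair

    def probs(self, feats):
        scores = [sum(self.w[(y, f)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

    def fit(self, data, epochs=200, lr=0.5, l2=0.01):
        """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    # observed minus expected feature count for label y
                    err = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        grad[(y, f)] += err
            for key, g in grad.items():
                self.w[key] += lr * (g / n - l2 * self.w[key])
        return self


def segment(lines, model):
    """Group lines into sections: each predicted heading opens a new section."""
    sections = []
    for line in lines:
        if model.predict(features(line)) == "heading":
            sections.append({"heading": line["text"], "body": []})
        else:
            if not sections:
                sections.append({"heading": None, "body": []})
            sections[-1]["body"].append(line["text"])
    return sections
```

Suppose the model is trained on a few labelled lines, where headings are large, bold and numbered, and body lines end with a period. It then groups unseen lines into heading-plus-body sections. A deeper hierarchy could be modelled the same way by adding one label per heading level.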
[0063] The third obtaining unit 13 is configured to transform the hierarchical paragraphs of the PDF document into paragraph word vectors using the two-layer bidirectional language model trained on the corpus, and to compress the paragraph word vectors into paragraph semantic vectors;
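The text above does not say how word vectors are derived from the two-layer bidirectional language model, nor how they are compressed. The following is a minimal sketch under two assumptions. First, each token's vector is an ELMo-style mix: γ times the softmax-weighted sum of its per-layer vectors, for example the token layer plus the two biLM layers. Second, compression is plain mean pooling over the paragraph's tokens. The function names `scalar_mix` and `paragraph_vector` are hypothetical, and the layer vectors are assumed to come from an already-trained model.

```python
import math


def scalar_mix(layers, weights, gamma=1.0):
    """ELMo-style mix for one token: gamma * sum_j softmax(weights)_j * layers[j].

    `layers` holds one vector per language-model layer for a single token.
    """
    top = max(weights)
    exps = [math.exp(w - top) for w in weights]
    z = sum(exps)
    s = [e / z for e in exps]  # softmax-normalised layer weights
    dim = len(layers[0])
    return [gamma * sum(s_j * layer[d] for s_j, layer in zip(s, layers))
            for d in range(dim)]


def paragraph_vector(token_layers, weights, gamma=1.0):
    """Compress a paragraph's word vectors into one vector by mean pooling (assumed)."""
    token_vecs = [scalar_mix(layers, weights, gamma) for layers in token_layers]
    n = len(token_vecs)
    dim = len(token_vecs[0])
    return [sum(v[d] for v in token_vecs) / n for d in range(dim)]
```

Mean pooling is chosen here only because it is the simplest length-independent compression. A learned encoder or attention pooling would be equally plausible readings of "compress".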
[0064] The fourth obtaining unit 14 is configured to input the paragraph se...
[0079] Example three
[0080] Based on the same inventive concept as the neural-network-based complex PDF structure analysis method in the first embodiment, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the neural-network-based complex PDF structure analysis methods described above.
[0081] In Figure 4, the bus architecture is represented by bus 300, which may include any number of interconnected buses and bridges. Bus 300 links together various circuits, including one or more processors represented by processor 302 and various memories represented by memory 304. Bus 300 may also link various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further herein. The bus...