PDF table structure identification method based on graph attention mechanism
A table structure recognition technique based on a graph attention mechanism, applied in character and pattern recognition, neural architectures, computer components, etc. It addresses the difficulty of recognizing the structure of complex tables and aims to improve the recognition results.
Examples
Embodiment 1
[0030] As shown in Figure 1, a PDF table structure relationship recognition method based on a graph attention mechanism includes the following steps:
[0031] Step 1. Preprocessing: Get all the cells in the table and their position coordinates.
[0032] Step 1: Extract all the text characters in the document according to the storage format of the PDF, merge all characters whose distance from one another is less than the threshold d into a cell, and record the position coordinates and size of each cell. Assuming that there are n cells in total, denote these n cells as w_1, w_2, ..., w_n, as shown in Figure 1 (Step 1).
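The character grouping in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Box` type, the `box_gap` distance (the gap between axis-aligned bounding boxes), and the union-find merging are all assumptions, since the text does not say how the distance between characters is measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    """Axis-aligned bounding box; width and height follow from the corners."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes (0 if they touch or overlap).
    Assumed distance measure; the source only says "distance"."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_chars_into_cells(chars: List[Box], d: float) -> List[Box]:
    """Merge characters whose gap is below d (transitively) into cells."""
    parent = list(range(len(chars)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if box_gap(chars[i], chars[j]) < d:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, c in enumerate(chars):
        groups.setdefault(find(i), []).append(c)

    cells = []
    for members in groups.values():
        # Reading order inside a cell: top to bottom, then left to right.
        members.sort(key=lambda c: (c.y0, c.x0))
        cells.append(Box(
            min(c.x0 for c in members), min(c.y0 for c in members),
            max(c.x1 for c in members), max(c.y1 for c in members),
            "".join(c.text for c in members),
        ))
    cells.sort(key=lambda c: (c.y0, c.x0))
    return cells
```

The pairwise loop is quadratic in the number of characters, which is acceptable for a single table; a spatial index would be the natural replacement for larger pages.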
[0033] Step 2, graph construction: build an undirected graph for the obtained cells.
[0034] Step 2: Use the K-nearest neighbor method to create an undirected graph for the obtained cells, as shown in Figure 1 (Step 2).
[0035] Step 2.1: Treat each cell as a node in the graph; in Figure 1 the nodes are indicated by a circle in the upper...
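The K-nearest-neighbor graph construction of Step 2 can be sketched as below. The function name `knn_undirected_edges`, the use of cell center points, and the Euclidean distance are assumptions made for illustration; the text does not specify which distance or reference point the method uses.

```python
from typing import List, Set, Tuple


def knn_undirected_edges(
    centers: List[Tuple[float, float]], k: int
) -> Set[Tuple[int, int]]:
    """Connect every node to its k nearest neighbors and return the
    undirected edge set as (smaller_index, larger_index) pairs."""
    edges: Set[Tuple[int, int]] = set()
    n = len(centers)
    for i in range(n):
        xi, yi = centers[i]
        # Rank the other nodes by squared distance to node i;
        # ties are broken by node index so the result is deterministic.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: ((centers[j][0] - xi) ** 2
                           + (centers[j][1] - yi) ** 2, j),
        )
        for j in others[:k]:
            # Storing the pair in sorted order makes the edge undirected:
            # i->j and j->i collapse to the same entry.
            edges.add((min(i, j), max(i, j)))
    return edges
```

Because an edge is kept when either endpoint selects the other, a node can end up with more than k neighbors in the resulting undirected graph.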
Embodiment 2
[0069] This embodiment describes the process of identifying the table structure on two public table structure identification data sets, the process used, the parameter design involved and the experimental results.
[0070] In this embodiment, three stages are involved. First, the edge classification model based on the graph attention mechanism is trained on the public table structure recognition data set to obtain the parameters of the model; then, the four steps of the technical solution of the present invention are carried out to identify the structure of the tables in the test set; finally, the identified table structure is compared with the correct result, and the present invention is compared with the existing method.
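The final comparison stage can be illustrated with a small scoring helper. This is only a sketch under assumptions: the excerpt does not name the metric used, so precision, recall and F1 over sets of predicted relations are assumed here, and the `(cell_i, cell_j, label)` tuple format with `"row"` and `"col"` labels is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], truth: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Score a predicted relation set against the ground-truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical relations: (cell index, cell index, relation label).
predicted = {(0, 1, "row"), (1, 2, "row"), (0, 3, "col")}
truth = {(0, 1, "row"), (0, 3, "col"), (2, 3, "row"), (1, 2, "col")}
p, r, f1 = precision_recall_f1(predicted, truth)
```

In this toy case two of the three predicted relations are correct and two of the four true relations are found.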
[0071] (A) Model training
[0072] Step A: Use the training set to train the edge classification model based on the graph attention mechanism, and obtain the parameters of the model.
[0073] Step A.1: Prepare the dataset.
[0074] In this embodiment, t...
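For orientation on Step A, the forward pass of one graph attention layer followed by an edge scorer might look like the sketch below. It follows the standard single-head graph attention formulation (a shared linear map, LeakyReLU attention scores, softmax over each neighborhood) and is not taken from the patent: the layer sizes, the self-loop, the omission of an output nonlinearity, and the `edge_probability` scorer on concatenated endpoint features are all assumptions, and training itself (loss and parameter updates) is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: List[Vector], neighbors: List[List[int]],
              W: Matrix, a: Vector) -> List[Vector]:
    """Single-head graph attention layer (forward pass only).
    W has shape out_dim x in_dim; a has length 2 * out_dim."""
    z = [matvec(W, x) for x in h]  # shared linear transform
    out = []
    for i in range(len(h)):
        # Attend over the node itself plus its graph neighbors.
        nbrs = [i] + [j for j in neighbors[i] if j != i]
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append([
            sum(al * z[j][t] for al, j in zip(alphas, nbrs))
            for t in range(len(z[i]))
        ])
    return out


def edge_probability(hi: Vector, hj: Vector,
                     w_edge: Vector, bias: float = 0.0) -> float:
    """Hypothetical edge scorer: sigmoid of a linear map on [hi || hj]."""
    logit = sum(w * v for w, v in zip(w_edge, hi + hj)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

With an all-zero attention vector `a`, every neighbor receives equal weight and the layer reduces to averaging the transformed features over each neighborhood, which is a convenient sanity check.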