Method for directly obtaining table content in PDF through browser
The technology relates to browsers and tables and is applied in the field of PDF table content extraction. It addresses problems such as missing tables, cumbersome operations, and high consumption, and achieves strong pertinence, good analysis results, and low server resource usage.
Examples
Embodiment 1
[0094] The PDF file contains the mid-term exam results of Class 1 of a certain grade at a certain school. The results are laid out as a borderless table, as shown in Figure 2. The content of this table is to be extracted from the PDF.
[0095] The PDF file is uploaded to the website, and the browser's rendering engine renders it into two layers: an HTML view layer and a canvas view layer. The HTML view layer holds the text and numeric content together with its coordinate information; the canvas view layer holds the background color and frame line information.
[0096] The browser monitors the table area selected with the mouse. Canvas technology in the browser scans the borderless table (shown in Figure 2), collects the pixel values and position information of the table, and determines the positions of the frame line intersections.
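The pixel scan described above can be illustrated with a minimal sketch. It assumes that a line shows up as a row of mostly dark pixels in an RGBA buffer laid out like `ImageData.data`, which in a browser would come from `CanvasRenderingContext2D.getImageData`. The function name `findDarkRows` and both thresholds are hypothetical, and this excerpt does not state how the method itself derives lines for a borderless table.

```typescript
// Hypothetical helper, not taken from the patent text.
// `data` uses the ImageData.data layout: 4 bytes (R, G, B, A) per pixel, row-major.
function findDarkRows(
  data: Uint8ClampedArray,
  width: number,
  height: number,
  darkThreshold: number = 128, // assumed cutoff on the mean of R, G, B
  minCoverage: number = 0.9,   // assumed share of dark pixels for a row to count as a line
): number[] {
  const rows: number[] = [];
  for (let y = 0; y < height; y++) {
    let dark = 0;
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4;
      const brightness = (data[i] + data[i + 1] + data[i + 2]) / 3;
      // Fully transparent pixels are treated as background.
      if (data[i + 3] > 0 && brightness < darkThreshold) dark++;
    }
    if (dark / width >= minCoverage) rows.push(y);
  }
  return rows;
}
```

Running the same scan over columns instead of rows would give candidate vertical lines, and pairing each X with each Y would give candidate intersection points.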
[0097] The Y-axis coordinates of the table's horizontal lines obtained from the canvas view layer are: 107, 131, 151, 170, 190, 209, 228, 248, 267, 287, 306, 325, 345, 364, 38...
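As a rough illustration of how the horizontal line coordinates could be combined with the text coordinates from the HTML view layer, the sketch below places text items into the row bands between consecutive lines. Only the first four Y values come from the paragraph above. The `TextItem` shape, the function names, and the sample items are assumptions made for the example, not data from Figure 2.

```typescript
// Assumed shape of a text fragment taken from the HTML view layer.
interface TextItem {
  text: string;
  x: number;
  y: number;
}

// Index i such that lines[i] <= v < lines[i + 1], or -1 if v lies outside all bands.
function bandIndex(lines: number[], v: number): number {
  for (let i = 0; i + 1 < lines.length; i++) {
    if (v >= lines[i] && v < lines[i + 1]) return i;
  }
  return -1;
}

// Group items into the bands between horizontal lines, ordered left to right.
function groupIntoRows(items: TextItem[], yLines: number[]): string[][] {
  const sorted = yLines.slice().sort((a, b) => a - b);
  const rows: TextItem[][] = [];
  for (let i = 0; i + 1 < sorted.length; i++) rows.push([]);
  for (const item of items) {
    const r = bandIndex(sorted, item.y);
    if (r >= 0) rows[r].push(item);
  }
  return rows.map((row) => row.sort((a, b) => a.x - b.x).map((t) => t.text));
}

// First four Y coordinates from the text; the items below are made up.
const sampleYLines = [107, 131, 151, 170];
const sampleItems: TextItem[] = [
  { text: "Score", x: 200, y: 115 },
  { text: "Name", x: 20, y: 115 },
  { text: "95", x: 200, y: 140 },
  { text: "Li", x: 20, y: 140 },
];
const sampleRows = groupIntoRows(sampleItems, sampleYLines);
// sampleRows is [["Name", "Score"], ["Li", "95"], []]
```

The same `bandIndex` lookup applied to X coordinates and vertical lines would give the column of each item.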
Embodiment 2
[0122] The table in the PDF file contains a person's personal information and has incomplete borders, as shown in Figure 4. The content of this table is extracted.
[0123] The PDF file is uploaded to the website, and the browser's rendering engine renders it as an HTML view layer and a canvas view layer.
[0124] The browser monitors the table area selected with the mouse, and the user moves the mouse to complete the table's border. The browser tracks the mouse movement, and canvas technology draws frame lines along the mouse path until the missing borders of the table are fully drawn, as shown in Figure 5.
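One way to turn a hand-drawn mouse path into a clean frame line is to snap it to a horizontal or vertical segment before drawing it on the canvas. The text only says that the frame is drawn along the mouse position, so the snapping step, the `Point` and `Segment` types, and the `snapStroke` name are assumptions for illustration. In a browser the points would typically come from `mousemove` events.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Segment {
  from: Point;
  to: Point;
  orientation: "horizontal" | "vertical";
}

// Snap a freehand stroke to an axis-aligned segment (assumed behavior, not from the patent).
// The dominant direction is taken from the first and last points; the fixed coordinate
// is the rounded mean of all samples along the other axis.
function snapStroke(points: Point[]): Segment | null {
  if (points.length < 2) return null;
  const first = points[0];
  const last = points[points.length - 1];
  const dx = Math.abs(last.x - first.x);
  const dy = Math.abs(last.y - first.y);
  if (dx >= dy) {
    const y = Math.round(points.reduce((sum, p) => sum + p.y, 0) / points.length);
    return { from: { x: first.x, y }, to: { x: last.x, y }, orientation: "horizontal" };
  }
  const x = Math.round(points.reduce((sum, p) => sum + p.x, 0) / points.length);
  return { from: { x, y: first.y }, to: { x, y: last.y }, orientation: "vertical" };
}
```

The resulting segment could then be drawn with the canvas 2D context (`moveTo`, `lineTo`, `stroke`) so that a later scan sees it as an ordinary frame line.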
[0125] Canvas technology in the browser scans the bordered table to obtain the position information of the table in the selected area. The positions of the vertical lines on the X axis are: [0,0], [171,0], [576,0]; the position information of the horizontal line on the ...
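To show how line positions like these could define the table grid, the sketch below builds one rectangle per cell from the vertical and horizontal line coordinates. The X values are the ones listed above. The horizontal line values are truncated in the text, so `hypotheticalYLines` is a placeholder, and the `Cell` type and `buildCells` function are likewise assumptions.

```typescript
interface Cell {
  row: number;
  col: number;
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Build one rectangle per pair of adjacent vertical lines and adjacent horizontal lines.
function buildCells(xLines: number[], yLines: number[]): Cell[] {
  const xs = xLines.slice().sort((a, b) => a - b);
  const ys = yLines.slice().sort((a, b) => a - b);
  const cells: Cell[] = [];
  for (let r = 0; r + 1 < ys.length; r++) {
    for (let c = 0; c + 1 < xs.length; c++) {
      cells.push({ row: r, col: c, x0: xs[c], y0: ys[r], x1: xs[c + 1], y1: ys[r + 1] });
    }
  }
  return cells;
}

// Vertical line positions as given in the text, in [x, y] form.
const verticalLinePoints: Array<[number, number]> = [[0, 0], [171, 0], [576, 0]];
const gridXLines = verticalLinePoints.map((p) => p[0]);
// Placeholder values: the real horizontal line positions are cut off in the source.
const hypotheticalYLines = [0, 30, 60];
const gridCells = buildCells(gridXLines, hypotheticalYLines);
```

Three vertical lines and three horizontal lines give a 2 by 2 grid. A text item from the HTML view layer would belong to the cell whose rectangle contains its coordinates.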