Literature table content recognition and information extraction method based on image processing
An image processing and content recognition technology, applied in the fields of character and pattern recognition, special data processing applications, and instruments. It addresses the difficulty of recovering unrecognized content, the inability of existing approaches to handle the diverse forms of tables, and unsatisfactory recognition results. It achieves thorough removal of frame lines, provides an effective and feasible method, and promotes research and development.
Examples
Embodiment 1
[0050] In this embodiment, a method for recognizing table content and extracting information from tables in documents, based on image processing, includes the following steps:
[0051] (1) Read a document, extract the table portion of the document's content, convert it into an image format and save it, and store the image access path in a path list;
[0052] (2) Read a table image and remove its frame lines. This involves binarization, opening operations to extract straight lines, and a bitwise AND operation. During straight-line extraction, opening operations are performed with different kernels to extract the straight lines in the horizontal and vertical directions, which are then superimposed on the same image; a bitwise AND operation between this image and the binary image completes the removal of the table frame;
[0053] (3) Acquire and cut out the text areas: dilate the table image after the frame lines have been removed and bina...
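The line extraction in step (2) can be illustrated with a minimal sketch in plain Python. It is not the patent's implementation: the function names, the tiny test image, and the kernel length of 4 are assumptions chosen for illustration. It relies on the fact that an opening with a 1 x k line-shaped kernel keeps exactly those foreground pixels that lie in a straight run of at least k pixels, so long frame lines survive and short character strokes vanish.

```python
def open_rows(img, k):
    """Opening with a 1 x k horizontal line kernel:
    keep only pixels that lie in a horizontal run of 1s of length >= k."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:  # run is long enough to contain the kernel
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def open_cols(img, k):
    """Opening with a k x 1 vertical line kernel (rows and columns swapped)."""
    return transpose(open_rows(transpose(img), k))


def superimpose(a, b):
    """Overlay two binary line maps on the same image (pixel-wise OR)."""
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical binary table image: 1 = ink. A frame with a short stroke inside.
table = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]

horizontal = open_rows(table, 4)   # only the top and bottom borders remain
vertical = open_cols(table, 4)     # only the left and right borders remain
lines = superimpose(horizontal, vertical)

for row in lines:
    print("".join(str(p) for p in row))
```

The two-pixel stroke in the middle row is shorter than the kernel in both directions, so it appears in neither line map. In practice an image library would normally be used for this; OpenCV, for example, offers `cv2.morphologyEx` with `cv2.MORPH_OPEN` and a rectangular structuring element.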
Embodiment 2
[0059] This embodiment is basically the same as Embodiment 1, with the following particular features:
[0060] In this embodiment, step (2) processes the input table image to obtain a binary image without frame lines. The specific steps are as follows:
[0061] (2-1) The original image is first converted into a grayscale image, and fixed-threshold inverse binarization is then applied to obtain the binary image of the original image;
[0062] (2-2) First, perform an opening operation that preserves vertical features on the binary image of the original image to obtain a vertical-line binary image that retains only vertical lines; then perform an opening operation that preserves horizontal features on the same binary image to obtain a horizontal-line binary image that retains only horizontal lines;
[0063] (2-3) Superimpose the vertical-line binary image and the horizontal-line binary image, then invert the result to obtain the frame-line binary image, in wh...
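As a rough sketch of steps (2-1) and (2-3), the following plain-Python example binarizes a tiny color image and masks out its frame. Several things here are assumptions rather than details from the source: the function names, the BT.601 grayscale weights, the threshold of 128, and the hand-written line maps standing in for the output of step (2-2). The final bitwise AND follows the description of step (2) in Embodiment 1, since the text of step (2-3) is cut off.

```python
def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 gray values
    using the common BT.601 luma weights (an assumed choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def inverse_binarize(gray, thresh):
    """Fixed-threshold inverse binarization: dark ink -> 1, light paper -> 0."""
    return [[1 if v <= thresh else 0 for v in row] for row in gray]


def frame_mask(vertical, horizontal):
    """Superimpose both line maps, then invert:
    frame-line pixels become 0, every other pixel becomes 1."""
    return [[0 if (v or h) else 1 for v, h in zip(vr, hr)]
            for vr, hr in zip(vertical, horizontal)]


def bitwise_and(a, b):
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


K, W, T = (0, 0, 0), (255, 255, 255), (60, 60, 60)  # frame, paper, text ink
image = [
    [K, K, K, K, K],
    [K, W, T, W, K],
    [K, K, K, K, K],
]

binary = inverse_binarize(to_gray(image), thresh=128)

# Line maps as step (2-2) would produce them; written by hand for this sketch.
vertical = [[1, 0, 0, 0, 1] for _ in range(3)]
horizontal = [[1] * 5, [0] * 5, [1] * 5]

cleaned = bitwise_and(binary, frame_mask(vertical, horizontal))
for row in cleaned:
    print(row)
```

After the AND, only the single text pixel in the middle of the cell is still set; every frame pixel has been cleared because the inverted line map is 0 there.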
Embodiment 3
[0067] This embodiment is basically the same as the previous embodiment, with the following particular features:
[0068] In this embodiment, step (3) mainly identifies and cuts out the areas containing characters from the table image. The specific steps are as follows:
[0069] (3-1) Perform an erosion operation on the binarized table image with the frame lines removed, with stronger erosion in the horizontal direction, so that adjacent characters are joined into a single block;
[0070] (3-2) Use binary-image contour detection to find all candidate target areas in the eroded image, and number each target area in sequence;
[0071] (3-3) Screen the target areas: filter out those whose area is smaller than a threshold number of pixels; the remaining ones are the target character block areas that meet the conditions and are to be recognized;
[0072] (3-4) According to the coordinate range...
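Steps (3-1) to (3-3) can be sketched roughly as follows in plain Python. This is an illustration under stated assumptions, not the patent's code. Text pixels are assumed to be stored as 1, so joining characters is done by widening text pixels along each row, which is equivalent to eroding the background horizontally. Contour detection is replaced by a simpler 4-connected component search, and the names `reach` and `min_area`, their values, and the bounding-box output are all hypothetical.

```python
from collections import deque


def smear_horizontal(img, reach):
    """Set every pixel within `reach` columns of a text pixel (same row) to 1,
    so that neighboring characters merge into one block."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for i in range(max(0, x - reach), min(w, x + reach + 1)):
                    out[y][i] = 1
    return out


def find_regions(img):
    """Return (area, (x0, y0, x1, y1)) for each 4-connected region of 1s,
    in top-to-bottom, left-to-right order of discovery."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append((area, (x0, y0, x1, y1)))
    return regions


def text_blocks(img, reach=1, min_area=4):
    """Merge characters, number every candidate region, drop the small ones."""
    merged = smear_horizontal(img, reach)
    numbered = enumerate(find_regions(merged), start=1)
    return [(n, area, box) for n, (area, box) in numbered if area >= min_area]


# Two character strokes at columns 1 and 3, plus a one-pixel speck at column 8.
cell = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
]
print(text_blocks(cell))
```

The two strokes merge into one region of 10 pixels and are kept as region 1, while the speck grows to only 3 pixels and is filtered out as region 2. Note that the reported box covers the widened block, so a real pipeline might shrink it by `reach` before cropping.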