Visual rich document information extraction method for actual OCR scene

A visually rich document information extraction technology, applied in the field of visual information extraction, which addresses the problems of complex error-free OCR prediction, unclear named entity boundaries, and positioning-box extraction.

Active Publication Date: 2021-05-14
SOUTH CHINA UNIV OF TECH


Problems solved by technology

[0003] Since visual features such as glyphs, text position, layout, and font size are important cues for extracting information from document images, many methods incorporate document images into sequence labeling models and obtain better results than those using plain text alone. However, most existing research assumes that the OCR (Optical Character Recognition) results are accurate and cannot deal with flawed OCR results.
On the other hand, achieving error-free OCR prediction of document images is very complicated, and manually annotated positioning boxes cannot ...



Examples


Embodiment

[0052] As shown in Figures 1 and 2, the present invention provides a method for extracting visually rich document information in an actual OCR scene, comprising the following steps:

[0053] S1. Collect visually rich text images containing key information in the actual scene, and annotate the collected images at the text-line level. Specifically:

[0054] In this embodiment, the visually rich text image data set includes data with both simple and complex layouts, composed respectively of bills, train tickets, and passports, containing 4306, 1500, and 2331 images in turn, for a total of 8137 images.

[0055] S11. Annotate the collected images with text-line position, text content, and named entity attributes. Specifically:

[0056] The named entity attributes are labeled against the actual OCR results: each word of a sentence is labeled with the BIO tagging scheme for sequence labeling; ...
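The BIO scheme mentioned above marks the first token of an entity `B-<label>`, subsequent tokens `I-<label>`, and everything else `O`. A minimal sketch of character-level BIO tagging is shown below; the helper name, the example line, and the AMOUNT entity type are illustrative, not taken from the patent.

```python
# Minimal BIO-tagging sketch for character-level sequence labeling.
# Entity type "AMOUNT" and the sample OCR line are hypothetical.
def bio_tag(tokens, entities):
    """Assign B-/I-/O tags from (start, end, label) spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # inside tokens of the entity
    return tags

line = list("总金额:45.00元")                 # character-level tokens
tags = bio_tag(line, [(4, 9, "AMOUNT")])    # span covers "45.00"
```

Running this yields `B-AMOUNT` at position 4, `I-AMOUNT` at positions 5-8, and `O` elsewhere, matching the labeling convention described for the named entity attributes.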



Abstract

The invention discloses a visually rich document information extraction method for an actual OCR scene. The method comprises the following steps: collecting visually rich text images in the actual scene; extracting character-level and word-level text word embedding features and position embedding features with a pre-trained word embedding model; training a named entity classification module; constructing a global document graph structure based on a graph attention network (GAT) and introducing a self-attention mechanism; training a named entity boundary positioning module; constructing a multi-feature aggregation structure; and training an error semantic correction module that adopts a GRU decoding structure, extracts the encoder hidden states of the corresponding dimension features according to the optimal path of a CRF, and uses the category information of the named entity as prior guidance for each decoding step, obtaining named entity information in a standard format. The method effectively improves the precision of visually rich document information extraction in actual OCR detection and recognition applications, and is of great significance for the structured storage of visually rich document information.
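The self-attention mechanism that the abstract introduces over the global document graph can be sketched as follows. This is a generic single-head scaled dot-product attention over text-line node features, not the patent's implementation; the node count, feature dimension, and weight matrices are all illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative).

    X : (n_nodes, d) feature matrix of text-line nodes in the document graph
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over nodes
    return w @ V                                     # aggregated node features

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                          # 5 text-line nodes
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                  # shape (5, 8)
```

Each output row is a weighted combination of all node features, which is how such a mechanism lets every text line attend to the whole document layout.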

Description

Technical field

[0001] The invention belongs to the technical field of visual information extraction, and in particular relates to a method for extracting visually rich document information in an actual OCR scene.

Background technique

[0002] Visual Information Extraction (VIE), as an important part of Natural Language Processing (NLP), aims to directly extract structured information from unstructured document images and is a key step in understanding document images. The extracted structured information is widely used in many settings, such as fast indexing, efficient archiving, and document analysis. The typical approach is to formulate the information extraction problem as a sequence labeling problem. In recent years, information extraction from document images (such as invoices, ID cards, and purchase receipts) has become a research hotspot.

[0003] Since visual features such as glyphs, text position, layout, and font size are important cues for extracting i...

Claims


Application Information

IPC(8): G06K9/00; G06F40/295; G06F16/35; G06F40/30
CPC: G06F40/295; G06F16/353; G06F40/30; G06V30/414; G06V30/40; G06V30/10
Inventor: 唐国志, 金连文, 林上港, 汪嘉鹏, 薛洋
Owner SOUTH CHINA UNIV OF TECH