
Document recognition method and device and storage medium

A document recognition technology, applied in the field of text recognition, that solves problems such as the inability to recognize irregularly shaped Excel tables.

Active Publication Date: 2019-12-31
盈盛智创科技(广州)有限公司
Cites: 10; Cited by: 7

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method, device and storage medium for document recognition, so as to solve the problem that irregularly shaped Excel tables in PDF documents cannot be recognized.



Examples


Embodiment 1

[0059] Figure 1A is a flow chart of a document identification method provided by Embodiment 1 of the present invention. This embodiment is suitable for identifying information in non-editable documents (such as images or PDF documents), especially irregular tables and the text belonging to those tables. The method can be performed by a document identification device, which can be implemented in software and/or hardware and configured in an electronic device with data processing capability, such as a mobile phone, a tablet computer or a wearable device (e.g. smart glasses or a smart watch). The electronic device is equipped with a screen and a central processing unit (CPU).

[0060] Referring to Figure 1A, the method specifically includes:

[0061] S101. Receive a first document.

[0062] The first document has pages, and the number of pages is not limited. Each page can include different...

Embodiment 2

[0085] Figure 2A is a flow chart of a document identification method provided by Embodiment 2 of the present invention. This embodiment refines Embodiment 1 and describes in detail the steps for locating, within the area, the sub-areas formed by the intersection points. Referring to Figure 2A, the method includes:

[0086] S201. Receive a first document.

[0087] S202. Determine an element identification model.

[0088] The element recognition model is a pre-trained model for recognizing target elements. It can be built with deep learning or other neural-network methods.

[0089] In one feasible implementation, an ANN classification model is built from training samples to identify target elements and is then applied to test samples to output detection results. First, for a given set of sample pairs $\{(x_i, y_i)\}$ with $x_i \in \mathbb{R}^N$ and $y_i \in \{0, 1, 2, \ldots, 100\}$, where $x_i$ is a training sample and $x$ is the sample to be judged, a parameter The adaptively adjusted ...
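The ANN classification step can be illustrated with a minimal sketch. The excerpt elides the network architecture, the input features and the adaptively adjusted parameter, so the code below assumes the simplest possible ANN: a single-layer softmax network (multinomial logistic regression) trained by per-sample gradient descent on cross-entropy loss. The feature vectors and the names `train_single_layer_ann` and `predict` are hypothetical; only the integer class labels follow the sample pairs described above.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_single_layer_ann(samples, num_classes, lr=0.1, epochs=200, seed=0):
    """Train a single-layer softmax network with per-sample gradient descent.

    samples: list of (x, y) pairs; x is a list of N floats (hypothetical region
    features), y is an integer class label in range(num_classes).
    Returns (weights, biases).
    """
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    weights = [[0.0] * n_features for _ in range(num_classes)]
    biases = [0.0] * num_classes
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            probs = softmax([sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
                             for k in range(num_classes)])
            for k in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. logit k: p_k - [k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                biases[k] -= lr * grad
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the class with the highest score for sample x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)
```

A single linear layer is chosen here only because it is the smallest network that still shows the train-then-judge flow; a real element recognizer would likely use a deeper model and richer features.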

Embodiment 3

[0117] Figure 3 is a structural diagram of a document identification device provided by Embodiment 3 of the present invention. The device comprises: a first document receiving module 31, an area extraction module 32, an intersection detection module 33, a sub-area determining module 34, a character recognition module 35, a second table generating module 36 and a second table writing module 37, wherein:

[0118] A first document receiving module 31, configured to receive a first document, the first document having pages;

[0119] An area extraction module 32, configured to extract an area having a target element from the page, the target element including a first table;

[0120] An intersection detection module 33, configured to detect intersections in the area, an intersection being a position where at least two line segments intersect;

[0121] A sub-area determining module 34, configured to locate in the area a sub-area composed of the intersection points, the s...
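Modules 33 and 34 can be illustrated with a minimal sketch, assuming the table's ruling lines have already been extracted from the area as axis-aligned segments (the excerpt does not say how) and that their coordinates are snapped, so points on the same line share an exact x or y value. An intersection is any point where a horizontal and a vertical segment meet. A cell is then taken, for each intersection used as a top-left corner, as the smallest rectangle whose four corners are intersections and whose four edges are fully drawn, so merged cells come out as larger boxes. All names (`find_intersections`, `locate_cells`, the tolerance `tol`) are hypothetical, and this corner-pairing rule is one illustrative choice, not the patent's stated procedure.

```python
from itertools import product


def find_intersections(horizontals, verticals, tol=2):
    """Return sorted (x, y) points where a horizontal and a vertical segment meet.

    horizontals: list of (y, x0, x1) with x0 <= x1
    verticals:   list of (x, y0, y1) with y0 <= y1
    tol: pixel slack so that L- and T-junctions whose ends stop just short still count.
    """
    points = set()
    for (hy, hx0, hx1), (vx, vy0, vy1) in product(horizontals, verticals):
        if hx0 - tol <= vx <= hx1 + tol and vy0 - tol <= hy <= vy1 + tol:
            points.add((vx, hy))
    return sorted(points)


def _h_covered(horizontals, y, xa, xb, tol):
    """True if a single horizontal segment at height y spans xa..xb."""
    return any(abs(hy - y) <= tol and hx0 - tol <= xa and xb <= hx1 + tol
               for hy, hx0, hx1 in horizontals)


def _v_covered(verticals, x, ya, yb, tol):
    """True if a single vertical segment at column x spans ya..yb."""
    return any(abs(vx - x) <= tol and vy0 - tol <= ya and yb <= vy1 + tol
               for vx, vy0, vy1 in verticals)


def locate_cells(points, horizontals, verticals, tol=2):
    """Return cells as (x0, y0, x1, y1) boxes, scanning intersections row by row.

    For each top-left candidate, right-hand corners are tried nearest first and,
    for each, lower corners nearest first; the first rectangle with all four
    corners present and all four edges drawn is kept.
    """
    pts = set(points)
    cells = []
    for x, y in sorted(pts, key=lambda p: (p[1], p[0])):
        rights = sorted(px for px, py in pts if py == y and px > x)
        downs = sorted(py for px, py in pts if px == x and py > y)
        cell = next(
            ((x, y, x1, y1) for x1 in rights for y1 in downs
             if (x1, y1) in pts
             and _h_covered(horizontals, y, x, x1, tol)
             and _h_covered(horizontals, y1, x, x1, tol)
             and _v_covered(verticals, x, y, y1, tol)
             and _v_covered(verticals, x1, y, y1, tol)),
            None)
        if cell is not None:
            cells.append(cell)
    return cells
```

For a table whose left column spans two rows, `locate_cells` returns one tall box on the left and two boxes on the right. The edge-coverage check is what rejects T-junctions that point away from the cell; the rule assumes no line dangles inside a cell and that broken ruling lines have been merged into whole segments beforehand.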


Abstract

The invention discloses a document recognition method and device and a storage medium. The method comprises: receiving a first document, the first document having a page; extracting an area containing a target element from the page, the target element comprising a first table; detecting intersection points in the area, an intersection point being a position where at least two line segments intersect; locating the sub-areas formed by the intersection points in the area, each sub-area representing a cell of the first table; recognizing the characters located in the sub-areas; generating a second table identical to the first table; and writing the characters into the second table. This achieves the beneficial effect of reconstructing the Excel table in the first document, especially an irregularly shaped Excel table, from its cells.
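The last three steps of the method (recognizing the characters in each sub-area, generating a second table and writing the characters into it) can be sketched as follows. The sketch assumes the cells are already located as boxes that share exact boundary coordinates, and that an OCR engine (not shown) has returned each piece of text together with the centre of its bounding box. Each cell's row and column span is derived from the sorted distinct boundary coordinates, so an irregular cell covering several grid slots is recorded as an Excel-style merge range. The function names and input formats are hypothetical.

```python
def col_letter(index):
    """Convert a 0-based column index to an Excel column name: 0 -> 'A', 26 -> 'AA'."""
    name = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def build_second_table(cells, tokens):
    """Rebuild a grid from cell boxes and write the recognized text into it.

    cells:  list of (x0, y0, x1, y1) boxes sharing exact boundary coordinates
    tokens: list of (text, cx, cy) OCR results, (cx, cy) being the text-box centre
    Returns (grid, merges): grid is a list of rows of strings (text sits in the
    top-left slot of each cell); merges lists ranges such as 'A1:A2'.
    """
    xs = sorted({c[0] for c in cells} | {c[2] for c in cells})
    ys = sorted({c[1] for c in cells} | {c[3] for c in cells})
    grid = [["" for _ in range(len(xs) - 1)] for _ in range(len(ys) - 1)]
    merges = []
    for x0, y0, x1, y1 in cells:
        r0, r1 = ys.index(y0), ys.index(y1) - 1
        c0, c1 = xs.index(x0), xs.index(x1) - 1
        # Tokens whose centre falls inside this cell, read top-to-bottom, left-to-right.
        inside = sorted((cy, cx, text) for text, cx, cy in tokens
                        if x0 <= cx < x1 and y0 <= cy < y1)
        grid[r0][c0] = " ".join(text for _, _, text in inside)
        if r1 > r0 or c1 > c0:
            merges.append(f"{col_letter(c0)}{r0 + 1}:{col_letter(c1)}{r1 + 1}")
    return grid, merges
```

For example, the three cells of a table whose left column spans two rows yield a 2 × 2 grid with the left cell's text in `A1` and the merge range `A1:A2`. To produce an actual .xlsx file, a library such as openpyxl could write `grid` cell by cell and apply each range with `Worksheet.merge_cells`; that step is not shown.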

Description

Technical Field

[0001] Embodiments of the present invention relate to character recognition technology, and in particular to a method, device and storage medium for document recognition.

Background Art

[0002] According to the generation process of a layout document, a document is a collection of data and structure, specifically comprising content data, a physical structure and a logical structure. Document analysis extracts the physical structure of a document, while document understanding establishes a mapping between the physical structure and the logical structure. In practical applications, the readability requirements of mobile devices make restoring both the physical and the logical structure particularly important. Detecting and identifying the tables on a page is one of the key points of document understanding. Tables have their own independent logical function and need to be physically divided and logically labeled. A table o...


Application Information

IPC (8): G06K9/00; G06K9/62
CPC: G06V30/412; G06V30/10; G06F18/241; G06F18/2411; G06F18/24323
Inventors: 黄劲, 梁泽龙, 康阳
Owner: 盈盛智创科技(广州)有限公司