General document identification method and system, terminal and storage medium

A document recognition technology, applied in neural learning methods, unstructured text data retrieval, text database clustering/classification, etc. It addresses problems such as the poor versatility of existing document structuring methods.

Pending Publication Date: 2021-04-23
上海深杳智能科技有限公司 +1

AI Technical Summary

Problems solved by technology

However, models based on named entity recognition also have significant defects: 1) concatenating the text content of the document into a sequence loses a large amount of the spatial information of the document content; 2) the named entity recognition model uses only the text content of the document and ignores other information, such as the document's image features, which greatly degrades its understanding of the document content.
This method still generalizes poorly and has limited applicability: for templates that have not been configured in advance, accuracy is very low, so universal document recognition cannot be achieved.
[0008] To sum up, existing methods for structuring document content usually suffer from poor versatility, poor flexibility, poor robustness, and poor accuracy. At present, no description or report of technology similar to the present invention has been found, nor has similar material been collected at home or abroad.
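
The first defect above, loss of spatial information when text is concatenated into a sequence, can be illustrated with a minimal sketch. The `flatten` helper, the two toy layouts, and their coordinates are hypothetical and are not taken from the patent; the sketch only assumes a simple top-to-bottom, left-to-right reading order.

```python
def flatten(fields):
    """Concatenate text fields in reading order (top-to-bottom, then left-to-right).

    Each field is (text, (x, y)), where (x, y) is an assumed top-left position.
    """
    ordered = sorted(fields, key=lambda f: (f[1][1], f[1][0]))
    return " ".join(text for text, _ in ordered)


# Layout A: each key sits to the left of its value on the same row.
layout_a = [("Name", (0, 0)), ("Alice", (100, 0)), ("Age", (0, 20)), ("30", (100, 20))]

# Layout B: table-like, a header row of keys with the values underneath.
layout_b = [("Name", (0, 0)), ("Age", (100, 0)), ("Alice", (0, 20)), ("30", (100, 20))]

seq_a = flatten(layout_a)  # "Name Alice Age 30"
seq_b = flatten(layout_b)  # "Name Age Alice 30"
```

In `seq_b` the key "Name" is followed by "Age" rather than by its value, and the vertical alignment that pairs "Name" with "Alice" is no longer visible in the sequence. A model that sees only the sequence has to guess the pairing.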


Examples

Embodiment Construction

[0079] The following is a detailed description of the embodiments of the present invention: this embodiment is implemented on the premise of the technical solution of the present invention, and provides detailed implementation methods and specific operation processes. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention, and these all belong to the protection scope of the present invention.

[0080] Figure 1 is a flowchart of a general document recognition method in an embodiment of the present invention.

[0081] As shown in Figure 1, the general document recognition method provided in this embodiment may include the following steps:

[0082] S100. Obtain text information of one or more text fields in the document, where the text information includes: text content and a text bounding box;

[0083] S200. Obtain category information corresponding to one or more tex...
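
The text information named in S100 (text content plus a text bounding box) and the category information named in S200 could be held in a record like the one sketched below. This is only an illustration: the `TextField` class, the `(x_min, y_min, x_max, y_max)` box convention, the `center` helper, and the sample values are assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextField:
    """One text field of a document (hypothetical representation)."""
    content: str                      # recognized text content (S100)
    bbox: Tuple[int, int, int, int]   # assumed (x_min, y_min, x_max, y_max) in pixels (S100)
    category: Optional[str] = None    # e.g. "Key" or "Value", filled in later (S200)


def center(field: TextField) -> Tuple[float, float]:
    """Return the center point of the field's bounding box."""
    x0, y0, x1, y1 = field.bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# Made-up sample fields; categories are not assigned yet.
fields = [
    TextField("Invoice No.", (10, 10, 110, 30)),
    TextField("A-1024", (130, 10, 200, 30)),
]
```

Keeping the bounding box next to the text preserves the spatial information that a plain concatenated sequence would discard.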


Abstract

The invention provides a universal document recognition method comprising the following steps: obtaining text information of one or more text fields in a document, the text information comprising text content and a text bounding box; obtaining category information in one-to-one correspondence with the one or more text fields in the document, wherein the category information comprises at least a primary key field category Key and a value field category Value; obtaining the connection relationship between each text field whose category is Key and the other text fields; and, on the basis of the connection relationship, obtaining a Value-class text field connected or disconnected with a Key-class text field and/or a Key-class text field as the structured content corresponding to the Key-class text field, determining the category information and text information of the structured content, and thereby completing recognition of the document. The invention also provides a corresponding system, terminal, and storage medium. The method and device improve the accuracy and universality of recognizing the structured content of documents.
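
The final step of the abstract, turning categories and connection relationships into structured content, might look roughly like the sketch below. It assumes the categories and connections have already been predicted upstream; the `assemble` function, the index-pair format for connections, the choice to keep an unconnected Key with an empty value list, and all sample data are hypothetical and are not specified by the patent.

```python
from typing import Dict, List, Tuple


def assemble(
    contents: List[str],
    categories: List[str],
    links: List[Tuple[int, int]],
) -> Dict[str, List[str]]:
    """Group Value-class fields under the Key-class field they are connected to.

    contents[i] and categories[i] describe text field i.
    links holds (key_index, other_index) pairs (assumed connection format).
    """
    # Every Key-class field gets an entry, even if nothing is connected to it.
    result: Dict[str, List[str]] = {
        contents[i]: [] for i, cat in enumerate(categories) if cat == "Key"
    }
    for key_idx, other_idx in links:
        if categories[key_idx] == "Key" and categories[other_idx] == "Value":
            result[contents[key_idx]].append(contents[other_idx])
    return result


# Made-up example: two connected Key/Value pairs and one unconnected Key.
contents = ["Name", "Alice", "Date", "2020-01-01", "Total"]
categories = ["Key", "Value", "Key", "Value", "Key"]
links = [(0, 1), (2, 3)]

structured = assemble(contents, categories, links)
```

Here `structured` maps "Name" to `["Alice"]`, "Date" to `["2020-01-01"]`, and "Total" to an empty list. Using a list per Key leaves room for one Key connected to several Value fields.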

Description

Technical field

[0001] The present invention relates to the technical fields of computer word processing and named entity recognition, in particular, to a general document recognition method and system.

Background technique

[0002] Document processing automation uses artificial intelligence technology to help people free themselves from complicated electronic document processing tasks, and one of the key tasks is automatic document analysis and recognition technology. Faced with a large number of unmarked electronic documents, such as purchase receipts, insurance policy documents, customs declarations, etc., it will consume a lot of manpower and material resources to extract key information through manual processing. How to effectively use artificial intelligence to extract key and interesting information from documents is very important.

[0003] Existing methods for document content structuring include traditional rule-based methods based on string matching, methods base...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/295, G06N3/04, G06N3/08
Inventor: 周异, 陈凯, 何建华
Owner: 上海深杳智能科技有限公司