A method and a terminal for creating paper document structured data based on a deep learning model

A technology for creating structured data from paper documents, applied in biological neural network models, neural architectures, character and pattern recognition, etc. It addresses problems such as bill displacement in images, low accuracy, and field content falling outside the preset range, improving efficiency and accuracy and saving resources.

Active Publication Date: 2019-05-24
厦门商集网络科技有限责任公司
4 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

However, this document structuring system is effective only when the fields to be extracted occupy fixed positions in every document, which limits its scope of use.
In practice, if the invoice printing system prints key field content at different positions, or the length of the key field content changes, the content shifts outside the preset range and extraction errors result.
In some bill recognition applications, large numbers of bills are stored by scanning or by photographing them with mobile phones, which easily displaces the bill within the image. Different bills may also have different formats, so the same field does not necessarily appear at the same position in the image. These characteristics make the above document structuring scheme unsuitable for businesses such as bill recognition.
As a result, in application scenarios prone to position shifts, the above scheme converts the OCR recognition results of paper documents into structured document nodes with low accuracy.
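The failure mode described above can be illustrated with a minimal sketch. It assumes a hypothetical fixed-template extractor that keeps only OCR boxes whose top-left corner falls inside a preset rectangle; the names `OcrBox`, `extract_fixed_region`, and `AMOUNT_REGION`, and all coordinates, are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def extract_fixed_region(boxes, region):
    """Return the text of boxes whose top-left corner lies inside region.

    region is (x0, y0, x1, y1); this mimics a fixed-position template.
    """
    x0, y0, x1, y1 = region
    return [b.text for b in boxes if x0 <= b.x < x1 and y0 <= b.y < y1]


# Hypothetical template: the amount field is expected inside this rectangle.
AMOUNT_REGION = (400, 100, 600, 130)

aligned = [OcrBox("Payee: ACME", 50, 100), OcrBox("128.50", 420, 105)]
# The same bill displaced 40 px downward, e.g. by a phone photograph.
shifted = [OcrBox(b.text, b.x, b.y + 40) for b in aligned]

print(extract_fixed_region(aligned, AMOUNT_REGION))  # ['128.50']
print(extract_fixed_region(shifted, AMOUNT_REGION))  # []
```

The shifted bill yields nothing because the amount now lies below the preset rectangle, which is the kind of error a position-independent model is meant to avoid.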

Examples

Embodiment 1

[0065] As shown in Figure 3, the present invention provides a method for creating paper document structured data based on a deep learning model, including:

[0066] S1: Preset a document training sample set. Each sample in the training sample set includes a paper document OCR recognition result and an annotated document corresponding to that OCR recognition result. The annotated document records the position information and category information of each key field in the OCR recognition result.
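One way to picture a single training sample from step S1 is sketched below. The patent text shown here does not specify how position information is encoded, so this sketch assumes character offsets into the OCR text; the class names, field names, and the example bill text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KeyFieldAnnotation:
    category: str              # category information, e.g. "payer"
    position: Tuple[int, int]  # assumed encoding: (start, end) character offsets


@dataclass
class TrainingSample:
    ocr_text: str  # the paper document OCR recognition result
    annotations: List[KeyFieldAnnotation] = field(default_factory=list)


def field_text(sample: TrainingSample, ann: KeyFieldAnnotation) -> str:
    """Return the OCR substring that an annotation points at."""
    start, end = ann.position
    return sample.ocr_text[start:end]


sample = TrainingSample(
    ocr_text="Payer: Zhang San Date: 2019-01-05",
    annotations=[
        KeyFieldAnnotation("payer", (7, 16)),
        KeyFieldAnnotation("payment_date", (23, 33)),
    ],
)

for ann in sample.annotations:
    print(ann.category, "->", field_text(sample, ann))
```

A bounding box in image coordinates would serve equally well as the position; offsets are used here only to keep the sketch short.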

[0067] Paper documents include, but are not limited to, text documents and bill documents. For example, 1000 bill pictures are collected and processed into samples; some of the samples are used as training samples and the rest as test samples. Each bill includes a certain number of fields, among them the key fields of interest. Each sample includes the OCR recognition result of a paper document and a document in which the key fields are annotated. The annotation document r...
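The division of the 1000 samples into training and test sets might look like the sketch below. The text does not state the split ratio, so the 80/20 fraction, the fixed seed, and the function name `split_samples` are assumptions made for illustration.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into (train, test).

    train_fraction is an assumed value; the source gives no ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for the 1000 processed bill samples.
sample_ids = list(range(1000))
train, test = split_samples(sample_ids)
print(len(train), len(test))  # 800 200
```

Shuffling before the cut keeps bills of the same format from clustering in one set, which matters when the samples were collected format by format.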

Embodiment 2

[0101] As shown in Figure 6, the present invention also provides a terminal for creating paper document structured data based on a deep learning model, including one or more processors 1 and a memory 2. The memory 2 stores a program configured to be executed by the one or more processors 1 to perform the following steps:

[0102] S1: Preset a document training sample set. Each sample in the document training sample set includes a paper document OCR recognition result and an annotated document corresponding to that OCR recognition result. The annotated document records the position information and category information of each key field in the paper document OCR recognition result.

[0103] For example, 1000 bill pictures are collected and processed into samples; some of the samples are used as training samples and the rest as test samples. Each bill includes a certain number of fields, including key fields...

Abstract

The invention relates to a method and a terminal for creating paper document structured data based on a deep learning model. The method comprises the following steps: presetting a document training sample set, wherein each sample in the training sample set comprises a paper document OCR recognition result and a labeled document corresponding to the paper document OCR recognition result, and the labeled document records position information and category information of each key field in the OCR recognition result of the paper document; training a preset first deep learning model by using the training sample set to obtain a second deep learning model; enabling the second deep learning model to analyze a first paper document OCR recognition result to obtain position information and category information of each key field in the first paper document OCR recognition result; and creating a structured document corresponding to the first paper document OCR recognition result according to the position information and the category information of each key field in the first paper document OCR recognition result. The accuracy of converting the OCR result of the paper document into the structured document is improved.
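The final step of the abstract, creating a structured document from the predicted positions and categories, can be sketched as follows. The trained model is not reproduced here: `predictions` is hand-written stand-in output, positions are assumed to be character offsets, and the function name and example text are hypothetical.

```python
import json


def build_structured_document(ocr_text, predicted_fields):
    """Map each predicted category to the OCR text at its predicted position.

    predicted_fields is a list of (category, start, end) tuples standing in
    for the output of the trained (second) deep learning model.
    """
    document = {}
    for category, start, end in predicted_fields:
        document[category] = ocr_text[start:end]
    return document


ocr_text = "Payer: Zhang San Date: 2019-01-05 Payee: Li Si"
# Stand-in predictions; a real system would obtain these from the model.
predictions = [("payer", 7, 16), ("payment_date", 23, 33), ("payee", 41, 46)]

structured = build_structured_document(ocr_text, predictions)
print(json.dumps(structured))
```

Because the positions come from the model rather than from a fixed template, the same function works whether or not the fields have shifted between documents.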

Description

Technical field

[0001] The invention relates to a method and a terminal for creating paper document structured data based on a deep learning model, and belongs to the field of artificial intelligence data processing for paper document recognition.

Background

[0002] Converting paper documents to structured data is the process of extracting key field information, such as the payer, payment date, and payee on a receipt, from the large amount of text in the OCR recognition results of paper documents, and saving it according to a certain structure. Once a large number of OCR-recognized paper documents have been structured, intelligent services such as efficient document retrieval and document analysis can be provided. The key and main technical difficulty of structured data processing of paper documents is to extract key field information from a large amount of text, including determining the location of the required key fields i...

Application Information

IPC(8): G06K9/34, G06K9/62, G06N3/04
Inventor 陈文传, 郝占龙, 林玉玲
Owner 厦门商集网络科技有限责任公司