
Deep learning-based dense text picture information extraction method

A text-image information extraction technology, applied in the fields of neural learning methods, instruments, biological neural network models, etc., that addresses the inability of existing approaches to extract information and achieves the effects of reducing the training set and saving labor.

Pending Publication Date: 2020-03-27
BEIHANG UNIV
7 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0008] In view of this, the present invention provides a deep learning-based method for extracting information from dense text pictures. It addresses the problem that existing OCR technology can only recognize the text in a picture and convert it into text, but cannot extract information from it.



Examples


Detailed Description of the Embodiments

[0041] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only for illustration and are not intended to limit the present invention.

[0042] The present invention provides a deep learning-based method for extracting information from dense text images which, as shown in Figures 3 and 4, includes the following steps:

[0043] S1: Preprocess dense text images that have a complete semantic structure;

[0044] Specifically, dense text refers to text documents with linguistic structure, as opposed to graphic files; examples include notarial certificates, letters of introduction, employment contracts, employment certificates, and property certificates. Preprocessing can include converting color to black and white, removing watermarks, etc.;

[0045] S2: Use OCR software to convert the preprocessed dense text image int...
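The color-to-black-and-white preprocessing mentioned in step S1 can be sketched as follows. The source does not say how the conversion is done, so this is only a minimal illustration that assumes a luminance conversion followed by a fixed global threshold; the function names, the pixel representation (nested lists of RGB tuples), and the threshold value of 128 are all assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image to 8-bit grayscale using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def preprocess(image: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Hypothetical S1 helper: color image in, black-and-white image out."""
    return binarize(to_grayscale(image), threshold)


# A 2x2 sample: black ink, white paper, a light-gray mark, dark ink.
sample = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 200, 200), (30, 30, 30)],
]
result = preprocess(sample)
```

With this threshold, the light-gray pixel in the sample is pushed to white, which is one simple way a faint mark can disappear during binarization. Watermark removal in general is a harder problem, and the patent text available here does not describe the technique it uses.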



Abstract

The invention discloses a deep learning-based method for extracting information from dense text pictures. Large amounts of data in dense text pictures are automatically collected, selected, sorted, and structured using artificial intelligence. The method uses a deep learning model that has acquired Chinese language understanding through large-scale learning. Through automated machine learning, a user can train the information extraction model according to their own needs without artificial intelligence expertise. Users are thereby helped to automatically customize different information extraction models to extract different information, so that customized services can be provided for different application scenarios and users. Moreover, pre-training allows the training set required by the information extraction model to be reduced as far as possible. The method mainly solves the problem of extracting information from dense text pictures, can save a large amount of labor, and provides data support for office automation, information query, big data, big-data-based artificial intelligence technology, and other applications.

Description

Technical field

[0001] The invention relates to the technical fields of artificial intelligence, optical character recognition, and machine reading, and in particular to a deep learning-based method for extracting information from dense text images.

Background

[0002] Optical Character Recognition (OCR) refers to the process of recognizing the text in scanned text images and turning it into editable documents. This process generally only returns all the text in the picture and does not extract specific information. However, many application scenarios need not only to recognize text but also to extract specific information from it. For example, when a bank reviews an employment certificate for a loan application, it needs to extract the applicant's employer, income, position, start date, and other information, which usually requires manual intervention. Reading scanned documents directly and entering the required information into the system takes time and e...
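The difference between plain OCR output and extracted information can be illustrated with a small data-structure sketch. OCR yields one flat string, whereas the bank-loan scenario described in the background needs named fields. The class name, field names, and sample values below are hypothetical and chosen only for illustration; the patent does not define an output schema.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class EmploymentCertificateInfo:
    """Hypothetical target record for the employment-certificate example."""
    employer: Optional[str] = None
    income: Optional[str] = None
    position: Optional[str] = None
    start_date: Optional[str] = None


def missing_fields(info: EmploymentCertificateInfo) -> List[str]:
    """List the fields that still need to be filled in, in declaration order."""
    return [name for name, value in asdict(info).items() if value is None]


# Made-up sample values: a partially filled record.
info = EmploymentCertificateInfo(employer="Example Co.", position="Engineer")
todo = missing_fields(info)  # fields an extraction step would still have to supply
```

A record like this is what a reviewer would otherwise fill in by hand; an information extraction model would aim to populate it from the OCR text.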


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/20, G06N3/04, G06N3/08
CPC: G06N3/08, G06V10/22, G06N3/044, G06N3/045
Inventor 屈晓磊, 万波, 朱跃飞
Owner BEIHANG UNIV