
Classification and Information Extraction Method of Formatted Fax Based on OCR

The invention applies information extraction and fax processing techniques to the field of image processing. It addresses the inability of existing methods to handle formatted scanned faxes, to classify fax images and extract information from them, and to extract key information reliably. Its effects are improved office efficiency, highly accurate information extraction, and fast classification.

Active Publication Date: 2020-09-29
JIANGSU HONGXIN SYST INTEGRATION

AI Technical Summary

Problems solved by technology

However, the system can only classify and index files, and has difficulty extracting key information from them.
[0004] Chinese Patent Publication No. CN102222289 discloses an OCR-based mobile phone financial management method and system. That system uses OCR to analyze and recognize financial bills, but it does not handle formatted scanned faxes and cannot classify fax images or extract information from them.



Examples


Embodiment Construction

[0044] A specific embodiment of the present invention is described below with reference to Figure 1:

[0045] Referring to Figure 1, this embodiment applies to any formatted fax, where a formatted fax is an image fax that contains a form. This embodiment takes the fax of a bill as an example. The details are as follows:

[0046] An OCR-based method for classifying formatted faxes and extracting information from them, comprising the following steps:

[0047] Step 1: Obtain the faxed image file of the bill, perform adaptive threshold binarization on the image, and reduce noise interference;

[0048] Step 2: Determine the inclination angle of the image, and correct the image;

[0049] Step 3: Find the outline of the largest bounding box of the table in the corrected image, and crop the header area of the image from the region above that bounding box;

[0050] Step 4: filter the font outlines in the header area and ...
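
Steps 1 and 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say which adaptive-threshold variant is used, so a local-mean threshold is assumed; connected-component bounding boxes stand in for contour detection; and the function names and the parameters `block` and `c` are hypothetical. The image is assumed to be a list of rows of gray values (0 to 255), with skew correction (Step 2) applied separately.

```python
from collections import deque


def adaptive_threshold(gray, block=15, c=10):
    """Mark a pixel as ink (1) if it is darker than the mean of its
    block x block neighbourhood by more than c; otherwise 0."""
    h, w = len(gray), len(gray[0])
    # Integral image with a zero border, so each window sum costs O(1).
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integ[y1][x1] - integ[y0][x1]
                     - integ[y1][x0] + integ[y0][x0])
            area = (y1 - y0) * (x1 - x0)
            # Same test as gray < mean - c, kept in integer arithmetic.
            if gray[y][x] * area < total - c * area:
                out[y][x] = 1
    return out


def largest_component_box(binary):
    """Return (x0, y0, x1, y1), inclusive, for the 4-connected ink
    component whose bounding box has the largest area, or None."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best, best_area = None, 0
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:  # breadth-first flood fill of one component
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = (x1 - x0 + 1) * (y1 - y0 + 1)
            if area > best_area:
                best, best_area = (x0, y0, x1, y1), area
    return best


def header_region(binary):
    """Return a copy of the rows above the largest bounding box, which
    is assumed here to be the table frame."""
    box = largest_component_box(binary)
    if box is None:
        return []
    return [row[:] for row in binary[:box[1]]]
```

Calling `header_region(adaptive_threshold(gray))` returns the binarized rows above the table. In practice an image library such as OpenCV would likely replace these loops; the pure-Python version only keeps the sketch self-contained.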


Abstract

The invention discloses an OCR-based method for classifying formatted faxes and extracting information from them. The method comprises: binarizing the fax image with an adaptive threshold; correcting the image; finding the outline of the largest bounding box of the table in the corrected image and cropping the header area from the region above that box; filtering the font outlines in the header area and merging them; detecting the number of fields in the header area after merging and classifying the image accordingly; taking each successfully classified image and locating the areas to be recognized in it; recognizing the fields in those areas of the table with OCR; and optimizing the recognized fields. The invention can improve office efficiency, free up employee productivity, and convert unstructured data into structured data. It is suitable for formatted faxes, that is, faxes of form images, such as standardized contracts, self-made vouchers, and bills.
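
The header-based classification described in the abstract (filter font outlines, merge them into fields, count the fields) can be sketched as below. This is an illustration under assumptions, not the patent's procedure: outlines are taken to be bounding boxes `(x0, y0, x1, y1)` with inclusive coordinates, the thresholds `min_size` and `max_gap` are hypothetical, and the `templates` mapping from field count to document class is invented for the example.

```python
def merge_into_fields(boxes, min_size=3, max_gap=5):
    """Drop boxes smaller than min_size in either dimension (treated as
    noise), then merge boxes that overlap vertically and lie within
    max_gap pixels of each other horizontally."""
    kept = [b for b in boxes
            if b[2] - b[0] + 1 >= min_size and b[3] - b[1] + 1 >= min_size]
    kept.sort(key=lambda b: b[0])  # scan left to right
    fields = []
    for x0, y0, x1, y1 in kept:
        for i, (fx0, fy0, fx1, fy1) in enumerate(fields):
            same_line = y0 <= fy1 and fy0 <= y1
            if same_line and x0 - fx1 <= max_gap:
                # Grow the existing field to cover the new box.
                fields[i] = (fx0, min(fy0, y0), max(fx1, x1), max(fy1, y1))
                break
        else:
            fields.append((x0, y0, x1, y1))
    return fields


def classify_by_field_count(field_count, templates):
    """Look up the document class for a header field count; None means
    the image could not be classified."""
    return templates.get(field_count)


# Hypothetical character boxes from a header: two words plus one speck.
boxes = [(0, 0, 9, 9), (12, 0, 21, 9), (24, 1, 33, 9),
         (60, 0, 69, 9), (72, 0, 81, 9), (40, 3, 40, 3)]
fields = merge_into_fields(boxes)
label = classify_by_field_count(len(fields), {2: "invoice", 3: "receipt"})
```

The greedy left-to-right merge is a deliberate simplification. A fuller implementation might merge outlines with morphological dilation before extracting contours.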

Description

Technical field

[0001] The invention relates to the field of image processing, and in particular to an OCR-based method for classifying formatted faxes and extracting information from them.

Background

[0002] With the advancement of science and technology, business exchanges across countries and regions are becoming more frequent. Because faxes carry special legal effect compared with other document transmission methods, they are widely used in office systems. Formatted fax files contain a lot of useful information. At present, these fax files must be classified manually and their important information extracted by hand, which is inefficient. There is an urgent need for an efficient and fast method of file classification and information extraction to improve employee efficiency, reduce labor costs, and free up productivity.

[0003] Chinese Patent Publication No. CN101876999 discloses a method for generating a fax index, a message analysis device and ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/32, G06K9/34, G06K9/38, G06K9/40, G06K9/62
CPC: G06V10/242, G06V30/158, G06V10/28, G06V10/30, G06V30/10, G06F18/2411
Inventor: 于志文, 车少帅, 胡笳, 吴洲洋, 周玲
Owner: JIANGSU HONGXIN SYST INTEGRATION