Method and system for recognizing format template

A layout template recognition method and system, applied in character and pattern recognition, instruments, computer components, etc. It addresses problems such as unrecognizable content, adverse effects on subsequent document management, and wasted time, and improves recognition efficiency.

Inactive
Publication Date: 2011-06-01
FOUNDER INTERNATIONAL CO LTD +1
5 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

However, header and footer information is often repetitive: the company logo, document title, file name, author name, and so on are identical on every page in both position and content. If an existing page recognition method or system recognizes the same content again on every page, a great deal of time is wasted and the efficiency of recognition within the page area drops sharply.
Some header and footer information is very important, such as page numbers. If the page number on a page is covered by a stain and therefore cannot be recognized, or is recognized inaccurately, the subsequent management of the entire document is seriously affected.

Embodiment Construction

[0032] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0033] As shown in Figure 1, the present invention provides a layout template recognition system, including:

[0034] a template library 11, used to store the layout templates for page recognition;

[0035] a recognition module 12, used to match the mark blocks in the layout template with the connected domains of the page by position, and to perform OCR on the text in the mark blocks;

[0036] an evaluation module 13, used to evaluate the accuracy of the content recognized for the same mark block across multiple pages and to correct it automatically;

[0037] a segmentation module 14, used to erase the page connected domains corresponding to the recognized mark blocks.
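
The roles of the evaluation module and the segmentation module can be illustrated with a minimal Python sketch. The text available here does not state how accuracy is evaluated or how erasure is performed, so both rules below are assumptions: the correction rule is a simple majority vote over the OCR readings of one mark block across pages, and erasure blanks the block's bounding rectangle rather than the exact connected domain. The names `Box`, `correct_by_majority`, and `erase_region` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def correct_by_majority(readings):
    """Evaluation step (assumed rule): take the most frequent non-empty OCR
    reading of one mark block across all pages as the corrected text.

    Returns (consensus_text, agreement_ratio), where the ratio is the share
    of pages whose reading equals the consensus.
    """
    non_empty = [r for r in readings if r]
    if not non_empty:
        return "", 0.0
    text, votes = Counter(non_empty).most_common(1)[0]
    return text, votes / len(readings)


def erase_region(page, box, background=0):
    """Segmentation step (assumed form): blank the pixels under a recognized
    mark block so later recognition stages skip them.

    `page` is a list of pixel rows and is modified in place.
    """
    for y in range(max(box.top, 0), min(box.bottom, len(page))):
        row = page[y]
        for x in range(max(box.left, 0), min(box.right, len(row))):
            row[x] = background
    return page


# One mark block (a document title) read on five pages; one page is
# misread and one is unreadable.
readings = ["ACME Corp", "ACME Corp", "ACNE Corp", "", "ACME Corp"]
title, agreement = correct_by_majority(readings)

# A tiny 4x6 "page" whose first row is the header region to erase.
page = [[1] * 6 for _ in range(4)]
erase_region(page, Box(left=0, top=0, right=6, bottom=1))
```

A vote of this kind only suits mark blocks whose content is identical on every page, such as a title or logo caption. Page numbers change from page to page and would need a different consistency check.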

[0038] There may be multiple template libraries in the format recognition template system, and the construction of template li...

Abstract

The invention relates to a method and system for recognizing a format template, belonging to the technical field of character recognition. The method comprises the following steps: performing contour analysis on a scanned page and finding the format template whose overlap rate with the page's connected domains reaches a set threshold; matching the mark blocks in the format template against the information in the scanned page, and extracting and recognizing the header and footer information matched by the mark blocks; and cutting out the recognized mark blocks before submitting the page to the subsequent recognition process. A fixed-format regional layout is recognized using a region recognition template and stripped from the recognition target, which improves the efficiency of recognition within the page area. With the method provided by the invention, the content to be recognized in ordinary scanned pages is simplified, and the recognized information can conveniently be reorganized and managed manually on the basis of the template.
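
The template selection step can be sketched as follows. The abstract does not define how the overlap rate is computed, so this sketch assumes one possible definition: the fraction of a template's mark blocks that contain the centre of at least one page connected domain. The threshold value of 0.8 and the names `Box`, `overlap_rate`, and `select_template` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains_point(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def overlap_rate(mark_blocks, components):
    """Assumed definition: share of mark blocks that contain the centre of
    at least one connected domain found on the page."""
    if not mark_blocks:
        return 0.0
    hits = sum(
        1
        for block in mark_blocks
        if any(block.contains_point(*c.center()) for c in components)
    )
    return hits / len(mark_blocks)


def select_template(templates, components, threshold=0.8):
    """Return (name, rate) for the best template whose overlap rate reaches
    the threshold, or None when no template qualifies."""
    best = None
    for name, blocks in templates.items():
        rate = overlap_rate(blocks, components)
        if rate >= threshold and (best is None or rate > best[1]):
            best = (name, rate)
    return best


# Two hypothetical templates: a report with a header and a page-number
# footer, and a letter with a single letterhead block.
templates = {
    "report": [Box(50, 10, 550, 40), Box(280, 800, 320, 820)],
    "letter": [Box(400, 10, 580, 60)],
}
# Bounding boxes of connected domains found on one scanned page.
components = [
    Box(60, 15, 80, 35),      # header glyph
    Box(90, 15, 110, 35),     # header glyph
    Box(295, 805, 305, 815),  # page-number glyph
    Box(100, 300, 120, 320),  # body text glyph
]
match = select_template(templates, components)
```

Under these assumptions the page matches the "report" template, because both of its mark blocks contain a connected domain, while the "letter" block contains none.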

Description

Technical field

[0001] The invention belongs to the technical field of OCR character recognition, and in particular relates to a layout template recognition method and system.

Background

[0002] Headers and footers are located at the top and bottom of each page of a document. They are often used to display additional information about the document, such as page numbers, graphics, company logos, document titles, file names, and author names. This information is very important for document management.

[0003] In OCR text recognition, header and footer information is simple, but it is very important for managing the recognition of the entire document. However, header and footer information is often repetitive: the company logo, document title, file name, author name, and so on are identical on every page in both position and content. If an existing page recognition metho...

Application Information

IPC(8): G06K9/00, G06K9/20
Inventors: 周长岭, 赵海涛
Owner: FOUNDER INTERNATIONAL CO LTD