
Method and apparatus for detecting reading order of document

A technology for detecting the reading order of documents, applied in the computer field. It addresses the unstable recognition performance and high error rate of existing reading-order recognition and achieves good robustness.

Active Publication Date: 2018-07-27
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] In OCR technology, methods based on directed graphs, fixed rules, and semantic analysis are commonly used to identify the reading order of documents. However, these methods have a high error rate when identifying the reading order in complex environments or for complex document images, so their recognition performance is unstable.




Embodiment Construction

[0024] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0025] Figure 1 is a schematic diagram of the working environment of the solution of the present invention in one embodiment. The working environment for implementing the method for detecting the reading order of documents in this embodiment is an intelligent terminal equipped with an OCR system. The intelligent terminal includes at least a processor, a display module, a power interface and a storage medium connected through a system bus, and it recognizes and displays the text information contained in the document picture through the OCR system. Among them, ...


Abstract

The invention relates to a method and apparatus for detecting the reading order of a document. The method includes the steps of: recognizing the text blocks included in a document picture and constructing a block set; determining a starting text block from the block set; executing a path finding operation on the starting text block according to the feature information of the starting text block, to determine a first text block in the block set that corresponds to the starting text block, wherein the feature information of each text block includes the location information of the text block in the document picture and the layout information of the text block; repeating in a similar fashion until the execution order of the path finding operation corresponding to each text block in the block set can be uniquely determined; and determining the execution order of the path finding operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to that execution order. According to the invention, the reading order of various document pictures can be accurately recognized.
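The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the path finding operation scores candidate blocks, so the `step_cost` heuristic below is a hypothetical stand-in, and the `TextBlock` fields (including `column` as the layout feature) and the `pick_start` rule are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical text block: location plus one layout feature."""
    text: str
    x: float       # left edge in the document picture
    y: float       # top edge in the document picture
    width: float
    height: float
    column: int = 0  # assumed layout information: column index


def pick_start(blocks: List[TextBlock]) -> TextBlock:
    # Assumption: the starting block is the topmost block of the first column.
    return min(blocks, key=lambda b: (b.column, b.y, b.x))


def step_cost(cur: TextBlock, cand: TextBlock) -> float:
    # Hypothetical stand-in for the path finding decision.
    if cand.column == cur.column:
        gap = cand.y - (cur.y + cur.height)
        # Moving upward inside a column is heavily penalized.
        return gap if gap >= 0 else 500.0 + abs(gap)
    # Changing column is expensive; prefer the top of the new column.
    return 1000.0 * abs(cand.column - cur.column) + cand.y


def reading_order(blocks: List[TextBlock]) -> List[TextBlock]:
    """Repeat the path finding step until every block has been ordered."""
    if not blocks:
        return []
    remaining = list(blocks)
    current = pick_start(remaining)
    remaining.remove(current)
    order = [current]
    while remaining:
        current = min(remaining, key=lambda c: step_cost(current, c))
        remaining.remove(current)
        order.append(current)
    return order


if __name__ == "__main__":
    # Two-column page supplied in scrambled order.
    page = [
        TextBlock("D", 300, 100, 200, 80, column=1),
        TextBlock("B", 0, 100, 200, 80, column=0),
        TextBlock("C", 300, 0, 200, 80, column=1),
        TextBlock("A", 0, 0, 200, 80, column=0),
    ]
    print([b.text for b in reading_order(page)])
```

The order in which blocks are selected is the execution order of the path finding step, and that order is returned as the reading order. A greedy choice is used here only to keep the sketch short; any scoring rule that consumes the same location and layout features could replace `step_cost`.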

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and a device for detecting the reading order of a document.

Background technique

[0002] OCR (Optical Character Recognition) refers to a class of algorithms for recognizing document images. For printed characters, it uses optical methods to convert the text in a paper document into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into a text format for further editing and processing by word-processing software.

[0003] In OCR technology, methods based on directed graphs, fixed rules, and semantic analysis are commonly used to identify the reading order of documents. However, these methods have a high error rate when identifying the reading order in complex environments or for complex document images, so their recognition performance is unstable.

Contents of the invention

[0004] Embodiments of the presen...


Application Information

IPC(8): G06K9/00, G06K9/20
CPC: G06V30/414, G06V10/22, G06V10/10, G06F18/00
Inventor 朱传聪
Owner TENCENT TECH (SHENZHEN) CO LTD