Layout analysis method and system for automatically classifying test paper contents

An automatic classification and layout analysis technology, applied in fields such as instruments, biological neural network models, and character and pattern recognition. It addresses the layout analysis of documents with complex content, and the problems that existing methods are not applicable to offline documents and ignore spatial context information.

Active Publication Date: 2019-04-26
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites: 2; Cited by: 10

AI Technical Summary

Problems solved by technology

However, the documents in this work are online handwritten documents, and the graph model the authors use is a linear-chain conditional random field, which is not suitable for offline documents.
LSTM-based classification [4] uses an LSTM to model the context of a time series, but it tends to ignore spatial context, which may be crucial for classification.
[0004] In general, although researchers have proposed many layout analysis methods for document content classification, these methods mainly target relatively simple document images.
For ...

Embodiment Construction

[0051] The technical problems solved by the embodiments of the present invention, the technical solutions adopted, and the technical effects achieved are described clearly and completely below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other equivalent or obviously modified embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention. Embodiments of the invention can be embodied in many different ways as defined and covered by the claims.

[0052] It should be noted that many specific details are given in the following description to aid understanding. It is evident, however, that the present invention may be practiced without these specific details.

[0053] It should be noted that, in...

Abstract

The invention provides a layout analysis method and system for automatically classifying test paper contents. The method comprises the following steps: acquiring an input document image; extracting the connected components of the document image to form an original connected component set; classifying each connected component as text or non-text to obtain a first text connected component set and a non-text connected component set; detecting and segmenting character components in each connected component of the non-text set to obtain the characters that touch non-text connected components, and adding these character components to the first text connected component set to obtain a second text connected component set; classifying each connected component in the second text connected component set as printed or handwritten; and outputting the classification result for the document image content. With this method, the classification of the elements is converted into a global optimization problem that maximizes the joint probability over all elements, so the overall classification accuracy can be improved.
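The connected component extraction step named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `connected_components`, the list-of-rows binary image format, and the choice of 8-connectivity are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque


def connected_components(image):
    """Group foreground pixels (value 1) into 8-connected components.

    `image` is a hypothetical binary image given as a list of rows of 0/1.
    Returns one dict per component with its pixels and bounding box
    (top, left, bottom, right), in raster-scan order of first pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []

    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append(
                {"pixels": pixels, "bbox": (min(ys), min(xs), max(ys), max(xs))}
            )
    return components


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0],
    ]
    for comp in connected_components(page):
        print(comp["bbox"], len(comp["pixels"]))
```

Each returned component would then be passed to the later stages described above (text versus non-text classification, then printed versus handwritten classification), which this sketch does not attempt to model.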

Description

Technical field

[0001] The invention relates to the technical field of electronic equipment, in particular to a layout analysis method and system for automatic classification of test paper contents.

Background

[0002] Layout analysis algorithms for complex document images occupy a vital position in the field of document analysis and recognition. With the application of deep learning to text recognition in recent years, single character recognition, word recognition, and string recognition have all reached very high accuracy, which makes layout analysis the bottleneck of the entire document analysis and recognition process. In many cases a document contains not one type of content but several, such as text, geometric figures, illustrations, tables, formulas, and background noise. For the text category, there may be a mixture of printed text, handwritten text, different languages, different la...

Application Information

IPC(8): G06K9/34, G06K9/32, G06K9/68, G06N3/04
CPC: G06V10/25, G06V10/267, G06V30/2455, G06N3/045
Inventors: 刘成林, 李晓辉, 殷飞
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI