Layout analysis method and system for automatic classification of test paper content

An automatic classification and layout analysis technology, applied in fields such as instruments, biological neural network models, and computation. It addresses the problems that existing methods are not applicable to offline documents and cannot cope well with layout analysis of documents whose content is complex and whose layout structure is complex and changeable, and it achieves the effect of improving classification accuracy.

Active Publication Date: 2021-06-25
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, the documents in this work are online handwritten documents, and the graph model the authors use is a linear-chain conditional random field, which is not suitable for offline documents.
LSTM-based classification [4] uses an LSTM to model the context of a time series, but it tends to ignore spatial context, which may be crucial for classification.
[0004] In general, although researchers have proposed many layout analysis methods for document content classification, these methods mainly focus on relatively simple document images.
For complex test paper document images, the rich and varied content, coupled with the complex and changeable layout structure, poses great challenges to existing layout analysis methods.
Although some methods are based on structured prediction, the potential functions or network structures they use are still relatively elementary, and structured prediction based on a general undirected cyclic graph structure has not been fully studied, so these methods still do not cope well with layout analysis of documents with complex content.
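As a rough illustration of the difference between classifying each element independently and structured prediction over an undirected graph that contains a cycle, the sketch below finds the lowest-energy labeling of a tiny four-node graph by brute force. The energy form (a unary cost per node plus a constant penalty for neighbors that disagree), the cost values, and the names `energy` and `map_labels` are all illustrative assumptions, not the potential functions or inference procedure of the patent.

```python
from itertools import product

LABELS = ("text", "non-text")

def energy(assign, unary, edges, penalty):
    """Sum of per-node costs plus a penalty for each edge whose ends disagree."""
    total = sum(unary[i][assign[i]] for i in range(len(assign)))
    total += sum(penalty for i, j in edges if assign[i] != assign[j])
    return total

def map_labels(unary, edges, penalty):
    """Exhaustive search for the minimum-energy (maximum joint probability) labeling.

    Only feasible for toy graphs: the search space grows exponentially.
    """
    n = len(unary)
    best = min(product(range(len(LABELS)), repeat=n),
               key=lambda a: energy(a, unary, edges, penalty))
    return [LABELS[k] for k in best]

# Hypothetical unary costs (think of them as negative log-probabilities).
# Node 2 is ambiguous and, taken alone, leans slightly toward "non-text".
unary = [
    [0.2, 1.6],
    [0.3, 1.2],
    [0.8, 0.6],
    [0.2, 1.6],
]
# Four nodes connected in a ring, so the graph has a cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Independent decision: pick the cheapest label for each node separately.
independent = [LABELS[min(range(len(LABELS)), key=lambda k: u[k])] for u in unary]
# Joint decision: minimize the energy of the whole labeling.
joint = map_labels(unary, edges, penalty=0.5)

print(independent)
print(joint)
```

In this toy case the independent decision labels node 2 differently from its neighbors, while the joint decision relabels it to agree with them because the two disagreement penalties outweigh its small unary preference. Real systems replace the exhaustive search with approximate inference.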



Embodiment Construction

[0051] The technical problems solved by the embodiments of the present invention, the technical solutions adopted, and the technical effects achieved are described clearly and completely below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other equivalent or obviously modified embodiments that those skilled in the art obtain from the embodiments in the present application without creative effort fall within the protection scope of the present invention. Embodiments of the invention can be implemented in many different ways as defined and covered by the claims.

[0052] It should be noted that, in the following description, many specific details are given for the convenience of understanding. It may be evident, however, that the present invention may be practiced without these specific details.

[0053] It should be noted that, in...


Abstract

The present invention proposes a layout analysis method and system for automatic classification of test paper content. The method includes: obtaining an input document image; extracting the connected components of the document image to form an original set of connected components; classifying the connected components as text or non-text to obtain a first set of text connected components and a set of non-text connected components; for each connected component in the non-text set, detecting and segmenting text components to obtain the text components contained in connected components classified as non-text, and adding these components to the first text set to obtain a second set of text connected components; classifying each connected component in the second text set as printed or handwritten characters; and outputting the classification result of the document image content. With the method of the invention, the element classification problem is transformed into a global optimization problem of maximizing the joint probability of all elements, so that the overall classification accuracy can be improved.
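The sequence of steps in the abstract can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the image is a binary grid of 0/1 values, components are found with a plain breadth-first search using 8-connectivity, and the three decision functions (`is_text`, `split_text_from`, `is_handwritten`) are hypothetical stand-ins supplied by the caller. The patent's actual classifiers and segmentation procedure are not reproduced here.

```python
from collections import deque

def connected_components(image):
    """Return a list of components, each a list of (row, col) foreground pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    # Visit the 8 neighbours of the current pixel.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps

def analyze_layout(image, is_text, split_text_from, is_handwritten):
    """Run the steps of the abstract with caller-supplied (hypothetical) classifiers."""
    components = connected_components(image)               # original component set
    text_1 = [c for c in components if is_text(c)]         # first text set
    non_text = [c for c in components if not is_text(c)]   # non-text set
    # Text pieces detected inside non-text components join the text set.
    recovered = [t for c in non_text for t in split_text_from(c)]
    text_2 = text_1 + recovered                            # second text set
    return {
        "printed": [c for c in text_2 if not is_handwritten(c)],
        "handwritten": [c for c in text_2 if is_handwritten(c)],
        "non_text": non_text,
    }

# Toy image and toy rules, for illustration only.
image = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
result = analyze_layout(
    image,
    is_text=lambda comp: len(comp) <= 4,          # stand-in: small blobs are text
    split_text_from=lambda comp: [],              # stand-in: nothing recovered
    is_handwritten=lambda comp: len(comp) == 1,   # stand-in: single pixels
)
print({name: len(comps) for name, comps in result.items()})
```

The decision functions are passed in as parameters so that each stage can be swapped for a trained model without changing the control flow.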

Description

Technical field

[0001] The invention relates to the technical field of electronic equipment, in particular to a layout analysis method and system for automatic classification of test paper content.

Background

[0002] Layout analysis of complex document images occupies a vital position in the field of document analysis and recognition. With the application of deep learning to text recognition in recent years, single character recognition, word recognition, and string recognition have all achieved very high accuracy, which makes layout analysis the bottleneck of the entire document analysis and recognition process. In many cases a document contains not just one type of content but several, such as text, geometric figures, illustrations, tables, formulas, and background noise. For the text category, there may be a mixture of printed text, handwritten text, different languages, different la...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/34, G06K9/32, G06K9/68, G06N3/04
CPC: G06V10/25, G06V10/267, G06V30/2455, G06N3/045
Inventors: 刘成林, 李晓辉, 殷飞
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI