Text line extraction method and device

A text line extraction method and device, applied in the field of image processing, that address the poor adaptability of rule-based matching, which degrades both the accuracy and the efficiency of text line extraction.

Active Publication Date: 2019-04-19
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

However, the size and arrangement direction of characters vary widely across text images, so rule-matching calculations struggle to adapt to every case. This directly harms both the quality and the efficiency of text line extraction: the extracted text lines are often inaccurate, and the extraction is slow.



Examples


Embodiment 1

[0110] Referring to Figure 1, a schematic flow chart of the text line extraction method provided in this embodiment, the method includes the following steps:

[0111] S101: Detect the characters in the document image to form candidate text boxes, each containing characters.

[0112] It should be noted that this embodiment does not limit how the document image is obtained. For example, the document image may be produced by a user scanning or photographing a paper document. Nor does this embodiment limit the language of the characters in the document image; they may be, for example, Chinese or English characters.

[0113] After the document image to be detected is obtained, the characters in it can first be detected using an existing or future character detection algorithm, so as to extract each candidate text box containing characters in the document image, where a candidate text box refers to the approxim...

Embodiment 2

[0140] It should be noted that this embodiment introduces a specific implementation of step S1021 in the first embodiment.

[0141] In this embodiment, after the candidate text boxes of the document image are formed through step S101 in the first embodiment, each candidate text box can be connected to one or more adjacent candidate text boxes by undirected connecting lines to form an undirected graph. Each undirected connecting line between two candidate text boxes in the graph corresponds to a weight, and that weight is represented by a distance metric value. In what follows, this embodiment takes one candidate text box in the document image as an example to explain how a candidate text box is connected to its adjacent candidate text boxes by undirected connecting lines. The other candidate text boxes are connected in a similar way and wil...
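The graph construction described in [0141] can be illustrated with a minimal Python sketch. The excerpt does not say which boxes count as adjacent or which distance metric supplies the weights, so everything specific here is an assumption made for illustration: the `Box` class, the use of Euclidean distance between box centers as the weight, and the rule of linking each box to its `k` nearest boxes.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical candidate text box: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def center_distance(a, b):
    """Assumed distance metric: Euclidean distance between box centers."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def build_graph(boxes, k=2):
    """Link each box to its k nearest boxes (an assumed adjacency rule).

    Returns {(i, j): weight} with i < j, so each undirected
    connecting line is stored exactly once.
    """
    edges = {}
    for i, a in enumerate(boxes):
        neighbours = sorted(
            (center_distance(a, b), j) for j, b in enumerate(boxes) if j != i
        )
        for dist, j in neighbours[:k]:
            edges[(min(i, j), max(i, j))] = dist
    return edges


# Three boxes on one row and a fourth box further down the page.
boxes = [Box(0, 0, 10, 10), Box(12, 0, 10, 10), Box(24, 0, 10, 10), Box(0, 40, 10, 10)]
graph = build_graph(boxes, k=1)
```

Storing each connecting line under an ordered index pair keeps the graph undirected without duplicate entries. Any other distance metric can be substituted by replacing `center_distance`.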

Embodiment 3

[0206] It should be noted that this embodiment introduces two specific implementations of step S1022 in the first embodiment.

[0207] In a first alternative implementation, see Figure 6, which shows one of the schematic flow charts provided by this embodiment for forming one or more target text regions by breaking at least one of the connecting lines between the candidate text boxes. The process includes the following steps:

[0208] S601: Find the N leftmost candidate text boxes in the document image, where N≥1.

[0209] In this embodiment, after step S102, each candidate text box is connected to at least one adjacent candidate text box by an undirected connecting line to construct an undirected graph, such as the undirected graph shown in Figure 5. When the minimum spanning tree algorithm is used to generate the undirected graph, the entire undirected graph corresponds to a complete tree. From Figure 5 it can be seen that most of the adj...
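The idea in [0207] and [0209], a single tree over all candidate text boxes that splits into target text regions once some connecting lines are broken, can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: Kruskal's algorithm stands in for "the minimum spanning tree algorithm", the function names are hypothetical, and the rule for choosing which lines to break is not visible in this excerpt, so the lines to break are simply passed in by the caller.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm over n nodes.

    edges: {(i, j): weight} with i < j. Returns the tree edges as a list.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for (i, j), _weight in sorted(edges.items(), key=lambda item: item[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # keep the edge only if it joins two components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


def regions_after_breaking(n, tree, broken):
    """Group node indices into connected regions once `broken` edges are removed.

    `broken` is supplied by the caller; how the lines to break are
    selected is outside the scope of this sketch.
    """
    adjacency = {i: [] for i in range(n)}
    for i, j in tree:
        if (i, j) not in broken:
            adjacency[i].append(j)
            adjacency[j].append(i)

    seen, regions = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, region = [start], []
        while stack:  # depth-first traversal of one region
            node = stack.pop()
            region.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        regions.append(sorted(region))
    return regions


# Four candidate boxes; weights are illustrative distances.
edges = {(0, 1): 12.0, (1, 2): 12.0, (0, 2): 24.0, (0, 3): 40.0, (1, 3): 41.8}
tree = minimum_spanning_tree(4, edges)
regions = regions_after_breaking(4, tree, broken={(0, 3)})
```

With no lines broken, the tree spans every box and yields a single region. Each broken line splits one region in two, which is how breaking connecting lines produces separate target text regions.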



Abstract

The invention discloses a text line extraction method and device. The method comprises the following steps: obtaining a sample; detecting characters in a document image to form candidate character boxes, each containing characters; aggregating the candidate character boxes into one or more target text areas, where each target text area comprises at least one candidate character box and the characters in the at least one candidate character box belong to at least one text line of the document image; and finally extracting each text line in the target text area. As can be seen, the candidate character boxes of the document image are aggregated into target text areas and each text line is then extracted from a target text area, so there is no need to set rules based on prior knowledge such as color and size to define which candidate character boxes can be combined into a text line. The method therefore improves both the accuracy of the text line extraction results and the detection efficiency.

Description

Technical field

[0001] The present application relates to the technical field of image processing, and in particular to a text line extraction method and device.

Background

[0002] With the rapid growth of information technology and the big data industry, a large amount of image data is stored in digital form and distributed over the Internet. Because such images contain a large amount of useful character information, they are widely used in daily life, in practical scenarios such as license plate detection, content-based image search, classification, recommendation and filtering, document recognition with mobile phone cameras, and automatic robot navigation. High-precision text line extraction plays a decisive role in improving the quality and efficiency of these applications, so it has received increasing research attention.

[0003] However, due to the diversity of characters in images in terms of color, font, size...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/20, G06K9/34
CPC: G06V30/40, G06V10/22, G06V30/153
Inventor: 常欢, 崔瑞莲, 胡金水, 殷兵, 刘聪
Owner: IFLYTEK CO LTD