
A Method for Extracting Text Lines of Handwritten Documents Based on Instance Segmentation

A text line extraction method, applied in the field of image processing, that addresses inaccurate extraction of text lines from handwritten documents. It is easy to implement and has good practical value.

Active Publication Date: 2021-11-16
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a method for extracting text lines of handwritten documents based on instance segmentation, which addresses the insufficient accuracy of current text line extraction for handwritten documents.



Examples


Embodiment 1

[0060] This embodiment provides a method for extracting text lines of handwritten documents based on instance segmentation, which is specifically performed according to the following steps:

[0061] Step 1. Scale the images in the handwritten document dataset to obtain the training set;

[0062] Scale the images in the ICDAR2013HandSegmCont handwritten document dataset. Let the height and width of an image be h and w respectively. If h ≤ max_size and w ≤ max_size, no scaling is performed. Otherwise, the nearest neighbor interpolation method in the Image library is used to reduce the image and its label to height h×scale and width w×scale, where scale is the scaling factor. The value of max_size is 800, and the values of h×scale and w×scale are rounded. Applying these operations to every image and its label in the training set yields the final training set.
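The size rule in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the text does not reproduce the formula for scale, so the sketch assumes scale = max_size / max(h, w), meaning the longer side is reduced to max_size. The helper name `scaled_size` is hypothetical, and Python's built-in `round` stands in for the unspecified rounding rule.

```python
def scaled_size(h, w, max_size=800):
    """Return the (height, width) an image would have after Step 1.

    ASSUMPTION: scale = max_size / max(h, w). The source text omits
    the formula for scale, so this choice is illustrative only.
    """
    if h <= max_size and w <= max_size:
        # Both sides already fit: no scaling is performed.
        return h, w
    scale = max_size / max(h, w)
    # The source says h*scale and w*scale "need to be rounded";
    # round() is an assumed stand-in for that rounding.
    return round(h * scale), round(w * scale)


if __name__ == "__main__":
    print(scaled_size(600, 400))    # fits within max_size, left unchanged
    print(scaled_size(1600, 1200))  # longer side reduced to max_size
```

The same target size would be applied to both the image and its label so that the two stay pixel-aligned.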

[0063] Step 2, train the data set in the training se...

Embodiment 2

[0086] This embodiment provides a method for extracting text lines of handwritten documents based on instance segmentation, which is specifically performed according to the following steps:

[0087] Step 1. Scale the images in the handwritten document dataset to obtain the training set;

[0088] Scale the images in the ICDAR2013HandSegmCont handwritten document dataset. Let the height and width of an image be h and w respectively. If h ≤ max_size and w ≤ max_size, no scaling is performed. Otherwise, the nearest neighbor interpolation method in the Image library is used to reduce the image and its label to height h×scale and width w×scale, where scale is the scaling factor. The value of max_size is 1000, and the values of h×scale and w×scale are rounded. Applying these operations to every image and its label in the training set yields the final training set.
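One reason nearest neighbor interpolation suits the label images is that every output pixel is copied from a single input pixel, so text line identifiers are never blended into values that do not exist in the label. The pure-Python sketch below illustrates this on a tiny label grid. It is a stand-in for the Image library call named in the text, not a reproduction of it: the helper `resize_nearest` is hypothetical, and its sampling convention (floor of the scaled index) is an assumption that may differ from the library's.

```python
def resize_nearest(grid, new_h, new_w):
    """Resize a 2-D list of values using nearest-neighbor sampling.

    ASSUMPTION: source index = floor(dst_index * old_size / new_size).
    Real imaging libraries may sample at pixel centers instead.
    """
    old_h, old_w = len(grid), len(grid[0])
    out = []
    for y in range(new_h):
        src_y = min(old_h - 1, y * old_h // new_h)
        row = []
        for x in range(new_w):
            src_x = min(old_w - 1, x * old_w // new_w)
            # Copy one source value; nothing is averaged.
            row.append(grid[src_y][src_x])
        out.append(row)
    return out


if __name__ == "__main__":
    # Made-up 4x4 label map with four regions (ids 0..3).
    labels = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3],
    ]
    for row in resize_nearest(labels, 2, 2):
        print(row)
```

An averaging interpolation could turn the boundary between regions 1 and 3 into an intermediate value such as 2, which would wrongly assign those pixels to another region.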

[0089] Step 2, train the data set in the training s...

Embodiment 3

[0112] This embodiment provides a method for extracting text lines of handwritten documents based on instance segmentation, which is specifically performed according to the following steps:

[0113] Step 1. Scale the images in the handwritten document dataset to obtain the training set;

[0114] Scale the images in the ICDAR2013HandSegmCont handwritten document dataset. Let the height and width of an image be h and w respectively. If h ≤ max_size and w ≤ max_size, no scaling is performed. Otherwise, the nearest neighbor interpolation method in the Image library is used to reduce the image and its label to height h×scale and width w×scale, where scale is the scaling factor. The value of max_size is 600, and the values of h×scale and w×scale are rounded. Applying these operations to every image and its label in the training set yields the final training set.

[0115] Step 2, train the data set in the training se...


Abstract

The invention discloses a method for extracting text lines of a handwritten document based on an instance segmentation network, carried out in the following steps. Step 1: scale the images in a handwritten document dataset to obtain a training set. Step 2: train on the training set obtained in step 1 to update the weights of the instance segmentation network. Step 3: feed the handwritten document image to be processed into the instance segmentation network obtained in step 2 to obtain its mapping map. Step 4: use the mean shift algorithm to cluster the mapping vectors corresponding to the black pixels of the handwritten document image to be processed, and finally extract the individual text lines. The method makes the extraction of text lines from handwritten documents more accurate.
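The clustering in Step 4 can be illustrated with a minimal flat-kernel mean shift. This is a sketch, not the patented implementation: the text only states that mean shift is applied to the mapping vectors of black pixels, so the kernel, the `bandwidth` value, the rule for merging modes, and the function name `mean_shift` are all assumptions, and the 2-D vectors in the example are made up.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Cluster points with a flat-kernel mean shift.

    Returns one integer cluster label per input point.
    ASSUMPTION: flat kernel and bandwidth-based mode merging;
    the source does not specify either.
    """
    modes = []
    for p in points:
        m = list(p)
        for _ in range(max_iter):
            # Points inside the window centered at the current estimate.
            neighbors = [q for q in points if math.dist(m, q) <= bandwidth]
            if not neighbors:
                break
            new_m = [sum(c) / len(neighbors) for c in zip(*neighbors)]
            shift = math.dist(m, new_m)
            m = new_m
            if shift < tol:
                break
        modes.append(m)

    # Merge modes that converged to (nearly) the same place.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    # Made-up mapping vectors for five black pixels: three from one
    # text line, two from another.
    vectors = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
    print(mean_shift(vectors, bandwidth=1.0))  # [0, 0, 0, 1, 1]
```

Pixels that receive the same label would be treated as belonging to the same text line. Mean shift does not need the number of lines in advance, which fits pages whose line count is unknown.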

Description

Technical field

[0001] The invention belongs to the technical field of image processing methods, and in particular relates to a method for extracting text lines of handwritten documents based on instance segmentation.

Background technique

[0002] Text line extraction is an important problem in the image processing of handwritten documents. It is of great significance for recognizing the content of handwritten documents and for extracting their individual characters. Handwritten documents include photographic documents, checks, certificates, manuscript documents and many other types, and processing them automatically can greatly reduce human labor. Unlike printed document images, whose character sizes and arrangements are relatively regular, handwritten document images vary in writing style: characters may differ in size, and there are likely to be sticking, c...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06K9/00
CPC: G06V30/413
Inventor: 张九龙, 张振雄, 屈晓娥
Owner: XIAN UNIV OF TECH