Ticket recognition training sample synthesis method and computer storage medium

This training sample synthesis technology, applied in the field of text recognition, addresses two problems of real samples: their number cannot be controlled, and their character coverage is unbalanced.

Active Publication Date: 2019-08-23
阳光保险集团股份有限公司

AI Technical Summary

Problems solved by technology

[0004] In view of the above problems, the present invention proposes a method for synthesizing training samples for ticket recognition and a computer storage medium. Generating synthetic training samples to replace real samples for model training addresses problems such as the uncontrollable number of real samples and unbalanced character coverage.



Examples


Embodiment 1

[0059] Please refer to Figure 1. This embodiment proposes a method for synthesizing training samples for ticket recognition, which can be applied to training text recognition models for various tickets, such as real estate certificates and land certificates. It addresses problems that arise when models are trained on real samples, such as an uncontrollable number of samples and unbalanced character coverage. The method is described in detail below.

[0060] Step S100: perform character sampling from the corpus according to preset rules to obtain a character sampling set, read characters from the character sampling set to generate sample character strings of a predetermined length, and form a plurality of sample character strings into a sample character string set.

[0061] In this implementation, in order to replace real ticket samples with artificially synthesized training samples, the text characters needed for the training samples to be synthesized are obtained first. Among them, the ...
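Step S100 can be illustrated with a minimal Python sketch. The preset sampling rule is not spelled out in the text available here, so the rule below (every distinct corpus character appears the same number of times in the sampling set) is only an assumption, chosen because it yields balanced character coverage. The names `sample_characters`, `build_sample_strings`, and `per_char_quota` are hypothetical.

```python
import random
from collections import Counter


def sample_characters(corpus, per_char_quota, rng):
    """Build a character sampling set from the corpus.

    Assumed rule (not taken from the patent): each distinct non-space
    character of the corpus appears exactly `per_char_quota` times, so
    rare and frequent characters get equal coverage.
    """
    charset = sorted({ch for ch in corpus if not ch.isspace()})
    pool = [ch for ch in charset for _ in range(per_char_quota)]
    rng.shuffle(pool)  # mix characters so strings are not runs of one symbol
    return pool


def build_sample_strings(pool, length):
    """Read characters from the pool in order and cut them into
    fixed-length sample strings; a trailing remainder is dropped."""
    if length <= 0:
        raise ValueError("length must be positive")
    return ["".join(pool[i:i + length])
            for i in range(0, len(pool) - length + 1, length)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pool = sample_characters("aaaaabbc", per_char_quota=4, rng=rng)
strings = build_sample_strings(pool, length=4)
coverage = Counter("".join(strings))
```

In this usage the corpus is heavily skewed toward `a`, yet `coverage` reports the same count for `a`, `b`, and `c`, and the number of generated strings is fixed by `per_char_quota` and `length` rather than by how much real data happens to exist.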

Embodiment 2

[0113] Please refer to Figure 8. Based on the ticket recognition training sample synthesis method of the above embodiment, this embodiment proposes a ticket recognition training sample synthesis device 10, comprising:

[0114] The sample character string acquisition module 100 is configured to perform character sampling from the corpus according to preset rules to obtain a character sampling set, read characters from the character sampling set to generate sample character strings of a predetermined length, and form a plurality of sample character strings into a sample character string set.

[0115] The foreground text mask image generating module 200 is configured to perform text mask preprocessing on each sample character string and generate a corresponding foreground text mask image.

[0116] The secondary image fusion module 300 is configured to perform secondary image fusion on each foreground text mask image and the corresponding selected ticket background image to obtain a synthetic training sample se...
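The mask generation and fusion modules can be sketched together in plain Python. Everything specific here is an assumption for illustration: the tiny `GLYPHS` bitmap table stands in for real font rasterization, and the two passes in `fuse_twice` (paste ink through the mask, then blend the result back toward the background) are only one possible reading of "secondary image fusion", whose details are not given in the text available here. Images are grayscale grids (lists of rows) with values from 0 to 255.

```python
# Hypothetical 3x5 bitmap glyphs; a real pipeline would rasterize the
# sample strings with an actual font library.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}
GLYPH_H, GLYPH_W = 5, 3


def render_text_mask(text, pad=1, gap=1):
    """Return a binary foreground mask: 1 on text pixels, 0 elsewhere."""
    if not text:
        raise ValueError("text must not be empty")
    width = 2 * pad + len(text) * GLYPH_W + (len(text) - 1) * gap
    height = 2 * pad + GLYPH_H
    mask = [[0] * width for _ in range(height)]
    for idx, ch in enumerate(text):
        x0 = pad + idx * (GLYPH_W + gap)  # left edge of this glyph
        for r, row in enumerate(GLYPHS[ch]):
            for c, bit in enumerate(row):
                if bit == "1":
                    mask[pad + r][x0 + c] = 1
    return mask


def blend(foreground, background, alpha):
    """Per-pixel blend: alpha * foreground + (1 - alpha) * background."""
    return [[round(a * f + (1 - a) * b) for f, b, a in zip(fr, br, ar)]
            for fr, br, ar in zip(foreground, background, alpha)]


def fuse_twice(mask, background, ink=0, soften=0.8):
    """Assumed two-pass fusion of a text mask with a ticket background."""
    h, w = len(mask), len(mask[0])
    if len(background) != h or any(len(row) != w for row in background):
        raise ValueError("mask and background must have the same shape")
    ink_layer = [[ink] * w for _ in range(h)]
    # Pass 1: paste ink colour wherever the mask marks a text pixel.
    first = blend(ink_layer, background, mask)
    # Pass 2: blend the pasted result back toward the background so the
    # text picks up some of the background intensity.
    uniform = [[soften] * w for _ in range(h)]
    return blend(first, background, uniform)


mask = render_text_mask("17")
background = [[200] * len(mask[0]) for _ in range(len(mask))]
sample = fuse_twice(mask, background)
```

The flat background in the usage lines is a stand-in for a cropped real ticket background image; the label for the synthetic sample is simply the string that was rendered.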



Abstract

The invention discloses a ticket recognition training sample synthesis method and a computer storage medium. The method comprises the following steps: performing character sampling from a corpus according to a preset rule to obtain a character sampling set, reading characters from the character sampling set to generate sample character strings of a predetermined length, and forming a sample character string set from a plurality of sample character strings; performing character mask preprocessing on each sample character string and generating a corresponding foreground character mask image; and performing secondary image fusion on each foreground character mask image and the correspondingly selected ticket background image to obtain a synthetic training sample set for ticket recognition. With this technical scheme, the training samples required for ticket text recognition can be synthesized artificially and used in place of real samples for model training, which addresses problems such as the uncontrollable number of real samples.

Description

Technical field

[0001] The invention relates to the technical field of text recognition, in particular to a method for synthesizing training samples for ticket recognition and a computer storage medium.

Background technique

[0002] With the development of smartphone technology, it is increasingly common for users handling financial and insurance business to photograph various bills and certificates (such as real estate certificates) with their mobile phones and upload the pictures as business documents. Extracting the text information in these pictures for information entry, review, and comparison can improve efficiency, reduce costs, and improve the user experience.

[0003] Existing OCR systems based on deep learning are generally divided into two steps: text detection and text recognition. Text recognition mostly recognizes a text string as a whole, and the recognition model is trained on real samples. Howe...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06K9/00
CPC: G06V30/413, G06V30/287, G06F18/25, G06F18/214
Inventor: 田强, 邓冠群, 李树凯
Owner: 阳光保险集团股份有限公司