
Method and device for generating text line classifier

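A text line classifier technology, applied in the field of pattern recognition, that addresses problems such as low algorithm accuracy, poor robustness, and poor applicability, and achieves a wide application range, high accuracy, and simple implementation.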

Active Publication Date: 2020-02-18
ALIBABA GRP HLDG LTD
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] In existing text-region recognition for Chinese OCR, classification by empirical threshold is the simplest method, and the features it judges on are mostly text features extracted from single-character verification; however, this algorithm has low accuracy and poor robustness and is prone to over-fitting. The second type of method is currently the mainstream solution, while the third type of method is rarely used, mainly because the CNN method consumes too much computing resources and affects the overall efficiency of the algorithm. However, both the second and the third type of method require a large number of labeled samples, which inevitably consumes substantial labor costs, and the classification effect depends on feature extraction and sample selection. It is therefore often necessary to annotate a new batch of business-dependent data, that is, new samples, for each different application requirement, so existing annotated samples have poor applicability. In addition, Chinese characters come in various fonts and complex styles, including simplified, traditional, and handwritten forms; this rich diversity greatly increases the difficulty of identifying Chinese text regions.



Examples


Embodiment Construction

[0024] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are intended to explain the present application, and should not be construed as limiting the present application.

[0025] The method and device for generating a text line classifier according to the embodiments of the present application will be described below with reference to the accompanying drawings.

[0026] Figure 1a is a flowchart of a method for generating a text line classifier in an embodiment of the present application.

[0027] As shown in Figure 1a, the method for generating the text line classifier includes:

[0028] S101. Generate a text line sample by using the font library of the current terminal system.

[0029...



Abstract

The present application discloses a method and device for generating a text line classifier. The method includes: generating text line samples using the font library of the current terminal system; performing feature extraction on the text line samples and on pre-stored labeled samples; and performing model training on the extracted features to generate a text line classifier for identifying text regions. Because the text line samples are generated from the system fonts, the resulting classifier can identify text regions for different scenarios or different needs, giving it strong applicability, a wide application range, and a simple implementation. At the same time, extracting features from the generated text line samples in combination with the labeled samples gives the resulting classifier high accuracy.
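The three steps named in the abstract (generate samples from fonts, extract features from generated and labeled samples, train a classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `TOY_FONTS` is a hypothetical stand-in for the terminal's font library, images are tiny binary grids, and both the two features (ink density, fraction of empty columns) and the nearest-centroid model are choices made here for brevity; the text above does not specify them.

```python
# Hypothetical stand-in for a system font library: each "font" maps a
# character to a 3x3 binary glyph (1 = ink). A real system would rasterize
# glyphs from installed font files instead.
TOY_FONTS = {
    "font_a": {"x": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
               "o": [[1, 1, 1], [1, 0, 1], [1, 1, 1]]},
    "font_b": {"x": [[1, 0, 1], [1, 1, 1], [1, 0, 1]],
               "o": [[0, 1, 0], [1, 0, 1], [0, 1, 0]]},
}

def render_text_line(text, font):
    """Step 1: build a text line sample by placing glyphs side by side,
    each followed by a 1-pixel empty column."""
    rows = [[] for _ in range(3)]
    for ch in text:
        glyph = font[ch]
        for r in range(3):
            rows[r].extend(glyph[r] + [0])
    return rows

def extract_features(image):
    """Step 2: assumed features: (ink density, fraction of empty columns)."""
    height, width = len(image), len(image[0])
    ink = sum(sum(row) for row in image)
    empty_cols = sum(
        1 for c in range(width) if all(image[r][c] == 0 for r in range(height))
    )
    return (ink / (height * width), empty_cols / width)

def train_centroids(samples):
    """Step 3: nearest-centroid training on (features, label) pairs."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def classify(model, feats):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, model[label])),
    )

def build_classifier():
    # Generated samples: every toy font rendering a few fixed strings.
    generated = [
        (render_text_line(text, font), "text")
        for font in TOY_FONTS.values()
        for text in ("xo", "ox", "xxo", "oox")
    ]
    # Hypothetical pre-stored labeled samples: one text line, two non-text.
    labeled = [
        ([[1, 0, 1, 0, 1, 1, 1, 0],
          [0, 1, 0, 0, 1, 0, 1, 0],
          [1, 0, 1, 0, 1, 1, 1, 0]], "text"),
        ([[1] * 8, [1] * 8, [1] * 8], "non_text"),   # solid block
        ([[0] * 8, [1] * 8, [0] * 8], "non_text"),   # horizontal stripe
    ]
    samples = [(extract_features(img), label) for img, label in generated + labeled]
    return train_centroids(samples)

model = build_classifier()
candidate = render_text_line("oxo", TOY_FONTS["font_b"])
print(classify(model, extract_features(candidate)))
```

Because the generated samples cost nothing to label, adding a new toy font to `TOY_FONTS` extends the training set without any manual annotation, which is the applicability argument the abstract makes; a practical system would need far richer features and a stronger model than this sketch.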

Description

Technical field

[0001] The present application relates to the technical field of pattern recognition, in particular to a method and device for generating a text line classifier.

Background technique

[0002] At present, many pictures, such as Taobao pictures, contain a large amount of prohibited text. To identify this prohibited text, optical character recognition (OCR) technology for natural scene pictures can be used to filter the results of text detection and localization: non-text detection results are filtered out, and candidate texts are screened out and sent to the recognition device, thereby improving recognition accuracy.

[0003] OCR technology for natural scenes has long been a hot topic in industry and academic research. The features used and the algorithm architecture vary with the language. At present, the OCR technology in the world is mainly for Englis...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/20, G06K9/62, G06V30/224, G06V30/413
CPC: G06V30/413, G06V30/2268, G06V30/2445, G06V30/293, G06V30/1914, G06F18/28
Inventor: 金炫, 王天舟, 薛琴
Owner: ALIBABA GRP HLDG LTD