Scene character recognition method based on visual language modeling network

A visual language modeling and text recognition technology, applied in character recognition, character and pattern recognition, reasoning methods, and related fields. It addresses three problems: visual and language information are difficult to consider fully and fuse effectively for text recognition, a separate language model adds large computational overhead, and the speed and accuracy of scene text recognition need to be improved. The effect is enhanced visual features and improved recognition ability.

Active Publication Date: 2021-03-23
BEIJING RES INST UNIV OF SCI & TECH OF CHINA +2

AI Technical Summary

Problems solved by technology

However, these methods have two problems: 1) The additional language model structure introduces a large amount of extra computational overhead.
2) Because visual information and language information are modeled in two separate modules, it is difficult for the network to fully consider and effectively fuse the two independent sources of information to achieve accurate text recognition.
Therefore, the speed and accuracy of scene text recognition still need to be improved.

Examples

Embodiment Construction

[0020] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0021] Embodiments of the present invention provide a scene text recognition method based on a visual language modeling network. As shown in Figure 1, it mainly includes:

[0022] Construct a visual model including a backbone network, a position-aware mask generation module and a visual semantic reasoning module, and use the position-aware mask generation module to guide the visual semantic reasoning module to deduce the occluded character inf...

Abstract

The invention discloses a scene text recognition method based on a visual language modeling network. During training, the visual model is made to recognize the complete word-level result directly from image features in which a character has been occluded, and it is guided to infer the occluded character content from visual context information, so that the visual model is endowed with language ability. As a result, without introducing an extra language model structure, the visual model adaptively captures the language information in the visual context to enhance the visual features, which improves recognition ability. Moreover, the whole generation process of the character-level mask needs only the original word-level annotation, without introducing additional annotation information. During testing, only the backbone network and the visual semantic reasoning module are used for recognition; the position-aware mask generation module is used only in training, so no extra computational overhead is introduced.
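
The training and testing data flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the equal-width band mask, and the column-energy "reasoning" stand-in are all hypothetical placeholders chosen only to show that the character-level mask occludes the features during training and is bypassed entirely at test time.

```python
import numpy as np

def occlude_features(features, char_mask):
    """Suppress the feature region of one character.

    features:  array of shape (C, H, W), an assumed backbone output.
    char_mask: array of shape (H, W) with values in [0, 1], close to 1
               over the character to be occluded (assumed convention).
    """
    return features * (1.0 - char_mask)[np.newaxis, :, :]

def toy_mask_fn(features, char_index, num_chars=4):
    """Hypothetical stand-in for the position-aware mask generation module:
    marks an equal-width vertical band for the character at char_index."""
    _, h, w = features.shape
    band = w // num_chars
    mask = np.zeros((h, w), dtype=features.dtype)
    mask[:, char_index * band:(char_index + 1) * band] = 1.0
    return mask

def toy_reasoning_fn(features):
    """Hypothetical stand-in for the visual semantic reasoning module:
    returns the per-column feature energy instead of character logits."""
    return np.abs(features).sum(axis=(0, 1))

def forward(features, training, char_index=None,
            mask_fn=toy_mask_fn, reasoning_fn=toy_reasoning_fn):
    # The mask generation module is only consulted during training;
    # at test time the features go straight to the reasoning module.
    if training:
        features = occlude_features(features, mask_fn(features, char_index))
    return reasoning_fn(features)

features = np.ones((2, 3, 8), dtype=np.float32)
train_out = forward(features, training=True, char_index=1)
test_out = forward(features, training=False)
print(train_out)  # columns 2 and 3 are zeroed by the mask
print(test_out)   # features pass through unmasked
```

In the method as described, the reasoning module would be trained to output the full word even though one character's features are suppressed, which is what pushes it to use visual context; the toy functions above only trace where the mask enters and leaves the pipeline.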

Description

Technical field

[0001] The invention relates to the technical field of natural scene text recognition, in particular to a scene text recognition method based on a visual language modeling network.

Background technique

[0002] Natural scene text recognition is a general text recognition technology, which has become a hot research direction in the field of computer vision and document analysis in recent years, and has been widely used in autonomous driving, license plate recognition, and helping the visually impaired. The goal of this task is to convert the text content in the image into editable text.

[0003] Due to the characteristics of low resolution, complex background, and susceptibility to noise interference in natural scenes, traditional text recognition technology cannot be applied to natural scenes. Therefore, character recognition technology in natural scenes has great research significance.

[0004] With the development of deep learning technology in the field ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/32, G06K9/62, G06N5/04
CPC: G06N5/04, G06V20/62, G06V30/10, G06F18/214
Inventor: 张勇东, 王裕鑫, 谢洪涛, 柳轩
Owner: BEIJING RES INST UNIV OF SCI & TECH OF CHINA