Document image Chinese keyword detection method and system based on single word matching

A keyword detection and document image technology, applied in character and pattern recognition, instruments, and computer components, among other fields. It addresses the insufficient accuracy and robustness of Chinese keyword recognition caused by the unstable image quality of document images and the diversity of Chinese character arrangements, with the effects of reducing the risk of omission and improving accuracy and completeness.

Active Publication Date: 2019-07-26
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites 9 documents; cited by 2 documents.

AI Technical Summary

Problems solved by technology

[0004] To solve the above problems in the prior art, namely the insufficient accuracy and robustness of Chinese keyword recognition caused by unstable document image quality and the diversity of Chinese character arrangements, a first aspect of the present invention provides a method for detecting Chinese keywords in document images based on single word matching. The method includes the following steps:

Examples

Detailed description of the embodiments

[0054] To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0055] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for convenience of description, only the parts relevant to the related invention...

Abstract

The invention belongs to the technical field of text image recognition and relates to a detection method and system, in particular a method and system for detecting Chinese keywords in document images based on single word matching. To solve the low accuracy and robustness of Chinese keyword recognition caused by the unstable quality of document images and the diversity of Chinese character arrangements, the method comprises the following steps: binarizing the document image to obtain a first image; performing character detection to obtain a first candidate character set; filtering the first candidate character set to obtain a second candidate character set and a first noise candidate character set; screening characters from the first noise candidate character set and adding them to the second candidate character set to obtain a third candidate character set; combining candidate characters to obtain a first candidate word set; performing secondary detection of missed characters to obtain a second candidate word set; and selecting the final keyword detection result based on a cost function. The method improves the accuracy of document keyword recognition and is highly robust.
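The abstract does not say which binarization algorithm produces the first image. As an illustration only, the sketch below assumes Otsu's global threshold, a common choice for scanned documents. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a plain list of rows of 8-bit gray values.

```python
from typing import List


def otsu_threshold(gray: List[List[int]]) -> int:
    """Otsu's threshold for an 8-bit grayscale image (assumed method, not the patent's)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class with values <= t
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance; Otsu picks the threshold that maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Map dark ink to 1 (foreground) and light paper to 0 (background)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

A single global threshold can fail on unevenly lit photographs of documents, where a locally adaptive threshold is usually the safer assumption.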
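The filtering and screening steps can be pictured with a small sketch. The rules here are assumptions, not the patent's: a candidate is kept if its height is close to the median candidate height and its aspect ratio is moderate; a rejected candidate is screened back in if it lies on the same horizontal text line as, and close to, a kept candidate (for example, one half of a left-right structured character that the detector split in two). `Box`, `split_candidates`, `recover_from_noise`, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one detected character candidate."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def split_candidates(boxes: List[Box], size_tol: float = 0.5,
                     max_aspect: float = 2.0) -> Tuple[List[Box], List[Box]]:
    """Split a non-empty candidate set into (kept, noise) using hypothetical size/shape rules."""
    ref_h = median(b.h for b in boxes)  # typical character height on the page
    kept: List[Box] = []
    noise: List[Box] = []
    for b in boxes:
        size_ok = abs(b.h - ref_h) <= size_tol * ref_h
        aspect_ok = max(b.w, b.h) <= max_aspect * min(b.w, b.h)
        (kept if size_ok and aspect_ok else noise).append(b)
    return kept, noise


def recover_from_noise(kept: List[Box], noise: List[Box],
                       max_gap_ratio: float = 1.0) -> List[Box]:
    """Return kept boxes plus noise boxes lying on a kept box's line and close to it."""
    result = list(kept)
    for n in noise:
        for k in kept:
            # Assumes horizontal text: vertical centres within half a character height.
            same_line = abs((n.y + n.h / 2) - (k.y + k.h / 2)) <= k.h / 2
            # Horizontal gap between the two boxes (0 if they overlap).
            gap = max(n.x - (k.x + k.w), k.x - (n.x + n.w), 0)
            if same_line and gap <= max_gap_ratio * k.h:
                result.append(n)
                break
    return result
```

Keeping the rejected candidates in a separate noise set, rather than discarding them, is what makes the second screening pass possible.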
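Single word matching with cost-based selection can also be sketched under stated assumptions. Here each candidate character carries a hypothetical table of per-class recognition costs; a candidate word is a run of consecutive candidates, read left to right, aligned one-to-one with the keyword, and at most one keyword character may be left unmatched at a fixed penalty. The position of that unmatched character is where a secondary detection pass would look. The cost function, `Candidate`, `best_keyword_match`, and `miss_penalty` are illustrative assumptions, not the patent's definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    x: float                 # horizontal centre of the character box
    costs: Dict[str, float]  # hypothetical per-class recognition cost, 0.0 = perfect match


def char_cost(cand: Candidate, ch: str) -> float:
    # Classes the recognizer did not score are treated as a complete mismatch.
    return cand.costs.get(ch, 1.0)


def best_keyword_match(cands: List[Candidate], keyword: str,
                       miss_penalty: float = 0.8) -> Tuple[float, int, Optional[int]]:
    """Return (cost, start index, missed keyword position or None) of the cheapest match."""
    cands = sorted(cands, key=lambda c: c.x)  # reading order for horizontal text
    n = len(keyword)
    best: Tuple[float, int, Optional[int]] = (float("inf"), -1, None)
    for start in range(len(cands)):
        # Case 1: every keyword character is matched by a candidate.
        window = cands[start:start + n]
        if len(window) == n:
            cost = sum(char_cost(c, ch) for c, ch in zip(window, keyword))
            if cost < best[0]:
                best = (cost, start, None)
        # Case 2: one keyword character is missing from the candidate set.
        window = cands[start:start + n - 1]
        if n > 1 and len(window) == n - 1:
            for miss in range(n):
                rest = keyword[:miss] + keyword[miss + 1:]
                cost = miss_penalty + sum(char_cost(c, ch) for c, ch in zip(window, rest))
                if cost < best[0]:
                    best = (cost, start, miss)
    return best
```

For example, if the candidates read 发, 票, 码 in order with low costs for those classes, the keyword 发票号码 is best explained with 号 (position 2) missing, which points a second detection pass at the gap between 票 and 码.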

Description

Technical field

[0001] The invention belongs to the technical field of text image recognition, and in particular relates to a method and system for detecting Chinese keywords in document images based on single word matching.

Background art

[0002] Advances in science and technology have transformed information processing. To support editing, searching, and analyzing information, it is important to enter the text of paper documents into computers quickly, and OCR (Optical Character Recognition) technology emerged to meet this need. Document images are common in fields such as transportation, finance, logistics, taxation, and administration. With the rapid spread of smart terminals, automatic document recognition technology offers substantial economic benefits and broad social value.

[0003] However, it is difficult for general OCR technology to provide structured...

Application Information

IPC (8): G06K9/00; G06K9/34; G06K9/46
CPC: G06V30/40; G06V30/153; G06V10/44
Inventors: 王春恒 (Wang Chunheng), 贾馥溪 (Jia Fuxi), 赵晋媛 (Zhao Jinyuan), 肖柏华 (Xiao Baihua)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI