Chinese business card OCR (optical character recognition) data correction system utilizing massive associated information of knowledge base

A technology for data correction using associated information, applied in the fields of data cleaning and optical character recognition. It addresses problems such as focus blur, low illumination, and the limited effectiveness of error correction.

Inactive Publication Date: 2014-07-16
JIANGSU WEISHI TECH
Cites: 3; Cited by: 31

AI Technical Summary

Problems solved by technology

[0003] However, OCR technology itself is affected by many unfavorable factors, such as low illumination, low resolution, image noise, angular tilt, and focus blur, which result in a low final recognition rate.
There are basically two ways to correct OCR results. One is to start from the image itself and try to weaken or eliminate the influence of the adverse environment through image...



Examples


Detailed Description of Embodiments

[0060] The present invention will be further described below in conjunction with specific drawings and embodiments.

[0061] As shown in Figure 1, the business card OCR data correction system provided by the present invention consists of eight modules: an image acquisition module, an image standardization processing module, a block extraction module, an OCR module, a knowledge base module, a data correction module, an incremental maintenance module, and a result display module, where:

[0062] 1. Image acquisition module: its purpose is to input business card photos into a computer or smartphone as digital images through an acquisition device. For a data correction system built on a computer, the acquisition device consists of a video camera (or still camera) and an image capture card; a system built on a smartphone is simpler, requiring only the phone's built-in camera.

[0063] 2. The image standardization processing module uses a variety of image processing techno...
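The module list in [0061] can be pictured as a processing chain. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the stage function names, the `CardContext` and `KnowledgeBase` types, and the order in which stages run are all hypothetical, since the text only names the modules. The knowledge base is modeled as a shared object passed to every stage rather than as a step of its own, on the assumption that the data correction module reads from it and the incremental maintenance module writes to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for the knowledge base module (shared state)."""
    entries: List[str] = field(default_factory=list)


@dataclass
class CardContext:
    """Hypothetical per-card state handed from one module to the next."""
    image: bytes = b""
    fields: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Stage = Callable[[CardContext, KnowledgeBase], CardContext]


# Placeholder stages: each maps to a module named in [0061] but does no work.
def acquire_image(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # image acquisition module


def standardize_image(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # image standardization processing module


def extract_blocks(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # block extraction module


def run_ocr(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # OCR module


def correct_data(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # data correction module (assumed to read kb)


def display_result(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # result display module


def maintain_kb(ctx: CardContext, kb: KnowledgeBase) -> CardContext:
    return ctx  # incremental maintenance module (assumed to write kb)


# Assumed execution order; the source does not specify one.
PIPELINE: Sequence[Stage] = (
    acquire_image, standardize_image, extract_blocks, run_ocr,
    correct_data, display_result, maintain_kb,
)


def process_card(ctx: CardContext, kb: KnowledgeBase,
                 stages: Sequence[Stage] = PIPELINE) -> CardContext:
    """Run the stages in order and record each stage's name in ctx.trace."""
    for stage in stages:
        ctx = stage(ctx, kb)
        ctx.trace.append(stage.__name__)
    return ctx
```

Calling `process_card(CardContext(), KnowledgeBase())` runs the placeholder stages in order and returns a context whose `trace` lists them; a real system would replace each placeholder body with the corresponding image-processing, OCR, or correction logic.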



Abstract

The invention provides a Chinese business card OCR (optical character recognition) data correction system that utilizes massive associated information in a knowledge base. The system comprises an image collection module, an image standardization processing module, a block extraction module, an OCR module, a knowledge base module, a data correction module, an incremental maintenance module, and a result display module. The system is characterized in that the data to be corrected are labeled by structuring the recognition results of the OCR module; associated information such as addresses and organization names is corrected, to improve accuracy, by utilizing the massive associated information in the knowledge base module combined with a series of techniques including Chinese word segmentation, importance weighting based on the knowledge base, similarity comparison based on text and images, and information integration; and the corrected OCR results are output and displayed. In addition, the incremental maintenance module of the system maintains the knowledge base semi-automatically to keep up with the continually growing volume of information.
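The labeling and knowledge-base correction steps described in the abstract can be illustrated with a small Python sketch. Everything in it is an assumption rather than the patented method: the keyword markers used to label fields, the use of single characters instead of a real Chinese word segmenter, the IDF-style character weights standing in for knowledge-base importance weighting, the equal blend with `difflib.SequenceMatcher` as the text similarity, and the 0.6 threshold. Image-based similarity and information integration are not modeled.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical keyword markers for structuring OCR lines into fields
# (illustrative only; the text above does not list its rules).
ORG_MARKERS = ("公司", "大学", "研究院", "集团")
ADDR_MARKERS = ("省", "市", "区", "路", "号")


def label_field(line: str) -> str:
    """Tag a recognized line as 'org', 'address', or 'other'."""
    if any(m in line for m in ORG_MARKERS):
        return "org"
    if sum(m in line for m in ADDR_MARKERS) >= 2:
        return "address"
    return "other"


def char_weights(kb: list[str]) -> dict[str, float]:
    """IDF-style weight per character: rarer characters in the KB weigh more."""
    n = len(kb)
    df = Counter(ch for entry in kb for ch in set(entry))
    return {ch: math.log((n + 1) / (df[ch] + 1)) + 1.0 for ch in df}


def weighted_jaccard(a: str, b: str, w: dict[str, float]) -> float:
    """Weighted overlap of character sets; characters unseen in the KB weigh 1.0."""
    sa, sb = set(a), set(b)
    union = sum(w.get(ch, 1.0) for ch in sa | sb)
    inter = sum(w.get(ch, 1.0) for ch in sa & sb)
    return inter / union if union else 0.0


def correct(text: str, kb: list[str], w: dict[str, float],
            threshold: float = 0.6) -> str:
    """Return the best-scoring KB entry, or the input if no score reaches threshold."""
    best, best_score = text, 0.0
    for entry in kb:
        score = (0.5 * weighted_jaccard(text, entry, w)
                 + 0.5 * SequenceMatcher(None, text, entry).ratio())
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else text


if __name__ == "__main__":
    kb = ["复旦大学", "上海交通大学", "南京大学"]
    w = char_weights(kb)
    print(label_field("复口大学"))      # org
    print(correct("复口大学", kb, w))   # 复旦大学
```

In this sketch only lines labeled `org` or `address` would be passed to `correct`. Weighting by rarity means characters shared by many entries, such as 大 and 学, contribute less to a match than distinctive ones such as 复 and 旦, which is one plausible reading of importance weighting based on the knowledge base.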

Description

Technical field

[0001] The invention belongs to the technical fields of optical character recognition and data cleaning, and in particular relates to a Chinese business card OCR data correction system based on massive associated information in a knowledge base.

Background

[0002] With the development of optical scanning, computer image processing, pattern recognition and other technologies, OCR technology has gradually matured, and its successful application in many areas has brought convenience to people's work and life. Business card OCR is one of the most representative applications: the user only needs to transfer a photo of the business card to a computer or smartphone, recognize it with the corresponding OCR software, and then store the recognized text. This spares people the troubles of handling business cards the traditional way, such as the inconvenience of carrying busin...


Application Information

IPC(8): G06F17/30; G06K9/20
CPC: G06V30/40; G06V10/98
Inventors: 王晓平 (Wang Xiaoping), 肖仰华 (Xiao Yanghua), 汪卫 (Wang Wei)
Owner JIANGSU WEISHI TECH