Automatic character extraction and recognition system and method for low-resolution medical bill image

An automatic extraction technology for low-resolution images, applied in the fields of character and pattern recognition, instruments, and computer parts. It addresses problems such as contamination of text areas, reduced character recognition rates, and single-character recognition errors, and achieves the effect of improving the recognition rate.

Active Publication Date: 2016-06-08
HARBIN INST OF TECH
5 Cites, 85 Cited by

AI Technical Summary

Problems solved by technology

Most receipt images contain one or more stamps. If the stamps are not processed, the text area they cover may be contaminated, which reduces the character recognition rate.
[0018] (2) No layout analysis is performed on the bill image, so the semantics of each information area are unclear.
[0019] (3) The same information extraction method is use



Examples


Specific Embodiment 1

[0060] Specific Embodiment 1: In this embodiment, a Windows-based medical bill recognition system is developed to handle the large volume of bills in the medical industry. Its main functions are the input and recognition of medical bill images and the collection of image feature information.

[0061] According to the characteristics of low resolution and various types of interference of medical bill images, this embodiment designs a device including four modules: image preprocessing, field segmentation, single character segmentation, and character recognition, in which:

[0062] The image preprocessing module needs to do two things: reduce noise in the original receipt image, such as background shading, to improve the recognition rate of individual characters; and remove elements of the original receipt image that do not need to be recognized, such as seals, barcodes, and large areas of noise around the edge of the image. In this embodiment,...

Specific Embodiment 2

[0066] Specific Embodiment 2: This embodiment provides a method for automatic text extraction and recognition of low-resolution medical bill images. The overall processing flow is divided into the following four steps: preprocessing of bill images, field area recognition, character string segmentation, and character recognition and verification.
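The four-step flow can be pictured as a simple chain of stages. The following is only a minimal sketch, not the patent's implementation: the function name `run_pipeline`, its parameter names, and the idea of passing each stage in as a callable are all assumptions made for illustration, and the verification part of the last step is omitted.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    image: Any,
    preprocess: Callable[[Any], Any],
    find_fields: Callable[[Any], Dict[str, Any]],
    split_chars: Callable[[str, Any], List[Any]],
    recognize: Callable[[str, Any], str],
) -> Dict[str, str]:
    """Chain the four steps; every stage is a caller-supplied (hypothetical) callable."""
    # Step 1: preprocessing of the bill image.
    clean = preprocess(image)
    result: Dict[str, str] = {}
    # Step 2: field area recognition yields named regions.
    for field_name, region in find_fields(clean).items():
        # Step 3: split the field's character string into single characters.
        chars = split_chars(field_name, region)
        # Step 4: recognize each character; the field name is passed along so
        # the recognizer can use the field's semantics.
        result[field_name] = "".join(recognize(field_name, c) for c in chars)
    return result
```

Passing the field name into the last two stages reflects the point that segmentation and recognition can depend on what kind of information a field holds.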

[0067] Step 1. Preprocessing of bill image

[0068] General description of the implementation: In principle, elements of the original bill image that do not need to be recognized are handled by filling them with the background color of the bill image. Since the position of the noise along the edge of the original bill image is relatively fixed, this area can be filled with the background color to remove the noise. In the feasibility analysis stage, by analyzing the color parameters of the color pixels that make up the stamps and form lines, you can use the range rules of its color parameters to...
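The background-fill idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's procedure: the image is represented as nested lists of RGB tuples, the red-range thresholds in `is_stamp_red` are made-up values, and the border margin is a hypothetical parameter. The actual color range rules are not given in the text above.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
Image = List[List[Pixel]]     # rows of pixels


def is_stamp_red(p: Pixel, min_red: int = 150, max_other: int = 110) -> bool:
    """Hypothetical range rule: strong red channel, weak green and blue."""
    r, g, b = p
    return r >= min_red and g <= max_other and b <= max_other


def fill_border(img: Image, margin: int, background: Pixel) -> Image:
    """Overwrite a fixed-width band along all four edges with the background color."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if y < margin or y >= h - margin or x < margin or x >= w - margin:
                out[y][x] = background
    return out


def fill_stamp_pixels(img: Image, background: Pixel) -> Image:
    """Replace every pixel that matches the stamp color range with the background."""
    return [[background if is_stamp_red(p) else p for p in row] for row in img]
```

Dark text pixels that lie under a stamp do not match the red range in this sketch, so they are left in place while the surrounding stamp ink is filled.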

Specific Embodiment 3

[0149] Specific Embodiment 3: The bill image processed in this embodiment is the "Beijing Medical Outpatient Charge Bill", as shown in Figure 5.

[0150] In the specific implementation, images should be collected with a current mainstream flatbed scanner; a scanner with an automatic image cropping function, such as the Fujitsu fi-5220c high-speed scanner, is recommended. When scanning, keep the four sides of the bill as parallel as possible to the scanning frame of the scanner. The scanned bill image must have the following characteristics:

[0151] 1. It is a color image with a resolution above 200 dpi;

[0152] 2. The width of the image is greater than 1500 pixels, and the height is greater than 650 pixels (image sizes and coordinates in the following text are in pixels by default);

[0153] 3. The image storage format is one of 24-bit JPG format, tiff format, and 256-color bmp format;

[0154] 4. All bill faces in th...
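The three requirements that are fully stated above (resolution, size, and storage format) can be checked before an image enters the pipeline. The following is a minimal sketch: `ScanInfo`, `check_scan`, and the format spellings are hypothetical names chosen for illustration; treating exactly 200 dpi as acceptable is an assumption; bit depth is not checked; and the truncated fourth requirement is not covered.

```python
from dataclasses import dataclass
from typing import List

# Assumed spellings of the JPG, TIFF, and BMP formats named in the list above.
ALLOWED_FORMATS = {"jpg", "jpeg", "tif", "tiff", "bmp"}


@dataclass
class ScanInfo:
    """Hypothetical metadata record for one scanned bill image."""
    width: int        # pixels
    height: int       # pixels
    dpi: int
    is_color: bool
    file_format: str  # e.g. "jpg"


def check_scan(info: ScanInfo) -> List[str]:
    """Return a list of violated requirements (empty if the scan is acceptable)."""
    problems: List[str] = []
    if not info.is_color:
        problems.append("image must be a color image")
    if info.dpi < 200:  # assumption: 200 dpi itself is accepted
        problems.append("resolution must be at least 200 dpi")
    if info.width <= 1500:
        problems.append("width must be greater than 1500 pixels")
    if info.height <= 650:
        problems.append("height must be greater than 650 pixels")
    if info.file_format.lower() not in ALLOWED_FORMATS:
        problems.append("format must be JPG, TIFF, or BMP")
    return problems
```

Returning a list of problems rather than a single boolean lets an operator see every reason a scan was rejected in one pass.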


Abstract

The invention discloses an automatic character extraction and recognition system and method for low-resolution medical bill images. The system comprises an image preprocessing module, a field segmentation module, a single-character segmentation module, and a character recognition module. The method comprises the steps of image preprocessing, field area recognition, character string segmentation, and character recognition and verification. The system and method are better suited to automatic character extraction and recognition for low-resolution medical bill images. Performing layout analysis on a bill allows its information to be fully used. For images of low quality that are strongly affected by noise and limited resolution, the semantics of each field area make it easier to segment a character string into single characters, so that recognition of the image is converted into recognition of single characters. For example, an invoice number composed purely of digits can be recognized by a method dedicated to digit-only images: when the invoice number is recognized, the recognition range is limited to the ten digits 0 to 9, which can greatly increase the recognition rate.
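The invoice-number example amounts to restricting the candidate classes by field semantics before picking the best match. Below is a minimal sketch of that idea, assuming a classifier has already produced a score per candidate character; the field name `invoice_number`, the `FIELD_CHARSETS` table, and `recognize_char` are hypothetical names for illustration and do not come from the patent.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from field name to the characters allowed in that field.
FIELD_CHARSETS: Dict[str, Set[str]] = {
    "invoice_number": set("0123456789"),
}


def recognize_char(scores: Dict[str, float], field: str) -> str:
    """Pick the highest-scoring character among those the field allows.

    Fields without an entry in FIELD_CHARSETS are left unrestricted.
    """
    allowed: Optional[Set[str]] = FIELD_CHARSETS.get(field)
    candidates = {
        ch: s for ch, s in scores.items() if allowed is None or ch in allowed
    }
    if not candidates:
        raise ValueError(f"no allowed candidate for field {field!r}")
    return max(candidates, key=lambda ch: candidates[ch])
```

With scores such as `{"O": 0.48, "0": 0.45, "D": 0.07}`, the unrestricted choice would be the letter `O`, while the restricted choice for the invoice number field is the digit `0`.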

Description

Technical field

[0001] The invention relates to a system and method for automatically extracting and recognizing medical bill information.

Background art

[0002] Hospitals and community clinics hold large numbers of paper medical bills, which they use as statistical records for settling expenses. For a long time, however, a series of problems caused by outdated management of medical bills has troubled hospital managers. In processing medical bill information, the vast majority of hospitals and almost all community outpatient clinics are still at the stage of "manual decentralized processing + paper-based warehouse storage + manual query and update", which has become a major obstacle to the development of informatization in the medical industry. Therefore, in order to solve this weak link, using a "centralized, unified, efficient and standa...


Application Information

IPC(8): G06K9/00, G06K9/20, G06K9/40
CPC: G06V30/412, G06V10/22, G06V10/30
Inventor: 苏统华, 涂志莹, 周圣杰, 曹源江, 周靖淳, 周韬宇, 孙黎
Owner HARBIN INST OF TECH