
A system and method for automatic text extraction and recognition of low-resolution medical bill images

A low-resolution automatic extraction technology, applied in the fields of character and pattern recognition, instruments, and computing, that addresses stamp contamination of text areas and low character recognition accuracy, and thereby improves the recognition rate.

Active Publication Date: 2019-03-01
HARBIN INST OF TECH
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

(1) Most receipt images contain one or more stamps. If the stamps are not processed, they may contaminate the text areas they cover, reducing the character recognition rate.
[0018] (2) No layout analysis is performed on the bill image, so the semantics of each information area are unclear.
[0019] (3) The same extraction method is applied to every information area, so the semantic constraints of each area are not used effectively and character recognition accuracy is low.
[0020] (4) During single-character recognition, the semantic information of the field containing the character is not fully used, which may lead to a high single-character error rate.



Examples


Specific Embodiment 1

[0060] Specific Embodiment 1: In this embodiment, a Windows-based medical bill recognition system is developed to handle the large volume of bills in the medical industry. Its main functions are the input and recognition of medical bill images and the collection of image feature information.

[0061] Because medical bill images have low resolution and suffer from various types of interference, this embodiment designs a device comprising four modules: image preprocessing, field segmentation, single-character segmentation, and character recognition, in which:

[0062] The image preprocessing module must do two things: reduce noise in the original receipt image, such as background shading, to improve the recognition rate of individual characters; and remove elements of the original receipt image that do not need to be recognized, such as seals, barcodes, borders, and large areas of noise around the edge of the image. In this embodiment,...

Specific Embodiment 2

[0066] Specific Embodiment 2: This embodiment provides a method for automatic text extraction and recognition of low-resolution medical bill images. The overall processing flow is divided into four steps: bill image preprocessing, field area recognition, character string segmentation, and character recognition and verification.

[0067] Step 1. Preprocessing of bill image

[0068] General description of the implementation: In principle, elements of the original bill image that do not need to be recognized are handled by filling them with the background color of the bill image. Because the position of the noise along the edge of the original bill image is relatively fixed, that area can be filled with the background color to remove the noise. In the feasibility analysis stage, by analyzing the color parameters of the color pixels that make up the stamps and form lines, the range rules of those color parameters can be used to...
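The background-fill idea in this step can be illustrated with a minimal sketch. It is not the patented implementation: the red-dominant color rule, the threshold values, the fixed edge margin, and the names `is_stamp_color` and `fill_noise` are all assumptions made for illustration, and the image is modeled as a plain list of rows of (R, G, B) tuples.

```python
# Illustrative sketch only. Thresholds, margin, and function names are
# assumptions for this example, not values taken from the patent.

def is_stamp_color(pixel, min_red=150, max_other=110):
    """Assumed range rule: a pixel counts as 'stamp' when it is strongly red."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other

def fill_noise(image, background, margin=1):
    """Return a copy of `image` with edge pixels and stamp-colored pixels
    replaced by the background color.

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    `margin` is the width, in pixels, of the fixed edge band to blank out.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = []
    for y, row in enumerate(image):
        new_row = []
        for x, pixel in enumerate(row):
            on_edge = (y < margin or y >= height - margin
                       or x < margin or x >= width - margin)
            if on_edge or is_stamp_color(pixel):
                new_row.append(background)  # fill with background color
            else:
                new_row.append(pixel)       # keep text and other content
        cleaned.append(new_row)
    return cleaned
```

In a real system the color range would come from the feasibility analysis the paragraph mentions, and the edge band would be sized to the known noise position on the bill rather than a fixed one-pixel margin.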

Specific Embodiment 3

[0149] Specific Embodiment 3: The bill image processed in this embodiment is the "Beijing Medical Outpatient Charge Bill", as shown in Figure 5.

[0150] In the specific implementation, images must be collected with a current mainstream flatbed scanner; a scanner with an automatic image cropping function, such as the Fujitsu fi-5220c high-speed scanner, is recommended. When scanning, keep the four sides of the bill as parallel as possible to the scanning frame of the scanner. The receipt image generated by scanning needs to have the following characteristics:

[0151] 1. The image is a color image with a resolution above 200 dpi;

[0152] 2. The width of the image is greater than 1500 pixels, and the height is greater than 650 pixels (by default, image sizes and coordinates in the following text are given in pixels);

[0153] 3. The image storage format is one of 24-bit JPG, TIFF, or 256-color BMP;

[0154] 4. All bill faces in th...
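The first three requirements above lend themselves to a simple pre-check before an image enters the pipeline. The sketch below is an assumption-laden illustration: the function name `check_scan` and its parameters are hypothetical, "above 200 dpi" is read as "at least 200 dpi", bit depth is not verified, and the truncated fourth requirement is not covered.

```python
# Illustrative sketch only. `check_scan` and its parameters are hypothetical;
# "above 200 dpi" is interpreted here as dpi >= 200, which is an assumption.

ALLOWED_FORMATS = {"jpg", "jpeg", "tif", "tiff", "bmp"}

def check_scan(width, height, dpi, file_format, is_color):
    """Return a list of human-readable problems; an empty list means the
    scan meets the three requirements checked here."""
    problems = []
    if not is_color:
        problems.append("image must be a color image")
    if dpi < 200:
        problems.append("resolution must be at least 200 dpi")
    if width <= 1500:
        problems.append("width must be greater than 1500 pixels")
    if height <= 650:
        problems.append("height must be greater than 650 pixels")
    if file_format.lower().lstrip(".") not in ALLOWED_FORMATS:
        problems.append("format must be JPG, TIFF, or BMP")
    return problems
```

For example, `check_scan(1600, 700, 300, "JPG", True)` returns an empty list, while a 1500-pixel-wide PNG scanned at 150 dpi is reported with three problems.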



Abstract

The invention discloses a system and method for automatic text extraction and recognition of low-resolution medical bill images. The system includes four modules: an image preprocessing module, a field segmentation module, a single-character segmentation module, and a character recognition module. The method includes four steps: image preprocessing, field area recognition, character string segmentation, and character recognition and verification. The invention is well suited to automatic text extraction and recognition of low-resolution medical bill images. Analyzing the layout of the bills allows the layout information to be fully used. For low-quality images, where noise and image resolution have a large impact, the semantics of each field area help divide a string into individual characters, reducing the task to single-character recognition. For example, an invoice number composed purely of digits can be handled with a method designed specifically for digit-only images; restricting the recognition range to the ten digits 0 to 9 can greatly improve the recognition rate.
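The effect of restricting the recognition range by field can be shown with a small sketch. It assumes a classifier that returns a score per candidate character; the field name `invoice_number`, the `FIELD_ALPHABETS` table, and the function `recognize_char` are hypothetical names introduced for this example and do not come from the patent.

```python
# Illustrative sketch only. The field name, the alphabet table, and the
# score dictionary format are assumptions made for this example.
import string

# Hypothetical mapping from a field's semantics to its allowed characters.
FIELD_ALPHABETS = {
    "invoice_number": set(string.digits),  # digits 0 to 9 only
}

def recognize_char(scores, field):
    """Pick the best-scoring character that the field's alphabet allows.

    `scores` maps candidate characters to classifier scores. Fields with no
    entry in FIELD_ALPHABETS are left unconstrained.
    """
    allowed = FIELD_ALPHABETS.get(field)
    candidates = {
        ch: score for ch, score in scores.items()
        if allowed is None or ch in allowed
    }
    if not candidates:
        raise ValueError("no candidate character is valid for this field")
    return max(candidates, key=candidates.get)

# A blurry glyph that an unconstrained classifier would read as the letter O.
scores = {"O": 0.48, "0": 0.45, "D": 0.07}
unconstrained = recognize_char(scores, "free_text")
constrained = recognize_char(scores, "invoice_number")
```

Here the unconstrained call picks the letter "O", while the call for the digit-only field picks "0", because the visually similar letters are excluded before the best score is chosen.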

Description

Technical field

[0001] The invention relates to a system and method for automatically extracting and identifying medical bill information.

Background art

[0002] There are a large number of paper medical bills in hospitals and community clinics, and these bills are the statistical information that hospitals and community clinics use to settle expenses. However, for a long time, hospital managers have been plagued by a series of problems caused by outdated management of medical bills in hospitals and community outpatient clinics. In processing medical bill information, the vast majority of hospitals and almost all community outpatient clinics are still at the stage of "manual decentralized processing + paper-based warehouse storage + manual query update", which has become a major obstacle to the development of informatization in the medical industry. Therefore, in order to solve this weak link, using a "centralized, unified, efficient and standa...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/20, G06K9/40
CPC: G06V30/412, G06V10/22, G06V10/30
Inventor: 苏统华, 涂志莹, 周圣杰, 曹源江, 周靖淳, 周韬宇, 孙黎
Owner: HARBIN INST OF TECH