Decision tree model training method, and method and apparatus for determining data attributes in OCR result

A data attribute and decision tree technology applied in the medical field. It addresses problems such as low efficiency and time-consuming, labor-intensive manual work, and achieves improved recognition efficiency, reduced cost, and no need for manual data attribute labeling.

Active Publication Date: 2017-10-20
天方创新(北京)信息技术有限公司

AI Technical Summary

Problems solved by technology

However, after the medical data picture to be recognized has been processed by the optical character recognition algorithm, manual participation is still needed for further labeling opera...


Examples

Example 1

[0063] Example 1: Suppose that, in this embodiment, the prediction accuracy of the decision tree model generated in the above embodiment is determined from the acquired second label data to be 98%. This means the model meets the requirements, so the OCR recognition results of medical data pictures can be annotated using the decision tree model.

Example 2

[0064] Example 2: Suppose that, in this embodiment, the prediction accuracy of the decision tree model generated in the above embodiment is determined from the obtained second label data to be 46%. This means the model does not meet the requirements; that is, the prediction results contain many error message texts (bad cases). The decision tree model therefore needs to be optimized to improve its prediction accuracy.
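The acceptance check described in Examples 1 and 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the 90% threshold are assumptions, since the source only states that 98% is acceptable and 46% is not.

```python
def prediction_accuracy(predicted, labeled):
    """Fraction of items whose predicted attribute matches the second label data."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    if not labeled:
        raise ValueError("no labeled data to verify against")
    correct = sum(p == y for p, y in zip(predicted, labeled))
    return correct / len(labeled)


# Hypothetical acceptance threshold; the source gives only the two
# example accuracies (98% accepted, 46% rejected), not a cut-off.
ACCURACY_THRESHOLD = 0.90


def meets_requirements(predicted, labeled, threshold=ACCURACY_THRESHOLD):
    """True if the model can be used for annotation, False if it needs optimizing."""
    return prediction_accuracy(predicted, labeled) >= threshold


# Made-up attribute names, for illustration only.
predicted = ["drug", "dose", "date", "dose"]
labeled = ["drug", "dose", "date", "unit"]
accuracy = prediction_accuracy(predicted, labeled)
accepted = meets_requirements(predicted, labeled)
```

Here three of the four predictions match, so the model falls below the assumed threshold and would be sent back for optimization.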

[0065] The decision tree model can be optimized through the following steps: re-extract new first feature information from the OCR result of the test medical data picture, and retrain the decision tree model. Preferably, the error message texts can be obtained from the verification result, and new first feature information can be re-extracted from those texts, so that the decision tree model can be retr...
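One way to picture the bad-case step is the sketch below. It assumes the verification result is available as parallel lists of texts, predicted attributes, and labeled attributes; `extract_features`, its two features, and the sample strings are hypothetical, as the source does not say which features are extracted.

```python
def extract_features(text):
    """Hypothetical feature extractor; the source does not list the real features."""
    digits = sum(c.isdigit() for c in text)
    return {"length": len(text), "digit_ratio": digits / max(len(text), 1)}


def collect_bad_cases(texts, predicted, labeled):
    """Return (text, predicted, expected) for every wrongly predicted item."""
    return [(t, p, y) for t, p, y in zip(texts, predicted, labeled) if p != y]


def augment_training_rows(train_rows, bad_cases):
    """Re-extract features from the bad cases and append them with their correct labels."""
    extra = [(extract_features(text), expected) for text, _, expected in bad_cases]
    return train_rows + extra


# Made-up verification data: the second cell was mislabeled by the model.
texts = ["Aspirin", "1OO mg", "2017-01-05"]
predicted = ["drug", "drug", "date"]
labeled = ["drug", "dose", "date"]

bad_cases = collect_bad_cases(texts, predicted, labeled)
train_rows = [(extract_features("Ibuprofen"), "drug")]
new_rows = augment_training_rows(train_rows, bad_cases)
```

The enlarged `new_rows` list would then be passed to whatever training routine produced the original model.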

Abstract

The invention discloses a decision tree model training method, and a method and an apparatus for determining data attributes in an OCR result. The decision tree model training method comprises the steps of obtaining a sample medical data picture, and performing OCR on the sample medical data picture to generate a first OCR result, wherein the first OCR result is a two-dimensional character string array, and each column of data in the two-dimensional character string array is used for indicating data which belongs to a same attribute column; extracting first feature information of each piece of data in the first OCR result; obtaining first labeled data corresponding to each piece of the data in the first OCR result, wherein the first labeled data is used for indicating an attribute which each piece of the data belongs to; and performing training according to the first feature information and the first labeled data to generate a decision tree model used for determining the data attributes in the OCR result. According to the method, the purpose of automatically labeling the data attributes in the recognition result is achieved; the consumption cost in a to-be-recognized picture recognition process is effectively reduced; and the recognition efficiency is improved.
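The training steps in the abstract can be sketched end to end as below. This is an illustration under stated assumptions: the per-cell features (column index, length, digit ratio), the attribute names, and the sample data are all hypothetical, and a small pure-Python CART-style tree with Gini impurity stands in for whichever decision tree algorithm the invention actually uses.

```python
from collections import Counter


def extract_features(text, col):
    """Hypothetical first feature information for one cell of the OCR result."""
    digits = sum(c.isdigit() for c in text)
    return {"col": col, "length": len(text), "digit_ratio": digits / max(len(text), 1)}


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for name in rows[0][0]:
        for t in sorted({f[name] for f, _ in rows}):
            left = [y for f, y in rows if f[name] <= t]
            right = [y for f, y in rows if f[name] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, t)
    return best


def train_tree(rows, depth=0, max_depth=5):
    """Build a tree: leaves are attribute names, inner nodes are 4-tuples."""
    labels = [y for _, y in rows]
    split = best_split(rows) if depth < max_depth and len(set(labels)) > 1 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority attribute
    _, name, t = split
    left = [r for r in rows if r[0][name] <= t]
    right = [r for r in rows if r[0][name] > t]
    return (name, t,
            train_tree(left, depth + 1, max_depth),
            train_tree(right, depth + 1, max_depth))


def predict(tree, features):
    """Walk the tree until a leaf (an attribute name) is reached."""
    while isinstance(tree, tuple):
        name, t, left, right = tree
        tree = left if features[name] <= t else right
    return tree


# First OCR result: a two-dimensional string array, one attribute per column.
ocr_result = [["Aspirin", "100", "2017-01-05"],
              ["Ibuprofen", "200", "2017-02-11"]]
# First labeled data: the attribute each cell belongs to (made-up names).
first_labels = [["drug", "dose", "date"],
                ["drug", "dose", "date"]]

rows = [(extract_features(cell, j), first_labels[i][j])
        for i, row in enumerate(ocr_result)
        for j, cell in enumerate(row)]
tree = train_tree(rows)
new_cell_attribute = predict(tree, extract_features("Paracetamol", 0))
```

Once trained, `predict` can be applied to each cell of a new OCR result to label its attribute automatically, which is the goal the abstract describes.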

Description

Technical field

[0001] The present invention relates to the medical field, in particular to a method and device for training a decision tree model for determining data attributes in OCR recognition results, and a method and device for determining data attributes in OCR recognition results.

Background

[0002] Currently, the text in a picture can be recognized by an optical character recognition (OCR) algorithm. OCR refers to the process of recognizing optical characters in pictures through image processing and pattern recognition technology and translating them into computer text.

[0003] In the related technology, after the medical data picture to be recognized is recognized by the optical character recognition algorithm, the recognition result can be provided to the user, wherein the recognition result of the OCR algorithm for the medical data picture is a two-dimens...

Application Information

IPC(8): G06K9/20, G06K9/62
CPC: G06V10/22, G06F18/214
Inventor 周列淳岳智磊刘泓江岩
Owner 天方创新(北京)信息技术有限公司