OCR (Optical Character Recognition) result classification method based on knowledge graph

A method for classifying OCR recognition results based on a knowledge graph. It addresses the problems of inconvenient retrieval, coarse-grained classification, and ignored hierarchical relationships between categories, and it is applicable to a wide range of uses.

Pending Publication Date: 2021-08-06
XIDIAN UNIV
Cites: 8 | Cited by: 6

AI Technical Summary

Problems solved by technology

Two existing schemes automate the classification of OCR recognition results to some extent, but each has a shortcoming. The first ignores the hierarchical relationship between categories, so its classification is not fine-grained.
The second extracts and stores only the category information of the text and ignores the other information in it, which makes text retrieval inconvenient.

Examples

Embodiment Construction

[0045] The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

[0046] As shown in Figure 1, the method for classifying OCR recognition results based on a knowledge graph provided by the present invention includes the following steps:

[0047] S1. Construct the ontology of the OCR recognition result knowledge graph. As shown in Figure 2, this specifically includes the following steps:

[0048] S11. Specify the domain of the target text, and collect the classification information of texts in that domain, including the name of each category, the attribute names of the text, and so on.

[0049] S12. Define the concepts in the ontology, the hierarchy between the concepts, and the attributes of the concepts. A schematic diagram of the ontology structure after the attributes and concepts are defined is shown in Figure 5. In Figure 5, A is the field of text classification, B1, B2, B3, B4, ...
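Steps S11 and S12 can be pictured as building a small tree of concepts, each carrying its own attribute names. The following is only a minimal sketch under stated assumptions: the `Concept` class, the `category_paths` helper, the third-level names `C1` and `C2`, and every attribute name are hypothetical illustrations, not taken from the patent. Only the labels `A`, `B1`, and `B2` echo the description of Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology concept: a category name, its attribute names, its sub-concepts."""
    name: str
    attributes: list[str] = field(default_factory=list)
    children: list["Concept"] = field(default_factory=list)

    def add_child(self, child: "Concept") -> "Concept":
        # Record the hierarchy level: `child` becomes a sub-category of this concept.
        self.children.append(child)
        return child


def category_paths(concept: Concept, prefix: tuple = ()) -> list[tuple]:
    """Return every root-to-leaf path of concept names in the ontology tree."""
    path = prefix + (concept.name,)
    if not concept.children:
        return [path]
    paths = []
    for child in concept.children:
        paths.extend(category_paths(child, path))
    return paths


# Hypothetical ontology: A is the text domain, B1 and B2 are first-level
# categories, C1 and C2 are invented second-level categories under B1.
root = Concept("A", attributes=["title", "date"])
b1 = root.add_child(Concept("B1", attributes=["issuer"]))
b2 = root.add_child(Concept("B2"))
b1.add_child(Concept("C1", attributes=["amount"]))
b1.add_child(Concept("C2"))

paths = category_paths(root)
```

Each path returned by `category_paths` is one multi-level category label that a classifier built on this ontology could assign to a text.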

Abstract

The invention discloses an OCR (Optical Character Recognition) result classification method based on a knowledge graph. The method comprises the following steps: constructing an ontology of an OCR result knowledge graph; constructing a text classification model and a named entity extraction model to form a classifier; and constructing a knowledge graph based on the OCR recognition result and the classifier constructed in step S2. The ontology is constructed from the text classification information of a specific field, and the text classifier is constructed based on that ontology. The classifier extracts the category and key information of the OCR software's recognition result, and a text knowledge graph is built from them, which achieves automatic multi-level classification and key information extraction for OCR recognition results. The method thereby solves two problems of existing similar technology: the hierarchical relationship between categories is ignored and classification is not fine-grained; and only the category information of the text is extracted and stored while the other information in the text is ignored, which makes text retrieval inconvenient.
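The flow summarized in the abstract (classify the recognized text level by level, extract key information, then store both as graph facts) can be sketched roughly as follows. This is an illustration under loud assumptions, not the patented implementation: a keyword counter stands in for the trained text classification model, regular expressions stand in for the named entity extraction model, and the ontology contents, relation names (`belongs_to`, `subclass_of`), and sample text are all hypothetical.

```python
import re

# Hypothetical two-level ontology: parent category -> child category -> keywords.
# The keywords are a stand-in for a trained text classification model.
ONTOLOGY = {
    "finance": {"invoice": ["invoice", "amount due"], "receipt": ["receipt", "paid"]},
    "personnel": {"contract": ["contract", "employee"], "leave_form": ["leave", "absence"]},
}

# Attribute name -> pattern; a stand-in for a named entity extraction model.
ATTRIBUTE_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "amount": re.compile(r"\d+\.\d{2}"),
}


def keyword_score(text, keywords):
    """Count how often any of the keywords occurs in the text."""
    return sum(text.count(k) for k in keywords)


def classify(text):
    """Pick the best first-level category, then the best child inside it."""
    lowered = text.lower()
    parent = max(
        ONTOLOGY,
        key=lambda p: sum(keyword_score(lowered, kws) for kws in ONTOLOGY[p].values()),
    )
    child = max(ONTOLOGY[parent], key=lambda c: keyword_score(lowered, ONTOLOGY[parent][c]))
    return parent, child


def extract_attributes(text):
    """Return the first match of each attribute pattern found in the text."""
    found = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group()
    return found


def build_triples(doc_id, text):
    """Turn one OCR result into (subject, relation, object) knowledge-graph triples."""
    parent, child = classify(text)
    triples = [(doc_id, "belongs_to", child), (child, "subclass_of", parent)]
    for name, value in extract_attributes(text).items():
        triples.append((doc_id, name, value))
    return triples


ocr_text = "Invoice No. 17, issued 2021-03-05, amount due 1280.50"
triples = build_triples("doc-1", ocr_text)
```

Because the category hierarchy and the extracted attributes are both stored as triples, a document can later be retrieved either by any level of its category or by a key value such as its date, which is the retrieval benefit the abstract describes.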

Description

Technical field

[0001] The invention belongs to the technical field of image OCR (Optical Character Recognition), and specifically relates to a method for classifying OCR recognition results based on a knowledge graph.

Background art

[0002] OCR (Optical Character Recognition) technology refers to the use of electronic equipment (such as scanners or digital cameras) to convert paper documents into black-and-white dot-matrix image files, determine the shapes of the characters in the image by detecting dark and bright patterns, and then use character recognition methods to translate the shapes into computer text for further editing and processing by word processing software. OCR software refers to software that uses OCR technology to digitize paper documents, and it is widely used in various fields of production and daily life.

[0003] Units and institutions that require the use of OCR technology often have the characteristi...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/36, G06F40/295, G06N5/02
CPC: G06F16/35, G06F16/367, G06F40/295, G06N5/022
Inventor: 李向宁, 覃书农, 蔡宇旗, 廖永平, 肖凌峰, 李铭清, 赵君
Owner: XIDIAN UNIV