Document determination system, document determination device, and document determination method

The document determination system improves OCR accuracy by using word vectors together with positional information to identify document types, addressing the difficulty of telling forms apart when their content varies.

JP7876366B2 · Active · Publication Date: 2026-06-19 · NTT DATA GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: NTT DATA GROUP CORP
Filing Date: 2022-07-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing OCR technologies struggle to identify the type of a document from its character information alone. Content varies significantly between the predefined items of a format and the entries written by users, which makes it difficult to distinguish between different types of forms.

Method used

The document determination system combines character recognition, word vectors, and positional information to identify document types. It calculates the probability that each word vector occurs in predefined areas of a document and compares these probabilities against predefined document information, which improves OCR accuracy.
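The comparison step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `FORM_INFO` table, the region names, the three-dimensional vectors, and the use of cosine similarity as a stand-in for an occurrence probability are not taken from the patent, which does not state its scoring formula here.

```python
import math

# Hypothetical document information: for each form type, each named region
# lists reference word vectors expected there. All values are made up.
FORM_INFO = {
    "invoice": {
        "header": [[1.0, 0.0, 0.0]],
        "body": [[0.0, 1.0, 0.0]],
    },
    "application": {
        "header": [[0.0, 0.0, 1.0]],
        "body": [[0.0, 1.0, 0.0]],
    },
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def occurrence_prob(vec, refs):
    """Assumed stand-in for an occurrence probability: the best cosine
    similarity against the region's references, rescaled from [-1, 1] to [0, 1]."""
    if not refs:
        return 0.0
    best = max(cosine(vec, r) for r in refs)
    return (best + 1.0) / 2.0


def score_forms(observations, form_info, eps=1e-6):
    """observations: list of (region_name, word_vector) pairs.
    Returns {form_type: score}, with scores normalized to sum to 1."""
    log_scores = {}
    for form_type, regions in form_info.items():
        total = 0.0
        for region, vec in observations:
            p = occurrence_prob(vec, regions.get(region, []))
            total += math.log(max(p, eps))  # eps avoids log(0)
        log_scores[form_type] = total
    # Numerically stable softmax over the summed log probabilities.
    m = max(log_scores.values())
    exp = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def identify(observations, form_info=FORM_INFO):
    """Return the best-scoring form type and the full score table."""
    scores = score_forms(observations, form_info)
    return max(scores, key=scores.get), scores
```

Calling `identify([("header", [0.9, 0.1, 0.0]), ("body", [0.0, 1.0, 0.0])])` selects the form type whose header region best matches the observed vector. Summing log probabilities treats the words as independent, which is a simplification chosen for the sketch rather than something the source states.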

Benefits of technology

The system identifies document types by recognizing words and converting them into word vectors, which improves OCR accuracy by distinguishing between different document formats and user-generated content.

Generated by Eureka AI based on patent content.

Abstract

PROBLEM TO BE SOLVED: To provide a form determination system, a form determination apparatus, and a form determination method configured to properly identify the type of a form.

SOLUTION: A form determination system SYS includes: a character recognition processing unit 102 which recognizes characters written on a form to be subjected to character recognition, and extracts character information and position information of the characters written on the form; a character information analysis unit 103 which extracts a word from the character information extracted by the character recognition processing unit 102 and converts the extracted word into a word vector; and a form identifying unit 104 which identifies the type of the form to be subjected to character recognition, on the basis of form information in which information on the word vectors included in each region of the form is defined for each type of form, and on the basis of the word vector and the position information of the word extracted from the character information by the character information analysis unit 103.

SELECTED DRAWING: Figure 1
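As a rough illustration of what the character recognition processing unit 102 might hand to the character information analysis unit 103, the following Python sketch turns recognized text with positions into (region, word, vector) triples. The `OcrToken` type, the `REGIONS` layout, the `EMBEDDINGS` lookup table, and the whitespace-based word extraction are all hypothetical. A real system would presumably use a trained embedding model and, for Japanese forms, a morphological analyzer instead of splitting on spaces.

```python
from dataclasses import dataclass


@dataclass
class OcrToken:
    """Hypothetical OCR output: recognized text plus a normalized position."""
    text: str
    x: float  # horizontal centre, 0.0 (left) to 1.0 (right)
    y: float  # vertical centre, 0.0 (top) to 1.0 (bottom)


# Hypothetical page regions as (name, y_min, y_max) bands.
REGIONS = [("header", 0.0, 0.2), ("body", 0.2, 0.9), ("footer", 0.9, 1.0)]

# Hypothetical toy embedding table standing in for a trained model.
EMBEDDINGS = {
    "invoice": [1.0, 0.0, 0.0],
    "amount": [0.0, 1.0, 0.0],
    "applicant": [0.0, 0.0, 1.0],
}


def region_of(token):
    """Return the name of the band containing the token, or None."""
    for name, lo, hi in REGIONS:
        # The last band also includes its lower edge at y == 1.0.
        if lo <= token.y < hi or (hi == 1.0 and token.y == 1.0):
            return name
    return None


def extract_words(text):
    """Naive word extraction: split on whitespace, trim punctuation, lowercase."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return [w for w in words if w]


def analyze(tokens):
    """Convert OCR tokens into (region, word, vector) triples.
    Words without an entry in EMBEDDINGS are skipped."""
    out = []
    for token in tokens:
        region = region_of(token)
        if region is None:
            continue
        for word in extract_words(token.text):
            vec = EMBEDDINGS.get(word)
            if vec is not None:
                out.append((region, word, vec))
    return out
```

For example, `analyze([OcrToken("Invoice No.", 0.5, 0.05)])` yields one triple for the word `invoice` in the `header` region. Triples of this kind pair each word vector with its position, which is the input the form identifying unit 104 is described as using.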