Document image classification method and device

A document image classification method and device, applied in the field of computer vision. It addresses the difficulty of obtaining samples of every document style, documents that cannot be used for training, and the complexity of collection methods. Its effects are improved classification accuracy, easy extension to new document types, and ease of promotion and use.

Active Publication Date: 2019-10-01
BEIJING YIDAO BOSHI TECH

AI Technical Summary

Problems solved by technology

[0004] 1. Many styles are difficult to obtain: there are too many types of document images, and document types differ from field to field. It is impossible to collect all of them for training, and some types are added later and cannot be obtained in advance. Some documents are confidential and cannot be used for training without desensitization.
[0005] 2. Complicated collection methods: With the popularization of collection devices such as mobile phones, tablets, cameras, scanners

Example Embodiment

[0060] The exemplary embodiments will be described in detail here, and examples thereof are shown in the accompanying drawings. When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present disclosure. Rather, they are only examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

[0061] The terms “first”, “second”, etc. in the specification and claims of the present disclosure are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present disclosure described herei...

Abstract

The invention discloses a document image classification method and device, and belongs to the field of computer vision. The method comprises: training a text feature vector extraction model and an image feature vector extraction model separately; extracting a fusion feature vector of a document image as an embedded feature in which the text feature vector and the image feature vector are fused; and classifying the document image based on the similarity of fusion feature vectors. With this method, various document images can be registered and classified quickly, the business process is greatly simplified, and the OCR API is simplified: all documents can be recognized through a single API, so that a one-time integration can be used permanently.
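The register-then-classify flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the two trained extraction models are replaced by feature vectors supplied directly by the caller, and the fusion rule (concatenating L2-normalized vectors), the cosine similarity measure, the rejection threshold of 0.8, and the names `fuse` and `DocumentRegistry` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text_vec: Vector, image_vec: Vector) -> Vector:
    """Assumed fusion rule: concatenate the normalized text and image
    feature vectors, then normalize the result again."""
    return l2_normalize(l2_normalize(text_vec) + l2_normalize(image_vec))


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


class DocumentRegistry:
    """Hypothetical registry: one fused template vector per document type."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed rejection threshold
        self.templates: Dict[str, Vector] = {}

    def register(self, label: str, text_vec: Vector, image_vec: Vector) -> None:
        # Registering a new document type needs no retraining in this sketch.
        self.templates[label] = fuse(text_vec, image_vec)

    def classify(
        self, text_vec: Vector, image_vec: Vector
    ) -> Tuple[Optional[str], float]:
        """Return the most similar registered type and its score, or
        (None, score) when no template reaches the threshold."""
        query = fuse(text_vec, image_vec)
        best_label: Optional[str] = None
        best_score = -1.0
        for label, template in self.templates.items():
            score = cosine(query, template)
            if score > best_score:
                best_label, best_score = label, score
        if best_score < self.threshold:
            return None, best_score
        return best_label, best_score


if __name__ == "__main__":
    registry = DocumentRegistry()
    registry.register("invoice", [1.0, 0.0], [0.0, 1.0])
    registry.register("id_card", [0.0, 1.0], [1.0, 0.0])
    print(registry.classify([0.9, 0.1], [0.1, 0.9]))
```

In this sketch a new document type is added by storing one more template vector, which is one way to read the abstract's claim that document images can be quickly registered and classified.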

Description

Technical field

[0001] The invention relates to the field of computer vision, in particular to a document image classification method and device.

Background

[0002] In all walks of life, there are still many paper documents that need to be saved, processed, retrieved, etc., especially in the financial field such as banking, securities, insurance, mutual funds, finance, taxation and other industries. In the past, the digitization of these paper documents was generally done by manual entry. With the continuous popularization of OCR technology, many industries have gradually adopted OCR recognition technology instead of manual entry, which has greatly improved work efficiency. But at present, the prerequisite for good OCR recognition and structuring is that the category of the document needs to be clearly known, otherwise it is difficult to have a good structured result. In addition, in many occasions such as bank counters, the current application is that the user must ...

Application Information

IPC(8): G06K9/20, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V10/22, G06N3/045, G06F18/241
Inventor: 朱军民, 王勇, 康铁钢
Owner: BEIJING YIDAO BOSHI TECH