Document classification method and device and electronic equipment

A document classification technology, applied in the field of data processing, that addresses the poor generality and unsatisfactory results caused by rules that do not generalize and by insufficient labeled data. It improves generality and generalization ability and achieves good classification results.

Pending Publication Date: 2021-10-26
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

Text classification technology is generally based on hand-written rules or on learning from a small amount of labeled data. Although these methods can solve some document analysis and extraction tasks, their generality and effectiveness are often unsatisfactory because the rules do not generalize and the amount of labeled data is insufficient.
Currently, there is a lack of general-purpose document classification methods.



Examples


Embodiment Construction

[0121] Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0122] Traditional document analysis and extraction techniques are mostly based on hand-written rules or on learning from a small amount of labeled data. Although these traditional methods can solve some document analysis and extraction tasks, due to the non-generalization of the hand-written rules and the insufficient amount of labeled data,...


Abstract

The invention discloses a document classification method, a device and electronic equipment, and relates to the technical field of data processing. In a specific implementation, the method comprises the following steps: acquiring image blocks and word blocks from a document to be identified; inputting the image blocks and the word blocks into a pre-trained transfer model to obtain a visual representation and a text representation; and determining the category of the document from the visual representation and the text representation. By extracting the visual representation and the text representation of the document, the embodiment classifies the text in the document, avoids the uncertainty of manual labeling, and improves the accuracy of document classification.
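
The three steps in the abstract can be sketched as a small data flow. This is only an illustrative sketch, not the patented model: the pre-trained model is replaced by a toy stand-in (mean block intensity for the visual representation, word counts for the text representation), and the names split_document, encode, classify_document, the vocabulary and the class prototypes are all hypothetical.

```python
from collections import Counter
from math import sqrt


def split_document(page_pixels, page_text, block_size=2):
    """Step 1: cut the page into image blocks (pixel tiles) and word blocks (tokens)."""
    image_blocks = []
    for r in range(0, len(page_pixels), block_size):
        for c in range(0, len(page_pixels[0]), block_size):
            tile = [row[c:c + block_size] for row in page_pixels[r:r + block_size]]
            image_blocks.append(tile)
    word_blocks = page_text.lower().split()
    return image_blocks, word_blocks


def encode(image_blocks, word_blocks, vocab):
    """Step 2 (toy stand-in for the pre-trained model, an assumption):
    return a visual representation and a text representation."""
    visual = []
    for tile in image_blocks:
        n_pixels = sum(len(row) for row in tile)
        # Mean intensity of the tile, scaled from 0..255 to 0..1.
        visual.append(sum(sum(row) for row in tile) / n_pixels / 255.0)
    counts = Counter(word_blocks)
    text = [float(counts[word]) for word in vocab]
    return visual, text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_document(page_pixels, page_text, vocab, prototypes):
    """Step 3: fuse both representations and pick the closest class prototype."""
    image_blocks, word_blocks = split_document(page_pixels, page_text)
    visual, text = encode(image_blocks, word_blocks, vocab)
    fused = visual + text  # simple concatenation, chosen here for illustration
    return max(prototypes, key=lambda label: cosine(fused, prototypes[label]))


# Hypothetical toy data: a 2x4 "page" and two made-up class prototypes.
VOCAB = ["invoice", "total", "abstract", "method"]
PROTOTYPES = {
    "invoice": [0.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    "paper": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
}
PAGE = [[0, 0, 255, 255], [0, 0, 255, 255]]
label = classify_document(PAGE, "Invoice total total", VOCAB, PROTOTYPES)
print(label)
```

In a real system the encode step would be a learned multimodal encoder and the final step a trained classifier; the nearest-prototype rule is used here only to keep the sketch self-contained.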

Description

Technical field

[0001] The present disclosure relates to the technical field of data processing, and in particular to a document classification method, device and electronic equipment.

Background

[0002] With the rapid development of the Internet and the advent of the era of big data, text mining technologies such as text classification are applied in more and more fields. Text classification technology is generally based on hand-written rules or on learning from a small amount of labeled data. Although these methods can solve some document analysis and extraction tasks, their generality and effectiveness are often unsatisfactory because the rules do not generalize and the amount of labeled data is insufficient. Currently, there is a lack of general-purpose document classification methods.

Summary of the invention

[0003] The disclosure provides a document classification method, device, system and storage medium.

[0004] According to a first a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35; G06K9/62; G06K9/00
CPC: G06F16/353; G06F18/24
Inventor: 罗斌, 曹宇慧, 彭启明, 冯仕堃
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD