
A method, device, storage medium and electronic device for document content classification

A document content technology, applied to the field of document content classification, that addresses the low efficiency of existing approaches and their difficulty in mining the deep features of documents, and that mitigates the curse of dimensionality.

Active Publication Date: 2021-07-02
北京香侬慧语科技有限责任公司

AI Technical Summary

Problems solved by technology

The traditional method requires human participation to understand and summarize the content of a document, so it is inefficient and struggles to mine the deep features of the text in the document.




Embodiment Construction

[0047] In describing the present invention, it should be understood that the orientation or positional relationship indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" is based on the orientation or positional relationship shown in the drawings. These terms are used only for the convenience of describing the present invention and simplifying the description; they do not indicate or imply that the referred device or element must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the present invention.

[0048] In addition, the terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indica...



Abstract

The present invention provides a method, device, storage medium and electronic device for classifying document content. The method includes: determining the position information of each piece of text data in the document content, and generating a discrete font encoding and a discrete font-size encoding of the text data; generating extended features of the text data from its position information, discrete font encoding and discrete font-size encoding; determining semantic information of the text data using a recurrent neural network; and generating deep features of the text data from the extended features and the semantic information, then determining the category to which the text data belongs from the deep features. With the document content classification method, device, storage medium and electronic device provided by the embodiments of the present invention, the deep features of text data can be mined and the curse of dimensionality in data mining and classification can be addressed. Because fonts and font sizes are discrete features, representing them with discrete encodings helps distinguish different fonts and font sizes.
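The steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: one-hot vectors stand in for the discrete encodings, a plain Elman-style recurrence stands in for the recurrent neural network, simple concatenation stands in for deep-feature generation, and the font names, size list, labels, token vectors and random weights are all hypothetical.

```python
import math
import random


def one_hot(value, vocabulary):
    """Discrete code: 1.0 at the position of `value` in `vocabulary`, 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def extended_features(block, fonts, sizes):
    """Concatenate position info with the font and font-size discrete codes."""
    return (
        list(block["position"])
        + one_hot(block["font"], fonts)
        + one_hot(block["size"], sizes)
    )


def rnn_semantics(token_vectors, w_in, w_rec):
    """Elman-style recurrence; the final hidden state stands in for semantic information."""
    hidden = [0.0] * len(w_rec)
    for x in token_vectors:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


def classify(deep, w_out, labels):
    """Linear scoring over the deep features; returns the highest-scoring label."""
    scores = [sum(w * d for w, d in zip(row, deep)) for row in w_out]
    return labels[scores.index(max(scores))]


# Hypothetical vocabularies and labels (assumptions, not taken from the patent).
FONTS = ["SimSun", "SimHei"]
SIZES = [10.5, 12.0, 16.0]
LABELS = ["title", "body"]
HIDDEN = 4

# One text block: normalized bounding box, font, size, and toy 3-d token vectors.
block = {
    "position": (0.1, 0.05, 0.9, 0.08),
    "font": "SimHei",
    "size": 16.0,
    "tokens": [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]],
}

rng = random.Random(0)  # untrained random weights, for shape illustration only


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


ext = extended_features(block, FONTS, SIZES)  # 4 position + 2 font + 3 size values
sem = rnn_semantics(block["tokens"], rand_matrix(HIDDEN, 3), rand_matrix(HIDDEN, HIDDEN))
deep = ext + sem  # concatenation as a stand-in for deep-feature generation
label = classify(deep, rand_matrix(len(LABELS), len(deep)), LABELS)
```

Because the weights are untrained, the predicted label carries no meaning here; the sketch only shows how position, font and font-size codes could be combined with a recurrent summary of the text before classification.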

Description

Technical field

[0001] The present invention relates to the technical field of document classification, and in particular to a method, device, storage medium and electronic device for document content classification.

Background technology

[0002] With the application and development of information technology, people write and create more and more documents, and the text content in those documents is varied. Documents with a lot of content are generally divided into multiple levels, such as table of contents, title, body text, and so on.

[0003] For standardized documents, such as documents in Word format, it is relatively easy to determine the text content at each level of the document; in practice, however, many documents do not have a uniform standard format. Because document content is so diverse, it is difficult to judge the category of text content in documents simply by manually summarized rules. The traditional solution is to rely on manual grading; or, to class...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/35
CPC: G06F16/35
Inventor: 任翔远
Owner: 北京香侬慧语科技有限责任公司