Document characterization method and device based on deep learning, equipment and storage medium

A deep learning and characterization technology, applied in the fields of neural learning methods, instruments, and biological neural network models. It addresses problems such as loss of information, the inability to handle polysemous words, and the failure to consider other text data, and achieves improved accuracy.

Active Publication Date: 2021-06-18
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
10 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Author2Vec and Cite2Vec do not consider other text data. In Author2Vec, the conversion of author vectors into document vectors is simple and crude, so a lot of information is lost. In Cite2Vec, information is extracted from the abstract with Word2Vec, which cannot handle polysemous words.
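The polysemy limitation can be illustrated with a toy sketch. It is not Word2Vec or the patent's model: `static_table` is a hypothetical lookup table of random vectors, and `toy_contextual_embed` is an assumed stand-in for a contextual encoder that simply mixes each word vector with its sentence mean. It only shows that a static lookup returns the same vector for "bank" in both senses, while any context-dependent encoding does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical static embedding table: one fixed vector per word.
vocab = ["the", "river", "bank", "loan"]
static_table = {word: rng.normal(size=4) for word in vocab}

def static_embed(sentence):
    """Look up each word's fixed vector, ignoring the surrounding words."""
    return [static_table[word] for word in sentence]

def toy_contextual_embed(sentence):
    """Toy stand-in for a contextual encoder: mix each word with the sentence mean."""
    vectors = np.stack(static_embed(sentence))
    return list(0.5 * vectors + 0.5 * vectors.mean(axis=0))

s1 = ["the", "river", "bank"]   # "bank" as a riverside
s2 = ["the", "bank", "loan"]    # "bank" as a financial institution

# Static lookup: "bank" gets the identical vector in both sentences.
static_same = np.allclose(static_embed(s1)[2], static_embed(s2)[1])

# Context-dependent encoding: the two occurrences of "bank" differ.
contextual_same = np.allclose(toy_contextual_embed(s1)[2],
                              toy_contextual_embed(s2)[1])

print(static_same, contextual_same)
```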



Examples

Embodiment Construction

[0047] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the specific embodiments set forth herein. Rather, the embodiments are provided to explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications as are suited to particular intended uses. In the drawings, the same reference numerals will be used to denote the same elements throughout.

[0048] A document is itself a collection of various data, which can be roughly divided into two categories: text data and image data. Text data includes the title, author list, keywords, abstract, body text, references, and citations, while image data mainly includes paper illustrations, various data forms brin...

Abstract

The invention provides a deep-learning-based document characterization method, device, equipment, and storage medium. The method comprises: parsing a document to be characterized to obtain its keywords, author list, and multiple pieces of text information; inputting each piece of text information together with the keywords into a network model that incorporates a keyword attention mechanism, to obtain a first feature vector for each piece of text information; sequentially inputting the author list and each piece of text information into a first feature extraction model, to obtain a second feature vector for the author list and for each piece of text information; and inputting the first feature vectors and the second feature vectors into a fusion network model to obtain a representation vector of the document. The method makes full use of keyword information, considers multiple pieces of the document's text data at the same time, and applies different feature extraction methods to different pieces of text data, thereby effectively improving the accuracy of the vectorized document representation.
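The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented models: the random embeddings stand in for real encoder outputs, `keyword_attention` uses a simple scaled dot-product against the mean keyword vector, `mean_pool` is a placeholder for the "first feature extraction model", and `fuse` is a single untrained linear layer standing in for the fusion network. All function names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def keyword_attention(tokens, keywords):
    """Pool token embeddings (n, d) using weights driven by keyword embeddings (k, d)."""
    query = keywords.mean(axis=0)                        # (d,) summary of the keywords
    scores = tokens @ query / np.sqrt(tokens.shape[1])   # (n,) scaled dot-product scores
    weights = softmax(scores)                            # attention over tokens
    return weights @ tokens, weights                     # (d,) pooled vector, (n,) weights

def mean_pool(embeddings):
    """Placeholder feature extractor: average the rows of an (n, d) array."""
    return embeddings.mean(axis=0)

def fuse(vectors, W, b):
    """Placeholder fusion network: concatenate, then one linear layer with tanh."""
    return np.tanh(W @ np.concatenate(vectors) + b)

rng = np.random.default_rng(0)
d, out_dim = 8, 4

# Hypothetical parsed document: keywords, author list, and two pieces of text.
keywords = rng.normal(size=(3, d))
authors = rng.normal(size=(2, d))
texts = {"title": rng.normal(size=(5, d)), "abstract": rng.normal(size=(20, d))}

# First feature vectors: one per text piece, guided by the keywords.
first = [keyword_attention(t, keywords)[0] for t in texts.values()]
# Second feature vectors: author list plus each text piece.
second = [mean_pool(authors)] + [mean_pool(t) for t in texts.values()]

all_vectors = first + second
W = rng.normal(size=(out_dim, d * len(all_vectors)))
b = np.zeros(out_dim)
doc_vector = fuse(all_vectors, W, b)   # final document representation, shape (out_dim,)

_, title_weights = keyword_attention(texts["title"], keywords)
print(doc_vector.shape, title_weights.sum())
```

In a trained system the weights `W` and `b`, the encoders, and the attention parameters would be learned; here they are random so that only the shapes and the order of operations are meaningful.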

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a deep-learning-based document characterization method, device, equipment, and storage medium.

Background

[0002] The rapid growth in the number of documents poses a huge challenge to researchers. How to quickly screen high-quality documents and how to quickly understand and analyze them are problems that researchers urgently need to solve. Professional researchers generally address this problem by classifying, retrieving, and recommending documents and by automatically generating abstracts for them. In all of these document processing tasks, document characterization (Paper Representation) is an indispensable first step. In short, document characterization generates a mathematical vectorized expression for each document, converts unstructured document data into structured vectors, and ...

Application Information

IPC(8): G06F40/205, G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/205, G06F40/289, G06F40/30, G06N3/04, G06N3/08
Inventors: 程章林, 杨之光, 奥利夫·马丁·多伊森, 潘光凡
Owner SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI