Keyword extracting method based on seq2seq (sequence to sequence) deep neural network model

A keyword extraction technique based on a deep neural network, applied in the computer field. It addresses the inability of existing methods to predict keywords that lie outside the vocabulary or do not appear in the source document, a limitation that reduces extraction accuracy. The method thereby widens the range of candidate keywords considered and improves accuracy.

Inactive Publication Date: 2018-08-07
SUN YAT SEN UNIV
4 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

[0006] Among search keywords, half do not come from the source document. Existing keyword extraction techniques can only select candidate keywords from the source document, so they cannot predict keywords that are absent from it and cannot use synonyms of words in the document as keywords. This greatly reduces the accuracy of keyword extraction.
[0007] At the same time, existing keyword extraction techniques can only select candidate words from a vocabulary of fixed size. When the number of words in the documents far exceeds the size of the vocabulary, the inability to predict out-of-vocabulary words reduces the accuracy of keyword extraction.
[0008] Existing keyword extraction methods usually rely on machine-learned features when selecting candidate keywords. However, these features only capture the importance of each word by counting word frequencies in the document and cannot reveal the full semantics hidden in the document content.
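
The limitation described in [0006] can be made concrete with a small sketch. The Python below is illustrative only and is not part of the patent: the function names, the whitespace tokenization, and the toy document and keywords are all assumptions. It splits a set of reference keywords into those that occur in the source document and those that do not, and computes the best recall a purely extractive method could reach.

```python
from typing import List, Tuple


def contains_phrase(doc_tokens: List[str], phrase_tokens: List[str]) -> bool:
    """Return True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n, m = len(doc_tokens), len(phrase_tokens)
    if m == 0:
        return False
    return any(doc_tokens[i:i + m] == phrase_tokens for i in range(n - m + 1))


def split_present_absent(document: str, keywords: List[str]) -> Tuple[List[str], List[str]]:
    """Partition keywords into those found in the document and those not found."""
    # Assumption: lowercase whitespace tokenization stands in for real word segmentation.
    doc_tokens = document.lower().split()
    present: List[str] = []
    absent: List[str] = []
    for kw in keywords:
        target = present if contains_phrase(doc_tokens, kw.lower().split()) else absent
        target.append(kw)
    return present, absent


# Hypothetical toy data, not taken from the patent's experiments.
doc = "we propose a relevance ranking approach for web video search"
gold = ["video search", "relevance ranking", "information retrieval", "multimedia"]

present, absent = split_present_absent(doc, gold)

# Upper bound on recall for any method that only selects phrases from the document.
extractive_recall_ceiling = len(present) / len(gold)
```

In this toy case two of the four reference keywords never appear in the text, so a method restricted to candidates from the source document cannot recover them regardless of how it ranks candidates.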

Examples

Embodiment 1

[0074] The selected text is input into the keyword extraction system of the present invention and a keyword extraction experiment is carried out, as shown in Figure 2. The input text is: "Towards content-based relevance ranking for video search. Most existing web video search engines index videos by file names, URLs, and surrounding texts. These type of video metadata roughly describe the whole video at an abstract level without taking the rich content, such as semantic content descriptions and speech within the video. In this paper we propose a novel relevance ranking approach for Web-based video search using both video metadata and rich content. To leverage real content into ranking, the videos are segmented into shots, which are smaller -meaningful retrievable units. With video metadata and content information of shots, we developed an integrated ranking approach, which achieves improved ranking performance." After word segmentation and part-of-speech tagging, set the default reserved part...

Embodiment 2

[0078] Several existing keyword extraction algorithms are compared, using the F value (F-measure) as the performance metric for the top 5 and top 10 predicted keywords. The results are as follows. The proposed keyword extraction algorithm and model (CopyRNN, a recurrent neural network with a copy mechanism) performs best on every data set.
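
As a rough illustration of the metric in [0078], the sketch below computes an F value over the top k predictions. It assumes the F value is the harmonic mean of precision and recall (F1) and that a prediction counts as correct only on an exact string match; the patent text does not spell out either choice. The function name and the toy data are hypothetical.

```python
from typing import List


def f1_at_k(predicted: List[str], gold: List[str], k: int) -> float:
    """F1 of the top-k ranked predictions against the reference keyword set."""
    top_k = predicted[:k]
    gold_set = set(gold)
    if not top_k or not gold_set:
        return 0.0
    # Assumption: exact string match decides whether a prediction is correct.
    correct = len(set(top_k) & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked predictions and reference keywords (not the patent's data).
ranked = ["video search", "ranking", "relevance ranking", "metadata", "shots", "web"]
reference = ["video search", "relevance ranking", "content-based ranking", "video metadata"]

f1_top5 = f1_at_k(ranked, reference, 5)
f1_top10 = f1_at_k(ranked, reference, 10)
```

Here precision is taken over the predictions actually returned, so a system that outputs fewer than k keywords is not penalized for the missing slots. Other conventions divide by k instead, which would give lower scores in that case.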

[0079]

Embodiment 3

[0081] An extraction experiment is carried out for keywords that do not appear in the source document. Since other algorithms cannot predict such keywords, the comparison is made only against an algorithm that uses a conventional recurrent neural network, predicting the top 10 and top 50 keywords with recall as the evaluation metric. The results are as follows. The proposed keyword extraction algorithm and model (CopyRNN) achieves higher recall on every data set, indicating that it predicts keywords outside the source document more accurately.
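
The recall measurement in [0081] can be sketched as follows. This is a minimal example under stated assumptions: the reference list is taken to contain only keywords absent from the source document, matching is by exact string, and the function name and toy data are hypothetical rather than taken from the patent.

```python
from typing import List


def recall_at_k(predicted: List[str], gold_absent: List[str], k: int) -> float:
    """Fraction of absent reference keywords recovered in the top-k predictions."""
    gold_set = set(gold_absent)
    if not gold_set:
        return 0.0
    hits = len(set(predicted[:k]) & gold_set)
    return hits / len(gold_set)


# Hypothetical ranked predictions of keywords that do not occur in the document.
ranked_absent = ["information retrieval", "video indexing", "multimedia", "search engine"]
reference_absent = ["information retrieval", "multimedia"]

recall_top2 = recall_at_k(ranked_absent, reference_absent, 2)
recall_top10 = recall_at_k(ranked_absent, reference_absent, 10)
```

Recall is a natural choice for this setting because absent keywords are hard to generate at all, so the larger cutoffs (10 and 50) measure whether the model can surface them somewhere in its ranked output.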

[0082]

[0083] The keyword extraction system proposed by this invention can therefore not only extract keywords that exist in the source document but also predict keywords outside the source document well. Compared with existing keyword extraction techniques, the results achieved by the system of the invention are more reasonable and efficient.

Abstract

The invention relates to the field of computers, in particular to a keyword extraction method based on a seq2seq (sequence to sequence) deep neural network model. The keyword extraction method comprises the following steps: first, extracting target information through a pre-processing module; then, converting and labeling the target information through a word vector conversion module and a part-of-speech labeling module; next, obtaining a candidate word sequence through a candidate word weight calculating module; finally, obtaining the keywords through a candidate word screening module. In this method, the document vector is taken to be the average of the word vectors, and the word vector and the document vector are combined to serve as the vector representation of a word. This allows the importance of each word to the document to be analyzed better and the keywords that best represent the theme of the document to be selected. At the same time, the scope of keyword extraction is expanded, overcoming the defect that existing extraction techniques cannot predict keywords beyond the vocabulary or keywords that do not appear in the contents of the source document.
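
The word representation described in the abstract can be illustrated with a short sketch. It assumes that "combining" the word vector and the document vector means concatenation and that the average is taken over every token occurrence; the abstract does not specify either detail. The function names and the two-dimensional toy embeddings are hypothetical.

```python
from typing import Dict, List


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def word_representations(
    tokens: List[str], embeddings: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Map each in-vocabulary token to [word vector ; document vector]."""
    known = [t for t in tokens if t in embeddings]
    # Document vector: average of the word vectors of all known tokens.
    doc_vec = mean_vector([embeddings[t] for t in known])
    # Assumption: the combination is a simple concatenation.
    return {t: embeddings[t] + doc_vec for t in known}


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {
    "video": [1.0, 0.0],
    "search": [0.0, 1.0],
    "ranking": [1.0, 1.0],
}
toy_tokens = ["video", "search", "ranking", "video"]

reps = word_representations(toy_tokens, toy_embeddings)
```

Each word then carries both its own meaning and a summary of the document it appears in, which gives a later weighting step a way to judge how closely a word matches the overall theme.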

Description

Technical field

[0001] The invention relates to the field of computers, in particular to a keyword extraction method based on a seq2seq deep neural network model.

Background art

[0002] With the development of computer and network technology and the advent of the era of big data, digital files are growing at an alarming rate, and a large share of the information people encounter exists in the form of electronic documents. Faced with such a vast amount of information, people urgently need machines that can automatically identify the keywords that best represent the main content of an article, helping readers grasp that content faster and saving time in reading, processing, and using these electronic documents.

[0003] This technology is currently called keyword extraction. Keyword extraction refers to quickly obtaining multiple words or phrases from a document that can represent the subject of the document as ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62, G06N3/04
CPC: G06F40/289, G06F40/30, G06N3/044, G06F18/2415
Inventor: 李弘艺
Owner: SUN YAT SEN UNIV