Keyword extraction method based on Seq2seq framework

A keyword extraction technology based on the Seq2seq framework, applied in the field of natural language processing

Active Publication Date: 2019-08-13
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

The method takes into account the deep semantics of long text, combines contextual information to calculate the relevance of words, better handles low-frequency words and the repetition problem of generative tasks, and improves the accuracy of keyword extraction.
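
The abstract attributes the handling of low-frequency words and of repetition to a pointer network and a coverage mechanism at the decoder. The source does not give the formulas, so the following is only a minimal sketch that assumes the commonly used pointer-generator formulation; the function names, the `p_gen` mixing weight, and the toy values are illustrative assumptions, not the patent's own definitions.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source text.

    p_gen: probability of generating (rather than copying), in [0, 1].
    vocab_dist: {word: probability} from the decoder's softmax.
    attention: attention weights over source positions (sums to 1).
    All names here are hypothetical; the source does not specify them.
    """
    mixed = {w: p_gen * p for w, p in vocab_dist.items()}
    for token, a in zip(source_tokens, attention):
        # The copy term lets rare or out-of-vocabulary source words be emitted.
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * a
    return mixed


def coverage_step(coverage, attention):
    """Return (coverage loss for this step, updated coverage vector)."""
    # Penalize attending again to source positions that are already covered.
    loss = sum(min(a, c) for a, c in zip(attention, coverage))
    updated = [c + a for c, a in zip(coverage, attention)]
    return loss, updated
```

In this sketch a source word that is missing from the decoder vocabulary still receives probability mass through the copy term, and attending to the same positions in two consecutive steps produces a non-zero coverage penalty.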



Examples

Embodiment Construction

[0072] The present invention will be described in detail below in conjunction with specific implementation examples, but the protection scope of the present invention is not limited to the following examples.

[0073] As shown in Figure 1, the system comprises a data collection module, a data preprocessing module, a feature extraction module, a network training module, and a test evaluation module. The data collection module gathers and structures the data. The data preprocessing module removes low-quality data (texts that are too short, duplicated, or noisy), performs word segmentation, keyword extraction, and manual review to produce the training corpus, and then carries out word frequency statistics and sorting and text vectorization. The feature extraction module uses the Seq2seq framework to create a sequence model, introduces the attention mechanism, and extracts the features of keywords in the text. The network training module uses a recurrent neural network to train on the input vectors and obtain the final trained model; the test evaluation modu...
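
The preprocessing steps listed in paragraph [0073] can be illustrated with a short sketch. It assumes the texts are already segmented into words and covers only length filtering, deduplication, word frequency statistics, and vectorization; denoising, keyword labeling, and manual review are left out. The function name, the `min_tokens` and `max_vocab` thresholds, and the `<pad>`/`<unk>` symbols are assumptions for illustration, not values from the source.

```python
from collections import Counter


def preprocess(docs, min_tokens=5, max_vocab=50000):
    """Filter low-quality texts, count word frequencies, and vectorize.

    docs is a list of already-segmented texts (lists of words).
    min_tokens and max_vocab are assumed values, not taken from the source.
    """
    seen = set()
    kept = []
    for tokens in docs:
        key = tuple(tokens)
        # Drop texts that are too short or exact duplicates.
        if len(tokens) < min_tokens or key in seen:
            continue
        seen.add(key)
        kept.append(tokens)

    # Word frequency statistics, sorted from most to least frequent.
    freq = Counter(word for tokens in kept for word in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, _ in freq.most_common(max_vocab):
        vocab[word] = len(vocab)

    # Text vectorization: map each word to its vocabulary index.
    vectors = [[vocab.get(w, vocab["<unk>"]) for w in tokens] for tokens in kept]
    return kept, vocab, vectors
```

The resulting index sequences are the kind of input vectors a recurrent encoder would consume; capping the vocabulary by frequency is one reason low-frequency words end up mapped to `<unk>`.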


Abstract

The invention discloses a keyword extraction method based on a Seq2seq framework. The method comprises: creating a sequence model using the Seq2seq framework; introducing an attention mechanism to extract the features of keywords in the text; fusing a pointer network model and a coverage mechanism at the decoding end to improve the attention distribution over potential keywords; training the network model with a softmax loss function; and finally, in the model prediction stage, using the beam search algorithm to generate the keyword sequence with the maximum probability as the keyword result set, thereby obtaining appropriate keywords. The method takes the deep semantics of long text into account, calculates the word distribution probability in combination with contextual information, better handles low-frequency words and the repetition problem of generative tasks, and improves keyword extraction accuracy.
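
The prediction stage described in the abstract can be sketched as a generic beam search. This is a minimal illustration, not the patent's implementation: `step_fn` is a hypothetical stand-in for the trained decoder, the default beam width and maximum length are assumed, and scores are plain sums of log-probabilities with no length normalization.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return the highest-scoring token sequence under step_fn.

    step_fn(prefix) is a hypothetical stand-in for the trained decoder: it
    takes the tokens generated so far and returns {next_token: probability}.
    """
    beams = [([start_token], 0.0)]  # (sequence, sum of log-probabilities)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob <= 0.0:
                    continue
                candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    best = max(finished + beams, key=lambda c: c[1])
    return best[0]
```

Unlike greedy decoding, which keeps only the single most likely token at each step, this keeps `beam_width` partial sequences alive, so a sequence that starts with a slightly less likely word can still win if its continuation is strong.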

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a keyword extraction method based on a Seq2seq framework.

Background

[0002] With the rapid development of mobile Internet technology, e-commerce, and social media, the volume of text data has grown explosively. According to market research surveys, the amount of global data doubles every two years, and growth at such an alarming rate inevitably leaves people facing information overload. Most of this vast body of data is unstructured text. Extracting useful information from these text data and solving the problem of information overload has therefore become an urgent need.

[0003] As an important technology in text mining, keyword extraction is basic and necessary groundwork for information retrieval, text classification, and recommendation systems, and it has become a research hotspot among experts and scholars. Text keywor...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/27
CPC: G06F40/205, G06F18/2411, G06F18/214
Inventor: 孟利民, 郑申文, 蒋维, 应颂翔, 林梦嫚
Owner: ZHEJIANG UNIV OF TECH