
Deep learning-based blog text abstract generation method

A deep learning summarization technique applied to blog text. It targets the limited application of abstractive summarization to Chinese text and is claimed to have wide application prospects.

Active Publication Date: 2017-07-25
SUZHOU INST FOR ADVANCED STUDY USTC
3 Cites, 95 Cited by

AI Technical Summary

Problems solved by technology

[0005] Natural-language text summarization falls into two main approaches. The first is extractive: summaries built from rules and statistics, which have been validated in a large number of practical applications. The second is abstractive: summaries generated by deep learning models. This approach improved greatly in 2014, moving from mechanical text summarization to comprehensible summary generation, and is currently implemented with an encoder-decoder framework with embedded recurrent neural networks. Its application to Chinese, however, has so far been limited.



Examples


Embodiment

[0060] A deep learning-based method for generating Chinese blog summaries. The specific steps include:

[0061] Step 1. Blog training data crawling and sorting

[0062] The blog training data is crawled from popular blogs on the CSDN website. The blogs obtained are diverse in content, but all are highly specialized texts. The crawled data also has defects: some blogs are too short, and some contain no text at all, only videos and pictures. Such blogs are discarded.
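The discard rule in [0062] can be sketched as a simple length filter. This is a minimal illustration, not the patent's implementation: the record shape ({"title", "text"}), the helper names, and the MIN_CHARS threshold are all assumptions, since the source does not say how short is "too short".

```python
import re

MIN_CHARS = 200  # hypothetical threshold; the source does not state one


def visible_text(raw_text: str) -> str:
    """Collapse runs of whitespace so the length reflects real content."""
    return re.sub(r"\s+", " ", raw_text).strip()


def is_usable(blog: dict, min_chars: int = MIN_CHARS) -> bool:
    """Return True if the blog has enough body text to train on.

    `blog` is an assumed record shape: {"title": str, "text": str}.
    A post that contains only videos or pictures has empty text after
    extraction, so it fails the same length check as a too-short post.
    """
    return len(visible_text(blog.get("text", ""))) >= min_chars


def filter_blogs(blogs: list[dict], min_chars: int = MIN_CHARS) -> list[dict]:
    """Drop blogs that are too short or have no text."""
    return [blog for blog in blogs if is_usable(blog, min_chars)]
```

For example, filter_blogs(crawled) keeps only the posts with at least MIN_CHARS characters of text. In practice the threshold would be tuned to the corpus.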

[0063] The final blog text is obtained with find and get_text in BeautifulSoup, and the text content of the web page tag whose category is article_description is selected as the actual blog summary. If a blog has no abstract, the title of the expert blog is combined with the highest-weighted sentence selected by TextRank to form the actual abstract, which is used during training.
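The fallback in [0063] (title plus the highest-weighted TextRank sentence) can be sketched in plain Python. This is a hedged illustration under several assumptions: sentences arrive already split and tokenized (Chinese text would need a word segmenter first), the similarity is the word-overlap measure from the original TextRank paper, and the title and sentence are joined with a space, because the source does not specify how they are combined. The function names are hypothetical.

```python
import math


def similarity(a: list[str], b: list[str]) -> float:
    """Word-overlap similarity normalised by log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 would zero the denominator
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(tokenized: list[list[str]],
                    damping: float = 0.85,
                    iterations: int = 50) -> list[float]:
    """Run weighted PageRank over the sentence-similarity graph."""
    n = len(tokenized)
    weights = [
        [0.0 if i == j else similarity(tokenized[i], tokenized[j])
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of every neighbour's score,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def fallback_summary(title: str,
                     sentences: list[str],
                     tokenized: list[list[str]]) -> str:
    """Combine the title with the highest-scoring sentence (assumed join)."""
    if not sentences:
        return title
    scores = textrank_scores(tokenized)
    best = max(range(len(sentences)), key=scores.__getitem__)
    return f"{title} {sentences[best]}"
```

The fixed iteration count keeps the sketch short. A fuller version would stop once the scores change by less than a tolerance.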

[0064] The textRank method is a text...


Abstract

The invention discloses a deep learning-based blog text abstract generation method. The method comprises the following steps: crawling blog data; preprocessing the crawled blog data and selecting blog text data; converting the selected blog text data into vector matrix data according to a Chinese word vector dictionary; constructing a deep learning encoder-decoder model, training the encoder and the decoder of the model separately, and connecting the encoder and the decoder for use after the training is completed; and repeating the steps S01 to S03 to obtain generated data, and generating a predicted abstract from the generated data through the trained model. The method automatically generates text abstracts of blogs on the basis of the encoder-decoder deep learning framework, and at the same time captures deeper semantic relations within the blogs. The generated text abstracts clearly present the main content of the current blog, so the method has wide application prospects.
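The conversion of blog text into vector matrix data via a word vector dictionary might look like the sketch below. It is an illustration only: the function name, the fixed max_len, zero-padding, and the zero vector for out-of-vocabulary words are assumptions, as the abstract does not describe these details, and the tiny two-dimensional dictionary in the usage note is made up.

```python
def to_vector_matrix(tokens: list[str],
                     word_vectors: dict[str, list[float]],
                     dim: int,
                     max_len: int) -> list[list[float]]:
    """Map tokens to a max_len x dim matrix using a word vector dictionary.

    Assumptions (not stated in the source): unknown words map to a zero
    vector, long texts are truncated, and short texts are zero-padded.
    """
    zero = [0.0] * dim
    rows = [list(word_vectors.get(tok, zero)) for tok in tokens[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows
```

With a toy dictionary such as {"深度": [0.1, 0.2], "学习": [0.3, 0.4]}, the call to_vector_matrix(["深度", "学习"], vectors, dim=2, max_len=4) yields two word rows followed by two rows of zeros.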

Description

Technical field

[0001] The present invention relates to a method for generating text abstracts, in particular to a method for generating blog text abstracts based on deep learning.

Background technique

[0002] Natural Language Processing (NLP) is a particularly important part of artificial intelligence. It includes multiple sub-tasks such as text classification, sentiment analysis, machine translation, and reading comprehension. Almost every sub-task is an important research field in its own right, and the sub-tasks are both independent and interrelated.

[0003] Deep learning is a new type of end-to-end learning method proposed in recent years. On ordinary tasks such as classification, ordinary neural networks may perform almost as well, but in high-dimensional data computation and feature extraction, deep learning, which fits the data with deep networks, shows its powerful computational capabilities. At present, deep learning has been applied to many fields: image process...


Application Information

IPC(8): G06F17/30, G06N3/08
CPC: G06N3/08, G06F16/3335, G06F16/345, G06F16/35
Inventor: 杨威, 周叶子, 黄刘生
Owner: SUZHOU INST FOR ADVANCED STUDY USTC