Method for automatically abstracting Blog on basis of feature information

A technique combining feature information with automatic summarization, applied in special data processing applications, instruments, and electrical digital data processing, that addresses problems such as summaries failing to accurately reflect the content of an article.

Inactive Publication Date: 2013-08-14
SUZHOU UNIV
2 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

In automatic summarization research, the more diverse expression and more complex paragraph structure of blogs make blog-oriented summarization challenging. On the other hand, a Blog carries additional information such as tags and comments, which makes it possible to generate more accurate automatic summaries.
Traditional search engine...

Examples

Embodiment Construction

[0069] The present invention will be described in detail below with reference to the accompanying drawings and in combination with embodiments.

[0070] An automatic blog summary method based on feature information, including the following steps:

[0071] 1. Sentence score based on feature information

[0072] 1) The feature information score of the entry

[0073] Use a word segmentation tool to perform word segmentation and part-of-speech tagging on the blog post to be processed, and filter out numerals, quantifiers, prepositions and other words that do not carry the meaning of the sentence. Denote the set of entries obtained after this preprocessing as WS. Then consider the following factors to score the entries in WS.
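The filtering step in [0073] can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the segmentation tool has already produced (word, tag) pairs, and the tag names `m`, `q` and `p` (numeral, quantifier, preposition) and the function name `build_entry_set` are hypothetical choices made for this example.

```python
# Assumed tag names for the parts of speech to drop:
# "m" = numeral, "q" = quantifier, "p" = preposition.
# Real taggers may use different labels; adjust the set accordingly.
FILTERED_POS = {"m", "q", "p"}


def build_entry_set(tagged_tokens, filtered_pos=FILTERED_POS):
    """Return WS: the words whose part-of-speech tag is not filtered out.

    tagged_tokens is an iterable of (word, pos_tag) pairs, as produced by
    a word segmentation and POS tagging tool (not included here).
    """
    return [word for word, pos in tagged_tokens if pos not in filtered_pos]


# Toy input standing in for the output of a segmentation tool.
tagged = [
    ("three", "m"),
    ("posts", "n"),
    ("about", "p"),
    ("discuss", "v"),
    ("summaries", "n"),
]
ws = build_entry_set(tagged)
```

Here `ws` keeps only the nouns and the verb. Repeated words are kept as a list rather than collapsed into a set, so that later frequency-based scoring can still count occurrences.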

[0074] Blog post word frequency score: the contribution of word frequency information to an entry's weight is measured by TF-IDF, calculated as follows: .
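The formula itself is missing from this text, so the sketch below uses one common TF-IDF formulation and should not be read as the patent's exact definition. It assumes term frequency is the entry's count divided by the number of entries in WS, and inverse document frequency is log((1 + N) / (1 + df)) over a reference collection of N posts; the function name `tf_idf_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(entries, corpus):
    """Score each distinct entry of one post with a smoothed TF-IDF.

    entries: list of words for the post being summarized (WS).
    corpus:  list of posts, each a list of words, used for document frequency.
    """
    counts = Counter(entries)
    total = len(entries)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        # Number of posts in the corpus that contain the word.
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumption): avoids division by zero when df == 0.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[word] = tf * idf
    return scores


post = ["blog", "summary", "blog", "tag"]
corpus = [post, ["blog", "news"], ["blog", "sports"]]
scores = tf_idf_scores(post, corpus)
```

In this toy corpus "blog" appears in every post, so its IDF is zero and it scores lower than "summary", which appears only in the post being scored, even though "blog" is more frequent within that post.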

[0075] Descriptive information of pictures: Introduce these descriptive informatio...

Abstract

The invention discloses a method for automatically summarizing a Blog on the basis of feature information. The method includes the steps of scoring sentences on the basis of the feature information; scoring the attention of comments on the basis of latent semantics; and checking and merging the summary to obtain a set of summary sentences. The method makes full use of the Blog's feature information and fuses the focus of attention in the comments on the basis of latent semantics, so that a reader-friendly summary can be generated. The summary-checking process balances topic coverage against information redundancy, and latent semantic relevance addresses the problem of synonym noise between the comments and the main text. The resulting summary is reader-friendly and highly accurate.

Description

Technical field

[0001] The invention relates to the field of automatic summarization, in particular to an automatic blog summarization method based on feature information.

Background technique

[0002] With the rise of Web 2.0, the Blog, a new way of disseminating information and interacting, is becoming more and more popular, and its influence is growing day by day. It has received increasing attention from netizens and business circles.

[0003] Given the huge amount of blog information produced by the very large blog user base, how readers can find and read the content they are interested in has become a difficult problem. In automatic summarization research, the more diverse expression and more complex paragraph structure of blogs make blog-oriented summarization challenging. On the other hand, a Blog carries additional information such as tags and comments, which makes it possible to generate more accurate automatic summaries...

Application Information

IPC(8): G06F17/30
Inventor: 赵朋朋, 鲜学丰, 陈明, 刘全, 崔志明
Owner: SUZHOU UNIV