Method and system for simultaneously abstracting document summarization and key words

A document summarization and keyword extraction technology, applied in fields such as instruments, calculation, and electrical digital data processing, that solves the problem that document summaries and keywords cannot be extracted efficiently at the same time.

Active Publication Date: 2009-04-01
PEKING UNIV +2

AI Technical Summary

Problems solved by technology

Existing methods treat document summarization and keyword extraction, two closely related tasks, separately, and therefore cannot extract document summaries and keywords efficiently at the same time.


Embodiment Construction

[0077] The technical scheme of the present invention is further illustrated below in conjunction with an embodiment and the accompanying drawings:

[0078] As shown in Figure 1, a method for uniformly extracting document summaries and keywords includes the following steps:

[0079] (1) Read in the document and divide it into sentences and words;

[0080] First, split the document into individual sentences to obtain the sentence set $S = \{s_i \mid 1 \le i \le m\}$. Then segment each sentence into words and filter out stop words to obtain the corresponding word set $T = \{t_j \mid 1 \le j \le n\}$.
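Step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes English-like text, a naive punctuation-based sentence splitter, a regular-expression tokenizer, and a tiny hypothetical stop-word list (`STOP_WORDS`). The names `split_sentences`, `tokenize`, and `build_sets` are invented for the example; a Chinese document would need a proper word segmenter instead.

```python
import re

# Hypothetical stop-word list for illustration only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it"}


def split_sentences(document):
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_sets(document):
    """Return the sentence set S, the tokens of each sentence, and the word set T."""
    sentences = split_sentences(document)                      # S = {s_i}
    tokens_per_sentence = [tokenize(s) for s in sentences]
    vocabulary = sorted({w for toks in tokens_per_sentence for w in toks})  # T = {t_j}
    return sentences, tokens_per_sentence, vocabulary
```

Here `sentences` plays the role of $S$ and `vocabulary` the role of $T$; the per-sentence token lists are kept because later steps need to know which words occur in which sentence.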

[0081] (2) Construct a sentence-sentence relationship graph $G_{SS}$ for the sentence set $S$;

[0082] Treat each sentence as a vertex of the graph $G_{SS}$. For any two different sentences $s_i$ and $s_j$ in $S$, the content similarity value is calculated using the following cosine formula:

[0083] sim(s_i, s_j ...
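The formula is truncated in this text, so the term weighting the patent uses is not visible here. As a hedged sketch only, the code below assumes each sentence is represented by raw term counts and computes the standard cosine similarity between those vectors, then fills a symmetric weight matrix for $G_{SS}$. The function names `cosine_similarity` and `build_sentence_graph` are hypothetical, and a weighted variant (for example tf-idf) could be substituted without changing the structure.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-count vectors (assumed weighting)."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only words shared by both sentences contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def build_sentence_graph(tokens_per_sentence):
    """Return an m x m symmetric weight matrix with a zero diagonal."""
    m = len(tokens_per_sentence)
    graph = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):  # only pairs of different sentences
            weight = cosine_similarity(tokens_per_sentence[i], tokens_per_sentence[j])
            graph[i][j] = graph[j][i] = weight
    return graph
```

The diagonal is left at zero because the text defines the similarity only for two different sentences; an edge with weight zero can be read as no edge between the corresponding vertices.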

Abstract

The invention relates to a method for extracting the summary and keywords of a document at the same time, and belongs to the field of language and text processing technology. Existing methods treat summary extraction and keyword extraction as two unrelated tasks and process them separately, although the two tasks are of the same nature; the present method exploits this shared nature and completes both extractions at the same time. The method uses a graph-based learning model that jointly exploits the relationships between sentences, between sentences and words, and between words in the document to evaluate the importance of the sentences and words accurately, and finally takes the important sentences and words as the summary and keywords of the document. The method extracts the summary and keywords simultaneously and also achieves better extraction quality for both; it can be widely applied in fields such as text information processing and text mining.

Description

Technical field

[0001] The invention belongs to the technical field of language and text processing and information retrieval, and in particular relates to a method for uniformly extracting document summaries and keywords.

Background technique

[0002] Both document summarization and keyword extraction automatically extract the essence or main points of a given document. The purpose of both is to give users a concise description of the content by compressing and refining the original text. The main difference between them is granularity: a document summary is composed of sentences, while keywords are individual words. Document summarization and keyword extraction are among the core problems in natural language processing and are widely used in document/Web search engines, enterprise content management systems, and knowledge management systems (such as Found...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27, G06F17/30
Inventor: 万小军, 杨建武, 吴於茜, 肖建国
Owner: PEKING UNIV