Method for extracting key words of single text

A keyword extraction method for single texts. It addresses the loss of domain characteristics and the reduced accuracy and quality of single-text keyword extraction, with the effect of reducing calculation errors, preserving domain characteristics, and improving extraction accuracy.

Inactive Publication Date: 2011-02-09
SHANGHAI UNIV
Cites: 2; Cited by: 12

AI Technical Summary

Problems solved by technology

[0007] (1) In the TFIDF method, the domains covered by an irrelevant-domain corpus, and the total number of texts in it, reduce the quality of keyword extraction for a single text
[0008] (2) Since the inverse document frequency of a word is inversely proportional to the frequency of words appearing in the corpus, the TFIDF word weight will tend to be low-frequ
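For reference, the textbook TFIDF weight that problem (2) refers to can be written as below. This is the standard formulation, not the improved formula of the invention, which does not appear in this excerpt. The symbols N (total number of texts in the corpus) and DF_t (number of texts containing word t) are notation assumed here for illustration.

```latex
% Textbook TF-IDF weight of word t in text d (assumed notation):
%   TF_{t,d} : frequency of t in d
%   DF_t     : number of corpus texts that contain t
%   N        : total number of texts in the corpus
w_{t,d} = \mathrm{TF}_{t,d} \cdot \log \frac{N}{\mathrm{DF}_t}
```

Because the logarithmic factor grows as DF_t shrinks, a word that occurs in few corpus texts receives a large multiplier regardless of how well it represents the text, and N itself depends on which corpus is chosen.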


Examples


Embodiment Construction

[0033] Embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0034] This embodiment extracts single-text keywords from 243 Reuters news webpages in the environmental domain, published from 2008 to 2009. As shown in Figure 1, the single-text keyword extraction method of the present invention comprises the following steps:

[0035] S1. Open a single text in the domain corpus, for example a single webpage text from the collection of Reuters environmental news webpages;

[0036] S2. Preprocess the text content, for example by word segmentation and part-of-speech tagging of the webpage text;

[0037] S3. Extract meaningful content words, such as nouns and verbs;

[0038] S4. Count the word frequency of every content word, recorded as TF_t;

[0039] S5. Open all the texts in the domain collection, for example, open all the texts of the webpages in the news w...
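Steps S3 and S4 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes step S2 has already produced (word, tag) pairs, and the Penn Treebank-style tag prefixes `NN` and `VB`, the function name `content_word_frequencies`, and the sample sentence are all hypothetical choices made here.

```python
from collections import Counter

# Assumption: step S2 produced Penn Treebank-style tags, where noun tags
# start with "NN" and verb tags start with "VB".
CONTENT_TAG_PREFIXES = ("NN", "VB")


def content_word_frequencies(tagged_tokens):
    """Steps S3 and S4: keep nouns and verbs, then count TF_t per word."""
    content_words = [
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith(CONTENT_TAG_PREFIXES)
    ]
    return Counter(content_words)


# Hypothetical output of step S2 for one short text.
tagged = [
    ("Emissions", "NNS"), ("rose", "VBD"), ("as", "IN"),
    ("emissions", "NNS"), ("targets", "NNS"), ("slipped", "VBD"),
]
tf = content_word_frequencies(tagged)
```

Lowercasing before counting merges "Emissions" and "emissions" into one entry; whether the original method normalizes case is not stated in the source.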


Abstract

The invention discloses a method for extracting keywords from a single text, comprising the following steps: (1) opening the single text in the domain collection; (2) preprocessing the content of the text; (3) extracting the meaningful content words; (4) counting the word frequency of each content word; (5) opening all the texts in the domain collection; (6) counting the document frequency of each content word in the domain collection; (7) counting the number of pages a search engine returns when retrieving each content word; (8) using the improved TFIDF word-weight formula to calculate the weights of all the content words in the single text and extracting a certain percentage of them as keywords. The method compensates for the shortcomings of the TFIDF algorithm and prevents an irrelevant-domain collection from affecting keyword extraction, thereby improving the precision of keyword extraction and preserving the domain features of the extracted keywords.
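The overall flow of steps (4) to (8) might be sketched as follows. The improved word-weight formula is not given in this excerpt, so the sketch plugs in the textbook TFIDF weight as a clearly labeled stand-in through a replaceable `weight_fn` parameter. Step (7), the search-engine page count, is omitted because the source does not say how that count enters the formula. All function names, the `ratio` parameter, and the sample data are hypothetical.

```python
import math
from collections import Counter


def document_frequency(collection):
    """Step (6): count how many texts in the collection contain each word."""
    df = Counter()
    for words in collection:
        df.update(set(words))  # each text contributes at most 1 per word
    return df


def classic_tfidf(tf, df, n_texts):
    # Stand-in only: textbook TF-IDF, NOT the patent's improved formula.
    return tf * math.log(n_texts / max(df, 1))


def extract_keywords(text_words, collection, ratio=0.2, weight_fn=classic_tfidf):
    """Steps (4) to (8): weight every content word, keep the top fraction."""
    tf = Counter(text_words)                      # step (4)
    df = document_frequency(collection)           # steps (5) and (6)
    n_texts = len(collection)
    weights = {w: weight_fn(tf[w], df[w], n_texts) for w in tf}
    k = max(1, round(len(weights) * ratio))       # "a certain percentage"
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:k]


# Hypothetical domain collection of already-filtered content words.
collection = [
    ["carbon", "emissions", "policy"],
    ["carbon", "market", "policy"],
    ["carbon", "emissions", "emissions", "treaty"],
]
keywords = extract_keywords(collection[2], collection, ratio=0.67)
```

With the stand-in weight, "carbon" scores zero because it appears in every text, and "treaty" outranks "emissions" even though "emissions" occurs twice in the text. That ordering is the low-frequency bias the patent attributes to plain TFIDF.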

Description

Technical field

[0001] The invention relates to a method for extracting keywords from a single text, in particular to an improved method, based on TFIDF, for extracting keywords from a single text in a domain corpus.

Background

[0002] Single-text keywords are the basic elements of text representation in text knowledge flow generation, semantic chain network construction, and the analysis of text context complexity and information volume. The accuracy with which keywords are extracted from a single text directly affects the quality of downstream text information processing such as text classification, clustering, word association analysis, automatic text summarization, text filtering, information retrieval, topic detection, and web page annotation. Current research on single-text keyword extraction mainly covers the TFIDF method, the naive Bayesian classification method, the mutual information method, the maximum entropy model method, and the maximum likelihood and prefix tree method, among others. [000...


Application Information

IPC IPC(8): G06F17/30
Inventor 骆祥峰, 梁国宁, 殷晓波, 张顺香, 徐炜民
Owner SHANGHAI UNIV