Method for extracting keywords from a single text

A keyword extraction method applied in the field of single-text keyword extraction. It addresses the problems that existing extraction fails to preserve domain characteristics and that the accuracy and quality of single-text keyword extraction are reduced. The stated effects are fewer calculation errors, better preservation of domain characteristics, and higher extraction accuracy.
CN101968801A · Inactive · Publication Date: 2011-02-09 · SHANGHAI UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
SHANGHAI UNIV
Publication Date
2011-02-09
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention discloses a method for extracting keywords from a single text, comprising the following steps: (1) open a single text in the domain collection; (2) preprocess the text content; (3) extract the meaningful content words; (4) count the term frequency of each content word; (5) open all texts in the domain collection; (6) count the document frequency of each content word in the domain collection; (7) count the number of pages a search engine returns when queried for each content word; (8) use the improved TFIDF term-weight formula to calculate the weights of all content words in the single text and extract a set percentage of them as keywords. The method compensates for the shortcomings of the TFIDF algorithm and keeps unrelated domain collections from affecting keyword extraction, thereby improving extraction precision and preserving the domain characteristics of the extracted keywords.
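
The eight steps of the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the improved TFIDF weight formula itself is not given in this excerpt, so the combination of term frequency, domain document frequency, and search-engine page counts used here is a hypothetical placeholder. The names `content_words`, `extract_keywords`, `web_hits`, `total_web_pages`, and `ratio`, the tiny stopword list, and the regex tokenizer are likewise assumptions, and the page counts are passed in by the caller rather than fetched from a real search engine.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def content_words(text):
    """Steps 2-3: lowercase, tokenize, and drop stopwords.

    A crude stand-in for real preprocessing and content-word filtering.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_keywords(text, domain_texts, web_hits, total_web_pages, ratio=0.2):
    """Return the top `ratio` share of content words of `text` by weight."""
    # Step 4: term frequency of each content word in the single text.
    tf = Counter(content_words(text))

    # Steps 5-6: document frequency of each word in the domain collection.
    domain_sets = [set(content_words(d)) for d in domain_texts]
    df = {w: sum(w in s for s in domain_sets) for w in tf}

    weights = {}
    for word, freq in tf.items():
        # Step 7: page count for the word, supplied by the caller
        # (assumed to be between 1 and total_web_pages).
        hits = max(web_hits.get(word, 1), 1)
        # Step 8 (HYPOTHETICAL formula, not the patent's): reward words
        # that are frequent in the text and common in the domain
        # collection, and penalize words that are common on the web.
        domain_factor = math.log(1 + df[word])
        web_idf = math.log(total_web_pages / hits)
        weights[word] = freq * domain_factor * web_idf

    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    k = max(1, math.ceil(len(ranked) * ratio))
    return ranked[:k]
```

In this sketch a word that is common inside the domain collection is not penalized, as it would be under a plain inverse document frequency computed on that collection; the general-purpose page counts take over the role of down-weighting everyday words. Whether the patented formula combines the three statistics in this way cannot be confirmed from the text shown here.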

Description

Technical field

[0001] The invention relates to a method for extracting keywords from a single text, in particular to an improved method that uses TFIDF to extract the keywords of a single text in a domain corpus.

Background art

[0002] The keywords of a single text are the basic elements of text representation in text knowledge flow generation, semantic chain network construction, and the measurement of text context complexity and information volume. The accuracy of single-text keyword extraction directly affects the quality of text information processing tasks such as text classification, clustering, word association analysis, automatic text summarization, text filtering, information retrieval, topic detection, and web page annotation. At present, research on single-text keyword extraction mainly covers the TFIDF method, the naive Bayesian classification method, the mutual information method, the maximum entropy model method, and the maximum likelihood and prefix tree method, among others. [000...

Claims
