Keywords extraction method

A keyword extraction technology for text that addresses problems such as the low efficiency of manual keyword extraction.

Inactive Publication Date: 2012-07-04
SANDA UNIVERSITY
0 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

Originally, keywords were extracted manually: a person read the text and then summarized its keywords. This approach is highly accurate but very inefficient.

Embodiment Construction

[0032] Referring to Figure 1, the present invention discloses a method for extracting keywords from a piece of text. The method includes:

[0033] S10. A word segmentation step, which divides a piece of text into words. In one embodiment, the word segmentation step identifies spaces, punctuation marks, and character strings in the text and uses them as tokens for segmentation. Word segmentation is the process of programmatically splitting a piece of text into words. For English text this is relatively simple: extracting the strings that lie between spaces or punctuation marks yields a preliminary list of words.
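The segmentation described in S10 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `segment_words`, the regular expression, and the lowercasing of tokens are assumptions made for the example.

```python
import re

def segment_words(text):
    """Split English text into preliminary word tokens.

    Hypothetical helper: spaces and punctuation act as delimiters,
    and the character strings between them are returned as words.
    """
    # [A-Za-z0-9]+ matches maximal runs of letters and digits, so every
    # other character (space, comma, period, ...) works as a separator.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Lowercasing is an assumption so that "Keyword" and "keyword" match later.
    return [token.lower() for token in tokens]

words = segment_words("Keyword search is fast, and keywords help retrieval.")
```

This simple pattern also splits contractions such as "don't" into two tokens, which a real tokenizer would probably handle differently.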

[0034] S11. A stop word elimination step, which removes stop words from the words produced by the word segmentation step. In one embodiment, the stop word elimination step includes looking up a stop word table, and among the words divided in the word segmenta...
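The table lookup mentioned in S11 might look like the following Python sketch. The contents of `STOP_WORDS` and the function name `remove_stop_words` are assumptions for illustration; the source does not list the stop word table it uses.

```python
# Hypothetical, deliberately tiny stop word table; a real table would be
# much larger and is not specified in the source text.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "and", "to", "in"})

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that do not appear in the stop word table."""
    # Membership tests on a frozenset take constant time on average.
    return [word for word in words if word.lower() not in stop_words]

filtered = remove_stop_words(["the", "extraction", "of", "keywords", "is", "fast"])
```

Storing the table as a set keeps each lookup cheap, which matters when the step runs over every word of a long text.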

Abstract

The invention discloses a keyword extraction method for extracting keywords from a passage of text. The method comprises the following steps: word segmentation, in which the passage is segmented into words; stop word elimination, in which stop words are removed from the words obtained in the segmentation step; word feature reduction, in which the words remaining after stop word removal are reduced to their stem prototypes using a stemming algorithm; and keyword determination, in which the frequency of occurrence of each stem prototype in the passage is determined and the keywords are selected based on that frequency. With this method, keywords can be extracted from a passage of text quickly and accurately, so that the keywords extracted by a computer approximate as closely as possible those a human reader would identify.
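The last two steps of the abstract, stem reduction and frequency-based selection, can be sketched in Python as follows. The source does not name its stemming algorithm or its selection rule, so `naive_stem` (a toy suffix stripper, not the Porter algorithm) and the top-k rule in `top_keywords` are assumptions for illustration only.

```python
from collections import Counter

# Illustrative suffix list only; this is not the patent's stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_keywords(words, k=3):
    """Count stem frequencies and return the k most frequent stems.

    Picking the top k is an assumed selection rule; the source only says
    that keywords are determined based on frequency.
    """
    counts = Counter(naive_stem(word.lower()) for word in words)
    return [stem for stem, _ in counts.most_common(k)]

sample = ["keywords", "keyword", "extracting", "extracted", "extracts", "method"]
keywords = top_keywords(sample, k=2)
```

Counting stems rather than surface forms lets "extracting", "extracted", and "extracts" contribute to a single total, which is the purpose of the reduction step.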

Description

Technical field

[0001] The invention relates to the technical field of data retrieval, and in particular to a method for extracting keywords.

Background technique

[0002] A keyword is a term used when building and using an index, and keyword search is one of the main methods of web search. For example, the title of a book or part of the title, its subtitle, and its author's name can all be used as keywords for retrieval. Most book searches and online searches now take the form of keyword searches. Keywords can be people's names, websites, news, novels, software, games, constellations, work, shopping, papers, and so on. For example, you can search for keywords such as "windows", "World Expo", or "NBA basketball", and a query may consist of one, two, three, or four words, or even a whole sentence. For example, you can search for "landscapes", "mp3 downloads", or "suddenly looking back, that person is in a dimly lit place".

[0003] In short, keywords are the main content of an art...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 王宵栋, 张丽晓
Owner: SANDA UNIVERSITY