A text keyword technique, applied in the field of keyword extraction, that addresses problems such as low efficiency.
Status: Inactive
Publication Date: 2012-07-04
SANDA UNIVERSITY
0 Cites, 14 Cited by
AI Technical Summary
Problems solved by technology
Originally, keywords were extracted manually: a person read the text and then summarized its keywords. This approach is highly accurate but very inefficient.
Embodiment Construction
[0032] Referring to Figure 1, a keyword extraction method according to the present invention is disclosed. The method extracts keywords from a piece of text and includes the following steps:
[0033] S10. Word segmentation step: divide a piece of text into words. In one embodiment, the word segmentation step includes identifying spaces, punctuation marks, and character strings in the text as tokens for word segmentation. Word segmentation is the process of programmatically dividing a piece of text into words. For English text, word segmentation is relatively simple: extracting the strings that lie between spaces or punctuation marks yields preliminary words.
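The English-text case described in S10 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_words`, the regular expression, and the decision to lowercase each word are all assumptions made for the example.

```python
import re

# Hypothetical pattern: a word is a run of letters, optionally followed by an
# apostrophe and more letters (as in "don't"). Spaces and punctuation are
# treated purely as delimiters and are not returned.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def segment_words(text: str) -> list[str]:
    """Split English text into lowercase preliminary words."""
    return [match.group(0).lower() for match in WORD_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(segment_words("Keywords, in short, are the main content of an article."))
```

Lowercasing at this stage is a design choice for the sketch: it lets later steps compare words without regard to capitalization.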
[0034] S11. Stop word elimination step: remove stop words from the words produced by the word segmentation step. In one embodiment, the stop word elimination step includes looking up a stop word table, and among the words divided in the word segmenta...
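A stop word table lookup of the kind S11 describes might look like the sketch below. The paragraph above is truncated in the source, so the exact rule is not shown; this example assumes that any word found in the table is simply dropped. The table contents and the name `remove_stop_words` are hypothetical.

```python
# Hypothetical, deliberately tiny stop word table; a practical table would
# contain many more entries.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "is", "are", "and", "to"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words not found in the stop word table, keeping their order."""
    # Comparison is case-insensitive (an assumption of this sketch).
    return [word for word in words if word.lower() not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words(["the", "extraction", "of", "keywords"]))
```

Storing the table as a set keeps each lookup at roughly constant cost, which matters when the input text is long.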
Abstract
The invention discloses a keyword extraction method for extracting keywords from a piece of text. The method comprises the following steps: word segmentation, in which the text is segmented into words; stop word elimination, in which stop words are removed from the words obtained in the word segmentation step; word feature reduction, in which the words remaining after stop word removal are reduced to their stem prototypes using a stemming algorithm; and keyword determination, in which the frequency of occurrence of each stem prototype in the text is determined and the keywords are determined based on that frequency. With this method, keywords can be extracted from a piece of text rapidly and accurately, so that the keywords extracted by a computer approximate as closely as possible those a human reader would identify.
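The four steps in the abstract can be strung together as in the following sketch. It is illustrative only: the source does not name a specific stemming algorithm, so `naive_stem` is a crude suffix-stripping stand-in, and the stop word table, the suffix list, and the `top_n` parameter are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop word table and suffix list, kept tiny for illustration.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "is", "are", "and", "to"})
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix; a stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text, top_n=3):
    """Return the top_n (stem, frequency) pairs for the given text."""
    words = re.findall(r"[a-z]+", text.lower())          # word segmentation
    content = [w for w in words if w not in STOP_WORDS]  # stop word elimination
    stems = [naive_stem(w) for w in content]             # word feature reduction
    return Counter(stems).most_common(top_n)             # keyword determination


if __name__ == "__main__":
    sample = "The index indexes indexed pages. Searching the index finds pages."
    print(extract_keywords(sample, top_n=2))
```

Because the stand-in stemmer only strips suffixes, the stems it produces are not always dictionary words; they serve only to group variants of the same word before counting.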
Description
Technical field
[0001] The invention relates to the technical field of data retrieval, and in particular to a method for extracting keywords.
Background
[0002] A keyword is a term used when building and using an index, and keyword search is one of the main methods of web search indexing. For example, the title or part of the title, the subtitle, and the author name of a book can all be used as keywords for retrieval. Most book and online searches today take the form of keyword searches. Keywords can cover people's names, websites, news, novels, software, games, constellations, work, shopping, papers, and so on. For example, you can search for keywords such as "windows", "World Expo", or "NBA basketball", and a query can consist of one, two, three, or four words, or even a whole sentence. For example, you can search for "landscapes," "mp3 downloads," and "suddenly looking back, that person is in a dimly lit place."
[0003] In short, keywords are the main content of an art...