
Method and device for extracting keywords

A keyword extraction and lexical analysis technology, applied in special data processing applications, instruments, electrical digital data processing, and similar fields, that addresses problems such as low keyword accuracy and poor keyword extraction from text.

Active Publication Date: 2012-11-14
CHINASO INFORMATION TECH
0 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, with existing keyword extraction techniques, many words with a high word frequency are not necessarily keywords. The TF-IDF method is mainly used to select index terms in search engines and is less effective when applied to extracting keywords from text. As a result, the accuracy of the identified keywords is low.



Examples


Embodiment 1

[0027] An embodiment of the present invention provides a method for extracting keywords. As shown in Figure 1, the method includes:

[0028] Step 101, obtaining the word set after lexical analysis and preprocessing of the text;

[0029] Optionally, the text is segmented and part-of-speech tagged. For example, the sentence "Materialism - every view that admits that existence, that is, matter, is primary and original, while thinking is secondary, derived from and attached to the existence of matter, is materialism." is segmented and tagged as: materialism/n -/w everything/d admits/v exists/v exists/p is/v matter/n is/v primary/n ,/w is/v primitive/n ,/w and/c thinking/n is/v secondary/n ,/w is/v is derived from/v comes out/v attaches/v to/p matter/n exists/v ’s/u is/d is/v materialism/n ./w, where n represents a noun, w represents a punctuation mark, d represents an adverb, v represents a verb, and p represents a preposition.
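Tagged output in this word/tag form can be turned into a word set for Step 101. The following Python sketch is only an illustration: the function names `parse_tagged` and `build_word_set`, the choice to keep nouns and verbs, and the small stop-word list are assumptions made for the example, since the text available here does not state the preprocessing rules.

```python
def parse_tagged(tokens):
    """Split 'word/tag' strings into (word, tag) pairs."""
    pairs = []
    for tok in tokens:
        # rpartition splits on the LAST slash, so a token like '//w' still works.
        word, sep, tag = tok.rpartition("/")
        if sep:  # skip tokens that carry no tag at all
            pairs.append((word, tag))
    return pairs


def build_word_set(pairs, keep_tags=("n", "v"), stop_words=("is",)):
    """Keep distinct words whose tag is in keep_tags (hypothetical rule)."""
    seen = []
    for word, tag in pairs:
        if tag in keep_tags and word not in stop_words and word not in seen:
            seen.append(word)  # preserve first-occurrence order
    return seen


tagged = ["materialism/n", "-/w", "everything/d", "admits/v", "matter/n",
          "is/v", "primary/n", ",/w", "thinking/n", "is/v", "secondary/n", "./w"]
word_set = build_word_set(parse_tagged(tagged))
print(word_set)
```

Punctuation (w) and adverbs (d) are dropped here purely because of the assumed `keep_tags`; a different filter can be passed in without changing the parsing step.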

[0030] Optionally, d...

Embodiment 2

[0047] An embodiment of the present invention provides a method for extracting keywords. As shown in Figure 2, the method includes:

[0048] Step 201, obtaining a word set after lexical analysis and preprocessing of the text;

[0049] Optionally, the text is segmented and part-of-speech tagged. For example, the sentence "Materialism - every view that admits that existence, that is, matter, is primary and original, while thinking is secondary, derived from and attached to the existence of matter, is materialism." is segmented and tagged as: materialism/n -/w everything/d admits/v exists/v exists/p is/v matter/n is/v primary/n ,/w is/v primitive/n ,/w and/c thinking/n is/v secondary/n ,/w is/v is derived from/v comes out/v attaches/v to/p matter/n exists/v ’s/u is/d is/v materialism/n ./w, where n represents a noun, w represents a punctuation mark, d represents an adverb, v represents a verb, and p represents a preposition.

[0050] Optionally, dif...

Embodiment 3

[0099] An embodiment of the present invention provides a device for extracting keywords. As shown in Figure 5, the device includes: an acquisition unit 501, a first processing unit 502, a second processing unit 503, and a keyword determination unit 504;

[0100] An acquisition unit 501, configured to acquire a word set after lexical analysis and preprocessing of the text;

[0101] Optionally, the text is segmented and part-of-speech tagged. For example, the sentence "Materialism - every view that admits that existence, that is, matter, is primary and original, while thinking is secondary, derived from and attached to the existence of matter, is materialism." is segmented and tagged as: materialism/n -/w everything/d admits/v exists/v exists/p is/v matter/n is/v primary/n ,/w is/v primitive/n ,/w and/c thinking/n is/v secondary/n ,/w is/v is derived from/v comes out/v attaches/v to/p matter/n exists/v ’s/u is/d is/v materialism/n ./w, where n r...


Abstract

The invention discloses a method and a device for extracting keywords, belonging to the field of natural language processing, which can improve the accuracy of the determined keywords. The method comprises the steps of: obtaining a word set after lexical analysis and preprocessing of a text; determining the semantic similarity of any two words in the word set according to the word set and the semantic relations of those words in the text; calculating a comprehensive measure for each word in the word set according to the determined semantic similarity; and determining the keywords according to the comprehensive measures of the words. The scheme is suitable for extracting keywords.
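The four steps of the abstract can be laid out as a small pipeline. The Python sketch below shows only the overall shape and is not the patented computation: the abstract does not give the similarity formula or the definition of the comprehensive measure, so this example substitutes Jaccard overlap of sentence co-occurrence for semantic similarity and a sum of pairwise similarities for the comprehensive measure. All function names are hypothetical.

```python
from itertools import combinations


def similarity(w1, w2, sentences):
    """Stand-in similarity: Jaccard overlap of the sentences containing each word."""
    s1 = {i for i, s in enumerate(sentences) if w1 in s}
    s2 = {i for i, s in enumerate(sentences) if w2 in s}
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def comprehensive_measure(word_set, sentences):
    """Stand-in measure: each word's summed similarity to every other word."""
    scores = {w: 0.0 for w in word_set}
    for a, b in combinations(word_set, 2):
        sim = similarity(a, b, sentences)
        scores[a] += sim
        scores[b] += sim
    return scores


def extract_keywords(word_set, sentences, k):
    """Rank words by the measure (ties broken alphabetically) and keep the top k."""
    scores = comprehensive_measure(word_set, sentences)
    return sorted(word_set, key=lambda w: (-scores[w], w))[:k]


# Toy input: a preprocessed word set plus the text as lists of words per sentence.
sentences = [["matter", "primary"],
             ["thinking", "secondary"],
             ["matter", "thinking", "materialism"]]
word_set = ["materialism", "matter", "primary", "secondary", "thinking"]
print(extract_keywords(word_set, sentences, 2))
```

Unlike pure frequency ranking, a measure of this kind rewards words that are related to many other words in the text; any other similarity function can be swapped in for `similarity` without changing the rest of the pipeline.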

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a method and device for extracting keywords.

Background technique

[0002] At present, keywords are usually extracted from texts with statistical methods: statistics are computed for the factors that have an important influence on keywords, and the statistical results are then sorted to determine a set of candidate keywords.

[0003] For example, characteristic statistical information such as word frequency and TF-IDF (term frequency-inverse document frequency) may be used. When keywords are extracted based on word frequency, they are extracted according to the rule that the higher the word frequency of a word, the greater the probability that the word is a keyword. First, the word frequency of each word in the text is counted, and then the counted word frequency is sorted, and several words with th...
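The two statistical baselines named in the background, word frequency and TF-IDF, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, is one common variant chosen for the example, not a formula taken from the patent.

```python
import math
from collections import Counter


def top_by_frequency(words, k):
    """Rank words by raw count (ties broken alphabetically) and keep the top k."""
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def tf_idf(doc_words, corpus):
    """Score each word of one document against a corpus of tokenized documents."""
    counts = Counter(doc_words)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_words)
        df = sum(1 for doc in corpus if word in doc)  # documents containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # assumed smoothed variant
        scores[word] = tf * idf
    return scores


doc = ["matter", "is", "primary", "thinking", "is", "secondary", "matter"]
corpus = [doc, ["thinking", "is", "hard"], ["the", "cat", "is", "here"]]

print(top_by_frequency(doc, 2))
scores = tf_idf(doc, corpus)
print(scores["matter"] > scores["is"])
```

In this toy corpus "is" and "matter" have the same word frequency, so frequency ranking alone cannot separate them, while TF-IDF lowers the score of "is" because it occurs in every document.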


Application Information

IPC(8): G06F17/27
Inventor: 翟周伟
Owner: CHINASO INFORMATION TECH