Keyword extracting method and device

A keyword extraction method and device, applied in the field of keyword extraction, which address the problem of high redundancy among extracted keywords by eliminating redundant semantic information.

Active Publication Date: 2017-04-26
NEUSOFT CORP
Cites: 3; Cited by: 14

AI Technical Summary

Problems solved by technology

[0005] In view of this, the present invention provides a keyword extraction method and device...



Embodiment Construction

[0054] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0055] An embodiment of the present invention provides a keyword extraction method. As shown in Figure 1, the method includes:

[0056] 101. Acquire word vectors corresponding to each word in the target document.

[0057] Specifically, the word vector corresponding to each word in the target document is obtained as follows: first segment the target document, then filter out meaningless words and stop word...
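Step 101 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not say which segmenter or embedding model is used, so the sketch substitutes a regular-expression tokenizer for real word segmentation (which for Chinese text would need a dedicated segmenter), treats digits and punctuation as the "meaningless words" it drops, and looks vectors up in a toy `embeddings` dictionary instead of a trained model. `STOP_WORDS`, `segment`, and `word_vectors` are hypothetical names.

```python
import re

# Hypothetical stop-word list; a real system would use a curated list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def segment(text):
    """Split text into lowercase alphabetic tokens.

    Stand-in for real word segmentation: digits and punctuation are
    discarded because the pattern only matches runs of letters.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_vectors(text, embeddings):
    """Return {word: vector} for each distinct non-stop word with a known vector."""
    vectors = {}
    for word in segment(text):
        if word in STOP_WORDS or word not in embeddings:
            continue  # drop stop words and words the (toy) model has no vector for
        vectors[word] = embeddings[word]
    return vectors


# Usage with a hypothetical 2-dimensional embedding table.
emb = {"employee": [1.0, 0.0], "staff": [0.9, 0.1], "salary": [0.0, 1.0]}
print(word_vectors("The employee and the staff discuss the salary", emb))
# {'employee': [1.0, 0.0], 'staff': [0.9, 0.1], 'salary': [0.0, 1.0]}
```

Returning a dictionary keeps one vector per distinct word, which is what the later clustering step needs; word counts for weighting would be taken separately from the token list.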



Abstract

The invention discloses a keyword extraction method and device, and relates to the technical field of text processing. It mainly aims to solve the problem of high redundancy among extracted keywords. The main technical scheme is as follows: acquire a word vector corresponding to each word in a target document; cluster the vectorized words according to a preset clustering algorithm to obtain clusters, where the words within a cluster have the same or similar semantics; compute a weight value for each word in each cluster; and determine the word with the highest weight value in each cluster as a keyword of the target document. The disclosed method and device are mainly used for extracting keywords from a target document.
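The four steps of the abstract can be sketched as below. The abstract names only "a preset clustering algorithm" and does not define the weight value, so this sketch makes two assumptions: it swaps in a simple greedy leader clustering on cosine similarity (with a hypothetical `threshold` of 0.9), and it uses each word's term frequency in the target document as its weight. `cosine`, `cluster_words`, and `extract_keywords` are hypothetical names, and the vectors are toy data.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.9):
    """Greedy leader clustering (an assumed stand-in for the preset algorithm).

    A word joins the first cluster whose leader (first member) is at least
    `threshold` cosine-similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def extract_keywords(tokens, vectors, threshold=0.9):
    """Return the highest-weighted word of each semantic cluster.

    Weight = term frequency of the word in `tokens` (an assumption).
    """
    weights = Counter(t for t in tokens if t in vectors)
    clusters = cluster_words(vectors, threshold)
    return [max(cluster, key=lambda w: weights[w]) for cluster in clusters]


# Usage: three near-synonyms share one cluster, so only one of them is kept.
vectors = {"employee": [1.0, 0.0], "staff": [0.95, 0.05],
           "worker": [0.9, 0.1], "salary": [0.0, 1.0]}
tokens = ["employee", "staff", "staff", "worker", "salary", "salary", "salary"]
print(extract_keywords(tokens, vectors))  # ['staff', 'salary']
```

Because "employee", "staff", and "worker" fall into one cluster, only the most frequent of them ("staff") is returned, which is the redundancy reduction the abstract describes. Leader clustering is chosen here only because it needs no preset number of clusters; k-means or hierarchical clustering would fit the same slot.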

Description

Technical Field
[0001] The invention relates to the technical field of text processing, and in particular to a keyword extraction method and device.

Background
[0002] Keyword extraction is the task of extracting, from a given text, the words or phrases that reflect its main information. It plays an important role in automatic summarization, text mining, and information retrieval, and is a key method for automatic labeling in particular.

[0003] At present, keywords are mainly obtained from a target document by statistics-based extraction methods. For example, the Average Term Frequency * Proportional Document Frequency (ATF*PDF) method extracts keywords according to a term's average term frequency (ATF) across the entire document and its proportional document frequency (PDF).

[0004] However, if there are multiple synonyms with high word frequency in the target document, such as "employee", "pers...
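To make the redundancy problem of [0003] and [0004] concrete, the sketch below ranks words with a purely statistical score. The excerpt does not give the ATF*PDF formula, so this is a simplified stand-in, not the published definition: it assumes ATF(w) is the total count of w divided by the number of documents containing w, and multiplies it by an exponential document-frequency factor exp(df/N), loosely modeled on TF*PDF. The corpus and the names `atf_pdf_scores` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def atf_pdf_scores(docs):
    """Score each word as ATF * exp(df / N) over a list of token lists.

    Assumed definitions (not taken from the source):
      ATF(w) = total occurrences of w / number of documents containing w
      PDF(w) = exp(number of documents containing w / total documents)
    """
    n_docs = len(docs)
    total = Counter()  # occurrences of each word across all documents
    df = Counter()     # number of documents each word appears in
    for tokens in docs:
        total.update(tokens)
        df.update(set(tokens))
    return {w: (total[w] / df[w]) * math.exp(df[w] / n_docs) for w in total}


def top_k(scores, k):
    """Return the k highest-scoring words (ties keep first-seen order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Usage: two near-synonyms dominate the ranking and take both keyword slots.
docs = [
    ["employee", "staff", "salary", "employee", "staff"],
    ["employee", "staff", "office"],
    ["staff", "employee", "meeting"],
]
print(top_k(atf_pdf_scores(docs), 2))  # ['employee', 'staff']
```

A frequency-based score has no notion of meaning, so "employee" and "staff" both rank highest and the two keyword slots carry essentially the same information; this is the kind of redundancy the clustering step in the abstract is meant to remove.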


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 王伟
Owner: NEUSOFT CORP