Keyword extraction method and apparatus

A keyword extraction method, applied in the video field, addresses problems such as insufficient comprehensiveness and the inability to reflect the position information of words, and improves the accuracy of keyword extraction.

Inactive. Publication date: 2016-08-24
LETV INFORMATION TECH BEIJING

AI Technical Summary

Problems solved by technology

The advantage is that it is simple and fast; the disadvantage is also obvious, the simple calculation o


Examples


Embodiment 1

[0027] Figure 1 is a technical flow chart of Embodiment 1 of the present invention. With reference to Figure 1, a keyword extraction method in this embodiment mainly includes the following steps:

[0028] Step 110: using a tokenizer to segment the text to obtain words, and filtering the words to obtain candidate keywords;

[0029] In this embodiment, an existing word segmenter (tokenizer) divides the collected text into individual words and provides the part of speech of each word. The word segmenter may be based on a dictionary matching algorithm, on thesaurus matching, on word frequency statistics, or on knowledge understanding, among others; the embodiments of the present invention do not limit this.

[0030] After the word is obtained by the tokenizer, the word needs to be further processed, such as filtering the word f...
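The filtering part of Step 110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the tokenizer returns (word, part-of-speech) pairs, and the tag set `KEPT_POS`, the `BLACKLIST` contents, and the function name `filter_candidates` are all hypothetical.

```python
# Hypothetical settings: the excerpt does not list the actual tags or blacklist.
KEPT_POS = {"n", "v", "a"}       # assumed: keep nouns, verbs, adjectives
BLACKLIST = {"movie", "film"}    # assumed: preset blacklist of uninformative words


def filter_candidates(tagged_words, kept_pos=KEPT_POS, blacklist=BLACKLIST):
    """Keep words whose POS tag is allowed and that are not blacklisted.

    tagged_words is a list of (word, pos) pairs as produced by a tokenizer.
    Order is preserved and duplicates are kept.
    """
    return [w for w, pos in tagged_words if pos in kept_pos and w not in blacklist]


tagged = [("the", "x"), ("plot", "n"), ("is", "x"), ("movie", "n"), ("gripping", "a")]
candidates = filter_candidates(tagged)
```

Here `candidates` keeps "plot" and "gripping": "the" and "is" are dropped by their tag, and "movie" is dropped by the blacklist.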

Embodiment 2

[0047] Figure 2 is a technical flow chart of Embodiment 2 of the present invention. With reference to Figure 2, a keyword extraction method in this embodiment can be further refined into the following steps:

[0048] Step 210: use the tokenizer to segment the text to obtain each word and its part of speech;

[0049] In this embodiment, the text may be segmented into words using any one of the following existing word segmentation methods, or any combination of them.

[0050] The word segmenter based on the dictionary matching algorithm uses dictionary matching, Chinese morphology, or other knowledge of the Chinese language to perform word segmentation, for example the maximum matching method or the minimum word segmentation method. The word segmenter based on thesaurus matching relies on statistical information about characters and words, such as the information between adjacent words, word frequency and correspond...
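As an illustration of the maximum matching method named above, the following is a minimal sketch of forward maximum matching. The toy dictionary, the `max_len` limit, and the function name `forward_max_match` are assumptions made for the example; the patent does not prescribe a particular variant.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest dictionary entry (up to max_len
    characters) that matches; fall back to a single character if none does.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Toy dictionary, assumed for illustration only.
toy_dictionary = {"关键词", "关键", "提取", "方法"}
segments = forward_max_match("关键词提取方法", toy_dictionary)
```

Because the longest match wins, the segmenter prefers "关键词" over the shorter entry "关键" at the start of the string.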

Embodiment 3

[0086] Figure 3 is a technical flow chart of Embodiment 3 of the present invention. With reference to Figure 3, a keyword extraction device of the present invention mainly includes a candidate keyword acquisition module 310, a similarity calculation module 320, an inverse document frequency calculation module 330, and a keyword extraction module 340.

[0087] The candidate keyword acquisition module 310 is used to segment the text using a tokenizer to obtain each word and its part of speech, and to perform stop-word filtering on the words according to part of speech and a preset blacklist to obtain candidate keywords;

[0088] The similarity calculation module 320 is used to calculate the similarity between any two candidate keywords;

[0089] The inverse document frequency calculation module 330 is configured to use the TextRank formula to iteratively calculate the weight of each candidate keyword according to the similarity, and calculate the inverse document frequency...
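The iterative weight calculation attributed to modules 320 and 330 can be sketched with the standard weighted TextRank update, using pairwise similarity as the edge weight. This is a sketch under stated assumptions: the excerpt does not show how similarity is computed, so the table `toy_sim` is invented, and the damping factor 0.85 and the fixed iteration count are conventional choices rather than values from the patent.

```python
def textrank_weights(words, sim, damping=0.85, iterations=50):
    """Weighted TextRank on a graph whose edge weights are similarities.

    sim(a, b) must be symmetric and non-negative. Each node passes its
    score to its neighbours in proportion to the edge weight.
    """
    # Total edge weight leaving each node, used to normalize contributions.
    out_sum = {a: sum(sim(a, b) for b in words if b != a) for a in words}
    score = {w: 1.0 for w in words}
    for _ in range(iterations):
        new_score = {}
        for a in words:
            incoming = sum(
                sim(b, a) / out_sum[b] * score[b]
                for b in words
                if b != a and out_sum[b] > 0
            )
            new_score[a] = (1 - damping) + damping * incoming
        score = new_score
    return score


# Invented similarity values, for illustration only.
toy_sim = {
    frozenset(("plot", "story")): 0.9,
    frozenset(("plot", "actor")): 0.3,
    frozenset(("story", "actor")): 0.2,
}


def lookup(a, b):
    return toy_sim.get(frozenset((a, b)), 0.0)


weights = textrank_weights(["plot", "story", "actor"], lookup)
```

Words that are strongly similar to many other candidates accumulate more weight; in this toy graph "actor" ends up with the lowest score because its edges are the weakest.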


Abstract

Embodiments of the invention provide a keyword extraction method and apparatus. The method comprises the steps of performing word segmentation on a text by utilizing a word segmentation device to obtain words and filtering the words to obtain candidate keywords; calculating the similarity between any two candidate keywords; according to the similarity, calculating a weight of each candidate keyword, and according to a preset corpus, calculating an inverse document frequency of the candidate keyword; and according to the weight and the inverse document frequency of the candidate keyword, obtaining a key degree of the candidate keyword, and according to the key degree of the candidate keyword, selecting a keyword. Therefore, the accuracy of keyword extraction is improved.
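The last two steps of the abstract, computing an inverse document frequency from a preset corpus and combining it with the weight into a key degree, might look like the sketch below. Two points are assumptions, since the excerpt does not give the formulas: the IDF uses the common smoothed form log(N / (1 + df)), and the key degree is taken as the product of weight and IDF. The corpus, weights, and function names are hypothetical.

```python
import math


def inverse_document_frequency(word, corpus):
    """Smoothed IDF, log(N / (1 + df)), over a corpus of word sets.

    Assumed form; the patent's exact formula is not shown in this excerpt.
    """
    df = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / (1 + df))


def select_keywords(weights, corpus, top_k=2):
    """Assumed combination: key degree = weight * IDF; return the top_k words."""
    key_degree = {
        w: weights[w] * inverse_document_frequency(w, corpus) for w in weights
    }
    return sorted(key_degree, key=key_degree.get, reverse=True)[:top_k]


# Toy corpus and weights, invented for illustration.
toy_corpus = [{"plot", "actor"}, {"actor", "music"}, {"actor", "story"}, {"music"}]
toy_weights = {"plot": 1.2, "story": 1.1, "actor": 0.7}
keywords = select_keywords(toy_weights, toy_corpus)
```

In this toy run "actor" appears in three of the four documents, so its IDF is zero and it is not selected despite having a non-trivial weight.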

Description

Technical field

[0001] Embodiments of the present invention relate to the field of video technology, and in particular to a keyword extraction method and device.

Background

[0002] With the continuous development of information technology, a large amount of text now exists in computer-readable form, and information in many fields is growing explosively, for example movie reviews and short reviews on Douban. Quickly and accurately extracting useful information from this mass of data is an important technical requirement, and keyword extraction is an effective means of meeting it. Keywords condense the main information of an article, so readers can grasp important information faster and access information more efficiently.

[0003] There are roughly two methods of keyword extraction. The first is called keyword assignment: a keyword library is given, and then an article is found to find ...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/24575, G06F16/31, G06F16/7867, G06F40/211, G06F40/253, G06F40/30, G06F16/00
Inventor 赵九龙
Owner LETV INFORMATION TECH BEIJING