Key phrase extraction method and device

A key phrase extraction technology applied in the field of text processing. It addresses the problems of inaccurate key phrase extraction and low precision, and improves the precision and accuracy of extraction.

Active Publication Date: 2018-01-12
BEIJING QIYI CENTURY SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a key phrase extraction method and device to solve the problems of inaccurate key phrase extraction and low precision.

Examples

Embodiment 1

[0054] Refer to Figure 1, which shows the flow chart of the key phrase extraction method in Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:

[0055] Step 101, preprocessing the text to obtain multiple word segments.

[0056] The text in this embodiment of the present invention is the text from which key phrases are to be extracted. For example, it may be a video title on a video website, or article data. The text may be in a commonly used format such as Word or PDF; this embodiment of the present invention does not limit this. A word is the smallest meaningful language unit that can be used independently, whereas Chinese uses characters as its basic writing unit, so there are no obvious delimiters between words. Therefore, when the text is Chinese, it must be preprocessed to determine its word segments. By preprocessing the text to obtain ...

Embodiment 2

[0066] Refer to Figure 2, which shows the flow chart of the key phrase extraction method in Embodiment 2 of the present invention. As shown in Figure 2, the method may include the following steps:

[0067] Step 201, preprocessing the text to obtain multiple word segments.

[0068] In this embodiment of the present invention, preprocessing the text may mean segmenting it according to a certain principle. For example, word segmentation can use a common word segmentation database, such as a common dictionary, and traverse it word by word: each word in the database is matched against the text in order. If a match succeeds, the current word is determined to be a word segment of the text, and so on, until every word in the database has been matched once and the multiple word segments of the text have been determined.
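The dictionary-based segmentation described in [0068] can be illustrated with a minimal sketch. The code below uses forward maximum matching, a common dictionary-based technique that is swapped in here for illustration and is not necessarily the exact traversal the patent describes. The function name `segment`, the `max_len` parameter, and the sample dictionary are all assumptions.

```python
def segment(text, dictionary, max_len=None):
    """Split text into word segments by forward maximum matching.

    Hypothetical helper: at each position, take the longest dictionary
    word that starts there; fall back to a single character if none match.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Assumed sample dictionary; a real system would load a much larger one.
sample_dictionary = {"视频", "标题", "推荐"}
segments = segment("看视频标题推荐", sample_dictionary)
```

Characters not covered by the dictionary (here "看") are emitted as single-character segments, so the function always consumes the whole text.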

[0069] In specific implem...

Embodiment 3

[0111] Refer to Figure 3, which shows a block diagram of a key phrase extraction device in Embodiment 3 of the present invention. As shown in Figure 3, the device 30 may include:

[0112] A preprocessing module 301, configured to preprocess the text to obtain multiple word segments;

[0113] A combination module 302, configured to combine every two adjacent word segments in the plurality of word segments to obtain a plurality of word pairs;

[0114] A first determination module 303, configured to determine the co-occurrence information of each word pair in the plurality of word pairs through a preset word collocation feature table;

[0115] A second determination module 304, configured to determine key phrases of the text according to the co-occurrence information of each word pair.
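The flow through modules 302 to 304 can be sketched roughly as follows. This is an illustrative sketch only: it assumes the preset word collocation feature table is a mapping from a word pair to a numeric co-occurrence score, and that a pair becomes a key phrase when its score reaches a threshold. Neither the table format nor the threshold rule is specified in the text shown here, and all function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

WordPair = Tuple[str, str]


def combine_adjacent(tokens: List[str]) -> List[WordPair]:
    """Module 302 (sketch): pair every two adjacent word segments."""
    return list(zip(tokens, tokens[1:]))


def lookup_cooccurrence(
    pairs: List[WordPair], table: Dict[WordPair, float]
) -> Dict[WordPair, float]:
    """Module 303 (sketch): read each pair's score from the assumed table.

    Pairs missing from the table get 0.0 (an assumption, not from the source).
    """
    return {pair: table.get(pair, 0.0) for pair in pairs}


def extract_key_phrases(
    tokens: List[str],
    table: Dict[WordPair, float],
    threshold: float = 0.5,
    joiner: str = "",
) -> List[str]:
    """Module 304 (sketch): keep pairs whose score meets the assumed threshold."""
    pairs = combine_adjacent(tokens)
    scores = lookup_cooccurrence(pairs, table)
    seen = set()
    phrases = []
    for pair in pairs:
        if scores[pair] >= threshold and pair not in seen:
            seen.add(pair)  # report each phrase once, in text order
            phrases.append(joiner.join(pair))
    return phrases


# Assumed toy table; real scores would come from the preset feature table.
toy_table = {("machine", "learning"): 0.9, ("is", "fun"): 0.1}
toy_tokens = ["machine", "learning", "is", "fun"]
phrases = extract_key_phrases(toy_tokens, toy_table, threshold=0.5, joiner=" ")
```

The `joiner` parameter defaults to an empty string because Chinese word segments are written without spaces; the toy example passes a space for English tokens.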

[0116] In summary, in the key phrase extraction device provided by Embodiment 3 of the present invention, when determining key phrases, the first determination module can determine...

Abstract

The invention provides a key phrase extraction method and device and relates to the field of text processing technology. According to the key phrase extraction method and device, when a key phrase is determined, co-occurrence information of a word pair can be determined, and the key phrase in a text can be determined according to the co-occurrence information of the word pair. The co-occurrence information represents the relationship between the segmented words forming the word pair, and phrases mostly have the characteristics of set phrases or proper nouns. Using the co-occurrence information as the basis for determining the key phrase therefore improves the accuracy and precision of key phrase extraction.

Description

Technical field

[0001] The invention relates to the technical field of text processing, in particular to a key phrase extraction method and device.

Background

[0002] To make browsing more efficient, it is usually necessary to determine the key information in a text so that the text can be expressed concisely. For example, when recommending a video, phrases or words in the title of the video are usually extracted as recommended content to succinctly represent the content of the video. As phrases are used more and more, how to automatically extract key phrases has become a research hotspot.

[0003] In the prior art, key phrases are extracted according to grammatical rules. Usually, word segments are combined, and a combination that satisfies specific grammatical rules, for example part-of-speech sequence requirements, is determined to be a key phrase.

[0004] Since phrases are generally proper nouns or fixed collocations, in the prior art only sati...

Application Information

IPC(8): G06F17/27
Inventor: 王亮
Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD