
A keyword extraction method and device

A keyword extraction technology built around a keyword set acquisition module, applied in the fields of unstructured text data retrieval, instrumentation, and computing. It addresses the difficulty of obtaining high-quality labeled data and the negative impact of low-quality labels on keyword extraction quality. Its effects are to avoid that negative impact, improve labeling quality, and prevent mislabeling.

Active Publication Date: 2021-05-25
BEIJING QIYI CENTURY SCI & TECH CO LTD
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

However, for supervised methods, the quality of the labeled data directly determines the final extraction result. Because manual labeling is inherently subjective, it is very difficult to obtain a sufficient amount of high-quality labeled data. In this case, lower-quality labeled data has a greater negative impact on the quality of keyword extraction.

Method used

Obtain a keyword set of manually labeled keywords, calculate the acceptance rate of each keyword, process the keyword set according to each keyword's frequency of occurrence and acceptance rate to obtain a set of keywords to be supplemented, and perform supplementary labeling on those keywords.


Examples


Embodiment 1

[0038] Figure 1 is a flowchart of the steps of a keyword extraction method provided by an embodiment of the present invention.

[0039] As shown in Figure 1, the keyword extraction method provided in this embodiment is applied to a natural language processing system to extract keywords from text, and specifically includes the following steps:

[0040] S101: Obtain a keyword set.

[0041] The keyword set includes multiple manually labeled keywords and is obtained as follows:

[0042] First, obtain the manually labeled data set; then derive the keyword set by compiling statistics over that data set. Here, "statistics" means classifying and recording the manually labeled keywords in the manually labeled data set.
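Step S101 can be illustrated with a minimal sketch. It assumes a hypothetical input format of `(text, labeled_keywords)` pairs and interprets "classifying and recording" as grouping labels by keyword and counting how many texts carry each one. The function name `build_keyword_set` is illustrative, not taken from the patent.

```python
from collections import Counter
from typing import Iterable

# Hypothetical input format: (text, manually_labeled_keywords) pairs.
LabeledData = Iterable[tuple[str, list[str]]]


def build_keyword_set(labeled_data: LabeledData) -> Counter:
    """Group the manually labeled keywords and record how many texts use each."""
    keyword_counts: Counter = Counter()
    for _text, keywords in labeled_data:
        # set() counts a keyword once per text even if it was labeled twice.
        keyword_counts.update(set(keywords))
    return keyword_counts


data = [
    ("the movie stars a famous actor", ["movie", "actor"]),
    ("a new movie trailer was released", ["movie", "trailer"]),
]
keyword_set = build_keyword_set(data)
print(sorted(keyword_set.items()))  # [('actor', 1), ('movie', 2), ('trailer', 1)]
```

Using a `Counter` keeps both pieces of information the paragraph mentions: the distinct keywords (its keys) and a per-keyword record (its counts), which a later step can use as the frequency of occurrence.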

[0043] S102: Calculate the acceptance rate of each keyword.

[0044] After obtaining the above keyword set, the acceptance rate of each keyword is ...
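The exact definition of the acceptance rate is cut off in this excerpt. Purely for illustration, the sketch below assumes a hypothetical definition: among the texts in which a keyword appears (in the text or among its labels), the fraction in which annotators actually labeled it. The function name `acceptance_rates` and the substring matching are assumptions.

```python
from collections import Counter


def acceptance_rates(labeled_data: list[tuple[str, list[str]]]) -> dict[str, float]:
    """Hypothetical acceptance rate: of the texts in which a keyword appears
    (in the text or among its labels), the fraction where it was labeled."""
    labeled_counts: Counter = Counter()
    for _text, keywords in labeled_data:
        labeled_counts.update(set(keywords))

    rates: dict[str, float] = {}
    for keyword, labeled in labeled_counts.items():
        # Plain substring matching keeps the sketch short; a real system
        # would match on tokenized text instead.
        occurrences = sum(
            1 for text, labels in labeled_data
            if keyword in text or keyword in labels
        )
        # occurrences >= labeled >= 1, so the division is safe and rate <= 1.
        rates[keyword] = labeled / occurrences
    return rates


data = [
    ("the movie stars a famous actor", ["movie", "actor"]),
    ("a new movie trailer was released", ["movie", "trailer"]),
    ("this movie is long", []),
    ("an actor interview", []),
]
rates = acceptance_rates(data)
print({k: round(v, 3) for k, v in sorted(rates.items())})
# {'actor': 0.5, 'movie': 0.667, 'trailer': 1.0}
```

Under this assumed definition, "movie" appears in three texts but was labeled in only two, so its rate is 2/3; a keyword that annotators rarely accept when it occurs ends up with a low rate.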

Embodiment 2

[0063] Figure 2 is a structural block diagram of a keyword extraction device provided by an embodiment of the present invention.

[0064] As shown in Figure 2, the keyword extraction device provided in this embodiment is applied to a natural language processing system to extract keywords from text, and specifically includes a set acquisition module 10, an acceptance rate calculation module 20, a set processing module 30, and a supplementary labeling module 40.
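The module layout can be sketched as a small pipeline. This is a hypothetical wiring that assumes each module's output feeds the next, in the order of the steps described in the abstract; the class name, field names, and the toy lambdas are illustrative stand-ins, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class KeywordExtractionDevice:
    """Hypothetical wiring of modules 10-40; each module is a plain callable."""

    set_acquisition: Callable[[Any], Any]                   # module 10
    acceptance_rate_calculation: Callable[[Any, Any], Any]  # module 20
    set_processing: Callable[[Any, Any], Any]               # module 30
    supplementary_labeling: Callable[[Any, Any], Any]       # module 40

    def run(self, labeled_data: Any) -> Any:
        keyword_set = self.set_acquisition(labeled_data)
        rates = self.acceptance_rate_calculation(labeled_data, keyword_set)
        to_supplement = self.set_processing(keyword_set, rates)
        return self.supplementary_labeling(labeled_data, to_supplement)


# Toy stand-ins for the four modules, just to show the data flow.
device = KeywordExtractionDevice(
    set_acquisition=lambda data: {"movie": 2},
    acceptance_rate_calculation=lambda data, keyword_set: {k: 1.0 for k in keyword_set},
    set_processing=lambda keyword_set, rates: {k for k in keyword_set if rates[k] > 0.5},
    supplementary_labeling=lambda data, kws: [(text, sorted(kws)) for text, _ in data],
)
result = device.run([("a movie", [])])
print(result)  # [('a movie', ['movie'])]
```

Keeping each module as an injected callable mirrors the block diagram: any one module can be swapped out without touching the others.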

[0065] The set acquisition module is used to obtain the keyword set.

[0066] The keyword set includes a plurality of manually labeled keywords, and the set acquisition module specifically includes a data acquisition unit and a data statistics unit.

[0067] The data acquisition unit is used to obtain the manually labeled data set; the data statistics unit is used to derive the keyword set by compiling statistics over the manually labeled data set.

[0068] The acceptance rate calculation module is ...


Abstract

An embodiment of the present invention provides a keyword extraction method and device applied to natural language processing systems. The method obtains a keyword set containing a plurality of manually labeled keywords; calculates the acceptance rate of each keyword in the keyword set; processes the keyword set according to each keyword's frequency of occurrence and acceptance rate to obtain a set of keywords to be supplemented, which contains multiple keywords to be supplemented; and performs supplementary labeling on those keywords. Supplementary labeling raises the extraction probability of keywords with a high acceptance rate, so that they are not missed, and correspondingly suppresses the extraction of keywords with a low acceptance rate, so that they are not mislabeled. Together, these measures effectively improve the quality of the labeled data and avoid the negative impact of low-quality labeled data on keyword extraction.
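The processing and supplementary labeling steps can be sketched as follows. The abstract does not give the selection rule, so this sketch assumes a hypothetical one: a keyword is "to be supplemented" if its frequency reaches `min_count` and its acceptance rate reaches `min_rate` (both thresholds are made up). Supplementary labeling then adds that keyword as a label in every text that contains it but lacks the label. The sketch covers only the supplementing side, not how low-acceptance keywords are suppressed.

```python
def keywords_to_supplement(
    frequency: dict[str, int],
    rates: dict[str, float],
    min_count: int = 2,      # hypothetical threshold
    min_rate: float = 0.6,   # hypothetical threshold
) -> set[str]:
    """Hypothetical selection rule: frequent keywords with a high acceptance rate."""
    return {
        kw for kw, count in frequency.items()
        if count >= min_count and rates.get(kw, 0.0) >= min_rate
    }


def supplement_labels(
    labeled_data: list[tuple[str, list[str]]],
    to_supplement: set[str],
) -> list[tuple[str, list[str]]]:
    """Add a selected keyword as a label wherever it occurs but was not labeled."""
    updated = []
    for text, labels in labeled_data:
        missing = [kw for kw in sorted(to_supplement) if kw in text and kw not in labels]
        updated.append((text, labels + missing))
    return updated


data = [
    ("the movie stars a famous actor", ["movie", "actor"]),
    ("a new movie trailer was released", ["movie", "trailer"]),
    ("this movie is long", []),
    ("an actor interview", []),
]
frequency = {"movie": 2, "actor": 1, "trailer": 1}
rates = {"movie": 2 / 3, "actor": 0.5, "trailer": 1.0}

selected = keywords_to_supplement(frequency, rates)
print(selected)  # {'movie'}
for text, labels in supplement_labels(data, selected):
    print(text, labels)
```

With these toy numbers, only "movie" is both frequent and well accepted, so the third text, where annotators missed it, gains the label "movie"; "trailer" has a perfect rate but is too rare to be selected.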

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a keyword extraction method and device.

Background art

[0002] Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing combines linguistics, computer science, and mathematics. Because research in this field involves natural language, the language people use every day, it is closely related to linguistics but differs from it in important ways: natural language processing is not the general study of natural language, but the development of computer systems that can communicate effectively in natural language.

[0003] Keyword extraction is an important basic technology of natural la...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F40/284; G06F16/35
CPC: G06F16/35; G06F40/279
Inventor: 王亮 (Wang Liang)
Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD