Short text keyword extraction method and device

A keyword extraction technology, applied in the field of short text keyword extraction methods and devices, that addresses the large differences in proportion between fields and their impact on text mining tasks and on keyword accuracy, with the effect of improving accuracy.

Pending Publication Date: 2022-04-22
电科云(北京)科技有限公司

AI Technical Summary

Problems solved by technology

Secondly, some common background words (such as "tomorrow" and "hehe") appear very frequently on Weibo, which also affects text mining tasks to some extent.
Furthermore, because Weibo limits the length of user posts (for example, to no more than 140 characters), most Weibo texts are short, whether they are original posts or texts written when forwarding or commenting on other users' posts, and this brevity has a great impact on keyword extraction.
In addition, texts that forward or comment on other users' posts are not only short but often lack important information, so their keywords cannot be identified effectively.
Moreover, Weibo texts cover a wide range of fields whose proportions differ greatly, which seriously affects the accuracy of keywords extracted from statistical information.
[0004] Therefore, keyword extraction accuracy urgently needs to be improved for Weibo-like short texts, which are noisy, often lack important information, and cover many fields in very unequal proportions.



Embodiment Construction

[0029] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions serve to explain the present invention, not to limit it.

[0030] To solve the problem in the prior art that keywords are difficult to extract accurately from Weibo-like short texts that cover a wide range of fields in very unequal proportions, an embodiment of the present invention provides a short text keyword extraction method.

[0031] It should be noted in advance that the descriptions of the following embodiments or examples or the features mentioned therein can be combined with the features in other embodiments or examples in the same or similar manner, or replace the feat...


Abstract

The invention provides a short text keyword extraction method and device. The method comprises: splicing the text from which keywords are to be extracted with texts related to its source to obtain a long text; performing topic classification on the long text with a preset topic model to obtain topic classification data; and, based on the topic classification data, calculating word importance scores that take category statistical information into account, to obtain the keywords of the text. The scheme mitigates two problems, namely that short texts are too short and that the dataset is skewed because the proportions of texts in different fields differ greatly, and therefore improves the accuracy of keyword extraction on short texts.
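The abstract does not specify the topic model or the scoring formula, so the following Python sketch only illustrates the shape of the three steps under stated assumptions. The names `splice`, `tokenize`, `classify`, `extract_keywords`, `STOPWORDS`, and `TOPIC_SEEDS` are hypothetical. The seed-word overlap in `classify` is a swapped-in stand-in for a real topic model (such as LDA), and the per-topic IDF in `extract_keywords` is only one possible reading of a "word importance score considering category statistical information", not the patented formula.

```python
import math
from collections import Counter

# Hypothetical stand-ins: the abstract names neither the topic model nor a
# stop-word list, so these small sets exist only to make the sketch runnable.
STOPWORDS = {"a", "the", "for", "of", "what"}
TOPIC_SEEDS = {
    "sports": {"match", "goal", "team", "league"},
    "tech": {"phone", "chip", "app", "release"},
}


def splice(short_text, related_texts):
    """Step 1: join the short text with texts related to its source
    (e.g. the post it forwards or comments on) into one long text."""
    return " ".join([short_text, *related_texts])


def tokenize(text):
    """Assumption: whitespace tokens; Chinese text would need a word segmenter."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def classify(tokens):
    """Step 2 (stand-in for the topic model): choose the topic whose seed
    words occur most often among the tokens."""
    return max(TOPIC_SEEDS, key=lambda t: sum(tok in TOPIC_SEEDS[t] for tok in tokens))


def extract_keywords(short_text, related_texts, corpus_by_topic, k=3):
    """Step 3 (hypothetical score): term frequency in the long text times a
    smoothed IDF computed only over documents of the assigned topic."""
    tokens = tokenize(splice(short_text, related_texts))
    if not tokens:
        return None, []
    topic = classify(tokens)
    docs = [set(tokenize(d)) for d in corpus_by_topic.get(topic, [])]
    n = len(docs)
    df = Counter(w for doc in docs for w in doc)  # per-topic document frequency
    tf = Counter(tokens)
    scores = {
        w: (c / len(tokens)) * (math.log((n + 1) / (df[w] + 1)) + 1)
        for w, c in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return topic, ranked[:k]


# Usage with a toy, already-classified background corpus (invented data).
corpus = {
    "sports": ["team wins match tomorrow", "league goal tomorrow", "tomorrow team news"],
    "tech": ["new phone chip", "app release tomorrow"],
}
topic, top = extract_keywords("goal tomorrow", ["match goal for the team"], corpus)
print(topic, top)  # sports ['goal', 'match', 'team']
```

Because the IDF is computed only within the assigned topic, a small topic's statistics are not swamped by a large one, and a background word such as "tomorrow", which occurs in every sports document of the toy corpus, receives the lowest score of the spliced text's words.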

Description

Technical Field
[0001] The invention relates to the technical field of natural language processing, and in particular to a short text keyword extraction method and device.
Background
[0002] In recent years, Weibo has developed rapidly thanks to its open platform and concise content, and is becoming an important channel and carrier for maintaining social relationships and disseminating information. People can use Weibo to share information in real time. Because Weibo content can be published in real time through various means of communication (such as mobile phones), large volumes of data are generated in a short time. However, these data are usually messy, and it is difficult to obtain information of interest from them in a timely and accurate manner. Extracting keywords from Weibo text is therefore particularly important: accurately extracted keywords can effectively identify the hot words and hot topics of the day. ...


Application Information

Patent type & authority: Application (China)
IPC (8): G06F40/216; G06F40/284; G06F16/33; G06F16/35
CPC: G06F40/216; G06F40/284; G06F16/3335; G06F16/35
Inventors: 汪涛, 张守菊, 黄佳佳, 戴永恒, 刘学谦
Owner: 电科云(北京)科技有限公司