Short text topic mining method based on semantic word network

A short text topic mining technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the insufficient quality of mined topics

Active Publication Date: 2019-08-16
NANJING UNIV
6 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: The technical problem to be solved by the present invention is that when the traditional topic model considers word co-occurrence information i


Examples

Embodiment Construction

[0086] The present invention is further illustrated below with reference to the accompanying drawing and a specific embodiment. It should be understood that these examples only illustrate the invention and are not intended to limit its scope. After reading the present invention, all modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims of the present application.

[0087] Figure 1 is a flowchart of the short text topic mining method based on a semantic word network implemented in the present invention. The specific steps are described as follows:

[0088] Step 0 is the initial state of the present invention;

[0089] During the model initialization phase (steps 1-3):

[0090] Step 1 collects an external corpus from the related field; there is no requirement on the length of its texts;

[0091] Step 2 is to perform...

Abstract

The invention discloses a short text topic mining method based on a semantic word network. The method comprises the following stages: 1) model initialization: collecting external corpora in related fields, preprocessing the corpora, setting parameters, and so on; 2) topic unit construction: constructing a semantic word network, searching for a specific word triangle structure, calculating the model's prior parameters, and so on; 3) model training: sampling the model variables with Gibbs sampling and judging whether the model has reached its convergence condition; and 4) result output: after training finishes, obtaining the topic distribution of each word triangle from the sampling results of each variable, and calculating the topic distribution of each original document. The method combines semantic information learned from an external corpus with the word triangle topic structure and applies it to short text topic mining. Compared with a traditional word pair topic model, it provides a way to integrate external prior knowledge into a traditional topic model, and the quality of the mined topics is remarkably improved.
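The topic unit construction stage can be pictured with a minimal sketch. The abstract does not say how edges in the semantic word network are defined or which triangle structure is searched for, so everything below is an assumption made for illustration: two words are linked when they co-occur in a short text and the cosine similarity of their vectors (standing in for semantics learned from the external corpus) reaches a hypothetical `threshold`, and a word triangle is any three mutually linked words. The function names and the toy vectors are likewise hypothetical.

```python
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_word_network(docs, vectors, threshold=0.5):
    """ASSUMED edge rule: link two words that co-occur in a short text
    and whose vectors have cosine similarity >= threshold."""
    edges = {}
    for doc in docs:
        words = sorted({w for w in doc if w in vectors})
        for a, b in combinations(words, 2):
            if cosine(vectors[a], vectors[b]) >= threshold:
                edges.setdefault(a, set()).add(b)
                edges.setdefault(b, set()).add(a)
    return edges


def find_word_triangles(edges):
    """Return every set of three mutually linked words as a sorted tuple."""
    triangles = set()
    for a in edges:
        for b in edges[a]:
            if b <= a:
                continue
            # A common neighbour of a and b closes the triangle.
            for c in edges[a] & edges[b]:
                if c > b:
                    triangles.add((a, b, c))
    return sorted(triangles)


# Toy word vectors; in practice these would be learned from the external corpus.
vectors = {
    "apple": [1.0, 0.0],
    "fruit": [0.9, 0.1],
    "juice": [0.8, 0.3],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}
docs = [["apple", "fruit", "juice"], ["stock", "market", "apple"]]

edges = build_word_network(docs, vectors)
triangles = find_word_triangles(edges)
print(triangles)
```

In this toy input the second text links only "stock" and "market", because "apple" is dissimilar to both, so it contributes no triangle. This shows how a semantic filter can prune co-occurrences that a plain word pair model would keep.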

Description

Technical field

[0001] The invention relates to a short text topic mining method, in particular to a short text topic mining method based on a semantic word network, which addresses the low topic quality that common topic mining methods produce when short text features are sparse.

Background technique

[0002] As the pace of social development keeps accelerating and smart mobile terminals bring a "short, smooth and fast" user experience, people's communication on the Internet is becoming more and more fragmented. Short text data therefore occupies an increasingly important position in today's network information interaction: social network statuses, Weibo text messages, traditional news titles, short video titles, and posts on question-and-answer websites all appear in the form of short text. With the rise of large platforms such as Weibo, Zhihu, Facebook, and Twitter, short text data is also being generated and accumulated at great speed....

Application Information

IPC(8): G06F17/27
CPC: G06F40/258, G06F40/284, G06F40/30
Inventor: 张雷经伟蔡洋陆恒杨徐鸣王崇骏
Owner: NANJING UNIV