
A Short Text Topic Mining Method Based on Semantic Word Network

A short text topic mining technology, applied in semantic analysis, natural language data processing, instruments, etc., that addresses the insufficient quality of mined topics and improves topic quality.

Active Publication Date: 2021-05-18
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: The technical problem to be solved by the present invention is that, when traditional topic models rely on word co-occurrence information to cope with the sparse features of short text data, the quality of the mined topics is not high enough, because noise information is introduced and semantic information is ignored.
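To make the sparsity problem concrete, the following minimal sketch counts unordered word pairs that co-occur within each short document. The three toy documents and the helper name `word_pairs` are assumptions made for illustration and do not come from the patent text.

```python
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return every unordered pair of distinct words in one short document."""
    return list(combinations(sorted(set(tokens)), 2))


# Hypothetical toy corpus: each document is a tokenized short text.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["stock", "market", "news"],
]

# Count how often each word pair co-occurs across the corpus.
pair_counts = Counter(pair for doc in docs for pair in word_pairs(doc))

for pair, count in sorted(pair_counts.items()):
    print(pair, count)
```

In this toy corpus no pair repeats across documents, so co-occurrence counts alone carry little evidence about which words belong to the same topic. Short text corpora tend to show this pattern at scale.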




Embodiment Construction

[0086] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. After reading the present invention, modifications by those skilled in the art to its various equivalent forms all fall within the scope defined by the appended claims of the present application.

[0087] Figure 1 is a flowchart of the short text topic mining method based on a semantic word network implemented by the present invention. The specific steps are as follows:

[0088] Step 0 is the initial state of the present invention;

[0089] During the model initialization phase (steps 1-3):

[0090] Step 1 is to collect an external corpus in related fields; there is no requirement on the length of the texts;

[0091] Step 2 is to perform...


Abstract

The invention discloses a short text topic mining method based on a semantic word network, comprising the following steps: 1) model initialization stage: collecting an external corpus in related fields, preprocessing the corpus, setting parameters, etc.; 2) topic unit construction stage: constructing the semantic word network; 3) model training stage: using the Gibbs sampling method to sample the model variables and judging whether the model has reached the convergence condition; 4) result output stage: after model training, using the sampling results of each variable to obtain the topic distribution of each word triangle, and then calculating the topic distribution of each original document. The present invention combines the semantic information learned from the external corpus with the word triangle topic structure and applies it to short text topic mining. Compared with the traditional word-pair topic model, this method provides a solution that integrates external prior knowledge into the traditional topic model, and the quality of the mined topics improves significantly.
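The topic unit named in the abstract, the word triangle, can be illustrated with a minimal sketch. It rests on assumptions the text above does not confirm: that words are linked in the semantic word network when the cosine similarity of their vectors (learned from the external corpus) reaches a threshold, and that a word triangle is any three mutually linked words. The toy vectors, the threshold value, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_semantic_network(vectors, threshold):
    """Assumed rule: link two words when their similarity reaches the threshold."""
    edges = set()
    for w1, w2 in combinations(sorted(vectors), 2):
        if cosine(vectors[w1], vectors[w2]) >= threshold:
            edges.add((w1, w2))
    return edges


def find_word_triangles(edges):
    """Enumerate all sets of three mutually linked words."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in edges:
        # Any word linked to both endpoints of an edge closes a triangle.
        for c in neighbors[a] & neighbors[b]:
            triangles.add(tuple(sorted((a, b, c))))
    return sorted(triangles)


# Hypothetical 2-dimensional word vectors standing in for embeddings
# learned from the external corpus.
toy_vectors = {
    "goal": [1.0, 0.1],
    "match": [0.9, 0.2],
    "team": [0.8, 0.3],
    "stock": [0.1, 1.0],
    "market": [0.2, 0.9],
}

edges = build_semantic_network(toy_vectors, threshold=0.95)
triangles = find_word_triangles(edges)
print(triangles)
```

With these toy vectors the three sports-related words are mutually linked and form one triangle, while "market" and "stock" are linked to each other but have no third word to close a triangle. The Gibbs sampling step that assigns topics to triangles is not sketched here, because the text above does not give its sampling equations.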

Description

Technical field

[0001] The invention relates to a short text topic mining method, in particular to a short text topic mining method based on a semantic word network, which addresses the low topic quality that common topic mining methods produce when short text features are sparse.

Background

[0002] With the accelerating pace of social development and the "short, simple and fast" user experience brought by smart mobile terminals, people's communication on the Internet is becoming increasingly fragmented. Short text data therefore occupies an increasingly important position in today's network information exchange: social network statuses, Weibo posts, traditional news headlines, short video titles, and content on question-and-answer websites all appear in the form of short text. With the rise of large platforms such as Weibo, Zhihu, Facebook, and Twitter, short text data is also being generated and accumulated at great speed....


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/258, G06F40/30, G06F40/284
CPC: G06F40/258, G06F40/284, G06F40/30
Inventor: 张雷, 经伟, 蔡洋, 陆恒杨, 徐鸣, 王崇骏
Owner: NANJING UNIV