
Text subject detection method and system

A text topic detection method and system, applied in the computer field, that addresses the low accuracy of text topic recognition. The learned word embeddings and topic vectors carry stronger semantics and closer associations with other words, which improves the accuracy of topic identification.

Active Publication Date: 2016-09-28
SHENZHEN UNIV
Cites: 4, Cited by: 25

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a text topic detection method and system that solve the problem of low accuracy of text topic recognition in the prior art.



Examples


Embodiment 1

[0024] Figure 1 shows the implementation flow of the text topic detection method provided by Embodiment 1 of the present invention. For ease of description, only the parts related to this embodiment are shown, as detailed below:

[0025] In step S101, an LDA model is trained on the input target text to obtain an initial assignment of a topic to each word in the target text.
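The initial assignment of step S101 can be illustrated with a small collapsed Gibbs sampler for LDA. This is only a sketch under stated assumptions: the text does not say which inference procedure the invention uses, and the function name `lda_initial_assignment` and the hyperparameters `alpha`, `beta`, and `iters` are hypothetical choices made for illustration.

```python
import random

def lda_initial_assignment(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    docs is a list of token lists; returns one topic id per token,
    in the same nested shape as docs.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]    # count of word w in topic k
    topic_total = [0] * num_topics                      # tokens in topic k
    z = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    return z
```

Calling `lda_initial_assignment(docs, num_topics=2)` on a list of tokenized documents returns a nested list of topic ids, which plays the role of the word-topic initial assignment consumed by the later steps.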

[0026] In this embodiment of the present invention, the target text may be a public, conventional topic detection dataset, or social media documents from the Internet such as microblogs, blogs, and forum posts. After these documents are acquired, they should be preprocessed, for example by word segmentation and by removing stop words, high- and low-frequency words, and illegal characters, to obtain the target text of this embodiment. Afterwards, the target text to be detected is tr...
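The preprocessing described above can be sketched as follows. The details are assumptions, not taken from the patent: whitespace splitting stands in for a real word segmenter, "illegal characters" is interpreted as punctuation and symbols, and the thresholds `min_count` and `max_doc_ratio` are hypothetical parameters for filtering low- and high-frequency words.

```python
import re
from collections import Counter

def preprocess(raw_docs, stop_words, min_count=2, max_doc_ratio=0.5):
    """Tokenize, strip symbols and stop words, then drop rare and overly common words."""
    tokenized = []
    for text in raw_docs:
        # Replace anything that is not a word character or whitespace.
        text = re.sub(r"[^\w\s]", " ", text.lower())
        # Whitespace split is a placeholder for proper word segmentation.
        tokenized.append([t for t in text.split() if t not in stop_words])

    term_freq = Counter(t for doc in tokenized for t in doc)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    n_docs = len(tokenized)

    def keep(token):
        # Low-frequency filter on total count, high-frequency filter on document ratio.
        return term_freq[token] >= min_count and doc_freq[token] / n_docs <= max_doc_ratio

    return [[t for t in doc if keep(t)] for doc in tokenized]
```

For Chinese microblog or forum text, the `text.split()` call would need to be replaced by a segmentation tool, since words are not separated by spaces.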

Embodiment 2

[0037] Figure 2 shows the implementation flow of the topic detection step in the text topic detection method provided by Embodiment 1 of the present invention. For ease of description, only the parts related to this embodiment are shown, as detailed below:

[0038] In step S201, the distribution probability of the training target word under each topic is calculated according to the formula [formula omitted in source], and a topic is sampled for the training target word.

[0039] In this embodiment of the present invention, the formula combines two perspectives, vectors and word-frequency statistics. It jointly considers the distribution functions under two different representations, the topic vector-word embedding representation and the topic-word conditional probability distribution of the LDA model, to sample a topic for the training target word. Word embeddings are rich in semantic information and can effectively capture the internal relationship betw...
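The idea of sampling a topic from two combined signals can be sketched as below. The exact formula is not reproduced in this text, so the combination rule here is an assumption: a softmax over the dot product of each topic vector with the word embedding, multiplied by the LDA conditional weight of each topic, then renormalized. The names `topic_distribution`, `sample_topic`, and `lda_weights` are hypothetical.

```python
import math
import random

def topic_distribution(word_vec, topic_vecs, lda_weights):
    """Combine an embedding-based score with LDA topic weights (assumed rule).

    word_vec:    embedding of the training target word
    topic_vecs:  one vector per topic, same dimension as word_vec
    lda_weights: positive per-topic weights from the LDA conditional
    """
    scores = [sum(a * b for a, b in zip(word_vec, t)) for t in topic_vecs]
    top = max(scores)
    # Softmax numerator, shifted by the max score for numerical stability.
    emb_term = [math.exp(s - top) for s in scores]
    combined = [e * p for e, p in zip(emb_term, lda_weights)]
    total = sum(combined)
    return [c / total for c in combined]

def sample_topic(word_vec, topic_vecs, lda_weights, rng=random):
    """Draw one topic id from the combined distribution."""
    probs = topic_distribution(word_vec, topic_vecs, lda_weights)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With equal LDA weights, the topic whose vector is most aligned with the word embedding receives the highest probability; with identical topic vectors, the result reduces to the normalized LDA weights.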

Embodiment 3

[0050] Figure 3 shows the structure of the text topic detection system provided by Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment are shown, including:

[0051] The first training unit 31, used to train an LDA model on the input target text to obtain the initial assignment between each word and topic in the target text;

[0052] The value setting unit 32, used to set the word embeddings of a pre-acquired external corpus as the initial values of the word embeddings of the target text;

[0053] The second training unit 33, used to train on the target text, according to the obtained initial assignment, using the model L [formula omitted in source], to obtain the word embeddings and topic vectors of the target text, where V represents the total number of words in the dictionary corresponding to the target text, c represents the size of the sliding window in the model L, and w_i is the t...
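The roles of units 32 and 33 can be illustrated with a minimal sketch. Two points are assumptions rather than statements from the patent: words missing from the external corpus are given small random vectors, and training examples are formed by pairing each center word and its assigned topic with the context words inside a window of size c. The names `init_embeddings` and `window_pairs` are hypothetical.

```python
import random

def init_embeddings(vocab, external, dim, seed=0):
    """Seed target-text embeddings from a pre-trained external table.

    Words absent from the external table get small random vectors
    (an assumption; the source does not describe this case).
    """
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in external:
            table[word] = list(external[word])  # copy so later updates stay local
        else:
            table[word] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return table

def window_pairs(doc, topics, c):
    """Yield (center word, center topic, context word) within a window of size c."""
    for i, (word, topic) in enumerate(zip(doc, topics)):
        lo, hi = max(0, i - c), min(len(doc), i + c + 1)
        for j in range(lo, hi):
            if j != i:
                yield word, topic, doc[j]
```

A training loop would iterate over `window_pairs` for every document and update both the word embedding and the topic vector of each center token, which is one way a model with a sliding window of size c could learn the two representations jointly.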


Abstract

The present invention is applicable to the technical field of computers and provides a text topic detection method and system. The method comprises: training an LDA model on an input target text to obtain an initial assignment between words and topics in the target text; setting the word embeddings of a pre-acquired external corpus as the initial values of the word embeddings of the target text; training on the target text, according to the obtained initial assignment, using a model as shown in the description, to obtain the word embeddings and topic vectors of the target text; scanning each document in the target text according to the obtained initial assignment and the word embeddings and topic vectors of the target text; and executing a preset topic detection step for each training target word obtained through scanning, to obtain the topics related to the target text. As a result, the learned word embeddings and topic vectors carry deeper semantics and are more closely associated with other words, and the accuracy of topic identification is effectively improved.

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a text topic detection method and system.

Background art

[0002] The main goal of topic detection is to identify the content under discussion by analyzing and processing large text collections to discover their hidden semantic structure. In recent years, with the rapid development of modern network technology and the popularization of Web 2.0 applications, online media has gradually become a public platform for people to express their views and opinions, and the information gathered on the network has grown explosively. Effectively organizing, collating, mining, and analyzing this content to accurately identify the topic information it contains plays a decisive role in helping people from all walks of life keep abreast of public demands, grasp market trends, and discover unforeseen crises.

[0003] At present, most of the topic detec...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 傅向华, 李晶
Owner: SHENZHEN UNIV