Text subject detection method and system

A text topic detection method and system in the computer field, which address the low accuracy of text topic recognition; the learned representations carry stronger semantics and closer word associations, improving recognition accuracy.

Active Publication Date: 2016-09-28
SHENZHEN UNIV
Cites: 4; Cited by: 25

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a text topic detection method and system...



Examples


Example Embodiment

[0023] Embodiment one:

[0024] Figure 1 shows the implementation process of the text topic detection method provided in the first embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown, as detailed below:

[0025] In step S101, the LDA model is trained on the input target text to obtain the initial assignment between each word in the target text and a topic.
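
Step S101 does not say how the LDA model is trained. As a minimal sketch, assuming standard collapsed Gibbs sampling with symmetric priors `alpha` and `beta` (the function name, parameters, and defaults below are illustrative, not taken from the patent), the initial word-topic assignment could be obtained like this:

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (z, ndk, nkw, nk): the topic assigned to every token plus the
    document-topic, topic-word and per-topic count tables.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]                # document-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                 # tokens per topic
    z = []

    # Start from a uniformly random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw, nk


# Usage: two toy documents over a 5-word vocabulary.
docs = [[0, 1, 2, 0], [3, 4, 3, 4]]
z, ndk, nkw, nk = lda_gibbs(docs, num_topics=2, vocab_size=5, iters=20)
```

The returned `z` plays the role of the "initial assignment" in this sketch; the count tables are kept because later sampling steps typically reuse them.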

[0026] In this embodiment, the target text may be a publicly available, conventional topic detection data sample, or social media documents collected from the Internet, such as microblogs, blogs, and forum posts. After these documents are obtained, they should be preprocessed, for example by word segmentation and by removing stop words, high- and low-frequency words, and illegal characters, to obtain the target text used in this embodiment. After that, the target text to be detected is train...
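
The preprocessing in [0026] can be sketched as below. This is an illustration under stated assumptions, not the patent's procedure: whitespace splitting stands in for real word segmentation (Chinese text would need a dedicated segmenter), "illegal characters" is taken to mean anything other than word characters and whitespace, and the hypothetical thresholds `min_count` and `max_doc_ratio` decide what counts as a low- or high-frequency word.

```python
import re
from collections import Counter


def preprocess(raw_docs, stop_words, min_count=2, max_doc_ratio=0.5):
    """Turn raw documents into lists of word ids plus a word -> id vocabulary.

    Assumptions (not from the patent): whitespace splitting replaces word
    segmentation, non-word characters are "illegal", thresholds are illustrative.
    """
    cleaned = []
    for text in raw_docs:
        # \w is Unicode-aware in Python 3, so CJK characters survive this filter.
        text = re.sub(r"[^\w\s]", " ", text.lower())
        cleaned.append([t for t in text.split() if t not in stop_words])

    n_docs = len(cleaned)
    term_freq = Counter(t for doc in cleaned for t in doc)
    doc_freq = Counter(t for doc in cleaned for t in set(doc))
    # Drop low-frequency words (too rare) and high-frequency words (in too many documents).
    keep = {
        t for t, n in term_freq.items()
        if n >= min_count and doc_freq[t] / n_docs <= max_doc_ratio
    }

    vocab, docs = {}, []
    for doc in cleaned:
        # Assign ids in order of first appearance among the kept words.
        docs.append([vocab.setdefault(t, len(vocab)) for t in doc if t in keep])
    return docs, vocab
```

Filtering on document frequency for the high-frequency cut is one common choice; a raw term-frequency ceiling would work the same way.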

Example Embodiment

[0036] Embodiment two:

[0037] Figure 2 shows the implementation process of the topic detection step in the text topic detection method provided in the first embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown, as detailed below:

[0038] In step S201, the distribution probability of the training target word under each topic is calculated according to the formula given in the description, and a topic is sampled for the training target word.

[0039] In this embodiment, the formula combines distribution functions under two different representations, one from the vector perspective and one from the word-frequency-statistics perspective: the topic-vector/word-embedding distribution and the topic-word conditional probability distribution of the LDA model. Together they determine the topic sampled for the training target word. Word embedding is rich in semantic and word meaning information, which ca...
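
The exact formula is not reproduced here, so the following is only a sketch of one plausible way to combine the two distributions described in [0039]. It assumes a mixture: p(z = k | w, d) ∝ (n_dk + alpha) · [lam · p_emb(w | k) + (1 − lam) · p_lda(w | k)], where p_emb is a softmax over the vocabulary of dot products between topic vector k and the word embeddings, and p_lda = (n_kw + beta) / (n_k + V·beta). The mixing weight `lam`, the priors, and all function names are hypothetical.

```python
import math
import random


def embedding_word_prob(topic_vec, word_vecs, w):
    """p(w | topic) from topic vector and word embeddings: softmax of dot products over the vocabulary."""
    scores = [sum(a * b for a, b in zip(topic_vec, v)) for v in word_vecs]
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[w] / sum(exps)


def topic_distribution(w, d, ndk, nkw, nk, topic_vecs, word_vecs,
                       alpha=0.1, beta=0.01, lam=0.5):
    """Normalized p(z = k | w, d) mixing an embedding term and an LDA term.

    The counts must already exclude the token being resampled. This mixture is
    an assumed form, not the formula from the patent.
    """
    vocab_size = len(word_vecs)
    weights = []
    for k, topic_vec in enumerate(topic_vecs):
        p_lda = (nkw[k][w] + beta) / (nk[k] + vocab_size * beta)  # topic-word probability from counts
        p_emb = embedding_word_prob(topic_vec, word_vecs, w)      # topic-vector / word-embedding probability
        weights.append((ndk[d][k] + alpha) * (lam * p_emb + (1 - lam) * p_lda))
    total = sum(weights)
    return [x / total for x in weights]


def sample_topic(probs, rng):
    """Draw a topic index from a normalized distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Setting `lam=0` recovers the plain LDA full conditional, which is a quick sanity check on the mixture.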

Example Embodiment

[0049] Embodiment three:

[0050] Figure 3 shows the structure of the text topic detection system provided in the third embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown, including:

[0051] The first training unit 31 is used to train the input target text using the LDA model to obtain the initial assignment between each word and topic in the target text;

[0052] The value setting unit 32 is configured to set the pre-acquired word embeddings of an external corpus as the initial values of the word embeddings of the target text;
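
The initialization performed by unit 32 can be sketched as a table lookup. The text does not say what happens to words absent from the external corpus; this sketch assumes they receive small random vectors. `init_embeddings` and its parameters are hypothetical.

```python
import random


def init_embeddings(vocab, pretrained, dim, seed=0, scale=0.1):
    """Build the target-text embedding table, copying external-corpus vectors where available.

    vocab: dict word -> id; pretrained: dict word -> list of floats of length dim.
    Words missing from the external corpus get uniform random vectors in
    [-scale, scale] (an assumption). Returns (table, number of words copied).
    """
    rng = random.Random(seed)
    table = [None] * len(vocab)
    hits = 0
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            table[idx] = list(vec)  # copy so later training does not mutate the external vectors
            hits += 1
        else:
            table[idx] = [rng.uniform(-scale, scale) for _ in range(dim)]
    return table, hits
```

The `hits` count is a cheap way to see how much of the target vocabulary the external corpus actually covers.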

[0053] The second training unit 33 is configured to train the target text, according to the obtained initial assignment, using the model given in the description, to obtain the word embeddings and topic vectors of the target text, where V denotes the total number of words in the dictionary corresponding to the target text, c denotes the size of the sliding window in the model L, w_i is the trai...
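
The model L itself is not reproduced above. As a hedged illustration only, the sketch below assumes a skip-gram objective in the style of Topical Word Embeddings (TWE), in which both the embedding of w_i and the vector of its assigned topic z_i predict every context word within a window of size c: L = (1/N) Σ_i Σ_{−c≤j≤c, j≠0} [log p(w_{i+j} | w_i) + log p(w_{i+j} | z_i)]. It evaluates the objective with a full softmax; actual training would maximize it, typically with negative sampling or hierarchical softmax, which is omitted. All names, including the separate output vectors `out_vecs`, are assumptions.

```python
import math


def log_softmax_prob(x, out_vecs, o):
    """log p(o | x) with a full softmax over the output vectors of the vocabulary."""
    scores = [sum(a * b for a, b in zip(x, u)) for u in out_vecs]
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[o] - log_norm


def twe_objective(doc, z, word_vecs, topic_vecs, out_vecs, c):
    """Average over tokens of sum_{-c<=j<=c, j!=0} [log p(w_{i+j}|w_i) + log p(w_{i+j}|z_i)].

    doc: word ids; z: topic id of each token (e.g. from the LDA step);
    c: sliding-window size. Assumed TWE-style form, not the patent's formula.
    """
    if not doc:
        return 0.0
    total = 0.0
    for i, w in enumerate(doc):
        for j in range(-c, c + 1):
            if j == 0 or not 0 <= i + j < len(doc):
                continue  # skip the centre word and positions outside the document
            ctx = doc[i + j]
            total += log_softmax_prob(word_vecs[w], out_vecs, ctx)      # word predicts context
            total += log_softmax_prob(topic_vecs[z[i]], out_vecs, ctx)  # topic predicts context
    return total / len(doc)
```

With all vectors set to zero every prediction is uniform, so each context pair contributes 2·(−log V); that makes a convenient check of the window bookkeeping.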



Abstract

The present invention is applicable to the technical field of computers and provides a text topic detection method and system. The method comprises: training an input target text with an LDA model to obtain an initial assignment between the words and topics in the target text; setting the word embeddings of a pre-acquired external corpus as the initial values of the word embeddings of the target text; according to the obtained initial assignment, training the target text with the model given in the description to obtain the word embeddings and topic vectors of the target text; scanning each document in the target text according to the obtained initial assignment and the word embeddings and topic vectors of the target text; and executing a preset topic detection step for each training target word obtained by the scan, to obtain the topics related to the target text. As a result, the learned word embeddings and topic vectors carry deeper semantics and are more closely associated with other words, which effectively improves the accuracy of topic identification.
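
The abstract ends with obtaining the topics related to the target text but does not say how they are reported. One common convention, assumed here, is to estimate each topic's word distribution from the final topic-word counts, phi_kw = (n_kw + beta) / (n_k + V·beta), and list each topic's highest-probability words. `top_words` and its parameters are hypothetical.

```python
def top_words(nkw, nk, id2word, beta=0.01, n=3):
    """Return the n most probable words of each topic from final topic-word counts.

    nkw[k][w]: count of word w assigned to topic k; nk[k]: total tokens in topic k;
    id2word: list mapping word id -> word. Ties keep the lower word id first.
    """
    vocab_size = len(id2word)
    topics = []
    for k, row in enumerate(nkw):
        phi = [(row[w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        best = sorted(range(vocab_size), key=lambda w: -phi[w])[:n]  # stable sort
        topics.append([id2word[w] for w in best])
    return topics
```

For example, counts `[[5, 0, 2], [0, 4, 1]]` over the words `["cat", "dog", "fish"]` yield `[["cat", "fish"], ["dog", "fish"]]` with `n=2`.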

Description

Technical field
[0001] The invention belongs to the technical field of computers, and in particular relates to a text topic detection method and system.
Background art
[0002] The main goal of topic detection is to identify the content under discussion by analyzing and processing large text collections to discover their hidden semantic structure. In recent years, with the rapid development of modern network technology and the popularization of Web 2.0 applications, network media have gradually become a public platform for people to express their views and opinions, and the volume of information gathered on the network has grown explosively. Effectively organizing, collating, mining, and analyzing this content to accurately identify the topic information it contains plays a decisive role in helping people from all walks of life keep abreast of public demands, grasp market trends, and discover unforeseen crises.
[0003] At present, most of the topic detec...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventors: 傅向华 (Fu Xianghua), 李晶 (Li Jing)
Owner SHENZHEN UNIV