Text analysis method based on various deep topic models

Topic models are a text analysis technology applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, and similar areas. The method addresses training and other issues, achieving improved text analysis capability, improved model practicability, and fast parallel training.

Pending Publication Date: 2021-02-23
XIDIAN UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] Existing methods have deficiencies. The LDA topic model cannot extract deep semantic feature topics, which makes hierarchical text analysis difficult. Existing deep topic models can extract deep features, but the high-level topics they extract have poor diversity and their ability to express high-level semantic features is limited. This degrades hierarchical feature extraction and leads to poor performance on subsequent tasks such as text classification. Moreover, training deep topic models with the traditional Gibbs sampling method requires a large amount of computation and converges slowly, while the existing improved Gibbs sampling methods that converge faster are not suitable for big-data scenarios that require online training, are difficult to train in parallel, and have limited practicability.
The disadvantage of this method is that, although deep-level features of the text can be extracted, the extracted topic keywords become highly similar and less diverse as the number of layers increases, and they lack good separability, which affects subsequent text analysis capability.
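The keyword similarity described above can be quantified in several ways. The following is a minimal sketch of two common overlap measures, not a metric defined by the patent; the function names and the keyword lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def topic_diversity(topics, top_k=3):
    """Fraction of unique words among the top_k words of all topics (1.0 = no overlap)."""
    all_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(all_words)) / len(all_words)

def mean_jaccard(topics, top_k=3):
    """Average pairwise Jaccard similarity of top_k keyword sets (0.0 = disjoint)."""
    sets = [set(topic[:top_k]) for topic in topics]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical keyword lists, ordered by weight within each topic.
redundant = [["game", "team", "play"], ["team", "game", "win"], ["play", "game", "team"]]
distinct = [["game", "team", "play"], ["court", "law", "judge"], ["stock", "bank", "market"]]

for name, topics in [("redundant", redundant), ("distinct", distinct)]:
    print(name, round(topic_diversity(topics), 3), round(mean_jaccard(topics), 3))
```

A low diversity score together with a high mean Jaccard similarity corresponds to the situation described above, where topics at a given layer share most of their keywords and are hard to separate.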
However, this method requires all the text data to be input at once for sampling in order to learn the parameters of the topic model. When the amount of data is large, the limited computing power of current computer hardware makes parallel training difficult, so the method is not suitable for big-data scenarios and has limited practicability.
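To make the cost of sampling-based training concrete, the following is a minimal sketch of collapsed Gibbs sampling for plain single-layer LDA. It is not the deep topic model or the training procedure of the invention; the function name, hyperparameter values, and toy corpus are assumptions for illustration only.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, n_sweeps=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for single-layer LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                              # tokens in topic k
    assign = []
    # Random initial topic assignment for every token in the corpus.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    # Every sweep revisits every token and updates the shared count tables.
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Toy corpus of word ids over a 6-word vocabulary (hypothetical data).
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 2, 1, 0], [5, 3, 4, 4]]
doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=6)
```

In this sketch each token update reads and writes count tables shared by the whole corpus, so the full data set must be available during every sweep and the updates are inherently sequential. This is the kind of constraint the paragraph above refers to.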


Embodiment Construction

[0032] The present invention will be described in further detail below in conjunction with specific examples, but the embodiments of the present invention are not limited thereto.

[0033] See Figure 1, which is a block diagram of the steps of a text analysis method based on a variety of deep topic models provided by an embodiment of the present invention. The method includes:

[0034] Construct a training sample set and a test sample set of text data;

[0035] Construct a variety of deep topic models according to the training sample set, and initialize the initial model parameters of the various deep topic models;

[0036] Train the variety of deep topic models according to the training sample set to obtain training model parameters, and update the initial model parameters according to the training model parameters to obtain the trained variety of deep topic models;

[0037] According to the test sample set, train the various deep topic models after training to obtain som...
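The overall flow of the steps above, together with the classification step named in the abstract, can be sketched as follows. This excerpt does not specify the structure of the deep topic models, so `StandInTopicModel` is a hypothetical placeholder (a simple hard-assignment clustering of bag-of-words vectors) and is not the model of the invention. The nearest-neighbour classifier, the helper names, and the toy data are likewise assumptions.

```python
import random

def split_samples(samples, test_ratio=0.25, seed=0):
    """Step [0034]: shuffle (counts, label) pairs into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

class StandInTopicModel:
    """Hypothetical placeholder model; NOT the deep topic model of the invention."""

    def __init__(self, n_topics, vocab_size, seed):
        rng = random.Random(seed)
        # Step [0035]: randomly initialised topic-word weights, one row per topic.
        self.phi = [self._normalise([rng.random() for _ in range(vocab_size)])
                    for _ in range(n_topics)]

    @staticmethod
    def _normalise(row, eps=1e-3):
        smoothed = [v + eps for v in row]
        total = sum(smoothed)
        return [v / total for v in smoothed]

    def transform(self, docs):
        """Hidden features: one score per topic for each bag-of-words vector."""
        return [[sum(x * p for x, p in zip(doc, topic)) for topic in self.phi]
                for doc in docs]

    def fit(self, docs, n_iter=5):
        # Step [0036]: assign each document to its best topic, then re-estimate topics.
        for _ in range(n_iter):
            sums = [[0.0] * len(row) for row in self.phi]
            for doc, scores in zip(docs, self.transform(docs)):
                best = scores.index(max(scores))
                sums[best] = [s + x for s, x in zip(sums[best], doc)]
            # Topics that received no documents keep their previous weights.
            self.phi = [self._normalise(s) if any(s) else old
                        for s, old in zip(sums, self.phi)]
        return self

def concat_features(models, docs):
    """Concatenate the hidden features produced by every model for each document."""
    per_model = [m.transform(docs) for m in models]
    return [sum((feats[i] for feats in per_model), []) for i in range(len(docs))]

def classify_1nn(train_x, train_y, test_x):
    """Label each test vector with the label of its nearest training vector."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [train_y[min(range(len(train_x)), key=lambda j: dist(x, train_x[j]))]
            for x in test_x]

# Toy bag-of-words counts over a 6-word vocabulary (hypothetical data).
samples = [([3, 2, 1, 0, 0, 0], "sports"), ([2, 3, 0, 0, 1, 0], "sports"),
           ([4, 1, 2, 0, 0, 0], "sports"), ([1, 2, 2, 0, 0, 1], "sports"),
           ([0, 0, 0, 3, 2, 1], "finance"), ([0, 1, 0, 2, 3, 2], "finance"),
           ([0, 0, 1, 4, 1, 2], "finance"), ([1, 0, 0, 2, 2, 3], "finance")]
train, test = split_samples(samples)
train_docs, train_labels = [d for d, _ in train], [y for _, y in train]
test_docs = [d for d, _ in test]

# Several models with different sizes and seeds stand in for the "variety" of models.
models = [StandInTopicModel(k, vocab_size=6, seed=s).fit(train_docs)
          for s, k in enumerate([2, 3])]
train_x = concat_features(models, train_docs)
test_x = concat_features(models, test_docs)  # step [0037]: test hidden-layer features
predictions = classify_1nn(train_x, train_labels, test_x)
```

The sketch only mirrors the order of operations: split, construct and initialise, train, extract features on the test set, classify. Any real implementation would replace the placeholder class with the deep topic models described in the full specification.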


Abstract

The invention discloses a text analysis method based on various deep topic models. The method comprises the following steps: constructing a training sample set and a test sample set of text data; constructing various deep topic models according to the training sample set, and initializing the initial model parameters of the various deep topic models; training the various deep topic models according to the training sample set to obtain training model parameters, and updating the initial model parameters according to the training model parameters to obtain the trained various deep topic models; testing the trained various deep topic models according to the test sample set to obtain a plurality of test hidden layer features; performing visual analysis on the training model parameters according to the plurality of test hidden layer features to obtain a plurality of text topics; and classifying the text data according to the plurality of text topics, the training sample set, the test hidden layer features and the tested various deep topic models. According to the invention, the characteristics of the text data can be comprehensively reflected, so that the text topics have relatively good separability and the text analysis capability is high.

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a text analysis method based on various deep topic models.

Background

[0002] With the rapid development of the mobile Internet and information technology, the era of big data has arrived. The massive data in a vast network urgently needs effective processing and analysis methods. Text data in particular often contains a huge amount of information. Governments, enterprises and individuals have an increasing demand for intelligent text analysis, which drives the further development of natural language processing technology. Among these techniques, the topic model, as a text mining method, can effectively extract text features and discover latent semantic topics in text data, and is widely used in text analysis tasks in machine learning and data mining, such as text clustering, hotspot mining, sentiment analysis, information retrieval, recommendation syste...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
CPC: G06F16/35
Inventor: 陈渤, 陈文超, 赵倩茹, 刘应祺, 刘宏伟
Owner: XIDIAN UNIV