Text subject detection method and system
A text topic detection method and system, applied in the computer field, that addresses the low accuracy of text topic recognition. Words and topics are tied together more closely and carry stronger semantics, which improves detection accuracy.
Examples
Embodiment 1
[0024] Figure 1 shows the implementation flow of the text topic detection method provided by Embodiment 1 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown. The details are as follows:
[0025] In step S101, an LDA model is trained on the input target text to obtain an initial assignment between each word in the target text and a topic.
[0026] In the embodiment of the present invention, the target text may be a public, traditional topic detection data sample, or a social media document from the Internet, such as a microblog post, blog post, or forum post. After these documents are acquired, they should be preprocessed, for example by word segmentation and by removing stop words, high- and low-frequency words, and illegal characters, to obtain the target text in the embodiment of the present invention. Afterwards, the target text to be detected is tr...
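The preprocessing and LDA steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source only says "an LDA model" is used, so collapsed Gibbs sampling, whitespace tokenization, the hyperparameters `alpha` and `beta`, and the function names `preprocess` and `lda_initial_assignment` are all assumptions introduced here.

```python
import random

def preprocess(docs, stop_words, min_count=1):
    """Lowercase, split on whitespace (assumed tokenizer), drop stop words
    and words that occur fewer than min_count times in the corpus."""
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    counts = {}
    for doc in tokenized:
        for w in doc:
            counts[w] = counts.get(w, 0) + 1
    return [[w for w in doc if counts[w] >= min_count] for doc in tokenized]

def lda_initial_assignment(docs, num_topics, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed inference method).
    Returns z, where z[d][i] is the topic id assigned to token i of document d."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # tokens per (doc, topic)
    topic_word = [dict() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                    # tokens per topic
    z = []
    # Random initialization of every token's topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        z.append(z_d)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    return z
```

Calling `lda_initial_assignment(preprocess(raw_docs, stop_words), num_topics=K)` would give one topic id per remaining token, which corresponds to the initial word-topic assignment the step refers to.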
Embodiment 2
[0037] Figure 2 shows the implementation flow of the topic detection step in the text topic detection method provided by Embodiment 1 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown. The details are as follows:
[0038] In step S201, the distribution probability of the training target word under each topic is calculated according to the formula (not reproduced in this text), and a topic is sampled for the training target word.
[0039] In the embodiment of the present invention, the formula combines the vector view and the word-frequency view: it jointly considers the distribution functions under two different representations, namely the topic vector-word embedding representation and the topic-word conditional probability distribution of the LDA model, to sample a topic for the training target word. Word embeddings are rich in semantic information and can effectively capture the internal relationship betw...
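The idea of sampling a topic from two combined distributions can be sketched as below. The patent's actual formula is not reproduced in this text, so this is only one plausible form under stated assumptions: the embedding term is taken to be a softmax over the vocabulary of the dot product between a topic vector and each word embedding, the two terms are mixed linearly with a hypothetical weight `lam`, and any document-topic prior is left out. The names `topic_probabilities` and `sample_topic` are illustrative.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_probabilities(word_id, topic_vectors, word_embeddings, lda_topic_word, lam=0.5):
    """Normalized distribution over topics for one training target word.

    For each topic k, two word-given-topic probabilities are mixed:
      - embedding term: softmax over the vocabulary of
        topic_vectors[k] . word_embeddings[v], read off at word_id
      - count term: lda_topic_word[k][word_id], the LDA topic-word probability
    lam is an assumed mixing weight, not a parameter named in the source.
    """
    weights = []
    for k, t_vec in enumerate(topic_vectors):
        scores = [sum(a * b for a, b in zip(t_vec, w_vec)) for w_vec in word_embeddings]
        emb_prob = softmax(scores)[word_id]
        weights.append(lam * emb_prob + (1.0 - lam) * lda_topic_word[k][word_id])
    total = sum(weights)
    return [w / total for w in weights]

def sample_topic(probs, rng):
    """Draw one topic id according to probs."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With `lam=0` the sketch falls back to the LDA topic-word probabilities alone, and with `lam=1` it relies only on the embedding term, which makes the contribution of each representation easy to inspect.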
Embodiment 3
[0050] Figure 3 shows the structure of the text topic detection system provided by Embodiment 3 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, including:
[0051] The first training unit 31 is used to train the input target text with the LDA model, so as to obtain the initial assignment between each word and topic in the target text;
[0052] The value setting unit 32 is used to set the word embedding of the external corpus obtained in advance as the initial value of the word embedding of the target text;
[0053] The second training unit 33 is used to train the target text with the model L, according to the obtained initial assignment, to obtain the word embeddings and topic vectors of the target text, where V represents the total number of words in the dictionary corresponding to the target text, c represents the size of the sliding window in the model L, and w_i is the t...
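Two ingredients mentioned by these units can be illustrated in a short sketch: initializing the target text's word embeddings from an external corpus, and collecting the context words that fall inside a sliding window of size c. The model L itself is not reproduced in this text, so the sketch stops short of any training objective. The function names, the random fallback for words missing from the external corpus, and the symmetric window are assumptions.

```python
import random

def init_embeddings(vocab, pretrained, dim, seed=0):
    """Use the externally trained vector when a word has one; otherwise fall
    back to a small random vector (the fallback is an assumption)."""
    rng = random.Random(seed)
    emb = {}
    for w in vocab:
        if w in pretrained:
            emb[w] = list(pretrained[w])
        else:
            emb[w] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return emb

def context_pairs(tokens, c):
    """Return (center, context) pairs for every token and each neighbor
    at most c positions away on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - c)
        hi = min(len(tokens), i + c + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

A window-based model would typically iterate over such pairs during training, updating the embeddings returned by `init_embeddings` in place.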