Text subject detection method and system
A text topic detection method and system, applied in the computer field, that addresses the low accuracy of text topic recognition and achieves closer connections, stronger semantics, and improved accuracy.
Example Embodiment
[0023] Embodiment one:
[0024] Figure 1 shows the implementation process of the text topic detection method provided in the first embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
[0025] In step S101, an LDA model is trained on the input target text to obtain the initial assignment between each word and topic in the target text.
[0026] In the embodiment of the present invention, the target text may be a public, traditional topic detection data sample, or social media documents from the Internet such as microblogs, blogs, and forum posts. After these documents are obtained, they should be preprocessed, for example by word segmentation and by removing stop words, high- and low-frequency words, and illegal characters, to obtain the target text in the embodiment of the present invention. After that, the target text to be detected is train...
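The preprocessing and initial assignment described in step S101 can be illustrated with a minimal sketch. It is not the patent's implementation: the stop-word list, the frequency thresholds `min_count` and `max_doc_ratio`, whitespace tokenization (a word segmenter would be needed for Chinese text), and the use of a plain collapsed Gibbs sampler for LDA are all assumptions made for illustration.

```python
import random
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}  # hypothetical stop-word list


def preprocess(docs, min_count=2, max_doc_ratio=0.9):
    """Tokenize, drop stop words and non-alphanumeric tokens,
    then drop low-frequency and overly common words (thresholds are assumptions)."""
    tokenized = [[t for t in d.lower().split() if t.isalnum() and t not in STOP_WORDS]
                 for d in docs]
    term_freq = Counter(t for doc in tokenized for t in doc)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    keep = {t for t in term_freq
            if term_freq[t] >= min_count and doc_freq[t] / len(docs) <= max_doc_ratio}
    return [[t for t in doc if t in keep] for doc in tokenized]


def lda_initial_assignment(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic index per word occurrence."""
    rng = random.Random(seed)
    vocab_size = len({t for doc in docs for t in doc})
    n_dk = [[0] * num_topics for _ in docs]        # document-topic counts
    n_kw = [Counter() for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                         # total words per topic
    z = []
    for d, doc in enumerate(docs):                 # random initialization
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                        # remove the current assignment
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + vocab_size * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                        # record the resampled topic
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z
```

Calling `lda_initial_assignment(preprocess(raw_docs), num_topics=K)` yields a list of per-document topic lists, which plays the role of the initial word-topic assignment that later steps refine.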
Example Embodiment
[0036] Embodiment two:
[0037] Figure 2 shows the implementation process of the topic detection step in the text topic detection method provided in the first embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, which are described in detail as follows:
[0038] In step S201, the distribution probability of the training target word under each topic is calculated according to the formula, and a topic is sampled for the training target word.
[0039] In the embodiment of the present invention, the formula combines distribution functions in two different representation modes, from the perspectives of vectors and of word-frequency statistics: the topic vector-word embedding distribution and the topic-word conditional probability distribution in the LDA model jointly serve as the basis for sampling the topic of the training target word. Word embeddings are rich in semantic and word-meaning information, which ca...
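The idea of sampling a topic from two combined distributions can be sketched as follows. The patent's formula is not reproduced in this text, so the combination rule here is purely an assumption: a linear mixture with a hypothetical weight `lam` between a softmax over topic vector and word embedding dot products (normalized over the vocabulary) and the smoothed LDA topic-word probability, multiplied by the usual document-topic term. All function and parameter names are illustrative.

```python
import math
import random


def embedding_word_prob(word_id, topic_vec, word_vecs):
    """p(word | topic) from the embedding view: softmax over the vocabulary
    of dot products between the topic vector and each word embedding."""
    scores = [sum(t * v for t, v in zip(topic_vec, vec)) for vec in word_vecs]
    m = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    return exps[word_id] / sum(exps)


def topic_distribution(word_id, topic_vecs, word_vecs, n_kw, n_k, n_dk,
                       alpha=0.1, beta=0.01, lam=0.5):
    """Normalized topic probabilities for one word occurrence.
    n_kw[k]: count of this word in topic k; n_k[k]: total words in topic k;
    n_dk[k]: words of the current document in topic k.
    The mixture with weight lam is an assumption, not the patent's formula."""
    vocab_size = len(word_vecs)
    weights = []
    for k, topic_vec in enumerate(topic_vecs):
        emb = embedding_word_prob(word_id, topic_vec, word_vecs)
        lda = (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        weights.append((n_dk[k] + alpha) * (lam * emb + (1 - lam) * lda))
    total = sum(weights)
    return [w / total for w in weights]


def sample_topic(probs, rng=random):
    """Draw one topic index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Setting `lam=0` reduces the sketch to ordinary collapsed Gibbs sampling for LDA, while `lam=1` relies only on the embedding view; intermediate values let both views influence the sampled topic.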
Example Embodiment
[0049] Embodiment three:
[0050] Figure 3 shows the structure of the text topic detection system provided in the third embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, including:
[0051] The first training unit 31 is used to train an LDA model on the input target text to obtain the initial assignment between each word and topic in the target text;
[0052] The value setting unit 32 is configured to set the word embeddings of an external corpus, obtained in advance, as the initial values of the word embeddings of the target text;
[0053] The second training unit 33 is used to train the target text with the model according to the obtained initial assignment, to obtain the word embeddings and topic vectors of the target text, where V represents the total number of words in the dictionary corresponding to the target text, c represents the size of the sliding window in the model L, w_i is the trai...
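The role of the value setting unit 32 above can be sketched in a few lines. This is an illustration under assumptions: the external embeddings are taken to be a plain dictionary from word to vector, and words missing from the external corpus are given small random vectors, a fallback the source does not specify. The function name and the `scale` parameter are hypothetical.

```python
import random


def init_word_embeddings(vocab, external_embeddings, dim, seed=0, scale=0.1):
    """Initialize target-text word embeddings from pre-trained external ones.
    Out-of-vocabulary words get small random vectors (an assumed fallback)."""
    rng = random.Random(seed)
    init = {}
    for word in vocab:
        if word in external_embeddings:
            # Copy so that later training does not mutate the external table.
            init[word] = list(external_embeddings[word])
        else:
            init[word] = [rng.uniform(-scale, scale) for _ in range(dim)]
    return init
```

Starting from pre-trained vectors gives the second training unit a semantically meaningful starting point, which matters most when the target text is short and sparse, as social media documents often are.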