A method, device, equipment, and computer storage medium for generating topic graphs suitable for text analysis or data mining
A text analysis and data mining technique, applied in the computer field, that addresses problems such as weak topic relevance and insufficient visualization, and improves depth and breadth, search precision, efficiency, and accuracy.
Examples
Embodiment 1
[0062] As shown in Figures 1 to 7, the topic graph generation method suitable for text analysis or data mining provided by this embodiment is executed by the topic graph engine and may include, but is not limited to, the following steps.
[0063] S101. Obtain a corpus containing a large number of documents.
[0064] In step S101, the corpus provides a sufficient amount of training data for training the LDA topic model. The training corpus can be composed of document data provided by the user or collected by existing collection software. A document can include, but is not limited to, one or more of the following fields: title, abstract, keywords, body, attachment title, attachment content, and author information. In addition, "a large number of documents" generally means more than ten thousand documents; for example, one hundred thousand documents are selected to form the corpus.
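As an illustration of step S101, the following minimal sketch assembles a corpus from document records. The field names, the function names, and the regex-based word splitting are assumptions made for this example; the source does not specify a record schema or a word segmentation method.

```python
import re

# Hypothetical field names; the source lists these kinds of fields but no schema.
FIELDS = ("title", "abstract", "keywords", "body")


def document_text(record, fields=FIELDS):
    """Join the selected, non-empty fields of one record into a single string."""
    return " ".join(str(record[f]) for f in fields if record.get(f))


def word_list(text):
    """Lowercase and split on non-word characters.

    This is a stand-in for real word segmentation, which the source does not describe.
    """
    return [w for w in re.split(r"\W+", text.lower()) if w]


def build_corpus(records, fields=FIELDS):
    """Return one word list per record, skipping records with no usable text."""
    corpus = []
    for record in records:
        words = word_list(document_text(record, fields))
        if words:
            corpus.append(words)
    return corpus


if __name__ == "__main__":
    records = [
        {"title": "Topic graphs", "abstract": "LDA for text mining"},
        {"title": ""},  # no usable text, so this record is skipped
    ]
    print(build_corpus(records))
```

Selecting fields through a parameter reflects the statement that a document may consist of only some of the listed fields.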
[0065] S102. Perform numerical processing on the word set of each document in the c...
Embodiment 2
[0087] As shown in Figure 8, this embodiment provides a hardware device for implementing the topic graph generation method suitable for text analysis or data mining described in Embodiment 1. The device includes an acquisition module, a training module, an analysis module, a search module, and a generation module. The acquisition module is used to obtain a corpus containing a large number of documents. The training module is used to perform numerical processing on the word set of each document in the corpus and then import the numerical processing result, as a training sample, into the LDA topic model for training, so as to obtain a topic-word matrix and a document-topic matrix, where the topic-word matrix represents the probability of each word appearing in each topic, and the document-topic matrix represents the probability of each topic appearing in each document. The analysis module is used to obtain the feature word set of each topic according to the topic-word matrix, and obtain the related topi...
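The training module's two outputs can be illustrated with a small sketch. The code below maps words to integer ids (one possible reading of "numerical processing") and fits LDA with a collapsed Gibbs sampler written in plain Python. The sampler, the hyperparameters `alpha` and `beta`, and all function names are assumptions chosen for this example; the source does not state which LDA implementation or settings are used.

```python
import random


def build_vocab(docs):
    """Map each distinct word to an integer id."""
    vocab = {}
    for words in docs:
        for w in words:
            vocab.setdefault(w, len(vocab))
    return vocab


def train_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (vocab, topic_word, doc_topic), where topic_word[k][v] estimates
    P(word v | topic k) and doc_topic[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = build_vocab(docs)
    V, K = len(vocab), num_topics
    ids = [[vocab[w] for w in words] for words in docs]

    n_kv = [[0] * V for _ in range(K)]  # tokens of word v assigned to topic k
    n_dk = [[0] * K for _ in ids]       # tokens in document d assigned to topic k
    n_k = [0] * K                       # all tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(ids):
        z.append([])
        for v in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_kv[k][v] += 1
            n_dk[d][k] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(ids):
            for i, v in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_kv[k][v] -= 1
                n_dk[d][k] -= 1
                n_k[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kv[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_kv[k][v] += 1
                n_dk[d][k] += 1
                n_k[k] += 1

    topic_word = [
        [(n_kv[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(ids)
    ]
    return vocab, topic_word, doc_topic


if __name__ == "__main__":
    docs = [
        ["topic", "model", "corpus", "topic"],
        ["graph", "node", "edge", "graph"],
        ["corpus", "document", "topic", "model"],
    ]
    vocab, topic_word, doc_topic = train_lda(docs, num_topics=2, iters=50)
    print(len(topic_word), "topics x", len(vocab), "words")
    print(len(doc_topic), "documents x", len(doc_topic[0]), "topics")
```

Each row of `topic_word` and of `doc_topic` sums to 1, which matches the description of both matrices as probabilities. For a corpus of the size mentioned in Embodiment 1, an optimized library implementation would be the more practical choice.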
Embodiment 3
[0090] As shown in Figure 9, this embodiment provides a hardware device that implements the topic graph generation method suitable for text analysis or data mining described in Embodiment 1, including a memory and a processor that are communicatively connected, wherein the memory is used to store a computer program, and the processor is configured to execute the computer program to implement the steps of the topic graph generation method suitable for text analysis or data mining as described in Embodiment 1.
[0091] For the working process, working details, and technical effects of the topic graph generation device provided in this embodiment, refer to Embodiment 1; they are not repeated here.