A method, device, equipment, and computer storage medium for generating topic graphs suitable for text analysis or data mining

A text analysis and data mining technology, applied in the computer field, that addresses problems such as weak topic relevance and insufficient visualization, and improves the depth and breadth of analysis, search precision, efficiency, and accuracy.

Active Publication Date: 2020-03-24
郑敏杰

AI Technical Summary

Problems solved by technology

[0004] To solve the problems of weak inter-topic correlation and insufficient visualization in existing text analysis or data mining, the purpose of the present invention is to provide a method, device, equipment, and computer storage medium for generating topic maps suitable for text analysis or data mining.



Examples


Embodiment 1

[0062] As shown in Figures 1 to 7, the topic map generation method suitable for text analysis or data mining provided by this embodiment is executed by the topic map engine and may include, but is not limited to, the following steps.

[0063] S101. Obtain a corpus containing a large number of documents.

[0064] In step S101, the corpus provides a sufficient amount of training data for training the LDA topic model. The training corpus can be composed of document data provided by the user or collected by existing collection software. Each document can consist of, but is not limited to, one or several of the following fields: title, abstract, keywords, body, attachment title, attachment content, and author information. The corpus generally contains more than ten thousand documents; for example, one hundred thousand documents may be selected to form the corpus.
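As an illustration of step S101, the following minimal sketch assembles a corpus by joining selected fields of each document record. The function name `build_corpus`, the dictionary keys, and the default field selection are assumptions made for this example; the source lists the kinds of fields but does not specify a data format.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical field keys: the source names these kinds of fields
# but does not define how a document record is stored.
DEFAULT_FIELDS = ("title", "abstract", "keywords", "body")


def build_corpus(records: Iterable[Dict[str, str]],
                 fields: Sequence[str] = DEFAULT_FIELDS) -> List[str]:
    """Join the selected fields of each record into one document text."""
    corpus = []
    for record in records:
        parts = []
        for field in fields:
            # Missing or empty fields are skipped.
            value = str(record.get(field) or "").strip()
            if value:
                parts.append(value)
        # Records with none of the selected fields contribute nothing.
        if parts:
            corpus.append(" ".join(parts))
    return corpus


records = [
    {"title": "LDA", "abstract": "topic model"},
    {"author": "unknown"},  # no selected field present, so it is dropped
]
corpus = build_corpus(records)
```

In practice the record list would hold tens of thousands of documents, as the paragraph above describes; the two records here only show the field selection.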

[0065] S102. Perform numerical processing on the word set of each document in the c...

Embodiment 2

[0087] As shown in Figure 8, this embodiment provides a hardware device for implementing the topic map generation method suitable for text analysis or data mining described in Embodiment 1, including an acquisition module, a training module, an analysis module, a search module, and a generation module. The acquisition module is used to obtain a corpus containing a large number of documents. The training module is used to perform numerical processing on the word set of each document in the corpus, and then import the numerical processing results as training samples into the LDA topic model for training to obtain a topic-word matrix and a document-topic matrix, where the topic-word matrix represents the probability of each word appearing in each topic, and the document-topic matrix represents the probability of each topic appearing in each document. The analysis module is used to obtain the feature word set of each topic according to the topic-word matrix, and obtain the related topi...
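The training and analysis modules can be sketched roughly as follows. The sketch makes two assumptions that the source does not confirm: "numerical processing" is taken to mean a bag-of-words count encoding, and the feature word set of a topic is taken to be its highest-probability words in the topic-word matrix. LDA training itself is omitted, so the topic-word matrix below is hand-written illustrative data, and the names `encode_bag_of_words` and `feature_words` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def encode_bag_of_words(
    word_sets: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Assumed numerical processing: map each document to word counts."""
    vocabulary = sorted({word for words in word_sets for word in words})
    index = {word: i for i, word in enumerate(vocabulary)}
    counts = []
    for words in word_sets:
        row = [0] * len(vocabulary)
        for word, n in Counter(words).items():
            row[index[word]] = n
        counts.append(row)
    return vocabulary, counts


def feature_words(topic_word: Sequence[Sequence[float]],
                  vocabulary: Sequence[str],
                  top_n: int = 2) -> List[List[str]]:
    """Assumed selection rule: the top_n most probable words per topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:top_n]])
    return result


docs = [["topic", "model", "topic"], ["graph", "model"]]
vocabulary, counts = encode_bag_of_words(docs)

# Illustrative topic-word matrix (rows: topics, columns: vocabulary words).
# A trained LDA model would supply this; these values are made up.
topic_word = [[0.1, 0.2, 0.7],
              [0.6, 0.3, 0.1]]
features = feature_words(topic_word, vocabulary, top_n=2)
```

The count matrix is what would be passed to an LDA implementation as training samples; each row of the resulting topic-word matrix is a probability distribution over the vocabulary, which is why ranking within a row gives that topic's most characteristic words.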

Embodiment 3

[0090] As shown in Figure 9, this embodiment provides a hardware device that implements the topic map generation method suitable for text analysis or data mining described in Embodiment 1, including a memory and a processor that are communicatively connected, where the memory is used to store a computer program and the processor is configured to execute the computer program to implement the steps of the topic map generation method described in Embodiment 1.

[0091] For the working process, working details, and technical effects of the topic map generating device provided in this embodiment, refer to Embodiment 1; they are not repeated here.


Abstract

The invention relates to the technical field of computers, and discloses a topic map generation method, device, equipment, and computer storage medium suitable for text analysis or data mining. The invention provides a new way of applying the probabilistic topic model LDA: each topic becomes a node in a semantic network; the complex semantic associations among topics are reflected in greater depth; the topic map and the traditional knowledge graph correspond to and complement each other; and the resulting topic map has independent application value. As a result, the efficiency and accuracy of traditional search and recommendation can be effectively improved, the shortcomings of traditional data mining or text analysis in depth and visualization can be overcome, and latent semantic associations that are difficult to find by traditional methods can be mined, enabling genuine scientific discovery. The method has considerable potential value in data mining in particular.

Description

Technical field

[0001] The invention belongs to the field of computer technology, and particularly relates to a method, device, equipment, and computer storage medium for generating a topic map suitable for text analysis or data mining.

Background art

[0002] Data mining, also called data exploration, is a step in knowledge discovery in databases (KDD). It generally refers to the process of searching, by means of algorithms, for information hidden in a large amount of data. Data mining is usually associated with computer science and achieves this goal through methods such as statistics, online analytical processing, information retrieval, machine learning, expert systems (relying on past rules of thumb), and pattern recognition.

[0003] At present, there are many technologies suitable for text analysis or data mining. Among them, the LDA topic model (Latent Dirichlet Allocation, prob...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F16/36
Inventor: 郑敏杰
Owner: 郑敏杰