Topic map generation method, device, equipment and computer storage medium suitable for text analysis or data mining

A text analysis and data mining technology, applied in the computer field, that addresses problems such as insufficient visualization and weak correlation between topics. Its effects include greater depth and breadth of analysis, improved efficiency and accuracy, and more accurate search.

Active Publication Date: 2019-06-18
郑敏杰

AI Technical Summary

Problems solved by technology

[0004] In order to solve the problems of weak inter-subject correlation and insufficient visualization in existing text analysis or data mining, the purpose of...



Examples


Embodiment 1

[0062] As shown in Figures 1-7, in the method for generating a topic map suitable for text analysis or data mining provided in this embodiment, the software that executes the method is a topic map engine, and the method may include, but is not limited to, the following steps.

[0063] S101. Acquire a corpus containing a large number of documents.

[0064] In step S101, the corpus supplies a sufficient amount of training data for training the LDA topic model. The training corpus may be provided by the user or may consist of document data collected by existing acquisition software. Each document may consist of, but is not limited to, one or more of the following fields: title, abstract, keywords, text, attachment title, attachment content, and author information. In addition, "a large number of documents" generally means more than 10,000 documents; for example, 100,000 documents may be selected to form the corpus.
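The corpus assembly in step S101 can be sketched roughly as follows. This is only an illustration: the `Document` class, the `build_corpus` helper, and the choice of which fields to join are assumptions and are not specified in the patent text. Only the field names and the 10,000-document guideline come from paragraph [0064].

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical record type; the field names mirror those listed in [0064].
@dataclass
class Document:
    title: str = ""
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    text: str = ""
    attachment_title: str = ""
    attachment_content: str = ""
    author_info: str = ""

    def as_training_text(self, fields=("title", "abstract", "keywords", "text")) -> str:
        """Join the selected non-empty fields into one training string."""
        parts = []
        for name in fields:
            value = getattr(self, name)
            if isinstance(value, list):
                value = " ".join(value)
            if value:
                parts.append(value)
        return " ".join(parts)


def build_corpus(documents, min_size: int = 10_000) -> Tuple[List[str], bool]:
    """Return the training texts and whether the corpus meets the size guideline."""
    texts = [doc.as_training_text() for doc in documents]
    texts = [t for t in texts if t]  # drop documents with no usable fields
    return texts, len(texts) >= min_size
```

The boolean flag only reports whether the "more than 10,000 documents" guideline is met. It does not reject a smaller corpus, since the text describes that size as typical rather than mandatory.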

[0065] S102. Carry out numerical processing on the...

Embodiment 2

[0087] As shown in Figure 8, this embodiment provides a hardware device that implements the method for generating a topic map suitable for text analysis or data mining described in Embodiment 1. The device includes an acquisition module, a training module, an analysis module, a search module, and a generation module. The acquisition module is used to obtain a corpus containing a large number of documents. The training module is used to perform numerical processing on the word set of each document in the corpus, and then import the numerical processing results into the LDA topic model as training samples to obtain a topic-word matrix and a document-topic matrix, where the topic-word matrix represents the probability of each word appearing in each topic, and the document-topic matrix represents the probability of each topic appearing in each document. The analysis module is used to obtain the feature word set of each topic accordin...
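As a rough illustration of what the training module produces, the sketch below maps words to integer ids and then fits a small LDA model, returning a topic-word matrix and a document-topic matrix whose rows are probability distributions. The patent text does not say which inference procedure is used; collapsed Gibbs sampling is swapped in here as one common choice. The function names, the hyperparameters `alpha` and `beta`, and the iteration count are all assumptions.

```python
import random


def numericalize(docs):
    """Map each distinct word to an integer id; return id sequences and the vocabulary."""
    vocab = {}
    encoded = [[vocab.setdefault(w, len(vocab)) for w in words] for words in docs]
    return encoded, sorted(vocab, key=vocab.get)


def train_lda(encoded, n_vocab, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic_word, doc_topic)."""
    rng = random.Random(seed)
    doc_topic_counts = [[0] * n_topics for _ in encoded]
    topic_word_counts = [[0] * n_vocab for _ in range(n_topics)]
    topic_totals = [0] * n_topics

    # Random initial topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(encoded):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(encoded):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1

    # Smoothed, row-normalized estimates: each row sums to 1.
    topic_word = [
        [(topic_word_counts[k][w] + beta) / (topic_totals[k] + n_vocab * beta)
         for w in range(n_vocab)]
        for k in range(n_topics)
    ]
    doc_topic = [
        [(doc_topic_counts[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(encoded)
    ]
    return topic_word, doc_topic
```

Here `topic_word[k][w]` estimates the probability of word `w` in topic `k`, and `doc_topic[d][k]` estimates the probability of topic `k` in document `d`, matching the two matrices described above. For a corpus of the size mentioned in the text, an optimized library implementation would be more practical than this pure-Python loop.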

Embodiment 3

[0090] As shown in Figure 9, this embodiment provides a hardware device for implementing the method for generating a topic map suitable for text analysis or data mining described in Embodiment 1. The device includes a memory and a processor that are communicatively connected, where the memory is used to store a computer program, and the processor is used to execute the computer program to carry out the steps of the method described in Embodiment 1.

[0091] For the working process, working details and technical effects of the topic map generating device provided in this embodiment, please refer to Embodiment 1, and details are not repeated here.



Abstract

The invention relates to the technical field of computers, and discloses a topic map generation method, device and equipment suitable for text analysis or data mining, and a computer storage medium. The invention provides a new method that takes the application of the probabilistic topic model LDA to a new level: each topic can become a node in a semantic network; the complex semantic associations among topics are reflected in greater depth; the topic map and the traditional knowledge graph correspond to and complement each other; and the resulting topic map has independent application value. Therefore, the efficiency and accuracy of traditional search and recommendation can be effectively improved, the shortcomings of traditional data mining or text analysis in depth and visualization can be overcome, and latent semantic associations that are difficult to find by traditional methods can be mined, enabling genuine scientific discovery. The method has particularly large potential value in data mining.

Description

Technical field

[0001] The invention belongs to the field of computer technology, and in particular relates to a method, device, equipment and computer storage medium for generating a topic map suitable for text analysis or data mining.

Background

[0002] Data mining is a step in knowledge discovery in databases (KDD). It generally refers to the process of using algorithms to search for information hidden in large amounts of data. Data mining is usually associated with computer science and achieves this goal through methods such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.

[0003] At present, there are many technologies suitable for text analysis or data mining, among which the LDA topic model (Latent Dirichlet Allocation, ...


Application Information

IPC(8): G06F17/27, G06F16/36
Inventor 郑敏杰
Owner 郑敏杰