Topic model-based text representation method

A text representation and topic model technology, applied to biological neural network models, text database clustering/classification, unstructured text data retrieval, etc., with the effect of ensuring quality.

Inactive Publication Date: 2021-01-26
SHANXI UNIV
5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, the present invention provides a text representation method based on a topic model. A topic model is built and then mapped into the Word2vec space to produce a vectorized representation of the text. This addresses two problems: the LDA topic model's weakness in distinguishing ambiguous topics, and the word-order information that existing bag-of-words models ignore.


Examples

Embodiment Construction

[0046] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0047] As shown in Figure 1, the embodiment of the present invention provides a text representation method based on a topic model. An LDA topic model is built and its topic information is mapped into the Word2vec vector space, so that each text is represented as a vector and similarity in the vector space indicates the semantic similarity of the texts, effectively solving the defects...
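The LDA modeling step can be illustrated with a collapsed Gibbs sampler, the sampling scheme the abstract names. The following is a minimal sketch, not the patent's implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (phi, theta): topic-word and document-topic distributions.
    alpha, beta and n_iter are assumed values, not taken from the source.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                 # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                                  # tokens per topic
    z = []                                               # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in range(n_topics)]
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    return phi, theta

# Toy corpus (hypothetical): word ids 0-1 co-occur, as do word ids 2-3.
toy_docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 0, 1, 1], [3, 2, 2, 3]]
phi, theta = lda_gibbs(toy_docs, n_topics=2, vocab_size=4)
```

Each row of `phi` is a distribution over the vocabulary, so the highest-probability words in a row are natural candidates for that topic's keywords.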


Abstract

The invention belongs to the technical field of text mining and discloses a topic model-based text representation method comprising the following steps:

S1, preprocessing the text;
S2, performing LDA topic modeling with Gibbs sampling on the preprocessed text;
S3, selecting the topic keywords of each topic and performing normalization on the topic keywords;
S4, mapping each topic keyword into the word2vec space;
S5, setting a distance threshold and deleting the topic keywords whose cosine distances to the other topic keywords are greater than the distance threshold;
S6, calculating the coordinates of each topic in the word2vec space, calculating the coordinates of each text in the word2vec space, calculating the Euclidean distance between each text and each topic, and assigning each text to the topic with the minimum Euclidean distance to it.

In this method, modeling is carried out with the topic model, and the topic model is then mapped into the Word2vec space to produce a vectorized representation of the text. This overcomes the LDA topic model's weakness in distinguishing ambiguous topics and addresses the problem of word order.
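Steps S4 to S6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the tiny 2-D table `EMB` stands in for a trained word2vec model, a keyword is dropped when its average cosine distance to the other keywords of the same topic exceeds the threshold, and topic and text coordinates are taken as plain means of word vectors. The abstract does not specify these details, and the names `filter_keywords`, `mean_vector`, and `classify` are hypothetical.

```python
import math

# Hypothetical 2-D "word2vec" vectors; a trained model would supply real ones.
EMB = {
    "goal": (0.9, 0.1), "match": (0.8, 0.2), "team": (0.85, 0.15),
    "stock": (0.1, 0.9), "market": (0.2, 0.8), "bank": (0.15, 0.85),
}

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def mean_vector(words):
    """Coordinate of a word collection: mean of its known word vectors (assumption)."""
    vecs = [EMB[w] for w in words if w in EMB]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def filter_keywords(keywords, threshold):
    """S5 (assumed reading): drop keywords far, on average, from the other keywords."""
    kept = []
    for w in keywords:
        others = [o for o in keywords if o != w]
        if not others:          # a single keyword has nothing to be compared with
            kept.append(w)
            continue
        avg = sum(cosine_distance(EMB[w], EMB[o]) for o in others) / len(others)
        if avg <= threshold:
            kept.append(w)
    return kept

def classify(text_words, topics, threshold=0.3):
    """S6: assign the text to the topic whose coordinate is nearest in Euclidean distance."""
    text_vec = mean_vector(text_words)
    distances = {
        name: math.dist(text_vec, mean_vector(filter_keywords(kws, threshold)))
        for name, kws in topics.items()
    }
    return min(distances, key=distances.get)

# Toy topics (hypothetical); "stock" is an outlier among the sports keywords.
topics = {
    "sports": ["goal", "match", "team", "stock"],
    "finance": ["stock", "market", "bank"],
}
label = classify(["team", "goal"], topics)
```

With these toy vectors, the outlier keyword "stock" is filtered out of the sports topic before its coordinate is computed, which is the purpose of the threshold in S5.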

Description

Technical field

[0001] The invention belongs to the technical field of text mining, and in particular relates to a text representation method based on a topic model. A topic model is built and then mapped into the Word2vec space to produce a vectorized representation of the text.

Background

[0002] In recent years, the volume of online text (news, blogs, and so on) has exploded, and such text has very high research and commercial value. How to better mine the value hidden in massive online text is a key concern for researchers, enterprises, and governments.

[0003] Generally speaking, traditional unsupervised clustering algorithms based on machine learning perform poorly on text clustering. This is because text clustering uses the bag-of-words method and TF-IDF for feature extraction. These methods represent an article as a text-to-word vector based on a corpus. The word vector generated by this text representation is ...
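The word-order limitation of the bag-of-words representation can be seen in a small example. This is an illustrative sketch with made-up sentences; `bag_of_words` is a hypothetical helper, not part of the described method.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as word counts, discarding word order."""
    return Counter(text.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# Both sentences contain the same words with the same counts, so their
# bag-of-words representations are identical although the meanings differ.
same_representation = bag_of_words(a) == bag_of_words(b)
```

Because the two sentences map to the same count vector, any clustering built only on such features cannot tell them apart.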

Application Information

IPC(8): G06F40/289, G06F40/216, G06F40/30, G06N3/04, G06F16/35
CPC: G06F40/289, G06F16/35, G06F40/216, G06F40/30, G06N3/045
Inventor: 牛奉高, 苏雅, 李廷进
Owner: SHANXI UNIV