Multi-level topic vector space construction method and device, apparatus and storage medium

A multi-level topic vector space construction technology, applied in the field of semantic analysis (construction methods, devices, equipment, and storage media for multi-level topic vector spaces). It addresses the problems that the numerical distance between two topics is difficult to define directly and that topics cannot be represented by real-valued vectors, and it achieves reduced computational effort and a reduced influence of noise words.

Active Publication Date: 2020-03-17
浙江大搜车软件技术有限公司 (Zhejiang Dasouche Software Technology Co., Ltd.)
Cites: 2; Cited by: 2

AI Technical Summary

Problems solved by technology

Although the topic models mentioned above can extract latent features from text, they have the following disadvantages: the vector size of algorithms such as LDA can only be the dictionary dimension, a...

Examples

Embodiment Construction

[0049] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0050] Furthermore, the drawings are merely schematic illustrations of the application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus repeated descriptions thereof will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may ...

Abstract

The invention relates to a multi-level topic vector space construction method and device, an apparatus, and a storage medium. The construction method comprises the following steps: extracting a global word co-occurrence matrix from a corpus; modeling the global word co-occurrence matrix to generate topic libraries of different levels, and generating a topic correlation matrix for each level from the topic library of that level; and constructing a topic vector space for the topic correlation matrix of each level. A topic can therefore be expressed as a topic vector of any dimension, which avoids the defect of algorithms such as LDA that the vector size can only be the dictionary dimension. Meanwhile, by modeling noise words, the influence of noise words on clustering can be weakened.
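
The abstract names the three steps but not the concrete models behind them, so the following is only a minimal Python/NumPy sketch under stated assumptions, not the patented method. It assumes a symmetric sliding window for the global co-occurrence counts, takes each level's topic library as given (a hypothetical list of word sets, since the topic-modeling and noise-word steps are not described here), scores topic correlation as the cosine similarity of the topics' summed co-occurrence rows, and builds each level's topic vector space from the leading eigenpairs of that correlation matrix. The names `cooccurrence_matrix`, `topic_correlation`, `topic_vectors`, the `window` parameter, and the toy corpus are all illustrative.

```python
import numpy as np


def cooccurrence_matrix(docs, window=2):
    """Count how often two distinct words appear within `window` tokens of each other.

    Assumption: a symmetric sliding window; the source does not specify one.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, w in enumerate(doc):
            # Look ahead only and mirror the count, so each nearby pair is counted once.
            for j in range(i + 1, min(i + 1 + window, len(doc))):
                a, b = index[w], index[doc[j]]
                if a != b:
                    counts[a, b] += 1
                    counts[b, a] += 1
    return index, counts


def topic_correlation(topics, index, counts):
    """Hypothetical measure: cosine similarity between topics, where each topic
    (a list of words) is represented by the sum of its words' co-occurrence rows."""
    profiles = np.array([counts[[index[w] for w in topic]].sum(axis=0) for topic in topics])
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def topic_vectors(corr, dim):
    """Embed topics so that dot products of their vectors approximate `corr`,
    using the top-`dim` eigenpairs (dim can be at most the number of topics)."""
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]          # keep the largest `dim`
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy corpus and hypothetical two-level topic libraries.
docs = [
    "engine oil filter engine".split(),
    "tire wheel rim tire".split(),
    "engine tire price dealer".split(),
    "price dealer loan price".split(),
]
levels = {
    "coarse": [["engine", "oil", "filter", "tire", "wheel", "rim"], ["price", "dealer", "loan"]],
    "fine": [["engine", "oil", "filter"], ["tire", "wheel", "rim"], ["price", "dealer", "loan"]],
}

index, counts = cooccurrence_matrix(docs, window=2)
spaces = {}
for level, topics in levels.items():
    corr = topic_correlation(topics, index, counts)
    spaces[level] = topic_vectors(corr, dim=len(topics))
```

Because the correlation matrix in this sketch is a Gram matrix of unit vectors, it is positive semidefinite: with `dim` equal to the number of topics, the dot products of the topic vectors reproduce it, and a smaller `dim` gives a low-rank approximation. The Euclidean distance between two rows of `spaces[level]` then serves as a numerical distance between topics. Note that here the dimension is bounded by the number of topics at a level rather than by the dictionary size.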

Description

Technical field

[0001] The present application relates to the technical field of semantic analysis, and in particular to a construction method, device, apparatus, and storage medium for a multi-level topic vector space.

Background

[0002] A commonly used vectorization method represents a text as a vector of real-valued elements (binary values, word-frequency values, or TF-IDF values). Although these algorithms are simple, they treat words as independent units and ignore the semantic relationships between them, which reduces classification accuracy. To overcome this shortcoming, a topic-based vectorization algorithm, Latent Semantic Indexing (LSI), was proposed; it uses singular value decomposition to reduce the dimensionality of the document-word matrix. Later, Probabilistic Latent Semantic Analysis (PLSA), a variant of LSI, introduced a generative model of text and defined the text-word p...
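
To make the background concrete, here is a small NumPy sketch of the two baseline approaches it mentions: a TF-IDF document-word matrix, and LSI, which keeps only the top singular triplets of that matrix. It is illustrative only. The TF-IDF weighting used (raw term frequency times log(N/df)) is one common variant chosen as an assumption, and the function names `tfidf_matrix` and `lsi` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Document-word matrix weighted by raw term frequency x log(N / df)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency per word
    matrix = np.zeros((n_docs, len(vocab)))
    for d, doc in enumerate(docs):
        for w, tf in Counter(doc).items():
            matrix[d, index[w]] = tf * math.log(n_docs / df[w])
    return vocab, matrix


def lsi(matrix, k):
    """Latent Semantic Indexing: truncate the SVD of the document-word matrix to rank k."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    doc_vectors = u[:, :k] * s[:k]   # documents in the k-dimensional latent space
    word_vectors = vt[:k].T          # words in the same latent space
    return doc_vectors, word_vectors


docs = [
    "car engine repair".split(),
    "car tire repair".split(),
    "loan price dealer".split(),
]
vocab, matrix = tfidf_matrix(docs)
doc_vecs, word_vecs = lsi(matrix, k=2)
```

With `k` equal to the smaller matrix dimension, `doc_vectors @ word_vectors.T` reproduces the TF-IDF matrix exactly; a smaller `k` keeps only the strongest latent directions, which is the dimensionality reduction LSI relies on. In this TF-IDF variant, a word that occurs in every document receives zero weight.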

Application Information

IPC (8): G06F40/30; G06F40/284; G06F16/35
CPC: G06F16/35
Inventor: 吴欣辉 (Wu Xinhui)
Owner: 浙江大搜车软件技术有限公司 (Zhejiang Dasouche Software Technology Co., Ltd.)