
Multi-dimensional text clustering method based on metric learning

A text clustering technique based on metric learning, applied in the fields of machine learning and natural language processing. It addresses two problems of existing methods: they ignore the heterogeneity of different feature spaces, and they have no mechanism to explore the potential relationship between clustering results and multiple dimensions. It also aims to alleviate the high-dimensional sparsity problem.

Pending Publication Date: 2019-11-29
GUIZHOU UNIV
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Traditional text clustering algorithms, which directly combine text features with features from other dimensions, not only ignore the heterogeneity between different feature spaces, but also have no mechanism to explore the potential relationship between clustering results and multiple dimensions.



Examples


Embodiment 1

[0018] Embodiment 1: As shown in Figure 1, a multi-dimensional text clustering method based on metric learning comprises the following steps:

[0019] Step 1: Select two dimensions from the data set, denoted dimension A and dimension B, and represent each as feature vectors;

[0020] Step 2: Use the K-Means clustering method combined with metric matrix learning to perform initial clustering on dimensions A and B separately;

[0021] Step 3: Determine whether the current clustering result meets the termination condition. If not, set the upper-limit constant on the number of constraint pairs and execute step 4. Otherwise, end the algorithm and output the clustering result to assist downstream tasks;

[0022] Step 4: Using the current clustering results of dimensions A and B, select the constraint pairs in dimensions A and B that meet the conditions, without exceeding the upper limit on constraint pairs set in step 3, to form constraint sets ...
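Step 2 can be illustrated with a minimal sketch, assuming a fixed diagonal metric matrix M = diag(weights), so that the squared distance between x and y is the weighted sum of w_i (x_i - y_i)^2. The function names, the farthest-point initialisation, and the toy data are illustrative assumptions and are not taken from the patent. The rule by which the method learns the metric matrix is not given in the excerpt, so the metric is held fixed here.

```python
def metric_sq_dist(x, y, weights):
    """Squared distance under a diagonal metric matrix M = diag(weights)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def init_centroids(points, k, weights):
    """Deterministic farthest-point initialisation (an assumption, chosen for reproducibility)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(metric_sq_dist(p, c, weights) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans_with_metric(points, k, weights, n_iter=20):
    """K-Means in which assignments use the metric-weighted distance."""
    centroids = init_centroids(points, k, weights)
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the metric.
        new_labels = [
            min(range(k), key=lambda c: metric_sq_dist(p, centroids[c], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: four documents described in two dimensions (views).
view_a = [[0.0, 0.0], [0.0, 5.0], [10.0, 0.0], [10.0, 5.0]]
view_b = [[1.0], [1.2], [9.0], [1.1]]

labels_a = kmeans_with_metric(view_a, k=2, weights=[1.0, 0.0])
labels_b = kmeans_with_metric(view_b, k=2, weights=[1.0])
```

Changing the weights changes which documents count as close: with `weights=[0.0, 1.0]`, the same `view_a` vectors are grouped by their second feature instead of their first. This is the sense in which a learned metric matrix can reshape the clustering of each dimension.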


Abstract

The invention discloses a multi-dimensional text clustering method based on metric learning. The method mainly comprises the following steps: (1) selecting two dimensions from a data set and representing them as feature vectors; (2) performing initial clustering on each of the two dimensions using the K-Means clustering method combined with metric matrix learning; (3) judging whether the current clustering result meets the termination condition; if not, setting the upper-limit constant on constraints and executing step 4; otherwise, ending the algorithm and outputting the clustering result to assist downstream tasks; (4) selecting constraint pairs that meet the conditions using the current clustering results of the dimensions; (5) adding the constraint sets generated in step 4 to the clustering process of each dimension, and adjusting the objective function and the learning of the metric matrix to obtain the clustering results of the two dimensions; and (6) repeating steps 3 to 5. The method takes into account how the data is expressed in different feature spaces and lets the dimensions assist one another during clustering, and thereby achieves a good clustering effect.
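The constraint selection in step 4 might be sketched as follows. The excerpt does not state which conditions a constraint pair must meet, so this sketch assumes a simple agreement rule: a pair of documents becomes a must-link constraint if both dimensions place them in the same cluster, and a cannot-link constraint if both dimensions separate them. The function name `select_constraint_pairs`, the agreement rule, and the parameter `max_pairs` (standing in for the upper-limit constant of step 3) are all hypothetical.

```python
from itertools import combinations


def select_constraint_pairs(labels_a, labels_b, max_pairs):
    """Collect pairwise constraints on which both dimensions agree.

    Assumed rule (not from the patent): same cluster in A and in B gives a
    must-link pair; different clusters in A and in B gives a cannot-link
    pair. At most max_pairs constraints are returned in total.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels_a)), 2):
        if len(must_link) + len(cannot_link) >= max_pairs:
            break  # upper limit on constraint pairs reached
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            must_link.append((i, j))
        elif not same_a and not same_b:
            cannot_link.append((i, j))
        # Pairs on which the two dimensions disagree are skipped.
    return must_link, cannot_link


# Hypothetical cluster labels for four documents in dimensions A and B.
example_a = [0, 0, 1, 1]
example_b = [0, 0, 1, 0]
ml, cl = select_constraint_pairs(example_a, example_b, max_pairs=10)
```

In an iterative scheme such as the one the abstract describes, the returned pairs would be fed back into the clustering of each dimension in step 5, for example as penalty terms in the objective function. That part depends on details not present in the excerpt and is left out.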

Description

Technical field

[0001] The invention relates to a text clustering method, in particular to a multi-dimensional text clustering method based on metric learning, and belongs to the technical fields of machine learning and natural language processing.

Background art

[0002] Multi-dimensional data is very common in practical applications in the era of big data. For example, a webpage can be described not only by the words it contains, but also by its related links; in text-related tasks, text features, semantic information, and even user behaviors such as liking, reposting, and commenting on the text can all be used to describe the text. Correspondingly, multi-dimensional clustering, as a basic task in machine learning, pattern recognition, and data mining, has become an important extension of clustering. Traditional text clustering generally extracts features from text content, and then directly combines them with other dimension attribute features...


Application Information

IPC(8): G06F16/35
CPC: G06F16/35, Y02D10/00
Inventors: 黄瑞章, 白瑞娜, 秦永彬, 陈艳平
Owner: GUIZHOU UNIV