Iterative text clustering method based on adaptive subspace learning

A subspace learning and text clustering technique, applied to iterative text clustering, that addresses problems such as overfitting and limited application scope.

Active Publication Date: 2013-09-04

广东南方报业传媒集团新媒体有限公司

2 Cites, 19 Cited by



Problems solved by technology

However, a limitation of NAML is that its optimization depends on multiple key parameters, which can easily lead to overfitting when data are insufficient.

[0008] Although adaptive dimensionality reduction and related methods can solve specific text clustering problems, the technical defects noted above limit their scope of application and leave room for improving text clustering algorithms.


Examples


Embodiment 1

[0065] As shown in Figure 1, the iterative text clustering method based on adaptive subspace learning includes the following steps:

[0066] (1) Clustering initialization in the text vector space: from the word-segmented representations of all documents in the text corpus, a set of representative terms is selected using the mutual information method to form a term index. Each document is then represented as a text vector according to the term index; the dimension of the text vector equals the size of the selected term index, and each element of the vector is a tf-idf weight. All documents in the text corpus together constitute the original text vector space. In the original text vector space, the affinity propagation clustering algorithm is used to generate the specified K initial clusters (K-AP), so that each document obtains its initial category; the category information of all document clusters is summarized to form an initial category ind...
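The representation part of step (1) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the term index is assumed to be already chosen (the mutual information selection is not shown), the tf-idf variant (relative term frequency times log inverse document frequency) is one common choice, and the initial labels are hard-coded placeholders standing in for the output of the K-AP clustering step. The function names `tfidf_vectors` and `indicator_matrix` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs, term_index):
    """Represent tokenized documents as tf-idf vectors over a fixed term index."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each indexed term.
    df = {t: sum(1 for d in docs if t in d) for t in term_index}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vec = []
        for t in term_index:
            tf = counts[t] / total
            idf = math.log(n_docs / df[t]) if df[t] else 0.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

def indicator_matrix(labels, k):
    """Build an n-by-k 0/1 matrix; row i has a 1 in the column of document i's cluster."""
    return [[1 if label == c else 0 for c in range(k)] for label in labels]

# Toy corpus, already segmented into words (assumption for illustration).
docs = [
    ["stock", "market", "rise"],
    ["stock", "fall"],
    ["match", "goal", "team"],
    ["team", "win", "match"],
]
term_index = ["stock", "market", "match", "team"]  # assumed pre-selected terms

X = tfidf_vectors(docs, term_index)   # original text vector space, one row per document
labels = [0, 0, 1, 1]                 # placeholder for the clustering output (K = 2)
F = indicator_matrix(labels, 2)       # initial category indicator matrix
```

Each row of `X` has one entry per indexed term, so its dimension matches the size of the term index, and each row of `F` marks the single cluster a document currently belongs to.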


Abstract

The invention discloses an iterative text clustering method based on adaptive subspace learning. The method includes the following steps: (1) initialization: the text corpus is represented as a text vector space, K initial clusters are generated by the affinity propagation clustering method, and the cluster categories of all texts are represented as an initial category membership indicator matrix; and (2) iteration between subspace projection and clustering: with the initial category membership indicator matrix as prior knowledge, a subspace projection matrix is solved with maximization of the average neighborhood margin as the objective, the text vector space is projected into the subspace, K clusters are generated in the subspace by the affinity propagation clustering method, and the category membership indicator matrix is updated; a convergence function is computed from the subspace projection matrix and the category membership indicator matrix, and when the function converges the iteration exits and text clustering is complete. The method places no restriction on the size or distribution of the text data, fuses subspace solving and clustering in a unified framework, and obtains an overall optimal clustering result through an iterative strategy.
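The alternation in step (2) can be sketched as a generic loop. The sketch below only illustrates the control flow; it is not the patented procedure. All function names are hypothetical, and the three plugged-in steps are deliberately simple stand-ins: `mean_gap_axis` replaces the average-neighborhood-margin projection, `midpoint_split` replaces affinity propagation, and `within_scatter` replaces the convergence function, whose actual form is not given in this excerpt.

```python
def iterate_clustering(X, labels, k, fit_projection, project, cluster, score,
                       tol=1e-6, max_iter=50):
    """Alternate between subspace projection and clustering until the score stops changing."""
    prev = None
    for it in range(1, max_iter + 1):
        W = fit_projection(X, labels)   # current labels act as prior knowledge
        Z = project(X, W)               # map documents into the subspace
        labels = cluster(Z, k)          # re-cluster in the subspace, updating labels
        cur = score(Z, labels)
        if prev is not None and abs(cur - prev) <= tol:
            break                       # convergence: exit the iteration
        prev = cur
    return labels, W, it

# --- Toy stand-ins (assumptions for illustration only, two clusters) ---

def mean_gap_axis(X, labels):
    """Pick the coordinate whose two class means are furthest apart."""
    def class_mean(j, c):
        vals = [x[j] for x, l in zip(X, labels) if l == c]
        return sum(vals) / len(vals)
    return max(range(len(X[0])),
               key=lambda j: abs(class_mean(j, 0) - class_mean(j, 1)))

def take_axis(X, axis):
    """Project each vector onto a single coordinate."""
    return [x[axis] for x in X]

def midpoint_split(Z, k):
    """Split 1-D values at the midpoint of their range (only k = 2 is supported)."""
    mid = (min(Z) + max(Z)) / 2
    return [0 if z < mid else 1 for z in Z]

def within_scatter(Z, labels):
    """Sum of squared deviations of each value from its cluster mean."""
    total = 0.0
    for c in set(labels):
        vals = [z for z, l in zip(Z, labels) if l == c]
        m = sum(vals) / len(vals)
        total += sum((v - m) ** 2 for v in vals)
    return total

X = [[0.0, 1.0], [0.1, 5.0], [0.9, 1.2], [1.0, 5.1]]
initial = [0, 1, 1, 1]  # deliberately imperfect starting labels
final_labels, W, n_iter = iterate_clustering(
    X, initial, 2, mean_gap_axis, take_axis, midpoint_split, within_scatter)
```

Passing the projection, clustering, and scoring steps as arguments keeps the loop independent of any particular choice, so each stand-in could be swapped for a fuller implementation without changing the iteration logic.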

Description

Technical field

[0001] The present invention relates to the field of machine learning and pattern recognition, and in particular to an iterative text clustering method based on adaptive subspace learning. It is an adaptive subspace learning method based on maximization of the average neighborhood margin, applied with an iterative strategy to solve text clustering problems.

Background

[0002] With the spread and development of Internet and database technology, people can easily acquire and store large amounts of data, most of which exists in the form of text. Text clustering can organize, summarize and navigate text information, and helps to accurately retrieve the required information from vast text resources. It has therefore received widespread attention in recent years.

[0003] In text clustering, text is often represented by the Vector Space Model (VSM), but this representation is characterized ...


Application Information


IPC(8): G06F17/30, G06K9/66

Inventor: 吴娴, 杨兴锋, 张东明, 何崑

Owner: 广东南方报业传媒集团新媒体有限公司
