Method for realizing fast-speed short text bi-cluster

A short-text bi-clustering technique, applied in fields such as special-purpose data processing and electronic digital data processing, that addresses problems such as poor results and low clustering accuracy.

Active Publication Date: 2013-06-26

中科国力(镇江)智能技术有限公司


Problems solved by technology

[0006] (2) Accurate calculation of short text similarity

Many similarity algorithms exist (such as the Euclidean distance, cosine distance, Pearson coefficient, and VDM methods), but according to our research each of them has defects and performs poorly in practical applications.
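For reference, three of the measures named above can be sketched on simple term-count vectors. This is only an illustration under assumptions: the whitespace tokenization, the helper `to_vectors`, and the sample sentences are hypothetical and do not come from the patent, and the sketch does not show which specific defects the inventors had in mind.

```python
import math
from collections import Counter


def to_vectors(text_a, text_b):
    """Build aligned term-count vectors over the combined vocabulary (assumed tokenization: whitespace)."""
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    vocab = sorted(set(counts_a) | set(counts_b))
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab]


def euclidean_distance(u, v):
    """Straight-line distance between the two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between the vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson_coefficient(u, v):
    """Pearson correlation of the two vectors; 0.0 if either has zero variance."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom else 0.0


# Hypothetical sample pair: three of the four or five words are shared.
u, v = to_vectors("cheap flight to paris", "low cost flight to paris")
print(euclidean_distance(u, v), cosine_similarity(u, v), pearson_coefficient(u, v))
```

On this pair the Pearson coefficient comes out negative even though most words are shared, because the vectors are very short and sparse. That is one way such measures can behave unexpectedly on short texts, though it is not necessarily the defect the source refers to.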

[0007] (3) Fast and accurate clustering of short texts

Traditional single-level clustering (such as the K-nearest-neighbor method and hierarchical clustering) struggles to cluster short texts accurately. On an open corpus, clustering accuracy is generally very low and cannot meet the needs of practical applications.

Moreover, clustering accuracy drops further when the short texts are slightly longer.

Embodiment Construction

[0036] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0037] As shown in Figure 1, a fast short-text bi-clustering method includes the following steps:

[0038] Step 1) Preprocess short-text distractors: with the support of an irrelevant-word dictionary and a part-of-speech dictionary, quickly identify and process irrelevant words and parts of speech in the short texts.

[0039] Step 2) Using the short-text similarity calculation, compute the similarity of each pair of preprocessed short texts to form a sparse short-text similarity matrix.

[0040] Step 3) Perform first-level clustering on the sparse short-text similarity matrix, dividing similar short texts into clusters one by one according to the calculated similarities.

[0041] Step 4) Perform second-level clustering of the short texts on the basis of the first-level clustering results.
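Steps 1) and 2) above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two tiny dictionaries, the use of Jaccard similarity (swapped in because the excerpt does not give the patent's own similarity formula), the `min_sim` cutoff, and the inverted index used to avoid comparing texts that share no token.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dictionaries; the patent's real dictionaries are not shown in the excerpt.
IRRELEVANT_WORDS = {"please", "the", "a", "to", "me", "hi"}
PART_OF_SPEECH = {"book": "verb", "reserve": "verb", "flight": "noun",
                  "ticket": "noun", "paris": "noun", "weather": "noun"}


def preprocess(text):
    """Step 1 (sketch): drop irrelevant words, then tag the remaining tokens."""
    tokens = [t for t in text.lower().split() if t not in IRRELEVANT_WORDS]
    return [(t, PART_OF_SPEECH.get(t, "unknown")) for t in tokens]


def jaccard(a, b):
    """Stand-in similarity: shared items divided by all items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def sparse_similarity(docs, min_sim=0.2):
    """Step 2 (sketch): keep only pairs that share a token and reach min_sim."""
    index = defaultdict(set)  # token -> ids of the texts containing it
    for i, doc in enumerate(docs):
        for token in set(doc):
            index[token].add(i)
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))
    matrix = {}  # sparse matrix as {(i, j): similarity} with i < j
    for i, j in candidates:
        sim = jaccard(docs[i], docs[j])
        if sim >= min_sim:
            matrix[(i, j)] = sim
    return matrix


texts = ["Please book a flight to Paris", "book flight Paris", "hi the weather"]
docs = [preprocess(t) for t in texts]
matrix = sparse_similarity(docs)
print(matrix)
```

Storing only the pairs that pass the cutoff keeps the matrix sparse, so later clustering stages touch far fewer entries than a full pairwise table would hold.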

[0042] The above steps will b...
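The two clustering levels in steps 3) and 4) could look like the following minimal sketch. The excerpt does not say which clustering algorithms the patent uses, so both stages are assumptions: the first level links texts whose stored similarity reaches a threshold (union-find over the sparse matrix), and the second level greedily merges first-level clusters whose average cross-cluster similarity reaches a lower threshold. The function names, thresholds, and sample matrix are hypothetical.

```python
def first_level(n, matrix, threshold=0.6):
    """Group texts connected by a stored similarity >= threshold (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), sim in matrix.items():
        if sim >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def average_link(c1, c2, matrix):
    """Mean similarity across two clusters; pairs missing from the sparse matrix count as 0."""
    total = sum(matrix.get((min(i, j), max(i, j)), 0.0) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def second_level(clusters, matrix, threshold=0.3):
    """Greedily merge first-level clusters whose average link >= threshold."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if average_link(clusters[a], clusters[b], matrix) >= threshold:
                    clusters[a] = sorted(clusters[a] + clusters[b])
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical sparse similarity matrix over five texts, keyed by (i, j) with i < j.
matrix = {(0, 1): 0.9, (1, 2): 0.4, (0, 2): 0.35, (3, 4): 0.8}
level1 = first_level(5, matrix)
level2 = second_level(level1, matrix)
print(level1, level2)
```

In this sample, text 2 is too weakly linked to join texts 0 and 1 at the first level, but the second pass merges it with them because its average similarity to that cluster clears the lower threshold.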

Abstract

The invention relates to a method for fast bi-clustering of short texts. The method comprises the following steps: (1) preprocessing short-text distractors, quickly identifying and processing irrelevant words and parts of speech in the short texts with the support of an irrelevant-word dictionary and a part-of-speech dictionary; (2) calculating the similarity of each pair of preprocessed short texts to form a sparse short-text similarity matrix; (3) performing first-level clustering on the sparse short-text similarity matrix, dividing similar short texts into clusters one by one according to the calculated similarities; and (4) performing second-level clustering on the basis of the first-level clustering result.

Description

Technical field

[0001] The invention relates to natural language processing in the field of artificial intelligence and computing, and in particular to a fast short-text bi-clustering method and its implementation using natural language processing and data clustering.

Background art

[0002] Many natural language applications share a basic and common problem: given a corpus composed of short texts (hereinafter the short text corpus, or simply the corpus), how can the short texts be clustered into different classes according to some measure of similarity?

[0003] Generally speaking, the basic idea of text clustering is to group "similar" texts into one class, within which the "differences" between texts are small. Texts that are not "similar" are placed in other classes, and the "gap" between different classes is large. Here, "similarity" and "gap" are measures between texts that depend on the requirements of the application. There are many traditio...

Application Information

IPC(8): G06F17/30, G06F17/27

Inventors: 符建辉, 刘亮亮, 王石, 王卫民

Owner: 中科国力(镇江)智能技术有限公司
