
Method and device for text clustering

A text clustering method and device, applied in the field of text processing, that address the problems of high-dimensional center points, increased computational cost, and slow cluster convergence.

Inactive Publication Date: 2018-11-23
CHINA CONSTRUCTION BANK
6 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

The center points computed by the conventional K-means text clustering algorithm are high-dimensional, which makes computation inefficient, and they are sensitive to noise and isolated data points. Even a small amount of noise or a few isolated data points can strongly affect the mean, which slows cluster convergence and increases computational cost.
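The sensitivity of the mean to isolated points can be seen in a small worked example. The sketch below uses made-up two-dimensional points in place of real text feature vectors; the data and the helper name `centroid` are illustrative assumptions, not part of the patent.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

# Hypothetical data: four tightly grouped points and one isolated point.
cluster = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0]]
outlier = [10.0, 10.0]

clean = centroid(cluster)              # center without the isolated point
noisy = centroid(cluster + [outlier])  # center with the isolated point included

# Euclidean distance the center moved because of a single isolated point.
shift = sum((a - b) ** 2 for a, b in zip(clean, noisy)) ** 0.5
print(clean, noisy, shift)
```

In this toy data a single isolated point moves the center well outside the region occupied by the other four points, which is the effect the summary describes.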



Embodiment Construction

[0013] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of it. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without cre...


Abstract

Embodiments of the present invention provide a method and a device for text clustering, and relate to the technical field of text processing. The method first selects k text data items from the obtained set of text feature vectors as the center points of k clusters, then traverses the set of text feature vectors and assigns each text datum to the cluster with which it has the highest similarity. For each cluster, the method then calculates the mutual information between each word in the text data of the cluster and the cluster, ranks the calculated mutual information values according to a preset mutual information evaluation function, extracts the T words with the highest mutual information as the feature vector of the text data, and takes the average of the feature vectors of all text data in the cluster as the new center point of the cluster. These steps repeat until a preset condition is satisfied. The method and device reduce the influence of isolated data points on the calculation of the new center point, making the result more accurate, and reduce the dimensionality while keeping the cluster characteristics complete, which improves computational efficiency.
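The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not fix: cosine similarity as the similarity measure, the first k documents as initial centers, a document-frequency score of the form P(w, c) · log(P(w, c) / (P(w) · P(c))) standing in for the preset mutual information evaluation function, and unchanged assignments as the stopping condition. All function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_words_by_mi(members, all_docs, t):
    """Return the t words scoring highest on the assumed MI-style function."""
    n = len(all_docs)
    df_all = Counter(w for d in all_docs for w in d)  # docs containing w, overall
    df_in = Counter(w for d in members for w in d)    # docs containing w, in cluster
    def score(w):
        # Assumed evaluation function: P(w, c) * log(P(w, c) / (P(w) * P(c))).
        return (df_in[w] / n) * math.log(df_in[w] * n / (df_all[w] * len(members)))
    return sorted(df_in, key=lambda w: (-score(w), w))[:t]

def cluster_texts(docs, k, t, max_iter=20):
    """docs: list of {word: count} dicts. Returns (labels, centers)."""
    centers = [dict(d) for d in docs[:k]]  # assumption: first k docs seed the centers
    labels = None
    for _ in range(max_iter):
        # Assign every document to the center it is most similar to.
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:           # assumed stop condition: stable assignment
            break
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if not members:
                continue                   # keep the old center for an empty cluster
            keep = top_words_by_mi(members, docs, t)
            # New center: mean of the member vectors restricted to the T kept words.
            centers[c] = {w: sum(m.get(w, 0.0) for m in members) / len(members)
                          for w in keep}
    return labels, centers

# Hypothetical toy corpus: two finance texts and two sports texts.
docs = [Counter(s.split()) for s in [
    "loan interest rate bank",
    "football match goal team",
    "bank loan credit rate",
    "team goal league football",
]]
labels, centers = cluster_texts(docs, k=2, t=3)
print(labels, centers)
```

Restricting each center to T words keeps it low-dimensional, and words that appear in only one member document score lower than words shared across the cluster, so a stray document contributes less to the new center.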

Description

Technical field

[0001] The present invention relates to the technical field of text processing, in particular to a text clustering method and device.

Background technique

[0002] Text clustering (document clustering) is mainly based on the well-known clustering assumption: documents of the same type are more similar to one another, while documents of different types are less similar. As an unsupervised machine learning method, clustering is flexible and highly automated because it requires neither a training process nor manual labeling of documents in advance. It has become an important means of organizing, summarizing, and navigating text information effectively, and it is attracting the attention of more and more researchers. The center points computed by the conventional K-means text clustering algorithm are high-dimensional, which makes computation inefficient, and they are sensitive to noise and isolated data points. A small amount...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/216, G06F40/289, G06F40/30
Inventor: 汪博, 邹斯韬, 刘远浩, 鹿江锋, 胡汝坤, 邵小亮, 徐文静, 汪平, 谢隆飞, 郑坚钢
Owner CHINA CONSTRUCTION BANK