Text clustering method on basis of automatic threshold fish swarm algorithm

A technology of text clustering and fish swarm algorithm, applied in the field of text clustering

Inactive Publication Date: 2013-06-05
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

The use of artificial fish swarm algorithm for clustering can overcome the drawbacks of traditional clustering a...


Examples

Embodiment

[0064] Figure 1 is a flow chart of a specific embodiment of the text clustering method based on the automatic threshold fish swarm algorithm of the present invention. As shown in Figure 1, the invention comprises the following steps:

[0065] S101: text preprocessing;

[0066] A word segmentation tool is used to segment the N text objects to be clustered. The characters or words obtained after segmentation serve as the feature items of each text object, and together the feature items constitute the feature space of the text objects. Stop words such as "的" and "是" are then removed from the feature space, which yields a text feature space that is still of relatively high dimension, so dimensionality reduction is performed on it. Next, the term frequency of each feature item of the text objects to be clustered is counted, the TF-IDF function is used to calculate the weight of each feature item, and finally rep...
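The stop-word removal and TF-IDF weighting in step S101 can be illustrated with a minimal Python sketch. It assumes the texts have already been segmented into tokens, uses a hypothetical two-entry stop-word list, skips dimensionality reduction (the excerpt does not say which method is used), and assumes the common weight tf × log(N / df); the source names only "the TF-IDF function", so the exact variant is a guess. The name `tfidf_vectors` is illustrative and not from the patent.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the source gives only "的" and "是" as examples.
STOP_WORDS = {"的", "是"}

def tfidf_vectors(tokenized_docs, stop_words=STOP_WORDS):
    """Return (vocabulary, vectors): one TF-IDF weight vector per document.

    ASSUMPTION: weight = tf * log(N / df). The source does not state
    which TF-IDF variant it uses.
    """
    # Remove stop words from every segmented document.
    docs = [[t for t in doc if t not in stop_words] for doc in tokenized_docs]
    # Feature space: every distinct remaining term, in sorted order.
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against documents that become empty
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors
```

A term that occurs in every document receives weight zero under this variant, which is one reason real systems often use a smoothed IDF instead.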

Abstract

The invention discloses a text clustering method based on an automatic threshold fish swarm algorithm. The method computes a similarity matrix over the feature vectors of the texts, derives an initial equivalent partitioning threshold for each text from the corresponding row of the similarity matrix, performs an initial equivalent partitioning of the texts, and determines the initial number of clusters and the initial cluster centers. It then applies the artificial fish swarm algorithm, updating the state of each artificial fish according to both global optimal information and local optimal information, to search for the globally optimal cluster centers and to re-cluster the initial clustering result. Because the initial number of clusters and the initial cluster centers are obtained by automatic threshold acquisition, and the globally optimal cluster centers are found by the artificial fish swarm algorithm, the method overcomes shortcomings of traditional clustering methods, such as sensitivity to initial values and reliance on local data characteristics only, and can improve the accuracy and intelligence of text clustering.
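The first stage described in the abstract (similarity matrix, per-text threshold, initial partition) might look roughly like the Python sketch below. Several choices are assumptions, since this excerpt does not give the formulas: similarity is taken to be cosine similarity, the threshold for text i is taken to be the mean of the off-diagonal entries of row i, the partition is a simple greedy grouping, and each cluster center is the mean of its members' vectors. The function names are illustrative, and the fish swarm refinement stage is not shown.

```python
import math

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; zero vectors get similarity 0."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def initial_partition(vectors):
    """Greedy initial partition with one threshold per text.

    ASSUMPTION: the threshold of text i is the mean of the off-diagonal
    entries of row i of the similarity matrix. The patent's actual
    threshold formula is not given in this excerpt.
    """
    sim = cosine_similarity_matrix(vectors)
    n = len(vectors)
    unassigned = set(range(n))
    clusters = []
    for i in range(n):
        if i not in unassigned:
            continue
        others = [sim[i][j] for j in range(n) if j != i]
        threshold = sum(others) / len(others) if others else 0.0
        # Text i groups with every unassigned text at least as similar
        # to it as its own row threshold.
        members = [j for j in sorted(unassigned)
                   if j == i or sim[i][j] >= threshold]
        unassigned.difference_update(members)
        clusters.append(members)
    # ASSUMPTION: a cluster center is the component-wise mean of its members.
    centers = [
        [sum(vectors[j][k] for j in members) / len(members)
         for k in range(len(vectors[0]))]
        for members in clusters
    ]
    return clusters, centers
```

In this sketch the initial number of clusters is simply `len(clusters)`; in the patent, these initial centers are then handed to the artificial fish swarm search.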

Description

Technical field

[0001] The invention belongs to the technical field of text clustering, and more specifically relates to a text clustering method based on an automatic threshold fish swarm algorithm.

Background

[0002] The continuous growth of network information makes it increasingly important to organize and manage massive amounts of text and to help users obtain useful information. Most text information is unstructured or semi-structured data, and text clustering is an important technique for discovering potentially useful knowledge patterns in it. Because clustering does not require category labels in advance, text clustering has been widely studied and applied. It can serve as a preprocessing step in natural language processing applications such as multi-document automatic summarization. It can also mine the interest patterns of different users for information services such as information filtering and personalized recommendation. It can a...

Application Information

IPC(8): G06F17/30, G06F17/27, G06N3/00
Inventors: 孙健, 梁雪芬, 徐杰, 隆克平, 艾丽丽, 周云龙, 唐明, 王晓丽
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA