EA (Evolutionary Algorithm)-based English text clustering method

A text clustering technology based on an evolutionary algorithm, applied to text database clustering/classification and unstructured text data retrieval. It addresses problems such as an excessive number of classes and clustering that fails to converge, and achieves accurate clustering division with a clearly expressed algorithm.

Status: Inactive
Publication Date: 2015-05-27
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0007] Unnecessary clustering results due to too many



Examples


Example Embodiment

[0027] The present invention clusters multiple English texts in a text mining setting, with the aim of obtaining more valuable information by category. Text preprocessing and vectorization are performed according to Figure 1, and clustering between texts is performed according to Figure 2. A specific example follows:

[0028] 1. Split each of the 4 texts into words and examine the length of each word in each text; delete words shorter than 2 characters and delete stop words;
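The preprocessing in step 1 could be sketched roughly as follows. This is only an illustration: the stop-word list, the lowercasing, and the letters-only tokenizer are assumptions, since the source does not specify them, and the function name `preprocess` is hypothetical.

```python
import re

# Hypothetical stop-word list; the source does not say which list is used.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to"})

def preprocess(text, stop_words=STOP_WORDS, min_length=2):
    """Split a text into words, then drop short words and stop words."""
    # Assumption: lowercase the text and treat runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    # Delete words shorter than min_length and delete stop words.
    return [w for w in words if len(w) >= min_length and w not in stop_words]

tokens = preprocess("The cat is in a box")
```

Here `tokens` keeps only the words that survive both filters, so each text becomes a list of content words ready for counting.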

[0029] 2. Count the total number of words in the 4 texts and the number of occurrences of each word in each text, and calculate the word frequency f_i(d) of word a in the text. To determine whether word a appears in a text, an occurrence is marked as 1 and a non-occurrence as 0; then count the number of texts in which word a appears. Taking text D1 as an example: the total number of words in D1 is 1000, word a appears in 3 texts, word b appears in 3 texts, word c appears in 4 t...
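The counting in step 2 might look like the sketch below. It assumes that the word frequency is the relative frequency (occurrences divided by the total number of words in the text); the source text is truncated before any weighting formula, so none is reproduced here. The function names are hypothetical.

```python
from collections import Counter

def term_frequencies(words):
    """Return {word: occurrences / total words} for one tokenized text.

    Assumption: "word frequency" means relative frequency within the text.
    """
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {w: c / total for w, c in counts.items()}

def document_frequencies(texts):
    """Return {word: number of texts in which the word appears}."""
    df = Counter()
    for words in texts:
        # set() marks each word as present (1) or absent (0) per text.
        df.update(set(words))
    return dict(df)

texts = [["a", "b", "a"], ["b", "c"], ["a", "c", "c", "c"]]
tf_first = term_frequencies(texts[0])
df = document_frequencies(texts)
```

In this toy input each of the words a, b and c appears in 2 of the 3 texts, and word a accounts for two thirds of the first text.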


Abstract

The invention provides an EA (Evolutionary Algorithm)-based English text clustering method. The method first preprocesses the English texts into vector space models and then carries out a clustering process consisting of two steps. Step one: randomly select n clustering centers and divide the texts into clusters around these centers using Euclidean distance, so that texts of the same type fall into the same cluster, which yields a locally optimal clustering division. Step two: carry out EA processing, selecting a new generation of clustering centers using an alliance concept and a gene crossover and mutation process, and perform the clustering division according to the principle of closest distance between texts, thereby reaching a global optimum. With the method provided by the invention, English texts can be clustered effectively, unnecessary clustering results can be removed, and the clustering process converges more quickly.
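The two-step process in the abstract can be illustrated with a generic sketch: nearest-center assignment by Euclidean distance, followed by an evolutionary search over candidate center sets. This is not the patented procedure. In particular, the "alliance" selection is not described in the available text, so plain elitist selection is swapped in, and the cost function, the crossover and mutation operators, and all parameter values are assumptions.

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign(vectors, centers):
    """Label each vector with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: euclidean(x, centers[k]))
            for x in vectors]

def cost(vectors, centers):
    """Sum of distances from each vector to its nearest center (assumed fitness)."""
    return sum(min(euclidean(x, c) for c in centers) for x in vectors)

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each center is copied from one of the two parents."""
    return [list(rng.choice(pair)) for pair in zip(parent_a, parent_b)]

def mutate(centers, rng, rate=0.1, scale=0.05):
    """Add Gaussian noise to each coordinate with probability `rate`."""
    return [[x + rng.gauss(0, scale) if rng.random() < rate else x for x in c]
            for c in centers]

def evolve_centers(vectors, n_clusters, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Step one: each candidate starts from randomly selected texts as centers.
    population = [[list(v) for v in rng.sample(vectors, n_clusters)]
                  for _ in range(pop_size)]
    # Step two: evolve the candidates; the better half always survives.
    for _ in range(generations):
        population.sort(key=lambda centers: cost(vectors, centers))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    best = min(population, key=lambda centers: cost(vectors, centers))
    return best, assign(vectors, best)

vectors = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
best_centers, labels = evolve_centers(vectors, n_clusters=2, seed=1)
```

Keeping the better half of each generation unchanged means the best cost never gets worse from one generation to the next, which is one simple way to avoid losing a good division found early on.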

Description

Technical field

[0001] The invention relates to an English text clustering method that uses a local clustering method to select the clustering centers of the texts and then uses an evolutionary algorithm to perform global clustering. It belongs to the cross-disciplinary application field of machine learning, text mining, statistical analysis, and information retrieval.

Background art

[0002] With the popularization and development of database technology and Internet technology, people have fallen into the awkward situation of being "rich in data but poor in knowledge" because of the sheer volume of data, and are at a loss when facing the vast ocean of data. Although the amount of information is huge, the information a user actually needs is only a small part of it. How to accurately obtain the required information from vast text information resources has become a key issue in information processing. Text mining refers to the process of discovering potential patterns and knowledge from l...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06N3/126
Inventor: 陈志, 陈骏, 岳文静
Owner: NANJING UNIV OF POSTS & TELECOMM