
EA (Evolutionary Algorithm)-based English text clustering method

A text clustering and evolutionary algorithm technology, applied in text database clustering/classification, unstructured text data retrieval, computing, etc. It addresses problems such as too many classes and failure of the clustering to converge, and achieves accurate clustering division with a clear approach and a clearly expressed algorithm.

Status: Inactive; publication date: 2015-05-27
NANJING UNIV OF POSTS & TELECOMM
Cites: 2 · Cited by: 7

AI Technical Summary

Problems solved by technology

[0007] Too many classes produce unnecessary clustering results, which can prevent the clustering from converging, among other problems.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2: flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3: parameter map of the yarn covering machine


Examples


Specific example

[0027] Set against the background of text mining, the present invention clusters multiple English texts in order to extract more valuable information by category. Text preprocessing and vectorization are performed as shown in Figure 1, and clustering between texts as shown in Figure 2. A specific example follows:

[0028] 1. Split each of the 4 texts into words and analyze the word lengths in each text; delete words shorter than 2 characters, and delete stop words;
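Step 1 can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes a "word" is a run of ASCII letters compared in lowercase, and `STOP_WORDS` is a hypothetical, deliberately tiny list, since the source does not say which stop-word list is used.

```python
import re

# Hypothetical stop-word list for illustration only; the patent does not name one.
STOP_WORDS = {"the", "is", "of", "and", "to", "in", "an", "on", "it"}


def preprocess(text: str, stop_words=STOP_WORDS) -> list[str]:
    """Split a text into words, drop words shorter than 2 characters, drop stop words."""
    # Assumption: words are maximal runs of ASCII letters, compared case-insensitively.
    words = re.findall(r"[A-Za-z]+", text.lower())
    return [w for w in words if len(w) >= 2 and w not in stop_words]


print(preprocess("The cat is on a mat, and it sat in the sun."))
# → ['cat', 'mat', 'sat', 'sun']
```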

[0029] 2. Count the total number of words in the 4 texts and the number of occurrences of each word in each text, and calculate the word frequency f of word a in text i (d). For each text, determine whether word a appears in it, marking it 1 if it does and 0 if it does not, and count the number of texts in which word a appears. Taking text No. 1, D1, as an example: D1 contains 1000 words in total, word a appears in 3 texts, word b appears in 3 texts, the wo...
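The counting in step 2 feeds the vector space model named in the abstract. The sketch below assumes f is the occurrence count divided by the text's total word count (suggested by the "1000 words in total" figure for D1) and assumes a TF-IDF weight f · ln(N / n_a), where n_a is the number of texts containing word a; the truncated source does not confirm the exact weighting. Under that assumption, with 4 texts and word a in 3 of them, the IDF factor would be ln(4/3) ≈ 0.288.

```python
import math
from collections import Counter


def tf_df(docs: list[list[str]]):
    """Per-text relative word frequencies and per-word document frequencies."""
    tfs = []
    for words in docs:
        counts = Counter(words)
        total = len(words) or 1  # guard against an empty text
        # f = (occurrences of the word in text i) / (total words in text i)
        tfs.append({w: c / total for w, c in counts.items()})
    # Mark a word 1 for each text it appears in (0 otherwise) and sum the marks.
    df = Counter()
    for words in docs:
        df.update(set(words))
    return tfs, df


def tfidf_vectors(docs: list[list[str]]):
    """One vector per text over a sorted vocabulary, weighted by assumed TF-IDF."""
    tfs, df = tf_df(docs)
    n = len(docs)
    vocab = sorted(df)
    vectors = [[tf.get(w, 0.0) * math.log(n / df[w]) for w in vocab] for tf in tfs]
    return vocab, vectors


docs = [["cat", "mat", "cat"], ["cat", "dog"], ["dog", "sun"], ["cat", "sun"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # → ['cat', 'dog', 'mat', 'sun']
```

A word that appears in every text gets weight 0 under this scheme, which is one common way to suppress uninformative terms.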


Abstract

The invention provides an EA (Evolutionary Algorithm)-based English text clustering method. The method first preprocesses the English texts into a vector space model and then performs clustering in two steps. Step one: randomly select n cluster centers, divide the texts into clusters around those centers using Euclidean distance so that texts of the same type fall into the same cluster, and thereby obtain a locally optimal clustering division. Step two: apply EA processing, selecting the next generation of cluster centers through an alliance idea and gene crossover and mutation, then divide the texts into clusters according to the closest inter-text distance principle, thereby achieving a global optimum. The method clusters English texts effectively and removes unnecessary clustering results, so the clustering process converges more quickly.
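Step one of the abstract (random initial centers, Euclidean assignment, local optimum) can be sketched as a k-means-style loop. The mean-update rule, the iteration cap `iters`, and the fixed `seed` are assumptions for illustration; the abstract only states that n centers are chosen at random and texts are assigned by Euclidean distance.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign(vectors, centers):
    """Put each text in the cluster of its nearest center (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda k: euclidean(v, centers[k])) for v in vectors]


def local_clustering(vectors, n, iters=20, seed=0):
    """Random initial centers, then mean updates until assignments stop changing."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(vectors, n)]
    labels = assign(vectors, centers)
    for _ in range(iters):
        for k in range(n):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(vectors, centers)
        if new_labels == labels:  # local optimum reached
            break
        labels = new_labels
    return centers, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = local_clustering(points, 2)
```

The result depends on the random start, which is exactly why the method follows this local step with a global, evolutionary search.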

Description

Technical field [0001] The invention relates to an English text clustering method that uses a local clustering method to select the cluster centers of the texts and then uses an evolutionary algorithm to perform global clustering. It lies at the intersection of machine learning, text mining, statistical analysis, and information retrieval. Background [0002] With the spread of database and Internet technology, the sheer volume of data has left people in the awkward position of being "rich in data but poor in knowledge". Faced with this vast ocean of data, users are at a loss: although the amount of information is huge, only a small part of it is what any given user needs. How to accurately obtain the required information from vast text resources has therefore become a key issue in information processing. Text mining refers to the process of discovering potential patterns and knowledge from l...
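The global, evolutionary stage can be sketched as a genetic search over sets of cluster centers. This is a hedged stand-in, not the patent's procedure: each individual is a list of n centers, fitness is assumed to be the sum of squared distances to the nearest center (lower is better), crossover is uniform per center, mutation is Gaussian, and plain truncation selection replaces the patent's unspecified "alliance" idea. `pop_size`, `generations`, `rate`, and `scale` are hypothetical parameters.

```python
import math
import random


def sse(vectors, centers):
    """Assumed fitness: sum of squared distances from each text to its nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in centers) for v in vectors)


def crossover(p1, p2, rng):
    """Uniform crossover: each center position is inherited from one of the two parents."""
    return [list(rng.choice((c1, c2))) for c1, c2 in zip(p1, p2)]


def mutate(centers, rng, rate=0.2, scale=0.5):
    """Gaussian mutation: perturb each coordinate with probability `rate`."""
    return [[x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in c]
            for c in centers]


def evolve_centers(vectors, initial_centers, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Seed the population with the step-one centers plus mutated copies of them.
    population = [[list(c) for c in initial_centers]]
    population += [mutate(initial_centers, rng, rate=1.0) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda ind: sse(vectors, ind))
        # Truncation selection keeps the better half (elitism: best SSE never rises).
        parents = population[: pop_size // 2]
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = min(population, key=lambda ind: sse(vectors, ind))
    # Final division: each text joins the cluster of its closest center.
    labels = [min(range(len(best)), key=lambda k: math.dist(v, best[k])) for v in vectors]
    return best, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
best, labels = evolve_centers(points, [[0, 0], [10, 10]])
```

Because the parents survive each generation unchanged, the best fitness in the population can only stay the same or improve, so the evolved centers are never worse than the step-one centers they start from.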


Application Information

IPC (8): G06F17/30
CPC: G06F16/35; G06N3/126
Inventors: Chen Zhi (陈志), Chen Jun (陈骏), Yue Wenjing (岳文静)
Owner: NANJING UNIV OF POSTS & TELECOMM