
Text classification method based on correlation analysis and KNN

A correlation analysis and text classification technology, applied in special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses the need to further improve the efficiency and accuracy of text classification.

Active Publication Date: 2013-10-09
NANJING UNIV OF POSTS & TELECOMM
3 Cites, 25 Cited by

AI Technical Summary

Problems solved by technology

[0015] The purpose of the present invention is to provide a text classification method based on association analysis and KNN that improves on the efficiency and accuracy of text classification based on traditional KNN.



Examples

Embodiment Construction

[0035] For convenience of description, assume the following application: news articles are collected from the Internet and stored by category for data analysis. The text classification method based on association analysis and KNN proposed by the present invention can be applied to determine the category of each document.

[0036] The specific embodiment of the present invention is:

[0037] (1) Use a web crawler or a similar network information gathering tool to collect a certain number of representative articles from various fields on the Internet, to serve as the training sample set for the text classification system.

[0038] (2) Preprocess these texts: segment them into words, remove stop words to obtain the feature words, and count the term frequency and inverse document frequency. Calculate the weight of each feature word relative to each category using the χ² feature evaluation method, and sum these weights to obtain the feature evaluation value. Set the final weight of each feature word as: TF-IDF*feat...
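The weighting in step (2) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names (`chi_square`, `feature_evaluation`, `tf_idf`), the length-normalized TF, the natural-log IDF, and the toy documents are all assumptions. Because the source formula for the final weight is truncated, the last line, which multiplies TF-IDF by the summed χ² value, is only one plausible reading.

```python
import math
from collections import Counter

def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.
    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def feature_evaluation(docs, labels):
    """Sum each term's chi-square score over all categories."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        total = 0.0
        for cat in set(labels):
            a = sum(1 for s, l in zip(doc_sets, labels) if l == cat and term in s)
            b = sum(1 for s, l in zip(doc_sets, labels) if l != cat and term in s)
            c = sum(1 for s, l in zip(doc_sets, labels) if l == cat and term not in s)
            d = len(docs) - a - b - c
            total += chi_square(a, b, c, d)
        scores[term] = total
    return scores

def tf_idf(docs):
    """Per-document TF-IDF with length-normalized TF and log(N / df) IDF."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({t: (tf[t] / len(d)) * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy, already-segmented documents with stop words removed (hypothetical data).
docs = [["goal", "match", "team"], ["team", "coach", "goal"],
        ["stock", "market", "bank"], ["bank", "loan", "market"]]
labels = ["sports", "sports", "finance", "finance"]

scores = feature_evaluation(docs, labels)
tfidf = tf_idf(docs)
# Assumed combination: TF-IDF times the summed chi-square evaluation value.
final = [{t: w * scores[t] for t, w in doc.items()} for doc in tfidf]
```

In this toy set, a word such as "goal" that appears in every document of one category and none of the other receives a higher evaluation value than a word such as "match" that appears in only one document.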


Abstract

The invention provides a text classification method based on correlation analysis and KNN. It addresses the problem that the efficiency and accuracy of traditional KNN-based text classification need further improvement. The method rests on the observation that the feature attributes of a test document overlap heavily with those of its neighbor documents. Using the results of correlation analysis performed on the documents of each class, it quickly determines a neighbor number k suited to a document of unknown class and selects that document's k neighbors; the class of the unknown document is then determined from the classes of those neighbors. This overcomes two defects of traditional KNN-based text classification, namely that the value of k is difficult to determine and that the time complexity is high, and so improves the efficiency and accuracy of text classification.
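For orientation, the traditional KNN step that the abstract builds on can be sketched as below. This is an assumption-laden illustration rather than the patented method: cosine similarity over sparse term-weight dictionaries and a plain majority vote are common choices not stated in the source, the names `cosine` and `knn_classify` are hypothetical, and k is passed in by hand because the abstract does not spell out how the correlation analysis selects it.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_classify(query, train_vectors, train_labels, k):
    """Label the query by majority vote among its k most similar training docs."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training vectors (term weights) with known classes.
train_vectors = [{"goal": 1.0, "team": 1.0}, {"team": 1.0, "coach": 1.0},
                 {"bank": 1.0, "market": 1.0}, {"loan": 1.0, "bank": 1.0}]
train_labels = ["sports", "sports", "finance", "finance"]

predicted = knn_classify({"goal": 1.0, "coach": 0.5}, train_vectors, train_labels, k=3)
```

Note that this baseline compares the query against every training document, which is the time cost the abstract says the invention reduces.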

Description

Technical field

[0001] The invention relates to the technical field of text mining, in particular to a text classification method based on association analysis and KNN.

Background

[0002] With the development of computer technology and the popularization of the Internet, the number of online texts is increasing rapidly. Manually screening texts for classification is no longer practical, and there is an urgent need for a fast and efficient technology for collecting data and organizing the required information. Text classification technology arose to meet this need. Text classification refers to the process of assigning texts to predefined categories according to their content under a given classification system. In essence, it identifies the pattern features of a text; the key technologies include text preprocessing, feature extraction, and classification models.

[0003] At present, the commo...


Application Information

IPC(8): G06F17/30
Inventor 成卫青, 范恒亮, 杨庚, 黄卫东, 梁胜
Owner NANJING UNIV OF POSTS & TELECOMM