A KNN text classification method based on improved K-Medoids

A text classification technology, applied in the fields of instrumentation, computing, and electrical digital data processing. It addresses the low classification efficiency of KNN and the sensitivity of classifier performance to the training samples. Its effects are to reduce sensitivity to the initial center points and to improve classification accuracy and classification efficiency.

Inactive Publication Date: 2018-11-02
BEIJING UNIV OF TECH
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] As one of the classic classification methods, KNN is simple to implement and robust, but it also has shortcomings that make it unsuitable for many practical applications.
The shortcomings of KNN fall mainly into two areas. First, classification is slow: every test sample requires a similarity calculation against every training sample, which results in low classification efficiency.
Second, classification performance is easily affected by the training samples. When the training data is severely unevenly distributed, classifier performance may degrade seriously or even become extremely poor.
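
To make the first shortcoming concrete, here is a minimal sketch of plain KNN in Python. It is not taken from the patent: the function names, the toy vectors, and the choice of cosine similarity are assumptions for illustration. It shows that one query costs one similarity computation per training sample.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(query, train_vectors, train_labels, k=3):
    """Plain KNN: one similarity per training sample, then a majority vote."""
    sims = [(cosine(query, x), y) for x, y in zip(train_vectors, train_labels)]
    sims.sort(key=lambda t: t[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): four training vectors over three features.
train_vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_labels = ["sports", "sports", "finance", "finance"]
print(knn_classify([1, 0.5, 0], train_vectors, train_labels, k=3))
```

With n training samples, classifying m test texts needs n × m similarity computations, which is why shrinking the training set is one way to improve efficiency.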

Examples

Embodiment Construction

[0038] The present invention is realized by adopting the following technical means:

[0039] A KNN text classification method based on improved K-Medoids. First, the training text set and the test text set are preprocessed (word segmentation, stop word removal, and DF feature selection), and both the training texts and the test texts are represented as vectors. Next, the training texts are clipped with the improved K-Medoids method to obtain a new training text set S_new. Finally, a representativeness function is defined and introduced into the category attribute function of the original KNN algorithm for KNN classification.
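
The clipping step can be sketched as follows. This is a rough illustration only: the excerpt does not give the details of the improved K-Medoids method, so the code uses plain alternating K-Medoids with the first k points as initial centers. The clipping rule (drop samples farther than `radius` from their medoid), the Euclidean distance, and all names are assumptions, not the patent's procedure.

```python
import math

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_points(points, medoids):
    """Index of the nearest medoid for every point."""
    return [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    """Plain alternating K-Medoids; returns (medoid indices, assignment)."""
    medoids = list(range(k))  # assumption: the first k points are the initial centers
    for _ in range(max_iter):
        assign = assign_points(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_points(points, medoids)

def clip_training_set(points, labels, k, radius):
    """Hypothetical clipping rule: keep samples within `radius` of their medoid."""
    medoids, assign = k_medoids(points, k)
    keep = [i for i, p in enumerate(points)
            if dist(p, points[medoids[assign[i]]]) <= radius]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Toy data (hypothetical): two tight groups plus one outlier at [5, 4].
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 4]]
labels = ["a", "a", "a", "b", "b", "b", "a"]
s_new, labels_new = clip_training_set(points, labels, k=2, radius=2.0)
print(len(s_new))
```

In this toy run the outlier is the only sample removed, so the later KNN step compares each test text against six training samples instead of seven.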

[0040] The above-mentioned improved KNN text classification method comprises the following steps:

[0041] Step 1, download a publicly released Chinese corpus from the Internet to serve as the training text set and the test text set;

[0042] Step 2, using the word segmentation software ICTCLAS to perform word segmentation and stop word removal prep...
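
The preprocessing named in paragraph [0039] and in Step 2 can be sketched as below. The sketch assumes the texts are already segmented into token lists (it does not call ICTCLAS). The English tokens, the stop word list, the `min_df` threshold, and the use of raw term counts as vector weights are all illustrative assumptions.

```python
from collections import Counter

def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in stop_words]

def select_by_df(docs, min_df=2):
    """DF feature selection: keep terms found in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, n in df.items() if n >= min_df)

def to_vector(doc, vocabulary):
    """Represent a document as term counts over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]

# Hypothetical, already-segmented documents.
raw_docs = [
    ["the", "match", "team", "goal"],
    ["the", "team", "coach"],
    ["the", "bank", "rate"],
    ["bank", "loan", "rate", "the"],
]
stop_words = {"the"}

docs = [remove_stop_words(d, stop_words) for d in raw_docs]
vocabulary = select_by_df(docs, min_df=2)
vectors = [to_vector(d, vocabulary) for d in docs]
print(vocabulary)
print(vectors)
```

Terms that occur in only one document are dropped, which shortens every vector and reduces the cost of each later similarity computation.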

Abstract

A KNN text classification method based on improved K-Medoids, in the field of computer text data processing. First, the training text set and the test text set are preprocessed (word segmentation, stop word removal, DF feature selection, and vector representation) to obtain the training text vector space and the test text vector space. Next, the training samples are clipped with the improved K-Medoids method, which is optimized in two respects, the selection of the initial center points and the search strategy for replacement center points, to obtain a new training text space. Finally, KNN classification is performed: a representativeness function is defined and applied to the category attribute function to obtain the final result. Experimental results show that, compared with the traditional KNN method and the K-Medoids-based KNN method, the present invention achieves higher classification accuracy and classification efficiency.
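
As an illustration of the final step, the sketch below weights each neighbor's vote by a representativeness value. The excerpt does not define the representativeness function, so the `rep` list (one weight in [0, 1] per training sample), the scoring rule (sum of similarity times representativeness per class), and the toy data are assumptions rather than the patent's definition.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_knn(query, vectors, labels, rep, k=3):
    """KNN where each neighbor contributes similarity * representativeness."""
    neighbours = sorted(
        ((cosine(query, x), y, r) for x, y, r in zip(vectors, labels, rep)),
        key=lambda t: t[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label, r in neighbours:
        scores[label] += sim * r  # assumed category attribute function
    return max(scores, key=scores.get)

# Toy data (hypothetical): two weakly representative "a" samples, one strong "b".
vectors = [[1, 0.2], [1, 0.3], [0.9, 1.0]]
labels = ["a", "a", "b"]

print(weighted_knn([1, 1], vectors, labels, rep=[1.0, 1.0, 1.0]))  # unweighted vote
print(weighted_knn([1, 1], vectors, labels, rep=[0.2, 0.2, 1.0]))  # weighted vote
```

With equal weights the two "a" neighbors outvote the single "b" neighbor; once the "a" samples are given low representativeness, the closer and more representative "b" sample decides the class.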

Description

Technical field

[0001] The invention relates to the field of computer text data processing, in particular to a K-Nearest-Neighbor (KNN) text classification method based on improved K-Medoids.

Background technique

[0002] With the development of the Internet, the Internet of Things, and cloud computing, data is growing exponentially, leading us into the era of big data. The US-based International Data Corporation (IDC) has pointed out that the data on the Internet is increasing at a rate of 50% every year, and that more than 90% of the data in the world was generated in recent years. At present, the amount of global data has reached the ZB level, and this large volume of data contains great potential value.

[0003] In today's era of big data, it is very important to mine the potential value of data. As a technology to discover the potential value of data, data mining has attracted great attention. Text data accounts f...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/355, G06F18/241
Inventor: 汪友生, 樊存佳, 王信
Owner: BEIJING UNIV OF TECH