Text classification method based on improved firefly algorithm and K-nearest neighbors

A text classification technology based on the firefly algorithm, applied in the field of Chinese text classification. It addresses problems such as low efficiency in finding optimal solutions, slow convergence, and insufficient attention to the inherent defects of the algorithm.

Active Publication Date: 2020-03-24
重庆信科设计有限公司 +1

AI Technical Summary

Problems solved by technology

However, most researchers mainly focus on improving the accuracy of feature subsets, without much consideration of the inherent defects of this type of algorithm: easy to fall into local o



Examples


Embodiment Construction

[0054] The technical solutions in the embodiments of the present invention will be described clearly and in detail below in conjunction with the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0055] The technical solution by which the present invention solves the above technical problems is as follows:

[0056] As shown in Figure 1, the input text set is first segmented using the jieba word segmentation tool, and stop words are then removed from the text set according to the Harbin Institute of Technology stop word list. The information gain of each word is then calculated, the words are sorted by this value in descending order, and the top-ranked features are kept to obtain a pre-selected set of text features. The information gain is calculated as follows:
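The formula itself does not survive in this extract. As a rough illustration only, the sketch below ranks terms with the standard textbook definition of information gain (class entropy minus the conditional entropy given the presence or absence of a term), which may differ from the exact form used in the patent. It assumes the documents have already been segmented (for example with jieba) and stop-word filtered, so each document is a set of tokens. The function names `entropy`, `information_gain`, and `preselect_features`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard IG: H(C) - [P(t) * H(C|t) + P(not t) * H(C|not t)]."""
    with_term = Counter()
    without_term = Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1
    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_term.values())) + (
        n_without / n
    ) * entropy(list(without_term.values()))
    return h_class - h_cond


def preselect_features(docs, labels, top_k):
    """Sort the vocabulary by IG (descending) and keep the top_k terms."""
    vocab = sorted({tok for tokens in docs for tok in tokens})
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    # Ties are broken alphabetically so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:top_k]]


# Toy corpus (assumed data): tokens after segmentation and stop-word removal.
docs = [
    {"football", "match", "team"},
    {"football", "goal", "team"},
    {"stock", "market", "team"},
    {"stock", "price", "market"},
]
labels = ["sports", "sports", "finance", "finance"]

top = preselect_features(docs, labels, top_k=2)
print(top)
```

In this toy corpus a term such as "football" separates the two classes perfectly, while "team" appears in both classes and therefore scores lower, which is the behaviour the pre-selection step relies on.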



Abstract

The invention discloses a text classification method based on an improved firefly algorithm and K-nearest neighbors (KNN). A text feature selection model is constructed by combining information gain with the firefly algorithm. The method comprises the following steps: all features are ranked by information gain, and the relatively strong optimization capability of the improved firefly algorithm is then used to find a more representative feature subset among the top-ranked features. The step-length factor alpha in the firefly algorithm is adjusted so that both the global and the local search capability of the algorithm are preserved. A new fitness function is introduced so that the feature dimensionality is appropriately reduced while the precision of the feature subset is improved. Finally, the model is used for text feature selection, and the resulting feature subset is used for KNN text classification. The method overcomes the firefly algorithm's tendency toward premature convergence, entrapment in local optima, and slow convergence when searching for the optimal text feature subset, so that a more accurate subset is obtained and text classification accuracy is improved.
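The abstract does not give the patent's exact alpha schedule or fitness function, so the following is only a minimal sketch under stated assumptions: alpha shrinks linearly over the iterations (large random steps early for global search, small ones late for local search), the fitness is a weighted sum of leave-one-out KNN accuracy and the fraction of features dropped, and continuous firefly positions are thresholded at 0.5 to obtain a feature mask. All names (`knn_accuracy`, `fitness`, `firefly_select`), parameter values, and the toy data are hypothetical.

```python
import math
import random


def knn_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only features where mask is 1."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = sorted(
            (sum((X[i][j] - X[n][j]) ** 2 for j in idx), n)
            for n in range(len(X))
            if n != i
        )
        votes = [y[n] for _, n in dists[:k]]
        if max(sorted(set(votes)), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, w=0.9):
    """Assumed form: reward accuracy, lightly reward fewer selected features."""
    return w * knn_accuracy(X, y, mask) + (1 - w) * (1 - sum(mask) / len(mask))


def to_mask(position):
    """Threshold a continuous position in [0, 1]^d into a 0/1 feature mask."""
    return [1 if v > 0.5 else 0 for v in position]


def firefly_select(X, y, n_fireflies=8, n_iter=20, alpha0=0.5,
                   beta0=1.0, gamma=1.0, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.random() for _ in range(d)] for _ in range(n_fireflies)]
    light = [fitness(X, y, to_mask(p)) for p in pos]
    best_i = max(range(n_fireflies), key=light.__getitem__)
    best_mask, best_fit = to_mask(pos[best_i]), light[best_i]
    for t in range(n_iter):
        # Assumed schedule: step-length factor alpha decays linearly to zero.
        alpha = alpha0 * (1 - t / n_iter)
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with squared distance to firefly j.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [
                        min(1.0, max(0.0, a + beta * (b - a)
                                     + alpha * (rng.random() - 0.5)))
                        for a, b in zip(pos[i], pos[j])
                    ]
                    light[i] = fitness(X, y, to_mask(pos[i]))
                    if light[i] > best_fit:
                        best_mask, best_fit = to_mask(pos[i]), light[i]
    return best_mask, best_fit


# Toy data (assumed): feature 0 separates the classes, the rest are noise.
X = [
    [0.0, 0.9, 0.1, 0.5],
    [0.1, 0.1, 0.8, 0.4],
    [0.2, 0.5, 0.3, 0.9],
    [0.9, 0.8, 0.2, 0.5],
    [1.0, 0.2, 0.9, 0.3],
    [0.8, 0.4, 0.4, 0.8],
]
y = ["a", "a", "a", "b", "b", "b"]

mask, fit = firefly_select(X, y)
print(mask, fit)
```

In a full pipeline the columns of `X` would be weights of the features pre-selected by information gain, and the returned mask would determine which of them are passed to the final KNN classifier.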

Description

Technical field

[0001] The invention belongs to the field of Chinese text classification, and specifically relates to a text classification method based on an improved firefly algorithm and K-nearest neighbors.

Background

[0002] With the rapid development of Internet technology, more and more users can not only conveniently obtain information resources on the Internet but also publish information there; that is, users both release and receive information. Although information is represented in increasingly varied forms, text remains its main form so far. Faced with such a large amount of text data, it is difficult for people to find the information they are interested in. Relying only on traditional manual methods to organize and manage this text data would not only consume a great deal of material and manpower but would also be difficult to achieve. This forces people to look for a ne...


Application Information

IPC(8): G06F16/35, G06K9/62, G06N3/00
CPC: G06F16/35, G06N3/006, G06F18/24147
Inventor: 文武, 赵成, 刘颖, 解如风, 范荣妹
Owner: 重庆信科设计有限公司