The invention discloses a short-text
data stream classification method based on topic models and
concept drift detection. The method includes: 1, acquiring an external corpus from a knowledge base to construct an LDA
topic model; 2, dividing a short-text
data stream into data blocks according to a sliding window mechanism, and using the LDA
topic model to expand the short texts in the data blocks, obtaining an expanded short-text
data stream; 3, constructing an online BTM
topic model for each data block in the expanded short-text data
stream, and obtaining a topic representation of each short text; 4, selecting Q topic-represented data blocks to construct a classifier, and using the classifier to predict the class
label of a newly arrived data block; 5, dividing the Q topic-represented data blocks into category clusters according to their class
label distribution, and calculating semantic distances between the category clusters and the newly arrived data block to judge whether
concept drift has occurred; and 6, updating the classifier according to whether
concept drift was detected. The method is applicable to short-text data
stream classification problems in which the class
label distribution changes continuously.
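
The block-wise loop of steps 2 and 4 to 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions that the abstract does not state: the topic representations are taken as given (the LDA expansion and the online BTM model are not implemented), a nearest-centroid classifier stands in for the unspecified classifier, the semantic distance is taken to be cosine distance between mean topic vectors, the true labels of a block are assumed to become available after prediction, and the classifier update simply discards the old blocks when drift is flagged. All function names, the window size `q`, and the `threshold` value are hypothetical.

```python
from collections import defaultdict, deque
from math import sqrt


def split_into_blocks(stream, block_size):
    """Step 2 (in part): cut the stream into consecutive fixed-size blocks."""
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u, v):
    """Assumed semantic distance: 1 minus cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def category_clusters(blocks):
    """Step 5: group the kept blocks by class label, one centroid per label."""
    clusters = defaultdict(list)
    for block in blocks:
        for vec, label in block:
            clusters[label].append(vec)
    return {label: mean_vector(vecs) for label, vecs in clusters.items()}


def predict(centroids, vec):
    """Step 4 stand-in: assign the label of the nearest cluster centroid."""
    return min(centroids, key=lambda label: cosine_distance(centroids[label], vec))


def drift_detected(centroids, new_block, threshold):
    """Step 5: flag drift when the new block is far from every category cluster."""
    center = mean_vector([vec for vec, _ in new_block])
    return min(cosine_distance(c, center) for c in centroids.values()) > threshold


def process_stream(blocks, q=2, threshold=0.5):
    """Steps 4 to 6: predict each new block, test for drift, update the window."""
    window = deque(maxlen=q)  # the Q most recent blocks back the classifier
    predictions, drifts = [], []
    for block in blocks:
        if window:
            centroids = category_clusters(window)
            predictions.append([predict(centroids, vec) for vec, _ in block])
            drift = drift_detected(centroids, block, threshold)
            drifts.append(drift)
            if drift:
                window.clear()  # step 6 (assumed policy): drop outdated blocks
        window.append(block)  # labels assumed to be revealed after prediction
    return predictions, drifts


# Toy stream of (topic vector, class label) pairs; a new class appears at the end.
stream = [
    ([0.9, 0.1, 0.0], "sports"), ([0.1, 0.9, 0.0], "tech"),
    ([0.8, 0.2, 0.0], "sports"), ([0.2, 0.8, 0.0], "tech"),
    ([0.0, 0.1, 0.9], "finance"), ([0.1, 0.0, 0.9], "finance"),
]
blocks = split_into_blocks(stream, block_size=2)
predictions, drifts = process_stream(blocks, q=2, threshold=0.5)
print(drifts)
```

On this toy stream the second block stays close to the existing clusters, while the third block, which consists only of a previously unseen class, lies far from both and is flagged as drift. Clearing the window on drift is only one possible update policy; incremental retraining or weighting recent blocks would fit the same loop.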