The invention provides a topic clustering method and device,
electronic equipment and a storage medium. According to the method,
regression analysis can be carried out on the text
data set based on a BERT model to obtain a basic data set,
so that the paragraph information of each text is better expressed and the accuracy of text representation is improved; a configured quantity of data is selected from the basic
data set for labeling, so that a small amount of
annotation information is used for assisting the overall
unsupervised clustering; by adopting an Agglomerative Clustering model, clustering is carried out by combining a first inter-class distance and a second inter-class distance, and a
similarity model trained based on a BERT
algorithm is further adopted to obtain the target clustering result, so that the clustering result under the larger inter-class distance is adopted as guidance and is combined with the clustering results under the smaller inter-class distance; meanwhile, the
recall rate and accuracy are ensured, the dependence on the clustering distance and category is reduced, and the clustering effect is improved. The invention further relates to blockchain technology. The BERT model, the Agglomerative Clustering model and the
similarity model can be stored on the blockchain.
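
The two-distance clustering step described above can be illustrated with a minimal sketch. The abstract does not state how the two clustering results are combined, so everything specific here is an assumption made for illustration: 2-D points stand in for BERT text vectors, single linkage with Euclidean distance stands in for the Agglomerative Clustering model, and a simple centroid-distance check (the hypothetical function `similar`) stands in for the BERT-based similarity model. The thresholds 0.2, 1.5 and 1.0 are likewise arbitrary.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, threshold):
    """Single-linkage agglomerative clustering cut at `threshold`.

    Linking every pair whose distance is <= threshold and taking the
    connected components gives the same partition as cutting a
    single-linkage dendrogram at that height. Returns one label per point.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if euclidean(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


def centroid(points, members):
    dim = len(points[0])
    return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]


def combine(points, coarse, fine, similar):
    """Merge fine clusters that share a coarse cluster and that `similar` accepts.

    Assumed combination rule (not taken from the source): the coarse result
    only decides which fine clusters are candidates for merging.
    """
    groups = {}
    for idx, label in enumerate(fine):
        groups.setdefault(label, []).append(idx)

    parent = {label: label for label in groups}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(list(groups), 2):
        # With nested single-linkage cuts, one member represents its fine cluster.
        same_coarse = coarse[groups[a][0]] == coarse[groups[b][0]]
        if same_coarse and similar(centroid(points, groups[a]),
                                   centroid(points, groups[b])):
            parent[find(a)] = find(b)

    roots = {}
    return [roots.setdefault(find(label), len(roots)) for label in fine]


# Toy vectors standing in for BERT text representations (assumption).
points = [(0.0, 0.0), (0.1, 0.0), (0.6, 0.0), (0.7, 0.0),
          (1.9, 0.0), (5.0, 5.0), (5.1, 5.0)]

fine_labels = single_linkage(points, threshold=0.2)    # smaller inter-class distance
coarse_labels = single_linkage(points, threshold=1.5)  # larger inter-class distance


def similar(c1, c2):
    # Placeholder for the trained similarity model (assumption).
    return euclidean(c1, c2) < 1.0


final_labels = combine(points, coarse_labels, fine_labels, similar)
print(fine_labels, coarse_labels, final_labels)
```

In this sketch the coarse result prevents merges across distant groups, while the similarity check decides which of the fine clusters inside one coarse group belong to the same topic, so neither threshold alone determines the final partition.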