
Hot topic detection method based on RoBERTa-WWM and HDBSCAN algorithms

A hot topic detection method, applied in computing, unstructured text data retrieval, semantic analysis, and related fields. It addresses the poor distinguishability between topic vectors, with the effect of improving detection accuracy and reducing the impact of topic drift and evolution.

Active Publication Date: 2022-01-28
CHINA ELECTRONICS TECH CYBER SECURITY CO LTD +1
12 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to overcome the defects of the prior art by providing a hot topic detection method based on the RoBERTa-WWM and HDBSCAN algorithms. The method avoids the poor distinguishability between vectors that arises when topics are represented by keyword vectors, and thereby fundamentally improves the accuracy of topic detection.



Examples


Embodiment 1

[0043] Referring to Figure 1, the present invention discloses a hot topic detection method based on the RoBERTa-WWM and HDBSCAN algorithms. The method comprises offline hot topic detection and online hot topic detection.

[0044] Offline hot topic detection detects the hot topics contained in the data already stored in the database. During processing, the data is fixed and no new topics are generated.

[0045] Online hot topic detection detects the hot topics that occur on Internet media platforms within a certain time interval. During this process the data is constantly updated, so the method must consider the similarity between newly arrived reports and existing topics, as well as the impact of topic drift and evolution on the detection results. The computational efficiency of the algorithm must also be considered, to ensure that results are available in real time.
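The comparison between newly arrived reports and existing topics described in [0045] can be sketched as follows. This is a minimal illustration, not the patented procedure: the class name `OnlineTopics`, the cosine-similarity measure, the `threshold` value, and the running-mean centroid update (one simple way to let a topic follow gradual drift) are all assumptions introduced here. In practice the input vectors would be sentence embeddings, for example from a RoBERTa-WWM encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineTopics:
    """Hypothetical helper: keeps one centroid vector per known topic."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off, not from the patent
        self.centroids = []         # one centroid per topic
        self.counts = []            # number of reports assigned to each topic

    def add(self, vec):
        """Assign vec to the most similar topic, or open a new topic.

        Returns the index of the topic the report was assigned to.
        """
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best >= 0 and best_sim >= self.threshold:
            # Running mean: the centroid moves slightly toward each new report.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n + 1
            return best

        # No existing topic is similar enough: treat the report as a new topic.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Each call to `add` costs one similarity computation per existing topic, which keeps the per-report work small. A production system would likely also retire stale topics and re-cluster periodically, neither of which is shown here.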

[0046] Preferably, the offline hot topic det...

Abstract

The invention discloses a hot topic detection method based on the RoBERTa-WWM and HDBSCAN algorithms. The method comprises offline hot topic detection and online hot topic detection. Offline hot topic detection detects the hot topics contained in existing data in a database, and online hot topic detection detects the hot topics occurring on Internet media platforms within a certain time interval. The method solves the prior-art problem of poor distinguishability between vectors that arises when topics are represented by keyword vectors, and fundamentally improves topic detection accuracy.

Description

Technical field

[0001] The invention belongs to the technical fields of natural language processing and network cognitive security, and in particular relates to a hot topic detection method based on the RoBERTa-WWM and HDBSCAN algorithms.

Background

[0002] Hot topic detection is a technology that mines, from massive volumes of current online public opinion data, the hot topics or events that people care about and discuss. Traditional hot topic detection falls into two categories: topic detection based on topic models and topic detection based on text clustering.

[0003] With the development of natural language processing technology, topic detection based on text clustering has become the most commonly used approach. This technology first represents the text data as vectors that are convenient for mathematical computation, then calculates the similarity between the collected text data, divides these text data into different clusters, and f...
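The general clustering-based pipeline outlined in [0003] (vectorise the texts, compute pairwise similarity, group similar texts) can be illustrated with a small standard-library sketch. It deliberately swaps in simpler techniques than those named in the title: whitespace-tokenised term-frequency vectors stand in for RoBERTa-WWM embeddings, and threshold-based linking with union-find stands in for HDBSCAN. The function names and the `threshold` value are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector from whitespace tokens (assumed tokenisation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.5):
    """Link texts whose similarity reaches the threshold; return cluster labels."""
    vecs = [tf_vector(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(texts))]


reports = [
    "earthquake hits coastal city",
    "coastal city earthquake damage",
    "team wins football final",
    "football final team celebrates",
]
print(cluster(reports))  # [0, 0, 1, 1]
```

The pairwise loop is quadratic in the number of texts, so this form only suits small collections. Sparse word-count vectors like these are also sensitive to exact word overlap, which is one reason a dense sentence encoder may separate topics better.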

Application Information

IPC(8): G06F16/35, G06F16/34, G06F40/194, G06F40/30
CPC: G06F16/35, G06F40/194, G06F40/30, G06F16/34
Inventors: 刘锟, 曾曦, 邱梓珩, 陈天莹, 王效武, 魏刚
Owner: CHINA ELECTRONICS TECH CYBER SECURITY CO LTD