Topic clustering method and device, electronic equipment and storage medium

A topic clustering method in the field of data processing. It addresses sparse features, inaccurate clustering results, and the difficulty of obtaining key information; it reduces dependence on the clustering distance and the number of categories, ensures both recall and accuracy, and improves the clustering effect.

Pending Publication Date: 2020-10-09
ONE CONNECT SMART TECH CO LTD SHENZHEN

AI Technical Summary

Problems solved by technology

However, because online news is complex, redundant, and rapidly updated and disseminated, it is difficult for users to obtain the key information they need quickly and accurately, and fine-grained topic clustering of news faces many difficulties.
[0003] In existing technical solutions, the clustering method often requires the number of clusters or a distance to be set, but this information cannot be known in advance, which poses a major challenge for the clustering task.
Because the task is unsupervised, the number of clusters or the distance for the test data cannot be determined from training data.
[0004] In addition, clustering also suffers from incomplete understanding of the text and from sparse features, which lead to inaccurate clustering results: the same topic is split across different categories, or a single category ends up containing different topics.



Examples


Detailed Description of Embodiments

[0060] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.

[0061] Figure 1 shows a flow chart of a preferred embodiment of the topic clustering method of the present invention. Depending on requirements, the order of the steps in the flow chart can be changed, and some steps can be omitted.

[0062] The topic clustering method is applied to one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable gate arrays (Field-Programmable Gate Array, FPGA), digital processors (Digital Signal Proc...


Abstract

The invention provides a topic clustering method and device, electronic equipment, and a storage medium. In the method, regression analysis is performed on a text data set using a BERT model to obtain a base data set, so that the paragraph information of each text is better expressed and the accuracy of the text representation improves. A configured quantity of data is selected from the base data set for labeling, so that a small amount of annotation information assists the overall unsupervised clustering. An Agglomerative Clustering model clusters the data using both a first inter-class distance and a second inter-class distance, and a similarity model trained with the BERT algorithm is then applied to obtain the target clustering result. The clustering result under the large inter-class distance serves as guidance, and the clustering results under the small inter-class distance are combined accordingly, which ensures both recall and accuracy, reduces dependence on the clustering distance and the number of categories, and improves the clustering effect. The invention further relates to blockchain technology: the BERT model, the Agglomerative Clustering model, and the similarity model can be stored on a blockchain.
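
The two-distance idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not give the exact merge rule, so the code assumes that fine clusters (small distance) are merged only when they fall inside the same coarse cluster (large distance) and a similarity check accepts them. The toy 2-D vectors stand in for BERT text embeddings, and `make_same_topic` is a hypothetical centroid-distance stand-in for the trained similarity model; all function names and thresholds are illustrative.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering that stops merging once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters = ([c for k, c in enumerate(clusters) if k not in (x, y)]
                    + [clusters[x] + clusters[y]])
    return sorted(sorted(c) for c in clusters)

def centroid(vectors, indices):
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in indices) / len(indices) for d in range(dim)]

def make_same_topic(vectors, limit):
    # Hypothetical stand-in for the BERT-based similarity model (assumption).
    def same_topic(a, b):
        return cosine_distance(centroid(vectors, a), centroid(vectors, b)) <= limit
    return same_topic

def combine(vectors, coarse_threshold, fine_threshold, same_topic):
    """Use coarse clusters as guidance: fine clusters may merge only inside
    the same coarse cluster, and only if `same_topic` accepts the pair."""
    coarse = agglomerative(vectors, coarse_threshold)
    fine = agglomerative(vectors, fine_threshold)
    result = []
    for group in coarse:
        members = set(group)
        merged = []
        for sub in (c for c in fine if set(c) <= members):
            for target in merged:
                if same_topic(target, sub):
                    target.extend(sub)
                    break
            else:
                merged.append(list(sub))
        result.extend(sorted(m) for m in merged)
    return sorted(result)

# Toy "embeddings": unit vectors at the given angles (degrees).
vectors = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
           for a in (0, 5, 20, 25, 80, 85)]

fine = agglomerative(vectors, 0.01)    # [[0, 1], [2, 3], [4, 5]]
coarse = agglomerative(vectors, 0.1)   # [[0, 1, 2, 3], [4, 5]]
loose = combine(vectors, 0.1, 0.01, make_same_topic(vectors, 0.07))
strict = combine(vectors, 0.1, 0.01, make_same_topic(vectors, 0.05))
print(fine, coarse, loose, strict)
```

With the permissive similarity limit the two nearby fine clusters are merged, giving the coarse grouping; with the stricter limit they stay separate. In this sketch the coarse pass only decides which merges are allowed to be considered, while the similarity check decides which of them are accepted.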

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a topic clustering method, device, electronic equipment and storage medium.

Background

[0002] With the rapid development of Internet technology, major news portals have emerged, and these websites have become the main channels through which news media release news and people obtain information. However, because online news is complex, redundant, and rapidly updated and disseminated, it is difficult for users to obtain the key information they need quickly and accurately, and fine-grained topic clustering of news faces many difficulties.

[0003] In existing technical solutions, the clustering method often requires the number of clusters or a distance to be set, but this information cannot be known in advance, which poses a major challenge for the clustering task. Since it is an unsupervised task, it is impossible...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/194, G06F40/216, G06K9/62
CPC: G06F16/35, G06F40/216, G06F40/194, G06F18/2321
Inventor: 杨凤鑫, 徐国强
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN