
Public opinion topic data clustering method and device, and storage medium

A clustering technology for public opinion topic data, applied to clustering methods, devices, and storage media. It addresses the lack of a general classification threshold for clustering results and the inability to inherit historical clustering results, with the aim of improving clustering speed and accuracy, reducing computational pressure, and achieving strong applicability.

Active Publication Date: 2019-10-25
艾媒咨询(广州)有限公司

AI Technical Summary

Problems solved by technology

The problem with the existing method is that its clustering results have no general classification threshold standard that can be learned and adjusted automatically. In addition, historical clustering results cannot be inherited, and the sharp growth in public opinion articles during long-term monitoring puts continuously increasing pressure on clustering computation.



Examples


Detailed Description of the Embodiments

[0053] First, the terms used in the present invention are explained:

[0054] Word2vec: a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and must predict the words in adjacent positions. Under the bag-of-words assumption in word2vec, the order of the words is not important. After training is complete, the word2vec model can map each word to a vector that represents the relationships between words; this vector is the hidden layer of the neural network.
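The "predict the words in adjacent positions" idea can be illustrated by how training pairs are formed. The following is a minimal sketch, not the patent's implementation: the function name `skipgram_pairs` and the `window` parameter are assumptions introduced here, and the sketch only builds (input word, neighbouring word) pairs without training a network.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each word and its neighbours.

    `window` (an assumed parameter) is how many positions to the left and
    right count as "adjacent".
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["public", "opinion", "topic", "clustering"]
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair is one training example: given the first word, a network would be asked to predict the second. Within a window the pairs carry no positional information, which reflects the bag-of-words assumption mentioned above.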

[0055] Bag-of-words model: a simplified representation used in natural language processing and information retrieval (IR). Under this model, a text such as a sentence or a document is represented as a bag of the words it contains, without regard to grammar or word order....
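A small illustration of the bag-of-words idea, assuming whitespace-separated English text for simplicity (the helper name `bag_of_words` is hypothetical, and real Chinese text would first need word segmentation):

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring grammar and word order."""
    return Counter(text.lower().split())


a = bag_of_words("the market reacts to the news")
b = bag_of_words("the news reacts to the market")

# Same words in a different order give the same bag.
print(a == b)
print(a["the"])
```

The two sentences differ in meaning but produce identical bags, which is exactly the information the model discards.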


Abstract

The invention discloses a public opinion topic data clustering method, a device, and a storage medium. The method comprises the following steps: performing text processing, including segmentation, on an article to be clustered to obtain a sentence set; using a word2vec algorithm to calculate the distance from the sentence set to the keyword group of each existing cluster; and performing topic clustering according to the calculated distance and a self-adaptive distance threshold to obtain a clustering result, which is written into either a new clustering topic list or an existing clustering topic list, the existing clustering topic list being composed of existing clusters. Through the self-adaptive distance threshold, the method provides a universal classification threshold standard that can be learned and adjusted automatically, giving it high applicability. Topic clustering is combined with the existing (historical) clustering results, which optimizes the clustering result. A neural network learning method, namely the word2vec algorithm, is matched with the distance characteristics of the keyword groups, which improves clustering speed and accuracy. The method can be widely applied in the field of public opinion monitoring.
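The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the patented method: the toy two-dimensional `vectors` stand in for a trained word2vec model, distance is taken to be cosine distance between mean word vectors, and the threshold update (a moving average controlled by the hypothetical `rate` and `margin` parameters) is invented here because the excerpt does not say how the threshold adapts.

```python
import math


def mean_vector(words, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def assign(article_words, clusters, vectors, threshold, rate=0.1, margin=0.1):
    """Assign an article to the nearest existing cluster or open a new one.

    Returns (cluster_index, updated_threshold). `clusters` is a list of
    keyword groups and is extended in place when a new topic is created.
    """
    article_vec = mean_vector(article_words, vectors)
    best_idx, best_dist = None, float("inf")
    for idx, keywords in enumerate(clusters):
        centre = mean_vector(keywords, vectors)
        if article_vec is None or centre is None:
            continue
        dist = cosine_distance(article_vec, centre)
        if dist < best_dist:
            best_idx, best_dist = idx, dist

    if best_idx is not None and best_dist <= threshold:
        # ASSUMPTION: hypothetical adaptation rule, not taken from the source.
        threshold = (1 - rate) * threshold + rate * (best_dist + margin)
        return best_idx, threshold

    # No existing cluster is close enough: start a new topic.
    clusters.append(list(article_words))
    return len(clusters) - 1, threshold


# Toy vectors standing in for a trained word2vec model (assumed values).
vectors = {
    "stock": [1.0, 0.0], "market": [0.9, 0.1], "price": [0.8, 0.2],
    "football": [0.0, 1.0], "match": [0.1, 0.9],
}
clusters = [["stock", "market"]]  # existing (historical) clusters
threshold = 0.2

first_idx, threshold = assign(["market", "price"], clusters, vectors, threshold)
second_idx, threshold = assign(["football", "match"], clusters, vectors, threshold)
print(first_idx, second_idx, len(clusters))
```

In this toy run the finance article joins the existing cluster, while the sports article is too far from it and opens a new topic. Because `clusters` persists between calls, earlier results are reused rather than recomputed, which mirrors the inheritance of historical clustering results described above.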

Description

Technical field

[0001] The invention relates to the field of public opinion monitoring, and in particular to a public opinion topic data clustering method, device, and storage medium.

Background

[0002] Public opinion monitoring integrates Internet information collection technology with intelligent information processing technology. By automatically capturing massive amounts of Internet information and applying automatic classification and clustering, topic detection, and topic focusing, it meets users' information needs such as online public opinion monitoring and news topic tracking. It produces analysis results such as briefings, reports, and charts, giving customers an analytical basis for comprehensively grasping public sentiment and guiding public opinion correctly.

[0003] In public opinion monitoring, public opinion data clustering is one of the important means of topic discovery. The current public opinion topic data clustering method includes the fo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35
CPC: G06F16/35, G06F16/3344
Inventor: 张毅
Owner: 艾媒咨询(广州)有限公司