
Semantic clustering method

A semantic clustering method in the field of data mining. It addresses three problems with existing clustering methods: weak guidance for choosing the number of clusters, heavy consumption of computing resources, and concepts that are obscure to non-professionals. It aims for reduced computational cost and concepts that are easy to understand.

Pending Publication Date: 2022-03-29
北京尘锋信息技术有限公司

AI Technical Summary

Problems solved by technology

[0003] (1) Some clustering methods, such as KMeans, require the number of clusters to be set in advance. Without prior knowledge, it is difficult to choose an appropriate number. Although some theory exists to guide the choice, the guidance is weak, and more computing resources are consumed when more accurate clustering is required.
[0004] (2) Other clustering methods, such as hierarchical clustering, do not require the number of clusters to be determined in advance, but they need a large amount of computation and several hyperparameter settings, and their results are often unsatisfactory.
[0005] (3) Still other clustering methods, such as spectral clustering, rest on concepts that are obscure to non-professionals. When interactive clustering is required, it is difficult to explain what the interaction parameters mean.



Examples


Detailed Description of the Embodiments

[0029] The technical solution of this patent will be further described in detail below in conjunction with specific embodiments.

[0030] Embodiments of the present patent are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are only used for explaining the patent, and should not be construed as limiting the patent.

[0031] Referring to Figure 1, a semantic clustering method specifically includes the following steps:

[0032] S1: Data preprocessing;

[0033] S2: Deep-network representation and statistical representation;

[0034] S3: Word-segmentation representation;

[0035] S4: Similarity calculation;

[0036] S5: Conversion to an adjacency matrix;

[0037] S6: Graph construction;

[0038] S7: Computing the connected components;

[003...
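
Steps S4 to S7 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the toy vectors stand in for the combined representation from S2 and S3, cosine similarity is one of the metrics named in the background, and the `threshold` value and all function names are assumptions introduced here, since the visible text does not state how similarities are turned into edges.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def adjacency(vectors, threshold):
    """S4 + S5: pairwise similarity, then a 0/1 adjacency matrix.

    The threshold rule is an assumption for illustration.
    """
    n = len(vectors)
    return [
        [1 if i != j and cosine(vectors[i], vectors[j]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def connected_components(adj):
    """S6 + S7: treat the matrix as an undirected graph and label its
    connected components with a breadth-first search."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in range(n):
                if adj[node][neighbour] and labels[neighbour] == -1:
                    labels[neighbour] = current
                    queue.append(neighbour)
        current += 1
    return labels


# Toy vectors: the first two point one way, the last two another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = connected_components(adjacency(vectors, threshold=0.8))
print(labels)
```

Each component label then serves as a cluster label, so the number of clusters falls out of the graph structure rather than being fixed in advance.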



Abstract

The invention belongs to the technical field of data mining and relates in particular to a semantic clustering method. The method comprises the following steps: S1, data preprocessing; S2, deep-network representation and statistical representation; S3, word-segmentation representation; S4, similarity calculation; S5, conversion to an adjacency matrix; S6, graph construction; S7, computing the connected components; and S8, clustering the results. In step S1, dialogue data contains much information that interferes with or is useless for semantics, such as emoji, special custom symbols, URLs, and dialogue references. This noise is removed with regular expressions, and dialogues that are too short or semantically insufficient after cleaning are filtered out. The method builds a combined representation from deep learning and the TFIDF statistical method, converts the similarity of the combined representations into a graph adjacency matrix to construct a graph, and clusters the data by computing the connected components of the graph.
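
The cleaning described for step S1 might look like the sketch below. The abstract only says that noise is removed with regular expressions and that short dialogues are filtered out; the specific patterns, the bracketed-tag convention for custom symbols, and the `MIN_CHARS` cutoff are all hypothetical choices made for this example.

```python
import re

# All patterns below are illustrative assumptions, not taken from the patent.
URL_RE = re.compile(r"https?://\S+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
TAG_RE = re.compile(r"\[[^\]]*\]")  # hypothetical custom symbols such as "[image]"
MIN_CHARS = 5  # hypothetical cutoff for "too short after cleaning"


def clean(text):
    """Replace each kind of noise with a space, then collapse whitespace."""
    for pattern in (URL_RE, EMOJI_RE, TAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def preprocess(dialogues):
    """Clean every dialogue and drop those that end up too short."""
    cleaned = (clean(d) for d in dialogues)
    return [c for c in cleaned if len(c) >= MIN_CHARS]


samples = [
    "How do I reset my password? 😀 see https://example.com/help",
    "ok 👍",
    "[image] The invoice total looks wrong",
]
result = preprocess(samples)
print(result)
```

A character-count cutoff is only a rough proxy for "insufficient semantics"; a token count after word segmentation would be another reasonable choice.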

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to a semantic clustering method.

Background

[0002] Clustering is a commonly used analysis method in machine learning and data mining. The traditional approach is to extract features from the data, use these features to represent the data, quantify the similarity and correlation between features with a metric, and group identical or similar features into one category, thereby clustering the data. Commonly used metrics include Euclidean distance and cosine similarity. Commonly used features for semantic clustering include TFIDF, topic models, and deep-learning-based text representations, which are combined with a clustering algorithm to achieve the goal. Commonly used clustering algorithms include KMeans, hierarchical clustering, and spectral clustering, but these methods have the following shortcomings: [...
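
As a concrete illustration of the TFIDF feature mentioned in the background, the sketch below computes one common textbook variant (relative term frequency multiplied by log(N / document frequency)). The patent text shown here does not say which TFIDF variant it uses, so the formula, the function name `tfidf`, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) for a list of non-empty token lists.

    Uses tf = count / document length and idf = log(N / df); other
    weighting variants exist and may differ from the patent's choice.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[t] / len(doc) * math.log(n / df[t]) for t in vocab]
        )
    return vocab, vectors


# Already-segmented toy dialogues (word segmentation is a separate step).
docs = [["refund", "order"], ["refund", "delay"], ["login", "error"]]
vocab, vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(x, 3) for x in vec])
```

Terms that appear in fewer documents receive a larger weight, which is why "order" outweighs "refund" in the first toy document.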


Application Information

IPC(8): G06F16/35, G06K9/62, G06F40/216, G06F40/30
CPC: G06F16/35, G06F40/30, G06F40/216, G06F18/2321, G06F18/22
Inventors: 赵继帆, 吉庆琳
Owner: 北京尘锋信息技术有限公司