Text clustering method and device, and computing device

A text clustering method, a text clustering device, and a computing device, applied in the computer field, that address the low accuracy of clustering texts by their content and improve both the accuracy and the speed of clustering.

Active Publication Date: 2016-05-11
HUAWEI CLOUD COMPUTING TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, some types of text, such as logs, contain content that varies with input parameters and output parameters, so clustering these texts according to their content is not very accurate.

Embodiment Construction

[0022] The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

[0023] Throughout this specification, the term "borderless language" refers to a language that does not use punctuation marks or spaces to delimit boundaries between characters; common borderless languages include Chinese and Japanese. Correspondingly, a bordered language is one that uses punctuation marks or spaces to demarcate boundaries between characters; the most common bordered language is English.

[0024] Throughout this specification, the term "clustering" refers to the process of classifying objects into different clusters according to the characteristics of different objects. Each cluster contains multiple objects with certain commonality or high similarity.

[0025] Throughout this specification, the term "regular expression" refers to a series of character ...

Abstract

The embodiment of the invention discloses a text clustering method. After a device used for text clustering obtains a text to be clustered, it replaces the digits in that text with first identifiers, merges adjacent first identifiers to obtain a preprocessed text, and then clusters the preprocessed text. Preprocessing extracts the format of the text to be clustered, so the text is clustered according to its format rather than its variable content, which improves clustering accuracy.
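The preprocessing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the placeholder string `<NUM>` stands in for the "first identifier" (the source does not say what the identifier looks like), and grouping texts whose preprocessed forms match exactly is an assumed stand-in for the clustering step, which the abstract leaves unspecified.

```python
import re
from collections import defaultdict

# Hypothetical placeholder for the "first identifier"; the source does not name one.
FIRST_IDENTIFIER = "<NUM>"


def preprocess(text: str) -> str:
    """Replace every digit with the identifier, then merge adjacent identifiers."""
    replaced = re.sub(r"\d", FIRST_IDENTIFIER, text)
    # Collapse any run of consecutive identifiers into a single one.
    run = f"(?:{re.escape(FIRST_IDENTIFIER)})+"
    return re.sub(run, FIRST_IDENTIFIER, replaced)


def cluster_by_format(texts):
    """Group texts by preprocessed form (assumed exact-match clustering)."""
    clusters = defaultdict(list)
    for text in texts:
        clusters[preprocess(text)].append(text)
    return dict(clusters)


logs = [
    "user 1024 logged in at 12:30",
    "user 7 logged in at 09:05",
    "disk 3 usage 91 percent",
]
clusters = cluster_by_format(logs)
```

Here the two login lines share the preprocessed form `user <NUM> logged in at <NUM>:<NUM>` and land in one group, while the disk line forms its own group, even though the raw digit values differ in every line.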

Description

Technical field

[0001] The invention relates to the technical field of computers, in particular to a text clustering method, a text clustering device, and a computing device for text clustering.

Background

[0002] When there are a large number of texts, it is often necessary to cluster them, that is, to group them into a certain number of clusters, so as to facilitate subsequent processing.

[0003] Clustering text is the process of gathering similar texts together. In the prior art, the similarity between texts is often calculated from the content the texts contain, and texts that share more identical content are generally considered more similar.

[0004] However, some types of text, such as logs, contain content that varies with input parameters and output parameters, so clustering these texts based on their content is less accurate. ...
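The weakness of content-based similarity on logs can be illustrated with a small sketch. Token-level Jaccard similarity is used here only as an assumed example of a content-based measure; the source does not name the metric used in the prior art, and the log lines are invented for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-separated token sets of two texts."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical log lines: log_a and log_b share a format but differ in parameters;
# log_c has a different format but reuses the parameter values of log_a.
log_a = "conn 10.0.0.1 port 8080 user 17 ok"
log_b = "conn 10.0.0.9 port 443 user 52 ok"
log_c = "retry 10.0.0.1 port 8080 user 17 timeout"

same_format = jaccard(log_a, log_b)
different_format = jaccard(log_a, log_c)
```

In this example the two lines with the same format score lower than the pair with different formats, because the shared parameter values dominate the comparison.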

Application Information

IPC(8): G06F17/30
CPC: G06F16/00, G06F16/355, G06F16/353
Inventor: 胡斐然, 王楠楠
Owner: HUAWEI CLOUD COMPUTING TECH CO LTD