Method and device for text clustering and electronic device

A text clustering technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that existing clustering methods cannot be separated from word segmentation, which affects clustering speed, accuracy and recall, and it achieves high accuracy and recall, fast speed and simple steps.

Inactive Publication Date: 2017-06-13
HUBEI UNIV OF ARTS & SCI

AI Technical Summary

Problems solved by technology

For languages such as Chinese and Uyghur, clustering often cannot be separated from the support of word segmentation, and the corresponding accur...

Example Embodiment

[0046] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. The components of the embodiments of the present invention generally described and illustrated in the drawings herein may be arranged and designed in various different configurations.

[0047] Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present invention.

[0048] It shou...

Abstract

The invention relates to a text clustering method and device and an electronic device. The text clustering method comprises the steps of: combining a plurality of original document sets on different themes into a document union set; arranging the documents in the document union set in ascending order to obtain an ascending-order document union set; sequentially calculating the similarity between the first document in the ascending-order document union set and every document after it; if the similarity is greater than or equal to a first threshold, assigning that document to the same class as the first document; if the similarity is smaller than the first threshold, marking that document as unclassified; and then sequentially performing the same similarity calculation and classification on the first of the unclassified documents in the ascending-order document union set and all unclassified documents after it. The method, device and electronic device avoid operations such as word segmentation and feature extraction, involve simple steps, achieve a high accuracy rate, and are language-independent, making them suitable for clustering text in various languages. In addition, the clustering speed and precision can be flexibly adjusted to meet different practical requirements.
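
The steps listed in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say which similarity measure is used or what the documents are sorted by, so the character-level `difflib.SequenceMatcher` ratio, the `key=len` ordering, and the names `similarity`, `cluster` and `threshold` are all hypothetical stand-ins. The last step is read here as "repeat the procedure with the first remaining unclassified document as the new reference".

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Assumption: a character-level ratio in [0, 1] that needs no word
    # segmentation. The source does not name its similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def cluster(document_sets, threshold=0.6, sim=similarity, key=len):
    # Step 1: merge the original document sets into one union set.
    union = [doc for docs in document_sets for doc in docs]
    # Step 2: sort in ascending order (sorting by length is an assumption).
    pending = sorted(union, key=key)
    clusters = []
    while pending:
        # Step 3: the first remaining document is the reference document.
        seed, rest = pending[0], pending[1:]
        members, unclassified = [seed], []
        for doc in rest:
            if sim(seed, doc) >= threshold:
                members.append(doc)        # same class as the reference
            else:
                unclassified.append(doc)   # left for a later pass
        clusters.append(members)
        # Step 4: repeat on the documents that are still unclassified.
        pending = unclassified
    return clusters
```

Raising `threshold` tends to produce more, tighter classes and more passes over the data, while lowering it merges more documents per pass; this is one plausible way to picture the adjustable trade-off between speed and precision that the abstract mentions.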

Description

Technical field

[0001] The present invention relates to the technical field of text mining, in particular to a text clustering method, device and electronic equipment.

Background technique

[0002] As the name implies, clustering is the process of dividing an entire data set into several groups according to certain characteristics and rules. Elements within each group are highly similar in those characteristics, while elements in different groups differ more strongly in them. Each resulting group is called a cluster. Currently, text clustering methods include partition clustering, hierarchical clustering, density-based clustering, semantic-based clustering, and clustering based on various model theories.

[0003] Most of the above clustering methods need word segmentation or feature item support, so feature selection or dimensionality reduction is an important research topic. For languages such as Chinese and Uyghur,...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/355, G06F18/22
Inventor: 谷琼, 王贤明, 宁彬, 王毅, 丁函, 曹文平, 吴钊, 华丽, 胡春阳, 屈俊峰
Owner HUBEI UNIV OF ARTS & SCI