Document clustering method and device

A document clustering technology, applied in the Internet field, that addresses three problems of prior-art clustering: results degraded by inaccurate manually entered cluster counts, low intelligence, and difficulty clustering massive document sets. It avoids manual participation and improves accuracy and intelligence.

Active Publication Date: 2013-12-18
NHORIZON INNOVATION BEIJING SOFTWARE LMT

AI Technical Summary

Problems solved by technology

[0003] Prior-art methods for clustering documents are not very intelligent and rely on manual participation: a clustering value must be entered manually in advance to fix how many categories the documents will be clustered into before clustering can begin. For example, an operator specifies that the documents be clustered into 3 or 4 categories. When the manually entered clustering value is inaccurate, the clustering result is greatly affected. Furthermore, when the number of documents is massive, a clustering value cannot be given manually, and the clustering operation is difficult to perform.


Embodiment Construction

[0027] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings. Here, the exemplary embodiments and descriptions of the present invention are used to explain the present invention, but not to limit the present invention.

[0028] The present invention will now be described in further detail with reference to the accompanying drawings. This invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided as examples only in order to provide a complete understanding of the present invention to those skilled in the art.

[0029] Figure 1 is a flowchart of a method for clustering documents according to an embodiment of the present invention. As shown in Figure 1, the method includes steps 102 to 110.

[0030] In step...

Abstract

The invention provides a document clustering method and device. The method includes the following steps: A, vectorizing each document so that each vectorized document corresponds to a document coordinate in a multi-dimensional space; B, clustering the documents into two clusters and acquiring the geometric center of each cluster in the multi-dimensional space; C, calculating the average radius of each cluster, grouping the documents corresponding to the document coordinates in the two clusters into one inseparable category if the average radius satisfies a preset condition, and treating the two clusters as two separable categories if the average radius does not satisfy the preset condition; D, executing steps B and C within each separable category; E, terminating clustering when every document belongs to an inseparable category. The average radius is the average of the distances from the document coordinates to the geometric center. The method improves the accuracy and intelligence of document clustering.
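
Steps A to E can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. The abstract does not say how documents are vectorized, how the two-way split is computed, or what the preset condition is, so everything concrete here is an assumption: term-frequency vectors for step A, a plain 2-means split with a farthest-point seed for step B, and a hypothetical predicate `looks_inseparable` (the two centers lie within `ratio` times the larger average radius) standing in for the preset condition. All function and parameter names are invented for illustration.

```python
import math
from collections import Counter


def vectorize(docs):
    """Step A: one term-frequency vector per document (assumed scheme)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return [[float(Counter(d.lower().split())[w]) for w in vocab] for d in docs]


def centroid(points):
    """Geometric center: the coordinate-wise mean of the points."""
    return [sum(col) / len(points) for col in zip(*points)]


def average_radius(points, center):
    """Mean Euclidean distance from the points to the center."""
    return sum(math.dist(p, center) for p in points) / len(points)


def split_in_two(points, iters=20):
    """Step B via plain 2-means (assumed). Returns two index lists, or None."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest-point seed
    if math.dist(c0, c1) == 0.0:
        return None  # all coordinates coincide, nothing to split
    for _ in range(iters):
        near0 = [math.dist(p, c0) <= math.dist(p, c1) for p in points]
        left = [i for i, flag in enumerate(near0) if flag]
        right = [i for i, flag in enumerate(near0) if not flag]
        if not left or not right:
            return None
        new0 = centroid([points[i] for i in left])
        new1 = centroid([points[i] for i in right])
        if new0 == c0 and new1 == c1:
            break  # assignments have stabilized
        c0, c1 = new0, new1
    return left, right


def looks_inseparable(radius_a, radius_b, center_a, center_b, ratio=3.0):
    """HYPOTHETICAL preset condition: centers are close relative to the radii."""
    return math.dist(center_a, center_b) <= ratio * max(radius_a, radius_b)


def cluster(points, indices=None, condition=looks_inseparable):
    """Steps B to E: returns the inseparable categories as lists of indices."""
    if indices is None:
        indices = list(range(len(points)))
    subset = [points[i] for i in indices]
    halves = split_in_two(subset) if len(subset) > 1 else None
    if halves is None:
        return [indices]
    groups = [[subset[i] for i in half] for half in halves]
    centers = [centroid(g) for g in groups]
    radii = [average_radius(g, c) for g, c in zip(groups, centers)]
    if condition(radii[0], radii[1], centers[0], centers[1]):
        return [indices]  # step C: the two clusters form one inseparable category
    result = []
    for half in halves:  # step D: repeat B and C inside each separable category
        result += cluster(points, [indices[i] for i in half], condition)
    return result  # step E: every document now sits in an inseparable category


# Two well-separated groups of coordinates; no cluster count is supplied.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster(points))  # [[0, 1, 2], [3, 4, 5]]
```

The number of categories is never passed in; it falls out of where the recursion stops, which is the property the abstract emphasizes. For real text, `cluster(vectorize(docs))` would chain steps A through E. A different preset condition can be supplied through the `condition` parameter.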

Description

Technical field

[0001] The invention relates to the Internet field, and in particular to a method and device for clustering documents.

Background

[0002] As the volume of Internet information grows rapidly, how to obtain the required information effectively and accurately has become a technical problem in urgent need of a solution. In particular, clustering network documents to obtain multiple document categories is critical.

[0003] Prior-art methods for clustering documents are not very intelligent and rely on manual participation: a clustering value must be entered manually in advance to fix how many categories the documents will be clustered into before clustering can begin. For example, an operator specifies that the documents be clustered into 3 or 4 categories. When the manually entered clustering value is inaccurate, the clustering result is greatly affected. Furthermore, when the number of documents is massive, it is im...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/93
Inventor: 黄平春
Owner: NHORIZON INNOVATION BEIJING SOFTWARE LMT