Document clustering method, document clustering device and network equipment

A document clustering technology based on a clustering algorithm, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the wasted resources and high cost of existing approaches, saving computing resources and shortening clustering time.

Active Publication Date: 2015-11-25
珠海豹好玩科技有限公司
Cites: 10 | Cited by: 9

AI Technical Summary

Problems solved by technology

The existing solution shortens clustering time several-fold by distributing a task across multiple networked computers for parallel computation. However, it requires multiple networked computers, which wastes resources and is relatively costly.


Examples

Embodiment 1

[0123] The document clustering method of the present invention is described in detail below, taking web pages as the documents. Specifically, the document clustering method of this embodiment includes the following steps:

[0124] 301) Segment the web page: split the text of the web page into multiple words, filter the resulting words to remove interfering (noise) words such as "my" and "this", take the remaining words as the initial feature words of the web page, and store the initial feature words of each web page in the database;

[0125] Specifically, a noise lexicon can be maintained in the background in advance. When removing noise words, each split word is compared with the words in the noise lexicon, and any word found in the lexicon is removed as a noise word.
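Step 301 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regular-expression tokenizer is a simple stand-in for a real word segmenter (Chinese text in particular would need a dedicated one), and the `NOISE_LEXICON` contents and the function name `initial_feature_words` are hypothetical. Storing the result in a database is omitted here.

```python
import re

# Hypothetical noise lexicon; the source only gives "my" and "this" as examples.
NOISE_LEXICON = {"my", "this", "the", "a", "is", "of", "to"}


def initial_feature_words(page_text, noise_lexicon=NOISE_LEXICON):
    """Split page text into words and drop every word found in the noise lexicon."""
    # Stand-in segmentation: lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    # Compare each split word against the lexicon; matches are removed as noise.
    return [word for word in words if word not in noise_lexicon]


features = initial_feature_words("This is my guide to document clustering")
print(features)
```

Keeping the lexicon as a set makes each membership check constant-time, which matters when every word of every page is compared against it.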

[0126] 302) Read the word segmentation results of each web page, that is, read the initial feature words of each web page, and fi...

Embodiment 2

[0152] The document clustering method of the present invention can be applied to a favorites (bookmarks) function, which may be local favorites or web favorites, and to recommending other web pages to the user based on the web pages the user has saved.

[0153] Specifically, this embodiment includes the following steps:

[0154] 401) After a user bookmarks a web page, a crawler program can fetch the source code of the web page, split all of its content into words, record the number of occurrences of each word, and store this information in the database;

[0155] 402) Once the preset judgment conditions are satisfied, use the document clustering method of Embodiment 1 to cluster the web pages collected by the user, and store the clustering results in the database; for example, set the time nodes for document clustering in advance, and when each preset time node is reached, use the document clustering method of Embodiment 1 t...
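The word-counting and storage part of step 401 might look like the following Python sketch. It assumes the page text has already been fetched (the crawler is omitted), uses an in-memory SQLite database as a stand-in for "the database", and the table name `word_counts` and both function names are hypothetical.

```python
import re
import sqlite3
from collections import Counter


def count_words(page_text):
    """Split page text into words and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", page_text.lower()))


def store_word_counts(conn, url, counts):
    """Persist one row per (url, word) with the word's occurrence count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "url TEXT, word TEXT, occurrences INTEGER, PRIMARY KEY (url, word))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts VALUES (?, ?, ?)",
        [(url, word, n) for word, n in counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_word_counts(conn, "https://example.com/page", count_words("Cats chase cats"))
```

Using `INSERT OR REPLACE` with a `(url, word)` primary key means that bookmarking the same page again overwrites its counts instead of duplicating rows.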

Embodiment 3

[0162] The document clustering method of the present invention can be applied to the advertisement push function to achieve accurately targeted advertisement delivery.

[0163] Specifically, this embodiment includes the following steps:

[0164] 501) When the user browses a web page, a crawler program can fetch the source code of that web page, split all of its content into words, record the number of occurrences of each word, and store this information in the database;

[0165] 502) Use the document clustering method of Embodiment 1 to cluster the web pages browsed by the user, store the clustering results in the database, and save the feature words of all web pages in each cluster;

[0166] 503) For each advertisement, find its feature words; specifically, steps 301-302 of Embodiment 1 can be used to find the feature words of each advertisement;

...


Abstract

The invention provides a document clustering method, a document clustering device and network equipment, pertaining to the technical fields of data mining, document clustering and web page clustering. The method comprises the following steps: step a, dividing the documents to be clustered into multiple groups; step b, clustering one group of documents with a clustering algorithm to obtain initial clusters, each corresponding to a frequent item set; step c, acquiring the feature words of another group from the remaining documents and, based on those feature words and the frequent item sets of the existing initial clusters, assigning each document whose feature words are included in a frequent item set to the initial cluster corresponding to that frequent item set, and clustering the documents whose feature words are not included in any frequent item set to obtain new initial clusters with corresponding frequent item sets; step d, determining whether any document group remains unclustered; if so, returning to step c, and if not, storing the initial clusters obtained by clustering together with their corresponding frequent item sets. This technical scheme increases document clustering speed and saves computing resources.
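Steps a to d can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented algorithm: the abstract does not name the clustering algorithm, so frequent word pairs found in at least `min_support` documents stand in for it, and a document is treated as matching a frequent item set when its feature words contain every word of that set. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_group(docs, min_support=2):
    """Stand-in for step b: one cluster per word pair shared by >= min_support docs."""
    counts = Counter()
    for words in docs.values():
        for pair in combinations(sorted(words), 2):
            counts[frozenset(pair)] += 1
    return {
        items: {doc_id for doc_id, words in docs.items() if items <= words}
        for items, n in counts.items()
        if n >= min_support
    }


def incremental_cluster(all_docs, group_size, min_support=2):
    """Cluster documents group by group, reusing the frequent item sets found so far."""
    ids = list(all_docs)
    # Step a: divide the documents into groups.
    groups = [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
    clusters = {}
    # Step d: keep going until no unclustered group remains.
    for index, group in enumerate(groups):
        docs = {doc_id: all_docs[doc_id] for doc_id in group}
        if index == 0:
            # Step b: cluster the first group from scratch.
            clusters = cluster_group(docs, min_support)
            continue
        # Step c: match each document against the existing frequent item sets.
        leftover = {}
        for doc_id, words in docs.items():
            matched = [items for items in clusters if items <= words]
            for items in matched:
                clusters[items].add(doc_id)
            if not matched:
                leftover[doc_id] = words
        # Unmatched documents are clustered among themselves to form new clusters.
        clusters.update(cluster_group(leftover, min_support))
    return clusters
```

Only the first group and the unmatched leftovers go through the pair-counting step; every other document is placed with cheap subset checks. In this sketch, a document that shares no frequent pair within its own group is left unclustered.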

Description

Technical field

[0001] The invention relates to the technical fields of data mining, document clustering and web page clustering, and in particular to a document clustering method, a document clustering device, and network equipment.

Background

[0002] Search engine technology is now mature, and users can easily find the web page content they want through a search engine. To make it easier for users to browse target web pages, web page recommendation technology has emerged: related web pages are recommended to users based on the web pages they follow, which saves them the trouble of searching. In the prior art, web page recommendation techniques all rely on web page clustering to obtain the related pages to recommend: web pages within a certain range are first clustered, and then one or more web pages are selected from the clusters to which the user's favorite web pages belong and recommended to the user. Web...

Application Information

IPC(8): G06F17/30
Inventor: 万振张凯达
Owner: 珠海豹好玩科技有限公司