Document clustering method, document clustering device and network equipment
A document clustering technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses the problems of wasted resources and high cost, and achieves the effect of saving computing resources and shortening clustering time.
Examples
Embodiment 1
[0123] The document clustering method of the present invention is described in detail below, taking web pages as the example documents. Specifically, the document clustering method of this embodiment includes the following steps:
[0124] 301) Segment the web page, that is, split the text of the web page into multiple words; filter the resulting words to remove interfering (noise) words such as "my" and "this"; use the words that remain as the initial feature words of the web page; and store the initial feature words corresponding to each web page in the database;
[0125] Specifically, to remove noise words, a noise lexicon can be maintained in the background in advance, and the split words are compared with the words in the noise lexicon. Split words that match entries in the noise lexicon are removed as noise words.
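The segmentation and noise-word filtering of step 301 can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular-expression tokenizer stands in for a real word segmenter, and the names `NOISE_LEXICON`, `segment`, and `initial_feature_words` are hypothetical, as is the small lexicon itself.

```python
import re

# Hypothetical noise lexicon (assumption); a real system would maintain
# a much larger list in the background.
NOISE_LEXICON = {"my", "this", "the", "a", "an", "of", "and", "is", "to"}


def segment(text):
    """Split text into lowercase word tokens.

    A simple regex stand-in (assumption) for a real word segmenter.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def initial_feature_words(text, noise_lexicon=NOISE_LEXICON):
    """Return the split words that do not appear in the noise lexicon."""
    return [word for word in segment(text) if word not in noise_lexicon]


page_text = "This is my guide to document clustering"
features = initial_feature_words(page_text)
print(features)
```

Keeping the lexicon as a set makes each comparison a constant-time membership test, which matters when every word of every page is checked against it.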
[0126] 302) Read the word segmentation results of each web page, that is, read the initial feature words of each web page, and fi...
Embodiment 2
[0152] The document clustering method of the present invention can be applied to a favorites (bookmarks) function, which may be local favorites or web favorites, and to recommending other webpages to the user based on the webpages the user has saved.
[0153] Specifically, this embodiment includes the following steps:
[0154] 401) After a user bookmarks a web page, a crawler program can be used to fetch the source code of the web page; all the content of the web page is then split into multiple words, the number of occurrences of each word is recorded, and this information is stored in the database;
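The counting and storage part of step 401 might look like the sketch below. It assumes the page text has already been fetched and extracted (the crawler is omitted), uses an in-memory SQLite database as a stand-in for the unspecified database, and introduces a hypothetical table `word_counts` and hypothetical helpers `count_words` and `store_counts`; the URL is a placeholder.

```python
import re
import sqlite3
from collections import Counter


def count_words(page_text):
    """Split the page text into words and count occurrences of each word."""
    return Counter(re.findall(r"[a-z0-9]+", page_text.lower()))


def store_counts(conn, url, counts):
    """Store per-page word counts in a hypothetical word_counts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "url TEXT, word TEXT, occurrences INTEGER, "
        "PRIMARY KEY (url, word))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (url, word, occurrences) "
        "VALUES (?, ?, ?)",
        [(url, word, n) for word, n in counts.items()],
    )
    conn.commit()


# In-memory database and placeholder URL (both assumptions for the sketch).
conn = sqlite3.connect(":memory:")
url = "https://example.com/page"
counts = count_words("Clustering groups similar pages; clustering saves time.")
store_counts(conn, url, counts)

rows = dict(
    conn.execute(
        "SELECT word, occurrences FROM word_counts WHERE url = ?", (url,)
    )
)
print(rows)
```

Storing one row per (page, word) pair lets a later clustering run read the counts back without fetching the page again.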
[0155] 402) Once the preset judgment conditions are satisfied, use the document clustering method of Embodiment 1 to cluster the webpages collected by the user, and store the clustering results in the database; for example, set the time nodes for document clustering in advance, and when each preset time node is reached, use the document clustering method of Embodiment 1 t...
Embodiment 3
[0162] The document clustering method of the present invention can be applied in the advertisement push function to achieve the purpose of accurate advertisement delivery.
[0163] Specifically, this embodiment includes the following steps:
[0164] 501) When the user browses a webpage, a crawler program can be used to fetch the source code of that webpage; all the content of the webpage is then split into multiple words, the number of occurrences of each word is recorded, and this information is stored in the database;
[0165] 502) Using the document clustering method in Embodiment 1 to cluster the webpages browsed by the user, store the clustering results in the database, and save the feature words of all webpages in the cluster;
[0166] 503) For each advertisement, find its feature words; specifically, steps 301-302 in Embodiment 1 can be used to find the feature words of each advertisement;
...