Keyword calculation method based on document clustering

A document clustering and keyword calculation technology, applied to text database clustering/classification, calculation, unstructured text data retrieval, and related fields. It addresses the lack of a concrete technical solution in the prior art.

Inactive. Publication Date: 2015-12-16

HAINAN UNIVERSITY

5 Cites, 25 Cited by

Problems solved by technology

However, the prior art offers no concrete technical scheme that integrates a series of technologies to further refine the grouping of a document collection and mine representative keywords for each group.

Embodiment Construction

[0028] The technologies involved in the present invention, with explanatory notes:

[0029] 1. Text clustering:

[0030] Text clustering (also called document clustering) rests on the well-known clustering assumption: documents of the same type are more similar to one another, while documents of different types are less similar. Text clustering can divide a relatively large collection of documents into several subcategories, so that similar documents are organized in the same category. As an unsupervised machine learning method, clustering requires neither a training process nor manual labeling of documents in advance, which gives it flexibility and a high degree of automation. It has become an effective method for organizing, summarizing, and navigating text information.
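The clustering assumption can be illustrated with a small similarity check. This is a minimal sketch, not the patent's method: the Jaccard measure, the `jaccard` helper, and the sample sentences are all assumptions chosen for illustration.

```python
def jaccard(text_a, text_b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

# Hypothetical sample texts: two about sport, one about finance.
same_topic = jaccard("the team scored a late goal",
                     "the team conceded an early goal")
cross_topic = jaccard("the team scored a late goal",
                      "the bank raised interest rates")
print(same_topic, cross_topic)
```

Under the clustering assumption, the same-topic pair should score higher than the cross-topic pair, which is the property a clustering algorithm relies on when it groups documents.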

[0031] The applications of text clustering technology mainly include:

[0032] Perform clustering operations on documents that users are interested in (such as news or products that us...

Abstract

The invention relates to a keyword calculation method based on document clustering. The method comprises the following steps: (1) obtaining a text document set; (2) segmenting the contents of all documents in the set into terms with a word segmentation algorithm; (3) building document vectors; (4) weighting the document vectors with TF-IDF (Term Frequency-Inverse Document Frequency); (5) compressing the dimensions of the document vectors; (6) performing document clustering; and (7) calculating representative keywords for each group of documents. The method provides complete and feasible calculation steps, supports dimension compression of document vectors, and is computationally efficient. The dimension compression step uses a concise and efficient method that differs from the techniques in the prior art. The method is described as the first technical scheme that links these stages through feasible calculation steps to calculate representative keywords from a document set.
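The seven steps in the abstract can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the whitespace tokenizer, the top-weight term pruning used for dimension compression, the cosine-based k-means with farthest-first seeding, and all function names and sample documents are stand-ins, since the excerpt does not describe the patent's own segmentation, compression, or clustering details.

```python
import math
from collections import Counter

def tokenize(text):
    """Step 2 stand-in: lowercase whitespace splitting (assumed segmenter)."""
    return [w for w in (raw.strip(".,;:!?").lower() for raw in text.split()) if w]

def tfidf_vectors(token_lists):
    """Steps 3-4: one sparse {term: TF-IDF weight} vector per document."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return [{t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
             for t, c in Counter(tokens).items()} for tokens in token_lists]

def sum_weights(vectors):
    """Add up term weights across a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total

def compress(vectors, max_terms):
    """Step 5 stand-in (assumption): keep only the highest-weight terms."""
    keep = {t for t, _ in sum_weights(vectors).most_common(max_terms)}
    return [{t: w for t, w in vec.items() if t in keep} for vec in vectors]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kmeans(vectors, k, iterations=10):
    """Step 6 stand-in: k-means on cosine similarity, farthest-first seeds."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = {t: w / len(members)
                                for t, w in sum_weights(members).items()}
    return labels

def cluster_keywords(vectors, labels, top_n):
    """Step 7: the top_n highest-weight terms within each cluster."""
    return {c: [t for t, _ in sum_weights(
                [v for v, lab in zip(vectors, labels) if lab == c]
            ).most_common(top_n)]
            for c in sorted(set(labels))}

# Step 1: a tiny hypothetical document set.
docs = [
    "football match goal team football",
    "stock market shares stock price",
    "team goal football league",
    "market price shares investor",
]
vectors = compress(tfidf_vectors([tokenize(d) for d in docs]), max_terms=8)
labels = kmeans(vectors, k=2)
keywords = cluster_keywords(vectors, labels, top_n=2)
print(labels, keywords)
```

Sparse dictionaries are used for the vectors so that compression reduces to dropping keys. A production system would more likely use a tested library for TF-IDF weighting and clustering.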

Description

Technical field

[0001] The invention belongs to the field of computer data mining, and in particular relates to a method for calculating keywords based on document clustering.

Background technique

[0002] In the Internet industry, users often use keyword group searches to find articles that represent their interests. In the prior art, a given document collection is regarded as a complete and indivisible whole, and representative keywords are calculated on it. Typical applications include the personalized reading system of news websites, which can calculate a set of keywords representing user interests based on the news browsed by users, and recommend new articles based on this set of keywords. But in fact, a user's interest often includes multiple aspects and is scattered. Therefore, the corresponding document collection can be divided into several groups of documents, each group corresponds to a point of interest of the user, and the correlation between documents in each...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/35

Inventor: 周辉, 段玉聪, 叶春杨, 王磊

Owner: HAINAN UNIVERSITY
