Method for clustering Chinese texts for safety management of network content

A security management and text clustering technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem of clustering not reaching the global optimum, and achieves fast iteration speed, reduced complexity, and increased accuracy and recall rates.

Status: Inactive. Publication Date: 2012-04-25
军工思波信息科技产业有限公司
2 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0008] Model-based methods assume a mathematical model for each cluster and try to find the best fit between the given data and that model. In practice, such methods converge quickly but may not reach the global optimum.
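The local-optimum problem can be illustrated with a minimal sketch. It uses k-means (Lloyd's algorithm) as a stand-in, since k-means is the hard-assignment limit of EM for Gaussian mixtures; the patent text does not say which model-based method it has in mind. The function name `lloyd_1d`, the sample points, and both sets of starting centers are assumptions made for illustration only.

```python
def lloyd_1d(points, centers, max_iter=100):
    """Run Lloyd's k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group leaves its center where it was).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break  # converged: no center moved
        centers = updated
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Three well-separated pairs of points (illustrative data).
points = [0, 1, 10, 11, 50, 51]

# One starting center per pair: converges to the best partition.
good_centers, good_sse = lloyd_1d(points, [0, 10, 50])

# Two starting centers inside the same pair: converges just as fast,
# but gets stuck with a much larger error.
bad_centers, bad_sse = lloyd_1d(points, [5, 50, 51])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

Both runs stop after two iterations, yet the second one ends with a far larger error, which is why the choice of initial centers matters.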

Examples

Embodiment Construction

[0028] According to the above scheme, a computer system that adopts this clustering method is designed. The system comprises:

[0029] A text collection and entry device, used to input text information into the system and number each text;

[0030] A text library, used to store the words, features, and vectorization results of each text;

[0031] A text word segmentation device, used to break text sentences into words;

[0032] A text feature extraction and vectorization device, used to vectorize the text expressed as words for clustering;

[0033] A text clustering device, used to cluster the vectorized texts and finally generate a text clustering result according to the one-to-one correspondence with the text library.

[0034] The text collection and entry device is connected to the text word segmentation device, the text word segmentation device is connected to the text library and the text feature extraction and vectorization device, the text library...
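The components listed in [0029] to [0033] can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the patented implementation: the segmenter is a character-bigram stand-in (a real system would use a proper Chinese word segmenter), the features are a smoothed TF-IDF weighting, and the clustering step is a simple greedy placeholder. All function names, the sample texts, and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def segment(text):
    # Stand-in segmenter (assumption): overlapping character bigrams.
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_library(texts):
    # Collection and entry: number each text from 1 and store its words.
    return {doc_id: {"text": t, "words": segment(t)}
            for doc_id, t in enumerate(texts, start=1)}


def vectorize(library):
    # Feature extraction and vectorization: smoothed TF-IDF as sparse dicts,
    # stored back into the library next to the words.
    n = len(library)
    doc_freq = Counter()
    for entry in library.values():
        doc_freq.update(set(entry["words"]))
    for entry in library.values():
        counts = Counter(entry["words"])
        entry["vector"] = {w: c * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
                           for w, c in counts.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(library, threshold=0.3):
    # Placeholder clustering (assumption): a text joins the first cluster
    # whose leader is similar enough, otherwise it starts a new cluster.
    leaders = []  # text numbers of the cluster leaders
    labels = {}   # text number -> cluster label
    for doc_id, entry in library.items():
        for label, leader_id in enumerate(leaders):
            if cosine(entry["vector"], library[leader_id]["vector"]) >= threshold:
                labels[doc_id] = label
                break
        else:
            labels[doc_id] = len(leaders)
            leaders.append(doc_id)
    return labels


texts = ["网络内容安全管理", "网络内容安全分析", "今天天气很好", "今天天气不错"]
library = build_library(texts)
vectorize(library)
labels = cluster(library)
print(labels)
```

Because every text keeps its number from entry to clustering, the final labels map one-to-one back to the library entries, as [0033] describes.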

Abstract

The invention relates to a new method for clustering Chinese texts based on network content analysis. The number of clusters and the initial cluster centers are determined automatically using a density-based clustering concept. The convergence criterion for the number of clusters is also optimized, and the complexity of the clustering algorithm is reduced. As a result, the number of clusters and the initial centers can be determined over the whole sample set, which ensures comprehensive clustering, avoids excessive influence of human factors on the clustering result, and yields higher clustering accuracy and efficiency.
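The general idea of letting density decide the number of clusters and the initial centers can be sketched as follows. The excerpt does not give the patent's actual density measure or convergence criterion, so this is a generic illustration: density is taken as the number of neighbors within a radius, and the chosen seeds are then refined with ordinary k-means. The function names, the `radius` and `min_neighbors` parameters, and the sample points are all assumptions.

```python
import math


def density_seeds(points, radius, min_neighbors):
    """Pick initial centers from dense regions; their count becomes k."""
    # Density of a point: how many other points lie within `radius`.
    density = [sum(1 for j, q in enumerate(points)
                   if j != i and math.dist(p, q) <= radius)
               for i, p in enumerate(points)]
    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if density[i] < min_neighbors:
            break  # remaining points are too sparse to seed a cluster
        # Accept only points well separated from the seeds chosen so far.
        if all(math.dist(points[i], s) > 2 * radius for s in seeds):
            seeds.append(points[i])
    return seeds


def kmeans(points, centers, max_iter=100):
    """Plain k-means refinement starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        updated = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(c)
        if updated == centers:
            break
        centers = updated
    return centers, labels


# Two dense groups plus one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 20)]

seeds = density_seeds(points, radius=1.5, min_neighbors=2)
centers, labels = kmeans(points, seeds)
print(len(seeds), labels)
```

In this sketch nobody supplies the cluster count: it falls out of how many well-separated dense regions the data contains, and the isolated point is too sparse to seed a cluster of its own.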

Description

Technical field

[0001] The invention relates to a Chinese text clustering method for network content security management.

Background technique

[0002] In network content security management, research focuses on text classification and text clustering. Both techniques aim to group large-scale text data objects into multiple categories. Text clustering is an unsupervised machine learning method: it does not require human input such as predefined document categories or manually labeled classes. It is a principal technical approach for effectively organizing, summarizing and navigating massive amounts of text information. The method has become an important research topic in mass text information fusion, and has significant technical support and practical application value for important application fields of information cont...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 杨更
Owner: 军工思波信息科技产业有限公司