Network text data detection method based on fuzzy clustering

A network text and fuzzy clustering technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses problems such as slow execution speed, insufficient mining depth, and low clustering accuracy.

Inactive Publication Date: 2010-06-30
SHAANXI DEVTEK TECH DEV
0 Cites, 36 Cited by

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, such as insufficient mining depth, slow execution speed, and low clustering accuracy, the present invention provides a network text data detection method based on fuzzy clustering. The method can effectively improve the accuracy and reliability of text classification of network content in network security audits, thereby improving the efficiency of acquiring target text from network content and enabling intelligent retrieval of network content.

Embodiment Construction

[0018] The invention comprises four functional parts: network content preprocessing, network content feature extraction, fuzzy clustering, and clustering result output. Network content preprocessing performs dimensionality reduction on the multi-dimensional feature vectors of network content documents and performs feature extraction. Network content feature extraction processes the network content in the captured network stream, including establishing the network content documents and representing each document as a feature vector. Fuzzy clustering is the core of the invention: the initial cluster centers are selected based on a density function, and the average information entropy serves as the criterion for judging the number of clusters. An initial number of clusters is set and then revised during the iterations of the algorithm; the number of clusters when the average information entro...
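
The fuzzy clustering step described in [0018] can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the text does not state its density function, so the sketch borrows a Gaussian-style density with suppression around each chosen center (in the spirit of subtractive clustering), and it uses the standard fuzzy c-means update with fuzzifier `m = 2`. The function names, the `radius` parameter, and the two-dimensional toy vectors standing in for document feature vectors are all hypothetical. The revision of the number of clusters is not covered by this sketch.

```python
import math

def density_initial_centers(points, c, radius=1.0):
    """Pick c initial centers at high-density points (ASSUMED density function)."""
    # Density of a point: sum of Gaussian-like contributions from every point.
    density = [sum(math.exp(-math.dist(p, q) ** 2 / radius ** 2) for q in points)
               for p in points]
    centers = []
    for _ in range(c):
        best = max(range(len(points)), key=lambda i: density[i])
        centers.append(points[best])
        peak = density[best]
        # Suppress density near the chosen center so the next pick lies elsewhere.
        for i, p in enumerate(points):
            density[i] -= peak * math.exp(-math.dist(p, points[best]) ** 2 / radius ** 2)
    return centers

def memberships(point, centers, m):
    """Standard fuzzy c-means membership degrees of one point (they sum to 1)."""
    dists = [math.dist(point, v) for v in centers]
    if 0.0 in dists:  # the point coincides with one or more centers
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** e for dk in dists) for dj in dists]

def fuzzy_c_means(points, c, m=2.0, iterations=30, radius=1.0):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = density_initial_centers(points, c, radius)
    dim = len(points[0])
    for _ in range(iterations):
        U = [memberships(p, centers, m) for p in points]
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in U]
            total = sum(weights)
            # Each center is the membership-weighted mean of all points.
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        centers = new_centers
    return centers, [memberships(p, centers, m) for p in points]

# Hypothetical 2-D feature vectors forming two loose groups.
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, U = fuzzy_c_means(docs, c=2)
for p, row in zip(docs, U):
    print(p, [round(u, 3) for u in row])
```

Starting from dense points rather than random ones makes the run deterministic, which is one plausible reason to prefer a density-based initialization.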

Abstract

The invention discloses a network text data detection method based on fuzzy clustering. The method comprises the following steps: first, the extracted network content is preprocessed; next, features are extracted from the preprocessed network content that is to be clustered; then clustering is performed, with the initial cluster centers selected according to a density function and an initial number of clusters set. During clustering, each number of clusters corresponds to a membership matrix, and each membership matrix has an average information entropy. The number of clusters is revised during the iterations of the algorithm, and the number of clusters at which the average information entropy reaches its minimum is the optimal number of clusters. Finally, the clustering result is returned to the user. The invention provides efficient, intelligent clustering and can trade off clustering precision against clustering speed according to the application.
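
The selection rule in the abstract, one membership matrix per candidate number of clusters with the minimum average information entropy winning, can be sketched as follows. The entropy formula used here, the mean over documents of `-sum(u * ln u)` across each row of the membership matrix, is an assumption, since the text does not state its exact definition. The function names and the sample matrices are hypothetical.

```python
import math

def average_information_entropy(U):
    """Mean entropy per document of a membership matrix.

    U[i][j] is the membership of document i in cluster j; each row sums to 1.
    ASSUMPTION: -(1/n) * sum_i sum_j u_ij * ln(u_ij) is an illustrative
    definition, not one confirmed by the source text.
    """
    total = 0.0
    for row in U:
        for u in row:
            if u > 0.0:  # the term 0 * ln(0) is taken as 0
                total -= u * math.log(u)
    return total / len(U)

def best_cluster_number(memberships_by_c):
    """Return the cluster number whose membership matrix has minimal entropy."""
    return min(memberships_by_c,
               key=lambda c: average_information_entropy(memberships_by_c[c]))

# Hypothetical membership matrices for c = 2 and c = 3 over four documents.
candidates = {
    2: [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
    3: [[0.4, 0.3, 0.3], [0.5, 0.3, 0.2], [0.3, 0.3, 0.4], [0.2, 0.4, 0.4]],
}
print(best_cluster_number(candidates))
```

Under this definition, a low entropy means that most documents belong clearly to one cluster, so the rule favors the number of clusters that yields the least ambiguous memberships.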

Description

Technical field

[0001] The invention relates to a data detection method, in particular to a network text data detection method.

Background technique

[0002] About 80% of the information on the network is in text form, so text data mining has become an increasingly popular and important research topic in data mining. Web content clustering is a fully automatic process that groups similar texts in web content together; it is an unsupervised learning process. The purpose of clustering is to distinguish and classify physical or abstract objects according to the similarity between them. Clustering methods can be distinguished by the form of the data division: a division with clear boundaries is called a hard division, in which each data item is assigned to one definite class; a division without clear boundaries is called a fuzzy division, that is, the given data is divided into the form of the degre...
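
The contrast between hard division and fuzzy division can be made concrete with a small sketch. It assumes the standard fuzzy c-means membership formula with fuzzifier `m = 2`, which the text above does not itself specify. The function names, the cluster centers, and the sample point are hypothetical.

```python
import math

def hard_assignment(point, centers):
    """Hard division: index of the single nearest center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy division: a degree of membership in every cluster, summing to 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

centers = [(0.0, 0.0), (4.0, 0.0)]   # hypothetical cluster centers
point = (1.0, 0.0)                   # hypothetical document feature vector

print(hard_assignment(point, centers))    # one class index
print(fuzzy_memberships(point, centers))  # one degree per class
```

For this point the hard division reports only the nearest cluster, while the fuzzy division yields memberships of about 0.9 and 0.1, which keeps the information that the point also lies somewhat toward the second cluster.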

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 赵安军, 王磊, 王礼, 杨宗良
Owner: SHAANXI DEVTEK TECH DEV