Clustering method for network behavior habits based on K-means and LDA (Latent Dirichlet Allocation) two-way authentication

A two-way verification clustering technology, applied in text database clustering/classification, character and pattern recognition, instruments, etc. It addresses problems such as poor efficiency and poor-quality solutions.

Active Publication Date: 2016-12-07
HUAIYIN INSTITUTE OF TECHNOLOGY
13 Cites, 39 Cited by

AI Technical Summary

Problems solved by technology

In some special cases, heuristic algorithms produce poor solutions or are extremely inefficient.

Examples

Embodiment Construction

[0065] The technical solution of the present invention will be described in detail below in conjunction with the drawings:

[0066] As shown in Figure 1, the main process of the simulated annealing algorithm comprises Step A1 to Step A26:

[0067] Step A1: Let the set of all person-label-frequency triples be PERSONLABELFREQ = {(PERSON_p1, LABEL_p1, FREQ_p1), (PERSON_p2, LABEL_p2, FREQ_p2), …, (PERSON_pa, LABEL_pa, FREQ_pa)}, where PERSON_p1, PERSON_p2, …, PERSON_pa are the unique identifiers of persons; LABEL_p1, LABEL_p2, …, LABEL_pa represent the overall attributes of the content that persons browse on the Internet (one person identifier can correspond to multiple attributes); and FREQ_p1, FREQ_p2, …, FREQ_pa represent the weights of those overall attributes. Let the browsing record-person-keyword set be RECORDIDPERSONKEYWORD = {(RECORDID_r1, PERSON_r1, KEYWORD_r1), (RECORDID_r2, PERSON_r2, KEYWORD_r2), …, (RECORDID_ra ...
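
The two sets defined in Step A1 can be pictured as lists of tuples. The following is a minimal Python sketch, not the patent's implementation: the sample identifiers, labels, keywords, and weights are invented for illustration, and the helper names person_vectors and person_documents are hypothetical. It pivots the person-label-frequency triples into one weight vector per person (a form a K-means step could consume) and groups keywords per person (a form a topic model such as LDA could consume).

```python
from collections import defaultdict

# Hypothetical sample data: every identifier, label, keyword, and weight
# below is invented for illustration and does not come from the patent.
PERSONLABELFREQ = [
    ("u1", "news", 5.0),
    ("u1", "sports", 1.0),
    ("u2", "shopping", 4.0),
    ("u2", "news", 2.0),
]
RECORDIDPERSONKEYWORD = [
    ("r1", "u1", "election"),
    ("r2", "u1", "football"),
    ("r3", "u2", "discount"),
]


def person_vectors(triples):
    """Pivot (person, label, freq) triples into one weight vector per person."""
    labels = sorted({label for _, label, _ in triples})
    index = {label: i for i, label in enumerate(labels)}
    vectors = defaultdict(lambda: [0.0] * len(labels))
    for person, label, freq in triples:
        # A person may carry several labels; each label gets its own column.
        vectors[person][index[label]] += freq
    return labels, dict(vectors)


def person_documents(records):
    """Group browsing keywords by person, giving one keyword list per person."""
    docs = defaultdict(list)
    for _record_id, person, keyword in records:
        docs[person].append(keyword)
    return dict(docs)


labels, vectors = person_vectors(PERSONLABELFREQ)
documents = person_documents(RECORDIDPERSONKEYWORD)
```

Keeping the label order in a separate list makes every person's vector comparable column by column, which is what a distance-based clustering step needs.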

Abstract

The invention discloses a clustering method for network behavior habits based on K-means and LDA (Latent Dirichlet Allocation) two-way authentication. The method takes the webpage attributes, keywords, and frequencies found in persons' Internet browsing records and combines a K-means algorithm, an LDA document topic extraction model, and a simulated annealing algorithm. The method comprises the following steps: first, K-means clustering and LDA topic model generation are performed on a person-label-frequency set and a browsing record-person-keyword set; second, the intermediate results are stored and processed, and K-means and LDA two-way authentication is performed using the annealing algorithm; finally, a globally best topic-classification label sequence is calculated and used as a reference to optimize the network behavior habit clustering result. The K-means and LDA two-way authentication improves sensitivity to person-classification labels, and the annealing algorithm improves the efficiency of optimizing the clustering result, which in turn improves clustering accuracy.
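
As a rough illustration of the annealing step described above, the sketch below searches for a topic-to-cluster label sequence that best reconciles two labelings of the same persons. It is a sketch under stated assumptions rather than the patented procedure: the objective (the number of persons whose K-means cluster equals the cluster assigned to their dominant LDA topic), the geometric cooling schedule, and the names agreement, anneal_mapping, kmeans_labels, and lda_topics are all hypothetical, and the toy label lists are invented.

```python
import math
import random


def agreement(mapping, kmeans_labels, lda_topics):
    """Count persons whose K-means cluster equals the cluster mapped to their topic."""
    return sum(1 for c, t in zip(kmeans_labels, lda_topics) if mapping[t] == c)


def anneal_mapping(kmeans_labels, lda_topics, n_topics, n_clusters,
                   t_start=1.0, t_end=1e-3, cooling=0.95,
                   steps_per_temp=20, seed=0):
    """Simulated annealing over topic -> cluster label sequences (assumed objective)."""
    rng = random.Random(seed)
    current = [rng.randrange(n_clusters) for _ in range(n_topics)]
    current_score = agreement(current, kmeans_labels, lda_topics)
    best, best_score = current[:], current_score
    temp = t_start
    while temp > t_end:
        for _ in range(steps_per_temp):
            # Neighbor move: reassign one randomly chosen topic to a random cluster.
            candidate = current[:]
            candidate[rng.randrange(n_topics)] = rng.randrange(n_clusters)
            score = agreement(candidate, kmeans_labels, lda_topics)
            delta = score - current_score
            # Always accept improvements; accept worse moves with
            # probability exp(delta / temp), which shrinks as temp falls.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, current_score = candidate, score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling (an assumption, not from the source)
    return best, best_score


# Invented toy labels: K-means cluster and dominant LDA topic for six persons.
kmeans_labels = [0, 0, 1, 1, 2, 2]
lda_topics = [2, 2, 0, 0, 1, 1]
best_mapping, best_score = anneal_mapping(kmeans_labels, lda_topics, 3, 3)
```

The returned sequence plays the role of a best topic-classification label sequence in this toy setting; persons whose two labelings still disagree under it would be the natural candidates for re-examination.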

Description

Technical field

[0001] The invention belongs to the field of cluster analysis and optimization algorithms, and particularly relates to a network behavior habit clustering method based on two-way verification of K-means and LDA. The method is used to optimize clustering results, thereby improving clustering accuracy and increasing the value of the information recorded from people's online activity.

Background technique

[0002] Mastering clustering methods for network behavior habit data is important for the study of web-surfing habits. With the continuing spread of the Internet, more and more people choose to obtain information of interest online, and the amount of information that people browse is huge. Relying on manual analysis of these data is not only inefficient but also inaccurate. Cluster analysis, coupled with two-way verification against another clustering method, can improve both the efficiency and the accuracy of the analysis. Gener...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/313, G06F16/35, G06F18/23213
Inventor 朱全银, 辛诚, 李翔, 许康, 潘舒新, 孙青怡, 周泓, 严云洋, 胡荣林, 冯万利, 王留洋, 王海云, 袁媛, 唐海波
Owner HUAIYIN INSTITUTE OF TECHNOLOGY