A clustering method of network behavior habits based on K-means and LDA two-way verification

A two-way verification and behavior-clustering technique, applied in text database clustering/classification, character and pattern recognition, unstructured text data retrieval, etc., that can address problems such as poor efficiency and bad answers.

Active Publication Date: 2019-06-11

HUAIYIN INSTITUTE OF TECHNOLOGY

Problems solved by technology

In some special cases, heuristic algorithms give poor answers or are extremely inefficient, but the data structures that cause those special cases may never appear in the real world.

Embodiment Construction

[0065] The technical scheme of the present invention is described in detail below with reference to the accompanying drawings:

[0066] As shown in Figure 1, the main flow of the simulated annealing algorithm runs from step A1 to step A26:

[0067] Step A1: Let the set of all person-label-frequency triples be PERSONLABELFREQ = {(PERSON_p1, LABEL_p1, FREQ_p1), (PERSON_p2, LABEL_p2, FREQ_p2), …, (PERSON_pa, LABEL_pa, FREQ_pa)}, where PERSON_p1, PERSON_p2, …, PERSON_pa are the unique identifiers of persons; LABEL_p1, LABEL_p2, …, LABEL_pa represent the overall attribute of a person's online browsing content (one unique person identifier can correspond to multiple attributes); and FREQ_p1, FREQ_p2, …, FREQ_pa represent the weight of the overall attribute of a person's online browsing content. Let the set of online browsing record-person-keyword triples be RECORDIDPERSONKEYWORD = {(RECORDID_r1, PERSON_r1, KEYWORD_r1), (RECORDID_r2, PERSON_r2, KEYWORD_r2), …, (RECORDI...
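The two sets defined in step A1 can be pictured as plain lists of triples. The following is a minimal Python sketch, not the patent's implementation: the sample values and the helper names `label_vectors` and `keyword_documents` are hypothetical, and only the triple layouts (person, label, frequency) and (record ID, person, keyword) come from the text above. It shows one plausible way to turn the first set into per-person frequency vectors (a natural input for K-means) and the second into per-person keyword lists (a natural input for a topic model).

```python
from collections import defaultdict

# Hypothetical sample data; only the field order follows step A1.
PERSONLABELFREQ = [
    ("u1", "sports", 5),
    ("u1", "news", 2),
    ("u2", "news", 7),
]
RECORDIDPERSONKEYWORD = [
    ("r1", "u1", "football"),
    ("r2", "u1", "election"),
    ("r3", "u2", "election"),
]


def label_vectors(triples):
    """Turn (person, label, freq) triples into one frequency vector per person."""
    labels = sorted({label for _, label, _ in triples})
    index = {label: i for i, label in enumerate(labels)}
    vectors = defaultdict(lambda: [0] * len(labels))
    for person, label, freq in triples:
        vectors[person][index[label]] += freq
    return labels, dict(vectors)


def keyword_documents(triples):
    """Group (record_id, person, keyword) triples into one keyword list per person."""
    docs = defaultdict(list)
    for _, person, keyword in triples:
        docs[person].append(keyword)
    return dict(docs)


labels, vectors = label_vectors(PERSONLABELFREQ)
docs = keyword_documents(RECORDIDPERSONKEYWORD)
```

A person may appear in several triples with different labels, which matches the statement that one identifier can correspond to multiple attributes.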

Abstract

The invention discloses a network behavior habit clustering method based on two-way verification of K-means and LDA. The method uses the web page attributes, keywords, and frequencies in personnel online records, combined with the K-means algorithm, the LDA document topic extraction model, and an annealing algorithm. It first performs K-means clustering and LDA document topic model generation on the person-label-frequency set and the browsing record-person-keyword set, storing and computing intermediate results. It then uses the annealing algorithm to carry out two-way verification between K-means and LDA and calculate the globally best topic-category label sequence, on the basis of which the network behavior habit clustering results are optimized. The two-way verification of K-means and LDA improves sensitivity to person-category labels, and the annealing algorithm improves the efficiency of optimizing the clustering results, which in turn improves clustering accuracy.
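To make the idea of searching for a best topic-category label sequence concrete, here is a small Python sketch of simulated annealing over topic-to-cluster label assignments. It is an illustration under stated assumptions, not the patented steps A1 to A26: the objective (the number of people whose mapped LDA topic equals their K-means cluster), the swap move, the cooling schedule, and the names `agreement` and `anneal_mapping` are all hypothetical choices, and the input assignments are made-up sample data.

```python
import math
import random


def agreement(mapping, clusters, topics):
    """Count people whose LDA topic, once mapped, equals their K-means cluster."""
    return sum(1 for c, t in zip(clusters, topics) if mapping[t] == c)


def anneal_mapping(clusters, topics, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search permutations of topic -> cluster labels by simulated annealing."""
    rng = random.Random(seed)
    current = list(range(k))  # current[topic] = cluster label
    cur_score = agreement(current, clusters, topics)
    best, best_score = current[:], cur_score
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(k), 2)  # propose swapping two labels
        cand = current[:]
        cand[i], cand[j] = cand[j], cand[i]
        cand_score = agreement(cand, clusters, topics)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current[:], cur_score
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a small floor
    return best, best_score


# Hypothetical per-person assignments from K-means and from LDA (dominant topic).
clusters = [0, 0, 1, 1, 2, 2]
topics = [2, 2, 0, 0, 1, 1]
mapping, score = anneal_mapping(clusters, topics, k=3)
```

Accepting some worse moves early on is what lets the search leave a poor labeling instead of stalling there. With only three labels an exhaustive search would do, so annealing would matter mainly when the number of topics and clusters makes enumeration impractical.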

Description

Technical field

[0001] The invention belongs to the field of clustering analysis and optimization algorithms, and in particular relates to a network behavior habit clustering method based on two-way verification of K-means and Latent Dirichlet Allocation (LDA), which is used to optimize clustering results, thereby improving clustering accuracy and increasing the use value of personnel online record information.

Background

[0002] Mastering clustering methods for network behavior habit data is important for research into people's online habits. As the Internet becomes ever more widespread, more and more people choose to obtain information of interest through it. The content people browse online amounts to a huge volume of information, and relying solely on manual analysis of this data is both inefficient and inaccurate. The efficiency and accuracy of analysis can be improved through cluster analysis, coupled with two-...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/35, G06F16/31, G06K9/62

CPC: G06F16/313, G06F16/35, G06F18/23213

Inventor: 朱全银, 辛诚, 李翔, 许康, 潘舒新, 孙青怡, 周泓, 严云洋, 胡荣林, 冯万利, 王留洋, 王海云, 袁媛, 唐海波

Owner: HUAIYIN INSTITUTE OF TECHNOLOGY
