Text mining based attribute analysis method for internet media users

A user attribute analysis and text mining technology, applied in data mining, semantic analysis, network data retrieval, and related fields, that addresses problems such as user attributes that cannot be identified or mined and inaccurate analysis of known attributes.

Active Publication Date: 2015-10-21
CHENGDU YUNDUI MOBILE INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The traditional methods described above have the following defects: a known user sample is required, machine learning is then performed on the behavior preferences of that sample, and then the user attrib



Examples


Embodiment

[0064] The Internet media user attribute analysis method based on text mining of the present invention comprises the following steps:

[0065] (1) Text Mining:

[0066] 1.1: Create the label main corpus:

[0067] 1.1.1: Extract article samples and clean them, removing audio, video, pictures, incomplete articles, garbled characters, and illegal characters;

[0068] 1.1.2: Manually classify the samples according to the tag class library;

[0069] 1.1.3: Perform dynamic clustering and fuzzy clustering on the samples simultaneously, and set the cluster parameters;

[0070] 1.1.4: Perform, in sequence, semantic analysis, cluster feature analysis, cluster parameter modification, and density noise reduction to obtain the noise value M;

[0071] 1.1.5: Compare the noise value M with the threshold a: if M is smaller than a, go to step 1.1.6; if M is greater than or equal to a, return to step 1.1.3;

[0072] 1.1.6: Then perform m...
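The loop formed by steps 1.1.3 to 1.1.5 can be illustrated with a minimal Python sketch. The patent text shown here does not define how the noise value M is computed or how the cluster parameters are modified, so everything concrete below is an assumption made for illustration: M is taken to be the fraction of points that have too few neighbors within a radius `eps`, and "modifying the cluster parameters" is reduced to enlarging `eps`. The names `noise_value`, `tune_until_quiet`, `eps`, `min_neighbors`, `growth`, and `max_rounds` are hypothetical and do not come from the patent.

```python
from math import dist


def noise_value(points, eps, min_neighbors):
    """Assumed noise measure M: the fraction of points that have fewer than
    min_neighbors other points within distance eps."""
    if not points:
        return 0.0
    noisy = 0
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            noisy += 1
    return noisy / len(points)


def tune_until_quiet(points, a, eps=0.5, min_neighbors=2, growth=1.5, max_rounds=20):
    """Repeat the 1.1.3 -> 1.1.4 -> 1.1.5 cycle: compute M, and while M >= a,
    modify the parameter (here: enlarge eps) and try again.

    Returns (eps, M, rounds_used) once M < a."""
    for round_no in range(1, max_rounds + 1):
        m = noise_value(points, eps, min_neighbors)
        if m < a:  # step 1.1.5: M below threshold a, continue to step 1.1.6
            return eps, m, round_no
        eps *= growth  # otherwise go back to step 1.1.3 with new parameters
    raise RuntimeError("noise value M never dropped below threshold a")


# Four points forming a tight square plus one distant outlier.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
final_eps, final_m, rounds = tune_until_quiet(samples, a=0.25)
```

The `max_rounds` guard is a design choice of this sketch: the steps as written loop back to 1.1.3 without an explicit exit condition, so a bounded retry count keeps the example from running forever on data that never meets the threshold.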


Abstract

The present invention discloses a text mining based attribute analysis method for internet media users. The method comprises the following steps: (1) text mining: 1.1: creating a label main corpus; 1.2: creating a feature corpus; and 1.3: updating and maintaining the corpus; and (2) acquiring an attribute set of the internet media users: 2.1: extracting the full set of historical article samples of the internet media users and cleaning the samples; 2.2: processing the samples to obtain a noise value; and 2.3: comparing the noise value with a threshold a and, if the noise value is smaller than the threshold a, performing model classification to form the attribute set of the internet media users. With this method, the basic attributes of users can be analyzed and mined, which greatly widens the range of applications for user attribute identification, and the analysis of basic attributes supports building an all-round attribute profile of internet media users. The method has broad commercial value and also points out a research direction for tag mining algorithms for internet media users and for the application of knowledge graphs.
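Stage (2) of the abstract (clean the samples, obtain a noise value, classify only if the noise value is below threshold a) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not say how the noise value is computed or which classification model is used, so here the noise value is assumed to be the share of cleaned articles that match no label, and the "model" is a plain keyword lookup. The names `clean_samples`, `classify`, `attribute_set`, `label_keywords`, and `min_chars` are hypothetical.

```python
import re

# Control characters and the Unicode replacement character stand in for
# "garbled characters and illegal characters" (an assumption of this sketch).
CONTROL_OR_REPLACEMENT = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffd]")


def clean_samples(articles, min_chars=20):
    """Step 2.1 (assumed form): strip garbled characters and drop articles
    that are too short to be considered complete."""
    cleaned = []
    for text in articles:
        text = CONTROL_OR_REPLACEMENT.sub("", text).strip()
        if len(text) >= min_chars:
            cleaned.append(text)
    return cleaned


def classify(text, label_keywords):
    """Hypothetical stand-in for model classification: keyword matching."""
    lowered = text.lower()
    return {
        label
        for label, words in label_keywords.items()
        if any(word in lowered for word in words)
    }


def attribute_set(articles, label_keywords, a):
    """Steps 2.1-2.3: clean, compute a noise value, and classify only if the
    noise value is smaller than threshold a. Returns None otherwise."""
    samples = clean_samples(articles)
    if not samples:
        return None
    labels_per_article = [classify(text, label_keywords) for text in samples]
    # Assumed noise value: fraction of articles that received no label.
    noise = sum(1 for labels in labels_per_article if not labels) / len(samples)
    if noise >= a:
        return None
    return set().union(*labels_per_article)


articles = [
    "Football match report: the striker scored twice.",
    "New smartphone review \x00 with camera tests and benchmarks",
    "\ufffd\ufffd",
    "short",
]
label_keywords = {
    "sports": ["football", "striker"],
    "tech": ["smartphone", "benchmark"],
}
user_attributes = attribute_set(articles, label_keywords, a=0.5)
```

In this sketch the two unusable inputs are discarded during cleaning, the remaining two articles each match a label, and the resulting attribute set is the union of those labels. Returning `None` when the noise value is not below the threshold is a choice made here; the abstract does not state what happens in that case.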

Description

Technical field

[0001] The invention relates to a method for analyzing Internet media user attributes, in particular to a method for analyzing Internet media user attributes based on text mining.

Background art

[0002] The Internet has now reached global scale, Internet applications are becoming more diversified, and the Internet is changing people's study, work, and lifestyle ever more profoundly. In network data analysis, accurately knowing the habits, needs, and other attributes of Internet users is an important prerequisite for accurate content promotion or advertising. At present, the existing technical solutions for identifying media user attributes on the Internet are all based on user article samples. They first collect the full set of users' historical samples, organize the sample users' data into a sample database, and classify the sample database into tag corpora; for example, A corpus represents content such...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/9535, G06F16/951, G06F40/30, G06F16/435, G06Q50/01, H04L67/306, G06F16/215, G06F2216/03, G06F18/23
Inventor: 王飞, 张国鸿, 张何君
Owner: CHENGDU YUNDUI MOBILE INFORMATION TECH CO LTD