Microblog user interest recognizing method based on text mining

A method for identifying user interests, applied in the fields of unstructured text data retrieval, text database clustering/classification, and special data processing applications. It addresses the lack of effective new-word addition, the low accuracy of feature extraction, and the absence of a reasonable method for analyzing a single user's microblog texts.

Inactive Publication Date: 2014-07-23
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 2; Cited by: 88

AI Technical Summary

Problems solved by technology

[0010] 1. Feature extraction from short Weibo texts does not incorporate effective new words, which leads to a high miss rate in the detection results.
[0011] 2. Existing techniques generally analyze massive collections of microblog texts, or use microblog functions, to mine hot topics, events, and the like. They do not offer a reasonable method for analyzing a single user's microblog texts to infer that user's interests, even though a user's Weibo texts are an important source of information for identifying those interests.
[0012] 3. Because Weibo short texts are unstructured and sparse, the accuracy of feature extraction is low.



Examples


Embodiment Construction

[0077] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0078] Referring to Figures 1, 2, and 9, a text-mining-based method for identifying the interests of microblog users comprises the following steps:

[0079] (1) Collect the latest topical microblog text data of the microblog text set and the microblog text data of a designated user.

[0080] As shown in Figure 3, data collection crawls two kinds of data from the Internet: topical microblog text data and the microblog text data of the designated user. For the topical microblog text data, the crawler takes a URL from the URL list, establishes a connection to that URL, crawls the web page over that connection, parses the crawled page, and then builds an index from the parsed page. In addition, the links found during parsing are extracted for the next round of crawling, and a URL is extracted from ...
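The crawl loop described in [0080] can be illustrated with a minimal sketch. It is not the patent's implementation: the `fetch` callable, the `LinkAndTextParser` class, the `FAKE_WEB` pages, and the choice of an inverted index are all assumptions made for illustration, and tokens are split on whitespace, which would not segment Chinese text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed_urls, fetch, max_pages=100):
    """Take a URL, fetch it, parse it, index it, then queue its links.

    `fetch` is a hypothetical callable mapping a URL to HTML text, or
    None when the page cannot be retrieved.
    Returns an inverted index: token -> set of URLs containing it.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    index = {}
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        # Index step (assumed form): whitespace tokens -> URLs.
        for token in " ".join(parser.text_parts).split():
            index.setdefault(token.lower(), set()).add(url)
        # Link extraction step: queue unseen links for the next round.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


# Toy in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/a": '<p>hot topic</p><a href="/b">next</a>',
    "http://example.test/b": '<p>topic two</p><a href="/a">back</a>',
}
index = crawl(["http://example.test/a"], FAKE_WEB.get)
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP client; a real crawler would also need rate limiting and error handling, which the source does not describe.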


Abstract

The invention discloses a microblog user interest recognition method based on text mining, belonging to the fields of text mining and natural language processing. The method comprises: collecting the latest topical microblog text data of a microblog text set and the microblog text data of a designated user; normalizing the collected microblog text data; applying a microblog new-word recognition method to the normalized topical microblog text data to identify the latest microblog words and update a new-word dictionary; performing Chinese word segmentation on the designated user's normalized microblog text data with a segmentation method that uses the new-word dictionary, yielding a text vector representation; clustering the designated user's vectorized microblog text data and recombining the original microblog text data into a new text set; extracting the features of the new text set with a topic model; and, given preset topic dictionaries, calculating the weight of each topic dictionary from the new text set features to obtain the final topic, which serves as the recognized interest of the microblog user. The method thereby improves the accuracy of feature extraction.
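The last step of the abstract, weighting preset topic dictionaries against the extracted features, can be sketched as follows. The abstract does not give the weighting formula, so summing the weights of matching feature words is an assumption, and the function names, feature weights, and dictionaries below are hypothetical.

```python
def score_topic_dictionaries(feature_weights, topic_dictionaries):
    """Assumed scoring rule: a dictionary's weight is the sum of the
    weights of the extracted feature words it contains."""
    scores = {}
    for topic, words in topic_dictionaries.items():
        scores[topic] = sum(
            weight for word, weight in feature_weights.items() if word in words
        )
    return scores


def pick_interest(feature_weights, topic_dictionaries):
    """Return the highest-scoring topic, or None if nothing matches."""
    scores = score_topic_dictionaries(feature_weights, topic_dictionaries)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical feature words with weights, as a topic model might emit.
feature_weights = {"football": 0.5, "league": 0.25, "movie": 0.125}

# Hypothetical preset topic dictionaries.
topic_dictionaries = {
    "sports": {"football", "league", "goal"},
    "film": {"movie", "actor"},
}

scores = score_topic_dictionaries(feature_weights, topic_dictionaries)
interest = pick_interest(feature_weights, topic_dictionaries)
```

With these toy inputs the "sports" dictionary accumulates the largest weight and is returned as the user's interest.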

Description

Technical field

[0001] A text-mining-based microblog user interest identification method. For text clustering, it uses an improved K-Means algorithm to cluster short texts; for the topic model, it combines the VSM and LDA models to extract text feature words. It belongs to the fields of text mining, natural language processing, and machine learning.

Background technique

[0002] Text feature extraction is a key step in text mining. The similarity between texts is calculated from the extracted features and applied to text classification, clustering, and similar tasks. The wide adoption of microblogs has led to text mining technology being widely applied to microblog texts. By analyzing microblog texts, current hot topics can be mined and events tracked.

[0003] The topic model works well for text feature extraction. It regards a text as a set of topics subject to a certain probability distribution, and each topic is composed of terms with a certain probabilit...
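As an illustration of the similarity calculation mentioned in [0002], the sketch below represents each text as a term-frequency vector in the vector space model (VSM) and compares two texts by cosine similarity. The source does not say which term weighting or similarity measure the method uses, so raw term counts and cosine similarity are assumptions, and the token lists are invented examples.

```python
import math
from collections import Counter


def to_vector(tokens):
    """Represent a segmented text as a sparse term-frequency vector."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already-segmented microblog texts.
text_1 = to_vector(["world", "cup", "final"])
text_2 = to_vector(["world", "cup", "ticket"])
text_3 = to_vector(["new", "phone", "release"])

sim_related = cosine_similarity(text_1, text_2)
sim_unrelated = cosine_similarity(text_1, text_3)
```

Texts that share terms score higher than texts that share none, which is the property a clustering algorithm such as K-Means relies on when grouping short texts.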


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 屈鸿, 王晓斌, 李浩, 方正, 袁建
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA