A microblog information priority collection method based on multiple strategies

A technology involving information prioritization and collection methods, applied in digital data information retrieval, special data processing applications, website content management, etc. It addresses the problems of wasted collection time, the large number of microblogs, and the inability to collect popular bloggers' posts in time.

Active Publication Date: 2019-04-23
BEIJING UNIV OF TECH
Cites: 3; Cited by: 4

AI Technical Summary

Problems solved by technology

[0005] Finally, there are a large number of potential zombie or marketing accounts among Weibo bloggers, and the Weibo posts are not only insufficient in informa...

Embodiment Construction

[0072] The specific implementation of the present invention will be further described in detail below in conjunction with the diagrams and examples. The following examples are used to illustrate the present invention, but are not intended to limit the scope of the present invention.

[0073] The method proposed by the present invention is carried out through the following steps, in order:

[0074] Step (1) Spam blogger detection

[0075] The blogger collector yields 782,632 bloggers to be collected; this set is denoted $U=\{u_1,u_2,\ldots,u_n\}$, where $n=782632$.
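Step (1) is meant to leave only non-spam bloggers in U. The excerpt breaks off before explaining how the post-level model of Step (1.1) is turned into a decision about a whole blogger, so the following is only a minimal sketch under an assumed rule: a blogger is dropped when the share of their posts flagged as spam exceeds a threshold. The function name `filter_spam_bloggers`, the threshold `max_spam_ratio`, and the keyword stand-in for the classifier are all hypothetical.

```python
from typing import Callable, Dict, List


def filter_spam_bloggers(
    posts_by_blogger: Dict[str, List[str]],
    is_spam_post: Callable[[str], bool],
    max_spam_ratio: float = 0.5,
) -> List[str]:
    """Return the bloggers kept after spam screening (hypothetical rule).

    A blogger is dropped when the fraction of their posts flagged by
    `is_spam_post` is greater than `max_spam_ratio`. Bloggers with no
    posts are kept, since there is no evidence against them.
    """
    kept = []
    for blogger, posts in posts_by_blogger.items():
        if not posts:
            kept.append(blogger)
            continue
        spam_count = sum(1 for p in posts if is_spam_post(p))
        if spam_count / len(posts) <= max_spam_ratio:
            kept.append(blogger)
    return kept


if __name__ == "__main__":
    # Toy stand-in for the post-level model of Step (1.1): a keyword check.
    def toy_is_spam(text: str) -> bool:
        return "click here" in text.lower()

    U = {
        "u1": ["Morning news roundup", "Match highlights tonight"],
        "u2": ["Click here for free coins", "CLICK HERE now", "hello"],
        "u3": [],
    }
    print(filter_spam_bloggers(U, toy_is_spam))  # → ['u1', 'u3']
```

Passing the post classifier in as a plain function keeps the screening rule independent of whichever detection model Step (1.1) actually produces.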

[0076] Step (1.1) Construct the spam microblog detection model

[0077] Step (1.1.1) Construct the training data set as follows:

[0078] A crawler collects a set of Weibo blog posts, which are then labeled manually: $G=[(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)]$, where $n$ is the total number of microblogs, $x_i$ is the $i$-th microblog, and $y_i = 0$ mea...
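Step (1.1) trains a spam microblog detection model on G, but the excerpt does not name the classifier and is cut off before defining which value of $y_i$ marks spam. As a minimal sketch only, the code below swaps in multinomial naive Bayes with add-one smoothing and treats labels as opaque class values; the class name, the whitespace tokenization, and the toy data are assumptions. Real Weibo text is Chinese and would need word segmentation before tokens are counted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class NaiveBayesPostClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Trained on pairs (x_i, y_i) shaped like the data set G: x_i is a post's
    text and y_i its manual label. Tokens come from lowercasing and splitting
    on whitespace, a placeholder for proper word segmentation.
    """

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[Hashable, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, G: Sequence[Tuple[str, Hashable]]) -> "NaiveBayesPostClassifier":
        for text, label in G:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_joint(self, tokens: List[str], label: Hashable) -> float:
        """log P(label) plus the smoothed log P(token | label) of every token."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for w in tokens:
            score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, text: str) -> Hashable:
        tokens = text.lower().split()
        return max(self.class_counts, key=lambda y: self._log_joint(tokens, y))


if __name__ == "__main__":
    # Toy labels are strings so as not to guess the source's 0/1 encoding.
    G = [
        ("win free coins click here", "spam"),
        ("click here to claim your prize", "spam"),
        ("train delayed on line 2 this morning", "normal"),
        ("new exhibition opens at the museum", "normal"),
    ]
    clf = NaiveBayesPostClassifier().fit(G)
    print(clf.predict("click here for coins"))  # → spam
```

Naive Bayes is a common baseline for short-text spam filtering because it trains in a single counting pass; any classifier that maps a post to a label could take its place.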

Abstract

The invention relates to a microblog information priority collection method based on multiple strategies. By building a multi-strategy priority collection scheme, it obtains bloggers' information in a timely and effective manner when collection capacity is limited. First, a classification model is constructed to screen bloggers: spam bloggers are removed, and the remaining bloggers are divided into three categories according to their number of microblogs and number of fans. Second, a different collection strategy is constructed for each category: the posting times of big V bloggers (verified accounts with large followings) are clustered to extract their optimal collection times, and a regression model trained on bloggers' microblog statistics predicts each blogger's activity value, by which the bloggers are ranked. Finally, the three collection strategies are integrated into a multi-strategy microblog priority collection method, whose timeliness is maintained by periodically updating the collection queue. Experiments show that hotspot microblog information can be obtained effectively and in time, and that the collection volume increases greatly.
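The abstract names two per-category techniques, clustering big V bloggers' posting times and a regression model for activity values, without fixing the algorithms. A minimal sketch, assuming 1-D k-means for the clustering and ordinary least squares for the regression (both swapped in, not confirmed by the source): posting times are encoded as minutes after midnight, the feature vectors stand for hypothetical per-blogger microblog statistics, and the function names `optimal_collection_times`, `fit_activity_model`, and `rank_by_activity` are illustrative.

```python
from typing import List, Sequence, Tuple


def _assign(xs: List[float], centers: List[float]) -> List[List[float]]:
    """Put every posting time into the group of its nearest center."""
    groups: List[List[float]] = [[] for _ in centers]
    for x in xs:
        nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        groups[nearest].append(x)
    return groups


def optimal_collection_times(
    post_minutes: Sequence[float], k: int = 3, max_iter: int = 50
) -> List[Tuple[float, int]]:
    """1-D k-means over posting times (minutes after midnight).

    Returns (center, cluster_size) pairs, largest cluster first. Wrap-around
    at midnight is ignored, so 23:50 and 00:10 end up far apart.
    """
    xs = sorted(float(m) for m in post_minutes)
    if not xs:
        raise ValueError("no posting times given")
    k = max(1, min(k, len(set(xs))))
    # Deterministic start: k evenly spaced points of the sorted data.
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        groups = _assign(xs, centers)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:  # centers stopped moving
            break
        centers = new
    groups = _assign(xs, centers)
    return sorted(zip(centers, map(len, groups)), key=lambda cs: -cs[1])


def fit_activity_model(
    features: Sequence[Sequence[float]], activity: Sequence[float]
) -> List[float]:
    """Least-squares weights [bias, w_1, ..., w_d] via the normal equations."""
    rows = [[1.0, *map(float, f)] for f in features]
    if not rows:
        raise ValueError("no training bloggers given")
    d = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(d)]
        + [sum(r[i] * float(y) for r, y in zip(rows, activity))]
        for i in range(d)
    ]
    for col in range(d):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear features?)")
        for r in range(d):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][d] / a[i][i] for i in range(d)]


def rank_by_activity(
    bloggers: Sequence[str], features: Sequence[Sequence[float]], w: Sequence[float]
) -> List[Tuple[str, float]]:
    """Score bloggers with the fitted model and sort by predicted activity."""
    scores = [w[0] + sum(wi * float(x) for wi, x in zip(w[1:], f)) for f in features]
    return sorted(zip(bloggers, scores), key=lambda bs: -bs[1])


if __name__ == "__main__":
    # 08:00, 08:05, 08:10 and 20:00, 20:05, 20:10 as minutes after midnight.
    print(optimal_collection_times([480, 485, 490, 1200, 1205, 1210], k=2))
    # → [(485.0, 3), (1205.0, 3)]
    w = fit_activity_model([[1], [2], [3]], [2, 4, 6])
    print(rank_by_activity(["a", "b", "c"], [[1], [5], [3]], w))
    # → [('b', 10.0), ('c', 6.0), ('a', 2.0)]
```

Ordering clusters by size is one plausible reading of "optimal collection time" (collect when most posts appear); a circular distance on the 24-hour clock would be needed if a blogger posts around midnight.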

Description

Technical field

[0001] The invention belongs to the field of text information processing and specifically relates to a method for preferentially collecting microblog information based on multiple strategies.

Background art

[0002] Weibo has become one of the most important information exchange platforms in China. Domestic and international news, celebrity activities, and interesting everyday events all become topics of discussion, so a large amount of related information is updated every day; such information is often referred to as hotspot information. By analyzing the hotspot information on Weibo, we can effectively identify the hotspots that netizens pay attention to. Web crawlers can effectively collect the page information of Weibo bloggers and thus obtain hotspot information in a timely manner. However, with the huge increase in the number of microblog bloggers and the limited collection capacity, it is difficult to collect microblog information in real ti...

Application Information

IPC(8): G06F16/958; G06F16/35; G06F16/9535
Inventors: 刘磊 (Liu Lei), 陈浩 (Chen Hao), 孙应红 (Sun Yinghong), 吴爽 (Wu Shuang), 侯良文 (Hou Liangwen), 李静 (Li Jing)
Owner: BEIJING UNIV OF TECH