A Multi-Strategy Based Microblog Information Priority Collection Method

A technology for priority-based information collection, applied in digital data information retrieval, website content management, unstructured text data retrieval, etc. It addresses three problems: the collection process wastes time, hotspot information from bloggers cannot be collected in time, and the collected Weibo posts carry insufficient information. It achieves the effect of increasing the number of collected posts.

Active Publication Date: 2021-04-27
BEIJING UNIV OF TECH
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] Finally, Weibo bloggers include a large number of potential zombie or marketing accounts. Their posts carry little information yet are very numerous, so collecting them wastes a great deal of time and prevents hotspot information from normal bloggers from being collected in time.



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0072] The specific implementation of the present invention will be further described in detail below in conjunction with the diagrams and examples. The following examples are used to illustrate the present invention, but are not intended to limit the scope of the present invention.

[0073] The method proposed by the present invention is carried out through the following steps in sequence:

[0074] Step (1) Spam blogger detection

[0075] The blogger collector obtains 782632 bloggers to be collected. The set of bloggers to be collected is denoted $U=\{u_1,u_2,\ldots,u_n\}$, where $n = 782632$.

[0076] Step (1.1) Construct the spam microblog detection model

[0077] Step (1.1.1) Construct the training data set as follows:

[0078] Use a crawler to crawl a set of Weibo blog posts and label them manually, giving $G=[(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)]$, where $n$ is the total number of microblogs, $x_i$ is the $i$-th microblog, and $y_i = 0$ mean...
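The excerpt is cut off before it names the classifier trained on $G$, so the following is only a sketch of one conventional choice, a multinomial naive Bayes model with Laplace smoothing, trained on labeled pairs $(x_i, y_i)$. The tokenizer (lowercase plus whitespace split) is a hypothetical stand-in: real Weibo text is Chinese and would need a word segmenter. Because the meaning of $y_i = 0$ is truncated in the source, the code treats labels as opaque values, and the toy data uses 1 for promotional posts purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split. Chinese Weibo text
    # would need a proper word segmenter instead.
    return text.lower().split()


class NaiveBayesMicroblogClassifier:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing.

    Trained on labeled pairs G = [(x_1, y_1), ..., (x_n, y_n)]. Labels are
    treated as opaque values; the model choice itself is an assumption.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for text, label in data:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of smoothed log P(token | label)
            score = math.log(count / n_docs)
            denom = self.total_words[label] + self.alpha * vocab_size
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data: which label value marks spam is cut off in the source,
# so 1 is used here for the promotional posts purely for illustration.
G = [
    ("follow me for free gifts click link", 1),
    ("win cash now click link", 1),
    ("the weather in beijing is nice today", 0),
    ("watching the football match with friends", 0),
]
clf = NaiveBayesMicroblogClassifier().fit(G)
print(clf.predict("click link for free cash"))  # → 1
```

Working in log space avoids underflow when a post has many tokens. Any other binary text classifier could fill the same slot in Step (1.1).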


Abstract

The invention relates to a multi-strategy-based method for the priority collection of microblog information. When collection capacity is limited, the method builds several collection strategies so that blogger information can be obtained promptly and effectively. First, a classification model screens the bloggers and eliminates spam bloggers; the remaining bloggers are divided into three categories according to their number of microblogs and number of fans. Second, a different collection strategy is constructed for each category. Clustering the posting times of big V bloggers yields their best collection times; a regression model trained on the bloggers' microblog statistics predicts each blogger's activity value, and bloggers are sorted by that value. Finally, the three category strategies are combined into a multi-strategy microblog priority collection method, and the collection queue is updated regularly to keep the strategy timely. Experiments show that the invention not only acquires hot microblog information promptly and effectively but also greatly increases the number of microblogs acquired.
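The abstract says the best collection times of big V bloggers are extracted by clustering their posting times, but does not name the clustering algorithm or the extraction rule. The sketch below assumes plain 1-D k-means (Lloyd's algorithm) over posting times expressed as hours of the day, and a hypothetical rule that ranks cluster centres by cluster size. It ignores the wrap-around at midnight (23:30 and 00:30 are treated as far apart), which a real implementation would need to handle.

```python
def _assign(data, centroids):
    # Put each value into the cluster of its nearest centroid.
    clusters = [[] for _ in centroids]
    for x in data:
        j = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
        clusters[j].append(x)
    return clusters


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on 1-D data with a deterministic, quantile-style init."""
    data = sorted(values)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(values)")
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = _assign(data, centroids)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, _assign(data, centroids)


def best_collection_hours(post_hours, k=3):
    """Hypothetical rule: candidate collection hours are the cluster centres,
    ordered from the largest cluster (most posts) to the smallest."""
    centroids, clusters = kmeans_1d(post_hours, k)
    ranked = sorted(zip(centroids, clusters), key=lambda p: len(p[1]), reverse=True)
    return [centre for centre, _ in ranked]


# Posting times of one hypothetical big V blogger, as hours of the day.
hours = [8.0, 8.5, 9.0, 12.0, 12.5, 20.0, 20.5, 21.0, 21.5]
print(best_collection_hours(hours))  # → [20.75, 8.5, 12.25]
```

The choice of `k` and any offset between a cluster centre and the actual crawl time are left open here, since the excerpt does not specify them.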

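For the other two categories, the abstract describes training a regression model on bloggers' microblog statistics, predicting an activity value, and sorting bloggers by it. A minimal sketch follows, assuming single-feature ordinary least squares. The category names, fan and post thresholds, and the feature used (for example, posts in the last week) are all hypothetical, because the excerpt names only the "big V" category and does not list the regression inputs.

```python
def categorize(fans, posts, big_v_fans=1_000_000, active_posts=1_000):
    # Hypothetical thresholds and category names: the source only says that
    # bloggers are split into three categories by fan count and post count.
    if fans >= big_v_fans:
        return "big_v"
    if posts >= active_posts:
        return "active"
    return "ordinary"


def fit_activity_model(history):
    """Ordinary least squares for activity = slope * feature + intercept.

    history: (feature, activity) pairs; the single feature (e.g. posts in the
    last week) is a hypothetical stand-in for the patent's statistics.
    Raises ZeroDivisionError if every feature value is identical.
    """
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def rank_by_activity(features, predict):
    """Sort blogger ids by predicted activity value, highest first."""
    return sorted(features, key=lambda bid: predict(features[bid]), reverse=True)


print(categorize(2_000_000, 50))  # → big_v
predict = fit_activity_model([(1, 2), (2, 4), (3, 6), (4, 8)])
print(predict(5))  # → 10.0
print(rank_by_activity({"a": 3, "b": 10, "c": 1}, predict))  # → ['b', 'a', 'c']
```

Since the abstract says the collection queue is updated regularly, a ranking like this would be recomputed on each refresh. How the three category strategies are merged into one queue is not detailed in this excerpt.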
Description

Technical field

[0001] The invention belongs to the field of text information processing, and specifically relates to a method for the priority collection of microblog information based on multiple strategies.

Background art

[0002] Weibo has become one of the most important platforms for information exchange in China. Domestic and international news, celebrity activities, and interesting everyday events all become topics of discussion, so a large amount of related information is updated every day; such information is often referred to as hotspot information. Analyzing the hotspot information on Weibo makes it possible to identify the topics netizens pay attention to. Web crawlers can effectively collect the page information of Weibo bloggers and thus obtain hotspot information in a timely manner. However, with the huge increase in the number of microblog bloggers and limited collection capacity, it is difficult to collect microblog information in real ti...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/958; G06F16/35; G06F16/9535
Inventors: 刘磊 (Liu Lei), 陈浩 (Chen Hao), 孙应红 (Sun Yinghong), 吴爽 (Wu Shuang), 侯良文 (Hou Liangwen), 李静 (Li Jing)
Owner BEIJING UNIV OF TECH