Microblog event abstract extracting method based on multiple storylines

An extraction method based on storyline technology, applied in file management systems, special data processing applications, instruments, etc. It addresses the inability of existing approaches to describe how an event develops and evolves, and reduces complexity.

Active Publication Date: 2016-07-20
DALIAN UNIV OF TECH
Cites: 3; Cited by: 10

AI Technical Summary

Problems solved by technology

However, a relatively complex event contains many different aspects, and a timeline mixes multiple aspects of…



Examples


Embodiment

[0066] To illustrate the working process of the method in detail, the procedure of the present invention is described below using a specific example.

[0067] Step 1. Microblog corpus preprocessing

[0068] The microblog corpus for the Qingdao explosion event contains 43,152 microblogs, each carrying its posting time. A publicly available tokenizer is used to segment the corpus, and punctuation marks are removed. Microblogs with fewer than 5 words after segmentation are discarded. The remaining microblogs are numbered and their time information is recorded. Each microblog's number, content, and posting time are stored in a dictionary database, so that the content and posting time of any microblog can later be retrieved quickly by its number.
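Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not name its "public tokenizer", so the hypothetical `tokenize` helper below simply strips punctuation and splits on whitespace (real Chinese microblogs would need a proper word segmenter), and the "dictionary database" is modeled as a plain Python dict keyed by microblog number.

```python
import re

# Hypothetical stand-in for the unnamed public tokenizer of Step 1.
# Whitespace splitting only works for space-delimited text; Chinese
# microblogs would need a real word segmenter instead.
def tokenize(text):
    # Replace every character that is neither a word character nor
    # whitespace (i.e. punctuation) with a space, then split on whitespace.
    return re.sub(r"[^\w\s]", " ", text).split()

def preprocess(raw_posts, min_words=5):
    """Filter and number microblogs.

    raw_posts: iterable of (content, publish_time) pairs.
    Returns a dict mapping microblog number -> record, a stand-in for
    the 'dictionary database' described in Step 1.
    """
    store = {}
    for content, publish_time in raw_posts:
        words = tokenize(content)
        if len(words) < min_words:  # drop microblogs with fewer than 5 words
            continue
        # Surviving microblogs are numbered consecutively from 0.
        store[len(store)] = {"content": content, "words": words, "time": publish_time}
    return store

store = preprocess([("fire near the port area today", "10:30"), ("so sad !", "10:31")])
print(store[0]["time"])  # → 10:30
```

Looking a microblog up by its number is then a constant-time dict access, which matches the quick retrieval the step describes.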

[0069] Step 2. Microblog vectorization

[0070] Word embedding technology is used to vectorize the words after word segment...
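The paragraph above is truncated before it says how word vectors are combined into a microblog vector. The sketch below assumes a simple average of the embeddings of in-vocabulary words, and pairs it with the cosine similarity that the abstract says is computed between microblog vectors. The toy 2-dimensional embedding table is hypothetical; in practice the vectors would come from a trained word-embedding model.

```python
import math

def microblog_vector(words, embeddings, dim):
    """Average the embeddings of in-vocabulary words (assumed aggregation)."""
    total = [0.0] * dim
    count = 0
    for w in words:
        vec = embeddings.get(w)
        if vec is None:
            continue  # skip out-of-vocabulary words
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    # A microblog with no known words stays the zero vector.
    return [t / count for t in total] if count else total

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy embeddings.
emb = {"fire": [1.0, 0.0], "blast": [0.8, 0.6], "rescue": [0.0, 1.0]}
v1 = microblog_vector(["fire", "blast"], emb, 2)      # [0.9, 0.3]
v2 = microblog_vector(["rescue", "unknown"], emb, 2)  # [0.0, 1.0]
print(round(cosine(v1, v2), 3))  # → 0.316
```

Averaging keeps every microblog vector in the same space as the word vectors regardless of post length, which is what makes a single cosine threshold usable across microblogs.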


Abstract

A microblog event abstract extraction method based on multiple storylines comprises the following steps: S1, microblog corpus preprocessing; S2, microblog vectorization; S3, preliminary extraction of microblog event storylines; S4, storyline merging; S5, storyline reconstruction; S6, display of the abstract results. Microblogs are vectorized with a word embedding technique, and the cosine values between microblog vectors are used in an improved conditional random field method based on microblog similarity matching, through which the storylines are constructed and merged. For a single microblog event, the method generates an abstract containing multiple storylines, in which each node is the most representative microblog of its time period. Because multiple storylines depict multiple aspects of the event, users can understand a microblog event more efficiently and more comprehensively. To evaluate the quality of the abstracts, precision at position N (P@N) is used as the evaluation metric. The precision generally stays above 0.6, and the method significantly outperforms an existing method.
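The P@N metric mentioned above can be computed as in the sketch below, assuming the standard definition: the fraction of the top N extracted microblogs that are judged relevant. The source does not say how relevance was judged; the `relevant` set here is a hypothetical stand-in for human annotations, and the microblog IDs are illustrative only.

```python
def precision_at_n(ranked_items, relevant, n):
    """P@N: share of the first n ranked items that appear in `relevant`.

    The denominator is always n, so a list shorter than n is penalized
    for its missing positions (the usual convention).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for item in ranked_items[:n] if item in relevant)
    return hits / n

# Hypothetical ranked abstract and relevance judgments.
ranked = ["m3", "m7", "m1", "m9"]
judged_relevant = {"m3", "m1"}
print(precision_at_n(ranked, judged_relevant, 4))  # → 0.5
```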

Description

Technical field

[0001] The invention relates to the fields of data mining and natural language processing, in particular to a method for extracting microblog event summaries based on multiple storylines.

Background

[0002] With the rapid development of the Internet, Weibo has become a typical application among popular social networks. Weibo allows users to publish short messages (usually at most 140 Chinese or English characters) at any time and any place, and it has become an almost real-time publishing platform. Some real-life events provoke extensive discussion among Weibo users and generate a large number of related microblogs; such events are called Weibo events. Microblog sites often collect the keywords of these microblogs and display them in the list of popular microblogs. However, these keywords alone cannot give users a comprehensive understanding of these microblog events, especially for those microblog use...


Application Information

IPC(8): G06F17/30; G06Q50/00
CPC: G06F16/93; G06Q50/01
Inventors: 林鸿飞 (Lin Hongfei), 刘龙飞 (Liu Longfei)
Owner: DALIAN UNIV OF TECH