
Microblog rumor detection method

A rumor detection method applied in natural language data processing and unstructured text data retrieval, among other fields. It addresses problems such as poor fused vector features, failure to distinguish salient features, and neglect of contextual word order, and it achieves the effect of improving accuracy.

Active Publication Date: 2020-11-20
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

At present, most deep-learning research on rumor detection that builds neural networks uses word2vec word vectors or ELMo as the pre-training model. word2vec cannot handle polysemous words, because each trained word corresponds to only one vector representation. ELMo can adjust word embeddings dynamically according to context, but it uses LSTM rather than Transformer for feature extraction, and it forms the current vector by concatenating the context vectors, so the fused vector features are poor.
The training model is mostly a CNN or an RNN. Although a CNN can extract semantic features, it ignores contextual word-order features, and when the pooled features are concatenated after the fully connected operation, it cannot distinguish the more salient features.
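The word-order limitation can be illustrated with a toy example. The sketch below is not taken from the patent: the token vectors and the width-1 kernel are hypothetical values chosen for illustration. It shows that global max pooling keeps the strongest filter response but not its position, so a sentence and its reversal give the same pooled feature.

```python
def conv1d(seq, kernel):
    """Slide a kernel of width len(kernel) over a list of token vectors."""
    k = len(kernel)
    return [
        sum(w * x for j in range(k) for w, x in zip(kernel[j], seq[i + j]))
        for i in range(len(seq) - k + 1)
    ]


def max_over_time(feature_map):
    """Global max pooling: keep the strongest response, drop its position."""
    return max(feature_map)


# Hypothetical 2-dimensional token vectors, for illustration only.
tokens = {"not": [1.0, 0.0], "a": [0.0, 0.5], "rumor": [0.0, 1.0]}
sentence = [tokens[t] for t in ["not", "a", "rumor"]]
reordered = [tokens[t] for t in ["rumor", "a", "not"]]

unigram_kernel = [[0.2, 0.9]]  # a single filter of width 1 (assumed weights)

map_original = conv1d(sentence, unigram_kernel)
map_reordered = conv1d(reordered, unigram_kernel)

# The feature maps differ in order, but the pooled values are identical.
pooled_original = max_over_time(map_original)
pooled_reordered = max_over_time(map_reordered)
```

Wider kernels capture some local order inside each window, but the pooling step still discards where in the text the strongest response occurred.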



Examples


Embodiment 1

[0070] To demonstrate the effectiveness of the present invention, we selected a collection of event data from the Weibo platform that was compiled by Ma et al. and used in their paper. The dataset consists of the original posts captured through the Weibo API together with all of their reposts and replies. General topic posts that were not reported as rumors were also captured, and a similar number of rumor events was collected. Detailed statistics are listed in the table below:

[0071]

[0072] We split all the data into a training set and a test set at a ratio of 4:1. The specific division is listed in the following table:

[0073]

[0074]

[0075] We evaluate the effectiveness of the model using accuracy, precision, recall, and the F1 score. The predicted results versus the actual results are listed in the following table:

[0076]
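As a minimal sketch of how the four evaluation indicators follow from a table of predicted versus actual results, the function below uses the standard definitions for a binary rumor/non-rumor classifier. The function name and the counts in the usage lines are hypothetical and are not the patent's experimental results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: rumors predicted as rumors      fp: non-rumors predicted as rumors
    fn: rumors predicted as non-rumors  tn: non-rumors predicted as non-rumors
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

Precision and recall here treat "rumor" as the positive class; swapping the roles of the two classes gives the corresponding values for the non-rumor class.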

[0077] We use four baseline methods for comparison: SVM-TS, CNN-1, CNN-2, and CNN-GRU. The deta...

Embodiment 2

[0081] To demonstrate the feasibility of our method, we also ran experiments on another Weibo dataset, CED_Dataset [23], and compared the accuracy obtained when sentence vectors from the same pre-trained model are used to train different training models. The dataset contains 1538 rumor events and 1849 non-rumor events. We split it into a training set and a test set at a ratio of 4:1. The experimental data are listed in the table below, and a MATLAB plot of the experimental results is shown in Figure 7:

[0082]
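The 4:1 division of this dataset can be sketched as follows. The text does not say whether the split is stratified by class or how it is randomized, so the per-class split, the fixed seed, and the function name `stratified_split` are assumptions made for illustration; the event identifiers are placeholders.

```python
import random


def stratified_split(events, labels, train_ratio=0.8, seed=0):
    """Split events into train/test sets at train_ratio, class by class."""
    rng = random.Random(seed)
    by_label = {}
    for event, label in zip(events, labels):
        by_label.setdefault(label, []).append(event)

    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)  # 4:1 means 80% for training
        train += [(event, label) for event in group[:cut]]
        test += [(event, label) for event in group[cut:]]

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder identifiers; the class sizes are the ones stated for the dataset.
events = [f"rumor_{i}" for i in range(1538)] + [f"non_rumor_{i}" for i in range(1849)]
labels = [1] * 1538 + [0] * 1849
train, test = stratified_split(events, labels)
```

With these class sizes, the sketch places 2709 events in the training set and 678 in the test set, and keeps the rumor to non-rumor proportion roughly equal in both.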

[0083] The experimental results show that sentence vectors obtained from the BERT pre-training model still yield different accuracy when trained on different training models, but the deviation is smaller than the deviation caused by using different pre-training models. The experiments show that the accuracy of SVM-TS is about 86.7%, follo...


Abstract

The invention provides a microblog rumor detection method that takes an attention mechanism into account. The method comprises the following steps: collecting microblog events and the corresponding comment data set as sample data; preprocessing the sample data and extracting the text content of the original microblogs and of the comments separately; pre-training the text with a BERT pre-training model and generating a fixed-length sentence vector for each sentence of text; constructing a dictionary and extracting each original microblog and several of its corresponding comments to form a microblog event vector matrix; training on the vector matrix with the deep learning method Text CNN-Attention and constructing a multi-level training model; and performing classification detection on the vector matrix according to the multi-level training model to obtain the rumor detection result corresponding to the social network data. Compared with traditional rumor detection methods, accuracy is improved.
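The data flow in the abstract can be sketched roughly as below. This is a toy forward pass with untrained random weights, not the patented model: the sentence vectors stand in for BERT outputs (8 dimensions here rather than a real BERT size), and the helper names, the kernel widths 2, 3 and 4, and the softmax form of the attention step are all assumptions, since the abstract does not specify them.

```python
import math
import random


def build_event_matrix(post_vec, comment_vecs, max_rows):
    """Stack the original post and its comments; truncate or zero-pad to max_rows."""
    rows = ([post_vec] + list(comment_vecs))[:max_rows]
    dim = len(post_vec)
    rows += [[0.0] * dim for _ in range(max_rows - len(rows))]
    return rows


def conv_relu_maxpool(matrix, kernel):
    """One TextCNN-style filter: slide over rows, apply ReLU, max-pool over positions."""
    k = len(kernel)
    responses = []
    for i in range(len(matrix) - k + 1):
        s = sum(w * x for j in range(k) for w, x in zip(kernel[j], matrix[i + j]))
        responses.append(max(0.0, s))
    return max(responses)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, score_weights):
    """Reweight pooled features with softmax attention instead of plain concatenation."""
    weights = softmax([w * f for w, f in zip(score_weights, features)])
    return [a * f for a, f in zip(weights, features)], weights


rng = random.Random(0)
dim, max_rows = 8, 6

# Stand-ins for the fixed-length sentence vectors of one event (assumed values).
post = [rng.uniform(-1, 1) for _ in range(dim)]
comments = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
event = build_event_matrix(post, comments, max_rows)

# One untrained filter per assumed kernel width.
kernels = [
    [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
    for width in (2, 3, 4)
]
features = [conv_relu_maxpool(event, kernel) for kernel in kernels]
attended, weights = attention_pool(features, score_weights=[1.0] * len(features))

# A logistic output turns the attended features into a rumor score.
rumor_probability = 1.0 / (1.0 + math.exp(-sum(attended)))
```

In a trained model the kernels, the attention scoring weights and the output layer would be learned from labeled events; the sketch only shows how an event matrix flows through convolution, pooling, attention weighting and classification.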

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a microblog rumor detection method.

Background art

[0002] Rumors generally refer to unverified statements, often related to an event. With the rapid development of social media, rumors can spread through social media at the speed of nuclear fission. The microblog, one form of social media, is a new type of open Internet social service of the Web 2.0 era. With the help of the Internet, mobile phones, and other media, users can update their microblogs with short texts anytime and anywhere and share information with more users. Compared with traditional blogs, Weibo has distinctive communication characteristics: instant sharing of posts, innovative ways of interacting, and vivid live presentation. In terms of communication effect, it offers accumulation of popularity and economical, fast brand marketing. However, in the diversifie...


Application Information

IPC(8): G06F16/33, G06F16/35, G06F40/211, G06K9/62, G06N3/04, G06Q50/00
CPC: G06F16/353, G06F16/3344, G06F40/211, G06Q50/01, G06N3/045, G06F18/214
Inventors: 宋玉蓉, 潘德宇
Owner: NANJING UNIV OF POSTS & TELECOMM