False information early detection method based on text structure algorithm

A method for the early detection of false information based on a text structure algorithm, applied in the field of false information detection. It addresses three problems in existing approaches: they ignore the high-level text structure of documents and do not model key contextual information effectively; they consume considerable time and human resources; and they increase the difficulty and workload of the detection task. The stated effects are safety and reliability, being difficult to detect and counter, and easier data collection.

Active Publication Date: 2021-04-23
TAIYUAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0002] Existing false information detection algorithms rely mainly on the text content of the information and on other external information. Algorithms based on external information study the derived features generated as information spreads through social networks, typically reposts and comments, user profiles, propagation time, and article sources. Although these algorithms have achieved some results, they have shortcomings. Because they depend on external information, collecting the data takes considerable time and human resources, and some of the data (such as user profiles) is sensitive and raises privacy protection issues. In addition, the collected unstructured data is heterogeneous, contains substantial noise and missing values, and needs further cleaning and preprocessing, which increases the difficulty and workload of the detection task. As a result, these algorithms cannot judge directly whether information is true or false without external auxiliary information, and they face difficult data collection, missing data, and noise. Detection is therefore inefficient and slow, which hinders timely containment and loss limitation in the early stage of a false information outbreak.
[0003] Text is the main carrier of information, so detecting false information automatically from text content alone is the most direct and convenient approach. Most current research on content-based detection focuses on differences in linguistic features between false and real information and performs detection with machine learning or deep learning algorithms. Machine-learning-based algorithms first construct features such as n-grams, punctuation marks, psycholinguistic words, and sentiment polarity through feature engineering, then feed the extracted feature set into models such as the support vector machine (SVM: Support Vector Machine) and logistic regression (LR: Logistic Regression). These algorithms require discrete linguistic features to be constructed manually, which is cumbersome and time-consuming; the optimal feature combination is hard to find, detection efficiency is low, and the algorithms cannot adapt as false information evolves.
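The manual feature-engineering step described in [0003] can be illustrated with a minimal sketch. It is not taken from the patent: the function name `extract_features`, the whitespace tokenization, and the feature-key format are assumptions made for illustration. In a real pipeline the resulting counts would be vectorized and passed to a classifier such as an SVM or logistic regression.

```python
from collections import Counter
import string


def extract_features(text: str, n: int = 2) -> Counter:
    """Count word n-grams and punctuation marks in `text` (illustrative only)."""
    tokens = text.lower().split()
    # Strip punctuation from token edges so "news!" and "news" share an n-gram.
    words = [t.strip(string.punctuation) for t in tokens]
    words = [w for w in words if w]

    features: Counter = Counter()
    # Word n-gram counts.
    for i in range(len(words) - n + 1):
        features["ngram=" + " ".join(words[i:i + n])] += 1
    # Punctuation counts, one feature per distinct mark.
    for ch in text:
        if ch in string.punctuation:
            features["punct=" + ch] += 1
    return features
```

Every feature family (psycholinguistic word lists, sentiment polarity, and so on) would need its own hand-written extractor of this kind, which is the burden the paragraph above describes.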
[0004] Deep learning is considered better than traditional machine learning at uncovering latent information in text and has shown clear advantages in exploiting semantic knowledge. Deep-learning-based detection algorithms apply models such as the recurrent neural network (RNN: Recurrent Neural Network) and the convolutional neural network (CNN: Convolutional Neural Network), which encode contextual information and long-distance dependencies between words and automatically learn deep semantic representations of the text. However, existing deep-learning-based algorithms concentrate on simple representation learning of documents or sentences and neglect effective modeling of the document's high-level text structure and key contextual information. Moreover, because they take words or sentences as the calculation objects, they suffer from noise and from inadequate representation of overly long sentences.

Examples

Embodiment Construction

[0067] As shown in Figure 1 and Figure 2, the purpose of the present invention is to detect whether information is true or false, so the task can be framed as a binary classification problem: deciding whether the document to be detected is false information. The embodiments of the present invention are as follows:

[0068] Establish calculation module 1, which obtains the discourse units of the document. This module segments the document to be detected to obtain its elementary discourse units.

[0069] The present invention takes the discourse unit as the calculation object, avoiding the problems that arise in model learning when words or sentences are the objects, so the design first obtains the elementary discourse units. The elementary discourse unit (EDU: Elementary Discourse Unit) is the basic language unit of a document; it is generally a clause and can be as short as a phrase....
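As a rough illustration of segmenting a document into clause-like units, the sketch below splits on punctuation. This is only a heuristic stand-in: the text above does not say how module 1 performs segmentation, and practical EDU segmentation normally relies on a trained discourse segmenter. The function name `split_into_units` and the boundary pattern are assumptions.

```python
import re
from typing import List

# Heuristic clause boundaries: runs of sentence-final marks, commas,
# semicolons, or colons. An assumption, not the patent's segmentation rule.
_BOUNDARY = re.compile(r"[.!?;:,]+")


def split_into_units(document: str) -> List[str]:
    """Split a document into clause-like units as a crude approximation of EDUs."""
    pieces = _BOUNDARY.split(document)
    # Drop empty pieces left by trailing punctuation and trim whitespace.
    return [p.strip() for p in pieces if p.strip()]
```

For example, `split_into_units("Although the report went viral, experts doubted it.")` yields two units, one per clause, whereas a sentence-level splitter would return a single item.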

Abstract

The invention discloses a method for the early detection of false information based on a text structure algorithm, and belongs to the technical field of false information detection based on text structure algorithms. The technical problem to be solved is to provide an improved early detection method of this kind. The technical scheme comprises obtaining the discourse units of a document, discourse unit representation learning, structure representation learning, and context representation learning. In structure representation learning, a discourse structure graph of the document is constructed on the basis of Rhetorical Structure Theory, and a global structure representation of each discourse unit is obtained through a multi-relational graph neural network. In context representation learning, the positional adjacency of discourse units in the document serves as the calculation object, and a local context representation of each discourse unit is obtained. All discourse units of the document are then fused into a document representation by a gated recurrent unit combined with a global attention mechanism; the document representation is used for false information detection and yields the probability that the input document is false information. The method is applied to false information detection.
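The data flow in the abstract (relation-aware aggregation over a discourse graph, attention-based fusion, then a probability) can be sketched in plain Python. This is a heavily simplified illustration under stated assumptions, not the patented model: a scalar weight per relation stands in for the learned parameters of a multi-relational graph neural network, the gated recurrent unit is omitted entirely, and all function names, parameters, and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def relational_mean(vectors: List[Vector],
                    edges: List[Tuple[int, int, str]],
                    relation_weight: Dict[str, float]) -> List[Vector]:
    """One simplified message-passing step over (source, target, relation) edges.

    Each unit adds the relation-weighted mean of its incoming neighbours
    to its own vector. Scalar relation weights are an assumption.
    """
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in vectors]
    counts = [0] * len(vectors)
    for src, dst, rel in edges:
        w = relation_weight[rel]
        for k in range(dim):
            sums[dst][k] += w * vectors[src][k]
        counts[dst] += 1
    out = []
    for i, v in enumerate(vectors):
        c = max(counts[i], 1)  # avoid division by zero for isolated units
        out.append([v[k] + sums[i][k] / c for k in range(dim)])
    return out


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Fuse unit vectors into one document vector with softmax attention."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    dim = len(vectors[0])
    return [sum(e / total * v[k] for e, v in zip(exps, vectors))
            for k in range(dim)]


def false_probability(doc_vec: Vector, weights: Vector, bias: float) -> float:
    """Logistic output: probability that the document is false information."""
    z = sum(w * x for w, x in zip(weights, doc_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In a trained model the unit vectors, relation parameters, attention query, and output weights would all be learned from labelled documents; here they are plain inputs so that the three stages can be followed by hand.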

Description

Technical field

[0001] The invention discloses a method for the early detection of false information based on a text structure algorithm, and belongs to the technical field of false information detection based on text structure algorithms.

Background technique

[0002] Existing false information detection algorithms rely mainly on the text content of the information and on other external information. Algorithms based on external information study the derived features generated as information spreads through social networks, typically reposts and comments, user profiles, propagation time, and article sources. Although these algorithms have achieved some results, they have shortcomings. Because they depend on external information, collecting the data takes considerable time and human resources, and some of the data (such as user profiles) is sensitive, which involves privacy pr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/216, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/211, G06F40/216, G06F40/30, G06N3/049, G06N3/08, G06N3/047, G06N3/045
Inventor: 王莉, 王宇航, 杨延杰
Owner: TAIYUAN UNIV OF TECH