A false news identification method based on image-text multi-scale features and background knowledge

By extracting and fusing multi-scale features from news text, images, and background knowledge, the problem of insufficient utilization of multimodal information in existing technologies is solved, achieving high accuracy and robustness in fake news detection.

CN119888764B · Active · Publication Date: 2026-06-26 · SICHUAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SICHUAN UNIV
Filing Date: 2023-10-24
Publication Date: 2026-06-26

Application Information

Patent Timeline

24 Oct 2023: Application
26 Jun 2026: Publication

CN119888764B

IPC: G06V30/262; G06V30/186; G06V30/19; G06V30/41

CPC: G06V30/262; G06V30/189; G06V30/1918; G06V30/41; G06V30/19147; Y02D10/00

AI Tagging

Technology Topics

Pattern recognition; Artificial intelligence

Technical Efficacy Phrases

Improve accuracy; Improve robustness

AI Technical Summary

Technical Problem

Existing fake news detection technologies lack effective mining of multimodal information and full utilization of rich background information, resulting in insufficient detection accuracy and robustness.

Method used

The patent adopts a method based on multi-scale features of text and images and on background knowledge. It extracts, compares, and fuses news text features, image features, and external knowledge features, using SpaCy, SKEP, and BERT models together with a multilayer perceptron (MLP) to identify the authenticity of news and to make effective use of cross-modal multi-features.

Benefits of technology

It significantly improves the accuracy and robustness of fake news detection, enabling efficient identification of fake news.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

The application provides a false news identification method based on text and image multi-scale features and background knowledge. The method extracts news content from three aspects: news text features, news image features, and news background knowledge features. The extracted text features and image features are compared through a comparison network to obtain a comparison result of the multi-scale text and image features, and the background knowledge is compared with the news text to obtain a comparison result of the news and the background knowledge. Finally, the two comparison results and the news text features are fused across modalities, and the fusion result is sent to an authenticity identifier, which identifies whether the news is true or false. The method can effectively improve the accuracy and robustness of false news detection.

Description

Technical Field

[0001] This invention relates to the field of online fake news identification technology, specifically to a method for identifying fake news based on multi-scale features of images and text and background knowledge.

Background Technology

[0002] Current fake news detection technologies can be divided into two categories: unimodal and multimodal technologies. Unimodal technologies typically only input extracted text or image information into the authenticity detector, ignoring the modal correlation between text and images. Existing multimodal technologies usually directly concatenate encoded text and images, then input the resulting multimodal information into the authenticity detector to verify the news's authenticity. However, these methods not only lack effective mining of multimodal information but also fail to fully utilize rich background information. Fake news itself carries rich multimodal information, and there are strong correlations between these modalities. By organically combining multimodal information with relevant background knowledge, the accuracy and robustness of fake news detection can be significantly improved.

Summary of the Invention

[0003] This invention proposes a method for identifying fake news based on multi-scale features of text and images and background knowledge. This method fully and effectively extracts news content from three aspects: news text features, news image features, and news background knowledge features. The extracted text features are compared with image features to obtain a text-image comparison result, and the background knowledge is compared with the news text to obtain a comparison result between the news and external knowledge. Finally, the two sets of comparison results, news text features, and image features are fused, and the fused result is fed into a fake news detector to achieve the identification of the news's authenticity. This method can significantly improve the accuracy and robustness of fake news detection. The method includes six processes: news text feature extraction, news image feature extraction, external knowledge extraction, cross-modal multi-feature comparison, cross-modal multi-feature fusion, and fake news identification.

[0004] The news text feature extraction process first uses named entity recognition technology based on the SpaCy framework to identify important information such as people, targets, events, time, and location from the news text. Then, a pre-trained text sentiment recognition model based on the SKEP framework is used to identify the sentiment of the news text. Next, a text topic dictionary is used to classify the text topics. Finally, text style analysis technology is used to analyze the writing style of the news text from the dimensions of parts of speech, sentence length, and text symbols.
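The writing-style analysis in this step can be illustrated with a minimal sketch. It covers only the sentence-length and text-symbol dimensions; part-of-speech counts would need a tagger (for example the one in SpaCy) and are left out so the example runs on the standard library alone. The function name `style_statistics`, the sentence-splitting rule, and the choice of symbols counted are assumptions made for illustration, not details taken from the patent.

```python
import re
from collections import Counter


def style_statistics(text: str) -> dict:
    """Return simple writing-style statistics for a news text."""
    # Crude sentence split on '.', '!' and '?' (an assumed rule for illustration).
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [len(s.split()) for s in sentences]
    # Count a few text symbols; which symbols matter is an assumption.
    symbols = Counter(ch for ch in text if ch in "!?")
    mean_length = sum(words_per_sentence) / len(sentences) if sentences else 0.0
    return {
        "sentence_count": len(sentences),
        "mean_sentence_length": mean_length,
        "exclamation_marks": symbols["!"],
        "question_marks": symbols["?"],
    }


stats = style_statistics("Shocking news! Is this real? Officials said nothing today.")
print(stats)
```

The resulting dictionary can be treated as one style feature vector per news item, alongside the entity, sentiment, and topic features.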

[0005] The news image feature extraction process employs a method of coarse-fine granular annotation on the dataset. Fine-grained annotation is used for human emotions, while coarse-grained annotation is used for human figures, targets, time, and space. This enables the extraction of multiple features from the image, including human figures, targets, emotions, time, and space.

[0006] The process of extracting background knowledge from the news first adopts a dictionary-based external knowledge construction method to build an external knowledge base based on Wikipedia entries. Then, important background knowledge related to the news is retrieved from the external knowledge base, including knowledge about people, targets, events, and common sense.
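A dictionary-based knowledge base with key-value retrieval might look like the sketch below. The two entries, the lower-casing of keys, and the helper name `lookup_background` are hypothetical; the patent only states that the base is built from Wikipedia entries and queried by key.

```python
from typing import Dict, List

# Hypothetical entries standing in for summaries drawn from Wikipedia.
knowledge_base: Dict[str, str] = {
    "eiffel tower": "Wrought-iron lattice tower in Paris, France.",
    "paris": "Capital city of France.",
}


def lookup_background(entities: List[str], kb: Dict[str, str]) -> Dict[str, str]:
    """Return background knowledge for each entity present in the knowledge base."""
    found = {}
    for entity in entities:
        key = entity.strip().lower()  # normalise the key before lookup (assumed convention)
        if key in kb:
            found[entity] = kb[key]
    return found


background = lookup_background(["Eiffel Tower", "Atlantis"], knowledge_base)
print(background)
```

Entities missing from the base are simply skipped here; how the patented method handles such misses is not stated in the text.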

[0007] The cross-modal multi-feature comparison process first uses a pre-trained BERT model to encode the news text, extracted multi-scale text features, multi-scale image features, and background knowledge. Then, the encoded multi-scale text features and multi-scale image features are fed into a comparison network for semantic feature comparison, and the encoded news text and background knowledge are fed into the comparison network for semantic comparison of text content and background knowledge. This yields the comparison results of the text and image multi-scale features, as well as the comparison results of the text and background knowledge.

[0008] The cross-modal multi-feature fusion process uses tensor concatenation to concatenate the comparison results of text and image multi-scale features, the comparison results of text and background knowledge, and the encoded text tensor. The concatenated fusion result is then sent to the authenticity detector.
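As a rough illustration of fusion by concatenation, the sketch below joins three flat feature vectors into one. Plain Python lists stand in for tensors, and the function name `fuse` and the toy values are assumptions.

```python
from typing import List, Sequence


def fuse(text_image_cmp: Sequence[float],
         text_knowledge_cmp: Sequence[float],
         text_encoding: Sequence[float]) -> List[float]:
    """Concatenate the two comparison results and the encoded text into one vector."""
    return [*text_image_cmp, *text_knowledge_cmp, *text_encoding]


fused = fuse([0.1, 0.2], [0.3], [0.4, 0.5, 0.6])
print(len(fused))
```

The fused vector's length is the sum of the three input lengths, which fixes the input width of the detector that consumes it.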

[0009] The authenticity identification process constructs a news authenticity detector based on a multilayer perceptron (MLP). This detector includes an input layer, a hidden layer, an output layer, and a sigmoid function. Then, the cross-modal multi-feature fusion result is nonlinearly processed, and the sigmoid function outputs a normalized news authenticity confidence score. The authenticity of the news is then determined based on the authenticity confidence score.

[0010] The technical problem addressed by this invention is to provide an efficient and robust method for identifying fake news based on multi-scale features of images and text and background knowledge. The method fully extracts multi-scale features of images and text, along with relevant background knowledge, to provide rich cross-modal multi-scale feature information for the authenticity identification model, thereby improving the accuracy and robustness of fake news identification and achieving a breakthrough in online fake news identification technology.

Attached Figure Description

[0011] Figure 1 is a flowchart of the fake news identification method based on multi-scale features of images and text and background knowledge of the present invention.

[0012] Figure 2 is a flowchart of the comparison network in this invention.

[0013] Figure 3 is a flowchart of the authenticity detector based on a multilayer perceptron (MLP) in this invention.

Detailed Implementation

[0014] The invention will now be further described with reference to the accompanying drawings.

[0015] A method for identifying fake news based on multi-scale features of text and images and background knowledge includes six processes: news text feature extraction, news image feature extraction, external knowledge extraction, cross-modal multi-feature comparison, cross-modal multi-feature fusion, and authenticity verification. The overall process is shown in Figure 1. In the multi-scale text feature extraction section, the extraction of people, targets, events, time, and location is first cast as a named entity recognition task. Named entity recognition technology based on the SpaCy framework is then used to identify people, targets, events, time, and location in the news text. Next, a pre-trained text sentiment recognition model based on the SKEP framework is used to identify the sentiment of the news text. A topic dictionary based on text entity information is then constructed, and the text topic is determined by querying the topic classification of the people, targets, and events in that dictionary. Finally, multi-dimensional statistics are computed over the parts of speech, sentence lengths, and text symbols of the text, and these statistics are taken as the writing style of the news text. In the multi-scale image feature extraction section, a multi-target recognition framework based on the YOLO model is used to extract multi-scale information from the image. To achieve accurate emotion recognition, the training dataset is annotated with a coarse-fine granular annotation method: fine-grained annotation is used for human facial expressions, and coarse-grained annotation is used for people, targets, and spatiotemporal information. In the background knowledge query process, a background knowledge base dictionary is first constructed from Wikipedia, and a dictionary key-value query is then used to retrieve background knowledge related to the people, targets, and events. After the text features, image features, and background knowledge have been extracted, a pre-trained BERT model is used to encode the text, the background knowledge, and all features. The text encoding results are then convolved to obtain a text feature tensor. Finally, the encoded external background knowledge tensor, the encoded text tensor, the image feature tensor, and the text feature tensor are fed into a comparison network.
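The topic-dictionary query in paragraph [0015] could be sketched as follows. The patent does not say how the topics of several entities are combined into one text topic, so the majority vote used here is an assumption, as are the dictionary entries and the name `classify_topic`.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical topic dictionary mapping entity names to topic labels.
topic_dictionary: Dict[str, str] = {
    "world cup": "sports",
    "fifa": "sports",
    "central bank": "finance",
}


def classify_topic(entities: List[str], topics: Dict[str, str]) -> Optional[str]:
    """Pick the most frequent topic among the recognised entities (assumed rule)."""
    votes = Counter(topics[e.lower()] for e in entities if e.lower() in topics)
    if not votes:
        return None  # no entity matched the dictionary
    return votes.most_common(1)[0][0]


topic = classify_topic(["FIFA", "World Cup", "Central Bank"], topic_dictionary)
print(topic)
```

The entity list would come from the named entity recognition step; returning `None` when nothing matches is one possible convention, chosen here for simplicity.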

[0016] The comparison network is shown in Figure 2. It has two feature input interfaces: Feature Input Interface 1 and Feature Input Interface 2. When performing multi-scale feature comparison between text and images, Feature Input Interface 1 receives the encoded text feature tensor, and Feature Input Interface 2 receives the encoded image feature tensor. When comparing text with external background knowledge, Feature Input Interface 1 receives the encoded text tensor, and Feature Input Interface 2 receives the encoded external background knowledge tensor. After receiving the two input features, the comparison network performs Hadamard product and point cross-difference operations on the input features, as shown below:

[0017] $D = X_1 \ominus X_2$ (1)

[0018] $H = X_1 \odot X_2$ (2)

[0019] where $X_1$ is the tensor of input interface 1, $X_2$ is the tensor of input interface 2, $\ominus$ denotes the point difference operation, and $\odot$ denotes the Hadamard product operation. Tensor concatenation is then used to join the two results, and a fully connected layer is applied to the concatenation. This process is represented as follows:

[0020] $C = \mathrm{FC}(\mathrm{Concat}(D, H))$ (3)

[0021] where $C$ denotes the comparison network output, $\mathrm{FC}$ denotes the fully connected operation, and $\mathrm{Concat}$ denotes the tensor concatenation operation. After the comparison feature results are obtained, the tensor concatenation operation is used again to concatenate the comparison result of the multi-scale features of the image and text, the comparison result of the text and background knowledge, and the text features, yielding the cross-modal multi-scale features. The concatenation process is represented as follows:

[0022] $F = \mathrm{Concat}(C_{ti}, C_{tk}, T)$ (4)

[0023] where $C_{ti}$ represents the comparison result of the multi-scale features of text and images, $C_{tk}$ represents the comparison result between the text and background knowledge, and $T$ represents the text feature tensor. The result $F$ is then fed into a discriminator for authenticity verification.
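The comparison network described above (a difference branch and a Hadamard-product branch, concatenated and passed through a fully connected layer) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the point difference is taken to be a signed element-wise subtraction, the two inputs are flat vectors of equal length, and the weights are hand-picked toy values rather than trained parameters. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def hadamard(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def difference(a: Vector, b: Vector) -> Vector:
    """Element-wise difference; signed subtraction is an assumption."""
    return [x - y for x, y in zip(a, b)]


def fully_connected(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One dense layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def compare(a: Vector, b: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Concatenate the difference and Hadamard branches, then apply the dense layer."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same length")
    joined = difference(a, b) + hadamard(a, b)  # list concatenation
    return fully_connected(joined, weights, bias)


# Toy inputs and weights chosen so the result is easy to check by hand.
a, b = [1.0, 2.0], [3.0, 0.5]
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.5]
out = compare(a, b, weights, bias)
print(out)  # [-2.0, 1.5]
```

The same `compare` function serves both uses in the text: text features against image features, and encoded text against encoded background knowledge. Whether the two uses share weights is not stated in the patent.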

[0024] The discriminator is shown in Figure 3. It consists of an input layer, hidden layers, an output layer, and a sigmoid activation function. The input layer passes the input features to the hidden layers via a fully connected layer. Between hidden layers, ReLU activation layers and dropout layers are added: the ReLU layers give the discriminator the ability to learn non-linear features from the cross-modal, multi-scale features, and the dropout layers help the model avoid overfitting. The output layer feeds the features learned by the hidden layers into the sigmoid function, which outputs the confidence score for news identification. The authenticity of the news is then determined from this score: news with a confidence score less than 0.5 is identified as fake news, and news with a confidence score greater than 0.5 is identified as real news.
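A forward pass through such a discriminator can be sketched as below, using a single hidden layer and toy weights. Dropout is omitted because it is only active during training. The layer sizes, weights, and function names are assumptions, and so is the handling of a score of exactly 0.5, which the text leaves unspecified (this sketch labels it fake).

```python
import math
from typing import List

Vector = List[float]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: weighted sums of the inputs plus biases."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    """ReLU activation applied element-wise."""
    return [max(0.0, x) for x in v]


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a confidence score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect(features: Vector, w_hidden: List[Vector], b_hidden: Vector,
           w_out: Vector, b_out: float) -> float:
    """Input -> hidden (ReLU) -> output -> sigmoid confidence score."""
    hidden = relu(linear(features, w_hidden, b_hidden))
    logit = linear(hidden, [w_out], [b_out])[0]
    return sigmoid(logit)


def label(score: float) -> str:
    """Scores above 0.5 are real; exactly 0.5 is treated as fake (assumption)."""
    return "real" if score > 0.5 else "fake"


# Toy weights: identity hidden layer, output weights of 2.0.
score = detect([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 2.0], 0.0)
print(round(score, 4), label(score))
```

With these toy values the hidden activations are `[1.0, 0.0]` and the logit is 2.0, so the score lands above the 0.5 threshold.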

Claims

1. A method for identifying fake news based on multi-scale features of images and text and background knowledge, characterized in that the method includes the following steps:
Step 1: Extract multi-scale text features from news texts through named entity recognition, topic dictionary construction and query, text writing style statistics, and text sentiment recognition. The multi-scale text features include people, targets, events, time, location, sentiment, and text style.
Step 2: Train the multi-object recognition model using the image training dataset with coarse and fine granular annotations, and use the trained multi-object recognition model to extract multi-scale image features from news images. The multi-scale image features include people, objects, time, location, and people's emotions.
Step 3: Construct an external background knowledge base dictionary from Wikipedia data, and obtain background knowledge related to the news content through dictionary key-value lookup.
Step 4: Use the pre-trained BERT model to encode the news text, the extracted multi-scale text features, the multi-scale image features, and the background knowledge. Then feed the encoded multi-scale text features and multi-scale image features into the comparison network for feature semantic comparison, and feed the encoded news text and background knowledge into the comparison network for semantic comparison of text content and background knowledge, thereby obtaining the comparison results of text and image multi-scale features and the comparison results of text and background knowledge.
Step 5: Use multi-feature tensor concatenation to concatenate the text feature tensor, the comparison results of text and image multi-scale features, and the comparison results of text and background knowledge to obtain cross-modal multi-scale fusion features; the text feature tensor is obtained by encoding the news text using a pre-trained BERT model and performing convolution operations on the text encoding results.
Step 6: Construct an MLP-based authenticity discriminator and feed the multi-scale fused features into the authenticity discriminator for authenticity identification.

2. The method for identifying fake news based on multi-scale features of images and text and background knowledge according to claim 1, characterized in that the dataset in step 2 is labeled using a mixed coarse-grained and fine-grained method, and the emotions of the people are labeled using fine-grained methods.

3. The method for identifying fake news based on multi-scale features of images and text and background knowledge according to claim 1, characterized in that the background knowledge base construction in step 3 adopts a dictionary-based key-value construction method to ensure the efficiency of background knowledge retrieval.

4. The method for identifying fake news based on multi-scale features of images and text and background knowledge according to claim 1, characterized in that the comparison network in step 4 includes parallel Hadamard product operations and point cross-difference operations, and the results of the two operations are concatenated and sent to the fully connected layer.

Citation Information

Patent Citations

Knowledge-fused multi-modal false news identification method and device
CN113946683A
False news detection method based on topic and structure perception neural network
CN115269854A
