Image-text content duplication judgment method and device

An image-text content technology, applied in the field of information processing, that addresses the problems of low similarity-judgment accuracy, the difficult trade-off between recall and accuracy, and low recall rate, so as to improve the overall efficiency and effect of similarity judgment, the keyword extraction effect, and the recall rate.

Active Publication Date: 2020-04-03
XIAMEN MEET YOU INFORMATION TECH

AI Technical Summary

Problems solved by technology

Current methods for judging duplication of image-text content mainly include the cosine similarity algorithm and the text simhash algorithm. The cosine similarity algorithm requires a large amount of comparison computation, is inefficient, and makes the threshold hard to choose. The text simhash algorithm demands a high degree of similarity between texts and therefore has a low recall rate. In addition, common comparison models compare features in a single dimension, which forces a trade-off between recall and accuracy, so the accuracy of similarity judgment is not high.
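For readers unfamiliar with the two baseline algorithms named above, the following is a minimal sketch of each, not the patent's own implementation. It assumes the text has already been tokenized; the function names, the use of MD5 as the token hash, and the 64-bit fingerprint width are illustrative choices.

```python
import hashlib
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simhash(tokens, bits=64):
    """Simhash fingerprint: each bit is the sign of a weighted vote over token hashes."""
    votes = [0] * bits
    for token, weight in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming_distance(x, y):
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")
```

Cosine similarity needs a full vector comparison for every pair of documents, which is where the computation cost comes from. Simhash reduces each document to one fingerprint, but two texts are usually treated as duplicates only when the Hamming distance is very small, which is why near-duplicates with moderate edits are easily missed.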



Examples


Embodiment 1

[0057] Embodiment 1 of the present invention provides a method for judging duplication of image-text content. Figure 1 is an implementation flowchart of the method provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0058] S1: Establish a word weight model.

[0059] S2: Generate an image-text comparison data set: collect sample image-text content, use the word weight model to extract a first preset number of article keywords from each sample image-text content, calculate image comparison values for a second preset number of images in each sample image-text content, and construct an image-text comparison data set containing the article keywords and image comparison values of each sample image-text content.

[0060] In this embodiment, the first preset number may be 18, and the second preset number may be 3, which can be adaptively changed acco...
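Step S2 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the patent's implementation: the word weight model is represented as a plain dictionary from word to weight, the "image comparison value" is assumed to be an MD5 digest of the image bytes, and the images used are assumed to be the first ones in the article. The visible text specifies none of these details. `ComparisonRecord`, `top_keywords`, `image_value`, and `build_record` are hypothetical names; the defaults 18 and 3 are the example values given in [0060].

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonRecord:
    """One entry of the image-text comparison data set."""
    content_id: str
    keywords: frozenset
    image_values: tuple


def top_keywords(tokens, word_weights, limit=18):
    """Rank distinct tokens by model weight and keep the `limit` heaviest ones."""
    scored = {t: word_weights.get(t, 0.0) for t in set(tokens)}
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scored, key=lambda t: (-scored[t], t))
    return frozenset(ranked[:limit])


def image_value(image_bytes):
    # Assumption: an exact-match digest stands in for the image comparison value.
    return hashlib.md5(image_bytes).hexdigest()


def build_record(content_id, tokens, images, word_weights,
                 keyword_limit=18, image_limit=3):
    """Build one data set entry from tokenized text and raw image bytes."""
    return ComparisonRecord(
        content_id=content_id,
        keywords=top_keywords(tokens, word_weights, keyword_limit),
        # Assumption: the first `image_limit` images of the article are used.
        image_values=tuple(image_value(b) for b in images[:image_limit]),
    )
```

A data set is then simply a collection of such records, one per sample article. A perceptual hash could replace the MD5 digest if resized or re-encoded images should still match.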

Embodiment 2

[0111] This embodiment provides a device for judging duplication of image-text content, used to implement the method described in Embodiment 1. Figure 6 is a structural block diagram of the device in this embodiment, which includes:

[0112] Word weight model building module 10: used to establish the word weight model;

[0113] Image-text comparison data set generation module 20: used to collect sample image-text content, use the word weight model to extract a first preset number of article keywords from each sample image-text content, calculate image comparison values for a second preset number of images in each sample image-text content, and construct an image-text comparison data set containing the article keywords and image comparison values of each sample image-text content;

[0114] Image-text similarity comparison module 30: used to obtain the first preset number of target article keywords ...


Abstract

The invention discloses an image-text content duplication judgment method and device, and relates to the field of information processing. The method comprises the following steps: establishing a word weight model; constructing an image-text comparison data set; obtaining a first preset number of target article keywords and a second preset number of target picture comparison values of the target image-text content, and comparing them with the article keywords and picture comparison values of each sample image-text content in the image-text comparison data set; obtaining the text similarity and the picture similarity number between the target image-text content and each sample image-text content; and, according to the text similarity and the picture similarity number, judging the duplication between the target image-text content and the image-text comparison data set by means of a deduplication strategy. The invention judges image-text content by combining features of multiple dimensions, which improves the overall efficiency and effect of similarity judgment. The word weight model improves keyword extraction from image-text content, and the text similarity calculation method improves both recall rate and accuracy.
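The comparison and deduplication steps described in the abstract can be sketched as below. The abstract does not state the text similarity formula or the deduplication strategy, so both are assumptions here: text similarity is taken as the share of target keywords also present in the sample, and the strategy is a hypothetical two-tier rule with illustrative thresholds. All function names and parameters are invented for this sketch.

```python
def text_similarity(target_keywords, sample_keywords):
    """Assumed measure: share of the target's keywords also found in the sample."""
    target = set(target_keywords)
    if not target:
        return 0.0
    return len(target & set(sample_keywords)) / len(target)


def picture_similarity_count(target_values, sample_values):
    """Number of picture comparison values shared by target and sample."""
    return len(set(target_values) & set(sample_values))


def is_duplicate(text_sim, picture_count,
                 strict_text=0.9, loose_text=0.6, min_pictures=1):
    """Hypothetical strategy: very similar text alone is a duplicate;
    moderately similar text is a duplicate only if pictures also match."""
    if text_sim >= strict_text:
        return True
    return text_sim >= loose_text and picture_count >= min_pictures


def find_duplicates(target, samples):
    """`target` is a (keywords, picture_values) pair; `samples` maps an id to such a pair."""
    hits = []
    for sample_id, (keywords, values) in samples.items():
        sim = text_similarity(target[0], keywords)
        count = picture_similarity_count(target[1], values)
        if is_duplicate(sim, count):
            hits.append(sample_id)
    return hits
```

Combining the two signals is what lets the text threshold be relaxed without sacrificing accuracy: a lower text similarity is accepted only when shared pictures corroborate it.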

Description

Technical field

[0001] The invention relates to the field of information processing, in particular to a method and device for judging duplication of image-text content.

Background technique

[0002] At present, the Internet is full of massive amounts of image-text content, a large share of which is duplicated. This duplicate content wastes a great deal of storage. To save storage resources, duplicate image-text content on the Internet needs to be removed. As computer-based natural language processing applications such as text information processing become widespread, an effective and accurate method is needed to calculate the similarity between two pieces of image-text content. Current methods for judging duplication of image-text content mainly include the cosine similarity algorithm and the text simhash algorithm, but the cosine similarity algorithm has the problems of large amount of comparison calculation, low efficiency and difficult threshold valu...


Application Information

IPC(8): G06F40/289, G06F40/216, G06F16/535, G06F16/583
CPC: G06F16/5846, G06F16/535
Inventors: 陈方毅, 谢振林
Owner XIAMEN MEET YOU INFORMATION TECH