
Method for detecting similar texts on basis of text picture retrieval

A text image and text detection technology, applied in neural learning methods, other database retrieval, file management systems, etc. It addresses two problems: text images with similar visual features can have different high-level topic features, and similarity retrieval accuracy is difficult to improve.

Active Publication Date: 2017-11-07
XIANGTAN UNIV
8 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

However, in practice, it is usually difficult to improve the accuracy of similarity retrieval if only the cosine similarity between text images is calculated. For example, different text images may have similar visual features but different high-level topic features.
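The cosine similarity referred to here can be sketched in a few lines of plain Python. This is only an illustration: the function name `cosine_similarity` and the toy feature vectors are assumptions, not taken from the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical visual feature vectors of three text images.
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # same direction, different magnitude
orthogonal = [0.0, 0.0, 3.0]       # no shared components with the query
```

Because the measure depends only on the direction of the vectors, two images whose visual features point the same way score close to 1 even when their topics differ, which is the limitation described above.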



Examples


Embodiment Construction

[0053] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0054] As shown in Figure 1, in the training phase, text documents are first collected to establish a document library, and the documents in the library are converted into pictures paragraph by paragraph to establish a text picture library. Three network models, among them GoogLeNet, are then fine-tuned, and the feature matrix of the text picture library obtained from each model is dimension-reduced and saved. Next, the average retrieval accuracy of each of the three network models is calculated, and the feature matrices corresponding to the models are fused based on these average retrieval accuracies. In the retrieval stage, the retrieved document is first divided into paragraphs and converted into a retrieved image set, and a series of preprocessing steps is performed; then, the three fine-tuned network models are used to extract features from the preprocessed retriev...
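One way to picture the accuracy-based fusion step is sketched below. The text only says that the feature matrices are fused based on the average retrieval accuracy of the three models; the specific rule used here (weights proportional to accuracy, L2 normalization, and concatenation) is an assumption for illustration, as are the function names `fusion_weights` and `fuse_features` and the sample numbers.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fusion_weights(mean_accuracies):
    """ASSUMPTION: weights proportional to each model's average retrieval accuracy."""
    total = sum(mean_accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [acc / total for acc in mean_accuracies]

def fuse_features(per_model_features, mean_accuracies):
    """ASSUMPTION: concatenate unit-length per-model vectors, each scaled by its weight."""
    if len(per_model_features) != len(mean_accuracies):
        raise ValueError("need one accuracy per model")
    weights = fusion_weights(mean_accuracies)
    fused = []
    for vec, weight in zip(per_model_features, weights):
        fused.extend(weight * x for x in l2_normalize(vec))
    return fused

# Hypothetical reduced features of one text image from three fine-tuned models,
# and hypothetical average retrieval accuracies for those models.
features = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
accuracies = [0.5, 0.3, 0.2]
fused_vector = fuse_features(features, accuracies)
```

Under this rule a model with a higher average retrieval accuracy contributes more to any later similarity score computed on `fused_vector`, since its components carry a larger weight.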


Abstract

The invention discloses a method for detecting similar texts based on text picture retrieval. The method includes the steps of: establishing a document library; establishing a text picture library; extracting features of the pictures in the text picture library and reducing their dimensions; segmenting the retrieval document to obtain a retrieval picture set; extracting features of the pictures in the retrieval picture set and reducing their dimensions; measuring the cosine similarity of the retrieval picture set; filtering the retrieval results by full-text similarity; and outputting the retrieval results. The method has the following advantages. Multiple multilayer convolutional neural network (CNN) models are integrated to train CNN feature descriptors, so that text images receive a deep visual representation. Dimensions are reduced by principal component analysis (PCA) compression, which improves the efficiency of similarity measurement. A full-text similarity filter model is built to filter the retrieval results, so that the similarity of the retrieval results is updated and retrieval precision improves. Texts containing any number of characters can be directly recommended and retrieved. The method detects similar texts well and can be used for checking text duplication or recommending similar literature.
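The PCA compression, cosine ranking, and result filtering named in the abstract can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, PCA is computed with a plain SVD, and the `full_text_filter` rule (keep a document only if enough query paragraphs matched it) is an assumed stand-in, since the abstract does not describe how the full-text similarity filter model works.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return (mean, components) fitted on a feature matrix (rows = images)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(features, mean, components):
    """Project features onto the retained principal directions."""
    return (features - mean) @ components.T

def cosine_rank(query_vec, library, top_k):
    """Indices and cosine scores of the top_k library rows closest to query_vec."""
    eps = 1e-12  # guards against division by zero
    lib_unit = library / (np.linalg.norm(library, axis=1, keepdims=True) + eps)
    q_unit = query_vec / (np.linalg.norm(query_vec) + eps)
    scores = lib_unit @ q_unit
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

def full_text_filter(paragraph_hits, n_query_paragraphs, min_fraction):
    """ASSUMED rule: keep documents matched by at least min_fraction of the
    query paragraphs. paragraph_hits maps a document id to the set of query
    paragraph indices that retrieved it. Returns document id -> matched fraction."""
    kept = {}
    for doc_id, hits in paragraph_hits.items():
        fraction = len(hits) / n_query_paragraphs
        if fraction >= min_fraction:
            kept[doc_id] = fraction
    return kept
```

In use, `fit_pca` would be run once on the library features, `apply_pca` on both library and query features, and `cosine_rank` per query paragraph; the per-paragraph hits would then be grouped by source document before calling `full_text_filter`.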

Description

Technical field

[0001] The invention relates to a similar text detection method based on text picture retrieval.

Background art

[0002] With the vigorous development of digital media technology, the scale of multimedia resources, including text images, keeps growing. Text retrieval has gradually become a research hotspot in the field of natural language processing, and many text retrieval methods based on optical character recognition (OCR) technology have been produced. This approach recognizes text content from images and then uses text retrieval technology to implement a text image retrieval system. However, traditional text image retrieval systems need to rely on complex OCR-based models to achieve good text recognition and text similarity detection results. In addition, to directly recommend and retrieve unconstrained text images containing any number of characters, while adapting to different language symbols, a similarity retrieval method is neede...


Application Information

IPC(8): G06F17/30, G06N3/04, G06N3/08
CPC: G06F16/93, G06N3/04, G06N3/08
Inventor: 谭貌, 原思平, 金继成, 苏永新
Owner XIANGTAN UNIV