Content-independent rapid text filtering method

A fast text filtering technique, applicable to special data processing applications, instruments, electrical digital data processing, and similar fields, that requires little computation, filters quickly, and lightens the system load.

Inactive Publication Date: 2016-08-24
ZHEJIANG UNIVERSITY OF MEDIA AND COMMUNICATIONS

AI Technical Summary

Problems solved by technology

If you can quickly filter out irrelevant noise-containing text without touchi...




Embodiment Construction

[0026] To aid understanding of the present invention, its preferred embodiments are described below with examples. These descriptions only further illustrate the features and advantages of the invention and do not limit its claims.

[0027] The invention is typically applied to detecting copyright infringement of literary works on the Internet.

[0028] Author's right, called copyright in Anglo-American legal terminology, is the most common form of intellectual property right. Every work that is produced by human intellectual activity, is original, and can be reproduced is subject to copyright. The most common carriers of copyright are literary and artistic works, such as novels, poems, scripts, music, drama, folk art, dance, acrobatics, fine arts, photography, movies, etc.; scientific and engineering works also have copyright, such as engineerin...


Abstract

The invention relates to the field of computer information retrieval, and in particular to a content-independent rapid text filtering method for applications such as library information retrieval, plagiarism detection, and copyright infringement detection. A source (mother) text A and a target text B are each split at separators into a series of fragments. Each fragment is evaluated with an energy function that is independent of the text content and positively correlated with the noise contained in the text. From the resulting energies of source text A and target text B, the values smaller than a preset energy threshold H are removed, yielding an energy sequence N and an energy sequence M. Self-similarity analysis is performed on energy sequence M. Within a permissible error, energy sequence M is then matched against energy sequence N; if energy sequence N is exhausted, the matching degree between any part of target text B and any part of source text A is lower than the threshold value, and target text B is excluded. The method avoids analyzing the text content, requires little computation, filters quickly, and can mask the influence of some noise.
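The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the energy of a fragment is taken to be its character count, sequence N is assumed to come from text A and sequence M from text B, and the "matching degree" is approximated by the longest run of consecutive fragment energies that agree within a tolerance. The self-similarity analysis of M is not reproduced here; a plain dynamic-programming scan is used in its place. The names `energy_sequence`, `longest_tolerant_run`, `can_exclude`, and `min_run` are hypothetical.

```python
import re


def energy_sequence(text, separators=r"[.!?;,\n]", threshold=3):
    """Split text at separators and return the fragment energies >= threshold.

    Assumption: the energy of a fragment is its stripped character count,
    which does not depend on what the fragment says.
    """
    fragments = re.split(separators, text)
    energies = [len(fragment.strip()) for fragment in fragments]
    # Values smaller than the preset threshold H are removed.
    return [e for e in energies if e >= threshold]


def longest_tolerant_run(m, n, tolerance=1):
    """Longest contiguous run where |m[i+k] - n[j+k]| <= tolerance.

    Plain O(len(m) * len(n)) dynamic programming, used here as a stand-in
    for the matching step; it does not use any self-similarity analysis.
    """
    best = 0
    prev = [0] * (len(n) + 1)
    for i in range(1, len(m) + 1):
        cur = [0] * (len(n) + 1)
        for j in range(1, len(n) + 1):
            if abs(m[i - 1] - n[j - 1]) <= tolerance:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def can_exclude(source_a, target_b, threshold=3, tolerance=1, min_run=3):
    """Return True if target B can be filtered out without reading its content.

    Assumption: B is excluded when no part of its energy sequence matches
    any part of A's energy sequence for at least min_run fragments.
    """
    n = energy_sequence(source_a, threshold=threshold)
    m = energy_sequence(target_b, threshold=threshold)
    return longest_tolerant_run(m, n, tolerance) < min_run
```

Because only fragment lengths are compared, a target text that survives this filter is merely a candidate; a content-level comparison would still be needed to confirm an actual match.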

Description

Technical field

[0001] The invention relates to the field of computer information retrieval, and in particular to a content-independent fast text filtering method for applications that require text filtering, such as library information retrieval, plagiarism detection, and copyright infringement detection.

Background art

[0002] A text is a written representation of human natural language that is semi-structured or unstructured and lacks computer-understandable semantics. Text processing technology is widely used in information retrieval, search engines, plagiarism detection, copyright protection, and other fields. Its function is to discover implicit knowledge and patterns in massive, heterogeneous, and distributed texts. Text processing technology represents a text with an appropriate mathematical model, which must contain enough information to reflect the characteristics of the text without being so complicated that it exceeds the processing capacity of the...


Application Information

IPC(8): G06F17/30
CPC: G06F16/332
Inventors: 张帆, 金哲凡
Owner: ZHEJIANG UNIVERSITY OF MEDIA AND COMMUNICATIONS