Method and apparatus for a character-based comparison of documents

A document comparison technology, applied in the field of data processing, that addresses the storage space consumed by unsolicited electronic messages (emails), the limited storage capacity of thin client systems such as network computers and PDAs, and the time users waste downloading voluminous but useless advertisement information.

Inactive Publication Date: 2005-06-16
SYMANTEC CORP
6 Cites, 127 Cited by

AI Technical Summary

Benefits of technology

Effectively filters spam emails by accurately identifying similarities between messages, reducing noise and improving filtering efficiency, thus conserving resources and user time.

Problems solved by technology

These electronic messages (emails) are usually unsolicited and regarded as nuisances by the recipients because they occupy much of the storage space needed for necessary and important data processing.
Moreover, thin client systems such as set top boxes, PDAs, network computers, and pagers all have limited storage capacity.
In addition, a typical user wastes time by downloading voluminous but useless advertisement information.
However, as spam filtering grows in sophistication, so do the techniques of spammers in avoiding the filters.


Embodiment Construction

[0022] A method and apparatus for a character-based comparison of documents are described. In the following description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

[0023] Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulatio...

Abstract

A method and system for a character-based document comparison are described. In one embodiment, the method includes dividing a first document into tokens. Each token includes a predefined number of sequential characters from the first document. The method further includes calculating hash values for the tokens and creating, for the first document, a signature including a subset of hash values from the calculated hash values and additional information pertaining to the tokens of the first document. The signature of the first document is subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
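The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say whether tokens overlap, which hash function is used, how the subset of hash values is chosen, what the additional token information is, or how signatures are scored. The sketch assumes overlapping 5-character tokens, a truncated SHA-256 hash, a subset made of the 16 smallest hash values, the token count as the additional information, and a simple shared-hash ratio as the resemblance measure. All function names and parameters are hypothetical.

```python
import hashlib


def tokens(text, k=5):
    """Split text into tokens of k sequential characters (assumed to overlap)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def token_hash(token):
    """Hash one token to a 64-bit integer (the hash function is an assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(text, k=5, keep=16):
    """Build a signature: a subset of the hash values plus token information.

    Assumptions: the subset is the `keep` smallest distinct hash values, and
    the additional information is simply the total number of tokens.
    """
    toks = tokens(text, k)
    hashes = sorted({token_hash(t) for t in toks})
    return {"hashes": frozenset(hashes[:keep]), "token_count": len(toks)}


def resemblance(sig_a, sig_b):
    """Score two signatures as shared hash values over all hash values (assumed)."""
    a, b = sig_a["hashes"], sig_b["hashes"]
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    first = signature("Limited offer: buy cheap watches today!")
    second = signature("Limited offer: buy cheap w@tches today!!")
    print(resemblance(first, second))
```

Because the tokens are built from characters rather than words, a small obfuscation such as replacing one letter changes only the few tokens that span it, so in this sketch most hash values of the two messages are still likely to coincide.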

Description

RELATED APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Application Ser. No. 60/471,242, filed May 15, 2003, which is incorporated herein in its entirety.

FIELD OF THE INVENTION

[0002] The present invention relates to data processing; more particularly, the present invention relates to a character-based comparison of documents.

BACKGROUND OF THE INVENTION

[0003] The Internet is growing in popularity, and more and more people are conducting business over the Internet, advertising their products and services by generating and sending electronic mass mailings. These electronic messages (emails) are usually unsolicited and regarded as nuisances by the recipients because they occupy much of the storage space needed for the necessary and important data processing. For example, a mail server may have to reject accepting an important and/or desired email when its storage capacity is filled to the maximum with the unwanted emails containing advertisements. More...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F15/16, H04L12/58
CPC: H04L12/58, H04L12/583, H04L51/12, H04L51/063, H04L12/585, H04L51/212
Inventor: MEDLAR, ART
Owner: SYMANTEC CORP