Disclosed is a computer-implemented method for comparing text documents. The method comprises building a database comprising first text document data associated with a plurality of first text documents. The method further comprises receiving a query and converting the query to second text document data. The method further comprises comparing the second text document data to the first text document data and computing at least one similarity measure between the second text document data and the first text document data.
Further disclosed is a computer-implemented method for processing of similarities in text documents. The method comprises harmonizing at least one incoming query and normalizing the at least one harmonized query. The method also comprises constructing at least one query vector using the at least one normalized harmonized query. The method further comprises computing at least one similarity measure between the at least one query vector and at least one further text document, wherein the at least one further text document has undergone the preceding harmonizing, normalizing, and vector-constructing steps.
Also disclosed is a computer-implemented system. The system comprises at least one memory component adapted for at least storing a database comprising a plurality of first text document data associated with first text documents. The system also comprises at least one input device adapted for receiving a query. The query comprises a second text document and/or information identifying a second text document. The second text document is associated with second text document data comprised within the first text document data already stored within the memory component. The system further comprises at least one processing component adapted for converting the query into second text document data and/or retrieving second text document data associated with the query from storage within the at least one memory component. The processing component is also adapted to compare the second text document data to the first text document data stored within the at least one memory component. The system also comprises at least one output device adapted for returning information identifying at least one similar first text document associated with first text document data. The at least one similar first text document is the most similar to the query among the first text documents.
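
By way of illustration only, the sequence of harmonizing, normalizing, constructing a query vector, and computing a similarity measure might be sketched in Python as follows. The abstract does not specify what harmonizing or normalizing involve, how the vector is built, or which similarity measure is used. The sketch therefore assumes lowercasing and punctuation removal for harmonizing, stop-word removal for normalizing, a term-frequency vector, and cosine similarity. All function names, the stop-word list, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def harmonize(text: str) -> str:
    """Assumed harmonization: lowercase, replace non-alphanumerics with spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize(text: str) -> list[str]:
    """Assumed normalization: split on whitespace and drop stop words."""
    return [token for token in text.split() if token not in STOP_WORDS]


def to_vector(text: str) -> Counter:
    """Build a term-frequency vector from harmonized, normalized text."""
    return Counter(normalize(harmonize(text)))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def build_database(documents: dict[str, str]) -> dict[str, Counter]:
    """First text document data: one vector per stored document identifier."""
    return {doc_id: to_vector(text) for doc_id, text in documents.items()}


def most_similar(query: str, database: dict[str, Counter]) -> tuple[str, float]:
    """Return the identifier and score of the stored document closest to the query.

    The database must contain at least one document.
    """
    query_vector = to_vector(query)  # second text document data
    scores = {
        doc_id: cosine_similarity(query_vector, vector)
        for doc_id, vector in database.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    database = build_database({
        "doc-1": "A method for comparing text documents using vectors.",
        "doc-2": "An apparatus for brewing coffee.",
    })
    print(most_similar("Comparing documents of text", database))
```

In this sketch the stored documents pass through the same `to_vector` pipeline as the query, mirroring the requirement that the further text document undergo the same steps. A query that only identifies an already stored document could instead look up its vector directly in the database rather than converting it again.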