In information retrieval (IR) systems with high-speed access, especially search engines applied to the Internet and/or corporate intranet domains for retrieving accessible documents, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments. An integrated, automatic and open
information retrieval system (100) comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It addresses the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, said
system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to
database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a
list of the document topics is presented to the requester, and the requester designates the relevant topics. The requester is then granted access only to documents assigned to relevant topics. A knowledge
database (1408) linking
search terms to documents and documents to topics is established and maintained to speed up future searches. Additionally, new strategies are presented for handling Web sites whose content changes at different update frequencies.
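
The retrieval flow described above (retrieve documents containing the word, derive word-pair patterns, match them against topic patterns, let the requester pick topics, and cache the links in a knowledge database) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the system (100) itself: `tokenize`, `pattern`, `word_pairs`, `assign_topics`, `KnowledgeDB`, `search` and `select` are illustrative names, and the sketch assumes that a word-pair pattern is simply two distinct words occurring in the same sentence and that a single shared pair is enough to assign a topic. The linguistic and mathematical methods of the actual system are not specified in this abstract and are likely more elaborate.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pattern(a, b):
    """Canonical (alphabetically ordered) form of a word pair."""
    return tuple(sorted((a.lower(), b.lower())))


def word_pairs(text):
    """Collect every unordered pair of distinct words sharing a sentence.

    Assumption: sentence-level co-occurrence stands in for the
    word-pair patterns mentioned in the abstract.
    """
    pairs = set()
    for sentence in re.split(r"[.!?]", text):
        words = sorted(set(tokenize(sentence)))
        pairs.update(combinations(words, 2))
    return pairs


def assign_topics(text, topic_patterns):
    """Return the topics whose patterns share at least one pair with the text."""
    doc_pairs = word_pairs(text)
    return {topic for topic, pats in topic_patterns.items() if doc_pairs & pats}


class KnowledgeDB:
    """Hypothetical stand-in for the knowledge database (1408)."""

    def __init__(self):
        self.term_docs = {}   # search term -> set of document ids
        self.doc_topics = {}  # document id -> set of topics


def search(word, documents, topic_patterns, kb):
    """Return {doc_id: topics} for documents containing the word, using the cache."""
    term = word.lower()
    if term not in kb.term_docs:
        kb.term_docs[term] = {
            doc_id for doc_id, text in documents.items() if term in tokenize(text)
        }
    for doc_id in kb.term_docs[term]:
        if doc_id not in kb.doc_topics:
            kb.doc_topics[doc_id] = assign_topics(documents[doc_id], topic_patterns)
    return {doc_id: kb.doc_topics[doc_id] for doc_id in kb.term_docs[term]}


def select(hits, relevant_topics):
    """Keep only documents assigned to at least one topic the requester chose."""
    chosen = set(relevant_topics)
    return sorted(doc_id for doc_id, topics in hits.items() if topics & chosen)


# Illustrative data only; not drawn from the source.
documents = {
    "d1": "The jaguar is a big cat. It hunts in the rain forest.",
    "d2": "The Jaguar is a luxury car with a powerful engine.",
    "d3": "A cat sleeps all day.",
}
topic_patterns = {
    "animals": {pattern("cat", "jaguar"), pattern("forest", "rain")},
    "cars": {pattern("car", "jaguar"), pattern("engine", "car")},
}

kb = KnowledgeDB()
hits = search("jaguar", documents, topic_patterns, kb)
topics = sorted(set().union(*hits.values()))
print(topics)                    # ['animals', 'cars']
print(select(hits, {"cars"}))    # ['d2']
```

Because the retrieved documents fall under two topics, the requester would be shown the topic list and, after choosing "cars", would be given access only to `d2`. A repeated search for the same word reuses the entries already stored in `kb` instead of re-analyzing the documents, which is the speed-up the knowledge database is meant to provide.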