A system to automatically enhance, tag, classify, categorize, cluster and index products described in unstructured text-based electronic documents. The system and method incorporate the use of text normalization, regular expressions, product number matching rules, text segmentation, entity detection, language models, predictive modeling, hierarchal subspace clustering, formal concept analysis, and a weighted combination of all techniques to detect and infer knowledge extracted from a digital version of raw, unstructured product text. Knowledge extracted and inferred comprises knowledge units including: main conceptual entity, entity text patterns, product language models, and conceptual hierarchies. The extracted knowledge units are utilized to store and index products in a product knowledge database and the products and knowledge units are made available to users via a user interface.