A system and method for extracting data, hereinafter referred to as MitoMine, that produces a strongly typed, ontology-defined collection referencing (and cross-referencing) all extracted records. The input to the mining process can be any data source, such as a
text file delimited into a set of possibly dissimilar records. MitoMine contains parser routines and post-processing functions, known as ‘munchers’. The parser routines can be accessed either via a batch mining process or as part of a running
server process connected to a live source. Munchers can be registered on a per-data-source basis to process the records produced, possibly writing them to an external database and/or a set of servers. The present invention also embeds an interpreted, ontology-based language within a compiler/interpreter (for the source format) such that the statements of the embedded language are executed as a result of the source
compiler ‘recognizing’ a given construct within the source and extracting the corresponding source content. In this way, the statements in the embedded program execute in a sequence that is dictated wholly by the source content. This system and method therefore make it possible to bulk extract free-form data from sources such as CD-ROMs and the web, and to have the resultant structured data loaded into an ontology-based system.
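
By way of illustration only, the arrangement described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the MitoMine implementation: the rule table, the `NAME:`/`BORN:` source format, the blank-line record delimiter, and the names `register_muncher`, `mine`, and `uppercase_name` are all hypothetical. Each rule pairs a construct recognized by the source parser with an embedded action, so that actions run in the order the source content dictates, and munchers registered for a given data source post-process each record produced.

```python
import re
from collections import defaultdict

# Hypothetical registry: data-source name -> list of muncher callables.
MUNCHERS = defaultdict(list)


def register_muncher(source, fn):
    """Register a post-processing function for one data source."""
    MUNCHERS[source].append(fn)


def set_name(record, match):
    # Embedded action: runs only when the parser recognizes a NAME line.
    record["name"] = match.group(1).strip()


def set_born(record, match):
    # Embedded action: runs only when the parser recognizes a BORN line.
    record["born"] = int(match.group(1))


# Hypothetical source grammar: (recognized construct, embedded action).
RULES = [
    (re.compile(r"^NAME:\s*(.+)$"), set_name),
    (re.compile(r"^BORN:\s*(\d+)$"), set_born),
]


def mine(source, text):
    """Parse blank-line-delimited records and apply the source's munchers."""
    records = []
    for chunk in text.strip().split("\n\n"):
        record = {}
        for line in chunk.splitlines():
            for pattern, action in RULES:
                match = pattern.match(line)
                if match:
                    # The action fires because the construct was recognized,
                    # so execution order follows the source content.
                    action(record, match)
                    break
        for muncher in MUNCHERS[source]:
            record = muncher(record)
        records.append(record)
    return records


def uppercase_name(record):
    # Example muncher: normalize the name field if one was extracted.
    if "name" in record:
        record["name"] = record["name"].upper()
    return record


register_muncher("demo", uppercase_name)
sample = "NAME: Ada\nBORN: 1815\n\nBORN: 1912\nNAME: Alan"
print(mine("demo", sample))
```

In this sketch the second record lists `born` before `name`, because its source lines appear in that order; the embedded actions are driven by the source rather than by a fixed program sequence. A real muncher might instead write each record to an external database or server.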