A method and system for publishing a plurality of books for user access to information includes selecting a plurality of books, converting each book from a publisher's digital form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable library database arranged, for example, as an XML database indexed by book structure, such that a user may remotely, over the Internet or other network, access the database, search for desired content, and view an image of a portion of the book containing the desired data. The
system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. A search engine may score search results based on their position in the hierarchy or other factors, thereby determining a degree of relevance of the text or data information located by the search engine. The scoring factors may include the position of located search data in the hierarchy, identification of the search data in the user's personal library or in a prior search by the user, or the degree of match of data identified in the search. An interface with a commercially available
search engine may operate to adapt the search. When provided a search query by a user, the interface may search for an exact match and score hits for relevance and, in the event an exact match is not found, expand the query and return hits in order of rank together with an indication of the expanded search. The user may thus ascertain a degree of likely relevance of the returned text or data information. The
relational database may include hyperlinks to section headings and related data passages, such that a user accessing a page of a book may immediately view related data and the context of the page. The relational database is indexed by logical subunits of the book, such that expanded searches for Boolean combinations or proximity of elements span page breaks in the book text and identify all instances of the desired search data. The search engine may expand a search if all hits have low ranking, and may suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may search tables, drawings and formulae of the converted book file.
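
By way of illustration only, the scoring, query expansion, and suppression behavior described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `Unit`, `score`, and `search`, the level weights, the `HIGH` and `LOW` thresholds, and the bookshelf bonus are all hypothetical, and query expansion is approximated here by relaxing an exact phrase match to a match on any query term.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: hits in higher levels of the book hierarchy score more.
LEVEL_WEIGHT = {"chapter_heading": 3.0, "section_heading": 2.0,
                "caption": 1.0, "body": 0.5}
HIGH, LOW = 2.0, 1.0       # hypothetical thresholds for high and low ranking
BOOKSHELF_BONUS = 1.5      # hypothetical boost for books on the user's bookshelf


@dataclass(frozen=True)
class Unit:
    """A logical subunit of a book (not a page), tagged with its hierarchy level."""
    book: str
    level: str
    text: str


def tokens(text):
    """Lowercase word tokens of a string."""
    return re.findall(r"\w+", text.lower())


def score(unit, terms, bookshelf):
    """Score = hierarchy weight x fraction of query terms matched x bookshelf bonus."""
    words = set(tokens(unit.text))
    degree = sum(t in words for t in terms) / len(terms)
    bonus = BOOKSHELF_BONUS if unit.book in bookshelf else 1.0
    return LEVEL_WEIGHT.get(unit.level, 0.5) * degree * bonus


def search(units, query, bookshelf=frozenset()):
    """Return (ranked hits, expanded flag) for a query over logical subunits."""
    terms = tokens(query)
    if not terms:
        return [], False
    # Pass 1: exact phrase match on whole words.
    phrase = f" {' '.join(terms)} "
    hits = [(score(u, terms, bookshelf), u) for u in units
            if phrase in f" {' '.join(tokens(u.text))} "]
    expanded = False
    # Expand when there is no exact match or every hit has low ranking.
    if all(s < LOW for s, _ in hits):
        expanded = True
        scored = [(score(u, terms, bookshelf), u) for u in units]
        hits = [h for h in scored if h[0] > 0]   # any query term matches
    # Suppress low-ranking hits when at least one high-ranking hit exists.
    if any(s >= HIGH for s, _ in hits):
        hits = [h for h in hits if h[0] >= LOW]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits, expanded


# Made-up sample data for illustration.
units = [
    Unit("Pump Handbook", "chapter_heading", "Centrifugal Pump Design"),
    Unit("Pump Handbook", "body", "The centrifugal pump design depends on head and flow"),
    Unit("Valve Guide", "body", "Pump selection notes"),
    Unit("Valve Guide", "caption", "Figure 2 valve design curve"),
]
exact_hits, exact_expanded = search(units, "centrifugal pump")
loose_hits, loose_expanded = search(units, "pump curve")
```

In this sketch the first query finds an exact match in a chapter heading, so the low-ranking body hit is suppressed and no expansion is reported. The second query has no exact match, so the search is expanded and the returned flag lets the caller indicate that to the user. Because each `Unit` is a logical subunit rather than a page, a phrase that straddles a page break would still be matched.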