Efficient passage retrieval using document metadata

Technology: passage retrieval using document metadata, applied in the field of information retrieval systems. It addresses problems such as the lack of a computer program capable of accurately answering factual questions, and achieves efficient retrieval.

Status: Inactive; Publication Date: 2012-03-29
IBM CORP
Cites: 10; Cited by: 30

AI Technical Summary

Benefits of technology

[0010]In one aspect there is provided a computing infrastructure and methodology that performs question answering and automatic passage retrieval in a highly efficient manner.
[0011]In one aspect, there is provided a computer-implemented method for efficiently retrieving passages relevant to questions based on a corpus of data, comprising: receiving an input query; performing a query context analysis upon the input query to obtain searchable query terms; matching metadata associated with one or more documents against the query terms; mapping matched document metadata to the corresponding one or more documents; identifying the corresponding matched documents to form a subcorpus of documents; and conducting a search in the data subcorpus using the searchable query terms to obtain one or more passages relevant to the input query from the identified documents, wherein one or more processor devices perform one or more of the receiving, performing, matching, mapping, identifying and conducting steps.
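The claimed steps can be illustrated with a minimal Python sketch. It assumes that query context analysis is plain stopword removal, that each document's metadata is already a set of lowercase terms, and that a passage counts as relevant when it shares at least one term with the query. The names `tokenize`, `analyze_query`, `retrieve_passages` and `STOPWORDS` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical stopword list standing in for "query context analysis".
STOPWORDS = {"who", "what", "which", "is", "was", "the", "of", "a", "an", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_query(query: str) -> list[str]:
    """Reduce a natural-language query to searchable query terms."""
    return [t for t in tokenize(query) if t not in STOPWORDS]


def retrieve_passages(query: str,
                      metadata: dict[str, set[str]],
                      corpus: dict[str, list[str]]) -> list[str]:
    """Search only the documents whose metadata matches the query terms.

    metadata maps a document ID to its metadata terms (e.g. title words, tags);
    corpus maps a document ID to its passages.
    """
    terms = set(analyze_query(query))
    # Match metadata against the query terms and map the matches to document IDs.
    subcorpus_ids = sorted(doc_id for doc_id, meta in metadata.items() if meta & terms)
    # Conduct the passage search inside the subcorpus only.
    return [
        passage
        for doc_id in subcorpus_ids
        for passage in corpus.get(doc_id, [])
        if terms & set(tokenize(passage))
    ]
```

For example, with metadata `{"d1": {"barack", "obama"}, "d2": {"bicycle", "repair"}}`, the query "Who is Barack Obama?" selects only `d1` as the subcorpus, so a passage in `d2` that happens to mention Obama is never scanned. Under these assumptions, that pruning is where the efficiency gain would come from.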
[0014]In an alternate embodiment, there is provided a computer-implemented method for efficiently retrieving passages relevant to questions based on a corpus of data, comprising: receiving, at a processor device, an input query; performing, at the processor device, a query context analysis upon the input query to obtain searchable query terms; accessing a dictionary of document metadata obtained from one or more documents of the data corpus, each stored document metadata being associated with a corresponding document identification (ID); performing, by the processor device, a dictionary matching of the metadata associated with one or more documents against the query terms; mapping matched document metadata to corresponding one or more document IDs; identifying corresponding matched documents to form a subcorpus of documents; and conducting a search in the subcorpus using the searchable query terms to obtain passages relevant to the input query from the identified documents.
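The metadata dictionary of this embodiment might be sketched as a mapping from each normalized metadata entry (for instance a title or a user tag) to the set of document IDs that carry it, with dictionary matching done by looking up every contiguous n-gram of the query terms. The n-gram lookup, the normalization rule and the helper names (`normalize`, `build_metadata_dictionary`, `dictionary_match`) are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and join alphanumeric tokens with single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_metadata_dictionary(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized metadata entry (title, user tag, ...) to the IDs of documents carrying it.

    docs maps a document ID to its list of metadata strings.
    """
    dictionary: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, entries in docs.items():
        for entry in entries:
            key = normalize(entry)
            if key:
                dictionary[key].add(doc_id)
    return dict(dictionary)


def dictionary_match(terms: list[str], dictionary: dict[str, set[str]],
                     max_n: int = 3) -> set[str]:
    """Look up every contiguous n-gram of the query terms (n <= max_n) and collect matched doc IDs."""
    doc_ids: set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(terms) - n + 1):
            doc_ids |= dictionary.get(" ".join(terms[i:i + n]), set())
    return doc_ids
```

With `docs = {"doc-17": ["Barack Obama", "US presidents", "politics"], "doc-42": ["Bicycle repair", "how-to"]}`, the query terms `["barack", "obama", "born"]` reach `doc-17` through the bigram "barack obama", while the lone term `bicycle` does not match the title "Bicycle repair" because entries must match exactly. Indexing individual title words as well would trade precision for recall.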

Problems solved by technology

A major unsolved problem in such information query paradigms is the lack of a computer program capable of accurately answering factual questions based on information included in a collection of documents that can be either structured, unstructured, or both.
It is a challenge to understand the query, to find appropriate documents that might contain the answer, and to extract the correct answer to be delivered to the user.



Embodiment Construction

[0021]FIG. 1 shows a QA system diagram such as described in U.S. patent application Ser. No. 12/126,642 depicting a high-level logical architecture 10 and methodology in which the present system and method may be employed in one embodiment.

[0022]FIG. 1 illustrates the major components that comprise a canonical question answering system 10 and their workflow. The question analysis component 20 receives a natural language question 19 (e.g., “Who is the 42nd president of the United States?”) and analyzes the question to produce, minimally, the semantic type of the expected answer (in this example, “president”), and optionally other analysis results for downstream processing. The search component 30a formulates queries from the output 29 of question analysis and consults various resources such as the World Wide Web 41 or one or more knowledge resources, e.g., databases, knowledge bases 42, to retrieve “documents” including, e.g., whole documents or document portions 44, e.g., web-pages, d...
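The question analysis performed by component 20 could be approximated by a toy rule-based sketch like the one below. It assumes a single question pattern (wh-word, copula, determiner, noun phrase, optional "of ...") whose last noun-phrase word is taken as the expected answer type, plus a hypothetical fallback table keyed by the wh-word. Real QA systems use far richer parsing, and none of these names (`DEFAULT_TYPES`, `NP_PATTERN`, `expected_answer_type`) come from the patent.

```python
import re
from typing import Optional

# Hypothetical fallback answer types keyed by the question's wh-word.
DEFAULT_TYPES = {"who": "person", "when": "date", "where": "location"}

# Toy pattern: "<wh-word> <copula> <determiner> <noun phrase> [of ...]"
NP_PATTERN = re.compile(
    r"^(?:who|what|which)\s+(?:is|was|are|were)\s+(?:the|a|an)\s+(.+?)(?:\s+of\s+.*)?\?*$"
)


def expected_answer_type(question: str) -> Optional[str]:
    """Guess the semantic type of the expected answer for a simple factual question."""
    q = question.strip().lower()
    match = NP_PATTERN.match(q)
    if match:
        # Treat the last word of the noun phrase as its head, e.g. "42nd president" -> "president".
        return match.group(1).split()[-1]
    words = q.split()
    return DEFAULT_TYPES.get(words[0]) if words else None
```

Under these assumptions, "Who is the 42nd president of the United States?" yields `"president"`, while "Who is Barack Obama?" does not fit the pattern and falls back to `"person"`.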


Abstract

A system, method and computer program product for efficiently retrieving passages relevant to questions based on a corpus of data. A processor device receives an input query and performs a query analysis to obtain searchable query terms. The processor then matches metadata associated with one or more documents against the query terms. The document metadata includes one or more of: a title of the documents, one or more user tags or clouds. Next, the processor device maps matched document metadata to the corresponding one or more documents, identifies the corresponding matched documents to form a subcorpus of documents, and conducts a search in the data subcorpus using the searchable query terms to obtain one or more passages relevant to the input query from the identified documents.
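The final step, searching only the subcorpus for passages, can be sketched as ranking sliding windows of consecutive sentences by the number of distinct query terms they contain. The window size, the naive sentence splitter, the overlap score and the function names (`split_sentences`, `search_subcorpus`) are assumptions for illustration; the abstract does not specify how passages are formed or scored.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def search_subcorpus(terms: list[str], subcorpus: dict[str, str],
                     window: int = 2, top_k: int = 3) -> list[tuple[int, str, str]]:
    """Rank windows of `window` consecutive sentences by how many distinct query terms they contain.

    subcorpus maps a document ID to its full text; returns (score, doc_id, passage) tuples.
    """
    wanted = set(terms)
    scored: list[tuple[int, str, str]] = []
    for doc_id, text in subcorpus.items():
        sentences = split_sentences(text)
        # At least one window per document, even if it has fewer sentences than `window`.
        for start in range(max(1, len(sentences) - window + 1)):
            passage = " ".join(sentences[start:start + window])
            score = len(wanted & set(re.findall(r"[a-z0-9]+", passage.lower())))
            if score:
                scored.append((score, doc_id, passage))
    # Highest score first; ties broken by document ID and passage text for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:top_k]
```

For a document whose text is "Barack Obama was born in Hawaii. He became president in 2009. He served two terms." and the query terms `["obama", "president", "born"]`, the first two-sentence window scores 3 and ranks first. A weighted scorer such as BM25 could be substituted for the raw overlap count; restricting the search to the metadata-selected subcorpus is independent of that choice.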

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
[0001]The present invention relates to and claims the benefit of the filing date of commonly-owned, co-pending U.S. Provisional Patent Application No. 61/386,019, filed Sep. 24, 2010, the entire contents and disclosure of which is incorporated by reference as if fully set forth herein.
BACKGROUND
[0002]The invention relates generally to information retrieval systems, and more particularly, the invention relates to an automated query / answer system and method implementing a passage retrieval component to conduct a search that identifies passages relevant to a given question using document metadata from a collection including text-based resources.
DESCRIPTION OF THE RELATED ART
[0003]An introduction to the current issues and approaches of question answering (QA) can be found in the web-based reference http://en.wikipedia.org/wiki/Question_answering. Generally, QA is a type of information retrieval. Given a collection of documents (such as the World Wi...


Application Information

IPC(8): G06F17/30
CPC: G06N5/02; G06F16/3329
Inventors: CHU-CARROLL, JENNIFER; FERRUCCI, DAVID A.
Owner: IBM CORP