System and method for confidentiality-preserving rank-ordered search

A rank-ordered search technology, applied in the field of information search and retrieval, that addresses three problems: little effort has been devoted to secure searching; system administrators and other personnel involved may not be trusted to have decryption keys; and conventional practices to accommodate such searches on hard-copy collections are extremely time-consuming and laborious.

Status: Inactive. Publication Date: 2010-06-10
UNIV OF MARYLAND

AI Technical Summary

Benefits of technology

[0013] The confidentiality-preserving rank-ordered search system and method of the invention focus on secure and efficient rank-ordered search and retrieval over large data collections. The system includes a framework to securely rank-order documents in response to a query, and techniques for extracting the most relevant document(s) from an encrypted data collection. The system and method include collecting term frequency information for each document in the collection to build indices, as in traditional plaintext retrieval systems. The system and method further include securing these indices, which would otherwise reveal important statistical information about the collection, to protect against statistical attacks. During the search process, the query terms may be encrypted to prevent the exposure of information to the data center and other intruders, and to confine the searching entity to queries within an authorized scope. Using the term frequencies and other document information, schemes are developed herein to securely compute the relevance score of each document, identify the most relevant documents, and reserve the right to screen and release the full content of relevant documents.
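The plaintext starting point described above (collect term frequencies, build an index, score documents against a query) can be sketched as follows. This is a minimal illustration only: the function names, the sample documents, and the scoring rule (summing raw term frequencies of the query terms) are assumptions made here, since the summary does not state the relevance formula the invention uses.

```python
from collections import Counter

def build_tf_index(docs):
    """Map each term to {doc_id: term frequency} (a plain inverted index)."""
    index = {}
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = count
    return index

def rank(index, query, top_k=2):
    """Assumed scoring rule: sum the frequencies of the query terms per document."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common(top_k)]

# Hypothetical three-document collection.
docs = {
    "d1": "secure search over encrypted data",
    "d2": "search search ranking",
    "d3": "knowledge discovery",
}
index = build_tf_index(docs)
top = rank(index, "search ranking")
```

An index like this exposes which terms occur in which documents and how often, which is the statistical leakage the paragraph says must be secured before the index is handed to an untrusted data center.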

Problems solved by technology

For example, when data storage is outsourced to a third-party data center, system administrators and other personnel involved may not be trusted to hold decryption keys and thereby gain access to the content of the data collections.
When an authorized user remotely accesses the data collection to search and retrieve desired documents, the large size of the collections can often make it infeasible to transfer all encrypted data to the user's side, and then perform decryption and search on the user's trusted computers.
Conventional practices to accommodate such searches on hard-copy collections are extremely time-consuming, and often rely on human factors (e.g. searchers who have limited memory and are bound by rules of privilege) that cannot all be directly extended to computerized practice.
There has also been minimal effort in addressing secure searching, and such effort has typically been limited to small collections.
This method still incurs a significant increase in storage (for storing the specially encrypted documents) and typically involves a linear time computational complexity with respect to the number of words in the collection.
The aforementioned techniques involve a high computational complexity, and target simple Boolean searches to identify the presence or absence of a term in encrypted text.
Furthermore, the aforementioned techniques cannot be easily extended to more sophisticated relevance-ranked searches over large collections.
The inventors herein have thus recognized the need for balancing privacy and confidentiality with efficiency and accuracy, which pose significant challenges to the design of search schemes for a number of search scenarios and large data collections.


Examples


case 1

[0042] The content owner wants to search for some documents stored at the data center. He/she has a limited-bandwidth connection with the data center, and needs to search through the encrypted content without downloading the entire collection. Furthermore, the content owner does not trust the data center with his/her unencrypted content. He/she wants to remotely search and retrieve top-ranked relevant documents without revealing the search terms, document content, and/or document index information to the data center. This scenario will be referred to as the confidentiality-preserving baseline model, as discussed below, where the scheme enables both confidentiality protection and the use of term frequency (discussed below) to achieve secure and efficient retrieval.
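One way to picture this scenario is a toy sketch in which the data center stores only keyed-hash tokens and masked term frequencies, and the content owner unmasks and ranks locally. Everything here is an assumption for illustration and not the scheme of the application: the key, the helper names `token`, `pad`, `protect_index` and `search`, and the XOR masking, which is a stand-in for real encryption. Document identifiers remain visible to the data center in this simplification.

```python
import hashlib
import hmac

KEY = b"owner-secret-key"  # hypothetical key held only by the content owner

def token(term):
    """Keyed hash of a term; the data center sees only this opaque token."""
    return hmac.new(KEY, b"term:" + term.encode(), hashlib.sha256).hexdigest()

def pad(term, doc_id):
    """Per-entry 32-bit pad derived from the key (toy masking, not real encryption)."""
    digest = hmac.new(KEY, f"pad:{term}:{doc_id}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def protect_index(tf_table):
    """Owner side: turn {term: {doc_id: tf}} into {token: {doc_id: masked tf}}."""
    return {
        token(term): {d: tf ^ pad(term, d) for d, tf in postings.items()}
        for term, postings in tf_table.items()
    }

def search(stored_index, query_terms, top_k=1):
    """Owner side: fetch masked postings by token, unmask, and rank locally."""
    scores = {}
    for term in query_terms:
        # The token lookup is the only work the data center would perform.
        postings = stored_index.get(token(term), {})
        for doc_id, masked in postings.items():
            scores[doc_id] = scores.get(doc_id, 0) + (masked ^ pad(term, doc_id))
    # Highest score first; ties broken by document id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

tf_table = {"search": {"d1": 1, "d2": 2}, "ranking": {"d2": 1}}
stored = protect_index(tf_table)
result = search(stored, ["search", "ranking"])
```

Only the postings for the queried tokens cross the network, which fits the limited-bandwidth constraint; the owner would then request just the top-ranked encrypted documents.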

case 2

[0043] Next, consider the scenario where a user, who is not the content owner, wants to search for a particular phrase in the set of confidential documents held by the data center. This scenario may arise in a number of cases, for example, where the user is a scholar or a low-level analyst who wants to search for relevant documents in a private/classified collection, and may need clearance only for the top-ranked documents. The user may also be the opposing side in a litigation, requesting that relevant documents from a digital collection (e.g. e-mails) be turned in by the content owner's side. In general, the content owner does not trust the data center with the document content or the term frequency values. However, it is considered herein that the data center has a secure computing unit (SCU), which is trusted by the content owner to some degree. Depending on the level of trust the content owner places in the SCU, the following exemplary scenarios are identified:

case 2a

[0044] The content owner trusts the SCU both with the plain-text documents and the associated term-frequency table (discussed below).


Abstract

A confidentiality preserving system and method for performing a rank-ordered search and retrieval of contents of a data collection. The system includes at least one computer system including a search and retrieval algorithm using term frequency and/or similar features for rank-ordering selective contents of the data collection, and enabling secure retrieval of the selective contents based on the rank-order. The search and retrieval algorithm includes a baseline algorithm, a partially server oriented algorithm, and/or a fully server oriented algorithm. The partially and/or fully server oriented algorithms use homomorphic and/or order preserving encryption for enabling search capability from a user other than an owner of the contents of the data collection. The confidentiality preserving method includes using term frequency for rank-ordering selective contents of the data collection, and retrieving the selective contents based on the rank-order.
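The property that makes order-preserving encryption useful for ranking is that comparisons on ciphertexts agree with comparisons on the underlying term frequencies. The sketch below illustrates only that property, using a toy keyed monotone mapping assumed here for demonstration; it is not a secure scheme and not the construction used in the application, and the key, the function name `ope_table` and the sample frequencies are hypothetical.

```python
import hashlib
import hmac

def ope_table(key, max_value):
    """Build a strictly increasing keyed mapping for 0..max_value (toy only)."""
    table, total = [], 0
    for v in range(max_value + 1):
        digest = hmac.new(key, str(v).encode(), hashlib.sha256).digest()
        # Each gap is at least 1, so the mapping is strictly increasing.
        total += 1 + int.from_bytes(digest[:2], "big")
        table.append(total)
    return table

key = b"owner-secret-key"  # hypothetical secret held by the content owner
enc = ope_table(key, 50)

# Hypothetical term frequencies of one query term in three documents.
tfs = {"d1": 3, "d2": 7, "d3": 1}
enc_tfs = {doc_id: enc[tf] for doc_id, tf in tfs.items()}

# A server holding only enc_tfs can still sort documents by relevance.
ranked = sorted(enc_tfs, key=enc_tfs.get, reverse=True)
```

The server never sees the values 3, 7 and 1, yet the order it computes matches the plaintext order. The cost of this convenience is that the order itself is revealed to whoever holds the ciphertexts.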

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

[0001] This application claims the benefit of provisional patent application U.S. Ser. No. 61/109,291, filed Oct. 29, 2008, which is expressly incorporated herein by reference.

GOVERNMENT SUPPORT CLAUSE

[0002] This invention was made with government support under H9823005C0425 awarded by NSA. The government has certain rights in the invention.

BACKGROUND OF INVENTION

[0003] a. Field of Invention

[0004] This invention relates to information search and retrieval. In particular, the instant invention relates to a system and method for information search and retrieval in large-scale encrypted databases, with a particular embodiment employing a confidentiality-preserving rank-ordered search.

[0005] b. Background Art

[0006] In today's information era, efficient and effective search capability of digital collections is essential in information management and knowledge discovery. At the same time, many data collections have to be stored in an encrypted form to li...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/30, H04L9/00
CPC: G06F17/30666, G06F21/6227, H04L2209/60, H04L9/3236, H04L9/008, G06F16/3335, G06F16/24578, G06F16/48, G06F16/951, G06F21/6218
Inventor: SWAMINATHAN, ASHWIN; MAO, YINIAN; SU, GUAN-MING; GOU, HONGMEI; VARNA, AVINASH L.; HE, SHAN; WU, MIN; OARD, DOUGLAS W.
Owner: UNIV OF MARYLAND