336 results about "Reverse index" patented technology

Database management systems provide multiple types of indexes to improve performance and data integrity across diverse applications. Index types include B-trees, bitmaps, and R-trees. In database management systems, a reverse key index strategy reverses the key value before entering it in the index. For example, the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where each new key value is greater than the prior value, i.e., values monotonically increase. Reverse key indexes have become particularly important in high-volume transaction processing systems because they reduce contention for index blocks.
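
The key reversal described above can be illustrated with a minimal sketch. It reverses the decimal digits of a key, matching the 24538 to 83542 example; real database engines may reverse the key at the byte level instead, and the function name `reverse_key` is hypothetical.

```python
def reverse_key(key: int) -> str:
    """Return the decimal digits of a non-negative key in reverse order."""
    if key < 0:
        raise ValueError("this sketch only handles non-negative keys")
    return str(key)[::-1]


# Consecutive sequence numbers share a long common prefix, so they would
# normally land next to each other in the index. After reversal their
# leading digits differ, which spreads inserts across index blocks.
sequence_keys = [24538, 24539, 24540]
reversed_keys = [reverse_key(k) for k in sequence_keys]
print(reversed_keys)
```

The reversed keys are kept as strings so that a trailing zero in the original value is preserved as a leading zero after reversal.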

Method For Information Retrieval

A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the one or more keywords, the reverse index further including a measure of confidence (MOC) value associated with the one or more keywords. One or more query terms are input into the search engine. The query terms are disambiguated and a MOC value is associated with each meaning of the disambiguated query term. A list of documents is retrieved containing the query terms wherein the documents are initially ranked based at least in part on the MOC values of the keywords and query terms. The list of documents may be re-ranked based at least in part on the semantic similarity of each document to the disambiguated query terms.
Owner:RGT UNIV OF CALIFORNIA

File system with access and retrieval of XML documents

An XML-aware file system exploits attributes encoded in an XML document. The file system presents a dynamic directory structure to the user, and breaks the conventional tight linkage between sets of files and the physical directory structure, thus allowing different users to see files organized in a different fashion. The dynamic structure is based upon content, which is extracted using an inverted index according to attributes and values defined by the XML structure.
Owner:IBM CORP

Tool for automatically mapping multimedia annotations to ontologies

A tool for learning to relate annotations and the transcript of a multimedia sequence to nodes in a formally or semi-formally represented ontology covering a broad range of possible multimedia documents. The device includes learning data preparation that involves certain special techniques for deriving data from past mappings of annotations to nodes in an ontology, building inverted indices that maintain certain special statistics, and a retriever that exploits these statistics to rank the relevance of the nodes in an ontology for a given set of new annotations.
Owner:KNUMI

Systems and methods for indexing content for fast and scalable retrieval

Inactive | US20050198076A1 | Fast and efficient and scalable retrieval | Fast and efficient generation | Digital data information retrieval | Digital data processing details | Paper document | Reverse index
Systems and methods for query processing and indexing of documents in connection with a content store in a computing system are provided. In various embodiments, an indexing model is provided that is optimized for fast, efficient and scalable retrieval of documents satisfying a query, including the mixed use of forward and inverted indexing representations, including algorithms for achieving a balance between the two representations. When processing queries, fast and efficient generation of reverse chronologically ordered posting lists is enabled for efficient execution of logical operators on query result sets. A term expand index is also provided wherein the overall terms included in the term expand index are decomposed into a plurality of lexicon files, which are combined when convenient for fast, scalable efficiency when performing queries of the content in the content store.
Owner:R2 SOLUTIONS
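
The abstract above mentions executing logical operators on reverse chronologically ordered posting lists. A minimal sketch of an AND operator over two such lists follows. It assumes that document IDs are assigned in time order, so a larger ID means a newer document; that assumption and the function name are illustrative and not taken from the patent.

```python
def intersect_descending(a: list[int], b: list[int]) -> list[int]:
    """AND two posting lists that are sorted newest-first (descending IDs)."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # document contains both terms
            i += 1
            j += 1
        elif a[i] > b[j]:
            i += 1             # a[i] is newer than anything left in b
        else:
            j += 1             # b[j] is newer than anything left in a
    return out


print(intersect_descending([9, 7, 4, 2], [8, 7, 2, 1]))
```

Because both inputs are already newest-first, the merge emits results newest-first as well, so a caller that only needs the most recent matches can stop early.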

File system with access and retrieval of XML documents

An XML-aware file system exploits attributes encoded in an XML document. The file system presents a dynamic directory structure to the user, and breaks the conventional tight linkage between sets of files and the physical directory structure, thus allowing different users to see files organized in a different fashion. The dynamic structure is based upon content, which is extracted using an inverted index according to attributes and values defined by the XML structure. In one application, a dynamically changing federated repository is searchable using a system of local and merged master indices, wherein query results are presented as virtual directory paths that are semantically organized.
Owner:IBM CORP

Systems and methods for indexing content for fast and scalable retrieval

Inactive | US20050120004A1 | Fast and efficient and scalable retrieval | Fast and efficient generation | Data processing applications | Digital data processing details | Paper document | Reverse index
Systems and methods for query processing and indexing of documents in connection with a content store in a computing system are provided. In various embodiments, an indexing model is provided that is optimized for fast, efficient and scalable retrieval of documents satisfying a query, including the mixed use of forward and inverted indexing representations, including algorithms for achieving a balance between the two representations. When processing queries, fast and efficient generation of reverse chronologically ordered posting lists is enabled for efficient execution of logical operators on query result sets. A term expand index is also provided wherein the overall terms included in the term expand index are decomposed into a plurality of lexicon files, which are combined when convenient for fast, scalable efficiency when performing queries of the content in the content store.
Owner:R2 SOLUTIONS

Method for segmenting and indexing scenes by combining captions and video image information

The invention relates to a method for segmenting and indexing scenes by combining captions and video image information. The method is characterized in that, for the duration of each caption, a video frame collection is used as the minimum unit of a scene cluster. The method comprises the steps of: after obtaining the minimum unit of the scene cluster, extracting at least three discontinuous video frames to form a video key frame collection for the caption; comparing the similarities of the key frames of a plurality of adjacent minimum units by using a bidirectional SIFT key point matching method and establishing an initial attribution relationship between the captions and the scenes by combining a caption-related transition diagram; for continuous minimum cluster units judged to be dissimilar, further judging whether the minimum cluster units can be merged according to the relationship between the minimum cluster units and the corresponding captions; and, according to the determined attribution relationships between the captions and the scenes, extracting the video scenes. For the segments of the extracted video scenes, the forward and reverse indexes generated from the caption texts contained in the segments are used as the foundation for indexing the video segments.
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI

Indexing and searching entity-relationship data

Method, system, and computer program product for indexing and searching entity-relationship data are provided. The method includes: defining a logical document model for entity-relationship data including: representing an entity as a document containing the entity's searchable content and metadata; dually representing the entity as a document and as a category; and representing each relationship instance for the entity as a category set that contains categories of all participating entities in the relationship. The method also includes: translating entity-relationship data into the logical document model; and indexing the entity-relationship data of the populated logical document model as an inverted index. The method may include searching indexed entity-relationship data using a faceted search, wherein the categories are all categories required for supporting faceted navigation.
Owner:IBM CORP

International information search and delivery system providing search results personalized to a particular natural language

Documents containing information about product offerings in various natural languages are passed through transitional translation layers which convert the data to a single computer language using a universal character set encompassing the character sets used in all supported natural languages. The documents are stored in their original natural languages and in English with documents segmented into components which components are identified by search terms arranged in a taxonomy tree based on product types. The names of the products in the national languages are added to the English language documents enabling quick keyword searches when the product name or number is known. A bi-directional inverted index is provided for access by the keyword search terms so that keywords with the same meaning in different languages are accessible together when the keyword in one of the languages is queried.
Owner:IBM CORP

Large-scale human face image searching method

The invention discloses a large-scale human face image searching method. The method comprises the following steps of preprocessing human face images; extracting local characteristics from the human face images; extracting overall geometrical characteristics from the human face images; quantifying the local characteristics; quantifying the overall geometrical characteristics; establishing a reverse index; searching a candidate human face image set; and re-arranging the candidate human face image set. By the method, an index for a large-scale human face image database can be established, quick human face search is realized, and search efficiency is improved. In addition, the accuracy of human face search is improved by embedding an auxiliary information characteristic quantifying algorithm and a candidate human face image set re-arranging algorithm. Effective and accurate large-scale human face image search is realized by the method, so that the method has higher use value.
Owner:NANJING UNIV

Music retrieval system based on audio fingerprint features

The invention belongs to the technical field of information retrieval, and particularly relates to a music retrieval system based on audio fingerprint features. The system is composed of a preprocessing module, a feature extraction module, a reverse index module and a fine matching module. The preprocessing module mainly carries out audio signal conversion, resampling and filtering; the feature extraction module is used for representing audio files, wherein the audio fingerprint features are adopted to select the most stable point from a frequency spectrum as the feature point through twice screening based on dynamic threshold values, and each feature is represented by a dot pair; according to the reverse index module, the features are used as key words, reverse indexes are built according to the features of a song library, and the index result is returned according to the number of the same key words; according to the fine matching module, the sequential relationship of the audio features is combined, an improved editing distance is adopted as the similarity of two feature sequences, and therefore the index result is optimized. The music retrieval system based on the audio fingerprint features is suitable for the retrieval of a large number of songs, and can particularly conduct effective retrieval on record inquiry segments.
Owner:FUDAN UNIV
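
The reverse index module above uses fingerprint features as keywords and returns candidates according to the number of shared keywords. A minimal sketch of that voting step follows. Representing each "dot pair" feature as a tuple of two integers, and the names `build_index` and `candidates`, are assumptions made for illustration; the fine matching stage is not shown.

```python
from collections import Counter, defaultdict


def build_index(library: dict[str, list[tuple[int, int]]]) -> dict:
    """Map each feature (used as a keyword) to the set of songs containing it."""
    index = defaultdict(set)
    for song, features in library.items():
        for feature in features:
            index[feature].add(song)
    return index


def candidates(index: dict, query_features: list[tuple[int, int]], top_k: int = 3):
    """Rank songs by how many distinct query features they share."""
    votes = Counter()
    for feature in set(query_features):
        for song in index.get(feature, ()):
            votes[song] += 1
    return votes.most_common(top_k)


library = {
    "song_a": [(1, 2), (3, 4), (5, 6)],
    "song_b": [(3, 4), (7, 8)],
}
index = build_index(library)
ranked = candidates(index, [(3, 4), (5, 6)], top_k=2)
print(ranked)
```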

System and Method for Providing a Trustworthy Inverted Index to Enable Searching of Records

A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
Owner:LINKEDIN

Index generation method and index generation device based on MapReduce programming architecture

The invention relates to an index generation method and an index generation device based on a MapReduce programming architecture. The index generation method comprises the following steps of: acquiring data, preparing the data into a unified format and storing the prepared data in the form of a record set; carrying out head encapsulation on each data record in the record set; inserting the encapsulated data records into an HBase cluster in batches; calling a MapReduce service and an HBase service in a Hadoop cluster and connecting a Solr cluster; carrying out a MapReduce operation and submitting a parallel index generation task to form a reverse index intermediate file; carrying out a Reduce operation to generate a reverse index file; and starting a new Map task to carry out a split operation on the reverse index file to generate a final index. According to the index generation method and the index generation device disclosed by the invention, efficient distributed storage of mass data and the establishment of the index can be realized; in addition, the index generation method and the index generation device have the advantages of extensibility, high fault tolerance, high performance and the like.
Owner:XIAMEN MEIYA PICO INFORMATION
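
The map and reduce stages of reverse index generation can be sketched in plain Python. This is not Hadoop, HBase or Solr code: the in-memory grouping stands in for the shuffle phase, and the function names and whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(doc_id: int, text: str):
    """Emit one (term, doc_id) pair per token in the record."""
    for term in text.lower().split():
        yield term, doc_id


def reduce_phase(term: str, doc_ids: list[int]):
    """Collapse all pairs for a term into one sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(records: dict[int, str]) -> dict[str, list[int]]:
    grouped = defaultdict(list)  # stands in for the shuffle between map and reduce
    for doc_id, text in records.items():
        for term, emitted_id in map_phase(doc_id, text):
            grouped[term].append(emitted_id)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


index = build_inverted_index({1: "big data index", 2: "index file index"})
print(index)
```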

Click distance determination

An efficient determination of a click distance value is made for each document in a corpus of documents from data included in a locally stored inverted index. The click distance is a measurement of the number of clicks or user navigations from a first document on the network to another document. Specialized words are included in the locally stored inverted index. The specialized words relate source documents to a set of target documents. A click distance is assigned to a source document when an inverted index is queried for the corresponding set of target documents according to a query that passes in one of the specialized words. The process is repeated for each document in the corpus of documents.
Owner:MICROSOFT TECH LICENSING LLC
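
One way to picture the click distance computation is a breadth-first traversal in which each step queries the inverted index with a specialized word that relates a source document to its targets. The sketch below assumes a hypothetical specialized-word format, `linkfrom:<source>`, and models the index as a dictionary; neither detail comes from the patent.

```python
from collections import deque


def click_distances(index: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first traversal that looks up link targets in the inverted index."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        source = queue.popleft()
        # Query the index with the specialized word for this source document.
        for target in index.get(f"linkfrom:{source}", ()):
            if target not in dist:
                dist[target] = dist[source] + 1
                queue.append(target)
    return dist


index = {
    "linkfrom:home": {"a", "b"},
    "linkfrom:a": {"c"},
    "linkfrom:c": {"home"},
}
distances = click_distances(index, "home")
print(distances)
```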

Indexing and caching strategy for local queries

The claimed subject matter relates to a computer-implemented architecture that can, at a high level, store query results in a location-independent manner in order to facilitate caching of local results. To store query results in a location-independent manner such that cached results to location-based queries can be useful, the architecture can further include a mechanism for encoding a surface or area (e.g., the earth) based upon document density rather than geography. The encoding mechanism can also organize an inverted index so that no join operation is required to return valid results to a location-based query. The architecture can also include a mechanism for determining when previously cached results are adequate to satisfy a query.
Owner:MICROSOFT TECH LICENSING LLC

Microblog-oriented emotion entity searching system

The invention relates to a microblog-oriented emotion entity searching system. The emotion entity searching system comprises a user interface (1), a query expansion module (2), a query processing module (3), an emotive information mining module (4), an emotive information judging and index building module (5) and a reverse index building module (6). The user interface (1) is used for interaction between a user and the system, and the user can submit a query request through the user interface and obtain a feedback result; the query expansion module (2) is used for carrying out word relation mining on microblog corpus data and building a weighting word relation graph in combination with a WordNet ontology base; the query processing module (3) is used for converting the query request of the user into query key words or query statements and for carrying out query expansion on the basis of the word relation graph built by the query expansion module (2), wherein the query key words or the query statements can be accepted by an index base; the emotive information mining module (4) is used for performing emotion mining on the microblog corpus base and generating a judging rule for emotion entities and emotion polarities; the emotive information judging and index building module (5) is used for judging the emotion entities and emotion polarities, building an emotive information index and storing the emotive information index; the reverse index building module (6) is used for building a reverse index for microblog text information and storing the reverse index. The microblog-oriented emotion entity searching system solves the problems that difficulty exists in microblog emotion entity extraction, emotion polarity analysis, emotion entity search and the like, and a novel intelligent searching product is provided for analyzing and monitoring social networking public opinions.
Owner:FOSHAN UNIVERSITY +2

Inverted indices in information extraction to improve records extracted per annotation

A method is provided for information extraction from among a multiplicity of documents each having a corresponding document object model (DOM) comprising: computing signatures associated with nodes of a multiplicity of DOMs corresponding to the multiplicity of documents; producing an index that associates computed signatures to each document that has a DOM that has one or more nodes corresponding to such signature; annotating one or more nodes of a DOM that corresponds to the at least one selected document; wherein the one or more annotated nodes respectively correspond to one or more respective signatures included in the index; and matching the signatures that correspond to the annotated nodes with signatures in the index to determine which documents from the multiplicity of documents have one or more DOM nodes that correspond to one or more of the annotated nodes.
Owner:R2 SOLUTIONS

Reverse index mixed compression and decompression method based on Hbase database

The invention discloses a reverse index mixed compression method based on an Hbase database. The reverse index mixed compression method comprises the steps of processing the Hbase database to obtain an Hbase database reverse index data table including keys and values; compressing the key part by a key dictionary compression method; compressing the value part by a variable bytecode compression method; and writing the compressed content into files. The invention also discloses a decompression method for the key part of a file compressed by the compression method. The decompression method comprises the steps of judging the length of each compressed data item and decompressing the data according to two cases, the length being less than or equal to 13 or the length being more than or equal to 25; otherwise, decompression fails. Because the method adopts classified mixed compression and classified decompression, the compression ratio is improved while a high decompression ratio is maintained as far as possible; file reading and data decompression are considered together; and the overall query efficiency of the reverse index can be improved and storage space can be saved.
Owner:CHENGDU UNIV OF INFORMATION TECH
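
For readers unfamiliar with variable-byte coding of posting values, a generic sketch follows. It uses the common 7-bits-per-byte scheme in which the high bit marks a continuation byte; this is an assumption for illustration and is not claimed to be the exact byte layout used by the patented method.

```python
def vb_encode(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte of this value
            return bytes(out)


def vb_decode(data: bytes) -> list[int]:
    """Decode a concatenation of variable-byte encoded integers."""
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


encoded = b"".join(vb_encode(v) for v in [5, 300, 16384])
print(len(encoded), vb_decode(encoded))
```

Small values take a single byte, which is why this family of encodings suits posting lists whose values are mostly small numbers.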

Automatic question and answer method and apparatus, and storage medium

Embodiments of the invention disclose an automatic question and answer method and apparatus, and a storage medium. The method comprises the steps of adopting multiple question and answer pairs formed based on social data in a social platform, wherein the question and answer pairs comprise questions and corresponding answers; then, establishing reverse indexes of the questions and corresponding word groups; obtaining a retrieval question; according to a question word group of the retrieval question and the reverse indexes, determining similar questions similar to the retrieval question; according to the similar questions and the question and answer pairs, obtaining candidate answers of the retrieval question, thereby obtaining a candidate answer set of the retrieval question; and selecting a target answer of the retrieval question from the candidate answer set. According to the scheme, the answer matched with the retrieval question can be output, so that the accuracy and quality of output answers of a chat robot system are improved.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Two-level n-gram index structure and methods of index building, query processing and index derivation

Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as terms and a front-end inverted index using n-grams extracted from the subsequences as terms. The back-end inverted index uses the subsequences of a specific length, extracted from the documents so as to overlap each other by n−1 (n: the length of the n-gram), as terms and stores position information of the subsequences occurring in the documents in a posting list for the respective subsequences. The front-end inverted index uses the n-grams of a specific length, extracted from the subsequences using a 1-sliding technique, as terms and stores position information of the n-grams occurring in the subsequences in a posting list for the respective n-grams.
Owner:KOREA ADVANCED INST OF SCI & TECH
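
The two-level structure above can be sketched as follows. Subsequences of length `m` are taken with a step of `m - (n - 1)` so that neighbours overlap by n−1 characters, and n-grams are then extracted from each distinct subsequence with a sliding window of step 1. The function name, the parameter `m`, and the handling of a short final subsequence (kept as-is rather than padded) are assumptions made for this sketch.

```python
from collections import defaultdict


def build_two_level(docs: dict[int, str], m: int, n: int):
    """Return (back_end, front_end) indexes for subsequence length m and n-grams."""
    if not 1 <= n <= m:
        raise ValueError("require 1 <= n <= m")
    step = m - (n - 1)            # adjacent subsequences overlap by n-1 characters
    back = defaultdict(list)      # subsequence -> [(doc_id, offset in document)]
    front = defaultdict(list)     # n-gram -> [(subsequence, offset in subsequence)]
    for doc_id, text in docs.items():
        for start in range(0, len(text), step):
            sub = text[start:start + m]
            if len(sub) < n:
                break             # document too short to hold a single n-gram
            back[sub].append((doc_id, start))
            if start + m >= len(text):
                break             # this subsequence already reaches the end
    for sub in back:              # each distinct subsequence is indexed once
        for off in range(len(sub) - n + 1):
            front[sub[off:off + n]].append((sub, off))
    return dict(back), dict(front)


back_end, front_end = build_two_level({1: "ABCDEF"}, m=4, n=2)
print(back_end)
print(front_end)
```

Because a repeated subsequence is indexed only once in the front-end index, the n-gram positions inside it are not stored again for every occurrence in the documents, which is the redundancy the abstract refers to.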

Cryptogram-based safe full-text indexing and retrieval system

The invention discloses a cryptogram-based safe full-text indexing and retrieval system. In the system, a cryptogram index library comprises a cryptogram entry reverse index and an internal document object set; a cryptogram document library is responsible for storing and managing an encrypted XML document; a word segmentation encryption server carries out Chinese word segmentation on a plaintext document and encrypts the plaintext document item by item; a cryptogram full-text indexing server standardizes an original plaintext document into an XML document, encrypts and stores the XML document in the cryptogram document library, creates a corresponding internal document object in the cryptogram index library by combining document metamessage, and creates a cryptogram reverse index for the XML document through the cryptogram entry; and a cryptogram full-text retrieval server retrieves the cryptogram index library to obtain the internal document object set through user authority information and the cryptogram entry, obtains a corresponding encrypted XML document result set from the cryptogram document library according to a pointer, decrypts the corresponding encrypted XML document result set, and returns the decrypted corresponding encrypted XML document result set to a user. The Chinese word segmentation method, the safe and high-efficiency indexing structure and the retrieval mechanism of the invention based on the special requirements of cryptogram full-text indexing can realize the cryptogram full-text indexing integrated with an access control strategy. The cryptogram-based safe full-text indexing and retrieval system has the advantages of a safe and high-efficiency indexing process, no decrypted docuterms in the indexing process, a high recall ratio and a high precision ratio in a cryptogram environment, and the like.
Owner:HUAZHONG UNIV OF SCI & TECH

System and method for distributed index searching of electronic content

Inactive | US20100094877A1 | Constant run-time search | Constant bandwidth hit | Web data retrieval | Digital data processing details | Traffic capacity | Distributed index
There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.
Owner:GARBE WOLF
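
A rough sketch of how a Bloom filter stored alongside posting entries lets one node answer a multi-term query without fetching posting lists from other nodes. It assumes one plausible arrangement, in which each posting entry carries a filter of that document's other keywords; the class, function names and parameters are hypothetical and not taken from the patent.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; the bit array is stored in a Python integer."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def make_filter(keywords: list[str]) -> BloomFilter:
    bloom = BloomFilter()
    for keyword in keywords:
        bloom.add(keyword)
    return bloom


def search_on_one_node(postings: dict, first_term: str, other_terms: list[str]):
    """Filter the first term's posting list using each entry's Bloom filter."""
    return [doc_id for doc_id, bloom in postings.get(first_term, [])
            if all(bloom.might_contain(term) for term in other_terms)]


postings = {
    "peer": [("d1", make_filter(["search", "bloom"])),
             ("d2", make_filter(["network"]))],
}
hits = search_on_one_node(postings, "peer", ["bloom"])
print(hits)
```

A Bloom filter never produces false negatives, so every true match is returned; occasional false positives are possible and would need a later verification step.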

Method based on Hadoop small file optimization and reverse index establishment

The invention discloses a method based on Hadoop small file optimization and reverse index establishment. A large number of small files can be uploaded to an HDFS, and reverse indexes can be established for the files in the HDFS. The method comprises small file optimization and reverse index establishment and mainly comprises the steps that (1) a user uploads a large number of small files corresponding to HDFS blocks in size to small file queues in Hadoop; (2) the size of the small files in the file queues is calculated regularly; (3) the files in the small file queues that meet the requirements are combined through the Sequence file method and then uploaded to the HDFS; and (4) the reverse indexes are established for the files in the HDFS. According to the method, the inconvenience of processing small files in Hadoop is overcome, the processing performance for small files can be optimized, memory is released, and retrieval speed and efficiency are improved.
Owner:SOUTHEAST UNIV

Distributed type reverse index organization method based on user log analysis

The invention discloses a distributed type reverse index organization method based on user log analysis. The distributed type reverse index organization method comprises the following steps: 1) analyzing query logs of the user, extracting high-frequency words and non-high-frequency words, establishing a relativity matrix of the high-frequency words, and establishing a high-frequency word relation graph according to the relativity of the high-frequency words; 2) calculating the load of each high-frequency word, and clustering the high-frequency words according to the high-frequency word relation graph and the loads of the high-frequency words; 3) distributing the clusters to nodes, establishing a high-frequency word index, hashing non-high-frequency words to the nodes, and establishing a non-high-frequency word index; 4) establishing a global index table according to the high-frequency word index and the non-high-frequency word index, and routing queries according to the global index table. The distributed type reverse index organization method disclosed by the invention has the advantages of small query cost, high query efficiency, and favorable query performance; it can also balance the throughput of the entire system against the response speed of each query, and fewer nodes are involved when a query contains a plurality of words.
Owner:ZHEJIANG UNIV
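
Steps 3) and 4) above can be pictured with a small routing sketch. It assumes the clustering has already produced a table that maps each high-frequency word to a node, and it hashes every other word to a node with a stable hash; the table contents and function names are hypothetical.

```python
import hashlib


def node_for_rare_word(word: str, num_nodes: int) -> int:
    """Hash a non-high-frequency word to a node (stable across runs)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


def route_query(words: list[str], cluster_table: dict[str, int], num_nodes: int) -> set[int]:
    """Return the set of nodes that must be contacted to answer the query."""
    nodes = set()
    for word in words:
        if word in cluster_table:
            nodes.add(cluster_table[word])        # high-frequency word: use its cluster's node
        else:
            nodes.add(node_for_rare_word(word, num_nodes))
    return nodes


# Words that often appear together in queries were clustered onto the same node.
cluster_table = {"weather": 0, "forecast": 0, "news": 1}
print(route_query(["weather", "forecast"], cluster_table, num_nodes=4))
```

Placing related high-frequency words on one node is what lets a multi-word query touch fewer nodes.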

Multimedia courseware retrieval system based on voice keyword recognition

Inactive | CN103956166A | Good effect | The effect is: the use of speech recognition technology to automatically retrieve good | Speech recognition | Special data processing applications | Habilitation training | Hidden Markov model
The invention provides a multimedia courseware retrieval system based on voice keyword recognition. First, the back end converts pre-provided text knowledge points into voice models, the courseware is labeled by a voice recognition technique based on a hidden Markov model to locate the accurate positions of the knowledge points in the multimedia courseware, and a reverse index based on keywords is constructed and maintained in an index module. When a user inputs text keywords in a prompt box to query, results in the index are extracted and displayed to the user if the keywords have previously been labeled. If the keywords have not been labeled, the system can retrieve the courseware in real time, wait for the user's feedback on the results and compile statistics on the feedback information. Self-adaptive training is carried out on the keywords to label the courseware again and update the indexes. Compared with a traditional network course learning system, the courseware retrieval system can quickly search for and locate the keywords of the knowledge points, improves retrieval accuracy through user interaction and ultimately improves the learning efficiency of students.
Owner:EAST CHINA UNIV OF SCI & TECH

A method for storage and near-real time query of time-sensitive data based on open source big data

The invention provides a method for storage and near-real time query of time-sensitive data based on open source big data. The method comprises the steps of establishing a near-real time query processing platform having an internal storage space and an external storage space; defining a file storage strategy and performing data processing and calculating on source data files in the internal storage space so that the source data files are stored in the external storage space after being arranged according to the time-sensitive characteristics thereof; performing reverse index with the time-sensitive characteristics of the data files as the filter conditions, establishing point index and range index to generate index information and storing the index information into the external storage space and caching the information into the internal storage space; inquiring the index information and searching the point index or range index to obtain relevant file path lists, and reading source data files corresponding to query requests according to the file path lists. Fully based on the time-sensitive characteristics, the data filter strategy is designed to reduce data scanning quantity, and thus the near-real time query feedback of big data is realized.
Owner:EAST CHINA NORMAL UNIV
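
The point index and range index over time-sensitive characteristics can be sketched as a mapping from a time key to file paths, plus a sorted key list for range lookups. The class name, the use of ISO date strings as keys, and the file paths are all hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right, insort


class TimeIndex:
    """Maps a time key (here an ISO date string) to the files that hold its data."""

    def __init__(self):
        self.point = {}        # time key -> list of file paths (point index)
        self.sorted_keys = []  # time keys kept sorted for range lookups

    def add(self, key: str, path: str) -> None:
        if key not in self.point:
            self.point[key] = []
            insort(self.sorted_keys, key)
        self.point[key].append(path)

    def lookup(self, key: str) -> list[str]:
        return list(self.point.get(key, []))

    def lookup_range(self, start: str, end: str) -> list[str]:
        """Return paths for all keys in the inclusive range [start, end]."""
        lo = bisect_left(self.sorted_keys, start)
        hi = bisect_right(self.sorted_keys, end)
        return [p for k in self.sorted_keys[lo:hi] for p in self.point[k]]


idx = TimeIndex()
idx.add("2020-01-02", "/data/b")
idx.add("2020-01-01", "/data/a")
idx.add("2020-01-05", "/data/c")
print(idx.lookup_range("2020-01-01", "2020-01-03"))
```

Only the files returned by the index need to be read, which is how filtering on the time key reduces the amount of data scanned.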