Full text query and search systems and methods of use
A retrieval system and search engine technology in the field of information technology and software. It addresses problems with existing search tools: they are difficult to use, fail to capture the user's intent, and return a large number of hits.
Examples
[0118] Example 1: Implementation of Theoretical Model
[0119] Details of a specific embodiment of the search engine of the present invention will be disclosed in this section.
[0120] 1. Introduction to the FlatDB programs
[0121] FlatDB is a set of C programs for working with flat-file databases, that is, tools that can handle flat text files containing large amounts of data. The files may be in any format, such as tabular, XML, or FASTA, as long as each entry has a unique primary key. Typical applications include large sequence databases (GenPept, dbEST), human gene ranking or other gene banks, PubMed, Medline, etc.
[0122] The tool set includes an indexing program, a retrieval program, an insertion program, an update program, and a deletion program. For very large entries, there is also a procedure for retrieving a specific part of an entry. Unlike SQL, FlatDB does not support links between different files. For example, ...
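The text does not list the indexing and retrieval programs themselves. The following is a minimal Python sketch of the idea (not the FlatDB C code), assuming a FASTA-style file in which each entry starts with a line ">KEY ..." and KEY is the unique primary key. The names build_index, fetch_entry, and fetch_part are hypothetical. The index stores only each entry's byte offset and length, so a whole entry, or just one part of a very large entry, can be read with a single seek.

```python
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical index layout: primary key -> (byte offset of entry, entry length in bytes).
Index = Dict[str, Tuple[int, int]]


def build_index(f: BinaryIO) -> Index:
    """Scan a FASTA-style flat file once, recording where each entry starts and ends.

    Assumes each entry begins with a line '>KEY ...' and that KEY is unique.
    Any bytes before the first '>' line are skipped.
    """
    index: Index = {}
    key, start, offset = None, 0, 0
    for line in iter(f.readline, b""):
        if line.startswith(b">"):
            if key is not None:
                index[key] = (start, offset - start)  # close the previous entry
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"entry at byte {offset} has no key")
            key = fields[0].decode()
            if key in index:
                raise ValueError(f"duplicate key: {key}")
            start = offset
        offset += len(line)
    if key is not None:
        index[key] = (start, offset - start)  # close the last entry
    return index


def fetch_entry(f: BinaryIO, index: Index, key: str) -> bytes:
    """Return the full text of one entry using a single seek."""
    start, length = index[key]
    f.seek(start)
    return f.read(length)


def fetch_part(f: BinaryIO, index: Index, key: str, rel_offset: int, size: int) -> bytes:
    """Return up to `size` bytes of an entry, starting `rel_offset` bytes into it."""
    start, length = index[key]
    if not 0 <= rel_offset <= length:
        raise ValueError("offset outside entry")
    f.seek(start + rel_offset)
    return f.read(min(size, length - rel_offset))


if __name__ == "__main__":
    data = io.BytesIO(b">P1 first\nACGT\n>P2 second\nGGCC\nTTAA\n")
    idx = build_index(data)
    print(idx)                                 # {'P1': (0, 15), 'P2': (15, 21)}
    print(fetch_part(data, idx, "P2", 11, 4))  # b'GGCC'
```

Insertion, update, and deletion are not sketched here, since the text does not describe how FlatDB maintains the index when entries change.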
[0219] Example 2: An example database for Medline
[0220] The processed database files are listed below:
[0221] 1) Medline.raw: the raw database, downloaded from NLM in XML format.
[0222] 2) Medline.fasta: the processed database.
[0223] Parsed entries follow the FASTA format:
[0224] >primary_id author.(year) title.journal.column: page number-page number
[0225] word1(freq)word2(freq)...
[0226] The words listed are those selected as features.
[0227] 3) Medline.pid2bid: mapping from primary_id (pid) to binary_id (bid)
[0228] Medline.bid2pid: mapping from binary_id (bid) to primary_id (pid)
[0229] primary_id is the identifier given in the FASTA file; it is the unique identifier used by Medline. binary_id is an ID that we assign in order to save space.
[0230] Medline.pid2bid is a tabular file. Format: primary_id binary_id (sorted by primary_id)
[0231] Medline.bid2pid is a tabular file. Format: binary_id primary_id (sorted by binary_id)
[0232] ...
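The Medline.fasta entry layout and the two ID mapping files described above can be sketched in Python as follows. This illustrates the file formats under stated assumptions and is not the patent's processing code: the header is split at the first space into primary_id and the citation text, word(freq) items are matched with a regular expression that assumes words contain no parentheses or whitespace, and binary_id values are assigned as consecutive integers in input order, because the text does not say how they are chosen. Each mapping table is returned ordered by its first column. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches one "word(freq)" item; assumes words contain no parentheses or whitespace.
WORD_FREQ = re.compile(r"([^()\s]+)\((\d+)\)")


def parse_entry(header: str, words: str) -> Tuple[str, str, Dict[str, int]]:
    """Split one Medline.fasta entry into (primary_id, citation, word frequencies)."""
    if not header.startswith(">"):
        raise ValueError("header line must start with '>'")
    primary_id, _, citation = header[1:].partition(" ")
    freqs = {w: int(n) for w, n in WORD_FREQ.findall(words)}
    return primary_id, citation.strip(), freqs


def assign_binary_ids(
    primary_ids: List[str],
) -> Tuple[List[Tuple[str, int]], List[Tuple[int, str]]]:
    """Give each primary_id a compact integer binary_id (0, 1, 2, ... in input order).

    Returns the pid2bid rows ordered by primary_id and the bid2pid rows
    ordered by binary_id.
    """
    bids: Dict[str, int] = {}
    for pid in primary_ids:
        if pid in bids:
            raise ValueError(f"duplicate primary_id: {pid}")
        bids[pid] = len(bids)
    pid2bid = sorted(bids.items())
    bid2pid = sorted((bid, pid) for pid, bid in bids.items())
    return pid2bid, bid2pid
```

One plausible source of the space saving is that a small integer binary_id fits in a fixed-width field (for example, 4 bytes for values below 2^32), whereas primary_id is a string.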
[0257] Example 3: How to generate a phrase dictionary
[0258] 1. Theoretical Aspects of Phrase Search
[0259] A phrase search is a search performed with a string of words rather than a single word. For example, a person might look for information about teenage abortion. Taken alone, each of these words has a different meaning and retrieves many unrelated documents, but when the words are combined in order, they express the idea of "teenage abortion" very precisely. From this perspective, a phrase contains more information than the combination of its individual words.
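The difference between matching the individual words and matching the phrase can be illustrated with a small Python sketch. The tokenizer (lower-cased alphabetic runs), the function names, and the three sample documents are made up for illustration; the point is only that a document can contain both words without expressing the idea of the phrase.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def contains_all_words(doc: str, query: str) -> bool:
    """Bag-of-words match: every query word occurs somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(doc))


def contains_phrase(doc: str, query: str) -> bool:
    """Phrase match: the query words occur consecutively and in the same order."""
    words, phrase = tokenize(doc), tokenize(query)
    n = len(phrase)
    return n > 0 and any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


if __name__ == "__main__":
    docs = [
        "Teenage abortion rates declined in several regions.",
        "Abortion law was debated; a teenage driver study was also published.",
        "Teenage sleep patterns.",
    ]
    print([contains_all_words(d, "teenage abortion") for d in docs])  # [True, True, False]
    print([contains_phrase(d, "teenage abortion") for d in docs])     # [True, False, False]
```

The second document matches both words but is not about the phrase's topic, which is the kind of unrelated hit the paragraph describes.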
[0260] To perform a phrase search, we first need to generate a phrase dictionary and a distribution function for the given database, just as we have for individual words. A programmatic method for generating the phrase distribution of any given text database is disclosed herein. From a purely theoretical point of view, for any 2 words, 3 words, ..., K words, the frequency of each ca...
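The exhaustive counting that the passage begins to describe (the frequency of every run of 2, 3, ..., K words) can be sketched in Python as below. The passage is cut off before the details, so the input format (each document already split into words), counting at every position, and the min_count cutoff for admitting a phrase to the dictionary are assumptions; phrase_frequencies and phrase_dictionary are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Phrase = Tuple[str, ...]


def phrase_frequencies(docs: Iterable[List[str]], k: int) -> Counter:
    """Count every run of 2..k consecutive words across tokenized documents."""
    counts: Counter = Counter()
    for words in docs:
        for n in range(2, k + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def phrase_dictionary(counts: Counter, min_count: int) -> Dict[Phrase, int]:
    """Keep only phrases seen at least min_count times (hypothetical cutoff)."""
    return {p: c for p, c in counts.items() if c >= min_count}
```

The number of candidate runs grows with both K and the size of the database, which is why some cutoff of this kind is likely needed in practice; the value of min_count is not given in the source.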