125 results for "Approximate matching" patented technology

Approximate matching is a term used in computer forensics to mean that two objects have similar, but not identical, contents. It replaced the previously used terms "similarity hashing" and "fuzzy hashing".
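To make "similar but not identical" concrete, here is a minimal sketch of one way to score two byte strings: the Jaccard similarity of their sets of byte 4-grams. This is an illustrative assumption, not ssdeep, sdhash or any other named forensic tool, and the function names and the n-gram length are hypothetical choices.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the byte n-gram sets of a and b, from 0.0 (disjoint) to 1.0."""
    ga, gb = byte_ngrams(a, n), byte_ngrams(b, n)
    if not ga and not gb:
        # Both inputs are shorter than n: fall back to plain equality.
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)


original = b"The quick brown fox jumps over the lazy dog"
edited = b"The quick brown fox jumped over the lazy dog"
print(similarity(original, edited))  # strictly between 0.0 and 1.0
```

Identical inputs score 1.0 and inputs with no shared 4-gram score 0.0; a small edit lowers the score only in proportion to the n-grams it disturbs, which is the property approximate matching relies on.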

Associating and linking compact disc metadata

Improved techniques for enhancing, associating, and linking various sources of metadata for music files, to allow integration of commercially generated metadata with user-entered metadata, and to ensure that metadata provided to the user is of the highest quality and accuracy available, even when the metadata comes from disparate sources having different levels of credibility. The invention further provides improved techniques for identifying approximate matches when querying metadata databases, and also provides improved techniques for accepting user submissions of metadata, for categorizing user submissions according to relative credibility, and for integrating user submissions with existing metadata.
Owner:R2 SOLUTIONS

Intelligent data storage and processing using fpga devices

A data storage and retrieval device and method are disclosed. The device includes at least one magnetic storage medium configured to store target data and at least one reconfigurable logic device, comprising an FPGA, that is coupled to the magnetic storage medium and configured to read a continuous stream of target data from it, having been configured with a template, or otherwise as desired, to fit the type of search and the data being searched. The reconfigurable logic device is configured to receive at least one search inquiry in the form of a data key and to determine a match between the data key and the target data as the data is being read from the magnetic storage medium. The device and method can perform a variety of searches on the target data including, without limitation, exact and approximate match searches, sequence match searches, image match searches and data reduction searches. The device and method may be provided as part of a stand-alone computer system, embodied in a network attached storage device, or otherwise provided as part of a computer LAN or WAN. In addition to search and data reduction operations, the device may also perform a variety of other processing operations, including encryption, decryption, compression, decompression, and combinations thereof.
Owner:IP RESERVOIR
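The FPGA in this disclosure compares a data key against target data as the data streams off the disk. A rough software analogue, assuming "approximate match" means a bounded number of mismatched bytes (a Hamming-distance threshold), keeps a sliding window the length of the key and tests it at every offset. The function name `approximate_scan` and its parameters are hypothetical, and nothing here reflects the patented hardware design.

```python
from collections import deque


def approximate_scan(stream, key, max_mismatches):
    """Yield start offsets where key matches the byte stream with at most max_mismatches differing bytes."""
    if not key:
        raise ValueError("key must be non-empty")
    window = deque(maxlen=len(key))  # the most recent len(key) bytes, like a shift register
    for pos, byte in enumerate(stream):
        window.append(byte)
        if len(window) == len(key):
            mismatches = sum(w != k for w, k in zip(window, key))
            if mismatches <= max_mismatches:
                yield pos - len(key) + 1


# The stream is consumed one byte at a time, so it can be any iterable of ints.
print(list(approximate_scan(b"xxhellxxhelloxx", b"hello", 1)))  # [2, 8]
```

Offset 2 (`hellx`) qualifies with one mismatch and offset 8 is an exact hit; with `max_mismatches=0` only offset 8 is reported.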

Method and system for approximate string matching

A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes, each node representing at least one character in an alphabet, to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node, comparing each node of a branch to characters in the target string and adding the characters traversed in that branch to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a word or word fragment and the target string is longer than the gathered string, the method loops back to the root node and continues the traversal from there. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. At each node, the method also determines whether a correction rule applies to one or more characters in the remainder of the target string and, if so, applies the correction rule to the target string to obtain a modified target string.
Owner:IBM CORP
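The loop-back-to-root step is what lets a lexicon of fragments cover compound words. A minimal sketch of just that step, assuming exact character matching and leaving out the correction rules the abstract also describes: whenever the walk reaches a node flagged as a word, it restarts from the root on the rest of the target. `TrieNode`, `build_trie` and `segment` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if the path from the root spells a lexicon entry


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(root, target, start=0):
    """Return every list of lexicon entries that concatenates to target[start:]."""
    if start == len(target):
        return [[]]
    results = []
    node = root
    for i in range(start, len(target)):
        node = node.children.get(target[i])
        if node is None:
            break
        if node.is_word:
            # Word boundary: loop back to the root for the remainder of the target.
            for rest in segment(root, target, i + 1):
                results.append([target[start:i + 1]] + rest)
    return results


root = build_trie(["sun", "flower", "sunflower", "pot"])
print(segment(root, "sunflowerpot"))  # [['sun', 'flower', 'pot'], ['sunflower', 'pot']]
```

Because the walk keeps going past a word boundary as well as restarting at it, both the fragment split and the longer lexicon entry are found.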

Method and System for High Performance Data Metatagging and Data Indexing Using Coprocessors

Status: Active. Publication: US20080114725A1. Keywords: Robust and high performance data searching; High index; Web data indexing; File access structures; Data stream; Coprocessor
Disclosed herein is a method and system for hardware-accelerating the generation of metadata for a data stream using a coprocessor. Using these techniques, data can be richly indexed, classified, and clustered at high speeds. Reconfigurable logic such as field programmable gate arrays (FPGAs) can be used by the coprocessor for this hardware acceleration. Techniques such as exact matching, approximate matching, and regular expression pattern matching can be employed by the coprocessor to generate the desired metadata for the data stream.
Owner:IP RESERVOIR

System and method for normalization of a string of words

The present invention relates generally to a system and method for categorization of strings of words. More specifically, the present invention relates to a system and method for normalizing a string of words for use in a system for categorization of words in a predetermined categorization scheme. A method for adaptive categorization of words in a predetermined categorization scheme may include receiving a string of text, tagging the string of text, and normalizing the string of text. Normalization may be performed with a three-stage algorithm including a literal match processing stage, an approximation match processing stage, and a nearest neighbor match processing stage. The normalized string of text can be compared to a number of sequences of text in the predetermined categorization scheme.
Owner:NUANCE COMM INC
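The three-stage cascade can be sketched as below. The abstract does not say how each stage scores strings, so the choices here are assumptions: stage 1 is a case- and whitespace-insensitive literal lookup, stage 2 uses Python's `difflib.get_close_matches` with a hypothetical cutoff of 0.85, and stage 3 picks the scheme entry with the highest token-set Jaccard similarity. The function name `normalize` is also hypothetical.

```python
import difflib


def normalize(text, scheme, cutoff=0.85):
    """Map text to an entry of scheme via literal, approximate, then nearest-neighbour matching."""
    cleaned = " ".join(text.lower().split())
    lowered = {entry.lower(): entry for entry in scheme}

    # Stage 1: literal match after folding case and whitespace.
    if cleaned in lowered:
        return lowered[cleaned], "literal"

    # Stage 2: approximate match on the whole string (difflib ratio >= cutoff).
    close = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=cutoff)
    if close:
        return lowered[close[0]], "approximate"

    # Stage 3: nearest neighbour by token-set Jaccard similarity, if any token overlaps.
    tokens = set(cleaned.split())

    def jaccard(entry):
        other = set(entry.split())
        union = tokens | other
        return len(tokens & other) / len(union) if union else 0.0

    best = max(lowered, key=jaccard)
    return (lowered[best], "nearest") if jaccard(best) > 0 else (None, "none")


scheme = ["Checking Account", "Savings Account", "Credit Card"]
print(normalize("chekcing account", scheme))  # ('Checking Account', 'approximate')
print(normalize("card credit", scheme))       # ('Credit Card', 'nearest')
```

A misspelling is caught by stage 2, while a reordered phrase, which scores poorly on character similarity, falls through to stage 3.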

System and method for dynamic learning

New language constantly emerges from complex, collaborative human-human interactions such as meetings, for example when a presenter handwrites a new term on a whiteboard while also speaking it, so that the term is conveyed redundantly. The system and method described include devices for receiving various types of human communication activity (e.g., speech, writing and gestures) presented in a multimodally redundant manner; processors and recognizers for segmenting or parsing the input and then recognizing selected sub-word units such as phonemes and syllables; and alignment, refinement, and integration modules that find an exact or at least an approximate match to the one or more terms presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
Owner:ADAPX INC

Fuzzy matching-based Chinese geo-code determination method

The invention discloses a fuzzy matching-based Chinese geo-code determination method comprising the following steps: A1, reading in descriptive Chinese address information and splitting the original address with a forward maximum searching method, using the administrative-region levels as breakpoints, to obtain an array of original address elements; A2, standardizing the original address elements against an address dictionary; and A3, reading a standard address tree and matching the original address element array against it with a branch-and-bound algorithm, while fuzzy rules control the matching. After the keywords of the split address are obtained, the matching result with the highest evaluation score is taken as the closest approximate match, yielding a more accurate matched address. The method offers a rational address model, a relatively high matching rate and high speed.
Owner:ZHEJIANG UNIV OF TECH
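Step A1 splits the address with a forward maximum searching method. Assuming this refers to classic forward maximum matching segmentation, a minimal sketch looks like this: at each position, take the longest dictionary entry that fits, falling back to a single character. The dictionary below is a hypothetical stand-in for the patent's address dictionary, and the administrative-level breakpoints are not modeled.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Split text greedily, always taking the longest dictionary entry at the current position."""
    if max_len is None:
        max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:  # unknown characters become 1-char tokens
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionary: Zhejiang (Province), Hangzhou (City), Xihu District.
address_dict = {"浙江", "浙江省", "杭州", "杭州市", "西湖区"}
print(forward_max_match("浙江省杭州市西湖区", address_dict))  # ['浙江省', '杭州市', '西湖区']
```

Because both "杭州" (Hangzhou) and "杭州市" (Hangzhou City) are in the dictionary, the greedy longest-first rule is what selects the full administrative unit.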

Fuzzy search using progressive relaxation of search terms

Disclosed herein is a computer-implemented method and system that progressively relaxes search terms provided by a user. Data of predefined types is stored in a database; the stored data is obtained by uniquely modifying previously stored data based on the predefined types. Search terms of predefined types are accepted from the user. If the length of the search terms exceeds a predefined value, the search terms are compared with the stored data to find exact matches. If no exact matches are found, the accepted search terms are modified uniquely based on the predefined types to form first alternative queries, which are compared with the stored data to find exact matches. If there are still no exact matches, the first alternative queries are modified based on the predefined types to form second alternative queries, which are compared with the stored data to find approximate matches.
Owner:DEEM
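The relaxation cascade can be sketched as follows. The abstract leaves the type-specific modifications unspecified, so both relaxations here are hypothetical: the first lower-cases and strips punctuation, and the second also drops non-initial vowels before an approximate comparison with `difflib.SequenceMatcher`. `min_length` and `cutoff` are assumed parameters, and the function names are illustrative.

```python
import difflib
import re


def relax_once(term):
    """Hypothetical first relaxation: lower-case and keep only letters and digits."""
    return re.sub(r"[^0-9a-z]", "", term.lower())


def relax_twice(term):
    """Hypothetical second relaxation: additionally drop vowels after the first character."""
    first = relax_once(term)
    return first[:1] + re.sub(r"[aeiou]", "", first[1:])


def progressive_search(term, stored, min_length=3, cutoff=0.8):
    # Exact match, attempted only when the term is longer than min_length.
    if len(term) > min_length and term in stored:
        return [term], "exact"
    # First alternative query: still compared for exact equality.
    hits = [s for s in stored if relax_once(s) == relax_once(term)]
    if hits:
        return hits, "first alternative"
    # Second alternative query: compared approximately, best scores first.
    query = relax_twice(term)
    scored = [(difflib.SequenceMatcher(None, query, relax_twice(s)).ratio(), s) for s in stored]
    return [s for score, s in sorted(scored, reverse=True) if score >= cutoff], "approximate"


stored = ["O'Hare Airport", "JFK Airport", "Heathrow Airport"]
print(progressive_search("o hare airport", stored))  # (["O'Hare Airport"], 'first alternative')
print(progressive_search("Heathrw Airprt", stored))  # (['Heathrow Airport'], 'approximate')
```

Each stage runs only when the previous one finds nothing, so cheap exact lookups handle most queries and the costlier approximate scan is a last resort.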

Methods and arrangements including data migration among computing platforms, e.g. through use of steganographic screen encoding

An illustrative implementation of the technology includes three primary components: a desktop application, a mobile phone application, and connections to retailer inventory and pricing APIs (e.g., for Walmart and/or Best Buy). The experience begins with the consumer visiting an online retailer's website (e.g., Amazon) to search for a product. The desktop application automatically searches for matching products using the APIs of affiliated retailers. If matches or near-matches of the product are found, the product name, model, price, and local availability at affiliate locations are shown. With a mobile phone camera scan of the product page, the relevant information is transferred to the consumer's phone. From there, the consumer can interact with the options on the phone to be directed to a nearby brick-and-mortar store of choice that carries the product at the desired price. Along the way, the retailer can present offers and additional product information directly to the consumer. A great variety of other technologies and arrangements are also detailed.
Owner:DIGIMARC CORP

Selecting Representative Images for Establishments

Establishments are identified in geo-tagged images. According to one aspect, text regions are located in a geo-tagged image, and the text strings in those regions are recognized using Optical Character Recognition (OCR) techniques. Text phrases are extracted from information associated with establishments known to be near the geographic location specified in the image's geo-tag. The text strings recognized in the image are compared with the establishments' phrases to find approximate matches, and an establishment is selected as the one shown in the image based on those approximate matches. According to another aspect, text strings recognized in a collection of geo-tagged images are compared with phrases for establishments in the geographic area identified by the geo-tags to generate scores for image-establishment pairs. Using these scores, the establishments in each image of a large collection, as well as representative images showing each establishment, are identified.
Owner:GOOGLE LLC

Sensitive word filtering method and system

The invention relates to the field of multi-pattern string matching and discloses a sensitive word filtering method. The method comprises: managing Chinese, English and website sensitive words, as well as exclusion words; applying character normalization; applying a set of filtering policies and implementations for sensitive words in their different forms, including at least filtering steps for Chinese, English, websites, full spelling, pinyin compiling and anagrams; setting a group of criterion rules for sensitive words; and applying an approximate matching method for Chinese sensitive words. The invention also discloses a sensitive word filtering apparatus. The method and apparatus meet the needs of content administrators and searchers to filter sensitive words from published or searched text; a large number of sensitive words can be filtered quickly and accurately; and the sensitive words, their levels and their positions in the text can be returned to the caller.
Owner:北京中科汇联科技股份有限公司 (Beijing Zhongke Huilian Technology Co., Ltd.)

Two-stage data validation and mapping for database access

Input data queries directed at a plurality of target databases and originating from any of a plurality of sources are first converted to validated canonical forms, which are then used to query the target databases. Specifically, upon receiving an input data query, a relatively accurate reference database is selected based on the type of the input data. This reference database is then queried for the input data with the intent of finding an exactly matching record, or a near-matching record that can be treated as an exact match, thereby validating the input data. If no such record is found, the requesting source is instructed to provide a new query. Once a validated record is obtained, it is converted to a canonical form, which is then used to query the intended target databases. In a further embodiment, multiple reference databases are queried to determine one or more canonical forms of the data.
Owner:NYTELL SOFTWARE LLC

Method and apparatus for finding differences between two computer files efficiently in linear time and for using these differences to update computer files

A method of updating a computer file from an old version to a new version comprises dividing the new file and the old file into fixed-size blocks, maintaining a window (a collection of contiguous blocks) for each file on which lookup preprocessing has been performed, and performing match processing on each new-file block in turn (comparing it against both the old and new windows) using a key-sampling technique combined with approximate matching. For each new-file block, the match information is then optimized for coding efficiency and encoded into a patch file that describes an algorithm for converting the old file into the new file. The patch-application method and apparatus then perform the algorithm described in the patch file. The method uses a fixed amount of random-access memory regardless of the sizes of the two files and uses no temporary mass storage. In addition, its running time is roughly proportional to the size of the new file, and it allows parallel processing to reduce the time required. The system and method produce patch files that are smaller than those produced by prior systems and methods, and allow the operator to make an efficiency/effectiveness trade-off.
Owner:POCKET SOFT

Systems and methods for building an electronic dictionary of multi-word names and for performing fuzzy searches in the dictionary

The present invention automatically builds a contracted dictionary from a given list of multi-word proper names and performs fuzzy searches in that dictionary. The contracted dictionary of proper names consists of two linked trie-based dictionaries: the first stores single-word names, each with an ID number, and the second stores multi-word names encoded as sequences of ID numbers. Information related to the multi-word names is also stored as a gloss on the terminal node of each multi-word entry of the trie-based dictionary. An approximate lookup for a multi-word name is first conducted for each word of the name using an approximate matching technique such as phonetic proximity or simple edit distance, yielding N suggestions for each word. Multi-word candidates are then assembled in ID notation. Finally, an approximate search is performed for each assembled candidate based on edit distance or n-gram approximate string matching; both edit distance and n-grams measure how similar two strings are. The result is a set of multi-word suggestions in ID notation, which is converted back to the original form using the first trie-based dictionary.
Owner:IBM CORP
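Edit distance and n-gram similarity, the two measures named above, each fit in a few lines. The sketch below uses the Levenshtein definition (unit-cost insertions, deletions and substitutions) and a Dice coefficient over padded character bigrams, plus a hypothetical `suggest` helper that returns the N closest lexicon words for one word of a name. These are generic textbook versions, not necessarily the variants used in the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-grams, padded so short words still yield n-grams."""
    pad = "#" * (n - 1)

    def grams(s):
        padded = pad + s + pad
        return {padded[i:i + n] for i in range(len(s) + n - 1)}

    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest(word, lexicon, n=3):
    """Return the n lexicon entries closest to word by edit distance (ties broken alphabetically)."""
    return sorted(lexicon, key=lambda w: (edit_distance(word, w), w))[:n]


print(edit_distance("kitten", "sitting"))                  # 3
print(suggest("jhon", ["john", "joan", "jon", "mary"], 2))  # ['jon', 'joan']
```

Note that the transposition in "jhon" costs 2 under plain Levenshtein, so "john" ties with "joan"; a Damerau-style distance, or the phonetic proximity the abstract mentions, would rank it higher.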

System and Method for Matching Data Using Probabilistic Modeling Techniques

A system and method for matching data using probabilistic modeling techniques is provided. The system includes a computer system and a data matching model / engine. The present invention precisely and automatically matches and identifies entities from approximately matching short string text (e.g., company names, product names, addresses, etc.) by pre-processing datasets using a near-exact matching model and a fingerprint matching model, and then applying a fuzzy text matching model. More specifically, the fuzzy text matching model applies an Inverse Document Frequency function to a simple data entry model and combines this with one or more unintentional error metrics / measures and / or intentional spelling variation metrics / measures through a probabilistic model. The system can be autonomous and robust, and allow for variations and errors in text, while appropriately penalizing the similarity score, thus allowing dataset linking through text columns.
Owner:OPERA SOLUTIONS U S A LLC
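The core idea of IDF weighting is that sharing a rare token (such as a distinctive company name) is stronger evidence of a match than sharing a common one (such as "Corp"). A minimal sketch, assuming a weighted Jaccard similarity over lower-cased tokens and treating unseen tokens as maximally rare. It omits the error and spelling-variation metrics and the probabilistic model the abstract describes, and all names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(names):
    """IDF of each lower-cased token across the reference names (rarer token -> larger weight)."""
    df = Counter(tok for name in names for tok in set(name.lower().split()))
    return {tok: math.log(len(names) / count) + 1.0 for tok, count in df.items()}


def weighted_similarity(a, b, idf):
    """IDF-weighted Jaccard similarity of the token sets of a and b, in [0, 1]."""
    unseen = max(idf.values(), default=1.0)  # assumption: unseen tokens count as maximally rare

    def weight(tok):
        return idf.get(tok, unseen)

    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = sum(weight(t) for t in ta | tb)
    return sum(weight(t) for t in ta & tb) / union if union else 0.0


names = ["Acme Corp", "Acme Holdings Inc", "Globex Corp", "Initech Inc"]
idf = idf_weights(names)
print(weighted_similarity("globex llc", "Globex Corp", idf))
print(weighted_similarity("acme llc", "Acme Corp", idf))
```

The first pair scores higher than the second because "globex" occurs in fewer reference names than "acme"; plain unweighted Jaccard would score both pairs at 1/3.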

Interface for relating clusters of data objects

Data objects are related by comparing attributes of data objects that belong to different clusters and determining that the data objects are an approximate match based on the comparison. Data elements corresponding to assignments of an identifier are generated, and the data elements are stored in a grouping.
Owner:ROVI TECH CORP

Method and system for approximate string matching

A method and system for approximate string matching are provided for generating approximate matches whilst supporting compounding and correction rules. The method for approximate string matching of an input pattern to a trie data structure, includes traversing a trie data structure to find approximate partial and full character string matches of the input pattern. Traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed. The method includes accumulating costs as a string of characters is gathered, and restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules.
Owner:IBM CORP
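Restricting the traversal by accumulated cost can be sketched with the standard technique of carrying one row of the Levenshtein dynamic-programming table per trie node. The smallest value in a row is a lower bound on the cost of any word below that node, so the subtree is skipped once that bound exceeds the budget. This simplification assumes every correction rule is a unit-cost single-character edit; `Node`, `insert` and `search` are hypothetical names.

```python
class Node:
    def __init__(self):
        self.children = {}  # character -> Node
        self.word = None    # full word stored at terminal nodes


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.word = word


def search(root, pattern, max_cost):
    """Return (word, cost) pairs whose Levenshtein distance to pattern is <= max_cost."""
    results = []

    def visit(node, ch, prev_row):
        # row[j] = cost of turning the current trie prefix into pattern[:j]
        row = [prev_row[0] + 1]
        for j in range(1, len(pattern) + 1):
            row.append(min(row[j - 1] + 1,
                           prev_row[j] + 1,
                           prev_row[j - 1] + (pattern[j - 1] != ch)))
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Prune: no extension of this prefix can cost less than min(row).
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                visit(child, next_ch, row)

    first_row = list(range(len(pattern) + 1))
    for ch, child in root.children.items():
        visit(child, ch, first_row)
    return sorted(results, key=lambda r: (r[1], r[0]))


root = Node()
for w in ["cat", "cart", "cast", "dog", "cats"]:
    insert(root, w)
print(search(root, "cat", 1))  # [('cat', 0), ('cart', 1), ('cast', 1), ('cats', 1)]
```

Here the `dog` branch is pruned at the prefix `do`, whose cheapest possible completion already costs 2.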

Image stitching real-time performance optimization method

The invention belongs to the field of computer vision and relates in particular to a method for optimizing the real-time performance of image stitching. The method comprises normalized cross-correlation (NCC) region matching, SURF (Speeded-Up Robust Features) threshold estimation and feature point matching. While the transformation matrix is still solved precisely, algorithmic optimization greatly reduces the number of detected feature points; at the same time, the size of the image overlap region is pre-estimated with the NCC local region matching algorithm, and the feature point search range during stitching is reduced by locking onto the overlap region. The NCC algorithm finds the window with the maximum cross-correlation value to estimate the approximate match between the images, which avoids searching for feature points across the local images. Combining the two methods improves the real-time performance of image stitching.
Owner:湖州清舟船舶科技有限公司 (Huzhou Qingzhou Ship Technology Co., Ltd.)
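The NCC step can be illustrated with a brute-force sketch: compute the zero-mean normalized cross-correlation between a template patch and every same-sized window of an image, and keep the window with the highest score. Images are assumed here to be small grayscale 2-D lists; the patent's region locking and SURF stages are not shown, and the function names are hypothetical.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2-D patches, in [-1, 1]."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    da = [v - ma for v in fa]
    db = [v - mb for v in fb]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_match(image, template):
    """Slide template over image; return (row, col, score) of the window with maximum NCC."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # below any possible NCC value
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[2]:
                best = (r, c, score)
    return best


image = [
    [1, 5, 2, 8, 3],
    [7, 0, 9, 4, 6],
    [2, 8, 1, 3, 5],
    [9, 4, 6, 0, 7],
]
template = [[8, 1], [4, 6]]
print(best_match(image, template))  # window at row 2, column 1, score close to 1.0
```

NCC is insensitive to uniform brightness and contrast changes, which is why it suits estimating overlap between photos taken with slightly different exposure; the exhaustive scan is the costly part that a real implementation would restrict.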

System and method for approximate searching very large data

The invention provides efficient searching with fuzzy criteria in very large information systems. The technique uses a Pigeonhole Principle approach. This approach can be used in different embodiments, but the most effective realization would amplify existing intrinsic approximate matching capabilities, such as those of the FuzzyFind method [1][2]. In the problem considered, the data to be searched is represented as bit-attribute vectors, and the search operation consists of finding the subset of these vectors that lie within a given Hamming distance of the query. Normally, a search with such approximate matching criteria requires a sequential scan of the entire collection of attribute vectors. This process can easily be parallelized, but in very large information systems it would still be slow and energy-consuming. The present invention provides approximate search in very large files using the Pigeonhole Principle, avoiding sequential search operations and greatly reducing the amount of computation.
Owner:GEORGE WASHINGTON UNIVERSITY
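The pigeonhole argument works like this: if two b-bit vectors differ in at most k bits and are cut into k + 1 disjoint blocks, at least one block must be identical in both. Exact lookups on each block therefore yield a small candidate set, and only those candidates need a full Hamming-distance check. A minimal sketch of that idea follows; it does not model FuzzyFind, and `split_blocks` and `PigeonholeIndex` are hypothetical names.

```python
from collections import defaultdict


def split_blocks(value, bits, parts):
    """Cut a bits-wide integer into `parts` contiguous blocks (the last block takes the remainder)."""
    size = bits // parts
    blocks = []
    for p in range(parts):
        width = size if p < parts - 1 else bits - size * (parts - 1)
        blocks.append((value >> (p * size)) & ((1 << width) - 1))
    return blocks


class PigeonholeIndex:
    def __init__(self, vectors, bits, max_distance):
        self.vectors, self.bits = vectors, bits
        self.k, self.parts = max_distance, max_distance + 1
        # One exact-lookup table per block position: block value -> vector ids.
        self.tables = [defaultdict(list) for _ in range(self.parts)]
        for idx, v in enumerate(vectors):
            for p, block in enumerate(split_blocks(v, bits, self.parts)):
                self.tables[p][block].append(idx)

    def query(self, q):
        """Ids of stored vectors within Hamming distance k of q, without a full scan."""
        candidates = set()
        for p, block in enumerate(split_blocks(q, self.bits, self.parts)):
            candidates.update(self.tables[p].get(block, ()))
        # Verify candidates: a shared block is necessary, not sufficient.
        return sorted(i for i in candidates if bin(self.vectors[i] ^ q).count("1") <= self.k)


vectors = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
print(PigeonholeIndex(vectors, 8, 1).query(0b10110010))  # [0, 1]
```

The guarantee is exact: no vector within distance k is missed. The saving depends on how evenly block values are distributed, since a very common block value still produces many candidates to verify.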

Selecting candidate rows for deduplication

The present invention extends to methods, systems, and computer program products for selecting candidate records for deduplication from a table. A table can be processed to compute an inverse index for each field of the table. A deduplication algorithm can traverse the inverse indices in accordance with a flexible user-defined policy to identify candidate records for deduplication. Both exact matches and approximate matches can be found.
Owner:MICROSOFT TECH LICENSING LLC
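Candidate selection with per-field inverted indices can be sketched as below: each (field, token) pair maps to the rows containing it, and rows that share at least `min_shared` entries become candidate pairs for a later, more expensive exact or approximate comparison. The choice of fields and the `min_shared` threshold stand in for the "user-defined policy"; this is an assumption, not the patent's policy language, and `candidate_pairs` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(rows, fields, min_shared=1):
    """Return sorted pairs of row ids that share at least min_shared (field, token) entries."""
    index = defaultdict(set)  # (field, token) -> ids of rows containing it
    for rid, row in enumerate(rows):
        for field in fields:
            for tok in str(row.get(field, "")).lower().split():
                index[(field, tok)].add(rid)
    shared = defaultdict(int)  # (row a, row b) -> number of index entries they share
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return sorted(pair for pair, count in shared.items() if count >= min_shared)


rows = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "Mary Jones", "city": "Denver"},
]
print(candidate_pairs(rows, ["name", "city"], min_shared=2))  # [(0, 1)]
```

Very common tokens produce many pairs, so a practical version would likely skip index entries above some size.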

Image compression by comparison to large database

A method of reducing the storage capacity needed to store a target image in a large database of images is presented. A target image is uploaded from a client system to a server system over an Internet connection. An image index is queried to find an approximate match to the target image and to identify the reference image, stored in an image database, that is most similar to the processed target image. The difference between the target image and the raw image corresponding to the identified most similar reference image is encoded. A pointer corresponding to the processed reference image stored in the image index is updated to reflect the newly stored target image.
Owner:OATH INC