46 results about "Approximate string matching" patented technology

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.
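
As a rough illustration of the two sub-problems named above, the following Python sketch uses Levenshtein (edit) distance as the notion of "approximately". The function names and the choice of edit distance are assumptions made for this example; none of the patents listed here is being reproduced.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def dictionary_matches(pattern: str, words: list[str], k: int) -> list[str]:
    """Sub-problem 2: dictionary strings within k edits of the pattern."""
    return [w for w in words if edit_distance(pattern, w) <= k]


def substring_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Sub-problem 1: end offsets (exclusive) in text where some substring
    lies within k edits of the pattern."""
    col = list(range(len(pattern) + 1))
    ends = []
    for pos, ct in enumerate(text, start=1):
        new = [0]  # a match may begin at any text position at no cost
        for i, cp in enumerate(pattern, start=1):
            cost = 0 if cp == ct else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[-1] <= k:
            ends.append(pos)
    return ends
```

The only difference between the two functions is the first cell of each column: setting it to 0 lets a match start anywhere in the text, which turns whole-string comparison into substring search.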

Method and system for approximate string matching

A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, the method loops back to the root node, and continues the traverse from the root node. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. The method also includes, at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, applying the correction rule to the target string to obtain a modified target string.
Owner:IBM CORP
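
The loop-back step in the abstract above can be sketched in Python as follows. This is a simplified illustration with exact character matching only: the correction rules are omitted, and the `Node`, `build_trie` and `split_compound` names are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}    # character -> Node
        self.is_word = False  # flags a word or word fragment


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def split_compound(target, root):
    """Return every way to read target as a sequence of lexicon entries."""
    results = []

    def walk(pos, node, gathered, parts):
        if pos == len(target):
            if node.is_word:
                results.append(parts + [gathered])
            return
        # Word node reached with input left over: loop back to the root.
        if node.is_word and node is not root:
            walk(pos, root, "", parts + [gathered])
        ch = target[pos]
        if ch in node.children:
            walk(pos + 1, node.children[ch], gathered + ch, parts)

    walk(0, root, "", [])
    return results
```

With a lexicon of "boo", "book", "case" and "bookcase", the target "bookcase" yields both the split reading and the single-word reading, which is the behaviour the abstract describes for compound and non-delimited words.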

Methods and systems for implementing approximate string matching within a database

A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.
Owner:MASTERCARD INT INC
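
Steps b) through f) of the abstract above can be illustrated with a short Python sketch. The abstract does not say which similarity measure is used, so Jaccard similarity over character bigrams is an assumption here, as are the function names and the `#` padding character.

```python
def ngrams(s: str, n: int = 2) -> set[str]:
    """Set of character n-grams, padded so string ends are represented."""
    pad = "#" * (n - 1)
    padded = pad + s.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two n-gram sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_key(candidate: str, references: list[str], n: int = 2) -> tuple:
    """Similarity of the candidate to each reference string, in order.
    A tuple like this could serve as the index key described in step f)."""
    cand = ngrams(candidate, n)
    return tuple(jaccard(cand, ngrams(ref, n)) for ref in references)
```

Strings that are close to each other should receive similar keys, so a lookup only needs to compare a candidate against records whose keys are nearby rather than against the whole database.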

Methods and systems for implementing approximate string matching within a database

A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes performing a clustering operation on at least a portion of the plurality of character string records, the clustering operation generating a plurality of clusters, each cluster comprising a plurality of character strings from the plurality of character string records, the plurality of character strings in each cluster are determined to be similar with respect to each other based on at least one characteristic of the plurality of character strings. The method also includes generating a set of reference character strings that are selected from the plurality of character strings in each cluster, generating an n-gram representation for one of the reference character strings in the set of reference character strings, and generating an n-gram representation for the candidate character string.
Owner:MASTERCARD INT INC

Chinese word segmentation algorithm based on reverse maximum matching

The invention discloses a Chinese word segmentation algorithm based on reverse maximum matching, which comprises the following steps: initializing three objects in a memory; inputting the contents of a text which needs word segmentation; splitting characters in the text into different types according to character codes; directly adding characters which are not Chinese characters to word segmentation results according to the character codes after the text is segmented into short sentences; splitting the short sentences into character sets according to a character string matching and decision-making mechanism; matching the character sets with character sets in a word segmentation dictionary based on the reverse maximum matching algorithm; storing matched character sets into a word segmentation result set; combining consecutive unmatched characters; and adding the consecutive unmatched characters to the word segmentation results to complete word segmentation. A quick word segmentation algorithm based on dictionaries is provided, and the dictionary loading efficiency and the word segmentation efficiency are greatly improved while word segmentation accuracy is ensured.
Owner:BEIJING JINHER SOFTWARE
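
Reverse maximum matching itself is a well-known dictionary segmentation technique, and a minimal Python sketch is given below. It covers only the matching step and the merging of consecutive unmatched characters; the character-type splitting and dictionary loading described in the abstract are not modelled, and the function name is hypothetical.

```python
def reverse_maximum_match(sentence: str, dictionary: set[str]) -> list[str]:
    """Segment sentence by scanning from the right and taking the longest
    dictionary word that ends at the current position."""
    max_len = max((len(w) for w in dictionary), default=1)
    pieces = []  # (text, matched_in_dictionary), collected right to left
    end = len(sentence)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = sentence[end - size:end]
            if piece in dictionary or size == 1:
                pieces.append((piece, piece in dictionary))
                end -= size
                break
    pieces.reverse()

    # Combine runs of consecutive unmatched single characters.
    tokens, buffer = [], ""
    for text, matched in pieces:
        if matched:
            if buffer:
                tokens.append(buffer)
                buffer = ""
            tokens.append(text)
        else:
            buffer += text
    if buffer:
        tokens.append(buffer)
    return tokens
```

Scanning from the right tends to resolve a common class of ambiguity in Chinese text: for "研究生命的起源" a forward scan would first take "研究生", whereas the reverse scan produces "研究 / 生命 / 的 / 起源".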

Method and system for approximate string matching

A method and system for approximate string matching are provided for generating approximate matches whilst supporting compounding and correction rules. The method for approximate string matching of an input pattern to a trie data structure, includes traversing a trie data structure to find approximate partial and full character string matches of the input pattern. Traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed. The method includes accumulating costs as a string of characters is gathered, and restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules.
Owner:IBM CORP
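
The idea of accumulating cost during the traverse and abandoning branches that can no longer succeed can be sketched as below. Note the substitution: the abstract describes correction rules with adjustable costs, whereas this Python example uses plain unit-cost insertions, deletions and substitutions. All names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set on nodes that end a lexicon entry


def insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def fuzzy_search(root: TrieNode, pattern: str, max_cost: int):
    """Return (word, cost) pairs for lexicon words within max_cost edits."""
    results = []

    def visit(node, ch, prev_row):
        # One edit-distance row per trie node, extending the parent's row.
        row = [prev_row[0] + 1]
        for j in range(1, len(pattern) + 1):
            row.append(min(row[j - 1] + 1,
                           prev_row[j] + 1,
                           prev_row[j - 1] + (pattern[j - 1] != ch)))
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Restrict the traverse: descend only while some alignment of the
        # gathered string is still within the cost budget.
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                visit(child, next_ch, row)

    first_row = list(range(len(pattern) + 1))
    for ch, child in root.children.items():
        visit(child, ch, first_row)
    return sorted(results, key=lambda r: (r[1], r[0]))
```

Because words sharing a prefix share the rows computed for that prefix, the cost of the search grows with the number of trie nodes visited rather than with the number of words in the lexicon.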

Hybrid approach to approximate string matching using machine learning

Systems, apparatuses, and methods are provided for identifying a corresponding string stored in memory based on an incomplete input string. A system can analyze and produce phonetic and distance metrics for a plurality of strings stored in memory by comparing the plurality of strings to an incomplete input string. These similarity metrics can be used as the input to a machine learning model, which can quickly and accurately provide a classification. This classification can be used to identify a string stored in memory that corresponds to the incomplete input string.
Owner:VISA INT SERVICE ASSOC

A rapid fuzzy matching algorithm for strings in mass audio data

Inactive | CN106528599A | Support search | Support matching | Special data processing applications | Chinese characters | Short string
The invention provides a rapid fuzzy matching algorithm for strings. First, the texts in a database are preprocessed to obtain a statistical model, and an index is built by hashing. The input text is a short string. The algorithm traverses all Chinese characters in it, activates the positions of the corresponding Chinese characters in a finite complete character set, and maps the activation state of that character set to each tag in order to filter tags. The few tags that remain after filtering are used to match the texts, and the DTW algorithm is used for approximate string matching. The algorithm also scores and sorts the results by degree of match and returns the search results. The efficient tag filtering greatly increases the computational efficiency of the string matching algorithm; during input text matching, a fuzzy matching effect is achieved and good matching performance is maintained for fuzzy language.
Owner:深圳凡豆信息科技有限公司

Image similarity detection using approximate pattern matching

Active | US8175387B1 | Efficiently detect similarity | Simple and non-resource intensive | Character and pattern recognition | Pattern matching | Byte
Two images are compared to determine how similar they are. First, a process normalizes each image, then horizontal and vertical byte sequences are derived from each image. A similarity formula is used to obtain a similarity value that represents the similarity between the two images. An approximate pattern matching algorithm is used to determine the error distance between the horizontal byte sequences for the images and to determine the error distance between the vertical byte sequences for the images. The error distances and the length of the byte sequences are used to determine the similarity value. Padding is used to make the aspect ratios the same.
Owner:TREND MICRO INC

Multidimensional spatial searching for identifying duplicate crash dumps

A method of identifying duplicate crash dumps in a computer system may include receiving a first crash dump caused by an application crash, extracting a first function signature of a function that caused the first crash dump, and searching a datastore of crash dumps for function signatures that substantially match the first function signature. The searching may include performing an approximate string-match between each of the function signatures and the first function signature and performing an exact string match between each of the function signatures and the first function signature. The searching may also include combining weighted results of the approximate string-match with weighted results of the exact string match to generate match scores for each of the function signatures, and identifying the function signatures that substantially match the first function signature based on the match scores.
Owner:ORACLE INT CORP
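
One way to picture the weighted combination of approximate and exact matches is the Python sketch below. It uses `difflib.SequenceMatcher` from the standard library for the approximate score; the weights, the threshold and the function names are assumptions chosen for illustration and are not taken from the patent.

```python
from difflib import SequenceMatcher


def match_score(stored: str, query: str,
                w_approx: float = 0.7, w_exact: float = 0.3) -> float:
    """Weighted sum of an approximate score and an exact-match score."""
    approx = SequenceMatcher(None, stored, query).ratio()  # 0.0 .. 1.0
    exact = 1.0 if stored == query else 0.0
    return w_approx * approx + w_exact * exact


def find_duplicates(query: str, signatures: list[str],
                    threshold: float = 0.5) -> list[tuple[str, float]]:
    """Stored signatures whose score reaches the threshold, best first."""
    scored = [(sig, match_score(sig, query)) for sig in signatures]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])
```

The exact-match term acts as a bonus: an identical signature always outranks a merely similar one, while near misses such as a changed parameter type can still clear the threshold.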

Text clustering method and system

The invention relates to a text clustering method and system. The text clustering method comprises the following steps: keywords of to-be-classified texts are extracted when the to-be-classified texts are received; the keywords of the to-be-classified texts are matched according to the obtained keywords in a final word bag, and the type tag of the to-be-classified text is obtained; the final word bag is obtained by sorting and screening the key words in various type tag word bags according to preset selection rules; the type tag word bags are sets of key words generated after key word extraction from texts corresponding to type tags. The key words corresponding to each tag are extracted through records of existing tags, the final word bag is obtained, to-be-classified texts are classified according to the key words in the final word bag, good adaptability to noise data is realized, and the condition that the accuracy is reduced substantially under the condition of more noise is avoided; an approximate string matching effect is improved greatly through large-range thresholding of a centroid.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Data synchronization using string matching

The present invention relates to the technical field of data or file synchronization. In particular, the present invention relates to a method and system for data synchronization using character string matching. Provided are a method, computer program product, and system for data synchronization between a source node and target node. An old copy and a new copy of data to be synchronized is received. A block map is generated according to the difference determined using character string matching between the old copy and the new copy. The block map, which includes the position information of unchanged blocks and the position information and contents of changed blocks, is transmitted to a target node.
Owner:IBM CORP
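
A block map of the kind described above can be approximated with the standard library's `difflib.SequenceMatcher`, as in this Python sketch. The tuple layout of the map and the function names are hypothetical, and the patent's own differencing method is not specified in the abstract, so `difflib` is a stand-in.

```python
from difflib import SequenceMatcher


def build_block_map(old: str, new: str) -> list[tuple]:
    """Describe new in terms of old: unchanged blocks are sent as positions
    only, changed blocks as a position plus their contents."""
    block_map = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            block_map.append(("copy", i1, i2))          # range in old copy
        elif j2 > j1:                                    # replace or insert
            block_map.append(("data", j1, new[j1:j2]))  # offset in new copy
        # "delete" needs no entry: the target simply does not copy it
    return block_map


def apply_block_map(old: str, block_map: list[tuple]) -> str:
    """Rebuild the new copy on the target from its old copy and the map."""
    parts = []
    for op in block_map:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[2])
    return "".join(parts)
```

Only the "data" entries carry content, so when the two copies differ in a few places the transmitted map is much smaller than the new copy itself.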

Method and device for auditing phone bills with different sources

The invention discloses a method and a device for auditing phone bills from different sources. The method comprises the following steps: extracting phone bills from a billing system and a settlement system respectively, and obtaining the business state distribution of the same businesses in the extracted phone bills; comparing the business state distribution with the existing stable state distribution; and, if the fluctuation value between the business state distribution and the existing stable state distribution is larger than a set value, treating the extracted phone bills as abnormal phone bills. The method and the device support flexible handling of various special circumstances, operate efficiently, support auditing by approximate string matching, have flexible auditing rules, and are configurable and extensible.
Owner:中国移动通信集团甘肃有限公司

Electronic text document plagiarism recognition method based on similar string matching distance

Inactive | CN101441620A | Strong structural recognition ability | Plagiarism meets | Special data processing applications | Theoretical computer science | Document preparation
The invention relates to a method for identifying plagiarism in an electronic text document. The method identifies plagiarism mainly through the approximate string matching distance of sub-paragraphs. To decide whether a document A plagiarizes a document B, the method proceeds as follows. First, for each paragraph of document A, the approximate string matching distance and the approximate matching segment in document B are calculated. Second, the retroversion number and the forward jumping number are calculated from the approximate matching segments: the retroversion number is the number of times the head of the next approximate matching segment lies before the tail of the previous approximate matching segment, or the total number of segments passed over; the forward jumping number is the number of times the next approximate matching segment lies after the previous one with a gap of at least one segment, or the total number of segments skipped. Finally, the approximate string matching distance, the retroversion number and the forward jumping number are summed, and the sum is taken as the plagiarism distance from document A to document B; if the distance is less than a certain threshold, document A is suspected of plagiarizing document B.
Owner:WENZHOU UNIVERSITY

Approximate functional matching in electronic systems

Methods and apparatuses for approximate functional matching are described including identifying functionally similar subsets of an integrated circuit design or software program, distinguishing control inputs of the subsets from data inputs, and assigning combinations of logic values to the input control signals to capture co-factors for functional matching.
Owner:SYNOPSYS INC

Method for calculating similarity of Geographic Information System (GIS) vector data image watermarks

The invention discloses a method for calculating the similarity of Geographic Information System (GIS) vector data image watermarks, belonging to the field of geographic information copyright protection. The method comprises the following steps: correcting the positions in an extracted watermark W' by means of the original watermark W, so that the displaced pixels of the copyright image return to their correct positions; and then calculating the similarity of the corrected watermark W' and the original watermark W with a dynamic programming algorithm for approximate string matching. The method can accurately return the pixels of the extracted image watermark to their correct positions, show visually where the data has been tampered with, and measure objectively the similarity of the original and extracted watermarks. The quality of watermark authentication is thereby improved to some extent, the miss rate of watermark authentication is reduced, and the theory and methods of geographic information copyright protection are extended. The method can be applied to copyright protection and secure transmission of GIS vector data.
Owner:NANJING NORMAL UNIVERSITY

Method and device for character string matching

Active | CN107545071A | Meet matching needs | Improve the efficiency of multi-pattern matching | Special data processing applications | Theoretical computer science | Multi segment
The invention discloses a method and device for character string matching. The method includes the following steps: an AC state machine with fuzzy nodes is initialized, wherein the AC state machine generates regular nodes based on the non-wildcard relationships between the characters contained in each rule character string, and generates the corresponding fuzzy nodes according to the wildcard relationships between characters; a target character string is entered into the AC state machine, each character in the target character string is compared with the corresponding character of each node in the AC state machine, one or more rule character strings matching the target character string are determined, and a corresponding operation is performed according to the matched rule character strings. Under this scheme, once the target character string is obtained it is entered into the AC state machine for matching, and the one or more rule character strings matched in the target character string are determined. Multi-segment fuzzy matching is thereby achieved, the flexibility of rule character string definitions is preserved, and the application requirements are satisfied.
Owner:北京神州泰岳智能数据技术有限公司

Method for accelerating character string matching by trans-border protection mechanism

The invention provides a method that uses an out-of-bounds protection mechanism to accelerate string matching. The end position of the text is obtained from the length of the text to be matched; let the last character of the text be at position loc. A one-character separator, which may be any character that does not appear in the pattern, is placed at position loc + 1, and a copy of the pattern is appended to the text starting at position loc + 2. Normal string matching is then performed without checking whether an index crosses the boundary. Before a match position is output, the method checks whether the index has crossed the boundary: if it has not, the match position is output; if it has, matching terminates. The method is independent of the concrete string matching implementation and is a general improvement for the various existing string matching problems. Outputting a match position is the least frequent of all the actions in the string matching process, so the method minimizes the total number of index boundary checks.
Owner:HARBIN ENG UNIV
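
The separator-plus-pattern-copy trick can be sketched in Python with a naive matcher, as below. Python checks index bounds on every access regardless, so the sketch only illustrates the loop structure (no explicit end-of-text test in the scanning loop); any saving would appear in a language such as C. The function name and the default separator are assumptions.

```python
def find_all_with_sentinel(text: str, pattern: str, separator: str = "\0"):
    """Naive search whose scanning loop never tests for the end of text."""
    if not pattern or len(separator) != 1 or separator in pattern:
        raise ValueError("need a non-empty pattern and a separator "
                         "character that does not occur in it")
    n, m = len(text), len(pattern)
    # Text, then the separator at index n, then a copy of the pattern.
    buf = text + separator + pattern
    matches = []
    i = 0
    while True:
        j = 0
        # i + j cannot run past buf: the appended copy forces a match at
        # index n + 1 at the latest, which ends the search below.
        while j < m and buf[i + j] == pattern[j]:
            j += 1
        if j == m:
            # The boundary is checked only here, when a match is reported.
            if i + m > n:
                break
            matches.append(i)
        i += 1
    return matches
```

A match cannot straddle the end of the text, because it would have to contain the separator, which by construction never occurs in the pattern. The first out-of-bounds match is therefore always the appended copy.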

Systems and methods for building an electronic dictionary of multi-word names and for performing fuzzy searches in the dictionary

The present invention automatically builds a contracted dictionary from a given list of multi-word proper names and performs fuzzy searches in the contracted dictionary. The contracted dictionary of proper names includes two linked trie-based dictionaries: a first dictionary is used to store single-word names, each word name having an ID number; and a second dictionary is used to store multi-word names encoded with ID numbers. Information related to the multi-word names is also stored as a gloss to the terminal node of the multi-word entry of the trie-based dictionary. An approximate lookup for a multi-word name is conducted first for each word of the multi-word name using an approximate matching technique such as phonetic proximity or a simple edit distance. Accordingly, N suggestions are determined for each word of the multi-word name under consideration. Then, multi-word candidates are assembled in ID notation. Finally, an approximate search for each assembled candidate is performed based on edit distance or n-gram approximate string matching. Edit distances and n-grams are used to measure how similar two strings are. The result is a set of multi-word suggestions in ID notation. This ID notation is decoded back to the original form using the first trie-based dictionary.
Owner:IBM CORP

Kinship analysis method based on household registration information data

The invention provides a kinship analysis method based on household registration information data. The method comprises the following steps: S1, carrying out encoding of basic relationships in the kinship through letters and numeric characters, so as to obtain a character code set for the basic relationships; S2, determining connection symbols, positive relationships and reverse relationships, wherein the connection symbols are symbols connecting character codes corresponding to the basic relationships, the known kinship is defined as one of the positive relationships, and a relationship opposite to each positive relationship is defined as one of the reverse relationships; S3, obtaining a character string of the kinship to be analyzed according to data of the kinship to be analyzed and through the character codes, the connection symbols, and the reverse relationships; S4, carrying out simplification of the character string according to simplification rules, so as to obtain a new character string, of which the length is smaller than the length of the original character string; and S5, carrying out character string matching of the simplified new character string according to matching rules, so as to obtain analysis results of the kinship to be analyzed.
Owner:ENC DATA SERVICE CO LTD

System and method for variant string matching

A method, computer program product, and system for variant string matching. A computer implemented method for variant string matching may comprise comparing with a computing device two unidentical strings in a training variant string pair. The two unidentical strings may represent the same item from training data, which may be stored in a memory. The two unidentical strings may be compared to determine if they include an identical substring pair, and a first unidentical substring pair. The computer implemented method may also determine if the first unidentical substring pair includes a first unidentical substring and a second unidentical substring. The computer implemented method may further determine if the first unidentical substring pair is in the training data. The first unidentical substring pair may be entered into the training data as a first variant string pair if it is not in the training data.
Owner:SRA INTERNATIONAL

Font information fusion-based medicine-taking bill recognition result error correction method

The invention relates to a font information fusion-based medicine-taking bill recognition result error correction method and belongs to the field of character recognition. The method comprises the following steps: constructing a standard medicine word bank, storing each piece of medicine information in the word bank as a node in a BK-tree storage structure, setting a search distance threshold n, reducing the scale of the data search through a threshold search rule, and obtaining a result candidate set; carrying out similarity matching between the character string to be corrected after character recognition and the character strings in the result candidate set, improving the traditional edit distance formula on the basis of the original similarity matching scheme by keeping the insertion and deletion costs unchanged and reducing the character replacement cost; during character replacement, considering three kinds of font information, namely five-stroke codes, four-corner codes and strokes, to improve the precision of approximate string matching; and taking the character string with the highest similarity as the error correction result. The method corrects medicine-taking bill recognition results and thereby improves medicine-taking bill recognition accuracy.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
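
The two-stage flow in the abstract above (BK-tree threshold search for a candidate set, then re-ranking with a cheaper substitution cost for look-alike characters) can be sketched in Python as follows. The `LOOKALIKES` table is a hypothetical stand-in for the five-stroke, four-corner and stroke information used in the patent, and the 0.5 cost and all names are assumptions.

```python
def edit_distance(a, b, sub_cost=lambda x, y: 1.0):
    """Edit distance with unit insert/delete cost and pluggable
    substitution cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            sub = 0.0 if ca == cb else sub_cost(ca, cb)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + sub))
        prev = curr
    return prev[-1]


class BKTree:
    """BK-tree over unit-cost edit distance; a node is (word, children)."""

    def __init__(self, words):
        words = iter(words)
        self.root = (next(words), {})
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        while True:
            d = int(edit_distance(word, node[0]))
            if d == 0:
                return  # already stored
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def search(self, query, n):
        """All stored words within distance n of query."""
        found, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = int(edit_distance(query, word))
            if d <= n:
                found.append(word)
            # Triangle inequality: only these subtrees can hold matches.
            stack.extend(child for dist, child in children.items()
                         if d - n <= dist <= d + n)
        return found


# Hypothetical look-alike pairs standing in for glyph-shape codes.
LOOKALIKES = {frozenset("0o"), frozenset("1l"), frozenset("1i"), frozenset("5s")}


def glyph_sub_cost(x, y):
    return 0.5 if frozenset((x, y)) in LOOKALIKES else 1.0


def correct(ocr_text, tree, n=2):
    """Pick the candidate with the smallest glyph-aware distance."""
    candidates = tree.search(ocr_text, n)
    return min(candidates, default=None,
               key=lambda w: (edit_distance(ocr_text, w, glyph_sub_cost), w))
```

The tree is searched with the plain unit-cost distance because BK-tree pruning relies on the triangle inequality, which a table-driven substitution cost is not guaranteed to satisfy; the reduced cost is applied only when ranking the candidate set.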

State tree matching method capable of finishing integer matching

The invention relates to a state tree matching method capable of finishing integer matching and finishing the numerical value matching of integers when finishing the universal parallel mode matching, in particular to a method used for intrusion monitoring and auditing of a computer or network and based on data monitoring. The method comprises the following steps: integer defining mode reading, state tree producing, data reading, mode matching and result reporting. The invention can finish the integer matching when finishing the character string matching, thereby increasing the matching speed, quickening the data detection and auditing speed, reducing the hardware expenses and improving the data detection and auditing efficiency.
Owner:BEIJING VENUS INFORMATION TECH