
46 results about "Approximate string matching" patented technology

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.
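The two sub-problems named above can be sketched in a few lines. This is a minimal illustration that assumes unit-cost Levenshtein distance as the notion of "approximately"; the function names `edit_distance`, `substring_matches` and `dictionary_matches` are illustrative and do not come from any of the patents listed on this page.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def substring_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Sub-problem 1: (end_index, distance) for each text position where
    some substring ending there is within k edits of the pattern."""
    col = list(range(len(pattern) + 1))
    hits = []
    for j, ct in enumerate(text):
        new = [0]  # a match may start anywhere in the text, so row 0 is free
        for i, cp in enumerate(pattern, start=1):
            cost = 0 if cp == ct else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[-1] <= k:
            hits.append((j, col[-1]))
    return hits


def dictionary_matches(pattern: str, words: list[str], k: int) -> list[str]:
    """Sub-problem 2: dictionary words within k edits of the pattern."""
    return [w for w in words if edit_distance(pattern, w) <= k]
```

The substring search differs from the plain distance only in its first row: initialising it to zero lets a match begin at any text position. The dictionary search shown here is a linear scan; index structures such as tries or BK-trees avoid comparing against every word.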

Electronic text document plagiarism recognition method based on similar string matching distance

Inactive | CN101441620A | Strong structural recognition ability | Plagiarism meets | Special data processing applications | Theoretical computer science | Document preparation
The invention relates to a method for identifying plagiarism in electronic text documents, chiefly through the approximate string matching distance of sub-paragraphs. To decide whether a document A plagiarizes a document B, the method proceeds as follows. First, for each paragraph of document A, the approximate string matching distance and the approximate matching segment in document B are calculated. Second, the retroversion number and the forward jumping number are calculated from the approximate matching segments. The retroversion number is the number of times the head of the next approximate matching segment lies before the tail of the previous approximate matching segment, or alternatively the total number of segments passed over. The forward jumping number is the number of times the next approximate matching segment lies behind the previous one with a gap of at least one segment between them, or alternatively the total number of intervening segments. Finally, the approximate string matching distance, the retroversion number and the forward jumping number are summed, and the sum is taken as the plagiarism distance of document A to document B. If this distance is less than a certain threshold, document A is suspected of plagiarizing document B.
Owner:WENZHOU UNIVERSITY
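The final summation step of the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the patented procedure: the per-paragraph matching distances and matching segments are taken as given inputs, segments are identified by integer indices into document B, and the "number of times" reading of the retroversion and forward jumping numbers is used rather than the alternative segment-count reading. The names `Match`, `plagiarism_distance` and `is_suspected` are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Best approximate match of one paragraph of A inside B (assumed given)."""
    distance: int  # approximate string matching distance of the paragraph
    start: int     # index of the first matched segment in B
    end: int       # index of the last matched segment in B


def plagiarism_distance(matches: list[Match]) -> int:
    """Sum of matching distances, retroversion count and forward jump count."""
    total = sum(m.distance for m in matches)
    retroversions = 0
    forward_jumps = 0
    for prev, nxt in zip(matches, matches[1:]):
        if nxt.start < prev.end:
            retroversions += 1   # next match begins before the previous one ends
        elif nxt.start > prev.end + 1:
            forward_jumps += 1   # at least one segment of B was skipped
    return total + retroversions + forward_jumps


def is_suspected(matches: list[Match], threshold: int) -> bool:
    """Document A is suspected when the distance is below the threshold."""
    return plagiarism_distance(matches) < threshold
```

For example, `[Match(2, 0, 1), Match(0, 2, 3), Match(1, 0, 0), Match(3, 5, 6)]` has a matching distance total of 6, one retroversion and one forward jump, giving a plagiarism distance of 8.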

Method for calculating similarity of Geographic Information System (GIS) vector data image watermarks

The invention discloses a method for calculating the similarity of Geographic Information System (GIS) vector data image watermarks, belonging to the field of geographic information copyright protection. The method comprises the following steps: correcting the positions in an extracted watermark W' by means of the original watermark W, so that the disordered pixels of the copyright image return to their correct positions; and then calculating the similarity between the corrected watermark W' and the original watermark W with a dynamic programming algorithm for approximate string matching. The method can accurately restore the pixels of the extracted image watermark to their correct positions, visually reflect where the data has been tampered with, and objectively measure the similarity of the original and extracted watermarks. The quality of watermark authentication is thereby improved to a certain extent, the miss rate of watermark authentication is reduced, and the theory and methods of geographic information copyright protection are made more complete. The method can be applied to copyright protection and secure transmission of GIS vector data.
Owner:NANJING NORMAL UNIVERSITY

Method for accelerating character string matching by trans-border protection mechanism

The invention provides a method that uses a boundary-violation protection mechanism to accelerate string matching. The tail position of the text is obtained from the length of the text to be matched; suppose the last character of the text is at position loc. A one-character isolation word is placed at position loc + 1; the isolation word is any character that does not appear in the pattern. A copy of the pattern is appended to the text at position loc + 2. Normal string matching is then carried out without checking whether a subscript crosses the boundary. Only before a match position of the pattern is output is the subscript checked against the boundary: if it does not cross the boundary, the match position is output; if it does, matching ends. The method is independent of the concrete implementation of string matching and is a general improvement for existing string matching problems of various kinds. Outputting a match after the pattern has been matched is the least frequent of all the actions in the string matching process, so the method minimizes the total number of subscript boundary checks.
Owner:HARBIN ENG UNIV
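The sentinel idea in the abstract above can be sketched with a naive matcher. This is an illustration of the control flow only, under the assumption that `"\x00"` never occurs in the pattern; the function name `find_all` is hypothetical. Python still bounds-checks every index internally, so the saving described in the abstract would only materialise in a language such as C, where the scan loop would otherwise need an explicit end-of-text test.

```python
def find_all(text: str, pattern: str, sentinel: str = "\x00") -> list[int]:
    """Naive exact matching whose scan loop has no end-of-text test."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if sentinel in pattern:
        raise ValueError("sentinel must not occur in the pattern")
    n, m = len(text), len(pattern)
    # Layout: text (ends at loc = n - 1), isolation character at loc + 1,
    # copy of the pattern starting at loc + 2.
    buf = text + sentinel + pattern
    hits = []
    pos = 0
    while True:  # deliberately no "pos <= n - m" condition
        j = 0
        while j < m and buf[pos + j] == pattern[j]:
            j += 1
        if j == m:
            # The boundary is checked only here, when a match is reported.
            if pos + m > n:
                break  # this is the appended copy: the real text is exhausted
            hits.append(pos)
        pos += 1
    return hits
```

Because the isolation character cannot match any pattern character, no match can straddle the end of the text, and the appended copy guarantees that the loop reaches a match and stops before running off the buffer.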

Systems and methods for building an electronic dictionary of multi-word names and for performing fuzzy searches in the dictionary

The present invention automatically builds a contracted dictionary from a given list of multi-word proper names and performs fuzzy searches in the contracted dictionary. The contracted dictionary of proper names includes two linked trie-based dictionaries: a first dictionary stores single-word names, each with an ID number, and a second dictionary stores multi-word names encoded with those ID numbers. Information related to a multi-word name is also stored as a gloss at the terminal node of its entry in the trie-based dictionary. An approximate lookup for a multi-word name is conducted first for each word of the name, using an approximate matching technique such as phonetic proximity or a simple edit distance. Accordingly, N suggestions are determined for each word of the multi-word name under consideration. Multi-word candidates are then assembled in ID notation. Finally, an approximate search for each assembled candidate is performed based on an edit distance or on n-gram approximate string matching; edit distances and n-grams both measure how similar two strings are. The result is a set of multi-word suggestions in ID notation, which is decoded back to the original form using the first trie-based dictionary.
Owner:IBM CORP
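The two-level lookup in the abstract above can be sketched with plain Python containers. Several simplifications are assumed and should be read as such: dictionaries stand in for the two tries, the standard library's `difflib.get_close_matches` (a similarity-ratio matcher, not an edit distance or phonetic matcher) supplies the per-word suggestions, and the second stage looks assembled candidates up exactly rather than approximately. The names `build` and `fuzzy_lookup` are hypothetical.

```python
import difflib
from itertools import product


def build(names: list[str]) -> tuple[dict[str, int], dict[tuple[int, ...], str]]:
    """First dictionary: word -> ID. Second dictionary: ID tuple -> name."""
    word_ids: dict[str, int] = {}
    encoded: dict[tuple[int, ...], str] = {}
    for name in names:
        ids = tuple(word_ids.setdefault(w, len(word_ids))
                    for w in name.lower().split())
        encoded[ids] = name
    return word_ids, encoded


def fuzzy_lookup(query: str, word_ids: dict[str, int],
                 encoded: dict[tuple[int, ...], str],
                 n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Suggest up to n words per query word, assemble ID candidates, decode."""
    vocab = list(word_ids)
    per_word = []
    for w in query.lower().split():
        suggestions = difflib.get_close_matches(w, vocab, n=n, cutoff=cutoff)
        if not suggestions:
            return []  # some word has no plausible counterpart
        per_word.append([word_ids[s] for s in suggestions])
    # Every combination of per-word suggestions is a candidate in ID notation.
    return [encoded[c] for c in product(*per_word) if c in encoded]
```

A short usage example: after `word_ids, encoded = build(["New York", "New Orleans", "York Minster"])`, calling `fuzzy_lookup("Nw York", word_ids, encoded)` returns `["New York"]`. Working in ID notation keeps the candidate set small, since only combinations of already-known words are ever assembled.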

Kinship analysis method based on household registration information data

The invention provides a kinship analysis method based on household registration information data. The method comprises the following steps: S1, carrying out encoding of basic relationships in the kinship through letters and numeric characters, so as to obtain a character code set for the basic relationships; S2, determining connection symbols, positive relationships and reverse relationships, wherein the connection symbols are symbols connecting character codes corresponding to the basic relationships, the known kinship is defined as one of the positive relationships, and a relationship opposite to each positive relationship is defined as one of the reverse relationships; S3, obtaining a character string of the kinship to be analyzed according to data of the kinship to be analyzed and through the character codes, the connection symbols, and the reverse relationships; S4, carrying out simplification of the character string according to simplification rules, so as to obtain a new character string, of which the length is smaller than the length of the original character string; and S5, carrying out character string matching of the simplified new character string according to matching rules, so as to obtain analysis results of the kinship to be analyzed.
Owner:ENC DATA SERVICE CO LTD
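Steps S4 and S5 of the abstract above (simplify the relationship string, then match it against rules) can be sketched as a small rewrite system. Everything concrete here is a hypothetical stand-in, since the abstract does not give the actual codes or rules: the letter codes (`F` father, `M` mother, `W` wife, `H` husband), the `.` connection symbol, and the contents of `SIMPLIFY` and `NAMES` are all invented for illustration, and the reverse-relationship handling of steps S2 and S3 is omitted.

```python
# Hypothetical simplification rules: a pair of adjacent codes -> shorter form.
SIMPLIFY = {
    ("F", "W"): ("M",),  # father's wife -> mother (toy rule)
    ("M", "H"): ("F",),  # mother's husband -> father (toy rule)
}

# Hypothetical matching rules: simplified code sequence -> kinship term.
NAMES = {
    ("F",): "father",
    ("M",): "mother",
    ("F", "F"): "paternal grandfather",
    ("M", "F"): "maternal grandfather",
}


def simplify(tokens: list[str]) -> list[str]:
    """Apply rewrite rules until none fits; each rule shortens the sequence."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            replacement = SIMPLIFY.get((tokens[i], tokens[i + 1]))
            if replacement is not None:
                tokens[i:i + 2] = replacement
                changed = True
                break
    return tokens


def analyse(relationship: str) -> str:
    """Simplify a '.'-joined code string, then look the result up."""
    return NAMES.get(tuple(simplify(relationship.split("."))), "unknown")
```

With these toy rules, `analyse("F.W.F")` first rewrites `F.W` to `M` and then matches `M.F`. Because every rule strictly shortens the sequence, the simplification loop always terminates.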

Font information fusion-based medicine-taking bill recognition result error correction method

The invention relates to a font information fusion-based medicine-taking bill recognition result error correction method and belongs to the field of character recognition. The method comprises the following steps: constructing a standard medicine word bank and storing each piece of medicine information in the word bank as a node of a BK-tree storage structure; setting a search distance threshold n, reducing the scale of the data search through a threshold search rule, and obtaining a result candidate set; carrying out similarity matching between the character string to be corrected after character recognition and the character strings in the result candidate set, where the traditional edit distance formula is improved by keeping the insertion and deletion costs unchanged and reducing the character replacement cost; during the character replacement operation, considering three kinds of font information, namely five-stroke (Wubi) codes, four-corner codes and strokes, to improve the precision of approximate string matching; and taking the candidate character string with the highest similarity as the error correction result. With this method, the recognition result of a medicine-taking bill is corrected, so that the recognition accuracy for medicine-taking bills is improved.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
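The pipeline in the abstract above (BK-tree candidate retrieval within a threshold n, then re-ranking with a cheaper substitution cost for look-alike characters) can be sketched as follows. The real method derives character similarity from Wubi codes, four-corner codes and strokes; since those tables are not reproduced here, a tiny hypothetical set `SIMILAR` of visually confusable character pairs with a substitution cost of 0.5 stands in for them. The toy lexicon in the usage note and the names `BKTree`, `glyph_cost` and `correct` are likewise assumptions.

```python
def edit_distance(a, b, sub_cost=lambda x, y: 1):
    """Edit distance with unit insert/delete and a pluggable substitution cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else sub_cost(ca, cb)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class BKTree:
    """BK-tree over the plain (integer) edit distance; nodes are (word, children)."""

    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})
        for w in it:
            self.add(w)

    def add(self, word):
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # already stored
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def search(self, query, n):
        """All stored words within distance n, pruning by the triangle inequality."""
        found, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)
            if d <= n:
                found.append(word)
            stack.extend(c for k, c in children.items() if d - n <= k <= d + n)
        return found


# Hypothetical look-alike pairs standing in for the font-information tables.
SIMILAR = {frozenset("0o"), frozenset("0O"), frozenset("1l"), frozenset("5S")}


def glyph_cost(x, y):
    return 0.5 if frozenset((x, y)) in SIMILAR else 1


def correct(ocr_text, tree, n):
    """Pick the candidate with the smallest glyph-weighted distance, if any."""
    candidates = tree.search(ocr_text, n)
    if not candidates:
        return None
    return min(candidates, key=lambda w: edit_distance(ocr_text, w, glyph_cost))
```

With a toy lexicon `BKTree(["Zocor", "Zacor", "Aspirin"])`, the OCR output `"Z0cor"` is at plain distance 1 from both `"Zocor"` and `"Zacor"`, but the reduced cost for the `0`/`o` substitution lets `correct("Z0cor", tree, 1)` prefer `"Zocor"`. The tree is built on the unweighted distance because BK-tree pruning relies on integer distances that obey the triangle inequality.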