Pattern search method, pattern search apparatus and computer program therefor, and storage medium thereof
Pattern search technology, applied to pattern search methods, pattern search apparatus and computer programs therefor, addresses the problem that suffix trees are difficult to employ for large text databases because of their size, and achieves the effect of reducing the data size of the data structure.
Examples
Example 1
[0122] Case in which a character is represented by one byte and there are 256 types of characters (255 types when the end character $ must also be represented). In general, English text corresponds to this case.
[0123] When the character count of the text T is defined as n, the size of the text T is n bytes, and the size of the suffix array SA is 4n bytes. For example, when k=65536 (=2^16) is employed, numbers equal to or smaller than k can be represented by two bytes. Thus, the total size of the tables F, L, G and C is a little more than 2n bytes. Therefore, the data size, even including the text T and the suffix array SA, is only a little over 7n bytes, which is only about one third of the size of the suffix tree (20n to 40n bytes) that corresponds to the text T. Since the search time is proportional to log k, the speed can be increased by reducing the value of k. When, for example, k=256 (=2^8) is set, twice the search speed can be expected compared with when k...
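The arithmetic in Example 1 can be checked with a short sketch. It only restates the per-character figures quoted above; the constant names are illustrative and not taken from the patent, and the table size is taken at its stated lower bound of 2n bytes.

```python
import math

# Per-character byte costs quoted in Example 1 (n = character count of T).
TEXT_BYTES = 1.0    # text T: one byte per character
SA_BYTES = 4.0      # suffix array SA: 4n bytes
TABLE_BYTES = 2.0   # tables F, L, G and C: "a little more than 2n" (lower bound)

# Total data size per character, in units of n bytes.
total = TEXT_BYTES + SA_BYTES + TABLE_BYTES

# Ratio against the quoted suffix tree range of 20n to 40n bytes.
ratio_vs_small_tree = total / 20.0
ratio_vs_large_tree = total / 40.0

# Search time is stated to be proportional to log k, so the expected
# speed-up from k=65536 to k=256 is the ratio of the two logarithms.
speedup = math.log2(65536) / math.log2(256)

print(total, ratio_vs_small_tree, ratio_vs_large_tree, speedup)
```

Against the 20n-byte end of the suffix tree range the ratio is roughly one third, and the logarithm ratio gives the factor of two mentioned in the text.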
Example 2
[0124] Case in which a character is represented by two bytes, and there are 65536 (=2^16) character types. Japanese text corresponds to this case. When k=65536 is set, the total size of the tables F, L, G and C is 8n bytes, and the total data size, even including the text T and the suffix array SA, is only 14n bytes. It should be noted that in this case a small value of k, such as k=256, is not preferable because the data size will be increased.
Example 3
[0125] Case of a DNA sequence (the number of character types is four). If the use of 2-bit characters and 4-bit characters is permitted, with k=4 the total data size for the tables F, L, G and C, the text T and the suffix array SA will be approximately 8.75n bytes. Further, when k=16, the total data size is about 5.375n bytes. The data size, especially in the second case, is not substantially different from the size of the suffix array SA.
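As a rough cross-check of Example 3, the sketch below backs out how much of each quoted total would be left for the tables F, L, G and C. Two assumptions are made here that the paragraph does not state explicitly: the text T is packed at 2 bits per character (0.25n bytes), and the suffix array SA takes 4n bytes as in Example 1. The names are illustrative only.

```python
# Assumed per-character costs, in units of n bytes (see lead-in).
TEXT_BYTES = 2 / 8   # assumption: T packed at 2 bits per character
SA_BYTES = 4.0       # assumption: SA costs 4n bytes, as in Example 1

# Totals quoted in Example 3 for the two values of k.
quoted_totals = {4: 8.75, 16: 5.375}

# Whatever remains after T and SA is attributed to the tables.
implied_table_bytes = {
    k: total - TEXT_BYTES - SA_BYTES for k, total in quoted_totals.items()
}

for k, size in implied_table_bytes.items():
    print(f"k={k}: tables would take about {size}n bytes")
```

Under these assumptions the tables shrink from about 4.5n bytes at k=4 to about 1.125n bytes at k=16, which is consistent with the remark that the k=16 total is close to the size of the suffix array alone.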
[0126] An example of measuring the search speed for an actual DNA sequence will be explained. In this example, the calculation times are compared between the search method of this embodiment and the conventional method of a binary search of the suffix array SA, with the same query repeated 10,000,000 times against all the sequences of a colon bacillus (E. coli). It should be noted that an RS6000 (a workstation by IBM), equipped with a 333 MHz PowerPC as the CPU, was employed for the calculations.
[0127] Search pattern P=“CACATAA”
[0128] Search time req...
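For orientation, the conventional baseline named in [0126], a binary search over the suffix array SA, can be sketched as follows. This is a minimal illustration and not the patent's embodiment: the suffix array is built naively by sorting, the toy text is invented for the example, and the function names are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Adequate for a toy example; real genomes need a faster builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


text = "GATCACATAAGCACATAAT"   # invented toy sequence, not E. coli data
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "CACATAA")
print(hits)
```

Each query costs O(m log n) character comparisons in this form, which is the baseline against which the embodiment's timing is compared.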