
An on-line string matching method without gap constraint

A gap-constraint-free technique for online string matching. It addresses shortcomings of earlier methods: incomplete character matching, space and time overhead that is hard to control, and poor handling of frequent patterns. The stated effects are higher efficiency, controlled space and time overhead, and guaranteed completeness of the solution.

Active Publication Date: 2019-01-25
HEBEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0013] Two conditions matter in pattern matching: the solution must be complete, and it should be computed quickly with low space overhead. Existing methods struggle to satisfy both at once. For example, the online method proposed by Wu Xindong and Zhu Xinquan in "An efficient on-line algorithm for approximate pattern matching with wildcards and length constraints, IEEE" abandons the traditional offline approach and saves considerable time by reading characters online, but because it targets the one-off (one-time) condition it cannot handle frequent patterns well. Huang Guolin, Guo Dan, and Hu Xuegang, in "Approximate pattern matching method based on wildcards and length constraints, Computer Applications", proposed an EDM-based approximate pattern matching method that handles the three edit operations of approximate matching (insertion, replacement, and deletion), but the characters are matched incompletely during approximate matching, so the result can only be a loose match. Wu Youxi and Shen Cong, in "Strict pattern matching under non-overlapping condition, Science China Information Sciences", proposed a pattern matching method based on a network tree structure: each character is stored in the tree as a node, and nodes that satisfy the conditions are linked as parent and child. This guarantees completeness of the solution under the non-overlapping condition and performs well compared with other methods under various constraints, but for non-overlapping problems without gap constraints its space overhead grows rapidly and the solving speed drops sharply. Wu Youxi and Tong Yao, in "NOSEP: Nonoverlapping Sequence Pattern Mining with Gap Constraints, IEEE Transactions on Cybernetics", use non-overlapping constraints for sequential pattern mining, which is an application of string matching; that work requires the gap constraints to be set in advance before sequential pattern mining is carried out.
[0014] In short, for non-overlapping pattern matching without gap constraints, existing techniques find it difficult to control space and time overhead effectively while guaranteeing completeness, and no good method for this class of problems has been available so far.



Examples


Embodiment 1

[0087] Example of biological sequence matching: a DNA sequence is composed of the four bases a, c, g, and t. The given biological sequence string is S = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 = acgatgacgg, and the given pattern string is P = p1 p2 p3 p4 = aagg.
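
To make the problem statement concrete, the following minimal Python sketch checks whether a list of 1-based positions is an occurrence of P in S, and whether two occurrences are non-overlapping. It rests on an assumed reading of the abstract: two occurrences do not overlap when they never use the same sequence position at the same pattern index, while the same position may appear at different pattern indices. The function names and the two example occurrences are illustrative and were derived by hand; they are not taken from the patent text.

```python
def is_occurrence(seq, pattern, positions):
    """True if `positions` (1-based, strictly increasing) spells `pattern` in `seq`."""
    if len(positions) != len(pattern):
        return False
    increasing = all(a < b for a, b in zip(positions, positions[1:]))
    in_range = all(1 <= p <= len(seq) for p in positions)
    # No gap constraint: consecutive positions may be any distance apart.
    return increasing and in_range and all(
        seq[p - 1] == ch for p, ch in zip(positions, pattern)
    )


def non_overlapping(occ_a, occ_b):
    """Assumed reading: no sequence position is reused at the same pattern index."""
    return all(a != b for a, b in zip(occ_a, occ_b))


S = "acgatgacgg"
P = "aagg"
first = [1, 4, 6, 9]     # hand-derived example occurrence
second = [4, 7, 9, 10]   # reuses positions 4 and 9, but at different pattern indices

print(is_occurrence(S, P, first), is_occurrence(S, P, second))
print(non_overlapping(first, second))
```

Under this reading, position 4 serves as p2 in the first occurrence and as p1 in the second, which is permitted.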

[0088] The first step is to read in the pattern string P and create multiple queues:

[0089] Read in the pattern string P = p1 p2 p3 p4 = aagg and determine that its length is 4. The characters of the pattern string P are p1, p2, p3, p4. Establish four queues for P, numbered queue 1, queue 2, queue 3, and queue 4, so that p1 = a corresponds to queue 1, p2 = a to queue 2, p3 = g to queue 3, and p4 = g to queue 4;
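
The first step can be sketched in Python as follows. This is only an illustration of "one queue per pattern character"; the helper name `create_queues` and the use of `collections.deque` are assumptions, since the source does not specify a data structure.

```python
from collections import deque


def create_queues(pattern):
    """Return one empty FIFO queue per pattern character, keyed by 1-based queue number."""
    return {i: deque() for i in range(1, len(pattern) + 1)}


pattern = "aagg"
queues = create_queues(pattern)

# Show which pattern character each queue is associated with.
for i, ch in enumerate(pattern, start=1):
    print(f"queue {i} <- p{i} = {ch}")
```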

[0090] The second step is to read the given sequence string S in sequence:

[0091] Sequentially ...

Embodiment 2

[0144] Example of shopping psychology matching: to discover relationships among a user's multiple purchase behaviors and so take more effective targeted measures, the types of goods purchased by customers are symbolized as a, b, c, d, e, f, g. The symbolized sequence string of the products purchased by one customer is S = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 = adgacgacef, and the given pattern string is P = p1 p2 p3 p4 = agac, which represents buying c after buying a, g, and a in turn.

[0145] The first step is to read in the pattern string P and create multiple queues:

[0146] Read in the pattern string P = p1 p2 p3 p4 = agac and determine that its length is 4. The characters of the pattern string P are p1, p2, p3, p4. Establish four queues for the pattern string P, and the numbers of these que...


Abstract

The invention relates to an online string matching method without gap constraints, in the technical field of electric digital data processing. It processes, in an online manner, the pattern matching problem without gap constraints under the non-overlapping condition, in which the character at a given position of the sequence string may be matched at different positions of the pattern string. The method comprises the following steps: reading the pattern string P and establishing a plurality of queues; reading the given sequence string S character by character in order; determining whether queue i can create a node; and determining whether an occurrence satisfying the non-overlapping condition has been formed, outputting it on the display when it has, until all characters of the sequence string S have been processed. The invention overcomes the difficulty in the prior art of effectively controlling space and time cost while guaranteeing completeness: it improves the solving speed and still guarantees the completeness of the solution.
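
The steps listed in the abstract can be illustrated with a simplified greedy sketch in Python. It is not the patented procedure, whose node-creation and occurrence tests are not reproduced in this excerpt. The assumptions here are that queue i may accept a new position when i is the first queue or when queue i-1 currently holds more positions than queue i, and that the k-th entries of all queues together form the k-th occurrence. The function name `online_match` is hypothetical.

```python
def online_match(pattern, stream):
    """Greedy online sketch: read `stream` once and return occurrences as 1-based positions."""
    m = len(pattern)
    queues = [[] for _ in range(m)]  # queues[i] holds positions matched to pattern[i]
    occurrences = []
    for j, ch in enumerate(stream, start=1):
        # Visit queues from last to first so the position added to queue i for
        # this character cannot be used by queue i+1 for the same character.
        for i in range(m - 1, -1, -1):
            if ch != pattern[i]:
                continue
            # Assumed node-creation test: some earlier entry of the previous
            # queue is not yet paired with an entry of this queue.
            if i == 0 or len(queues[i - 1]) > len(queues[i]):
                queues[i].append(j)
                if i == m - 1:
                    k = len(queues[i]) - 1
                    # The k-th entry of every queue forms the k-th occurrence.
                    occurrences.append([q[k] for q in queues])
    return occurrences


print(online_match("aagg", "acgatgacgg"))
print(online_match("agac", "adgacgacef"))
```

Each position enters a given queue at most once, so no two reported occurrences share a position at the same pattern index, while a position may still appear in several queues. The sketch never discards queue entries; a space-conscious version would drop entries once their occurrence has been reported.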

Description

Technical field

[0001] The technical solution of the present invention relates to the technical field of electric digital data processing, in particular to an online string matching method without gap constraints.

Background

[0002] With the continuous progress of society and the vigorous development of the computer field, data processing has gradually become a research hotspot, and retrieving more useful information from data has become particularly important. Researchers represent the information in data as characters and compute statistics on it, which gave rise to string matching technology. Finding all substrings of a string that are identical to a given substring is defined as string matching or pattern matching. String matching has a wide range of practical applications, not only for simple biological sequence matching, but also for shopping psychology matching in...


Application Information

IPC(8): G06F16/2458
Inventor: 武优西, 王建姣, 刘靖宇, 张帅, 柴欣, 朱怀忠, 李艳
Owner: HEBEI UNIV OF TECH