One-time approximate mode matching method with local-integral constraints

A pattern matching technique with local and global constraints, applicable to special-purpose data processing, database retrieval, and database indexing. It addresses shortcomings of existing methods such as lack of generality, redundant solution sets, and the inability to handle data noise.

Active Publication Date: 2019-09-13
HEBEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0027] Pattern matching generally has to be general, accurate, and flexible, and its solution set has to be non-redundant, which reduces the difficulty of data analysis and processing. Existing techniques struggle to meet all of these conditions at once, as the following examples show.

Wu et al., "A Heuristic Method for Solving the MPMGOOC Problem", Journal of Computers, study pattern matching with gap constraints and the one-time condition on the basis of a net tree structure. The paper proposes a heuristic method that finds the optimal occurrence through a rightmost parent strategy and a greedy search parent strategy. It improves the quality of the solution compared with other methods and is of reference value for solving other complex problems. However, the method studies exact pattern matching, cannot deal with data noise, and is therefore not general.

The document "SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints", IEEE, studies approximate pattern matching and proves the correctness and effectiveness of the proposed method. However, it studies approximate pattern matching under Hamming distance, which does not consider the local constraints between sequences. This can lead to large deviations, so the method is not accurate.

Tang et al., "Approximate pattern matching with gap constraints", Journal of Information Science, study approximate pattern matching with gap constraints and propose an efficient solution method based on a single net tree. Compared with exact pattern matching, it can find more valuable information in many fields. However, it studies approximate pattern matching without special conditions, so a character at any position in the sequence may be used multiple times. As a result, the number of occurrences of a pattern in the sequence grows exponentially with the length of the pattern, which increases the complexity of the problem.

Liu et al., "An improved BM pattern matching method", Computer Engineering, build on the BM method: by checking whether the pattern contains consecutive characters and changing the comparison order of the pattern accordingly, they improve the matching efficiency of the BM method. However, this document matches contiguous characters without gap constraints and therefore lacks flexibility.
[0028] In short, for the one-time approximate pattern matching problem with local-global constraints, existing techniques struggle to combine generality, accuracy, and flexibility with a non-redundant solution set, and no good method for solving this kind of problem has yet appeared.



Examples


Embodiment 1

[0113] A time series is symbolized with the SAX (symbolic aggregate approximation) method, giving the sequence S=aababccc. The given pattern is P=a[0,2]b[0,2]c[0,2]c, the local threshold is δ=1, and the global threshold is γ=1.
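The visible text does not spell out how the two thresholds are applied, so the following is only a minimal sketch under stated assumptions: the distance between two symbols is taken to be the difference of their alphabet ranks, the local threshold δ bounds every per-position distance, and the global threshold γ bounds their sum. The function names symbol_distance and delta_gamma_match are hypothetical.

```python
def symbol_distance(x: str, y: str) -> int:
    """Assumed metric: distance between two symbols is the gap between their alphabet ranks."""
    return abs(ord(x) - ord(y))


def delta_gamma_match(candidate: str, target: str, delta: int, gamma: int) -> bool:
    """Return True if candidate matches target under the assumed (delta, gamma)-distance.

    Local constraint: every per-position distance is at most delta.
    Global constraint: the sum of per-position distances is at most gamma.
    """
    if len(candidate) != len(target):
        return False
    dists = [symbol_distance(x, y) for x, y in zip(candidate, target)]
    return max(dists, default=0) <= delta and sum(dists) <= gamma


# Thresholds from Embodiment 1: delta = 1, gamma = 1, target sub-patterns "abcc".
print(delta_gamma_match("bbcc", "abcc", 1, 1))  # one position off by 1
print(delta_gamma_match("bbbc", "abcc", 1, 1))  # two positions off by 1 each
```

Under these assumptions, a candidate that deviates by one rank at a single position is accepted, while one that deviates at two positions satisfies the local threshold but exceeds the global one.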

[0114] The first step is to determine the number of layers of the net tree:

[0115] Read in the given sequence S=aababccc, whose length is 8, and the given pattern P=a[0,2]b[0,2]c[0,2]c, whose length is 4. The sub-patterns of P are denoted p_1 = a, p_2 = b, p_3 = c, and p_4 = c, 4 sub-patterns in total, so the net tree has 4 layers, referred to as the first layer, the second layer, the third layer, and the fourth layer;
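As an illustration of this first step, the pattern string can be split into its sub-patterns and gap ranges, and the number of layers read off as the number of sub-patterns. This is a sketch only: the function name parse_pattern and the assumption that sub-patterns are single lowercase letters separated by explicit [min,max] gaps are not taken from the source.

```python
import re

# One token = a lowercase sub-pattern character, optionally followed by a gap "[min,max]".
_TOKEN = re.compile(r"([a-z])(?:\[(\d+),(\d+)\])?")


def parse_pattern(pattern: str):
    """Split a pattern such as 'a[0,2]b[0,2]c[0,2]c' into sub-patterns and gap ranges."""
    subs, gaps = [], []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos}: {pattern[pos:]!r}")
        subs.append(m.group(1))
        if m.group(2) is not None:
            gaps.append((int(m.group(2)), int(m.group(3))))
        pos = m.end()
    # Every pair of neighbouring sub-patterns needs exactly one gap between them.
    if len(gaps) != len(subs) - 1:
        raise ValueError("each sub-pattern except the last must be followed by a gap")
    return subs, gaps


subs, gaps = parse_pattern("a[0,2]b[0,2]c[0,2]c")
num_layers = len(subs)  # one net tree layer per sub-pattern
print(subs, gaps, num_layers)
```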

[0116] The second step is to create the net tree and compute each node's array of root paths and array of leaf paths:

[0117] Given the local threshold δ and the global threshold γ, where 0≤δ≤γ, create a net tree according to the seque...
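The creation rules are cut off in this text, so the following is a simplified sketch of a layered structure, not the method of the invention. It assumes that layer j holds every 1-based position whose symbol lies within δ alphabet ranks of sub-pattern p_j, and that a node is linked to a node in the next layer when the number of characters between them falls inside the gap range. The global threshold γ and the root-path and leaf-path arrays are left out. The name build_layers is hypothetical.

```python
def build_layers(seq, subs, gaps, delta):
    """Build candidate layers and parent->children links for a layered structure.

    Assumptions (not from the source): symbol distance is the difference of
    alphabet ranks; positions are 1-based; only the local threshold is applied.
    """
    layers = [
        [i + 1 for i, ch in enumerate(seq) if abs(ord(ch) - ord(p)) <= delta]
        for p in subs
    ]
    edges = []
    for j, (lo, hi) in enumerate(gaps):
        # A child is reachable if the gap between parent and child lies in [lo, hi].
        edges.append({
            parent: [child for child in layers[j + 1] if lo <= child - parent - 1 <= hi]
            for parent in layers[j]
        })
    return layers, edges


layers, edges = build_layers("aababccc", ["a", "b", "c", "c"], [(0, 2)] * 3, delta=1)
for depth, nodes in enumerate(layers, start=1):
    print(f"layer {depth}: {nodes}")
```

A fuller construction would also prune nodes that cannot lie on any root-leaf path and would track accumulated distance against γ.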

Embodiment 2

[0192] The difference lies in the third step, "using the net tree structure to solve the one-time approximate pattern matching problem under (δ, γ)-distance", in which a heuristic method finds the root-leaf paths in the net tree that satisfy the local-global constraints. This embodiment uses the second of four similar strategies for finding occurrences with this heuristic method: the leftmost parent strategy and the greedy search parent strategy. The leftmost parent strategy starts from the first node of the leaf layer and, under the condition of satisfying the local-global constraints, preferentially selects the leftmost parent node of the current node. The greedy search parent strategy likewise starts from the first node of the leaf layer and, under the condition of satisfying the local-global constraints, selects the optimal parent node of the current node, where the optimal parent node refers to the parent node with a smaller position corr...

Embodiment 3

[0194] The difference lies in the third step, "using the net tree structure to solve the one-time approximate pattern matching problem under (δ, γ)-distance", in which a heuristic method finds the root-leaf paths in the net tree that satisfy the local-global constraints. This embodiment uses the third of four similar strategies for finding occurrences with this heuristic method: the rightmost child strategy and the greedy search child strategy. The rightmost child strategy starts from the last node of the root layer and, under the condition of satisfying the local-global constraints, preferentially selects the rightmost child node of the current node. The greedy search child strategy likewise starts from the last node of the root layer and, under the condition of satisfying the local-global constraints, selects the optimal child node of the current node, where the optimal child node refers to the child node with a small position correlation numb...


Abstract

Provided is a one-time approximate pattern matching method with local-global constraints, relating to the technical field of digital data processing. A net tree structure is used to solve the one-time approximate pattern matching problem under (δ, γ)-distance. First, a sequence S, a pattern P, a local threshold δ, and a global threshold γ are read, and a net tree is created according to the input conditions. Then, starting from the last node of the leaf layer and following a rightmost parent strategy and a greedy search parent strategy, the occurrence that leaves the larger number of remaining occurrences is selected, and the process is iterated down to the first node of the leaf layer to find a maximum result set. Finally, all occurrences of pattern P in sequence S are output. The present invention overcomes the shortcomings of the prior art for the one-time approximate pattern matching problem with local-global constraints: generality, accuracy, and flexibility are combined, and the solution set is non-redundant.
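To make the problem statement concrete, here is a brute-force baseline. It is deliberately not the net tree heuristic described above. It enumerates every occurrence that satisfies the gap constraints and an assumed (δ, γ)-distance (per-position rank difference at most δ, sum at most γ), then keeps occurrences in enumeration order as long as they share no sequence position with those already kept, which is how the one-time condition is interpreted here. The names occurrences and first_fit_one_time are hypothetical.

```python
def occurrences(seq, subs, gaps, delta, gamma):
    """Yield every tuple of 1-based positions meeting the gap and (delta, gamma) constraints."""
    def extend(prefix, total):
        j = len(prefix)
        if j == len(subs):
            yield tuple(prefix)
            return
        if j == 0:
            candidates = range(1, len(seq) + 1)
        else:
            lo, hi = gaps[j - 1]
            first = prefix[-1] + lo + 1
            last = min(prefix[-1] + hi + 1, len(seq))
            candidates = range(first, last + 1)
        for pos in candidates:
            # Assumed symbol distance: difference of alphabet ranks.
            d = abs(ord(seq[pos - 1]) - ord(subs[j]))
            if d <= delta and total + d <= gamma:
                yield from extend(prefix + [pos], total + d)

    yield from extend([], 0)


def first_fit_one_time(occs):
    """Keep occurrences in order, skipping any that reuse an already used position."""
    used, chosen = set(), []
    for occ in occs:
        if used.isdisjoint(occ):
            chosen.append(occ)
            used.update(occ)
    return chosen


all_occs = list(occurrences("aababccc", ["a", "b", "c", "c"], [(0, 2)] * 3, 1, 1))
print(first_fit_one_time(all_occs))
```

On the Embodiment 1 inputs this first-fit pass keeps two disjoint occurrences, which is the most that a sequence of length 8 can hold for a pattern of length 4. In general, first-fit gives no such guarantee, and the enumeration can grow exponentially with pattern length, which is the motivation for a heuristic over a net tree.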

Description

Technical field

[0001] The technical solution of the invention relates to the technical field of electrical digital data processing, in particular to a one-time approximate pattern matching method with local-global constraints.

Background technique

[0002] With the continuous development of Internet technology, the scale of data is increasing rapidly, and how to use data mining technology to find valuable information in a large amount of data has become a research hotspot. Frequent pattern mining refers to finding frequent patterns in a large amount of data, and its main task is pattern matching: frequent pattern mining usually needs to calculate the support of a pattern, and the essence of support calculation is the pattern matching problem. Pattern matching is therefore the basis and core of frequent pattern mining. As science and technology advance, pattern matching technology has been widely used in various fields, not only for s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/901, G06F16/903
CPC: G06F16/9014, G06F16/9027, G06F16/90344
Inventor: 武优西, 菅博境, 于磊, 成淑慧, 朱昌瑞, 单劲松, 刘靖宇
Owner: HEBEI UNIV OF TECH