A method, device and system for regular expression matching based on multi-fork tree

A regular expression matching and multi-fork tree technology, applied in the fields of instruments, computing, and electric digital data processing, which addresses the large volume of data matching, the low matching efficiency, and the resulting failure to meet application requirements.

Active Publication Date: 2021-05-04
ZHONGKE DINGFU BEIJING TECH DEV
Cites: 7 | Cited by: 0

AI Technical Summary

Problems solved by technology

However, in practical applications the number of regular expressions to be matched is very large, reaching hundreds of thousands, while the number of expressions in the set that actually correspond to a given target text is usually only a few dozen. The prior-art method of matching regular expressions one by one therefore performs a huge amount of data matching, and its low matching efficiency cannot meet practical application requirements.

Embodiment Construction

[0025] In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings of those embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present invention.

[0026] Regular expressions (English: Regular Expression), which are also known by several synonymous names, are a computer science concept. In the field of natural language processing technology, regular expressions are used to describe and...

Abstract

The embodiments of the present application provide a regular expression matching method, device and system based on a multi-fork tree. A multi-fork tree is generated by performing node fusion on the simplified trees of the regular expressions; the constant characters of each node are then extracted to generate a keyword dictionary, and the target text is segmented according to that dictionary; finally, according to the word segmentation result of the target text, hit paths are matched in the multi-fork tree, and the regular expressions corresponding to the hit paths are added to the hit set. The multi-fork tree clusters regular expressions that would otherwise stand alone into its subtrees, which reduces the total number of nodes and groups homologous expressions together. Because the root node of each subtree is different, the subtree to be matched can be located quickly from its root node when matching hit paths, so the subsequent matching runs only inside that subtree and the regular expressions no longer need to be matched one by one. This reduces the amount of matching and improves matching efficiency.
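The abstract describes the pipeline only at a high level. The Python sketch below is one possible reading of it, not the patented implementation. It assumes each regular expression has already been reduced to a "simplified tree" that is just a chain of literal keywords separated by `.*` gaps, performs node fusion by sharing common keyword prefixes, builds the keyword dictionary from the node text, segments the target text by forward maximum matching (a segmentation strategy chosen here, not stated in the source), and collects every pattern whose keyword path occurs in order. All names (`Node`, `literal_chain`, `build_tree`, `keyword_dictionary`, `segment`, `match`) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    keyword: str
    children: dict = field(default_factory=dict)  # keyword -> child Node
    patterns: list = field(default_factory=list)  # regexes whose path ends here


def literal_chain(pattern):
    # Assumption: a "simplified tree" is literal keywords separated by ".*" gaps.
    return [part for part in pattern.split(".*") if part]


def build_tree(patterns):
    """Fuse the keyword chains of all patterns into one multi-fork tree."""
    root = Node("")
    for pattern in patterns:
        node = root
        for keyword in literal_chain(pattern):
            # setdefault reuses an existing child, so shared prefixes are fused
            node = node.children.setdefault(keyword, Node(keyword))
        node.patterns.append(pattern)
    return root


def keyword_dictionary(root):
    """Collect the constant text of every node in the tree."""
    words, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node.keyword:
            words.add(node.keyword)
        stack.extend(node.children.values())
    return words


def segment(text, words):
    """Forward maximum matching; characters outside the dictionary are skipped."""
    longest = max(map(len, words), default=0)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            i += 1
    return tokens


def match(root, text):
    """Return the patterns whose keyword path occurs, in order, in the text."""
    tokens = segment(text, keyword_dictionary(root))
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    hits = set(root.patterns)

    def walk(node, start):
        hits.update(node.patterns)
        for keyword, child in node.children.items():
            occurrences = positions.get(keyword, [])
            k = bisect_left(occurrences, start)
            if k < len(occurrences):
                walk(child, occurrences[k] + 1)  # earliest later occurrence suffices

    # Enter each subtree directly through its root keyword; other subtrees are skipped.
    for keyword, occurrences in positions.items():
        subtree = root.children.get(keyword)
        if subtree is not None:
            walk(subtree, occurrences[0] + 1)
    return hits


if __name__ == "__main__":
    rules = ["order.*cancel", "order.*refund", "order.*cancel.*refund", "login.*fail"]
    tree = build_tree(rules)
    print(sorted(match(tree, "the order was placed, then cancel requested, refund issued")))
    # ['order.*cancel', 'order.*cancel.*refund', 'order.*refund']
```

In this sketch the three `order...` rules share a single `order` subtree, and `login.*fail` is never examined because its root keyword does not occur in the segmented text. Forward maximum matching can miss overlapping keywords, so a fuller system would likely need a more careful segmentation or a final verification of each hit against the original regular expression.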

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, and in particular to a regular expression matching method, device and system based on a multi-fork tree.

Background

[0002] In the field of natural language processing technology, regular expressions are used to describe and match a series of strings that conform to a certain syntax rule, and are often used for text retrieval, text matching or text replacement. Regular expressions are composed of constant characters (also known as ordinary characters) and operator characters (also known as special characters, metacharacters, pattern characters, etc.). Constant characters are used to match text in text retrieval, text matching or text replacement, while operator characters define the rules under which the constant characters are matched against text.

[0003] In the prior art, a large amount of text is sometimes involved in the process of text retrieval, text matching or text...
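Paragraph [0002] distinguishes constant characters from operator characters. As a hedged illustration of that split, and of the kind of constant text a keyword dictionary could be built from, the hypothetical `split_constants` function below walks a pattern written in Python `re` syntax and returns its runs of constant characters. It is a simplification, not the patent's procedure: escapes such as `\d` or `\s` are treated as operators while `\.` yields a literal dot, bracket classes are skipped whole (assuming no nested or escaped `]`), and a single character directly followed by a quantifier is dropped because it is not a fixed constant.

```python
OPERATORS = set(".^$|()")            # operator characters that end a constant run
QUANTIFIERS = set("*+?{")            # quantifiers applying to the preceding atom
ESCAPE_CLASSES = set("dDwWsSbBAZ")   # escapes that denote classes or anchors


def split_constants(pattern):
    """Return the runs of constant characters in a regex, in order of appearance."""
    runs, current = [], []

    def flush():
        if current:
            runs.append("".join(current))
            current.clear()

    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in ESCAPE_CLASSES:
                flush()                  # \d, \s, \b, ... act as operators
            else:
                current.append(nxt)      # \. \( ... stand for the literal character
            i += 2
        elif ch == "[":                  # character class: skip it as one operator
            flush()
            end = pattern.find("]", i + 1)
            i = len(pattern) if end == -1 else end + 1
        elif ch in QUANTIFIERS:
            if current:
                current.pop()            # the quantified character is not a fixed constant
            flush()
            if ch == "{":                # skip the {m,n} bounds
                end = pattern.find("}", i + 1)
                i = len(pattern) if end == -1 else end + 1
            else:
                i += 1
        elif ch in OPERATORS:
            flush()
            i += 1
        else:
            current.append(ch)           # ordinary (constant) character
            i += 1
    flush()
    return runs
```

For example, `split_constants(r"order\s*cancel(led)?")` returns `["order", "cancel", "led"]`. The optional group `(led)?` still contributes a run, which is harmless for building a dictionary but means a run is not necessarily required for a match.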

Application Information

Patent type and authority: Patent (China)
IPC (8): G06F40/284; G06F40/247
CPC: G06F40/247; G06F40/284
Inventors: 李德彦 (Li Deyan), 晋耀红 (Jin Yaohong), 林谡 (Lin Su)
Owner: ZHONGKE DINGFU BEIJING TECH DEV