Regular expression matching method, device and system based on multi-way tree

This technology combines regular expression matching with a multi-way tree and is applied in special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses the large matching volume, low matching efficiency, and failure to meet application requirements of existing approaches by reducing the total number of nodes, reducing the amount of matching, and improving matching efficiency.

Active Publication Date: 2018-05-04
ZHONGKE DINGFU BEIJING TECH DEV

AI Technical Summary

Problems solved by technology

However, in practical applications the number of regular expressions to be matched is very large and can reach hundreds of thousands, while the number of regular expressions in the set that actually correspond to a given target text is usually only a few dozen. The prior-art method of matching regular expressions one by one therefore performs a huge amount of matching, has low matching efficiency, and cannot meet actual application requirements.
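The one-by-one baseline described above can be sketched as follows. This is only an illustration using Python's standard `re` module; the function name `match_one_by_one` and the three sample patterns are assumptions made for the example and do not come from the patent.

```python
import re


def match_one_by_one(patterns, text):
    """Try every compiled expression against the text.

    The work grows linearly with the number of expressions, even though
    only a handful of them usually hit a given text.
    """
    return [p.pattern for p in patterns if p.search(text)]


# Hypothetical expression set; a real set may hold hundreds of thousands.
patterns = [re.compile(p) for p in (r"refund.*order", r"cancel.*order", r"track.*parcel")]
hits = match_one_by_one(patterns, "please refund my order today")
print(hits)
```

Every expression is evaluated even though only one of them hits, which is the cost the multi-way tree is meant to avoid.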

Embodiment Construction

[0025] To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.

[0026] Regular expressions (English: Regular Expression), which are also known by several alternative names, are a computer science concept. In the field of natural language processing technology, regular expressions are used to describe and...

Abstract

An embodiment of the invention provides a regular expression matching method, device and system based on a multi-way tree. The method comprises the following steps: performing node fusion on the simplified trees of the regular expressions to obtain a multi-way tree; then extracting the constant characters of every node from the multi-way tree to generate a keyword dictionary, and performing word segmentation on a target text according to the keyword dictionary; and finally, matching a hit path in the multi-way tree according to the word segmentation result of the target text, and adding the regular expression corresponding to the hit path to a hit set. Through the multi-way tree, regular expressions that previously existed separately are clustered into sub-trees, which reduces the total number of nodes and groups homologous expressions together. Because different sub-trees have different root nodes, the sub-trees to be matched can be located quickly by their root nodes when matching the hit path, so the subsequent matching only needs to be carried out within those sub-trees. The regular expressions do not need to be matched one by one, which reduces the amount of matching and improves matching efficiency.
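The steps in the abstract can be illustrated with a minimal sketch. It rests on a strong simplifying assumption that is not taken from the patent: each regular expression is assumed to be already reduced to an ordered list of constant keywords with arbitrary gaps between them, so operator handling and the patent's actual simplification and fusion rules are not modelled. The names `Node`, `build_tree`, `keyword_dictionary`, `segment` and `match` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # keyword -> Node
    patterns: list = field(default_factory=list)  # expressions ending at this node


def build_tree(expressions):
    """Node fusion: expressions sharing a leading keyword sequence share nodes."""
    root = Node()  # synthetic root; its children act as the sub-tree roots
    for name, keywords in expressions.items():
        node = root
        for kw in keywords:
            node = node.children.setdefault(kw, Node())
        node.patterns.append(name)
    return root


def keyword_dictionary(root):
    """Collect the constant keyword of every node in the tree."""
    words, stack = set(), [root]
    while stack:
        node = stack.pop()
        for kw, child in node.children.items():
            words.add(kw)
            stack.append(child)
    return words


def segment(text, dictionary):
    """Forward longest-match segmentation; text outside the dictionary is skipped."""
    tokens, i = [], 0
    longest = max((len(w) for w in dictionary), default=0)
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            i += 1
    return tokens


def match(root, tokens):
    """Follow only the sub-trees whose keywords occur, in order, in the tokens."""
    hits, stack = set(), [(root, 0)]
    while stack:
        node, start = stack.pop()
        hits.update(node.patterns)
        for kw, child in node.children.items():
            try:
                pos = tokens.index(kw, start)  # earliest occurrence at or after start
            except ValueError:
                continue  # keyword absent: the whole sub-tree is pruned
            stack.append((child, pos + 1))
    return hits


# Hypothetical expression set, already reduced to keyword sequences.
expressions = {
    "r1": ["refund", "order"],
    "r2": ["refund", "card"],
    "r3": ["cancel", "order"],
}
tree = build_tree(expressions)
tokens = segment("please refund my order today", keyword_dictionary(tree))
hit_set = match(tree, tokens)
print(tokens, hit_set)
```

In this sketch `r1` and `r2` share the `refund` node, and the `cancel` sub-tree is never entered because its root keyword does not occur in the segmented text.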

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, in particular to a regular expression matching method, device and system based on a multi-way tree.

Background

[0002] In the field of natural language processing technology, regular expressions are used to describe and match a series of strings that conform to a certain syntax rule, and are often used for text retrieval, text matching or text replacement. Regular expressions are composed of constant characters (also known as ordinary characters) and operator characters (also known as special characters, metacharacters, pattern characters, etc.). Constant characters are used to match text in text retrieval, text matching or text replacement, and operators define the operation rules that apply when constant characters are used to match text.

[0003] In the prior art, a large amount of text is sometimes involved in the process of text retrieval, text matching or text...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/247, G06F40/284
Inventor: 李德彦, 晋耀红, 林谡
Owner: ZHONGKE DINGFU BEIJING TECH DEV