Chinese variation text matching recognition method

A recognition method for variant text, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the difficulty of matching text in which characters have been replaced by similar-looking characters or by typos, and improves matching speed with low time and space complexity.

Active Publication Date: 2011-02-16
Chongqing Zhizai Technology Co., Ltd. (重庆智载科技有限公司)
Cites: 4; cited by: 28

AI Technical Summary

Problems solved by technology

[0006] To address the difficulty of matching variant text in which characters are replaced by similar-looking characters, or by typos of similar shape, the present invention applies a special encoding conversion to both the target text and the pattern string to increase their similarity, and then matches them with an exact string-matching algorithm.
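The two-step idea in [0006] can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented encoding: `COMPONENTS` is a hypothetical three-entry decomposition table (张 = 弓 + 长, 好 = 女 + 子, 明 = 日 + 月), and `encode` and `variant_match` are illustrative names. Converting both strings into component sequences lets a character written as separate parts match the intact character, after which an ordinary exact substring test is enough.

```python
# Hypothetical decomposition table (an assumption, not the patent's table):
# character -> the components it is built from. A real table would cover
# the full character set.
COMPONENTS: dict[str, str] = {
    "张": "弓长",
    "好": "女子",
    "明": "日月",
}


def encode(text: str) -> str:
    """Replace every known character by its component sequence."""
    return "".join(COMPONENTS.get(ch, ch) for ch in text)


def variant_match(pattern: str, target: str) -> bool:
    """Exact substring test applied to the converted strings."""
    return encode(pattern) in encode(target)
```

For example, `variant_match("张三", "我是弓长三")` is true because both sides contain 弓长三 after conversion. Decomposition can also create false positives when two unrelated adjacent characters happen to spell out a component sequence, which is one reason a real encoding table needs careful design.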

Examples

Embodiment Construction

[0013] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0014] The implementation of the present invention is described below with reference to the accompanying drawings and specific examples. Figure 1 is a flow chart of Chinese variant text matching and recognition according to the present invention.

[0015] Construct a radical-based character encoding table and a pattern-string encoding conversion table.

[0016] Constructing the radical-based character encoding table: according to its structure, each Chinese character is divided into basic character units, including radicals and component characters, and 64 distinct identifiers (for example, uppercase and lowercase English letters and Arabic numerals) are used to represent these basic units. This example builds a radical-based character encoding table (see Table 1) based on 64 encoding convers...
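Paragraph [0016] describes assigning 64 distinct identifiers to the basic units. The Python sketch below shows one way to do that, under several assumptions: the alphabet is A-Z, a-z, 0-9 plus two extra symbols `+` and `/` (the source only says "such as" letters and digits, which give 62); `DECOMPOSITION` is a hypothetical stand-in for Table 1, which is not reproduced here; and each unit receives exactly one identifier, which only works for at most 64 units. How the patent handles a larger set of units falls in the truncated part of the text.

```python
import string

# 26 uppercase + 26 lowercase + 10 digits = 62 symbols; "+" and "/" are an
# assumed choice for the remaining two identifiers.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
assert len(ALPHABET) == 64

# Hypothetical stand-in for Table 1: character -> its basic units.
DECOMPOSITION: dict[str, list[str]] = {
    "张": ["弓", "长"],
    "明": ["日", "月"],
    "林": ["木", "木"],
}


def build_unit_codes(decomposition: dict[str, list[str]]) -> dict[str, str]:
    """Give each distinct basic unit one identifier, in first-seen order."""
    units: list[str] = []
    for parts in decomposition.values():
        for unit in parts:
            if unit not in units:
                units.append(unit)
    if len(units) > len(ALPHABET):
        raise ValueError("more basic units than available identifiers")
    return {unit: ALPHABET[i] for i, unit in enumerate(units)}


def build_char_table(decomposition: dict[str, list[str]],
                     unit_codes: dict[str, str]) -> dict[str, str]:
    """Map each character to the identifiers of its units, in structural order."""
    table = {ch: "".join(unit_codes[u] for u in parts)
             for ch, parts in decomposition.items()}
    # A basic unit written on its own encodes to its single identifier.
    for unit, code in unit_codes.items():
        table.setdefault(unit, code)
    return table


def encode_text(text: str, table: dict[str, str]) -> str:
    """Encode text; characters missing from the table are kept unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


unit_codes = build_unit_codes(DECOMPOSITION)
CHAR_TABLE = build_char_table(DECOMPOSITION, unit_codes)
```

With this table, 张 and its split form 弓长 both encode to `AB`. Characters that `encode_text` leaves unchanged could collide with identifier symbols (for example, an ASCII `A` in the input), so a real system would need an escape or a separate code space for them.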

Abstract

The invention relates to a method for matching and recognizing Chinese variant text. The method comprises the following steps: applying a special encoding conversion to the target text and the pattern string to increase their similarity; inserting appropriate wildcard characters into the converted pattern string according to the structural characteristics of the Chinese characters it contains; and matching with an exact string-matching algorithm (namely, the CV-BM algorithm). The method addresses the difficulty of matching variant text in which characters are replaced by similar-looking characters or by similar-shaped typos. Compared with image-partitioning recognition methods, it has lower time and space complexity, is better suited to fast matching of Chinese information in high-speed network transmission environments, and can be widely applied in systems that need to match Chinese information, such as intrusion prevention systems and information retrieval systems.
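The abstract names the CV-BM algorithm, but this excerpt does not describe it, so the following Python sketch uses a swapped-in standard technique rather than the patented one: a Boyer-Moore-Horspool search extended so that a wildcard in the pattern matches any single symbol. The wildcard symbol `?` and the function names are assumptions, and the rule for placing wildcards according to the structure of Chinese characters is not reconstructed; the example patterns simply contain them already.

```python
WILDCARD = "?"  # assumed symbol; must not occur in the encoding alphabet


def _shift_table(pattern: str) -> tuple[dict[str, int], int]:
    """Horspool bad-character shifts, adjusted for single-symbol wildcards."""
    m = len(pattern)
    # A wildcard aligns with any text symbol, so the right-most wildcard
    # before the last position caps every shift.
    last_wild = max((i for i in range(m - 1) if pattern[i] == WILDCARD),
                    default=-1)
    default = m - 1 - last_wild
    shifts: dict[str, int] = {}
    for i in range(m - 1):
        if pattern[i] != WILDCARD:
            shifts[pattern[i]] = m - 1 - i  # later occurrences overwrite
    return shifts, default


def horspool_wildcard(pattern: str, text: str) -> list[int]:
    """Return all start indices where pattern matches text ('?' = any symbol)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shifts, default = _shift_table(pattern)
    hits: list[int] = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left; a wildcard always matches.
        while j >= 0 and (pattern[j] == WILDCARD or pattern[j] == text[s + j]):
            j -= 1
        if j < 0:
            hits.append(s)
        # Shift by the text symbol aligned with the last pattern position.
        s += min(shifts.get(text[s + m - 1], default), default)
    return hits
```

For example, `horspool_wildcard("?BC", "XBCYBCZ")` returns `[0, 3]`. Wildcards near the end of a pattern shrink the maximum shift, so the search degrades toward a symbol-by-symbol scan as they move right.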

Description

Technical Field

[0001] The invention relates to Chinese information retrieval and content filtering, and in particular to a method for matching Chinese information. The method can be widely used in systems that need to match Chinese information, such as intrusion prevention systems and information retrieval systems.

Background

[0002] An IPS (Intrusion Prevention System) has long played an important role in network security protection. IPS technology provides multi-layer, deep, and active protection of the network, helping to keep enterprise networks secure. String matching is a key factor in IPS performance. Given a set of specific strings P (pattern strings), string matching means finding all occurrences of P in a text T (the target text). If a string identical to a pattern string P is found in T, then P matches the target text T; otherwise it does not.

[0003] In the Chinese environment, ...
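Paragraph [0002] defines string matching as finding every occurrence of a pattern string P in a target text T. A minimal Python sketch of that definition, built on the standard `str.find` (the function names are illustrative), also shows why plain exact matching falls short for variant text: a character written as its separate components is not found.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return every start index of pattern in text, overlaps included."""
    if not pattern:
        return []
    positions: list[int] = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping matches are kept.
        start = text.find(pattern, start + 1)
    return positions


def match_patterns(patterns: list[str], text: str) -> dict[str, list[int]]:
    """Apply find_all to each pattern in a set; an empty list means no match."""
    return {p: find_all(p, text) for p in patterns}
```

Here `find_all("张三", "弓长三")` returns an empty list even though a human reader sees the same name, which is the gap the encoding conversion is meant to close.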

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30
Inventors: Cheng Kefei (程克非), Li Hongbo (李红波), Guo Ruijie (郭瑞杰), Xi Zhen (席珍)
Owner: Chongqing Zhizai Technology Co., Ltd. (重庆智载科技有限公司)