A character string matching method and device

A string matching technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses the problems of poor operability, complicated implementation, and low efficiency in obtaining similar strings.

Active Publication Date: 2018-12-11
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

[0004] It can be seen that the existing technical solution defines a gram recognition degree index. However, this index is complicated to implement and has poor operability. In particular, the comprehensive index strategy built by integrating the recognition degree index is difficult to engineer, and its overall execution efficiency is low, which makes obtaining similar strings inefficient.

Examples

Embodiment 1

[0090] Referring to Figure 1, which is a schematic flowchart of the string matching method provided in this embodiment, the string matching method includes the following steps:

[0091] S101: Obtain a target character string to be matched.

[0092] In this embodiment, one or more similar character strings need to be matched based on an existing character string. This embodiment defines the existing character string that needs to be matched as the target character string.

[0093] S102: Determine a target candidate set, where the target candidate set includes a plurality of first candidate character strings.

[0094] In practical applications, string matching requires a pre-built string candidate set Cad, which usually includes a large number of candidate strings, so that when string matching needs to be performed on the target string, one or more candidate character strings similar to the target character string can be matc...

Embodiment 2

[0111] It should be noted that this embodiment introduces an implementation of "determining a target candidate set" in S102 of the first embodiment.

[0112] Step S102 may determine the target candidate set based on a preset edit distance threshold. Edit distance is the cost of completely transforming one string into another through three operations: insertion, deletion, and replacement. Generally speaking, the smaller the edit distance, the greater the similarity between two strings. In this embodiment, an edit distance threshold can be preset. The edit distance threshold can be the maximum number of edits required to convert the target string into a similar string. It can be set by the user, or the system default value can be used. The edit distance threshold is the key parameter of the matching operation. Understandably, the larger the edit distance threshold, the more similar strings are matched from the t...
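
The edit distance described in [0112] can be illustrated with a minimal Python sketch. It assumes a unit cost for each insertion, deletion, and replacement (the classic Levenshtein distance); the function names `edit_distance` and `within_threshold` and the parameter `max_edits` are hypothetical and do not come from the patent text.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming each edit operation costs 1."""
    # previous[j] is the distance between the processed prefix of source
    # and target[:j]; it starts as the cost of building target[:j] from "".
    previous = list(range(len(target) + 1))
    for i, src_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, tgt_char in enumerate(target, start=1):
            cost = 0 if src_char == tgt_char else 1
            current.append(min(
                previous[j] + 1,         # delete src_char
                current[j - 1] + 1,      # insert tgt_char
                previous[j - 1] + cost,  # replace, or keep when equal
            ))
        previous = current
    return previous[-1]


def within_threshold(source: str, target: str, max_edits: int) -> bool:
    """True when target is reachable from source in at most max_edits edits."""
    return edit_distance(source, target) <= max_edits
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3, so the pair passes `within_threshold` with `max_edits=3` but not with `max_edits=2`.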

Embodiment 3

[0133] It should be noted that this embodiment introduces an implementation of "determining the character string filtering threshold" in S103 of the first embodiment.

[0134] Step S103 can determine the character string filtering threshold MergThreshold based on the preset edit distance threshold, where the edit distance threshold is the maximum number of edits required to convert the target character string into a similar character string. For an introduction to the edit distance threshold, refer to the second embodiment above; it is not repeated here.

[0135] In an implementation manner of this embodiment, step S103 may specifically determine a character string filtering threshold according to the length of the target character string and a preset edit distance threshold.

[0136] It should be noted that when the number of slices to be matched in the target character string is smaller, then the filtering conditi...
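
The excerpt does not show the formula this embodiment uses for MergThreshold. As an illustration only, the sketch below uses the classic q-gram count filter bound, which likewise depends on the target length and the edit distance threshold: a string of length n has n - q + 1 overlapping slices of length q, and one edit can destroy at most q of them. The slice length `q`, the function names, and the bound itself are assumptions, not the patent's definition.

```python
def slice_count(length: int, q: int) -> int:
    """Number of overlapping slices of length q in a string of the given length."""
    return max(0, length - q + 1)


def count_filter_threshold(target_length: int, max_edits: int, q: int = 2) -> int:
    """Lower bound on slices shared with any string within max_edits edits.

    Assumption: classic count-filter bound, not the patent's own formula.
    Each edit can change at most q slices of the target, so at least
    slice_count - q * max_edits slices must survive unchanged.
    """
    return max(0, slice_count(target_length, q) - q * max_edits)
```

With a target of length 6, `q=2`, and one allowed edit, the bound is 5 - 2 = 3 shared slices. When the bound drops to 0, as with a target of length 3 and two allowed edits, this particular filter excludes no candidates.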

Abstract

The invention discloses a character string matching method and device. The method comprises the following steps: acquiring a target character string to be matched and determining a target candidate set for matching; then determining a character string filtering threshold, wherein the character string filtering threshold is the minimum number of slices that must match between the target character string and each similar character string, and a similar character string is a first candidate character string in the target candidate set that is similar to the target character string; and, once the character string filtering threshold is determined, matching each similar character string from the target candidate set, so that the number of slices shared by the similar character string and the target character string reaches at least the character string filtering threshold. With the character string filtering threshold of the present application, similar character strings can be matched quickly and accurately.
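
The filtering step summarized in the abstract can be sketched in Python as follows. This is a minimal illustration that assumes slices are overlapping substrings of a fixed length `q` and that shared slices are counted with multiplicity; the names `slices`, `shared_slice_count`, and `match_similar`, and the parameter `q`, are hypothetical and not taken from the patent.

```python
from collections import Counter


def slices(text: str, q: int = 2) -> list[str]:
    """Overlapping slices of length q (assumed slicing scheme)."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def shared_slice_count(a: str, b: str, q: int = 2) -> int:
    """Number of slices common to both strings, counted with multiplicity."""
    common = Counter(slices(a, q)) & Counter(slices(b, q))
    return sum(common.values())


def match_similar(target: str, candidates: list[str],
                  filter_threshold: int, q: int = 2) -> list[str]:
    """Keep candidates that share at least filter_threshold slices with target."""
    return [
        candidate for candidate in candidates
        if shared_slice_count(target, candidate, q) >= filter_threshold
    ]
```

For instance, `match_similar("kitten", ["mitten", "kitchen", "garden"], 3)` keeps `"mitten"` (4 shared slices) and `"kitchen"` (3 shared slices) and drops `"garden"` (1 shared slice). A scan over every candidate is used here for brevity; a large candidate set would more plausibly be served by an index from slices to candidates.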

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, in particular to a string matching method and device.

Background technique

[0002] In practical applications, for a certain string, it is expected to find a candidate string similar to the string from a very large candidate set containing a large number of candidate strings.

[0003] In an existing technical solution, firstly, the gram recognition degree index is obtained by using the overlapping information between the inverted linked lists in the inverted index, and then the gram recognition degree index is integrated with the length ratio of the inverted linked list to form a comprehensive index, and finally, according to the comprehensive index, the candidate set is obtained through the generalized prefix filter, and the retrieval result is obtained by calculating the real coding distance of the character strings in the candidate set, that is, similar character s...

Application Information

IPC(8): G06F17/30
Inventor 夏涛陈洋杨强陈志刚
Owner IFLYTEK CO LTD