A Fast Fuzzy Matching Algorithm for Strings in Massive Audio Data

A fuzzy string matching technology for use in electrical digital data processing, digital data information retrieval, special data processing applications, and similar fields. It addresses the high demands that neural network approaches place on training and application, improving speed and reducing the amount of matching computation.

Inactive Publication Date: 2019-05-14
深圳凡豆信息科技有限公司
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The main problem is that when the database is large, training and applying a neural network becomes relatively demanding.




Embodiment Construction

[0032] The present invention is further elaborated below in conjunction with the accompanying drawing:

[0033] As shown in figure 1, the main process of the present invention is as follows. First, the label and text data in the database are read, and the stored data is used for training to obtain three mappings: the mapping D1 from characters to label strings, the mapping D2 from label strings to text, and the mapping D3 from text to label quantity. Next, the description text X entered by the user is obtained; its length is L characters, and the character set X(l) (l = 1, 2, 3, ..., L) is extracted from this input search text. Using the mapping D1 from each character X(l) to label strings, the relevant label set is selected from the keyword set, fuzzy matching is performed between the selected label set and the input text X, and the score of each matching result is saved. Then look up useless dictionaries and negative word dictionaries to furt...
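
The preprocessing and label filtering steps described in [0033] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `build_indexes` and `filter_labels`, the `(label, text)` record layout, the `min_hits` threshold, and the reading of D3 as "number of labels attached to a text" are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_indexes(records):
    """Build D1, D2, D3 from (label, text) pairs (record layout is assumed)."""
    d1 = defaultdict(set)   # D1: character -> label strings containing it
    d2 = defaultdict(list)  # D2: label string -> texts carrying that label
    d3 = defaultdict(int)   # D3: text -> number of labels (assumed meaning)
    for label, text in records:
        for ch in label:
            d1[ch].add(label)
        d2[label].append(text)
        d3[text] += 1
    return d1, d2, d3


def filter_labels(query, d1, min_hits=1):
    """Keep labels that share at least min_hits distinct characters with query."""
    hits = defaultdict(int)
    for ch in set(query):              # each distinct input character X(l)
        for label in d1.get(ch, ()):   # hash lookup through D1
            hits[label] += 1
    return {label for label, n in hits.items() if n >= min_hits}


records = [("儿歌", "小星星"), ("故事", "小红帽"), ("儿歌", "两只老虎")]
d1, d2, d3 = build_indexes(records)
candidates = filter_labels("我想听儿歌", d1)
```

Because D1 is a hash map, the filtering cost depends on the length of the input text and the number of labels that share characters with it, rather than on the total number of labels in the database. Only the surviving candidates would then go on to fuzzy matching.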



Abstract

The invention provides a fast fuzzy matching algorithm for strings. First, the texts in a database are preprocessed to obtain a statistical model, and an index is built by hashing. The input text is a relatively short string. The algorithm traverses all Chinese characters in it, activates the positions of the corresponding characters in a finite complete character set, and maps the activation state of that set onto each tag in order to filter the tags. The few tags that pass the filter are then matched against the text, using the DTW algorithm for approximate string matching. The algorithm also scores and sorts the candidates by their degree of approximation and returns the search results. The efficient tag filtering greatly increases the computational efficiency of string matching; during input text matching, a fuzzy matching effect is achieved and good matching performance is maintained for imprecise language.
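
The DTW matching and scoring step mentioned in the abstract can be sketched as below. The source names DTW but does not give the local cost or the scoring formula, so the 0/1 character cost, the normalization `1 - distance / max(len(a), len(b))`, and the names `dtw_distance`, `similarity`, and `rank_candidates` are assumptions made for illustration only.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two non-empty strings.

    Local cost is assumed to be 0 for equal characters and 1 otherwise.
    """
    if not a or not b:
        raise ValueError("both strings must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # dp[i][j] = minimal warping cost aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def similarity(a, b):
    """Map the DTW distance to a score in [0, 1] (normalization is assumed)."""
    return 1 - dtw_distance(a, b) / max(len(a), len(b))


def rank_candidates(query, candidates):
    """Score every candidate against the query and sort best-first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates("小星星", ["小红帽", "小星星", "两只老虎"])
```

One property worth noting is that DTW lets a character align with several consecutive characters at no extra cost, so a repeated character does not increase the distance. Whether that tolerance is desirable depends on the data; an edit-distance cost would penalize such repeats instead.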

Description

Technical field

[0001] The invention relates to a fast fuzzy matching algorithm for character strings in massive audio data and belongs to the field of natural language processing.

Background technique

[0002] The string matching problem is the problem of finding, under a given matching condition, the occurrences of a given symbol sequence or of the elements of a given set of symbol sequences (called the pattern) in another given symbol sequence (called the text). It is one of the basic problems of computer science, is widely used in fields that involve text and symbol processing, and is a key problem in important areas such as network security, information retrieval, and computational biology. With the emergence of network security issues, the growth of massive information retrieval, and the rapid development of computational biology, existing string matching algorithms can no longer meet the matching performance that applications require, and there is an urgent ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/683, G06F16/9032
CPC: G06F16/686, G06F16/90344
Inventor: 田学红, 朱晓明, 于拾全
Owner 深圳凡豆信息科技有限公司