A rapid fuzzy matching algorithm for strings in mass audio data

A fuzzy string matching technique, applied in digital data processing, special-purpose data processing applications, computing, and similar fields. It addresses problems such as the high requirements imposed by neural networks, and it increases speed while reducing the amount of matching computation.

Inactive Publication Date: 2017-03-22
深圳凡豆信息科技有限公司
Cites: 4. Cited by: 22.

AI Technical Summary

Problems solved by technology

The main problem is that when the database is large, training and applying a neural network imposes relatively high requirements.




Embodiment Construction

[0032] The present invention is further described below with reference to the accompanying drawings:

[0033] As shown in Figure 1, the main process of the present invention is as follows. First, the label and text data in the database are read, and the stored data is trained on to obtain three mapping relationships: D1, from characters to label strings; D2, from label strings to text; and D3, from text to the number of labels. Next, the description text X input by the user is obtained. Its length is L characters, and the character set X(l) (l = 1, 2, 3, ..., L) is extracted from it. Using the mapping D1 from each character X(l) to label strings, the relevant label set is filtered out of the keyword set, fuzzy matching is performed between the filtered label set and the input text X, and the score of each matching result is saved. Then the useless-word dictionaries and negative-word dictionaries are consulted to further...
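The index-building and label-filtering steps in [0033] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `(label, text)` record format, the function names `build_indexes` and `filter_labels`, and the `min_hits` threshold are all assumptions made for the example, and plain Python dictionaries stand in for whatever trained model the invention actually uses.

```python
from collections import defaultdict


def build_indexes(records):
    """Build D1, D2, D3 from (label, text) pairs (hypothetical record format)."""
    d1 = defaultdict(set)         # D1: character -> label strings containing it
    d2 = defaultdict(set)         # D2: label string -> texts carrying that label
    text_labels = defaultdict(set)
    for label, text in records:
        for ch in label:
            d1[ch].add(label)
        d2[label].add(text)
        text_labels[text].add(label)
    # D3: text -> number of distinct labels attached to it
    d3 = {text: len(labels) for text, labels in text_labels.items()}
    return dict(d1), dict(d2), d3


def filter_labels(query, d1, min_hits=1):
    """Keep labels that share at least min_hits distinct characters with query."""
    hits = defaultdict(int)
    for ch in set(query):             # each distinct character X(l) of the input
        for label in d1.get(ch, ()):  # D1 lookup: character -> candidate labels
            hits[label] += 1
    return {label: n for label, n in hits.items() if n >= min_hits}


records = [("儿歌", "小星星"), ("摇篮曲", "小星星"), ("故事", "白雪公主")]
d1, d2, d3 = build_indexes(records)
candidates = filter_labels("我想听儿歌", d1)
```

Only the labels returned by `filter_labels` would then be passed to the more expensive fuzzy matching step, which is where the reduction in matching computation comes from.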



Abstract

The invention provides a rapid fuzzy matching algorithm for strings. First, the texts in a database are preprocessed to obtain a statistical model, and an index is built by hashing. The input text is a relatively short string. The algorithm traverses all Chinese characters in it, activates the positions of the corresponding characters in a finite complete character set, and maps the activation state of that set onto each tag in order to filter the tags. The few tags that remain are matched against the text, using the DTW algorithm for approximate string matching. The results are then scored and sorted by degree of approximation, and the search results are returned. The efficient tag filtering greatly increases the computational efficiency of string matching. Matching the input text achieves a fuzzy matching effect, and good matching performance is maintained for fuzzy language.
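The abstract names DTW (dynamic time warping) as the approximate matching step but does not give its details. The sketch below shows one plausible reading, assuming a 0/1 local cost between characters and a score normalized by the longer string's length. The function names `dtw_distance` and `rank_tags` and the normalization are illustrative assumptions, not taken from the patent.

```python
def dtw_distance(a, b):
    """Classic DTW between two strings with a 0/1 character mismatch cost."""
    if not a or not b:
        return 0.0 if a == b else float("inf")
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch b[j-1] over a
                                  dp[i][j - 1],      # stretch a[i-1] over b
                                  dp[i - 1][j - 1])  # advance both strings
    return dp[n][m]


def rank_tags(query, tags):
    """Score each candidate tag against query and sort, best match first."""
    scored = []
    for tag in tags:
        # Assumed normalization: divide by the longer length, so that
        # scores are comparable across tags of different lengths.
        score = dtw_distance(query, tag) / max(len(query), len(tag), 1)
        scored.append((tag, score))
    return sorted(scored, key=lambda pair: pair[1])


ranking = rank_tags("小星心", ["白雪公主", "小星星"])
```

Because DTW lets one character align with a run of repeated characters, `dtw_distance("aab", "ab")` is zero. That tolerance suits stretched or repeated input, but it also means DTW is not interchangeable with edit distance.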

Description

Technical field

[0001] The invention relates to a fast fuzzy matching algorithm for character strings in massive audio data, and belongs to the field of natural language processing.

Background technique

[0002] The string matching problem is the search problem of finding where a given symbol sequence, or any element of a given set of symbol sequences (called the pattern), appears in another given symbol sequence (called the text) according to a certain matching condition. It is one of the basic problems of computer science, widely used in fields involving text and symbol processing, and a key problem in important fields such as network security, information retrieval, and computational biology. With the emergence of network security issues, massive information retrieval, and the rapid development of computational biology, existing string matching algorithms can no longer meet the matching performance that applications need, and there is an urgent ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/686, G06F16/90344
Inventors: 田学红, 朱晓明, 于拾全
Owner: 深圳凡豆信息科技有限公司