A dynamic sound feature extraction method based on cosine similarity

A feature extraction technology based on cosine similarity, applied in speech analysis, speech recognition, instruments, etc. It addresses two problems: the inability to capture the long-term characteristics of the signal, and the loss of the dynamic characteristics of the speech signal. The stated effects are fast recognition speed, clear effectiveness, and high recognition accuracy.

Active Publication Date: 2022-03-11

DALIAN MARITIME UNIVERSITY



Problems solved by technology

First, MFCC features are estimated over a window of 10 ms to 50 ms and therefore cannot capture long-term features of the signal.

Second, MFCC treats adjacent frames of the speech signal as independent of each other, so the dynamic characteristics of the speech signal are lost during feature extraction.



Embodiment Construction

[0023] To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments:

[0024] As shown in Figure 1, a method for extracting dynamic sound features based on cosine similarity specifically includes the following steps:

[0025] S1: Preprocess the time-domain speech signal using pre-emphasis, framing, and windowing, decomposing the speech signal into frames of a set length with a window function. Preprocessing is applied to the time-domain speech signal before features are extracted; the specific methods are as follows.

[0026] ① Pre-emphasis: Due to the structure of the human body and the characteristics of pronunciation, there is a 6 dB attenuation in the frequency band above 800 Hz. The pre-emphasis is to make up...
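The preprocessing in S1 (pre-emphasis, framing, windowing) can be sketched roughly as below. This is a minimal illustration, not the patent's implementation: the function name `preprocess`, the pre-emphasis coefficient `alpha=0.97`, the frame length of 400 samples, the hop of 160 samples, and the choice of a Hamming window are all assumptions, since the text above does not specify them.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasize, frame, and window a 1-D time-domain signal.

    All default values are illustrative assumptions, not values from the patent.
    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], which boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: overlapping frames; trailing samples that do not fill a frame are dropped.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Windowing: taper each frame with a Hamming window (an assumed window choice).
    return frames * np.hamming(frame_len)
```

Calling `preprocess(x)` on a one-dimensional array returns a matrix with one windowed frame per row, which is the form the later frequency-domain steps would operate on.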


Abstract

The invention discloses a method for extracting dynamic sound features based on cosine similarity, comprising the following steps: S1: preprocessing the time-domain speech signal using pre-emphasis, framing, and windowing, and using a window function to decompose the speech signal into frames of a set length; S2: converting the time-domain speech signal into a frequency-domain signal and obtaining the 320-dimensional inverse discrete cosine transform cepstral coefficients (IDCT cepstrum coefficients) of each frame; S3: calculating the cosine similarity between adjacent dimensions of the frequency-domain speech signal; S4: finding the two adjacent columns (dimensions) with the maximum cosine similarity and merging them; S5: repeating S3 to S4 to reduce the 320-dimensional frequency-domain speech signal to a 14-dimensional speech feature; S6: representing the speech feature in the form of a histogram.
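Steps S3 to S5 describe a greedy reduction: compute the cosine similarity of each pair of adjacent dimensions, merge the most similar pair, and repeat until 14 dimensions remain. The sketch below illustrates that loop under stated assumptions. The function name `merge_adjacent_dims` is hypothetical, the input is assumed to be a frames-by-dimensions matrix, and merging by averaging the two columns is an assumption, because the abstract does not say how the merged column is formed.

```python
import numpy as np

def merge_adjacent_dims(features, target_dims=14):
    """Greedily merge the most similar adjacent columns until target_dims remain.

    features: 2-D array, rows = frames, columns = feature dimensions (assumed layout).
    """
    cols = [c for c in np.asarray(features, dtype=float).T]
    while len(cols) > target_dims:
        # S3: cosine similarity between each pair of adjacent columns.
        # The small constant guards against division by zero for all-zero columns.
        sims = [
            float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            for a, b in zip(cols[:-1], cols[1:])
        ]
        # S4: merge the most similar adjacent pair.
        # Averaging is an assumption; the source does not define the merge rule.
        i = int(np.argmax(sims))
        cols[i:i + 2] = [(cols[i] + cols[i + 1]) / 2.0]
    # S5 is the loop itself: repeat S3 and S4 until the target size is reached.
    return np.stack(cols, axis=1)
```

With a matrix of 320 columns and the default `target_dims=14`, the loop runs 306 merges and returns a matrix with the same number of rows and 14 columns. Recomputing all similarities on every pass keeps the sketch short; only the similarities next to the merged column actually change.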

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a dynamic sound feature extraction method based on cosine similarity.

Background

[0002] Speech recognition consists of three parts: speech feature extraction, speech recognition model building, and speaker recognition. Of these, speech feature extraction is very important to the whole speaker recognition process. Effectively extracting speech features that represent the essential characteristics of the speaker makes the model's speech classification and recognition more accurate and raises the recognition rate. Commonly used features include MFCC (Mel Frequency Cepstral Coefficients), Fbank (filterbank features), PLP (Perceptual Linear Prediction), etc. MFCC is currently the speech feature generally used in this field.

[0003] Although MFCC is the most popular acoustic representation, it suffers from two major drawbacks. First, they are estima...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L15/02, G10L25/24

CPC: G10L15/02, G10L25/24

Inventor: 左毅, 艾佳琪, 李铁山, 陈俊龙, 肖杨, 贺培超, 刘君霞, 马赫

Owner: DALIAN MARITIME UNIVERSITY
