
Method and system for realizing voice age and/or gender recognition service, and medium

A voice age and/or gender recognition technology, applied in speech recognition, speech analysis, instruments, etc., that addresses difficult deployment, poor scalability, and frameworks that are hard to modify.

Pending Publication Date: 2021-07-30
GUANGZHOU YUNCONG INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] A speech age and/or gender recognition model based on the Kaldi HMM-DNN hybrid architecture recognizes speech well, but it is very difficult to deploy and use in industry. The common approach is to first convert the Kaldi nnet3 model into an ONNX model with a model conversion tool, and then either serve the ONNX model through another deep learning engine (for example, the MACE mobile AI computing engine) or deploy it with TensorFlow Serving. In both approaches the framework is fixed and hard to modify, so the resulting speech recognition service has poor flexibility and scalability. These engines also support only the operators of Kaldi's neural network inference part; WFST decoding still has to be performed by Kaldi itself.
[0003] The speech recognition engine that Kaldi originally provides, built on the WebSocket and GStreamer frameworks, offers some speech service capability, but it cannot meet real industrial deployment requirements for memory usage, decoding speed, and concurrency.
[0004] In addition, speech recognition engines generally expose a REST API to external callers and provide no serialization or compression mechanism for the transmitted audio data. This hampers the transfer of long audio recordings and audio files, and makes such engines extremely difficult to use in scenarios that require two-way streaming interaction.
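The serialization and compression gap described in [0004] can be illustrated with a minimal sketch. It is not the patent's wire format (the abstract states that the invention uses a predefined gRPC framework); the header layout, the function names `pack_chunk`, `unpack_chunk`, and `stream_chunks`, and the 3200-byte chunk size are all assumptions chosen for illustration, using only the Python standard library.

```python
import struct
import zlib
from typing import Iterator, Tuple

# Hypothetical header: sequence number, payload length, is-last flag.
HEADER = struct.Struct("!IIB")


def pack_chunk(seq: int, pcm: bytes, last: bool) -> bytes:
    """Compress one PCM chunk and prefix it with a fixed-size header."""
    payload = zlib.compress(pcm)
    return HEADER.pack(seq, len(payload), int(last)) + payload


def unpack_chunk(frame: bytes) -> Tuple[int, bytes, bool]:
    """Inverse of pack_chunk: return (seq, pcm, last)."""
    seq, length, last = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    return seq, zlib.decompress(payload), bool(last)


def stream_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield framed chunks; 3200 bytes is 100 ms of 16 kHz 16-bit mono audio."""
    total = max(1, -(-len(pcm) // chunk_size))  # ceiling division
    for i in range(total):
        part = pcm[i * chunk_size:(i + 1) * chunk_size]
        yield pack_chunk(i, part, last=(i == total - 1))
```

Framing the audio as numbered chunks lets a receiver start processing before the whole file has arrived, which is the property that two-way streaming needs; general-purpose compression such as zlib is only a stand-in here, and a real service might prefer an audio codec.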
[0005] Furthermore, voice and audio formats are diverse, while general speech recognition engines support only a class of audio formats defined in advance (such as a 16 kHz or 8 kHz sampling rate) and cannot dynamically adapt to different needs.
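One way to picture the dynamic adaptation that [0005] finds missing is to read the format from the audio itself and choose a model accordingly. The sketch below assumes WAV input and uses Python's standard `wave` module; the model names in `MODELS_BY_RATE` and the helper names are hypothetical and do not come from the patent.

```python
import io
import wave

# Hypothetical model names keyed by sample rate; not taken from the patent.
MODELS_BY_RATE = {8000: "age_gender_8k", 16000: "age_gender_16k"}


def describe_wav(data: bytes) -> dict:
    """Read sample rate, channel count and sample width from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
        }


def select_model(data: bytes) -> str:
    """Pick a model for the audio's own sample rate, or fail loudly."""
    rate = describe_wav(data)["sample_rate"]
    try:
        return MODELS_BY_RATE[rate]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {rate} Hz") from None


def make_wav(rate: int, n_frames: int = 160) -> bytes:
    """Build a silent 16-bit mono WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(2 * n_frames))
    return buf.getvalue()
```

For example, `select_model(make_wav(8000))` selects the 8 kHz entry, while an unlisted rate raises `ValueError` instead of silently feeding mismatched audio to a model.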
[0006] Therefore, although the existing speech recognition model based on the Kaldi HMM-DNN hybrid architecture has great advantages in speech recognition, it is difficult to deploy in practice and has poor scalability and flexibility. After deployment, its decoding speed, concurrency, resource usage, interaction, and dynamic adaptation cannot meet real application requirements, which makes the user experience poor. A solution that is more flexible, easier to extend, and offers a better user experience is needed.




Embodiment Construction

[0037] Some embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention, and are not intended to limit the protection scope of the present invention.

[0038] In the description of the present invention, "module" and "processor" may include hardware, software, or a combination of both. A module may include hardware circuits, various appropriate sensors, communication ports, and memory; it may also include software parts, such as program code, or a combination of software and hardware. The processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions and can be implemented in software, hardware, or a combination of both. The non-transitory computer ...


Abstract

The invention relates to the field of voice recognition, and in particular to a method, system, and device for realizing a voice age and/or gender recognition service, and a medium. It aims to solve the technical problems of how to call an existing voice age and/or gender recognition model remotely and accurately, and how to deploy it simply. To this end, a terminal calls a server with a serialized voice age/gender recognition request under a predefined gRPC framework. Through the configured age/gender voice recognition service, the server accurately selects the corresponding voice age/gender recognition deep neural network model, decodes to determine the age and/or gender information of the target object, and returns that information to the terminal. Because the age and/or gender service mode and the remote calling architecture are configured, the corresponding model is called only after its type has been determined. Calling is therefore more accurate and does not depend on a fixed framework; the method is more flexible, highly extensible, makes good use of resources, and supports high concurrency, while also facilitating iterative updates of the algorithm model.
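The request flow in the abstract (serialize a request, select a model by service type on the server, return the result) can be sketched as follows. This is a stand-in under stated assumptions, not the patented implementation: JSON replaces gRPC and its message serialization so that the example runs with the standard library alone, the service names `"age"`, `"gender"`, and `"age_gender"` are hypothetical, and the "models" are placeholders that return fixed values rather than running a neural network.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class RecognitionRequest:
    service: str      # hypothetical values: "age", "gender", "age_gender"
    sample_rate: int
    audio_hex: str    # audio bytes hex-encoded so the request fits in JSON


def serialize(req: RecognitionRequest) -> bytes:
    """Client side: turn the request into bytes for transmission."""
    return json.dumps(asdict(req)).encode("utf-8")


def deserialize(raw: bytes) -> RecognitionRequest:
    """Server side: rebuild the request from received bytes."""
    return RecognitionRequest(**json.loads(raw.decode("utf-8")))


# Placeholder models: real ones would run inference and decoding.
def fake_age_model(audio: bytes) -> dict:
    return {"age": "unknown"}


def fake_gender_model(audio: bytes) -> dict:
    return {"gender": "unknown"}


def fake_age_gender_model(audio: bytes) -> dict:
    return {**fake_age_model(audio), **fake_gender_model(audio)}


# Registry mapping a service type to the model that handles it.
MODELS: Dict[str, Callable[[bytes], dict]] = {
    "age": fake_age_model,
    "gender": fake_gender_model,
    "age_gender": fake_age_gender_model,
}


def handle(raw: bytes) -> bytes:
    """Decode the request, select the model by service type, and reply."""
    req = deserialize(raw)
    model = MODELS.get(req.service)
    if model is None:
        return json.dumps({"error": f"unknown service: {req.service}"}).encode("utf-8")
    result = model(bytes.fromhex(req.audio_hex))
    return json.dumps(result).encode("utf-8")
```

Keeping the models in a registry keyed by service type means that adding or updating a model only changes one table entry, which is one plausible reading of how determining the model type first eases iterative updates.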

Description

Technical field

[0001] The present invention relates to the field of voice recognition, in particular to a method and system for realizing voice age and/or gender recognition.

Background technique

[0002] A speech age and/or gender recognition model based on the Kaldi HMM-DNN hybrid architecture recognizes speech well, but it is very difficult to deploy and use in industry. The common approach is to first convert the Kaldi nnet3 model into an ONNX model with a model conversion tool, and then either serve the ONNX model through another deep learning engine (for example, the MACE mobile AI computing engine) or deploy it with TensorFlow Serving. In both approaches the framework is fixed and hard to modify, so the resulting speech recognition service has poor flexibility and scalability. These engines also support only the operators of Kaldi's neural network inference part. WFST decoding still ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/26
Inventor: 杨学锐, 晏超
Owner: GUANGZHOU YUNCONG INFORMATION TECH CO LTD