35 results about "Speech comprehension" patented technology

Speech comprehension starts with the identification of the speech signal against an auditory background and its transformation into an abstract representation, a process also called decoding. Speech sounds are perceived as phonemes, the smallest units of sound that distinguish meaning.

Smart interaction method for speech and text mutual conversion

The embodiment of the invention discloses a smart interaction method for speech and text mutual conversion, comprising the following steps: a speech command issued by a user is sent to a speech gateway through a public switched telephone network (PSTN); the speech gateway forwards the speech to an automatic speech recognition (ASR) module for speech recognition and speech-to-text conversion; the resulting text is passed to a natural language processing (NLP) module for semantic comprehension; the semantic comprehension result is passed to a task processing module, which matches a logical task with relevant knowledge; a user service processing system performs service processing according to the matching result and outputs a service processing result; the result is returned to the task processing module and then to the NLP module for speech comprehension interaction; the text is synthesized into speech by a text-to-speech (TTS) module; and the synthesized speech is returned to the user through the speech gateway. The method addresses the small data processing capacity, high running cost, and poor user experience of existing enterprise hotline service systems.
Owner:深圳市一号互联科技有限公司
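The ASR → NLP → task matching → TTS loop described in this abstract can be sketched as a chain of stages. All function names and the toy intent/response tables below are illustrative assumptions, not the patent's actual implementation:

```python
# Hedged sketch of the hotline pipeline: gateway -> ASR -> NLP -> task
# processing -> TTS -> gateway. Every stage here is a stand-in.

def asr(speech: str) -> str:
    """Speech-to-text stand-in: speech is already a string here."""
    return speech

def nlp(text: str) -> dict:
    """Toy semantic comprehension: extract an intent keyword."""
    intent = "billing" if "bill" in text else "general"
    return {"intent": intent, "text": text}

def match_task(semantics: dict) -> str:
    """Task processing: match the intent against relevant knowledge."""
    actions = {"billing": "Your current balance is 42 CNY.",
               "general": "Transferring you to an agent."}
    return actions[semantics["intent"]]

def tts(text: str) -> str:
    """Text-to-speech stand-in: tag the text as synthesized audio."""
    return f"<audio>{text}</audio>"

def hotline_interaction(speech_command: str) -> str:
    """One full round trip of the interaction method."""
    text = asr(speech_command)
    semantics = nlp(text)
    result = match_task(semantics)
    return tts(result)
```

For example, `hotline_interaction("check my bill")` routes through the billing branch and returns a synthesized response string.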

Lip language recognition-based method for improving speech comprehension degree of patient with severe hearing impairment

The invention discloses a lip language recognition-based method for improving the speech comprehension degree of a patient with severe hearing impairment. The method comprises the steps of: collecting a lip motion image sequence from a real environment through image collection equipment and using the sequence as the input feature of a deep neural network; constructing a visual-modal voice endpoint detection method based on deep learning to determine the position of a voice segment under low signal-to-noise-ratio conditions; constructing a deep learning model based on a three-dimensional convolution-residual network-bidirectional GRU structure as a baseline model; constructing a lip language recognition model based on spatio-temporal information features on top of the baseline model; and training the network model with a cross-entropy loss and recognizing the speaking content with the trained lip language recognition model. The method captures fine-grained features and time-domain key frames of the lip image through spatio-temporal information feedback, improving adaptability to lip features in complex environments, lip language recognition performance, and the language understanding ability of patients with severe hearing impairment, and has good application prospects.
Owner:NANJING INST OF TECH
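The baseline architecture named in the abstract (3-D convolution → residual network → bidirectional GRU) can be illustrated at the shape level. The layer sizes below are assumptions for illustration; this traces tensor shapes only and performs no learning:

```python
# Shape-level sketch of the 3D-conv / residual / BiGRU baseline.
# Sizes (75 frames, 64x64 crops, 256 GRU units) are illustrative.

def conv3d_shape(t, h, w, kernel=3, stride=1, padding=1):
    """Output size per dimension after a 3-D convolution,
    using the standard formula (n + 2p - k) // s + 1."""
    out = lambda n: (n + 2 * padding - kernel) // stride + 1
    return out(t), out(h), out(w)

def lipreading_shapes(frames=75, height=64, width=64, gru_hidden=256):
    # Front-end 3-D convolution over the lip image sequence.
    t, h, w = conv3d_shape(frames, height, width)
    # Residual blocks keep the temporal length; assume spatial
    # dimensions are pooled down to 1x1 per time step.
    h, w = 1, 1
    # A bidirectional GRU doubles the feature width per time step.
    return (t, 2 * gru_hidden)
```

With the default padding, the temporal length is preserved, so a 75-frame clip yields a 75-step sequence of 512-dimensional features for the recognition head.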

Method for controlling set-top box by intelligent sound box terminal based on broadcasting and TV

The invention discloses a method for controlling a set-top box through an intelligent sound box terminal based on broadcast and television, comprising the following steps: 1) after the intelligent sound box terminal is powered on, connecting to the network through a physical key and accessing the terminal's service background; 2) having the Bluetooth voice remote controller announce a pairing code by voice and the set-top box display its unique identifier, namely the television pairing code, to complete bidirectional binding; 3) entering television mode, in which the sound box cloud service platform switches the instruction transmission interface to the broadcast-and-TV voice service platform so that the sound box controls the set-top box; 4) having the intelligent sound box terminal collect the user's voice, perform ASR (automatic speech recognition) and NLU (natural language understanding) processing, and send the resulting intention to the set-top box voice apk through the voice platform proxy server for processing; and 5) having the voice platform service server receive the set-top box user's intention request and ask the set-top box to execute it, realizing remote voice control of the set-top box, without modifying its hardware, by connecting it with the intelligent sound box terminal.
Owner:江苏有线技术研究院有限公司
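The bidirectional binding in step 2) can be sketched as a small hand-shake: binding succeeds only when the user confirms the pairing code that the set-top box displayed. The class and attribute names are hypothetical:

```python
# Minimal sketch of the bidirectional binding between sound box and
# set-top box. Codes and names are illustrative assumptions.

class SetTopBox:
    def __init__(self, tv_code: str):
        self.tv_code = tv_code        # unique identifier shown on screen
        self.bound_speaker = None

class SmartSpeaker:
    def __init__(self, pairing_code: str):
        self.pairing_code = pairing_code  # announced by the voice remote
        self.bound_box = None

    def bind(self, box: SetTopBox, heard_tv_code: str) -> bool:
        """Bidirectional binding: succeeds only if the code the user
        repeats matches the TV pairing code the box displayed."""
        if heard_tv_code != box.tv_code:
            return False
        self.bound_box = box
        box.bound_speaker = self
        return True
```

After a successful `bind`, each device holds a reference to the other, which is the precondition for the television-mode control in steps 3)–5).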

Intelligent voice understanding system with multiple voice understanding engines and intelligent voice interaction method

The invention provides an intelligent voice understanding system with multiple voice understanding engines and an intelligent voice interaction method. The system comprises a first voice understanding engine that processes voice without transcription, a second voice understanding engine that processes voice by transcription, and an understanding result judgment unit. The voice processing unit of the first engine processes the voice into voice data in coding-sequence form, and its natural language understanding unit obtains the intention corresponding to the voice from that coding-sequence data through a natural language understanding model. The voice processing unit of the second engine transcribes the voice into voice data in text form, and its natural language understanding unit obtains the intention corresponding to the voice from that text through a natural language understanding model. The understanding result judgment unit then determines the intention corresponding to the speech on the basis of the understanding results of the two engines.
Owner:水木智库(北京)科技有限公司
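The two-engine arrangement can be sketched with each engine returning an (intent, confidence) pair and the judgment unit arbitrating between them. Both toy engines and the highest-confidence rule are simplifying assumptions; the patent does not specify the arbitration logic:

```python
# Sketch of the understanding-result judgment unit over two engines.
# Speech is modeled as a list of integer codes; intents are toy values.

def engine_direct(speech_codes: list) -> tuple:
    """First engine: intent straight from the coding-sequence form."""
    intent = "play_music" if 7 in speech_codes else "unknown"
    confidence = 0.9 if intent != "unknown" else 0.2
    return intent, confidence

def engine_transcribe(speech_codes: list) -> tuple:
    """Second engine: 'transcribe' to text first, then understand."""
    text = " ".join(str(c) for c in speech_codes)
    intent = "play_music" if "7" in text.split() else "unknown"
    return intent, 0.6

def judge(speech_codes: list) -> str:
    """Judgment unit: keep the intent with the higher confidence."""
    candidates = [engine_direct(speech_codes),
                  engine_transcribe(speech_codes)]
    return max(candidates, key=lambda c: c[1])[0]
```

When the engines disagree, the direct (non-transcription) engine wins here whenever it is confident, which mirrors the system's motivation of not depending solely on transcription.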

Visual content blind guiding auxiliary system and method based on coding and decoding technology

The invention discloses a visual content blind guiding auxiliary system and method based on coding and decoding technology, and relates to the field of computer vision. The system comprises a central processor module, a depth camera module, a voice broadcast device module, a voice understanding device module and a power supply module. The central processing unit handles system control, visual data processing and signal transmission; control software for the blind guiding system is deployed on it, comprising a visual content interpretation unit, a voice recognition unit and a road planning unit. The depth camera acquires images of the current scene, generating an RGB image and a depth map. The voice broadcast device plays the voice information output by the central processing unit, reporting object-searching or road-planning results. The voice understanding device collects the user's voice information and transmits it to the central processing unit. The power supply powers the central processor. The system helps blind users live more independently and improves their quality of life.
Owner:SHENYANG LIGONG UNIV
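A natural question is how the central processor decides which software unit handles a given request. The abstract does not say; the keyword routing below is purely an illustrative assumption, reusing the unit names it lists:

```python
# Hypothetical dispatch from a recognized user utterance to one of the
# control software's units (names taken from the abstract above).

def route_request(utterance: str) -> str:
    """Route a recognized utterance by simple keyword matching."""
    words = utterance.lower().split()
    if "find" in words or "where" in words:
        return "visual_content_interpretation"  # object searching
    if "go" in words or "route" in words:
        return "road_planning"
    return "voice_recognition"  # unclear request: ask the user again
```

A real system would route on NLU intents rather than raw keywords, but the module boundary (interpretation vs. planning vs. recognition) is the same.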

Speech understanding model generation method and intelligent speech interaction method based on pragmatic information

The invention discloses a speech understanding model generation method and an intelligent speech interaction method based on pragmatic information. The model generation method comprises the steps of: processing speech to obtain speech data in coding-sequence form; presetting pragmatic information classification nodes; associating the coding-sequence speech data with the pragmatic information classification nodes; and generating a speech understanding model from the pairing data of coding-sequence speech data and pragmatic information classification nodes. The method understands pragmatic information directly from speech, avoiding the information loss caused by transcribing speech into text. Because it is not limited by a writing system, one voice interaction architecture and corresponding model can support varied language environments such as different dialects, minority languages and mixed languages. Training corpora are collected and the speech understanding model is trained according to the hierarchy of pragmatic information classification nodes used in voice interaction, greatly reducing the data volume required for training; and through a simple association operation, voice obtained during interaction is used for rapid iteration of the speech understanding model.
Owner:水木智库(北京)科技有限公司
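The pairing of coding-sequence speech data with preset pragmatic classification nodes can be sketched as a lookup structure plus an association rule. The node hierarchy and the longest-shared-prefix matching rule below are illustrative assumptions:

```python
# Sketch of associating coding-sequence speech data with preset
# pragmatic information classification nodes. Nodes and codes are toys.

PRAGMATIC_NODES = {
    "request/play":     [(1, 4), (1, 4, 2)],
    "request/stop":     [(9, 9)],
    "question/weather": [(3, 3, 7)],
}

def associate(code_sequence: tuple) -> str:
    """Return the node whose stored pairings share the longest
    prefix with the incoming coding sequence."""
    def prefix_len(a, b):
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n
    best = max(PRAGMATIC_NODES.items(),
               key=lambda kv: max(prefix_len(code_sequence, c)
                                  for c in kv[1]))
    return best[0]
```

New interaction data can be folded in by appending a confirmed coding sequence to its node's list, which matches the abstract's point about rapid iteration through a simple association operation.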