Multimodal method for unobtrusive confirmation of conference speaker identity

A multimodal speaker-identification technique, applied in neural learning methods, character and pattern recognition, and biological neural network models, that aims for high accuracy and improved efficiency.

Pending Publication Date: 2020-02-18
南京星耀智能科技有限公司

AI Technical Summary

Problems solved by technology

[0004] With the traditional allocation of microphones in regular meetings, distinguishing different speakers is cumbersome: because of the distances involved, microphones at different positions must be switched off and on many times. To solve this problem, the present invention provides a multimodal method for unobtrusively confirming the identity of a conference speaker. Specifically, the method automatically identifies and distinguishes speakers along three aspects (expression, voice, and speech style), using an expression recognition method based on a deep learning model, a voice recognition method based on an artificial intelligence algorithm, and a speech content recognition method based on a text clustering algorithm.
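The text names a text clustering algorithm for speech content but does not say which one. As a rough illustration only, the sketch below uses single-pass (leader) clustering over bag-of-words vectors with cosine similarity. The function names, the whitespace tokenizer, and the 0.3 threshold are all assumptions made for this example; Mandarin text would need a proper word segmenter in place of `split()`.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_utterances(utterances, threshold=0.3):
    """Single-pass clustering: each utterance joins the most similar
    existing cluster if similarity >= threshold, else starts a new one.
    Returns one cluster label per utterance."""
    centroids = []  # summed word counts per cluster
    labels = []
    for text in utterances:
        vec = bag_of_words(text)
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)  # add counts into the centroid
            labels.append(best)
    return labels


utterances = [
    "budget revenue forecast quarter",
    "revenue budget quarter growth",
    "server deploy latency bug",
    "deploy server bug fix",
]
labels = cluster_utterances(utterances)
```

Summed counts serve as the centroid here because cosine similarity ignores vector length, so no normalisation step is needed.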

Examples

Embodiment 1

[0035] About 1,000 face photos of speakers were collected at the meeting site and manually classified into speech and non-speech categories. Basic operations such as random interference, deformation, and rotation were then applied, and a GAN was used to generate further samples, giving a training set about 10 times larger than the original data set. A Faster R-CNN model was then trained on the sample data, and the final model accuracy reached 85%.
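The basic augmentation operations mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's pipeline: images are plain nested lists of grayscale values, the helper names are hypothetical, "deformation" is approximated by small cyclic row shifts, and the GAN-based generation step is not reproduced.

```python
import random


def add_noise(img, rng, amplitude=10):
    """Random interference: perturb each pixel, clamped to 0..255."""
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in img]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def shift_rows(img, rng):
    """Crude deformation: cyclically shift each row by -1, 0 or +1."""
    out = []
    for row in img:
        k = rng.randint(-1, 1)
        out.append(list(row[-k:] + row[:-k]) if k else list(row))
    return out


def expand_dataset(images, factor=10, seed=0):
    """Return each original plus (factor - 1) augmented variants,
    so the result is `factor` times the size of the input."""
    rng = random.Random(seed)
    ops = [
        lambda im: add_noise(im, rng),
        rotate90,
        lambda im: shift_rows(im, rng),
    ]
    expanded = []
    for img in images:
        expanded.append(img)
        for _ in range(factor - 1):
            expanded.append(rng.choice(ops)(img))
    return expanded


photos = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
training_set = expand_dataset(photos, factor=10)
```

The factor of 10 mirrors the roughly tenfold expansion reported in the embodiment; in practice the share contributed by the GAN versus the basic operations is not stated in the source.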

[0036] For speaker voice recognition, a specific embodiment of the present invention is as follows: 1) Data collection: voice data is collected in real time at the meeting site and segmented every 4-8 seconds, preferably every 5 seconds, with each segment used as a processing unit. 2) Data processing: because speeches at the meeting site are relatively standardized, mostly in Mandarin, and the venue is relatively quiet with little noise, basically no data processing is needed. 3) Model con...
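The segmentation in step 1 can be sketched as below. The function name and the choice to keep a shorter trailing segment are assumptions for illustration; the source only states the 4-8 second range and the preferred 5 seconds.

```python
def segment_audio(samples, sample_rate, segment_seconds=5.0):
    """Split a sequence of audio samples into consecutive segments.

    segment_seconds must lie in the 4-8 second range given in the text.
    The final segment may be shorter than the others (an assumption).
    """
    if not 4.0 <= segment_seconds <= 8.0:
        raise ValueError("segment_seconds must be between 4 and 8")
    step = int(sample_rate * segment_seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]


# 12 seconds of dummy audio at a toy rate of 10 samples per second.
audio = list(range(120))
segments = segment_audio(audio, sample_rate=10)
segment_lengths = [len(s) for s in segments]
```

Each returned segment would then be passed to the voice model as one processing unit.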

Abstract

The invention provides a multimodal method for unobtrusively confirming the identity of a conference speaker. Using the image, voice, and text modalities available in a conference, the speaker's identity is confirmed by recognizing the speaker's expression, voice, and speaking style. The method comprises an expression recognition method based on a deep learning model, a voice recognition method based on an artificial intelligence algorithm, and a method for recognizing speaking content using a text clustering algorithm. The whole process is automatic and requires no manual intervention: the identity of the speaker is confirmed unobtrusively by the artificial intelligence algorithm models, which greatly improves meeting and office efficiency while maintaining high accuracy.
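The abstract does not state how the three recognition results are combined into one identity decision. Purely as an assumed illustration, the sketch below uses weighted late fusion: each modality supplies a score per candidate speaker, the scores are summed with per-modality weights, and the highest total wins. The function name, score format, and weights are all hypothetical.

```python
def fuse_scores(expression, voice, text, weights=(1.0, 1.0, 1.0)):
    """Weighted late fusion of per-speaker scores from three modalities.

    Each argument maps a speaker id to a score in [0, 1].
    Returns the speaker id with the highest weighted total,
    or None when no modality produced any score.
    """
    totals = {}
    for scores, weight in zip((expression, voice, text), weights):
        for speaker, score in scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + weight * score
    if not totals:
        return None
    return max(totals, key=totals.get)


expression_scores = {"A": 0.6, "B": 0.4}
voice_scores = {"A": 0.3, "B": 0.7}
text_scores = {"A": 0.8, "B": 0.2}

equal_weight_choice = fuse_scores(expression_scores, voice_scores, text_scores)
voice_heavy_choice = fuse_scores(
    expression_scores, voice_scores, text_scores, weights=(1.0, 3.0, 1.0)
)
```

With equal weights speaker A has the larger total; tripling the voice weight tips the decision to speaker B, which shows how the weighting choice would matter in such a scheme.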

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a multimodal method for unobtrusively confirming the identity of a conference speaker.

Background

[0002] With economic development, efficient office work is increasingly inseparable from conference systems. At present, many conference systems need to record the speech content of each speaker for convenient summarization and reporting. This requirement calls for an intelligent and fast method for distinguishing speakers.

[0003] Current conference systems mostly use microphones to capture the speaker's voice and record the content of the speech. To distinguish different speakers, a microphone must be assigned to each speaker. However, assigning multiple microphones may cause crosstalk: because the microphones are too close together, multiple microphones will recognize a person ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/62, G06F16/35, G06F16/55, G06F40/30, G06F40/216, G06N3/04, G06N3/08, G10L17/04, G10L17/08
CPC: G06F16/355, G06F16/55, G06N3/08, G10L17/04, G10L17/08, G06V40/174, G06N3/045, G06F18/241, G06F18/214
Inventor: 杨理想王云甘周亚孙振平
Owner: 南京星耀智能科技有限公司