Voice matching method in multi-person scene

A voice matching method, applied in the field of multi-person scene voice matching, that reduces the workload of manually matching voices to speakers
CN110648667B · Active · Publication Date: 2022-04-08 · YUNNAN POWER GRID CO LTD ELECTRIC POWER RES INST

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: YUNNAN POWER GRID CO LTD ELECTRIC POWER RES INST
Publication Date: 2022-04-08


Abstract

An embodiment of the present application provides a voice matching method in a multi-person scene, including: dividing the audio to be matched into multiple sound segments; performing voice recognition on the sound segments to obtain the voice segments among them; obtaining the video segment corresponding to each voice segment; performing face detection on the video segment to obtain all predicted speakers of the voice segment; obtaining, from the pixel difference of adjacent grayscale frames in the video segment, the hit information of each predicted speaker in those adjacent frames; and counting, according to the hit information, the number of hits of each predicted speaker in the video segment, the predicted speaker with the largest number of hits being the target speaker of the voice segment. The present application automatically binds each voice to its target speaker, which can greatly reduce the workload of subsequent manual matching of voices to target speakers and helps promote the practical application of audio-visual cognition technology.
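The hit-counting step of the abstract can be illustrated with a minimal sketch. The abstract does not say how a hit is derived from the pixel difference, so the rule used here (the predicted speaker whose face region changes most between two adjacent grayscale frames receives the hit) is an assumption, as are the function name `count_hits`, the fixed face boxes, and the synthetic frames. It is not the patented implementation.

```python
from collections import Counter

import numpy as np


def count_hits(gray_frames, face_boxes):
    """Return (target_speaker, hit_counts) for one video segment.

    gray_frames: list of 2-D uint8 arrays (grayscale frames, same shape).
    face_boxes: dict mapping a speaker label to (top, bottom, left, right);
        boxes are assumed fixed for the segment (hypothetical simplification).
    """
    hits = Counter({speaker: 0 for speaker in face_boxes})
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        # Pixel difference of adjacent grayscale frames; cast to a signed
        # type first so the uint8 subtraction cannot wrap around.
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        # Assumed hit rule: the speaker whose face region changed most.
        motion = {
            speaker: float(diff[t:b, l:r].mean())
            for speaker, (t, b, l, r) in face_boxes.items()
        }
        best = max(motion, key=motion.get)
        if motion[best] > 0:  # award no hit when nothing moved
            hits[best] += 1
    # The predicted speaker with the most hits is the target speaker.
    target = max(hits, key=hits.get)
    return target, dict(hits)


# Synthetic segment: six 20x40 frames, speaker A on the left, B on the right.
boxes = {"A": (0, 20, 0, 20), "B": (0, 20, 20, 40)}
frames = []
for i in range(6):
    frame = np.zeros((20, 40), dtype=np.uint8)
    frame[5:15, 5:15] = 50 * (i % 2)  # A's region flickers in every frame
    if i == 2:
        frame[:, 20:40] = 200         # B's region changes strongly once
    frames.append(frame)

target, hits = count_hits(frames, boxes)
print(target, hits)  # A {'A': 3, 'B': 2}
```

In the synthetic segment, B produces the larger difference in two frame pairs, but A moves in all five pairs and wins the remaining three, so A is chosen as the target speaker. In practice the face boxes would come from the face detection step and would be updated per frame.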

Description

Technical field

[0001] The present application relates to the technical field of voice matching, and in particular to a method for matching voices in a multi-person scene.

Background art

[0002] With the continuous development of natural language processing technology, the speech recognition function of converting sound into text has been continuously improved. However, in some multi-person conversation scenarios, such as multi-person meeting records and interview summaries, the meeting minutes or interview summary can be recorded completely only if, in addition to converting the speech into text, each speaker's identity is recognized and each voice is matched to its speaker.

[0003] In related technologies, voiceprint recognition can be used to distinguish different speakers. However, voiceprint recognition requires collecting a segment of each speaker's voice in advance so that the speaker's voice features can be extracted as the basis for recognition. It does not meet the conditions for re...
