339 results about "Speaker identification" patented technology

Method and apparatus for linking a video segment to another segment or information source

A given video segment is configured to include links to one or more other video segments or information sources. The given video segment is processed in a video processing system to determine an association between an object, entity, characterization or other feature of the segment and at least one additional information source containing the same feature. The association is then utilized to access information from the additional information source, such that the accessed information can be displayed to a user in conjunction with or in place of the original video segment. A set of associations for the video segment can be stored in a database or other memory of the processing system, or incorporated into the video segment itself, e.g., in a transport stream of the video segment. The additional information source may be, e.g., an additional video segment which includes the designated feature, or a source of audio, text or other information containing the designated feature. The feature may be a video feature extracted from a frame of the video segment, e.g., an identification of a particular face, scene, event or object in the frame, an audio feature such as a music signature extraction, a speaker identification, or a transcript extraction, or a textual feature. The invention allows a user to access information by clicking on or otherwise selecting an object or other feature in a displayed video segment, thereby facilitating the retrieval of information related to that segment.
Owner:KONINKLIJKE PHILIPS ELECTRONICS NV
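The association mechanism described above — indexing segments by their extracted features and following a selected feature to other sources containing it — can be sketched minimally. This is an illustrative sketch, not the patent's implementation: the `VideoSegment` class, the feature strings, and the dictionary index are all assumed stand-ins for real extracted face/speaker/music-signature features and a real association database.

```python
# Illustrative sketch: index video segments by extracted features, then
# resolve a user-selected feature to other segments sharing it.
from dataclasses import dataclass, field


@dataclass
class VideoSegment:
    segment_id: str
    # Extracted features: face IDs, speaker IDs, music signatures,
    # transcript keywords, etc. (placeholder strings here).
    features: set = field(default_factory=set)


def build_associations(segments):
    """Index every segment by each feature it contains."""
    index = {}
    for seg in segments:
        for feat in seg.features:
            index.setdefault(feat, set()).add(seg.segment_id)
    return index


def linked_sources(index, segment, selected_feature):
    """Return other segments sharing the feature the user selected."""
    if selected_feature not in segment.features:
        return set()
    return index.get(selected_feature, set()) - {segment.segment_id}


news = VideoSegment("news-01", {"face:alice", "speaker:alice", "scene:studio"})
interview = VideoSegment("talk-07", {"face:alice", "music:jazz-sig-3"})
concert = VideoSegment("live-02", {"music:jazz-sig-3"})

index = build_associations([news, interview, concert])
print(linked_sources(index, news, "face:alice"))            # {'talk-07'}
print(linked_sources(index, interview, "music:jazz-sig-3"))  # {'live-02'}
```

Clicking a face in `news-01` would thus surface `talk-07`, which contains the same face feature; the same lookup serves audio and text features alike.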

Meeting record generation method, apparatus, device, and computer storage medium

The invention discloses a meeting record generation method comprising the following steps: receiving a meeting record generation request and obtaining the meeting information it contains; collecting audio of the meeting through an audio collection device; performing speech recognition on the meeting audio to obtain the corresponding voiceprint features and speech text; determining the speaker identification corresponding to the meeting audio according to the voiceprint features; and inputting the speech text into a preset processing model to obtain meeting text, then associating the speaker identification with the meeting text to generate the meeting record. The invention also discloses a meeting record generation apparatus, a device, and a computer storage medium. Because the speech text obtained by speech recognition is input into a preset processing model trained by machine learning and the meeting record is generated automatically, meeting record generation is made more intelligent.
Owner:ONE CONNECT SMART TECH CO LTD SHENZHEN
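The core association step above — matching a voiceprint to an enrolled speaker and attaching that label to the recognized text — can be sketched as follows. The cosine-similarity matching, the toy 3-dimensional embeddings, the speaker names, and the threshold are all illustrative assumptions; a real system would use an ASR engine and a trained voiceprint model, neither of which the patent specifies beyond "preset processing model".

```python
# Illustrative sketch: label recognized utterances with the enrolled
# speaker whose voiceprint embedding is most similar (cosine similarity).
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def identify_speaker(voiceprint, enrolled, threshold=0.7):
    """Return the best-matching enrolled speaker, or 'unknown'."""
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(voiceprint, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name or "unknown"


def generate_record(utterances, enrolled):
    """Associate each (voiceprint, text) pair with its identified speaker."""
    return [(identify_speaker(vp, enrolled), text) for vp, text in utterances]


# Placeholder enrolled voiceprints and recognized utterances.
enrolled = {"Alice": [0.9, 0.1, 0.0], "Bob": [0.1, 0.9, 0.2]}
utterances = [
    ([0.85, 0.15, 0.05], "Let's review the quarterly numbers."),
    ([0.12, 0.88, 0.18], "Revenue is up eight percent."),
]
for speaker, text in generate_record(utterances, enrolled):
    print(f"{speaker}: {text}")
```

The resulting (speaker, text) pairs are the skeleton of the meeting record; the patent's preset processing model would then clean and summarize the associated text.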

Sparse representation based short-voice speaker recognition method

Status: Inactive. Publication: CN103345923A. Benefits: alleviates underrepresentation of personality traits; deals with mismatches. Tags: speech analysis, majorization minimization, self-adaptive.
The invention discloses a sparse representation based short-voice speaker recognition method, which belongs to the technical fields of voice signal processing and pattern recognition and aims to solve the low recognition rate of existing methods under limited voice data conditions. The method mainly comprises the following steps: (1) preprocessing all voice samples, then extracting Mel-frequency cepstral coefficients and their first-order difference coefficients as features; (2) training a Gaussian background model on a background voice library and extracting Gaussian supervectors as secondary features; (3) arranging the Gaussian supervectors of the training voice samples together to form a dictionary; and (4) solving the representation coefficients with a sparse solving algorithm, reconstructing the signal, and determining the recognition result from the minimum reconstruction residual. The Gaussian supervectors obtained through adaptation greatly mitigate the insufficient expression of a speaker's personality characteristics caused by limited voice data, and classifying by the reconstruction residuals of the sparse representation handles the speaker-model mismatch caused by mismatched semantic information.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA
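Steps (3)–(4) above can be sketched concretely: stack training supervectors into a dictionary, sparse-code a test supervector, and classify by minimum per-class reconstruction residual. The abstract does not name a particular sparse solver, so orthogonal matching pursuit is used here as one common choice, and the tiny 3-dimensional "supervectors" are illustrative stand-ins for real GMM mean supervectors.

```python
# Illustrative sketch of sparse-representation classification:
# dictionary of per-speaker atoms + minimum-residual decision.
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy sparse coding: pick the atom most correlated with the
    residual, refit coefficients on the support, repeat."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        corr[support] = 0.0                      # never reselect an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D[:, support] @ coef
    return x


def classify(D, labels, y, n_nonzero=2):
    """Keep each class's coefficients in turn; the class whose atoms
    reconstruct y with the smallest residual wins (step 4 above)."""
    x = omp(D, y, n_nonzero)
    residuals = {}
    for cls in set(labels):
        mask = np.array([lbl == cls for lbl in labels])
        residuals[cls] = np.linalg.norm(y - D[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)


# Dictionary: two toy training "supervectors" per speaker, one per column.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
D = D / np.linalg.norm(D, axis=0)                # unit-norm atoms
labels = ["A", "A", "B", "B"]

y = np.array([0.95, 0.05, 0.0])                  # test supervector near A
y = y / np.linalg.norm(y)
print(classify(D, labels, y))                    # A
```

Because the test vector lies close to speaker A's atoms, the residual after reconstructing with only A's coefficients is far smaller than with B's, so A is returned.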

Method for improving speaker identification by determining usable speech

Status: Inactive. Publication: US7177808B2. Benefits: overcomes limitations; enhances a target speaker. Tags: speech recognition, dependability, formant.
Method for improving speaker identification by determining usable speech. Degraded speech is preprocessed in a speaker identification (SID) process to produce SID-usable and SID-unusable segments. Features are extracted and analyzed to produce a matrix of optimum classifiers for detecting SID-usable and SID-unusable speech segments; optimum classifiers possess a minimum distance from a speaker model. A decision tree based on fixed thresholds indicates the presence of a speech feature in a given speech segment. Following preprocessing, degraded speech is measured in one or more time, frequency, cepstral, or SID usable/unusable domains. Each measurement is multiplied by a weighting factor whose value is proportional to the reliability of the corresponding time, frequency, or cepstral measurement. The measurements are fused as information, and usable speech segments are extracted for further processing. Such further processing of co-channel speech may include speaker identification, where a segment-by-segment decision determines whether each usable speech segment corresponds to speaker #1 or speaker #2, and may also include constructing the complete utterance of speaker #1 or speaker #2. Speech features such as pitch and formants may be extended back into the unusable segments to form a complete utterance from each speaker.
Owner:THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SEC OF THE AIR FORCE
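The weighted-fusion step described above — multiplying each domain measurement by a reliability weight, summing, and thresholding into usable/unusable — can be sketched as follows. The measurement names, weight values, and threshold are illustrative assumptions; the patent derives its classifiers and weights from training against speaker models rather than fixing them by hand.

```python
# Illustrative sketch: weight per-domain measurements by reliability,
# fuse by summation, and threshold into SID-usable / SID-unusable.


def fuse_measurements(measurements, weights):
    """Multiply each domain measurement by its reliability weight and sum."""
    return sum(weights[name] * value for name, value in measurements.items())


def label_segments(segments, weights, threshold=0.5):
    """Classify each segment as usable or unusable for SID."""
    labels = []
    for seg in segments:
        score = fuse_measurements(seg, weights)
        labels.append("usable" if score >= threshold else "unusable")
    return labels


# Assumed reliability weights for time-, frequency-, and cepstral-domain
# measurements (placeholders, not the patent's trained values).
weights = {"time": 0.2, "frequency": 0.3, "cepstral": 0.5}
segments = [
    {"time": 0.9, "frequency": 0.8, "cepstral": 0.7},   # clean speech
    {"time": 0.3, "frequency": 0.2, "cepstral": 0.1},   # co-channel overlap
]
print(label_segments(segments, weights))  # ['usable', 'unusable']
```

Only the segments labeled usable would then flow to the segment-by-segment speaker #1 / speaker #2 decision described in the abstract.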