Unsupervised learning of semantic audio representations

A technique that uses anchor audio segments and audio clips for unsupervised learning of semantic audio representations. It addresses the expense of generating manual labels and the problem of label sets that do not cover all of the sound content in a recording.

Pending Publication Date: 2020-07-17
GOOGLE LLC

AI Technical Summary

Problems solved by technology

Generating manual labels to train audio classifiers can be expensive.
Additionally, manual labeling may require a dedicated set of possible labels for the audio content to be generated before the labeling process begins; such a dedicated set may lack labels for some of the sound content of the audio recording.



Embodiment Construction

[0018] Examples of methods and systems are described herein. It should be understood that the words "exemplary," "example," and "illustrative" are used herein to mean "serving as an example, instance, or illustration." Any embodiment or feature described herein as "exemplary," "example" or "illustrative" is not necessarily to be construed as preferred or advantageous over other embodiments or features. Furthermore, the exemplary embodiments described herein are not meant to be limiting. It should be readily appreciated that certain aspects of the disclosed systems and methods can be arranged and combined in many different configurations.

[0019] I. Overview

[0020] Audio recordings can include various non-speech sounds. These non-speech sounds may include noises related to the operation of machinery, weather, human or animal motion, sirens or other warning sounds, barking or other sounds produced by animals, or other sounds. Such sounds may provide an indication of whe...


Abstract

Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represents a respective heuristic about the semantic structure of audio recordings.
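As a rough illustration of how a triplet loss operates on embeddings, the following Python sketch computes a hinge-style loss for (anchor, positive, negative) triplets. The squared Euclidean distance, the margin value, and the function names are assumptions made for this example; the abstract does not specify the exact loss formulation or how the embeddings are produced.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.1) -> float:
    """Hinge-style triplet loss (assumed form, not taken from the source).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is, measured in squared distance.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(triplets: Sequence[Triplet], margin: float = 0.1) -> float:
    """Average the per-triplet loss over a batch of triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# A well-separated triplet and a violating triplet (toy 2-D embeddings).
satisfied = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
violating = ([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
batch_loss = mean_triplet_loss([satisfied, violating])
```

In training, a loss of this kind would be minimized with respect to the parameters of the model that maps audio to embeddings, so that anchors end up closer to their positives than to their negatives.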

Description

[0001] Cross-Reference to Related Applications

[0002] This application claims priority to US Provisional Patent Application No. 62/577,908, filed October 27, 2017, which is hereby incorporated by reference.

Background

[0003] Artificial neural networks can be trained to recognize and/or classify the content of audio recordings. Such classifications can be used to determine the semantic content or context of a recording, determine the location of the recording, identify the purpose of the recording, generate content markers for the recording, select one or more audio processing steps for the recording, or provide some other benefit. Audio recordings may include speech or other sounds. To train such a classifier, audio recordings can be provided with manually generated labels. However, the generation of such manual labels can be expensive. Additionally, such manual labeling may require a dedicated set of possible labels for the audio content to be generated befo...


Application Information

Patent Type & Authority Applications (China)
IPC(8): G10L15/02, G06N3/08
CPC G10L25/51, G10L25/30, G06N3/08, G10L25/18, G06N3/045, G06N3/04, G10L15/02, G10L15/063, G10L2015/0635
Inventor Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Owner GOOGLE LLC