Unsupervised learning of semantic audio representations

A technique based on anchor audio and audio clips, applied to unsupervised learning of semantic audio representations, which addresses the expense of generating manual labels and the lack of labels covering all sound content.

Pending Publication Date: 2020-07-17

GOOGLE LLC

8 Cites, 2 Cited by

Problems solved by technology

Generating manual labels for training audio classifiers can be expensive.

Additionally, such manual labeling may require a dedicated set of possible labels for the audio content to be generated before the manual labeling process begins; such a dedicated set may not include labels for all of the sound content of the audio recording.

Embodiment Construction

[0018] Examples of methods and systems are described herein. It should be understood that the words "exemplary," "example," and "illustrative" are used herein to mean "serving as an example, instance, or illustration." Any embodiment or feature described herein as "exemplary," "example" or "illustrative" is not necessarily to be construed as preferred or advantageous over other embodiments or features. Furthermore, the exemplary embodiments described herein are not meant to be limiting. It should be readily appreciated that certain aspects of the disclosed systems and methods can be arranged and combined in many different configurations.

[0019] I. Overview

[0020] Audio recordings can include various non-speech sounds. These non-speech sounds may include noises related to the operation of machinery, weather, human or animal motion, sirens or other warning sounds, barking or other sounds produced by animals, or other sounds. Such sounds may provide an indication of whe...

Abstract

Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represents a respective heuristic about the semantic structure of audio recordings.
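
As a rough illustration of the ideas in the abstract, the following Python sketch shows one common way a triplet loss and a query-by-example lookup could be written. It is a sketch under stated assumptions rather than the method of the application: the hinge form of the loss, the squared Euclidean distance, the margin value, and all function names (`squared_distance`, `triplet_loss`, `mean_triplet_loss`, `query_by_example`) are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.1) -> float:
    """Hinge-style triplet loss for a single triplet (assumed form).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`, an assumed hyperparameter.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def mean_triplet_loss(triplets: Sequence[Triplet], margin: float = 0.1) -> float:
    """Average the per-triplet loss over a non-empty batch of triplets."""
    if not triplets:
        raise ValueError("at least one triplet is required")
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


def query_by_example(query: Vector, corpus: Sequence[Vector], k: int = 1) -> List[int]:
    """Return the indices of the k corpus embeddings nearest to the query."""
    order = sorted(range(len(corpus)),
                   key=lambda i: squared_distance(query, corpus[i]))
    return order[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from a trained model.
    batch = [
        ([0.0, 0.0], [0.0, 0.0], [1.0, 0.0]),  # already well separated
        ([0.0, 0.0], [1.0, 0.0], [0.0, 0.0]),  # violates the margin
    ]
    print(mean_triplet_loss(batch, margin=0.1))
    print(query_by_example([0.0, 0.0], [[1.0, 1.0], [0.1, 0.0], [5.0, 5.0]], k=2))
```

In this sketch, a triplet contributes no loss once its anchor is sufficiently closer to the positive than to the negative, so training effort concentrates on triplets that still violate the margin. After training, the same distance function can rank corpus embeddings against a query embedding for lookup.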

Description

[0001] Cross-Reference to Related Applications

[0002] This application claims priority to US Provisional Patent Application No. 62/577,908, filed October 27, 2017, which is hereby incorporated by reference.

Background

[0003] Artificial neural networks can be trained to recognize and/or classify the content of audio recordings. Such classifications can be used to determine the semantic content or context of a recording, determine the location of the recording, identify the purpose of the recording, generate content markers for the recording, select one or more audio processing steps for the recording, or provide some other benefit. Audio recordings may include speech or other sounds. To train such a classifier, audio recordings can be provided with manually generated labels. However, the generation of such manual labels can be expensive. Additionally, such manual labeling may require a dedicated set of possible labels for the audio content to be generated befo...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/02, G06N3/08

CPC: G10L25/51, G10L25/30, G06N3/08, G10L25/18, G06N3/045, G06N3/04, G10L15/02, G10L15/063, G10L2015/0635

Inventors: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis

Owner: GOOGLE LLC
