Voice retrieval device, voice retrieval method

A voice signal retrieval technology, applied to voice retrieval devices, that addresses problems such as poor retrieval accuracy.

Active Publication Date: 2019-10-11
CASIO COMPUTER CO LTD

AI Technical Summary

Problems solved by technology

[0006] In the technology disclosed in Non-Patent Document 1, search accuracy deteriorates when the speech rate of the voice being searched differs from the speech rate of the person who entered the query.



Examples


Embodiment 1

[0035] As shown in Figure 1, the voice search device 100 of Embodiment 1 physically comprises a ROM (Read Only Memory) 1, a RAM (Random Access Memory) 2, an external storage device 3, an input device 4, an output device 5, a CPU (Central Processing Unit) 6, and a bus 7.

[0036] The ROM 1 stores a voice search program. The RAM 2 is used as a work area of the CPU 6.

[0037] The external storage device 3 is constituted by, for example, a hard disk, and stores as data the audio signal to be analyzed, as well as the monophone model, the triphone model, and the phoneme time lengths described later.

[0038] The input device 4 is constituted by, for example, a keyboard or a voice recognition device. The input device 4 supplies the CPU 6 with the search word input by the user as text data. The output device 5 includes, for example, a screen such as a liquid crystal display, a speaker, and the like. The output device 5 displays text data ou...

Embodiment 2

[0111] Next, Embodiment 2 of the present invention will be described.

[0112] The voice search device 100 according to Embodiment 1 calculates the output probabilities used for acquiring the likelihood after the search character string acquisition unit 111 has acquired the search character string. However, the present invention is not limited to this. The voice search device according to Embodiment 2 performs in advance the computationally expensive calculation of output probabilities using the monophone model, which is used when selecting candidate sections corresponding to the search character string, thereby shortening the search time. That is, the output probabilities corresponding to the search words are obtained in advance for all sections of the audio signal to be searched, and are stored as a search index. Then, at the time of retrieval, the likelihood of the likelihood acquisition section is obtained by adding the output probabilities corresponding ...
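The precompute-then-sum idea in this paragraph can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the `output_prob` callable stands in for the acoustic model, the use of log probabilities is an assumption, and the even split of frames among phonemes is a placeholder because the source text is truncated before it explains how frames are matched to phonemes.

```python
import math


def build_index(frames, phonemes, output_prob):
    """Precompute log output probabilities for every (frame, phoneme) pair.

    `output_prob(frame, phoneme)` is a hypothetical callable standing in
    for the acoustic model; the source does not define such a function.
    """
    return [{p: math.log(output_prob(f, p)) for p in phonemes} for f in frames]


def segment_score(index, start, length, phoneme_seq):
    """Score one section by summing stored log probabilities.

    Assumption: the frames of the section are split evenly among the
    phonemes. The source does not state the actual alignment method.
    """
    n = len(phoneme_seq)
    total = 0.0
    for i in range(length):
        phoneme = phoneme_seq[min(i * n // length, n - 1)]
        total += index[start + i][phoneme]  # lookup only, no model call
    return total
```

The point of the split is that `build_index` runs once per audio signal, while `segment_score` only performs table lookups and additions at search time.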

Modification 1

[0123] As described in Embodiment 1 with reference to Figure 7, when the selection unit 121 selects the time length with the highest likelihood, it adds up x (10) likelihoods for each time length in descending order of likelihood, and selects the likelihood acquisition sections based on the time length for which this sum is the largest. However, the selection method is not limited to this. As illustrated in Figures 11A and 11B, Modification 1 compares which speech rate yields the better likelihood acquisition sections by using the sum of corrected likelihoods, each obtained by multiplying a likelihood by a weighting coefficient that is larger the higher the likelihood ranks.

[0124] Figure 11B shows an example of the weighting coefficients: the higher the likelihood rank, the larger the weighting coefficient. Figure 11A shows an example in which, when comparing the likelihood of the likelihood acquisition section...
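The two selection rules described in paragraphs [0123] and [0124] can be illustrated with a short sketch. It is written under stated assumptions: the function name `best_time_length`, the dictionary layout, and the weight values in the usage note are all hypothetical (the actual coefficients of Figure 11B are not reproduced in this text), and the likelihoods are taken to be positive scores.

```python
def best_time_length(likelihoods_by_length, x=10, weights=None):
    """Pick the time length whose top-x likelihoods give the largest sum.

    likelihoods_by_length maps a time length to the likelihoods of its
    likelihood acquisition sections. With weights=None the top-x values
    are simply added (the Embodiment 1 rule). Otherwise weights[rank]
    multiplies the likelihood at that rank, rank 0 being the highest
    (the Modification 1 rule). Likelihoods are assumed to be positive.
    """
    def score(values):
        top = sorted(values, reverse=True)[:x]
        if weights is None:
            return sum(top)
        return sum(w * v for w, v in zip(weights, top))

    return max(likelihoods_by_length,
               key=lambda length: score(likelihoods_by_length[length]))
```

For instance, with `{20: [0.9, 0.2, 0.2], 28: [0.6, 0.5, 0.45]}` and `x=3`, the plain sum favors length 28, whereas hypothetical weights `[10, 1, 1]` favor length 20, because the single best section then dominates the comparison.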


Abstract

A voice retrieval apparatus executes processes of: converting a retrieval string into a phoneme string; obtaining, from a time length memory, a continuous time length for each phoneme contained in the converted phoneme string; deriving, based on the obtained continuous time lengths, a plurality of time lengths corresponding to a plurality of utterance rates as candidate utterance time lengths of the voice corresponding to the retrieval string; specifying, for each of the plurality of time lengths, a plurality of likelihood obtainment segments having the derived time length within the time length of a retrieval sound signal; obtaining, for each specified likelihood obtainment segment, a likelihood showing the plausibility that the segment is a segment where the voice is uttered; and identifying, based on the obtained likelihoods, an estimation segment where the voice is estimated to be uttered in the retrieval sound signal.
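The steps of deriving candidate time lengths and enumerating likelihood obtainment segments can be sketched as below. This is only an illustration under assumptions: the phoneme durations in `PHONEME_FRAMES` and the rate multipliers in `RATE_FACTORS` are invented for the example (the abstract gives no values), durations are counted in frames, and the phoneme string is assumed to be already available as a list.

```python
# Hypothetical average phoneme durations in frames. Real values would come
# from the time length memory described in the abstract.
PHONEME_FRAMES = {"k": 6, "a": 9, "t": 5, "e": 8}

# Hypothetical utterance-rate multipliers, from fast to slow. The abstract
# does not state which rates are used.
RATE_FACTORS = (0.7, 0.85, 1.0, 1.15, 1.3)


def candidate_lengths(phonemes, durations=PHONEME_FRAMES, factors=RATE_FACTORS):
    """Return one candidate utterance length (in frames) per utterance rate."""
    base = sum(durations[p] for p in phonemes)
    return [max(1, round(base * f)) for f in factors]


def segments(total_frames, length, shift=1):
    """Enumerate (start, end) windows of the given length inside the signal."""
    return [(s, s + length) for s in range(0, total_frames - length + 1, shift)]
```

Each candidate length would then be passed to `segments` to list the windows that are scored for likelihood; a window longer than the signal simply yields an empty list.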

Description

[0001] This application claims priority based on Japanese Patent Application No. 2014-259419, filed on December 22, 2014, the entire content of which is incorporated herein by reference.

Technical field

[0002] The invention relates to a voice retrieval device and a voice retrieval method.

Background art

[0003] With the expansion and popularization of multimedia content such as audio and video, high-precision multimedia retrieval technology is required. In this context, research is underway on voice search techniques for locating, within a voice signal, the position where the voice corresponding to a search word (query) is uttered.

[0004] In voice retrieval, a retrieval method having sufficient performance compared with a character string retrieval technique using image recognition has not yet been established. Therefore, various techniques for realizing sound retrieval with sufficient per...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/14
Inventor: 富田宽基
Owner: CASIO COMPUTER CO LTD