Voice retrieval device and voice retrieval method

A voice retrieval technology for sound signals, applied in the field of voice retrieval devices, that addresses problems such as poor retrieval accuracy

Active Publication Date: 2019-03-08
CASIO COMPUTER CO LTD

AI Technical Summary

Problems solved by technology

[0006] The technique disclosed in Non-Patent Document 1 has the problem that search accuracy deteriorates when the speech rate of the voice being searched differs from that of the person who entered the query.


Examples

Embodiment 1

[0027] As shown in Figure 1, the voice search device 100 of Embodiment 1 physically includes: a ROM (Read Only Memory) 1, a RAM (Random Access Memory) 2, an external storage device 3, an input device 4, an output device 5, a CPU (Central Processing Unit) 6, and a bus 7.

[0028] The ROM 1 stores a voice search program. The RAM 2 is used as a work area for the CPU 6.

[0029] The external storage device 3 is constituted by, for example, a hard disk, and stores as data the audio signal to be searched, a monophone model, a triphone model, and the phoneme duration lengths described later.

[0030] The input device 4 is composed of, for example, a keyboard and a voice recognition device. The input device 4 supplies the search word input by the user to the CPU 6 as text data. The output device 5 includes, for example, a screen such as a liquid crystal display, a speaker, and the like. The output device 5 displays text data output by the CPU 6...

Embodiment 2

[0102] Embodiment 1 described the case where the speech rate is assumed to be fixed and only one piece of speech rate information is set. The search can therefore handle only one speech rate. In actual speech, however, the same word is not always pronounced at the same speed. For example, the word "カテゴリ" ("category") may be uttered at an average speed, or it may be uttered slowly for emphasis. To cope with this, Embodiment 2 derives a plurality of utterance time lengths by using a plurality of pieces of speech rate information. Embodiment 2 describes a case in which three values of speech rate information (change rate of the duration length) are used: 0.7 (fast), 1.0 (normal), and 1.4 (slow).
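The derivation of several utterance time lengths from the three change rates can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `utterance_lengths`, the example durations, and the use of frames as the unit are all assumptions; only the three change rates come from the text above.

```python
# Change rates of the duration length named in Embodiment 2.
SPEECH_RATES = {"fast": 0.7, "normal": 1.0, "slow": 1.4}


def utterance_lengths(phoneme_durations, rates=SPEECH_RATES):
    """Return one utterance time length per speech rate.

    phoneme_durations: average duration of each phoneme in the query.
    The values and their unit are hypothetical; the patent stores
    these durations in external storage.
    """
    base = sum(phoneme_durations)
    # Multiplying every phoneme by the same rate is equivalent to
    # scaling the total length by that rate.
    return {name: base * rate for name, rate in rates.items()}


# Hypothetical average durations (in frames) for a four-phoneme query.
durations = [10, 8, 12, 10]
lengths = utterance_lengths(durations)
for name, length in lengths.items():
    print(f"{name}: {length:.1f} frames")
```

Each resulting length would then define its own set of candidate segments in the signal to be searched, so a slowly or quickly spoken occurrence of the query can still be matched.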

[0103] The voice search device of Embodiment 2 is physically constituted as shown in Figure 1, in the same way as the voice search device 100 of Embodiment 1. In addition, regarding the functional structure and Figure 2, the stru...

Modified example 1

[0131] Embodiments 1 and 2 described the case where the voice search device 100 uniformly multiplies the duration of each state of a phoneme by the same change rate. However, the present invention is not limited to this. For example, the change rate may be set separately for each state of a phoneme, as described below.

[0132] A case where the rate of change is changed for each state of a phoneme will be described using Figure 12. Let α1 be the rate of change for duration T1 of state 1 of the phoneme, α2 be the rate of change for duration T2 of state 2, and α3 be the rate of change for duration T3 of state 3.

[0133] In this modified example, when the duration length is extended, the rate of change for vowels is set to 1.3 in state 1, 1.6 in state 2, and 1.3 in state 3. For consonants, the rate of change is set to 1.1 in state 1, 1.2 in state 2, and the rate of c...
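The per-state scaling in [0132] and [0133] can be sketched as follows. This is only an illustration under stated assumptions: the function name `scale_state_durations` and the example durations T1 to T3 are hypothetical. Only the vowel rates (1.3, 1.6, 1.3) are taken from the text; the consonant rates are cut off in the source, so they are left to the caller rather than guessed.

```python
# Per-state change rates (alpha1, alpha2, alpha3) for vowels when the
# duration is lengthened, as given in paragraph [0133].
VOWEL_RATES = (1.3, 1.6, 1.3)


def scale_state_durations(state_durations, rates):
    """Multiply each state duration T_i by its own change rate alpha_i."""
    if len(state_durations) != len(rates):
        raise ValueError("need exactly one change rate per state")
    return [t * a for t, a in zip(state_durations, rates)]


# Hypothetical durations T1, T2, T3 (in frames) for one vowel.
scaled = scale_state_durations([3, 4, 3], VOWEL_RATES)
phoneme_length = sum(scaled)
print(scaled, phoneme_length)
```

With these vowel rates the middle state is stretched more than the first and last states, in contrast to the uniform multiplication used in Embodiments 1 and 2.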

Abstract

A voice retrieval apparatus executes processes of: obtaining, from a time length memory, a continuous time length for each phoneme contained in a phoneme string of a retrieval string; obtaining user-specified information on an utterance rate; changing the continuous time length for each obtained phoneme in accordance with the obtained information; deriving, based on the changed continuous time length, an utterance time length of voices corresponding to the retrieval string; specifying a plurality of likelihood obtainment segments of the derived utterance time length in a time length of a retrieval sound signal; obtaining a likelihood showing a plausibility that the specified likelihood obtainment segment is a segment where the voices are uttered; and identifying, based on the obtained likelihood, an estimation segment where, within the retrieval sound signal, utterance of the voices is estimated, the estimation segment being identified for each specified likelihood obtainment segment.
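The sequence of processes in the abstract can be sketched end to end as follows. This is a simplified illustration under several assumptions: all function names are hypothetical, time is measured in frames, candidate segments are generated by shifting a window one frame at a time, the estimation segments are chosen by ranking the likelihoods, and the likelihood itself is a caller-supplied placeholder rather than the phoneme-model-based likelihood of the patent.

```python
def derive_utterance_length(phoneme_durations, rate):
    """Scale each phoneme duration by the speech rate and sum the result."""
    return max(1, round(sum(d * rate for d in phoneme_durations)))


def candidate_segments(signal_length, utterance_length, shift=1):
    """List (start, end) likelihood-obtainment segments of the derived length."""
    last_start = signal_length - utterance_length
    return [(s, s + utterance_length) for s in range(0, last_start + 1, shift)]


def find_estimated_segments(signal_length, phoneme_durations, rate,
                            likelihood, top_k=1):
    """Score every candidate segment and return the top_k as (score, segment)."""
    length = derive_utterance_length(phoneme_durations, rate)
    scored = [(likelihood(s, e), (s, e))
              for s, e in candidate_segments(signal_length, length)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy stand-in for a likelihood: mean of made-up per-frame match scores.
frame_scores = [0.1, 0.1, 0.9, 0.8, 0.9, 0.7, 0.1, 0.1]


def toy_likelihood(start, end):
    return sum(frame_scores[start:end]) / (end - start)


best = find_estimated_segments(
    signal_length=len(frame_scores),
    phoneme_durations=[2, 2],   # hypothetical durations in frames
    rate=1.0,                   # user-specified speech rate information
    likelihood=toy_likelihood,
)
print(best)
```

Running the search once per speech rate, and keeping the best-scoring segments across all runs, would be one way to combine this sketch with the multiple rates of Embodiment 2.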

Description

[0001] This application claims priority based on Japanese Patent Application No. 2014-259418 filed on December 22, 2014, and the contents of the basic application are incorporated in this application as a reference.

Technical field

[0002] The invention relates to a voice retrieval device and a voice retrieval method.

Background art

[0003] With the expansion and popularization of multimedia content such as audio and video, high-precision multimedia retrieval technology is required. Among them, a technique of voice retrieval is being studied, which specifies the position where a voice corresponding to a search term (query) set as a search target is emitted from a voice signal.

[0004] In voice retrieval, there is no established retrieval method that has sufficient performance compared with character retrieval using image recognition. Therefore, techniques for realizing voice retrieval with sufficient performance have been intensively studied.

[0005] For example,...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/63, G10L25/54
CPC: G06F16/60, G06F16/367, G06F16/683, G10L2015/025
Inventor: 富田宽基
Owner: CASIO COMPUTER CO LTD