
Speech search device and speech search method

A speech search and speech recognition technology, applied in speech analysis, speech recognition, and natural language data processing, to achieve the effect of improving retrieval accuracy.

Inactive Publication Date: 2016-09-28
MITSUBISHI ELECTRIC CORP

AI Technical Summary

Problems solved by technology

However, if a single statistical language model is constructed using a wide range of learning data, there is a problem that it may not be the most suitable statistical language model for recognizing utterances on a specific topic, such as weather.


Examples


Embodiment 1

[0025] Fig. 1 is a block diagram showing the configuration of the speech search device according to Embodiment 1 of the present invention.

[0026] The speech search device 100 is composed of an acoustic analysis unit 1, a recognition unit 2, a first language model storage unit 3, a second language model storage unit 4, an acoustic model storage unit 5, a character string comparison unit 6, a character string dictionary storage unit 7, and a search result determination unit 8.

[0027] The acoustic analysis unit 1 performs acoustic analysis of the input speech and converts it into a time series of feature vectors. The feature vector consists, for example, of the 1st to Nth order MFCCs (Mel-Frequency Cepstral Coefficients). The value of N is, for example, 16.
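As a rough illustration of this acoustic analysis step, the sketch below converts an input waveform into a time series of 16-dimensional MFCC feature vectors. It uses librosa only as an example toolkit; the library choice, sampling rate, and file name are assumptions, since the patent does not specify any implementation.

```python
# Illustrative sketch of acoustic analysis unit 1: convert input speech
# into a time series of N-dimensional MFCC feature vectors (N = 16 as in
# paragraph [0027]). librosa and the 16 kHz sampling rate are assumptions.
import librosa

N_MFCC = 16

def acoustic_analysis(wav_path: str):
    """Return an array of shape (num_frames, N_MFCC), one feature vector per frame."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T  # transpose so each row is one frame's feature vector

features = acoustic_analysis("query.wav")  # "query.wav" is a placeholder file name
```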

[0028] The recognition unit 2 uses the first language model stored in the first language model storage unit 3, the second language model stored in the second language model storage unit 4, and the acoustic model stored in the acoustic model storage unit 5 to perform speech recognition on the time series of feature vectors output from the acoustic analysis unit 1, and obtains a recognized character string for each language model.
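The decoder itself is not described in this excerpt. The toy sketch below only illustrates the idea of keeping one best recognized string per language model by combining a shared acoustic likelihood with each model's language likelihood; all candidate strings and score values are invented for illustration.

```python
# Toy illustration (not the patent's decoder): select the best candidate
# string under each language model by combining the acoustic likelihood
# with that model's language likelihood. All values below are invented.
import math

def recognize(candidates, acoustic_loglik, language_models):
    """Return one recognized string per language model."""
    results = {}
    for name, lm in language_models.items():
        results[name] = max(
            candidates,
            key=lambda w: acoustic_loglik[w] + lm.get(w, -math.inf))
    return results

candidates = ["tokyo weather", "tokyo wether", "kyoto weather"]
acoustic = {"tokyo weather": -10.0, "tokyo wether": -9.5, "kyoto weather": -11.0}
language_models = {
    # e.g., a general-purpose model (unit 3) and a topic-specific model (unit 4)
    "first_lm":  {"tokyo weather": -2.0, "tokyo wether": -8.0, "kyoto weather": -2.5},
    "second_lm": {"tokyo weather": -3.0, "tokyo wether": -7.0, "kyoto weather": -0.5},
}
print(recognize(candidates, acoustic, language_models))
# Language models trained on different learning data can yield different
# recognized strings for the same input speech.
```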

Embodiment 2

[0066] Fig. 4 is a block diagram showing the configuration of the speech search device according to Embodiment 2 of the present invention.

[0067] In the speech search device 100a according to Embodiment 2, the recognition unit 2a outputs to the search result determination unit 8a not only the character string obtained as the recognition result but also the acoustic likelihood and language likelihood of that character string. The search result determination unit 8a then determines the search result using the acoustic likelihood and the language likelihood in addition to the character string collation score.
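The excerpt does not give the exact combination rule. The sketch below shows one plausible form, a weighted sum of the character string collation score and the two likelihoods; the linear form and the weight values are assumptions, not the patent's specification.

```python
# Hypothetical combined score for Embodiment 2. The weighted linear
# combination and the default weights are assumptions for illustration.
def combined_score(collation_score: float,
                   acoustic_loglik: float,
                   language_loglik: float,
                   w_collation: float = 1.0,
                   w_acoustic: float = 0.1,
                   w_language: float = 0.1) -> float:
    return (w_collation * collation_score
            + w_acoustic * acoustic_loglik
            + w_language * language_loglik)

# The search result determination unit 8a would rank candidate
# search-target strings by this combined score rather than by the
# collation score alone.
```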

[0068] Hereinafter, parts that are the same as or correspond to constituent elements of the speech search device 100 according to Embodiment 1 are given the same reference numerals as in Fig. 1, and their descriptions are omitted or simplified.

[0069] The recognition unit 2a performs th...

Embodiment 3

[0080] Fig. 6 is a block diagram showing the configuration of the speech search device according to Embodiment 3 of the present invention.

[0081] Compared with the speech search device 100a shown in Embodiment 2, the speech search device 100b of Embodiment 3 has only the second language model storage unit 4 and does not have the first language model storage unit 3. Therefore, the recognition process using the first language model is performed by the external recognition device 200.

[0082] Hereinafter, parts that are the same as or correspond to constituent elements of the speech search device 100a according to Embodiment 2 are given the same reference numerals as in Fig. 4, and their descriptions are omitted or simplified.

[0083] The external recognition device 200 can be constituted by, for example, a server with relatively high computing power, and obtains the character string that best matches the time series of feature vectors input from the acous...
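As a rough illustration only, the device side could forward the feature vector time series to the external recognition device 200 and receive the recognized character string in return, for example over HTTP. The endpoint URL, payload format, and the use of HTTP/JSON are assumptions; the excerpt does not state how the two devices communicate.

```python
# Hypothetical client-side call to external recognition device 200.
# The URL, JSON payload shape, and response field name are assumptions
# made purely for illustration.
import requests

def recognize_externally(feature_vectors,
                         url="http://recognizer.example/recognize"):
    """Send the feature vector time series; return the recognized string."""
    payload = {"features": [list(map(float, v)) for v in feature_vectors]}
    response = requests.post(url, json=payload, timeout=10.0)
    response.raise_for_status()
    return response.json()["text"]  # assumed response field name
```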


Abstract

A device is provided with: a recognition unit (2) that refers to an acoustic model and a plurality of language models having different learning data to carry out speech recognition of input speech, and acquires a recognized text string for each of the plurality of language models; a text string matching unit (6) that matches the recognized text string for each of the plurality of language models against the text strings of the search-target vocabulary collected in a text string dictionary stored in a text string dictionary storage unit (7), computes a text string matching score indicating the degree of match between a recognized text string and a text string of the search-target vocabulary, and acquires, for each recognized text string, the search-target vocabulary text string having the highest text string matching score together with that score; and a search result determination unit (8) that refers to the acquired text string matching scores and outputs one or more search-target vocabulary items as the search result, in descending order of text string matching score.
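The sketch below walks through the matching unit (6) and search result determination unit (8) described above. The use of a normalized similarity ratio as the text string matching score is an assumption; the abstract does not define how the score is computed.

```python
# Sketch of the text string matching and result-determination steps.
# Using difflib's similarity ratio as the matching score is an assumption;
# the abstract only says the score indicates the degree of matching.
import difflib

def string_match_score(recognized: str, target: str) -> float:
    """Return a similarity in [0, 1]; higher means a closer match."""
    return difflib.SequenceMatcher(None, recognized, target).ratio()

def search(recognized_strings, dictionary, top_n=3):
    """recognized_strings: one recognized text per language model.
    dictionary: search-target vocabulary from the dictionary storage unit (7)."""
    best_per_recognition = []
    for rec in recognized_strings:
        target, score = max(((t, string_match_score(rec, t)) for t in dictionary),
                            key=lambda pair: pair[1])
        best_per_recognition.append((target, score))
    # Search result determination unit (8): rank by score, drop duplicates.
    ranked = sorted(best_per_recognition, key=lambda pair: pair[1], reverse=True)
    seen, results = set(), []
    for target, _ in ranked:
        if target not in seen:
            seen.add(target)
            results.append(target)
    return results[:top_n]

print(search(["tokyo wether today", "tokio weather"],
             ["tokyo weather", "osaka weather", "tokyo travel"]))
```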

Description

Technical Field

[0001] The present invention relates to a speech search device and a speech search method that obtain a search result by collating, at the character string level, recognition results obtained using a plurality of language models to which language likelihoods are given against a search target vocabulary.

Background Art

[0002] Conventionally, as a language model provided with language likelihoods, a statistical language model in which the language likelihoods are calculated from statistics of learning data (described later) is almost always used. In speech recognition using a statistical language model, when the purpose is to recognize utterances containing a variety of vocabulary and expressions, it is necessary to construct the statistical language model using a wide variety of texts as learning data. However, if a single statistical language model is constructed from such a wide range of learning data, there is a problem that it may not necessarily be the ...
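As a minimal sketch of what "language likelihoods calculated from statistics of learning data" can mean, the bigram model below estimates log probabilities from counts in a toy corpus. The add-one smoothing and the corpus are assumptions; the patent does not prescribe a particular model form.

```python
# Minimal bigram statistical language model: language likelihoods are
# estimated from counts in the learning data. The add-one smoothing and
# the toy weather corpus are assumptions for illustration.
import math
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    vocab = len(set(w for s in sentences for w in s)) + 2  # plus <s>, </s>

    def loglik(words):
        padded = ["<s>"] + words + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(padded, padded[1:]))
    return loglik

# A topic-specific model (e.g., trained on weather texts) assigns higher
# language likelihood to in-topic utterances than to out-of-topic ones.
weather_lm = train_bigram([["tokyo", "weather", "today"],
                           ["osaka", "weather", "tomorrow"]])
print(weather_lm(["tokyo", "weather", "tomorrow"]))
print(weather_lm(["book", "a", "flight"]))
```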


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/32, G06F17/30, G10L15/00, G10L15/06
CPC: G10L15/183, G10L15/26, G10L25/54, G06F16/3343, G10L15/10, G06F16/3344, G06F40/194
Inventor: 花泽利行
Owner: MITSUBISHI ELECTRIC CORP