
Method for correcting pronunciation based on machine vision

A technology involving machine vision and standard pronunciation, applied in the fields of instruments, speech analysis, speech recognition, etc.; it addresses problems such as the inability to accurately recognize and correct confusing pronunciations.

Pending Publication Date: 2022-01-07
CHONGQING MEDICAL & PHARMA COLLEGE

AI Technical Summary

Problems solved by technology

[0004] The invention provides a method for correcting pronunciation based on machine vision, which solves the technical problem that the prior art cannot accurately identify and correct confusing pronunciations.



Examples


Embodiment 1

[0032] This embodiment is substantially as shown in Figure 1 and comprises the following steps (a sketch of the overall flow is given after the steps):

[0033] S1. Synchronously collect the user's pronunciation audio and mouth-shape images in real time;

[0034] S2. Detect whether the user's pronunciation audio contains a word with confusing pronunciation: if yes, proceed to S3; if not, proceed to S4;

[0035] S3. Extract the confusing audio segment and the confusing video segment containing the confusing-pronunciation word from the user's pronunciation audio and mouth-shape images, respectively, then compare the preset standard confusing audio and standard confusing video with the confusing audio segment and the confusing video segment, respectively, to judge whether the pronunciation is incorrect: if yes, proceed to S5; if not, return to S1;

[0036] S4. Compare the preset standard pronunciation audio and standard mouth-shape image with the user's pronunciation audio and mouth-shape image, respectively, and judge whether the pronunciation is wrong: if...
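
The following is a minimal Python sketch of the S1–S5 flow described above. All helper functions are hypothetical stubs standing in for components the patent does not specify (audio/video capture, confusing-word detection, and audio/mouth-shape comparison); the sketch illustrates the branching logic only, not the patent's actual implementation.

```python
# Minimal sketch of the S1-S5 flow from Embodiment 1.
# Every helper below is a hypothetical stub; the patent does not specify
# how capture, detection, or comparison are implemented.

from typing import Optional, Tuple


def capture_audio_and_video() -> Tuple[bytes, bytes]:
    """S1: synchronously collect pronunciation audio and mouth-shape frames."""
    raise NotImplementedError  # stub: would read microphone + camera


def find_confusing_word(audio: bytes) -> Optional[str]:
    """S2: return a detected confusing-pronunciation word, or None."""
    raise NotImplementedError  # stub: would run speech recognition + lookup


def matches_standard(audio: bytes, video: bytes, reference: str) -> bool:
    """S3/S4: compare user audio and mouth shape with a preset standard."""
    raise NotImplementedError  # stub: would compare acoustic + visual features


def prompt_and_play_standard(reference: str) -> None:
    """S5: prompt the pronunciation error and output the standard audio."""
    raise NotImplementedError  # stub: would play back the standard clip


def pronunciation_loop() -> None:
    while True:
        audio, video = capture_audio_and_video()       # S1
        word = find_confusing_word(audio)               # S2
        if word is not None:
            # S3: compare only the confusing segment with its standard.
            if matches_standard(audio, video, word):
                continue                                # correct -> back to S1
            prompt_and_play_standard(word)              # S5
        else:
            # S4: compare the whole utterance with the standard pronunciation.
            if matches_standard(audio, video, "standard_pronunciation"):
                continue                                # correct -> back to S1
            prompt_and_play_standard("standard_pronunciation")  # S5
```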

Embodiment 2

[0048] The only difference from Embodiment 1 is that, in S3, before extraction from the user's pronunciation audio and mouth-shape images, both are first denoised, for example by Gaussian filtering. This improves the quality of the user's pronunciation audio and mouth-shape images and ensures the accuracy and precision of the extraction process.
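
As a concrete illustration of the denoising mentioned above, the sketch below applies Gaussian filtering to a mouth-shape frame with OpenCV and to the audio samples with SciPy. The kernel size and sigma values are assumptions chosen for illustration; the patent does not give specific parameters.

```python
# Illustrative Gaussian-filter denoising for Embodiment 2.
# Kernel size and sigma values are assumed; the patent does not specify them.

import cv2
import numpy as np
from scipy.ndimage import gaussian_filter1d


def denoise_mouth_frame(frame: np.ndarray) -> np.ndarray:
    """Smooth a mouth-shape image frame with a 5x5 Gaussian kernel."""
    return cv2.GaussianBlur(frame, (5, 5), sigmaX=1.0)


def denoise_audio(samples: np.ndarray) -> np.ndarray:
    """Smooth 1-D audio samples with a Gaussian filter (sigma in samples)."""
    return gaussian_filter1d(samples.astype(np.float32), sigma=2.0)
```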

Embodiment 3

[0050] The only difference from Embodiment 2 is that S3 also detects whether the user's pronunciation audio contains a confusing dialect pronunciation: if so, the confusing audio segment and confusing video segment corresponding to the period of the dialect word are extracted from the user's pronunciation audio and mouth-shape images, respectively, and the preset standard confusing audio and standard confusing video corresponding to that dialect pronunciation are compared with the confusing audio segment and confusing video segment, respectively, to determine whether the pronunciation is wrong: if yes, proceed to S5; if not, return to S1. This embodiment considers the following situation: because Chinese culture is extensive and profound, each region has its own local dialect, notably the Sichuan dialect. Many dialect words have pronunciations that coincide with Mandarin words whose meanings are completely different. For example, in ...
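
A minimal sketch of the dialect check added in this embodiment is given below, assuming a hypothetical lookup table that maps recognized dialect pronunciations to identifiers of their standard confusing audio/video references; the table contents are placeholders, not data from the patent.

```python
# Hypothetical dialect-confusion lookup for Embodiment 3. The entries are
# placeholders; the patent's actual dialect examples are not reproduced here.

from typing import Optional

DIALECT_CONFUSIONS = {
    # recognized pronunciation -> identifier of the standard reference clip
    "dialect_word_1": "standard_ref_1",
    "dialect_word_2": "standard_ref_2",
}


def dialect_reference(recognized: str) -> Optional[str]:
    """Return the standard confusing audio/video reference for a dialect
    pronunciation, or None if it is not a known dialect confusion."""
    return DIALECT_CONFUSIONS.get(recognized)
```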


Abstract

The invention relates to the technical field of computer software, in particular to a method for correcting pronunciation based on machine vision. The method comprises the following steps: S1, synchronously collecting the user's pronunciation audio and mouth-shape images in real time; S2, detecting whether the user's pronunciation audio contains words with confusing pronunciation, and if so, executing S3, otherwise executing S4; S3, extracting from the user's pronunciation audio and mouth-shape images, respectively, a confusing audio clip and a confusing image clip covering the time period of the confusing word, comparing a preset standard confusing audio and a preset standard confusing image with the confusing audio clip and the confusing image clip, respectively, and judging whether the pronunciation is wrong; S4, comparing a preset standard pronunciation audio and a preset standard mouth-shape image with the user's pronunciation audio and mouth-shape image, respectively, and judging whether the pronunciation is wrong; and S5, prompting a pronunciation error and outputting the standard confusing audio or the standard pronunciation audio. The method solves the technical problem that confusing pronunciation cannot be corrected in the prior art.

Description

Technical field

[0001] The invention relates to the technical field of computer software, in particular to a method for correcting pronunciation based on machine vision.

Background technique

[0002] When learning a language, learners usually read aloud and repeat after recordings to improve their pronunciation. In most cases, learners cannot tell whether their own pronunciation is accurate. Therefore, a variety of language learning software with built-in pronunciation evaluation or pronunciation correction functions has appeared on the market.

[0003] The pronunciation evaluation results produced by existing language learning software cannot correct specific pronunciation errors, so the evaluation results lack pertinence. In this regard, a Chinese patent discloses a pronunciation correction device for language learning, which corrects the user's pronunciation in real time by outputting the preset standard...

Claims


Application Information

IPC(8): G10L25/51, G10L15/25, G10L15/02, G10L15/04, G10L15/16, G10L21/0208
CPC: G10L25/51, G10L15/25, G10L15/02, G10L15/005, G10L15/04, G10L15/16, G10L21/0208
Inventor 张舰文
Owner CHONGQING MEDICAL & PHARMA COLLEGE