Method for correcting pronunciation based on machine vision

A technology combining machine vision with standard pronunciation references, applied in fields such as instruments, speech analysis, and speech recognition, that addresses the inability to accurately recognize and correct confusing pronunciation.

Pending Publication Date: 2022-01-07

CHONGQING MEDICAL & PHARMA COLLEGE



Problems solved by technology

[0004] The invention provides a method for correcting pronunciation based on machine vision, which solves the technical problem that the prior art cannot accurately identify and correct confusing pronunciation.


Examples


Embodiment 1

[0032] This embodiment is essentially as shown in Figure 1 and includes the following steps:

[0033] S1. Real-time synchronous collection of user pronunciation audio and user mouth shape images;

[0034] S2. Detect whether the user's pronunciation audio contains words with confusing pronunciation: if yes, proceed to S3; if not, proceed to S4;

[0035] S3. Extract the confused audio segment and the confused video segment covering the period that contains the confusing word from the user's pronunciation audio and the user's mouth shape images respectively, then compare them with the preset standard confused audio and standard confused video respectively to judge whether the pronunciation is incorrect: if yes, proceed to S5; if not, return to S1;

[0036] S4. Compare the preset standard pronunciation audio and standard mouth-shape image with the user's pronunciation audio and user's mouth-shape image respectively, and judge whether the pronunciation is wrong: if...
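The branching in S2 to S4 can be sketched as a small dispatch function. This is only an illustrative sketch: `Capture`, `next_step`, and the two comparison callbacks are hypothetical names, and the text does not say how words are recognized or how the comparisons are scored. The outcome of S4 is truncated above, so the sketch assumes that a mismatch in S4 leads to S5 and a match returns to S1.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Capture:
    """One synchronized capture from S1 (hypothetical container)."""
    audio: list       # user pronunciation audio samples
    frames: list      # user mouth shape images
    transcript: list  # recognized words; the recognizer itself is not specified


def next_step(
    capture: Capture,
    confusing_words: set,
    matches_confusion_reference: Callable[[Capture, str], bool],
    matches_standard_reference: Callable[[Capture], bool],
) -> str:
    """Return the label of the step that follows for this capture."""
    # S2: does the audio contain a word with confusing pronunciation?
    hit = next((w for w in capture.transcript if w in confusing_words), None)
    if hit is not None:
        # S3: compare the confused segments with the preset confusion references.
        return "S1" if matches_confusion_reference(capture, hit) else "S5"
    # S4: compare the whole capture with the standard pronunciation references.
    # Assumption: a mismatch leads to S5 and a match returns to S1.
    return "S1" if matches_standard_reference(capture) else "S5"
```

Passing the comparisons in as callbacks keeps the control flow separate from whatever audio and image matching is actually used.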

Embodiment 2

[0048] The only difference from Embodiment 1 is that, in S3, the user's pronunciation audio and the user's mouth shape images are denoised before extraction. Noise reduction, for example Gaussian filtering, improves the quality of the audio and the images and so helps keep the extraction accurate and precise.
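As an illustration of the Gaussian filtering mentioned here, the following is a minimal sketch of one-dimensional Gaussian smoothing applied to a list of audio samples. The function names, the default `sigma` and `radius`, and the border handling are assumptions for the example; the text does not give filter parameters. For images, the same kernel can be applied along rows and then along columns.

```python
import math


def gaussian_kernel(sigma: float, radius: int) -> list:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [
        math.exp(-(i * i) / (2.0 * sigma * sigma))
        for i in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(signal: list, sigma: float = 1.0, radius: int = 3) -> list:
    """Smooth a 1-D signal; borders are handled by repeating the edge sample."""
    kernel = gaussian_kernel(sigma, radius)
    last = len(signal) - 1
    smoothed = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(n + k - radius, 0), last)  # clamp index to valid range
            acc += w * signal[idx]
        smoothed.append(acc)
    return smoothed
```

A larger `sigma` removes more high-frequency noise but also blurs short consonant transitions, so in practice the value would need tuning against the extraction step.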

Embodiment 3

[0050] The only difference from Embodiment 2 is that S3 also detects whether the user's pronunciation audio contains a dialect term with confusing pronunciation. If so, it extracts the confused audio segment and the confused video segment covering the period that contains the dialect term from the user's pronunciation audio and the user's mouth shape images respectively, then compares them with the preset standard confused audio and standard confused video for that dialect term to determine whether the pronunciation is wrong: if yes, go to S5; if not, return to S1. This embodiment addresses the following situation: regions of China have their own local dialects, Sichuan dialect being a prominent example. Many dialect words have a corresponding Mandarin pronunciation, but the two mean completely different things. For example, in ...
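One way to picture the extra dialect check is as a second lookup table consulted alongside the table of confusing words. The sketch below is an assumption-laden illustration: `select_references`, the dictionary layout, and the placeholder tokens and clip identifiers are all hypothetical, and giving the dialect table priority is a choice made for the example, not something the text states.

```python
def select_references(transcript, confusing_words, confusing_dialect_terms):
    """Return (token, reference) pairs for every token that needs the S3 comparison.

    Both tables map a token to an identifier of its preset standard clips.
    Dialect terms are checked first (an assumption made for this sketch).
    """
    selected = []
    for token in transcript:
        if token in confusing_dialect_terms:
            selected.append((token, confusing_dialect_terms[token]))
        elif token in confusing_words:
            selected.append((token, confusing_words[token]))
    return selected


# Placeholder tokens and clip identifiers, not real vocabulary.
words = {"word_a": "std_confused_clip_1"}
dialect = {"term_b": "std_dialect_clip_7"}
pairs = select_references(["term_b", "other", "word_a"], words, dialect)
```

An empty result corresponds to the case where neither check fires and the method falls through to S4.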


Abstract

The invention relates to the technical field of computer software, in particular to a method for correcting pronunciation based on machine vision. The method comprises the following steps: S1, synchronously collecting a user pronunciation audio and a user mouth shape image in real time; S2, detecting whether the pronunciation audio of the user contains words with confusing pronunciation or not, and if so, executing S3, or if not, executing S4; S3, respectively extracting a confusing audio clip and a confusing image clip containing the corresponding time period of the words with confusing pronunciation from the user pronunciation audio and the user mouth shape image, respectively comparing a preset standard confusing audio and a preset standard confusing image with the confusing audio clip and the confusing image clip, and judging whether the pronunciation is wrong or not; S4, respectively comparing a preset standard pronunciation audio and a preset standard mouth shape image with a user pronunciation audio and a user mouth shape image, and judging whether the pronunciation is wrong or not; and S5, prompting a pronunciation error, and outputting the standard confusion audio or the standard pronunciation audio. The technical problem that confusing pronunciation cannot be corrected in the prior art is solved.
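Because S1 collects audio and images synchronously, the "corresponding time period" in S3 can be cut from both streams with the same start and end times. The sketch below assumes both streams start at the same instant and that the time window of the confusing word is already known; `extract_synchronized_clip`, `sample_rate`, and `fps` are names introduced for the example.

```python
import math


def extract_synchronized_clip(audio, frames, sample_rate, fps, start_s, end_s):
    """Cut the time window [start_s, end_s) from audio samples and video frames.

    Assumes audio[0] and frames[0] were captured at the same instant.
    """
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    a0 = int(round(start_s * sample_rate))
    a1 = int(round(end_s * sample_rate))
    f0 = int(start_s * fps)       # first frame overlapping the window
    f1 = math.ceil(end_s * fps)   # one past the last overlapping frame
    return audio[a0:a1], frames[f0:f1]


# Toy streams: 3 s of audio at 100 samples/s and 3 s of video at 10 frames/s.
audio_clip, frame_clip = extract_synchronized_clip(
    list(range(300)), list(range(30)), sample_rate=100, fps=10,
    start_s=1.0, end_s=1.5,
)
```

The frame range is rounded outward so that a frame partly covered by the window is still included.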

Description

Technical field

[0001] The invention relates to the technical field of computer software, in particular to a method for correcting pronunciation based on machine vision.

Background

[0002] When learning a language, learners usually read aloud and repeat after a model to improve their pronunciation. In most cases, they cannot tell whether their own pronunciation is accurate. As a result, a variety of language learning software with built-in pronunciation evaluation or pronunciation correction functions has appeared on the market.

[0003] The pronunciation evaluation results produced by existing language learning software cannot correct specific pronunciation errors, so the results lack specificity. In this regard, a Chinese patent discloses a pronunciation correction device for language learning, which outputs the user's pronunciation in real time by outputting the preset standard...


Application Information


IPC(8): G10L25/51, G10L15/25, G10L15/02, G10L15/04, G10L15/16, G10L21/0208

CPC: G10L25/51, G10L15/25, G10L15/02, G10L15/005, G10L15/04, G10L15/16, G10L21/0208

Inventor 张舰文

Owner CHONGQING MEDICAL & PHARMA COLLEGE
