A real-time harmony-assisted training method for popular singing

By combining audio separation with pitch visualization tools, the method provides interactive harmony training for popular singing. It addresses low training efficiency and delayed feedback, makes harmony training systematic and scientific, reduces the learning difficulty, and improves the training effect.

CN122201091A · Pending · Publication Date: 2026-06-12 · 韩笑

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
韩笑
Filing Date
2026-03-05
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing popular singing and harmony training methods are inefficient, have delayed feedback, are detached from real singing contexts, and lack systematicity and scientific rigor.

Method used

By separating the lead vocals and harmonies using audio separation technology, a multi-track audio environment is constructed. Combined with pitch visualization tools, it provides real-time feedback and interactive training, allowing trainees to imitate and construct harmonies in the accompaniment of real pop songs.

Benefits of technology

It improves the systematic and scientific nature of harmony training, lowers the learning threshold, enhances self-correction efficiency, allows training results to be directly transferred to actual singing, and the equipment is readily available and inexpensive.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a real-time harmony-assisted training method for popular singing and relates to the technical field of music training. The method comprises a material preprocessing stage, an interactive layered training stage, a real-time feedback training stage, and an active construction and evaluation stage. In the multi-track audio environment, the trainee first records the main melody singing audio, then constructs harmony audio based on the main melody singing audio and the chord information of the target song, and the harmony is evaluated by comparing the harmony audio with the second audio stream. Through a path of separation-imitation-interaction-construction-practice, the complex harmony skill is broken down into sequential, progressive training modules, which lowers the learning threshold and makes the training more systematic and scientific. A pitch visualization tool converts abstract auditory sensation into concrete visual images, so that pitch problems are visible at a glance and the efficiency and accuracy of self-correction are greatly improved.

Description

Technical Field

[0001] This invention relates to the field of music training technology, specifically to a real-time harmony-assisted training method for popular singing.

Background Technology

[0002] In popular singing, the ability to sing harmony is an important indicator of a singer's musical literacy and collaborative skills. Traditional harmony training mainly relies on the following methods: Face-to-face instruction: The teacher sings the main melody, and the students try to match the harmonies. This method is effective, but it is limited by the teacher's time and location, is costly, and makes the training process difficult to record and quantify.

[0003] Sheet music and piano assistance: This method involves sight-singing and ear training using a piano with fixed pitches. It requires a high level of music theory and sight-reading ability from the student, is relatively tedious, and is not closely integrated with the actual context of popular song performances.

[0004] Singing along to the original recording: Students learn by repeatedly listening to and imitating the harmonies in the original recording. This method lacks structured guidance, making it difficult for students to effectively separate the harmonic parts from the complex mix for focused imitation, and they cannot obtain immediate, objective feedback on their own pitch and rhythm.

[0005] In recent years, with the development of digital audio technology, some singing applications with vocal removal or pitch shifting have emerged. However, they mainly serve entertainment and do not have a scientific path designed for systematic training of harmony skills. Most existing music education software also focuses on pitch training for instrument playing or singing melody, lacking a progressive and interactive training program specifically designed for how to move from imitation to independent construction of harmony in the context of popular songs.

[0006] Therefore, existing technologies suffer from problems such as low training efficiency, delayed feedback, detachment from the real singing context, and excessively high requirements for students' music theory foundation. There is an urgent need for a real-time harmony-assisted training method for popular singing that is easy to implement and has significant training effects.

Summary of the Invention

[0007] (I) Technical Problem to Be Solved: To address the shortcomings of existing technologies, this invention provides a real-time harmony-assisted training method for popular singing, which solves the problems of low training efficiency, delayed feedback, and detachment from the real singing context in existing technologies.

[0008] (II) Technical Solution: To achieve the above objectives, the present invention provides the following technical solution: a real-time harmony-assisted training method for popular singing, comprising the following steps:

S1. Material preprocessing stage: Obtain the complete audio data of the target song, and use audio separation technology to separate from the complete audio data a first audio stream containing at least the lead vocal and a second audio stream containing the harmonies.

S2. Interactive layered training stage: Based on the first audio stream and the second audio stream, construct a multi-track audio environment containing a guide track and a practice track, and load the accompaniment audio corresponding to the target song. The guide track is used to play the second audio stream, and the practice track is used to record the trainee's singing audio.

S3. Real-time feedback training stage: The trainee wears headphones and, while listening to the first audio stream and/or the accompaniment audio, sings in sync with the second audio stream played by the guide track. The practice track records the trainee's singing audio in real time, and a pitch visualization tool performs real-time pitch analysis on the singing audio to generate visualized pitch curve feedback.

S4. Active construction and evaluation stage: In the multi-track audio environment, the trainee first records the main melody singing audio, and then independently constructs and records the harmony audio based on the main melody singing audio and the chord information of the target song. The harmony is evaluated by comparing the harmony audio with the second audio stream, or by playing a mixed audio of the main melody singing audio and the harmony audio.
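The source does not say how the harmony audio is compared with the second audio stream in S4. One possible reading, sketched below under stated assumptions, is to compare two frame-by-frame pitch contours (in Hz, with `None` for unvoiced frames) and report the share of frames that agree within a tolerance. The function name `contour_match_ratio` and the 50-cent tolerance are illustrative choices, not part of the disclosed method.

```python
import math

def contour_match_ratio(sung_hz, ref_hz, tolerance_cents=50.0):
    """Fraction of frames where the sung pitch is within tolerance of the reference.

    Both inputs are equal-rate pitch contours in Hz; None marks an unvoiced
    frame. Frames where either contour is unvoiced are skipped (an assumption).
    """
    compared = 0
    matched = 0
    for sung, ref in zip(sung_hz, ref_hz):
        if sung is None or ref is None:
            continue
        compared += 1
        deviation = abs(1200.0 * math.log2(sung / ref))  # distance in cents
        if deviation <= tolerance_cents:
            matched += 1
    return matched / compared if compared else 0.0

# Hypothetical contours: trainee's harmony vs. separated reference harmony.
reference = [440.0, 440.0, 494.0, None]
trainee = [442.0, 466.0, 494.0, 300.0]
ratio = contour_match_ratio(trainee, reference)
```

In this example the second frame is roughly a semitone sharp and the fourth is skipped, so two of the three compared frames match. A score like this would complement, not replace, the listening-based evaluation of the mixed audio that S4 also allows.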

[0009] In some embodiments, in step S1, the audio separation is performed using an AI-based audio separation application or software to separate independent harmony and lead-vocal audio streams.

[0010] In some embodiments, in step S2, the multitrack audio environment is constructed using digital audio workstation software, and the accompaniment audio is a pure accompaniment separated from the original song or an intelligent accompaniment generated based on the chord progression of the target song.

[0011] In some embodiments, the "synchronous singing following the second audio stream played on the guide track" in step S3 includes:

S31. First sing-along mode: Simultaneously play the guide track and the second audio stream, and the trainee sings along synchronously.

S32. Second sing-along mode: Only the first audio stream and the accompaniment audio are played, the guide track is turned off, and the trainee sings the harmony part independently based on memory and auditory recognition.

The trainee can switch between the first sing-along mode and the second sing-along mode.
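The two sing-along modes amount to toggling the guide track while the other tracks keep playing. The sketch below models that as mute states on a small set of tracks. The `Track` class, the track keys, and the choice to keep the first audio stream audible in both modes are assumptions made for illustration and are not tied to any particular digital audio workstation.

```python
from dataclasses import dataclass

@dataclass
class Track:
    name: str
    muted: bool = False

def make_session():
    """Build a hypothetical three-track playback session."""
    return {
        "lead": Track("first audio stream (lead vocal)"),
        "guide": Track("guide track (second audio stream)"),
        "accompaniment": Track("accompaniment audio"),
    }

def set_sing_along_mode(session, mode):
    """Apply the first or second sing-along mode and return the audible track keys."""
    if mode not in ("first", "second"):
        raise ValueError("mode must be 'first' or 'second'")
    # Only the guide track changes: on in the first mode, off in the second.
    session["guide"].muted = (mode == "second")
    return sorted(key for key, track in session.items() if not track.muted)

session = make_session()
audible_first = set_sing_along_mode(session, "first")
audible_second = set_sing_along_mode(session, "second")
```

Switching back to the first mode simply calls `set_sing_along_mode(session, "first")` again, which mirrors the repeated switching described in the method.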

[0012] In some embodiments, in step S3, the pitch visualization tool is a standalone mobile application or a plugin integrated into digital audio workstation software, which can convert the received singing audio signal into a visual graphic or spectrum in real time and compare and display it with the target pitch baseline.
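Comparing a sung pitch with a target pitch baseline is commonly expressed in cents, where 100 cents equal one equal-tempered semitone. The following is a minimal sketch of that comparison. The 440 Hz tuning reference, the 30-cent tolerance, and the function names are assumptions, since the source does not specify how the tool measures deviation.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; the source does not specify one

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)

def cents_off(sung_hz, target_hz):
    """Signed deviation of the sung pitch from the target, in cents."""
    return 1200.0 * math.log2(sung_hz / target_hz)

def feedback(sung_hz, target_hz, tolerance_cents=30.0):
    """Classify a sung pitch as 'in tune', 'sharp', or 'flat' (tolerance is an assumption)."""
    deviation = cents_off(sung_hz, target_hz)
    if abs(deviation) <= tolerance_cents:
        return "in tune"
    return "sharp" if deviation > 0 else "flat"
```

For instance, `feedback(445.0, 440.0)` falls inside the tolerance (about 20 cents sharp), while a sung 466.16 Hz against a 440 Hz target is about a semitone high and is reported as sharp.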

[0013] In some embodiments, the "independent construction and recording of harmony audio" in step S4 includes: the trainee, based on preset harmony construction rules or by exploratory humming, finds pitches that harmonize with the main melody singing audio and the current chord, forms a harmony melody line, and then records it.

[0014] In some embodiments, the preset harmony construction rules include at least one of the following: stacking a third, stacking a fifth, or stacking an octave on the melody.
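As an illustration of such rules, the sketch below shifts a melody note by a third, fifth, or octave while staying inside a major key, so that a "third" comes out major or minor as the scale requires. Restricting the example to major keys, using MIDI note numbers, and the names `RULES` and `diatonic_shift` are all assumptions. The source names only the intervals.

```python
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic

# Interval rules expressed as a number of scale steps (illustrative names).
RULES = {"third": 2, "fifth": 4, "octave": 7}

def diatonic_shift(midi_note, tonic_pc, degrees):
    """Move a scale note by `degrees` scale steps within the major key.

    tonic_pc is the pitch class of the tonic (0 = C). Negative degrees move down.
    """
    rel = (midi_note - tonic_pc) % 12
    if rel not in MAJOR_SCALE_STEPS:
        raise ValueError("note is not in the major scale of this key")
    index = MAJOR_SCALE_STEPS.index(rel)
    base = midi_note - rel  # the tonic at or below the note
    octave_shift, new_index = divmod(index + degrees, 7)
    return base + 12 * octave_shift + MAJOR_SCALE_STEPS[new_index]

# In C major: E4 (64) harmonized a third above gives G4 (67).
upper_third = diatonic_shift(64, 0, RULES["third"])
# C4 (60) harmonized a third below gives A3 (57).
lower_third = diatonic_shift(60, 0, -RULES["third"])
```

Notes outside the scale raise an error rather than being guessed at, which leaves chromatic passages to the exploratory humming that the method also describes.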

[0015] In some embodiments, the method further includes step S5, a practical simulation stage, in which the trainee, relying solely on the accompaniment audio and without listening to the original harmony beforehand, sings the harmony for a specified section of the target song, and the singing is recorded as practical audio for playback evaluation.

[0016] (III) Beneficial Effects: The beneficial effects of this invention are as follows. By designing a path of separation-imitation-interaction-construction-practice, the complex harmony skill is broken down into progressive training modules, lowering the learning threshold and making the training more systematic and scientific. The introduction of pitch visualization tools transforms abstract auditory perceptions into concrete visual images, making pitch problems immediately apparent and greatly improving the efficiency and accuracy of self-correction.

[0017] The training is based on real popular song materials and accompaniment, and is conducted in a multi-track audio environment, which highly simulates the actual singing and collaboration situation, so that the training results can be directly transferred to practical applications. The solution is based on common smart terminal devices and software tools, without the need for professional recording studios or expensive equipment, and is easy to promote and popularize among individuals and educational institutions.

[0018] The on/off switching of the guide track requires trainees to alternate between relying on external cues and relying on their inner hearing, effectively strengthening their auditory memory and their ability to construct harmony.

Description of the Drawings

[0019] Figure 1 is a schematic diagram of the process structure of the present invention.

Detailed Implementation

[0020] To better explain and facilitate understanding of the present invention, a detailed description of the invention will be provided through specific embodiments.

[0021] Example: Referring to Figure 1, this embodiment illustrates the implementation of the real-time harmony-assisted training method for popular singing, using the chorus of "The Ordinary Road" as an example.

[0022] Step 1: Material Preparation. Trainees can install an application with AI vocal separation capabilities on their smartphones, such as Moises.

[0023] Import the audio file of the song "The Ordinary Road" into the application, and use the vocal separation function to separate it into three independent audio files: "lead vocals," "harmony vocals," and "accompaniment."

[0024] Export the separated harmony file (the second audio stream) and the accompaniment file to the computer.

[0025] Step 2: Setting Up the Training Environment. Launch digital audio workstation (DAW) software on the computer, such as BandLab or Cakewalk.

[0026] Create a new project and set up four audio tracks: Track 1 (Reference Track): Optionally import the separated "lead vocals" file as a main melody cue.

[0027] Track 2 (Guide Track): Import the "harmony vocals" file exported from Moises.

[0028] Track 3 (Accompaniment Track): Import the accompaniment file.

[0029] Track 4 (Practice Track): Arm the track for recording, and select the USB microphone connected to the computer as the input.

[0030] At the same time, launch pitch visualization software on the computer, such as Vocal Pitch Monitor, and set its input to the same USB microphone.

[0031] Step 3: Interactive Imitation Training. Trainees wear headphones to ensure they can clearly hear the audio played back from the DAW project.

[0032] Play track 1 (main melody), track 2 (guide track), and track 3 (accompaniment) in the DAW. The trainee sings along with the harmony of the guide track, and the pitch visualization software screen displays the pitch curve of the trainee's singing in real time.

[0033] Trainees observe whether their pitch curve matches the expected pitch direction of the song's harmony, and immediately adjust and sing again for any inaccurate notes.

[0034] After several rounds of practice, mute track 2 (guide track) in the DAW, and the trainee tries to sing the harmony over the accompaniment and main melody based solely on memory and hearing. When encountering difficulties, the trainee immediately unmutes track 2 and sings along with it again. This switching is repeated until the trainee can sing the part with reasonable accuracy without guidance.
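The pitch curve shown in Step 3 is built from per-frame pitch estimates of the microphone signal. How Vocal Pitch Monitor or similar tools compute these is not stated in the source. The sketch below uses plain autocorrelation peak picking as a stand-in technique, tested on a synthetic sine tone. The 8000 Hz sample rate, the 80 to 1000 Hz search range, and the helper `sine_frame` are assumptions for the example.

```python
import math

def sine_frame(freq_hz, sample_rate, length):
    """Generate a synthetic test tone (stands in for one frame of microphone audio)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate) for n in range(length)]

def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency from the strongest autocorrelation lag.

    Returns None when no positive correlation is found (for example, silence).
    Resolution is limited to whole-sample lags, so the result is approximate.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

frame = sine_frame(220.0, 8000, 1024)
estimate = estimate_pitch_hz(frame, 8000)
```

Running the estimator on successive short frames and plotting the results over time yields a pitch curve of the kind the trainee watches. A practical tool would likely add voicing detection and finer lag interpolation, which this sketch omits.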

[0035] Step 4: Self-Construction Exercises. The trainee first records their own rendition of the chorus melody of "The Ordinary Road" on a new audio track.

[0036] Consult the chord chart for that section, for example: C-G-Am-F.

[0037] Create another recording track, play back the recorded melody, and follow the chord chart. For the first note of the melody, if it is a note within the C major chord, the trainee tries humming a note a third above or below it and judges whether the result sounds harmonious.

[0038] Record a melody that feels harmonious. After completing a phrase, listen back to the mix of the "main melody track" and the "custom harmony track" to judge the harmony. If you are not satisfied, you can delete and re-record or make minor adjustments.
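The trial-and-error search in Step 4 can be narrowed by listing, for a melody note that belongs to the current chord, the notes a third above or below that are also chord tones. The sketch below does this for the four triads of the example chord chart. The MIDI note numbering, the `CHORD_TONES` table, and the function name are assumptions for illustration, and the trainee's ear remains the final judge.

```python
# Pitch classes (0 = C) of the triads in the example progression C-G-Am-F.
CHORD_TONES = {
    "C": {0, 4, 7},
    "G": {7, 11, 2},
    "Am": {9, 0, 4},
    "F": {5, 9, 0},
}

def third_candidates(melody_midi, chord):
    """List notes a minor or major third below/above the melody that are chord tones.

    Returns an empty list when the melody note itself is not in the chord.
    """
    tones = CHORD_TONES[chord]
    if melody_midi % 12 not in tones:
        return []
    candidates = []
    for interval in (-4, -3, 3, 4):  # major/minor third down, minor/major third up
        note = melody_midi + interval
        if note % 12 in tones:
            candidates.append(note)
    return candidates

# E4 (MIDI 64) over a C major chord: C4 below or G4 above.
options = third_candidates(64, "C")
```

A melody note that is not a chord tone returns no candidates here, which corresponds to the cases where the trainee falls back on exploratory humming.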

[0039] Step 5: Practical Simulation. Mute all tracks containing the original vocals (track 1 and track 2), keeping only the accompaniment track (track 3) and the self-recorded main melody track.

[0040] Create a new audio track to prepare for recording.

[0041] Trainees imagine themselves collaborating with another singer, accurately singing the harmonies to the accompaniment and completing the recording.

[0042] Play back the practical simulation recording, conduct a final evaluation, and complete the training.

[0043] In the description of this invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0044] In this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," "linking," and "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral part; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; they can refer to the internal communication of two components or the interaction between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0045] In this invention, unless otherwise explicitly specified and limited, a first feature being "above" or "below" a second feature can mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Furthermore, a first feature being "above," "over," or "on top of" a second feature can mean that the first feature is directly above or diagonally above the second feature, or simply that the first feature is at a higher horizontal level than the second feature. A first feature being "below," "under," or "beneath" a second feature can mean that the first feature is directly below or diagonally below the second feature, or simply that the first feature is at a lower horizontal level than the second feature.

[0046] In the description of this specification, the terms "one embodiment," "some embodiments," "embodiment," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0047] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make modifications, alterations, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A real-time harmony-assisted training method for popular singing, characterized in that it includes the following steps:

S1. Material preprocessing stage: Obtain the complete audio data of the target song, and use audio separation technology to separate from the complete audio data a first audio stream containing at least the lead vocal and a second audio stream containing the harmonies;

S2. Interactive layered training stage: Based on the first audio stream and the second audio stream, construct a multi-track audio environment containing a guide track and a practice track, and load the accompaniment audio corresponding to the target song; the guide track is used to play the second audio stream, and the practice track is used to record the trainee's singing audio;

S3. Real-time feedback training stage: The trainee wears headphones and, while listening to the first audio stream and/or the accompaniment audio, sings in sync with the second audio stream played by the guide track; the practice track records the trainee's singing audio in real time, and a pitch visualization tool performs real-time pitch analysis on the singing audio to generate visualized pitch curve feedback;

S4. Active construction and evaluation stage: In the multi-track audio environment, the trainee first records the main melody singing audio, and then independently constructs and records the harmony audio based on the main melody singing audio and the chord information of the target song; the harmony is evaluated by comparing the harmony audio with the second audio stream, or by playing a mixed audio of the main melody singing audio and the harmony audio.

2. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that, in step S1, the audio separation is performed using an AI-based audio separation application or software to separate independent harmony and lead-vocal audio streams.

3. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that, in step S2, the multi-track audio environment is constructed using digital audio workstation software, and the accompaniment audio is either a pure accompaniment separated from the original song or an intelligent accompaniment generated based on the chord progression of the target song.

4. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that the "synchronous singing following the second audio stream played on the guide track" in step S3 includes: S31. First sing-along mode: Simultaneously play the guide track and the second audio stream, and the trainee sings along synchronously; S32. Second sing-along mode: Only the first audio stream and the accompaniment audio are played, the guide track is turned off, and the trainee sings the harmony part independently; the trainee can switch between the first sing-along mode and the second sing-along mode.

5. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that, in step S3, the pitch visualization tool is a standalone mobile application or a plugin integrated into digital audio workstation software, which can convert the received singing audio signal into a visual graphic or spectrum in real time and display it in comparison with the target pitch baseline.

6. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that the "independent construction and recording of harmony audio" in step S4 includes: the trainee, based on preset harmony construction rules or by exploratory humming, finds pitches that harmonize with the main melody singing audio and the current chord, forms a harmony melody line, and then records it.

7. The method for real-time harmony-assisted training in popular singing according to claim 6, characterized in that the preset harmony construction rules include at least one of the following: stacking a third, stacking a fifth, or stacking an octave on the melody.

8. The method for real-time harmony-assisted training in popular singing according to claim 1, characterized in that it also includes step S5, a practical simulation stage, in which the trainee sings the harmony of a specified section of the target song based solely on the accompaniment audio without listening to the original harmony in advance, and the singing is recorded as practical audio for playback evaluation.