A device for assisting training of speech articulation in children
By introducing a stable structure, multimodal fusion, and personalized strategies into a children's speech clarity training device, the problems of unstable assessment and misjudgment in existing technologies are solved, and efficient, reliable, and privacy-controlled effect tracking of children's speech training is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GANZHOU MATERNAL & CHILD HEALTH HOSPITAL
- Filing Date
- 2026-04-27
- Publication Date
- 2026-06-19
AI Technical Summary
Existing speech intelligibility training devices for children, when implemented in homes, suffer from unstable assessment results due to structural and sound pickup instability. They lack confidence control, differentiated feedback, and personalized strategies, making it easy to misjudge and discourage children. Training is difficult to advance and its effects are hard to track.
A speech intelligibility training device for children was designed. It adopts a layered stacked structure of a damped positioning hinge, an adjustable universal microphone arm and microphone head, a main control logic board and an expansion board. Combined with a lip-sync camera module and a multimodal fusion strategy, it implements confidence control and personalized training strategies, providing stable sound pickup, privacy control and highly reliable assessment.
It improves the stability and reliability of the training process, reduces evaluation fluctuations caused by posture changes and noise interference, enhances the privacy and controllability of training and the credibility of feedback, enables personalized progression and traceable results, and improves the smoothness of training and user experience.
Figure CN122245159A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of children's oral language training technology, specifically to a device for assisting in the training of children's speech clarity.
Background Technology
[0002] Children's speech clarity training usually targets issues such as unclear articulation, phoneme confusion, omissions or additions of sounds, and uncontrolled speech rate. Training methods include demonstration, imitation, error correction, repetition and reinforcement, as well as situational generalization exercises. With the development of multimedia computing and speech processing technology, training systems or devices based on camera capture of mouth movements and audio capture of speech signals have gradually emerged in the industry. These systems provide prompts and evaluations by analyzing mouth images and speech signals.
[0003] In existing technologies, such as the oral pronunciation training device for English teaching with publication number CN113257056A, an English teaching system collects lip movement data and audio recordings of students during pronunciation training and compares them to determine pronunciation accuracy. The accuracy is displayed on a monitor, allowing students to check their learning accuracy records in real time. Furthermore, the use of dot matrix projection technology makes the reading of lip movement data more accurate and allows comparison and evaluation against data from a cloud database, which is more conducive to controlling students' learning progress and has potential for widespread application. However, existing technologies may still have the following problems in engineering implementation and home use. Firstly, the display, sound pickup, power supply, and sound output components of the training device... Without a highly consistent, integrated, and stable positioning structure tailored to children's scenarios, the evaluation results are easily affected by changes in the trainee's posture, sound pickup distance, and environmental noise, which reduces the credibility and reproducibility of training feedback. Secondly, without confidence control, differentiated feedback, and data-driven personalized strategies, the system is prone to misjudging uncertain samples and outputting negative evaluations, causing frustration and decreased compliance in children. At the same time, training content is difficult to advance automatically as ability improves, and training effects do not readily form a traceable closed loop. These problems call for improvements in structural stability and near-field sound pickup, in privacy control and mode adaptation, and in highly reliable evaluation and closed-loop personalized training, so as to improve the overall training effect and user experience.
Summary of the Invention
[0004] The purpose of this invention is to provide a speech intelligibility training device for children, in order to solve the problems mentioned in the background art. In home implementation, the existing technology often suffers from unstable structure and sound pickup, leading to fluctuating assessments. Furthermore, it lacks confidence control, differentiated feedback, and personalized closed-loop strategies, which can easily result in misjudgment and discouragement of children, making it difficult to advance training and track results.
[0005] The technical solution adopted by this application to solve its technical problem is: a speech intelligibility training device for children, comprising: a base, a middle frame shell assembly disposed on the base, the middle frame shell assembly and the base enclosing a receiving cavity, and a sound-emitting mesh disposed on the outside of the middle frame shell assembly;
[0006] The display screen is hinged to the middle frame housing assembly via a pivot hinge, and a lip-sync camera module is provided on one side of the display screen.
[0007] A universal microphone arm is disposed on one side of the mid-frame housing assembly, and a microphone head is disposed on the universal microphone arm;
[0008] The main control logic board and the expansion board are disposed within the receiving cavity;
[0009] A large button is disposed on the middle frame housing assembly and electrically connected to the main control logic board or expansion board;
[0010] A battery and acoustic module area and a speaker are disposed in the base and electrically connected to the main control logic board. A charging port is provided on one side of the middle frame housing assembly, and the charging port is electrically connected to the battery and acoustic module area.
[0011] The side of the mid-frame housing assembly is provided with a side sensing group, which is electrically connected to the main control logic board and / or expansion board. The side sensing group includes at least one of an environmental noise detection unit, a distance / proximity detection unit, and an attitude detection unit.
[0012] Preferably, the hinge is a damped positioning hinge structure, which allows the display screen to stay and remain stable at multiple pitch angles.
[0013] Preferably, the lip-sync camera module is located on or near the side bezel of the display screen and is equipped with a physical shield and / or an electronically controlled start / stop switch to achieve privacy control of lip-sync capture.
[0014] Preferably, the universal microphone arm is a malleable gooseneck structure or a multi-segment universal joint structure, so that the microphone head can be adjusted and held in a preset pickup area in front of the trainee's mouth for near-field pickup.
[0015] Preferably, the main control logic board and the expansion board are electrically connected via board-to-board connectors, ribbon cable connectors or pluggable connectors and form a layered stacked structure, wherein the expansion board integrates at least one of audio front-end circuits, power amplifier circuits, power management circuits and peripheral interface circuits.
[0016] Preferably, the main control logic board is configured to execute speech evaluation logic: perform endpoint detection and feature extraction on the speech collected by the microphone head, and output at least a clarity score and error correction prompt information at the phoneme or syllable level.
[0017] Preferably, the main control logic board is configured to execute a multimodal fusion strategy: when the lip-sync camera module is turned on, the lip image features and speech features are jointly determined to generate training guidance information; when the lip-sync camera module is turned off, the training guidance information is generated only based on the speech features, and the corresponding training guidance interface is output on the display screen.
[0018] Preferably, the main control logic board is configured to execute a confidence control strategy: when the confidence of the current evaluation result is lower than a preset threshold, a "repeat" prompt is triggered and negative judgment output is suppressed; when the confidence is higher than the preset threshold, clear correct feedback or error correction feedback is output.
[0019] Preferably, the expansion board or the main control logic board is connected to a status feedback unit, the status feedback unit including a light-emitting structure on the microphone head, a light-emitting structure on the device body, and a speaker; the main control logic board is configured to output at least two different feedback states according to the training guidance information, to indicate "achievement reward" and "error correction / repeating" respectively.
[0020] Preferably, the main control logic board is configured to execute a personalized training strategy: in the initial stage, output a baseline test task to determine the target error-prone sounds or target phoneme set; in the training stage, dynamically adjust the difficulty, repetition count, and prompt intensity of the training content based on the trainee's historical scores, error types, and completion rate, and generate training report data to display training trends on the display screen.
[0021] The beneficial effects of this application are:
[0022] This application provides a speech intelligibility training device for children, which integrates a display screen, a damped hinge, an adjustable omnidirectional microphone arm and microphone head, a layered stacked structure of a main control logic board and an expansion board, a battery and acoustic module area, and a speaker into one unit. It achieves stable sound output and protection on the trainee side through a speaker mesh, so that the screen tilt can be kept stable during training, the pickup position can be repeatedly fixed, and the power supply is continuous and reliable. This significantly reduces the evaluation fluctuations caused by changes in posture, pickup distance, and external noise in home or institutional environments, improves the consistency and reproducibility of phoneme / syllable level intelligibility scores, and enhances the smoothness and usability of the overall training process.
[0023] This application provides a speech intelligibility training device for children. A lip-reading camera module is positioned on or near the side bezel of a display screen and equipped with a physical shield and / or an electronically controlled start / stop switch. This allows trainees to quickly switch between "lip-reading assisted training" and "pure speech training" as needed. It provides lip image support to improve the intuitiveness of error correction and learning efficiency when visual guidance of speech is required. Furthermore, it blocks image acquisition at the source in privacy-sensitive or inconvenient scenarios to enhance user trust and acceptance in home settings. Simultaneously, it avoids negative emotions caused by a constantly open camera, thus achieving a closed-loop training experience that balances improved effectiveness and controllable privacy.
[0024] This application provides a speech clarity enhancement training device for children. Based on speech assessment, the main control logic board introduces multimodal fusion and confidence control, linking the microphone head / body light-emitting structure and speaker to form multi-channel real-time feedback. Simultaneously, through baseline testing and historical data-driven personalized training strategies, the system dynamically adjusts task difficulty, repetition frequency, and prompt intensity. When the assessment is uncertain, the system prioritizes triggering "re-speaking for confirmation" and suppresses negative feedback from misjudgments to reduce children's frustration. When the assessment is reliable, it outputs clear corrections or rewards to reinforce correct pronunciation acquisition. The training results are transformed into a trend report for visualization. Thus, through a continuous mechanism of "fewer misjudgments, stronger incentives, progressive learning, and traceability," the system improves participation, training completion rate, and the speed of clarity improvement.
[0025] In addition to the purposes, features, and advantages described above, this application has other purposes, features, and advantages. These will be further described in detail below with reference to figures.
Attached Figure Description
[0026] Figure 1 is a schematic diagram of the overall structure of the present invention;
[0027] Figure 2 is a schematic exploded view of the overall structure of the present invention;
[0028] Figure 3 is a schematic diagram of the overall workflow of the present invention;
[0029] Figure 4 is a schematic diagram of the multimodal fusion strategy of the present invention;
[0030] Figure 5 is a schematic diagram of the confidence control strategy of the present invention;
[0031] Figure 6 is a schematic diagram of the state feedback unit of the present invention;
[0032] Figure 7 is a schematic diagram of the personalized training strategy of the present invention.
[0033] Drawing number explanation:
[0034] 1. Display screen; 2. Lip-tracking camera module; 3. Hinge; 4. Microphone head; 5. Universal microphone arm; 6. Mid-frame housing assembly; 7. Main control logic board; 8. Expansion board; 9. Side sensor group; 10. Large button; 11. Battery and acoustic module area; 12. Speaker; 13. Base; 14. Charging port; 15. Speaker mesh.
Detailed Implementation
[0035] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.
[0036] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present application.
[0037] Please refer to Figures 1 to 7. A speech intelligibility training device for children includes: a base 13, on which a middle frame housing assembly 6 is mounted, the middle frame housing assembly 6 and the base 13 forming a receiving cavity, and a speaker mesh 15 provided on the outside of the middle frame housing assembly 6; a display screen 1, which is hinged to the middle frame housing assembly 6 via a pivot hinge 3, with a lip-tracking camera module 2 mounted on one side of the display screen 1; a universal microphone arm 5, which is mounted on one side of the middle frame housing assembly 6, with a microphone head 4 mounted on the universal microphone arm 5; a main control logic board 7 and an expansion board 8, which are located within the receiving cavity; a large button 10, which is mounted on the middle frame housing assembly 6 and electrically connected to the main control logic board 7 or the expansion board 8; and a battery and acoustic module area 11 and a speaker 12, which are located within the base 13 and electrically connected to the main control logic board 7. A charging port 14 is provided on one side of the middle frame housing assembly 6 and is electrically connected to the battery and acoustic module area 11. A side sensor group 9 is provided on the side of the middle frame housing assembly 6. The side sensor group 9 is electrically connected to the main control logic board 7 and / or the expansion board 8. The side sensor group 9 includes at least one of an environmental noise detection unit, a distance / proximity detection unit, and an attitude detection unit. The pivot hinge 3 is a pivot structure with damping positioning, which allows the display screen 1 to stay and remain stable at multiple pitch angles. The main control logic board 7 and the expansion board 8 are electrically connected through a board-to-board connector, a ribbon cable connector, or a plug-in connector and form a layered stacked structure.
The expansion board 8 integrates at least one of an audio front-end circuit, a power amplifier circuit, a power management circuit, and a peripheral interface circuit. The main control logic board 7 is configured to execute speech evaluation logic: perform endpoint detection and feature extraction on the speech collected by the microphone head 4, and output at least a clarity score and error correction prompt information at the phoneme level or the syllable level.
[0038] Specifically, after the device is powered on, the battery in the battery and acoustic module area 11 supplies power through the power management circuit to the main control logic board 7 and the expansion board 8. The battery can also be recharged externally through the charging port 14 to ensure continuous and stable training and avoid interruptions caused by frequent power supply changes. At the start of training, the trainee triggers the training or assessment mode by pressing the large button 10 on the middle frame housing assembly 6. The separately designed large button 10 is easy for young children to understand and operate, reducing accidental touches and improving training compliance. Subsequently, the main control logic board 7 drives the display screen 1 to output the training guidance interface. The damped positioning hinge 3 holds the display screen 1 at a tilt angle that matches the trainee's line of sight, so that the display screen 1 can stably present lip-shape visual aids and task guidance under different sitting postures and heights, improving viewing comfort and reducing distraction caused by screen movement. During training, the trainee speaks into the microphone head 4 on the universal microphone arm 5. The universal microphone arm 5 stably positions the microphone head 4 within the near-field pickup area in front of the mouth, improving the signal-to-noise ratio of the collected speech and reducing background noise interference, thus improving the accuracy and consistency of subsequent evaluations at the source. The audio front-end circuitry integrated into the expansion board 8 amplifies, filters, and performs analog-to-digital conversion on the collected speech, and transmits the audio data to the main control logic board 7.
The main control logic board 7 performs endpoint detection on the speech to automatically extract valid pronunciation segments, avoiding misjudgment caused by mistaking breathing sounds, pauses, or ambient sounds for training speech. It also further performs special processing... The system extracts and generates phoneme-level or syllable-level clarity scores and error correction prompts, enabling it to output guidance information such as "where is unclear and how to correct it" rather than simply providing good / bad results, thereby improving the targeting and operability of training. When more intuitive articulation guidance is needed, the lip-shape acquisition camera module 2 on the side of the display screen 1 captures images of the trainee's mouth and links them with the speech evaluation results. Based on this, the main control logic board 7 synchronously presents lip shape illustrations, opening degree, or lip shape trends on the screen, allowing the trainee to combine auditory imitation with visual comparison, lowering the barrier to understanding abstract pronunciation rules and improving error correction efficiency. After the main control logic board 7 generates training guidance information, on the one hand it presents visual feedback on the display screen 1 using animations, progress bars, or comparison prompts; on the other hand, the power amplifier circuit of the expansion board 8 can drive the speaker 12 to play standard demonstration sounds, error correction prompts, or reward sound effects. The sound from the speaker 12 is transmitted through the speaker mesh 15 on the outside of the middle frame housing assembly 6, which facilitates clear sound output and also protects the speaker 12 from dust, thus achieving a more stable auditory prompt effect in home or institutional settings.
At the same time, the main control logic board 7 records the scoring results and error types of each pronunciation and uses them to dynamically adjust the prompt intensity and repetition frequency of subsequent tasks, transforming the training from a "fixed question bank" to a personalized process of "adjustment as you practice," achieving faster clarity improvement and higher training completion rates while ensuring that the training burden for children is controllable.
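The endpoint detection step described above can be illustrated with a minimal sketch. The description does not name an algorithm, so short-time energy thresholding is used here purely as an assumption; the function name `detect_endpoints`, the frame length, the energy threshold, and the minimum segment length are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def detect_endpoints(
    samples: List[float],
    frame_len: int = 160,            # hypothetical: 10 ms at 16 kHz
    energy_threshold: float = 0.01,  # hypothetical mean-square energy cutoff
    min_frames: int = 3,             # hypothetical: drop bursts shorter than this
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of segments judged to be speech.

    A frame counts as active when its mean-square energy reaches the
    threshold. Runs of active frames shorter than min_frames are discarded,
    so that brief clicks or breath noises are not treated as pronunciation.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i  # first active frame of a new run
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None
    # Close a run that is still open at the end of the recording.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

In this sketch, only the samples inside the returned segments would be passed on to feature extraction and scoring; a real device would likely need a noise-adaptive threshold, which could draw on the environmental noise detection unit of the side sensor group 9.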
[0039] Furthermore, refer to Figures 1 to 2. The lip-sync camera module 2 is located on the side frame or near the side frame of the display screen 1 and is equipped with a physical shield and / or an electronically controlled start / stop switch to achieve privacy control of lip-sync capture. The universal microphone arm 5 is a malleable gooseneck structure or a multi-segment universal joint structure, so that the microphone head 4 can be adjusted and kept in a preset pickup area in front of the trainee's mouth for near-field pickup.
[0040] The lip-reading camera module 2 is located on or near the side bezel of the display screen 1. During use, the trainee or guardian can choose to turn lip-reading capture on or off according to training needs. When lip-reading assisted training is required, the physical shield is removed, and the lip-reading camera module 2 is activated via the electronically controlled start / stop switch to capture mouth image information. This allows the system to provide more intuitive lip shape and mouth opening prompts, reducing the difficulty of articulation learning. During routine training or in scenarios with high privacy requirements, the physical shield is kept in place and / or the electronically controlled start / stop switch is turned off, so that the lip-reading camera module 2 stops capturing images. By blocking image acquisition at the source, clear privacy control is achieved, avoiding user concerns about cameras being constantly on and improving acceptability in home settings. Meanwhile, the universal microphone arm 5 adopts a malleable gooseneck structure or a multi-segment universal joint structure, allowing the microphone head 4 to be adjusted to a preset pickup area in front of the trainee's mouth before training and kept in a stable position. This makes the pickup distance and angle repeatable and maintainable, thereby forming near-field pickup to improve the speech signal-to-noise ratio and reduce environmental noise and echo interference. This not only improves the accuracy of speech evaluation but also reduces training interruptions caused by frequent equipment adjustments, thereby improving overall training efficiency and experience consistency.
[0041] Furthermore, refer to Figures 3 to 4. The main control logic board 7 is configured to execute a multimodal fusion strategy: when the lip-shape acquisition camera module 2 is turned on, the lip image features and speech features are jointly determined to generate training guidance information; when the lip-shape acquisition camera module 2 is turned off, the training guidance information is generated only based on the speech features, and the corresponding training guidance interface is output on the display screen 1.
[0042] When the main control logic board 7 executes the multimodal fusion strategy, the system first reads the working status of the lip-shape acquisition camera module 2 and selects different information processing paths accordingly. When the lip-shape acquisition camera module 2 is turned on, the mouth image and the voice acquired by the microphone head 4 are simultaneously input to the main control logic board 7. The main control logic board 7 extracts the mouth image features and voice features respectively and performs joint judgment to generate training guidance information. This allows the lip-shape information to improve the reliability of judgment and the specificity of error correction even when the voice is interfered with by noise, some phonemes are easily confused, or it is difficult to locate the pronunciation part by sound alone. This provides more accurate prompts to the trainee on "where to exert force and how to position themselves". When the lip-shape acquisition camera module 2 is turned off, the main control logic board 7 automatically switches to a judgment method based solely on voice features and outputs a training guidance interface matching this mode on the display screen 1. For example, it can provide enhanced demonstration sounds, rhythm prompts, or phoneme comparison prompts. This allows the clarity training to be completed without acquiring images, balancing privacy and computing power. Furthermore, the adaptive mode avoids the need for users to perform additional configuration operations due to turning the camera on and off, improving the applicability and continuity of the device in different scenarios.
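The mode selection described above can be sketched as follows. The description only states that lip image features and speech features are "jointly determined"; the weighted average used here is an assumption made for illustration, and the names `Guidance`, `fuse`, and `lip_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guidance:
    mode: str     # "lip_assisted" or "speech_only"
    score: float  # fused clarity score in [0, 1]


def fuse(
    speech_score: float,
    lip_score: Optional[float],
    camera_on: bool,
    lip_weight: float = 0.3,  # hypothetical weighting, not from the source
) -> Guidance:
    """Pick the processing path from the camera state, then score.

    With the camera on and a lip score available, both modalities
    contribute. Otherwise the result depends on speech alone, so no
    extra user configuration is needed when the camera is switched off.
    """
    if camera_on and lip_score is not None:
        score = (1.0 - lip_weight) * speech_score + lip_weight * lip_score
        return Guidance("lip_assisted", score)
    return Guidance("speech_only", speech_score)
```

The returned `mode` could then select which training guidance interface is shown on the display screen 1, for example lip shape illustrations in the lip-assisted mode and demonstration sounds or rhythm prompts in the speech-only mode.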
[0043] Furthermore, refer to Figure 3 and Figure 5. The main control logic board 7 is configured to execute a confidence control strategy: when the confidence of the current evaluation result is lower than the preset threshold, a "repeat" prompt is triggered and negative judgment output is suppressed; when the confidence is higher than the preset threshold, clear correct feedback or error correction feedback is output.
[0044] Specifically, when the main control logic board 7 executes the confidence control strategy, it simultaneously calculates the confidence of the evaluation result after completing feature extraction and scoring of a pronunciation and compares it with a preset threshold. When the confidence is lower than the preset threshold, the main control logic board 7 triggers a "re-speak" prompt and suppresses negative judgment output. This prevents the system from directly giving "error" feedback when the results are unstable due to sudden background noise, short pronunciation by the trainee, incomplete articulation, or deviation in the collection posture. This avoids the frustration caused by misjudgment and reduces ineffective error correction during training. When the confidence is higher than the preset threshold, the main control logic board 7 outputs clear correct feedback or error correction feedback, so that effective samples are strengthened and corrected in a timely manner. This improves the positive incentive effect and error correction efficiency of training while ensuring the credibility of feedback. At the same time, by converting uncertain samples into a "re-speak confirmation" process, it improves data quality and makes subsequent personalized statistics more accurate.
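The confidence control strategy above amounts to a gate placed in front of the feedback output, which can be sketched as follows. The function name `select_feedback` and the threshold value 0.6 are hypothetical; the description does not say how a confidence exactly equal to the threshold is handled, so treating it as sufficient is an assumption.

```python
def select_feedback(is_correct: bool, confidence: float, threshold: float = 0.6) -> str:
    """Return which feedback to output for one scored pronunciation.

    Below the threshold the judgment is withheld and the trainee is asked
    to repeat, so an uncertain sample never produces an "error" verdict.
    """
    if confidence < threshold:
        return "repeat"  # negative judgment suppressed
    # Assumption: confidence equal to the threshold counts as reliable.
    return "correct" if is_correct else "error_correction"
```

For example, `select_feedback(False, 0.4)` yields `"repeat"` instead of an error verdict, while `select_feedback(False, 0.9)` yields `"error_correction"`.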
[0045] Furthermore, refer to Figure 3 and Figure 6. The expansion board 8 or the main control logic board 7 is connected to a status feedback unit, which includes a light-emitting structure on the microphone head 4, a light-emitting structure on the device body, and a speaker 12. The main control logic board 7 is configured to output at least two different feedback states according to the training guidance information to indicate "achievement reward" and "error correction / repeating" respectively.
[0046] Specifically, the expansion board 8 or the main control logic board 7 is connected to the status feedback unit, which includes a light-emitting structure on the microphone head 4, a light-emitting structure on the device body, and a speaker 12. During training, after generating training guidance information, the main control logic board 7 selects to output different feedback states based on the training results. When the target is met, the microphone head 4 or the light-emitting structure on the device body is driven into a reward state, and a reward sound effect or affirmation prompt is played through the speaker 12, so that the trainee receives multi-channel reinforcement feedback when completing the correct pronunciation, thereby improving the child's participation and willingness to continue training. When it is determined that error correction or repetition is required, the main control logic board 7 drives the light-emitting structure into a prompt state and plays an error correction prompt sound or repetition instruction through the speaker 12. At the same time, the corresponding lip shape or rhythm guidance is presented on the display screen 1, so that the trainee can quickly understand the action to be taken and immediately repeat the practice. The immediacy of light and sound compensates for the attention distraction that may be caused by visual cues alone, thereby improving the error correction response speed and training continuity.
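The two feedback states described above can be sketched as a mapping from state to output channel actions. The enum `FeedbackState`, the function `feedback_actions`, and the action labels are hypothetical placeholders; how the light-emitting structures and the speaker 12 are actually driven is not specified in the description.

```python
from enum import Enum
from typing import Dict


class FeedbackState(Enum):
    REWARD = "achievement_reward"
    RETRY = "error_correction_or_repeat"


def feedback_actions(state: FeedbackState) -> Dict[str, str]:
    """Map a feedback state to per-channel actions (labels are placeholders)."""
    if state is FeedbackState.REWARD:
        # Target met: light enters the reward state, speaker plays a reward sound.
        return {"light": "reward_state", "speaker": "reward_sound"}
    # Error correction or repetition: light and speaker prompt the trainee,
    # and the display adds lip shape or rhythm guidance.
    return {
        "light": "prompt_state",
        "speaker": "retry_prompt",
        "display": "lip_or_rhythm_guidance",
    }
```

Keeping the mapping in one place means additional states could be added later without changing the evaluation logic.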
[0047] Furthermore, refer to Figure 3 and Figure 7. The main control logic board 7 is configured to execute personalized training strategies: in the initial stage, it outputs a baseline test task to determine the target error-prone sounds or target phoneme set; in the training stage, it dynamically adjusts the difficulty, repetitions and prompt intensity of the training content based on the trainee's historical scores, error types and completion rates, and generates training report data to display training trends on the display screen 1.
[0048] Specifically, when the main control logic board 7 executes personalized training strategies, the system first outputs a baseline test task in the initial stage and guides the trainee to complete the pronunciation of several representative phonemes, syllables, or words. The main control logic board 7 scores and statistically analyzes the baseline speech and identifies the target mispronounced sounds or target phoneme sets, so that subsequent training can start from the weakest link to reduce blind practice. During the training phase, the main control logic board 7 continuously records the trainee's historical scores, error types, and completion rates, and dynamically adjusts the difficulty, repetition frequency, and prompt intensity of the training content accordingly. For example, it increases the decomposition practice and demonstration prompts for target sounds with persistent errors, and reduces the repetition of target sounds that have been stably mastered and elevates them to the word and sentence level, thereby ensuring the training challenge while avoiding being too difficult to cause frustration or too easy to cause inefficiency. At the same time, the main control logic board 7 generates training report data and displays the training trend on the display screen 1, so that the caregiver or trainer can intuitively see the improvement in clarity, changes in mispronounced sounds, and training completion status, which facilitates timely adjustment of the training plan and the formation of a traceable rehabilitation loop, thereby improving the manageability and long-term effectiveness of home training.
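The baseline selection and dynamic adjustment described above can be sketched as follows. All names (`select_targets`, `PhonemePlan`, `adapt`), the three difficulty levels, the score cutoffs, and the step sizes are hypothetical choices made for illustration; the description specifies only that difficulty, repetition count, and prompt intensity are adjusted from historical scores, error types, and completion rate.

```python
from dataclasses import dataclass
from typing import Dict, List


def select_targets(baseline: Dict[str, float], cutoff: float = 0.7) -> List[str]:
    """Pick phonemes whose baseline clarity score falls below the cutoff."""
    return sorted(p for p, score in baseline.items() if score < cutoff)


@dataclass
class PhonemePlan:
    phoneme: str
    level: int = 0             # hypothetical: 0 = sound, 1 = word, 2 = sentence
    repetitions: int = 5
    prompt_intensity: int = 2  # hypothetical scale 0..3


def adapt(plan: PhonemePlan, recent_scores: List[float],
          mastered: float = 0.85, struggling: float = 0.5) -> PhonemePlan:
    """Return an adjusted plan based on the average of recent scores."""
    if not recent_scores:
        return plan
    avg = sum(recent_scores) / len(recent_scores)
    if avg >= mastered:
        # Stably mastered: fewer repetitions, lighter prompts, harder level.
        return PhonemePlan(plan.phoneme, min(plan.level + 1, 2),
                           max(plan.repetitions - 2, 1),
                           max(plan.prompt_intensity - 1, 0))
    if avg < struggling:
        # Persistent errors: more repetitions and stronger prompts, same level.
        return PhonemePlan(plan.phoneme, plan.level,
                           min(plan.repetitions + 2, 10),
                           min(plan.prompt_intensity + 1, 3))
    return plan
```

This sketch uses only the score history; error types and completion rate, which the description also lists as inputs, would need additional rules.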
[0049] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary. Under the framework of this invention, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
[0050] This invention is intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A speech intelligibility training device for children, characterized in that it comprises: a base (13), provided with a middle frame housing assembly (6) which is arranged with the base (13) to form a receiving cavity, a speaker mesh (15) being provided on the outside of the middle frame housing assembly (6); a display screen (1), which is hinged to the middle frame housing assembly (6) by a pivot hinge (3), a lip-shape capture camera module (2) being provided on one side of the display screen (1); an omnidirectional microphone arm (5) disposed on one side of the middle frame housing assembly (6), a microphone head (4) being disposed on the omnidirectional microphone arm (5); a main control logic board (7) and an expansion board (8) disposed within the receiving cavity; a large button (10) disposed on the middle frame housing assembly (6) and electrically connected to the main control logic board (7) or the expansion board (8); a battery and acoustic module area (11) and a speaker (12) disposed in the base (13) and electrically connected to the main control logic board (7); a charging port (14) provided on one side of the middle frame housing assembly (6) and electrically connected to the battery and acoustic module area (11); and a side sensing group (9) provided on the side of the middle frame housing assembly (6) and electrically connected to the main control logic board (7) and/or the expansion board (8), the side sensing group (9) including at least one of an environmental noise detection unit, a distance/proximity detection unit, and an attitude detection unit.
2. The speech intelligibility training device for children according to claim 1, characterized in that the pivot hinge (3) is a pivot structure with damping positioning, which allows the display screen (1) to stop and remain stable at multiple pitch angles.
3. The speech intelligibility training device for children according to claim 1, characterized in that the lip-shape capture camera module (2) is located on or adjacent to the side frame of the display screen (1), and is equipped with a physical shield and/or an electronic start/stop switch to achieve privacy control of lip-shape capture.
4. The speech intelligibility training device for children according to claim 1, characterized in that the omnidirectional microphone arm (5) is a malleable gooseneck structure or a multi-segment omnidirectional joint structure, so that the microphone head (4) can be adjusted to and held in a preset pickup area in front of the trainee's mouth for near-field pickup.
5. The speech intelligibility training device for children according to claim 1, characterized in that the main control logic board (7) and the expansion board (8) are electrically connected through a board-to-board connector, a ribbon cable connector, or a plug-in connector and form a layered stacked structure, wherein the expansion board (8) integrates at least one of the following: an audio front-end circuit, a power amplifier circuit, a power management circuit, and a peripheral interface circuit.
6. The speech intelligibility training device for children according to claim 1, characterized in that the main control logic board (7) is configured to execute speech evaluation logic: performing endpoint detection and feature extraction on the speech collected by the microphone head (4), and outputting at least a clarity score and error correction prompt information at the phoneme or syllable level.
7. The speech intelligibility training device for children according to claim 1, characterized in that the main control logic board (7) is configured to execute a multimodal fusion strategy: when the lip-shape capture camera module (2) is turned on, lip image features and speech features are jointly evaluated to generate training guidance information; when the lip-shape capture camera module (2) is turned off, training guidance information is generated based only on speech features; and the corresponding training guidance interface is output on the display screen (1).
8. The speech intelligibility training device for children according to claim 1, characterized in that the main control logic board (7) is configured to execute a confidence control strategy: when the confidence of the current evaluation result is lower than a preset threshold, a "repeat" prompt is triggered and negative judgment output is suppressed; when the confidence is higher than the preset threshold, explicit correct feedback or error correction feedback is output.
9. The speech intelligibility training device for children according to claim 1, characterized in that the expansion board (8) or the main control logic board (7) is connected to a status feedback unit, which includes the microphone head (4), a light-emitting structure on the device body, and the speaker (12), and the main control logic board (7) is configured to output at least two different feedback states according to the training guidance information, indicating "achievement reward" and "error correction/repeat" respectively.
10. The speech intelligibility training device for children according to claim 1, characterized in that the main control logic board (7) is configured to execute a personalized training strategy: outputting a baseline test task in the initial stage to determine the target mispronounced sounds or the target phoneme set; and, during the training stage, dynamically adjusting the difficulty, number of repetitions, and prompt intensity of the training content based on the trainee's historical scores, error types, and completion rates, and generating training report data to display training trends on the display screen (1).
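As a non-limiting illustration of the confidence control strategy recited in claim 8, the gating logic could be sketched as follows. This is a minimal sketch, not part of the claims: the function name `decide_feedback`, the `Feedback` states, the 0 to 1 scales, and both threshold values are hypothetical. The claim does not say how a confidence exactly equal to the threshold is handled; the sketch assumes it is treated as confident.

```python
from enum import Enum


class Feedback(Enum):
    REPEAT = "repeat"              # low confidence: ask again, no negative judgment
    REWARD = "achievement reward"  # confident and correct
    CORRECT = "error correction"   # confident and incorrect


def decide_feedback(score, confidence, conf_threshold=0.6, pass_score=0.8):
    """Map an evaluation result to a feedback state.

    score and confidence are assumed to lie in [0, 1]; both thresholds
    are hypothetical values chosen for illustration.
    """
    if confidence < conf_threshold:
        # Suppress any negative judgment when the evaluation is unreliable.
        return Feedback.REPEAT
    return Feedback.REWARD if score >= pass_score else Feedback.CORRECT
```

Under these assumptions, a low score with low confidence yields only a "repeat" prompt, so the trainee is never told an attempt was wrong on the basis of an unreliable evaluation.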