System and method for interactive audio and visual accompaniment

The system dynamically adapts audio and visual accompaniment based on text, speech, and user data to create a personalized and immersive experience, addressing the limitations of existing technologies by enhancing comprehension and engagement.

WO2026142475A1 · PCT designated stage · Publication Date: 2026-07-02 · KRIKUNOV ALEKSEY NIKOLAEVICH

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
KRIKUNOV ALEKSEY NIKOLAEVICH
Filing Date
2026-02-26
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing systems for audio and visual accompaniment lack dynamic adaptation to text content, user physiological and emotional states, and external factors, and do not integrate with external devices to create a personalized and immersive experience.

Method used

A system that dynamically adapts audio and visual accompaniment in real-time based on deep analysis of text and speech parameters, user biometric data, and environmental factors, integrating with external devices to control environmental conditions.

Benefits of technology

The system provides personalized and immersive content perception, reducing training time by 20%, increasing reading and listening speed by 15-20%, and improving comprehension and engagement by 20% through emotional enrichment.

✦ Generated by Eureka AI based on patent content.

Abstract

The invention relates to the field of information technology and multimedia systems, and more particularly to methods and systems for creating personalized audio and visual accompaniment to reading written content, listening, inter alia to speech, or other ways of consuming content. The technical result consists in significantly shortening learning time, increasing reading and listening speed, improving comprehension and recall of a read text, and increasing the user's concentration. A system for creating interactive audio and video accompaniment to reading, listening, narrating or other ways of consuming content comprises a data input module, a data preprocessing module, a data analysis and interpretation module, a synchronization module, a physiological data module, an environmental data module, a data combining module, an audio and video accompaniment generating module, an enhancement module, an external device integration module, a personalization and settings module, a feedback and self-learning module, a collaboration module, and an audio and visual accompaniment display module.

Description

[0001] System and method of interactive audio and visual accompaniment

[0002] Field of technology

[0003] The invention relates to information technology and multimedia systems, specifically to methods and systems for creating personalized audio and visual accompaniment while reading, listening to text content, listening to speech, or otherwise consuming content. The system provides interactive and immersive content perception by adapting to user characteristics, text context, external factors, and the speaker's speech parameters.

[0004] State of the art

[0005] U.S. Patent No. 8,135,591 (published March 13, 2012) discloses a method and system for training a text-to-speech system for use in speech synthesis. The method includes creating a speech database of audio files containing voices related to a specific subject area.

[0006] The disadvantage of the known solution is the low level of naturalness of the human voice in synthesized speech.

[0007] U.S. Patent No. 9,269,347 (published February 23, 2016) discloses a method for converting text to speech, capable of outputting speech with a selected speaker voice and a selected speaker attribute.

[0008] The disadvantage of the known solution is the lack of dynamic adaptation of the accompaniment to the content of the text, as well as the lack of consideration of the physiological state of the user and the parameters of his speech, and limited or absent personalization.

[0009] None of the described systems utilize deep semantic and emotional analysis of text to generate audio and visual narration. Known solutions do not collect or process user biometric data, such as heart rate, breathing rate, or facial expressions, nor do they analyze speech parameters to adapt narration. Existing solutions do not provide the ability to deeply personalize the reading experience, text content, or speech based on the user's individual preferences and states, nor do they integrate with external devices (lighting systems, haptic devices, or smart home systems) to control the user's environment.

[0010] Advantages of the proposed invention compared to the prior art:

• Dynamic adaptation of accompaniment in real time: deep analysis of text and speech parameters to generate personalized audio and visual accompaniment synchronized with the user's pace of reading, listening, or speaking.

[0011] • Taking into account physiological data and external factors: collecting and processing the user's biometric data and environmental data allows the system to adapt the accompaniment to the user's current physiological and emotional state.

[0012] • Deep personalization and customization: The user has the ability to customize the experience, including choosing thematic audio and visual styles, and the system learns from their preferences to further improve the experience.

[0013] • Integration with external devices to create a fully immersive experience: the system is capable of controlling external devices, providing dynamic changes in lighting, temperature and other environmental parameters.

[0014] • Use of advanced AI and machine learning technologies: The use of advanced natural language processing models, neural networks and machine learning algorithms ensures high accuracy and efficiency of the system.

[0015] Thus, the proposed invention solves the problems inherent in existing systems and provides unique functionality that is not available in the current state of the art.

[0016] Disclosure of the essence of the invention

[0017] A system and method for interactive audio and visual accompaniment for a user's reading, listening, or speaking is proposed. This accompaniment dynamically adapts to the text content, the user's physiological and emotional state, external factors, and the speaker's speech parameters. The system accepts text and / or voice data as input, analyzes it using specialized algorithms, and generates personalized accompaniment in real time, synchronizing with the user's reading, listening, or speaking tempo.

[0018] The invention is a system for interactive audio and visual support that achieves specific technical results through the deep integration of text analysis, speech analysis, physiological data collection, audio and visual support generation, user personalization, and integration with external devices. The key technical advantages of the system are:

[0019] 1. Personalized support with advanced customization options:

[0020] o Technical advantage: The system provides the user with the ability to customize the accompaniment parameters according to personal preferences, including the choice of thematic audio and visual styles, the intensity of effects, and even the stylistic features of the content.

[0021] o How it is achieved: The personalization and settings module interacts with the audio and visual accompaniment generation module, allowing the user to select or create accompaniment profiles. The system uses machine learning and natural language processing algorithms to adapt the generated content to the selected settings.

[0022] 2. Content generation based on films and other source materials:

[0023] o Technical advantage: The system can generate audio and visual accompaniment in the style of specific films, musical works, or other sources, providing a unique and varied experience of interacting with the content.

o How it is achieved: The audio and visual accompaniment generation module includes style databases of various sources. When analyzing text, the system can apply stylistic elements from the selected sources. For example, a user can select Quentin Tarantino's style to accompany the reading (listening, telling) of the fairy tale "Kolobok," and the system will generate corresponding audio and visual effects reflecting the characteristic features of this style.

[0024] The system and method ensure the achievement of the following specific technical results:

[0025] 1. A 20% reduction in training time is achieved through precise personalization and emotional enhancement of information perception, i.e. the creation of immersive support that is generated individually for the user;

[0026] 2. An increase in reading and listening speed by 15-20% is ensured by synchronizing the pace with the physiological and emotional state of the user;

[0027] 3. Improved quality of interaction through the integration of multimodal data and personalized support.

[0028] Emotionally rich text, offering a new emotional (immersive) experience, improves reading and listening comprehension. This is because the emotional enrichment of the text activates deeper cognitive processes, involving the reader (listener, narrator) in analyzing information on an emotional and kinesthetic level. The reader (listener) is less likely to reread (re-listen to) the text, as the semantic structure of the text becomes intuitive thanks to emotional associations and auditory and visual imagery. By avoiding re-reading, the user perceives information 15-20% faster, demonstrating a significant improvement in reading or listening speed. Furthermore, the use of thematic presets, such as the style of famous films, enhances user engagement. This creates a unique experience where the user draws parallels between the film's plot and the text, increasing interest. Increased user engagement also leads to significantly improved learning. Users are less distracted by external factors, as the emotional, auditory, and visual accompaniment keeps their attention focused on the content. This results in a 20% improvement in learning, confirming the effectiveness of an emotionally-focused approach to learning.

[0029] The claimed technical result is achieved by using an interactive audio and visual accompaniment system comprising a data input module, a data pre-processing module, a data analysis and interpretation module, a synchronization module, a physiological data module, an environmental data module, a data fusion module, an audio and visual accompaniment generation module, an expansion module, a module for integration with external devices, a personalization and settings module, a feedback and self-learning module, a collaboration module, and a module for demonstrating audio and visual accompaniment, wherein the data input module is connected to the pre-processing module, which is connected to the data analysis and interpretation module, which is connected to the synchronization module, which is connected to the data fusion module, which is connected to the audio and visual accompaniment generation module, which is connected to the module for demonstrating audio and visual accompaniment and to the module for integration with external devices, which is connected to the module for demonstrating audio and visual accompaniment, which is connected to the feedback and self-learning module, which is connected to the data fusion module; the collaboration module is connected to the personalization and settings module, which is connected to the data fusion module; the physiological data module and the environmental data module are connected to the synchronization module; and the expansion module is connected to the audio and visual accompaniment generation module.

The claimed technical result is also achieved by using a method for interactive audio and visual accompaniment, including the steps of receiving input data, cleaning and normalizing data, semantic and emotional analysis, collecting and analyzing data from the physiological data and environmental data modules, adapting to the user, merging data from different sources, creating audio and visual accompaniment, integrating, exchanging and adapting data for external devices, displaying content to the user, and collecting feedback.

Brief description of the drawings

[0030] Fig. 1 shows the structural diagram of the system.

[0031] Fig. 2 shows a block diagram of the algorithm that reveals the method implemented by the system.

[0032] The system (Fig. 1) contains the following blocks:

[0033] 101 - data entry module

[0034] 102 - data pre-processing module

[0035] 103 - Data Analysis and Interpretation Module

[0036] 104 - synchronization module

[0037] 105 - physiological data module

[0038] 106 - Environment Data Module

[0039] 107 - Data Merger Module

[0040] 108 - module for generating audio and visual accompaniment

[0041] 109 - expansion module

[0042] 110 - module for integration with external devices

[0043] 111 - personalization and settings module

[0044] 112 - Feedback and self-training module

[0045] 113 - Collaboration Module

[0046] 114 - module for demonstrating audio and visual accompaniment.
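As an illustration only, the blocks of Fig. 1 and the connections stated for them in the description can be written down as a small directed graph. The sketch below is not part of the filing: the names `CONNECTIONS`, `reachable`, and `has_feedback_loop` are hypothetical, and the edges are transcribed from the textual description of the connections, assuming each "is connected to" denotes a one-way data flow.

```python
from collections import deque

# Directed connections between the numbered modules of Fig. 1,
# transcribed from the description (assumed to be one-way data flows).
CONNECTIONS = {
    101: [102],       # data input -> pre-processing
    102: [103],       # pre-processing -> analysis and interpretation
    103: [104],       # analysis -> synchronization
    104: [107],       # synchronization -> data fusion
    105: [104],       # physiological data -> synchronization
    106: [104],       # environmental data -> synchronization
    107: [108],       # data fusion -> accompaniment generation
    108: [114, 110],  # generation -> demonstration, external device integration
    109: [108],       # expansion -> generation
    110: [114],       # external device integration -> demonstration
    111: [107],       # personalization and settings -> data fusion
    112: [107],       # feedback and self-learning -> data fusion
    113: [111],       # collaboration -> personalization and settings
    114: [112],       # demonstration -> feedback and self-learning
}


def reachable(start: int) -> set[int]:
    """Return every module that data leaving `start` can eventually reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in CONNECTIONS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_feedback_loop() -> bool:
    """True if output of the generation module (108) flows back into fusion (107)."""
    return 107 in reachable(108)
```

Under these assumptions, `reachable(101)` contains the demonstration module (114), and `has_feedback_loop()` returns `True` because the path 108, 114, 112, 107 closes the self-learning loop.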

[0047] The algorithm (Fig. 2) includes the following steps (the system modules that implement the algorithm blocks are indicated in brackets):

[0048] Block 200: Receiving Input Data (Data Input Module, 101)

[0049] Block 201: Data Cleaning and Normalization (Pre-Processing Module, 102)

[0050] Block 202: Semantic and emotional analysis (Data Analysis Module, 103)

Block 203: Collection and analysis of data from the physiological data module (105) and the environmental data module (106)

Block 204: User adaptation (Synchronization Module, 104)

Block 205: Combining data from different sources (Data Fusion Module, 107)

[0051] Block 205a: Personalization and User Collaboration (Modules 111 and 113)

[0052] Block 206: Creating audio and visual accompaniment (Content Generation Module, 108)

[0053] Block 206a: Integration, exchange and adaptation of data for external devices (Expansion Module, 109, and External Device Integration Module, 110)

[0054] Block 207: Outputting content to the user (Demo module, 114)

[0055] Block 208: Collecting Feedback (Feedback and Self-Learning Module, 112).

[0056] Implementation of the invention

[0057] The system is a modular architecture consisting of the following components:

[0058] Data Entry Module (101):

[0059] This module is responsible for receiving, processing and converting various types of input information coming into the system.

[0060] Main functions:

[0061] • Text input:

[0062] o Import electronic documents: Support for TXT, DOC, DOCX, PDF, EPUB, HTML and other formats to obtain content from e-books, articles, web pages and documents.

[0063] o Processing structured and unstructured data: Ability to work with formatted text, tables, images, videos, lists, and other elements.

[0064] o Optical Character Recognition (OCR): Convert printed or handwritten text from images into text.

[0065] • Auditory input:

[0066] o Import audio documents: Support for MP3, AAC, M4B, WAV, FLAC, WMA, Audible (AA, AAX), OGG Vorbis, AIFF, DSD and other formats to obtain content from audiobooks, articles, web pages and documents.

[0067] o Processing structured and unstructured data: Ability to work with formatted audio markup, tables, images, videos, lists, and other elements.

[0068] • Voice input:

[0069] o Real-time speech recognition and conversion of speech from one or more users into text by roles.

[0070] o Processing of pre-recorded audio in various formats (MP3, WAV, AAC, etc.), taking speaker roles into account.

o Multilingualism and multitimbrality: Speech recognition in different languages, with different accents and voice tones, and determination of gender and age.

[0071] Connections with other modules:

[0072] • Transfers received data (text, voice) to the pre-processing module (102) for further normalization and preparation for analysis.

[0073] Data preprocessing module (102):

[0074] The module normalizes, cleans, and prepares input data (text and voice) for further analysis by other system components.

Key functions:

[0075] • Processing text data:

[0076] o Language identification: Determination of the language of the text for subsequent adaptation of processing.

[0077] o Text normalization: Removing unnecessary characters, correcting errors, and standardizing the text.

o Tokenization and lemmatization: Dividing the text into words and sentences, returning words to their original form to simplify analysis.

[0078] • Voice data processing:

[0079] o Noise Filtering: Remove background noise to improve audio quality.

[0080] o Speech segmentation: Dividing the speech stream into phrases and sentences.

o Voice parameterization: Extracting voice characteristics (frequency, energy, spectral parameters) for further analysis.

[0081] Connections with other modules:

[0082] • Receives data from the data input module (101).

[0083] • Transfers cleaned and normalized data to the analysis and interpretation module (103) for semantic and emotional processing. Here, cleaned and normalized data means information from which unnecessary characters, typos, inconsistencies, and noise have been removed and which has been brought to a single standard (for example, standardized text, standardized numerical values of physiological indicators, a single language, and a single text encoding).
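The text branch of the pre-processing module (normalization, then division into sentences and words) might look roughly like the following sketch. It is an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, only the Python standard library is used, sentence splitting is a naive punctuation rule, and lemmatization is deliberately left out because it requires language-specific resources.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Bring text to a single standard: NFKC form, no control or
    zero-width characters, and single spaces between words."""
    text = unicodedata.normalize("NFKC", raw)
    # Drop control/format characters (Unicode categories C*), keeping
    # ordinary whitespace so that word boundaries survive.
    text = "".join(
        ch for ch in text
        if ch in "\n\t " or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> list[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"\w+", sentence.lower())
```

For example, `normalize_text("  Hello\u200b   world!  \n How are you? ")` yields `"Hello world! How are you?"`, which `split_sentences` divides into two sentences.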

[0084] Data Analysis and Interpretation Module (103):

[0085] The module is designed for in-depth analysis of input information, including text and voice data, in order to extract semantic, contextual, and emotional characteristics.

[0086] Main functions:

• Semantic analysis:

[0087] o Identifying the topic and context of data using natural language processing (NLP) algorithms.

[0088] o Recognizing entities (e.g., characters, places, events) and establishing relationships between them.

[0089] • Emotional analysis:

[0090] o Determining the emotional tone of a text or speech (positive, negative, neutral).

[0091] o Identifying complex emotional states such as joy, sadness or anger.

[0092] • Parametric speech analysis:

[0093] o Analysis of intonation, melody and rhythm of speech to identify key accents.

[0094] o Measuring speech rate and pauses to assess the speaker's condition and adapt the system.

[0095] Connections with other modules:

[0096] • Receives cleaned data from the pre-processing module (102).

[0097] • Transfers analyzed data (data processed to highlight key features such as context, emotional coloring, semantic accents, speech parameters, visual elements, etc.) to the synchronization module (104) for further use in generating personalized content.
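The parametric speech analysis described above (measuring speech rate and pauses) can be sketched as follows. This is a minimal illustration that assumes an upstream speech recognizer has already produced word-level timestamps; the `Word` type, the function names, and the 0.3 s pause threshold are hypothetical choices, not values from the description.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def speech_rate_wpm(words: list[Word]) -> float:
    """Average speech rate in words per minute over the whole utterance."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    return 60.0 * len(words) / duration if duration > 0 else 0.0


def pauses(words: list[Word], min_gap: float = 0.3) -> list[float]:
    """Lengths (in seconds) of silent gaps between consecutive words
    that are at least `min_gap` long."""
    gaps = (b.start - a.end for a, b in zip(words, words[1:]))
    return [g for g in gaps if g >= min_gap]
```

With four words spanning two seconds, `speech_rate_wpm` reports 120 words per minute; only gaps at or above the threshold are returned by `pauses`.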

[0098] Synchronization module (104):

[0099] The module is designed to ensure precise synchronization of audio and visual accompaniment with user actions, adapting the system's operating speed to individual needs.

[0100] Main functions:

[0101] • Tracking the pace of interaction:

[0102] o Analysis of the user's reading, listening, speech, or action rate.

o Prediction of next steps and statements using machine learning algorithms.

[0103] • Processing of environmental data and physiological indicators:

[0104] o Taking into account external factors (lighting, noise) and the user's state (stress, fatigue) to adapt content.

[0105] • Synchronization of user actions:

[0106] o Using data from text, speech, physiological parameters, and the external environment to fine-tune audio and visual accompaniment.

[0107] o Implementation of dynamic changes in the tempo and style of content in real time.

[0108] Relationships with other modules:

• Receives data from:

[0109] o Data Analysis and Interpretation Module (103): Text and speech analysis results, including semantics and emotional tone.

o Environmental Data Module (106): Environmental parameters, such as light or noise levels.

[0110] o Physiological Data Module (105): User biometrics (e.g., heart rate, temperature).

• Transmits data to:

[0111] o Data Merger Module (107): Transfer of synchronized parameters for integration with other data and formation of a single user profile.
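Tracking the pace of interaction can be illustrated with a very simple estimator. The description refers to machine learning algorithms for prediction; the sketch below substitutes a plain exponential moving average as a stand-in, so it should be read as an assumption-laden example rather than the claimed method. The class name `PaceTracker` and the smoothing factor `alpha` are hypothetical.

```python
class PaceTracker:
    """Smoothed estimate of the user's reading (or listening) rate.

    Uses an exponential moving average as a simple stand-in for the
    prediction algorithms mentioned in the description.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.rate: float | None = None  # words per second

    def update(self, words: int, seconds: float) -> float:
        """Feed one observation (words consumed in `seconds`) and return the new estimate."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        sample = words / seconds
        if self.rate is None:
            self.rate = sample
        else:
            self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate
        return self.rate

    def expected_duration(self, words: int) -> float:
        """Predicted time (seconds) the user needs for the next `words` words."""
        if not self.rate:
            raise ValueError("no observations yet")
        return words / self.rate
```

The predicted duration of the next passage is what an accompaniment generator would need in order to stretch or compress a musical phrase so that it ends with the passage.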

[0112] Environment Data Module (106) and Physiological Data Module (105)

These modules are designed to collect, analyze, and transmit information about the user's state and environment, allowing the system to adapt content and interaction based on the user's physiological characteristics and external conditions.

[0113] Main functions:

[0114] Physiological Data Module (105):

[0115] • Biometric data:

[0116] o Measuring heart rate, breathing rate and body temperature through wearable devices.

[0117] o Skin conductivity analysis to assess stress and emotional arousal levels.

[0118] • Facial expression and gesture recognition:

[0119] o Facial expression analysis to determine emotional state.

o Pose and movement detection to assess the user's overall response.

o Using eye tracking to monitor attention.

[0120] Environment Data Module (106):

[0121] • Environmental sensors:

[0122] o Measuring light and noise levels to adjust audio and visual effects.

[0123] o Taking into account location and time of day to adapt content.

[0124] • Integration with smart devices:

[0125] o Receiving data from smart home systems (e.g. temperature sensors, lighting).

[0126] o Using data from mobile and wearable devices to adapt the system.

[0127] Connections with other modules:

[0128] 1. Input data:

• Physiological data module (105):

[0129] o Receives biometric data from wearable devices (smart watches, fitness bracelets, medical sensors).

[0130] o Receives data on facial expressions, postures and movements through cameras and sensors (eye tracking, facial expression recognition).

[0131] • Environment Data Module (106):

[0132] o Receives data from environmental sensors (illumination, noise).

o Uses geolocation and time of day APIs to determine the context of system usage.

[0133] o Connects to smart home systems and external devices to receive parameters (temperature, lighting).

[0134] 2. Outgoing data:

[0135] • Physiological data module (105):

[0136] o Transmits biometric data (heart rate, stress level) to the Synchronization Module (104) to adjust the tempo and emotional tone of the accompaniment.

[0137] • Environment Data Module (106):

[0138] o Transmits environmental parameters (illumination, noise level, time of day) to the Synchronization Module (104) to adapt content to external conditions.

[0139] Data Fusion Module (107)

[0140] The data fusion module is designed to integrate and analyze data from various sources (physiological, environmental, text, audio, and others) to create a single user profile that is used by the system to personalize content and tailor interactions.

[0141] Main functions:

[0142] • Integration of data from different sources:

[0143] o Combining text, audio, visual, physiological and environmental data.

[0144] o Applying weighting factors to determine the importance of each type of data depending on the context.

[0145] • Formation of a user profile:

[0146] o Remembering user reactions and preferences.

[0147] o Applying personal settings

[0148] o Formation of a personalized profile to predict future needs and adapt the system.

[0149] • Taking into account collective preferences:

[0150] o Analyzes the settings and preferences of other users from the collaboration module (113) to adapt shared content.

[0151] • Profile Correction: Takes into account user feedback from the feedback and self-learning module (112) to understand what the user likes and dislikes and make adjustments.

[0152] Connections with other modules:

[0153] • Receives data from:

[0154] o Synchronization Module (104): Synchronized parameters of the user's interaction, that is, parameters adjusted according to the rhythm, tempo, or actions of the user, including voice, eye movements, text scrolling, and physiological indicators.

[0155] o Personalization and Settings Module (111): System settings.

o Feedback and Self-Learning Module (112): Takes into account user feedback.

[0156] • Transfers data to:

[0157] o Audio and visual accompaniment generation module (108):

[0158] Integrated data, that is, the results of combining data from various sources (text, audio, physiological sensors, environmental data) into a single model or profile for comprehensive analysis and personalization, used to create adaptive and personalized content.
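The idea of applying context-dependent weighting factors to each data source can be sketched as a weighted average. Everything specific here is an assumption made for illustration: the source names, the two example contexts, the numeric weights, and the premise that each source has already been reduced to a score between 0 and 1 (for instance, an emotional-intensity estimate).

```python
# Hypothetical weights per usage context; the description only states that
# the importance of each data type depends on the context.
CONTEXT_WEIGHTS: dict[str, dict[str, float]] = {
    "quiet_reading": {"text": 0.6, "physiology": 0.3, "environment": 0.1},
    "noisy_commute": {"text": 0.4, "physiology": 0.2, "environment": 0.4},
}


def fuse(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the available signals.

    Weights are renormalized over the sources actually present, so a
    missing sensor does not drag the fused value toward zero.
    """
    total = sum(weights.get(name, 0.0) for name in signals)
    if total <= 0:
        raise ValueError("no weighted signals available")
    weighted = sum(value * weights.get(name, 0.0) for name, value in signals.items())
    return weighted / total
```

Renormalizing over the sources that are present is one possible design choice; it keeps the fused value on the same 0 to 1 scale when, for example, no wearable device is connected.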

[0159] Audio and visual accompaniment generation module (108)

[0160] The audio and visual accompaniment generation module (108) is designed to create personalized content, including audio, visual effects, animations, and images. Using advanced artificial intelligence (AI) and neural network technologies, the module adapts the generated content to the user's emotional state, current context, and selected thematic styles. The generated content is synchronized with text, plot, or other input data and can be transmitted and played on various devices (e.g., VR / AR headsets, screens, audio systems, and mobile devices).

Key features:

[0161] • Creation of personalized content:

[0162] o Creating musical accompaniment: The module creates background music that matches the plot, tempo of the text, emotional coloring, and genre of the content.

[0163] o Sound effects generation: Includes natural sounds, noises, voice effects, footstep sounds, collisions, as well as specific effects for various plot events (e.g. sounds of battle, machines, magic).

[0164] o Text-to-speech: The module converts text data into synthesized speech, adapted in timbre and style to the user's preferences and the nature of the content.

o Spatial audio: Creates 3D sounds and spatial audio accompaniment, enhancing the effect of presence in the content.

[0165] o Audio and visual accompaniment generation: Uses neural network algorithms such as generative adversarial networks (GANs) and recurrent neural networks (RNNs), among others, to create personalized music compositions, visual and audio effects, and animations.

[0166] o Contextual Adaptation: Creates content that takes into account the emotional background, theme, genre, plot, meaning of the text (and other data) or speech, as well as the user's current physiological data and environmental data.

[0167] o Animation and graphics creation: Generates animated scenes, illustrations, and dynamic effects that match the story and style.

[0168] o Visualization of Story Events: Creation of graphic elements that reflect key moments in text or audio (e.g., visualization of forests, urban scenes, magical effects).

o Device Adaptation: Visual effects and animations are optimized for various devices, including VR / AR headsets, smartphone screens, projectors, and smart home systems.

[0169] o Image Creation: Generate static illustrations or slides that match the storyline and user preferences.

[0170] • Using theme presets:

[0171] o Library of Styles and Genres: Contains ready-made presets based on popular films, musical genres, and artistic styles.

[0172] o Select or Load Presets: Allows the user to select a preset from the library or load their own to create a unique accompaniment.

[0173] • Style transfer:

[0174] o Neural Style Transfer: Uses trained models to transfer characteristics of known content (such as a movie or music) to the material being created.

[0175] o Adaptive Style Overlay Adjustment: Adjusts the degree to which the style is overlaid on the content to suit user preferences.

[0176] • Personalization and training:

[0177] o Adjust Effect Intensity: Allows the user to adjust how much the selected style affects the generated content.

[0178] o Combine Styles: Allows you to mix multiple presets or styles to create unique content.

[0179] o Self-learning: Analyzes user reactions to created content and uses this information to improve future generation.

[0180] o Adaptive generation: Content automatically adapts to the user's physiological data (e.g., heart rate, breathing), as well as environmental parameters (e.g., lighting level, noise).

[0181] • Data Integration:

[0182] o Processing data from the system: Takes into account user settings, physiological data, environmental data, and feedback to accurately generate content.

[0183] o API integration: Passes parameters and content to external applications and devices through standardized interfaces.

o Extensibility: Updates the preset library and learns from new data provided by users.

[0184] o Matching user preferences: Takes into account selected parameters such as style, genre, plot and intensity of accompaniment.

[0185] o Processing data from external devices: The module uses data received from smart home devices, wearables, VR / AR sensors, and other connected devices.

o API for external applications: Generated content and parameters can be transferred to external systems for display or playback.

[0186] o Expandable library: The module allows you to add new styles, genres and thematic presets that can be uploaded by the user or developed by the system.

[0187] • Real-time content generation:

[0188] o Implementation of story adaptations: Content is generated in real time, synchronizing with text, voice, or user actions.

[0189] o Multimodal Integration: Simultaneous creation of audio and visual content for a fully immersive user experience.

[0190] Feedback: The module analyzes user preferences and responses to content to adapt subsequent generations.

Links with other modules:

[0191] • Receives data from:

[0192] o Data Fusion Module (107): An aggregated user profile that includes physiological parameters, environmental data and preferences, settings, styles, including for collaboration, input data such as text (taking into account style, genre, semantic blocks, emotional tone, text composition, etc.), the user's rate of content consumption, and other data.

[0193] o Expansion Module (109)

[0194] • Passes data to:

o Audio and Visual Accompaniment Demonstration Module (114): Generated audio and visual effects for playback to the user.

[0195] o External Device Integration Module (110): API for external systems: Integration with devices and applications for external demonstration.
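The "Adjust Effect Intensity" and "Combine Styles" functions of the generation module can be pictured at the level of numeric accompaniment parameters. The sketch below is a deliberately simplified stand-in: it interpolates parameter values linearly and does not perform the neural style transfer mentioned in the description. The parameter names (`tempo_bpm`, `brightness`) and both function names are hypothetical.

```python
Style = dict[str, float]


def mix_presets(presets: list[tuple[Style, float]]) -> Style:
    """Combine several presets into one, weighting each by its share."""
    total = sum(weight for _, weight in presets)
    if total <= 0:
        raise ValueError("preset weights must sum to a positive value")
    keys = set().union(*(style.keys() for style, _ in presets))
    return {
        key: sum(style.get(key, 0.0) * weight for style, weight in presets) / total
        for key in keys
    }


def apply_style(neutral: Style, style: Style, intensity: float) -> Style:
    """Move the neutral parameters toward `style`.

    intensity = 0 leaves the neutral accompaniment unchanged;
    intensity = 1 applies the style fully. Values outside [0, 1] are clamped.
    """
    t = min(1.0, max(0.0, intensity))
    keys = neutral.keys() | style.keys()
    return {
        key: (1.0 - t) * neutral.get(key, 0.0)
        + t * style.get(key, neutral.get(key, 0.0))
        for key in keys
    }
```

Mixing two presets with equal weights and applying the result at half intensity moves each parameter halfway from its neutral value toward the average of the two presets.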

[0196] Expansion module (109)

[0197] Purpose:

[0198] The extension module is designed to provide system flexibility by adding new features, integrating with external technologies, and adapting to changing requirements or user preferences. It also supports the connection of augmented and virtual reality devices to expand interaction capabilities.

[0199] Main functions:

[0200] • Expanded functionality:

[0201] o Modular architecture: allows the addition of new modules or functions without the need to change the basic structure of the system.

[0202] o Plugin and add-on support: Easy integration of additional plug-ins to expand the system's capabilities.

• Integration with augmented and virtual reality devices:

[0203] o Connecting VR and AR devices: Ability to work with virtual reality (VR) headsets and augmented reality (AR) devices.

[0204] o Tracking and Interaction Support: Integrate motion, position, and object interaction data to create immersive experiences.

[0205] o Transmission of visual and audio content: Ensuring synchronization of generated content with VR / AR devices.

[0206] • Integration with external systems:

[0207] o Provides standardized interfaces (APIs) for connecting third-party devices, applications, or services.

[0208] o Support for integration with new standards and technology platforms.

[0209] o Ensuring compatibility with smart devices, IoT and other systems.

[0210] Connections with other modules:

[0211] • Receives data from and transmits data to external systems.

[0212] • Receives data from and transmits data to the audio and visual accompaniment generation module (108).

[0213] Module for integration with external devices (110)

[0214] Purpose:

[0215] The module is designed to facilitate system interaction with external devices, applications, and services. It provides standardized APIs that enable system integration with various platforms and expansion of its functionality.

[0216] Main functions:

[0217] • Integration with external applications and devices:

[0218] o Standardized APIs: Provide open interfaces for connecting external systems such as mobile applications, cloud services, smart home systems, and IoT devices.

[0219] o Support for multi-format interaction: Providing integration with various data transfer protocols (REST, WebSocket, gRPC, etc.).

[0220] • Data synchronization:

[0221] o Transfer of data between the system and external devices in real time.

[0222] o Ensuring correct processing of requests and responses to synchronize functionality.

[0223] Connections with other modules:

[0224] • Receives data from:

[0225] o Audio and visual support generation module (108): Mathematically described generated content.

[0226] • Transfers data to:

[0227] o Display Module (114): Data for playback on external devices (e.g. color tone of external lighting of a room).

[0228] o To external devices
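
As an illustration of how generated content parameters might be passed to an external device through a standardized interface, the sketch below serializes a lighting command as JSON. The message layout, the `lighting.set` type string, and the field names are assumptions made for this example; the description does not define a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LightingCommand:
    """Hypothetical command for a smart lamp; field names are illustrative."""
    device_id: str
    hue_degrees: float   # 0-360
    brightness: float    # 0.0-1.0


def to_message(cmd: LightingCommand) -> str:
    # Wrap the hue and clamp the brightness so that an out-of-range
    # upstream value cannot produce an invalid command.
    payload = asdict(cmd)
    payload["hue_degrees"] = cmd.hue_degrees % 360.0
    payload["brightness"] = min(1.0, max(0.0, cmd.brightness))
    return json.dumps({"type": "lighting.set", "payload": payload}, sort_keys=True)
```

The resulting string could be sent over any of the transports named above (REST, WebSocket, gRPC); the transport itself is outside this sketch.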

[0229] Personalization and Settings Module (111)

[0230] The personalization and settings module provides convenient user interaction with the system through a customized interface. It allows you to configure system parameters and enter user data to receive visual or auditory feedback.

Key features:

[0231] • Intuitive user interface:

[0232] o Responsive design: The interface is automatically optimized for various devices, including smartphones, tablets, computers and specialized devices.

[0233] o Multilingual: Support for multiple languages, allowing users to choose the most convenient interface language.

[0234] • Accessibility:

[0235] o Support for users with special needs (e.g. voice control, large font, high-contrast themes).

o Compatibility with assistive technologies for people with visual, hearing, or motor disabilities.

[0236] • Setting up system parameters:

[0237] o Allows users to set preferences for interactions, such as choosing the theme of visual accompaniment or audio effects.

[0238] o Setting the intensity and nature of personalization depending on preferences, etc.

[0239] • Entering user data:

[0240] o Ability to add user preferences, load presets and configure system operating parameters.

[0241] Connections with other modules:

[0242] • Receives data from:

[0243] o Collaboration Module (113): Information about shared styles and presets, artwork, themes, etc.

[0244] o From the user

[0245] • Transfers data to:

[0246] o Data Merge Module (107): User personal settings and preferences, as well as collaboration data.

[0247] Feedback and Self-Learning Module (112)

[0248] The module is designed to collect user feedback, analyze interactions, and automatically train the system based on the collected data. Its primary purpose is to improve the system's functionality, increase its accuracy, and adapt it to changing user needs.

[0249] Main functions:

[0250] • Collecting feedback:

[0251] o User interaction evaluation: Users can provide feedback on the system's performance (e.g., through ratings or comments), which helps identify gaps and areas for improvement.

o User behavior monitoring: Analyzes user actions (e.g., changing settings or choosing styles) to understand user preferences.

[0252] • Self-learning:

[0253] o Adaptive Algorithms: Using machine learning to improve the accuracy and efficiency of a system based on user interaction.

[0254] o Model updating: Automatic adjustment of algorithms and system operating parameters depending on new information.

• Trend analysis:

[0255] o Big Data Processing: Analyzes accumulated usage statistics to identify common preferences and trends among users.

[0256] o Prediction: Uses acquired data to predict user preferences and improve interactions.

[0257] Connections with other modules:

[0258] • Receives data from:

[0259] o Demonstration Module (114): User reactions to the played content, such as ratings, comments, or actions.

[0260] • Transfers data to:

[0261] o Data Merge Module (107): Adjusted preferences and feedback analysis results to update the user profile.
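
The description does not specify a particular learning algorithm for this module. As one simple possibility, the sketch below adjusts stored preference scores toward new ratings with an exponential moving average. The function names, the 0 to 1 score range, the neutral starting value of 0.5, and the default learning rate are all assumptions made for illustration.

```python
def update_preference(current: float, rating: float, learning_rate: float = 0.2) -> float:
    """Move a stored preference score toward the latest rating (both in 0..1)."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must be in [0, 1]")
    return (1.0 - learning_rate) * current + learning_rate * rating


def apply_feedback(profile: dict, feedback: dict, learning_rate: float = 0.2) -> dict:
    # Styles not seen before start from a neutral 0.5 score (an assumption of this sketch).
    updated = dict(profile)
    for style, rating in feedback.items():
        updated[style] = update_preference(updated.get(style, 0.5), rating, learning_rate)
    return updated
```

A small learning rate makes the profile change slowly and resist a single outlier rating; a larger one makes it track recent feedback more closely.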

[0262] Collaboration Module (113)

[0263] The collaboration module is designed to facilitate interaction between system users. It allows for sharing content, settings, and preferences, as well as synchronizing actions across multiple devices for shared reading or listening.

[0264] Main functions:

[0265] • Exchange settings and content:

[0266] o Social platform: Users have the ability to share their settings, created playlists, selected styles, and preferences with other system participants.

[0267] o Group Access: Provides the ability to share content within groups or specific users.

[0268] • Shared Reading and Listening:

o Device Sync: Enables simultaneous content playback on multiple devices, allowing users to interact with the same content in real time.

[0269] o Interaction Management: One user acts as a leader, setting the pace of reading or listening, while the others synchronize with the leader.

[0270] • Support for collective preferences:

[0271] o Aggregates the preferences of a user group, analyzes them, and uses the data to adapt content to general requirements.

Connections with other modules:

[0272] • Exchanges data with similar systems

[0273] • Transfers data to:

[0274] o Personalization and Settings Module (111): Information about collective settings and preferences to create a common interaction profile.

[0275] • Ensuring compatibility:

[0276] o Support for popular operating systems (Windows, macOS, Linux, iOS, Android).
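
The aggregation of collective preferences could, for example, take the median of numeric settings and the most common value of categorical ones. This is a sketch of one possible rule, not the rule used by the described system; `aggregate_group` and the sample keys are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Any, Dict, List


def aggregate_group(preferences: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine per-user settings: median for numbers, most common value otherwise."""
    combined: Dict[str, Any] = {}
    keys = {key for prefs in preferences for key in prefs}
    for key in sorted(keys):
        values = [prefs[key] for prefs in preferences if key in prefs]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            combined[key] = median(values)
        else:
            # Categorical settings (assumed hashable, e.g. strings): majority vote.
            combined[key] = Counter(values).most_common(1)[0][0]
    return combined
```

The median is used rather than the mean so that one user with an extreme setting does not shift the shared profile for everyone.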

[0277] Audio and visual demonstration module (114)

[0278] The module is responsible for displaying generated audio and visual accompaniment to the user. Content presentation is synchronized with the user's actions (reading, listening, speaking, pauses) and adapts to the current device settings and preferences.

[0279] Main functions:

[0280] • Audio accompaniment:

[0281] o Sound effects and music playback: Generate sounds that match the context of the text (e.g. background music, nature sounds, character voices, terrain noise).

[0282] o Volume and tone adaptation: The system changes the intensity of sounds based on the user's state (for example, reducing the volume when falling asleep or during pauses).

[0283] • Visual support:

[0284] o Graphic element output: Projections of images, animations, or videos that reflect the plot of the text (e.g. landscapes, action scenes, fantasy elements).

[0285] • Integration with VR / AR:

[0286] o 3D visualization: Virtual realization of text in three-dimensional space (for example, the user sees scenes from a book in virtual reality).

o Interactive elements: The user interacts with objects supplemented with sounds and effects (for example, virtual doors that can be opened).

[0287] • Synchronization with the user's tempo:

[0288] o Response to pauses and the pace of reading, listening, and speech: The system monitors the speed of reading, listening, and speech, making smooth transitions between sounds and animations, and pausing effects during pauses.

[0289] o Voice response: The accompaniment automatically adapts to the user's speech (for example, it emphasizes important points in the text when reading or listening).

[0290] • Adaptation to device:

[0291] o Content optimization for various platforms: Adaptation of sound and image quality for devices (tablet, smartphone, computer, VR / AR headsets, projectors).

[0292] o Playback on connected devices: Works with speakers, smart lamps, monitors, and virtual reality headsets.
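
Synchronization with the user's tempo can be sketched as two small calculations: an average pace from progress events and a pause check. The event format (timestamp and word index), the function names, and the three-second pause threshold are assumptions introduced for this example.

```python
from typing import List, Tuple

# Each event is (timestamp_seconds, word_index_reached), e.g. from scrolling or gaze tracking.
Event = Tuple[float, int]


def words_per_minute(events: List[Event]) -> float:
    """Average pace between the first and the last event."""
    if len(events) < 2:
        return 0.0
    (t0, w0), (t1, w1) = events[0], events[-1]
    elapsed = t1 - t0
    return 60.0 * (w1 - w0) / elapsed if elapsed > 0 else 0.0


def is_paused(events: List[Event], now: float, pause_after_s: float = 3.0) -> bool:
    # Treat a long gap since the last progress event as a pause; the threshold is an assumption.
    return not events or (now - events[-1][0]) >= pause_after_s
```

A playback loop could poll `is_paused` to suspend effects and use `words_per_minute` to schedule when the next sound or animation cue should start.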

[0293] Below we consider the algorithm (see Fig. 2) for the operation of the interactive (immersive) audio and visual support system, based on the functional blocks of the system.

[0294] Step 1: Entering Data

[0295] Block 200: Receiving Input Data (Data Input Module, 101)

[0296] 1. The system accepts input data: text, audio or multimodal (e.g. text + voice).

[0297] • Data type: Raw text data, voice data, multimodal signals.

[0298] • Result: Data is passed to the pre-processing module.

Step 2. Pre-processing

[0299] Block 201: Data Cleaning and Normalization (Pre-Processing Module, 102)

[0300] 1. Text data is recognized, cleaned of unnecessary characters, corrected for errors, and normalized for analysis.

[0301] 2. Audio data is denoised and segmented, and key speech parameters (tone, frequency, etc.) are identified.

[0302] • Data type: Cleaned and normalized text and voice data.

[0303] • Result: Data is ready for further analysis.
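
The text-cleaning part of this step can be sketched with the Python standard library. The functions `normalize_text` and `tokenize` are hypothetical names; error correction, lemmatization, and audio processing are not covered here.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Normalize Unicode, drop control and format characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Keep whitespace (collapsed below); drop other characters in Unicode category "C*".
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    # Lowercased word tokens; a real system would use a language-aware tokenizer.
    return re.findall(r"\w+", text.lower())
```

For example, `normalize_text` removes zero-width spaces and null bytes that often survive text extraction, so later analysis sees only visible characters.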

[0304] Step 3. Data analysis and interpretation

[0305] Block 202: Semantic and emotional analysis (Data Analysis Module, 103)

1. Text data is analyzed to identify themes and key entities (characters, objects, places).

[0306] 2. Voice data is analyzed to determine emotional coloring (joy, anger) and intonation.

[0307] • Data type: Analyzed data including semantic, emotional and parametric characteristics.

[0308] • Result: Data is transferred to the synchronization module.
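
To make the idea of emotional analysis of text concrete, the toy example below counts words from a small hand-written lexicon. It is a deliberately simplified stand-in: the lexicon is invented for this illustration, and the described system would presumably rely on trained models rather than word lists.

```python
import re
from collections import Counter

# Toy lexicon invented for this illustration; not part of the described system.
EMOTION_LEXICON = {
    "joy": {"laughed", "smiled", "bright", "delight"},
    "anger": {"shouted", "furious", "slammed", "rage"},
    "fear": {"dark", "trembled", "scream", "shadow"},
}


def score_emotions(text: str) -> dict:
    """Return the share of lexicon hits per emotion (empty dict if there are no hits)."""
    words = re.findall(r"\w+", text.lower())
    hits = Counter()
    for emotion, vocabulary in EMOTION_LEXICON.items():
        hits[emotion] = sum(1 for word in words if word in vocabulary)
    total = sum(hits.values())
    if total == 0:
        return {}
    return {emotion: count / total for emotion, count in hits.items() if count}
```

The output is a distribution over emotions for a passage, which is the kind of emotional characteristic that later steps can map to audio and visual parameters.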

[0309] Step 4. Data Collection and Synchronization

[0310] Block 203: Collection and analysis of data from physiological data modules (105) and environmental data (106)

[0311] 1. Physiological data collection (Physiological data module, 105):

[0312] Measuring biometric indicators:

[0313] • Wearable devices (such as smartwatches or fitness trackers) transmit data on heart rate, breathing rate, body temperature and skin conductivity.

[0314] • This data is used to determine the user's stress or emotional arousal level.

[0315] Analysis of facial expressions and movements:

[0316] • Cameras and sensors process facial expressions, postures and gestures to determine the user's emotional state and response.

[0317] Attention tracking:

[0318] • Eye tracking records the concentration and focal points of the gaze, which helps assess the user's level of attention.

[0319] • Data on the movement of the user and/or the user's device record the state of the user.

[0320] 2. Environmental Data Collection (Environmental Data Module, 106):

[0321] Analysis of external factors:

[0322] • Light sensors detect light levels to adapt the visual effects of content.

[0323] • Noise sensors record the acoustic environment to adjust the volume and balance of audio.

[0324] Context of use:

[0325] • Geolocation and time of day are obtained via API to adapt content to the user's current conditions (e.g. morning or night mode).

[0326] Integration with smart devices:

[0327] • Information from smart home systems (such as temperature or lighting intensity) helps further customize content.

[0328] 3. Data processing and transmission:

• Data from modules 105 and 106 are analyzed and structured for further content adaptation:

[0329] • Biometric parameters are used to determine emotional and physical state.

[0330] • Environmental parameters are used to adjust the audio and visual accompaniment.

[0331] • After processing, the data is transferred to the Synchronization Module (104) for integration with user interaction.

[0332] Data type: Processed physiological data (heart rate, emotional state) and environmental data (lighting, noise, time of day).

[0333] Result: Data is sent to Block 204 for further synchronization with user interaction.
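
One way to turn the collected biometric indicators into a single arousal estimate is to normalize each signal to the range 0 to 1 and average them. The sketch below does this; the `Biometrics` fields, the function names, and especially the numeric ranges are illustrative assumptions and are not clinical values or values from the description.

```python
from dataclasses import dataclass


@dataclass
class Biometrics:
    heart_rate_bpm: float
    breathing_rate_bpm: float
    skin_conductance_us: float  # microsiemens


def _scale(value: float, low: float, high: float) -> float:
    # Map value linearly onto 0..1 and clamp anything outside the range.
    return min(1.0, max(0.0, (value - low) / (high - low)))


def arousal_score(sample: Biometrics, resting_hr: float = 60.0) -> float:
    """Average of three normalized signals; all ranges are assumptions of this sketch."""
    hr = _scale(sample.heart_rate_bpm, resting_hr, resting_hr + 60.0)
    br = _scale(sample.breathing_rate_bpm, 12.0, 24.0)
    sc = _scale(sample.skin_conductance_us, 2.0, 12.0)
    return (hr + br + sc) / 3.0
```

In practice the ranges would need per-user calibration, for example from a resting baseline recorded by the wearable device.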

[0334] Block 204: Customization (Synchronization Module, 104)

[0335] 1. Data on the pace of reading, listening (eye tracking, text scrolling, user or user device movement in space) or speech is analyzed for synchronization.

[0336] 2. The emotional parameters of text and speech are compared with the pace of user interaction.

[0337] • Data type: Synchronized interaction parameters.

[0338] • Result: Parameters are passed to Block 205 for data integration.

Step 5. Data Integration

[0339] Block 205: Combining data from different sources (Data Combination Module, 107)

[0340] 1. Synchronized parameters are combined with:

[0341] • Physiological data (heart rate, respiration, skin conductivity).

[0342] • Environmental sensors (lighting, noise).

[0343] • User settings and preferences (111).

[0344] • Information from other users via the collaboration module (113).

[0345] • Feedback from the user (112).

[0346] Data type: Integrated data (user profile taking into account all parameters).

[0347] Result: Data is passed to the content generation module.
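
The combination of sources in Block 205 can be pictured as building one structured profile. In the sketch below, the function name, the section names, and the precedence order for conflicting preferences (group data first, then feedback-derived values, then explicit user settings) are assumptions; the description does not state how conflicts are resolved.

```python
from typing import Any, Dict, Mapping


def combine_sources(
    synchronized: Mapping[str, Any],
    physiological: Mapping[str, Any],
    environment: Mapping[str, Any],
    group: Mapping[str, Any],
    settings: Mapping[str, Any],
    feedback: Mapping[str, Any],
) -> Dict[str, Any]:
    """Build one profile; measured data stays namespaced, preferences are merged."""
    preferences: Dict[str, Any] = {}
    # Later sources override earlier ones: group < feedback-derived < explicit settings.
    for source in (group, feedback, settings):
        preferences.update(source)
    return {
        "interaction": dict(synchronized),
        "physiological": dict(physiological),
        "environment": dict(environment),
        "preferences": preferences,
    }
```

Keeping measured data in separate sections avoids key collisions between sources, while the single `preferences` section gives the generation step one place to read user choices from.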

[0348] Block 205a: Personalization and User Collaboration (Modules 111 and 113)

[0349] 1. Processing of individual user data:

[0350] Input data:

[0351] • User settings, including selected styles, topic preferences (genre, visual effects, audio), intensity of personalization, level of interaction with the system.

• Data on language preferences, support for multilingualism and settings for special needs (contrast, large font, voice control).

[0352] Processing:

[0353] • Formation of a detailed user profile taking into account the user's preferences, loaded presets and dynamic changes.

[0354] • Support for adapting settings to different devices (smartphones, tablets, VR / AR headsets, desktop systems) to optimize user experience.

[0355] 2. Processing collective preferences:

[0356] Input data:

[0357] • Collective data from a group of users, including shared preferences, interaction tempo, device synchronization.

[0358] • Information about content uploaded and shared through the system's social platform.

[0359] Processing:

[0360] • Analysis of individual and group settings to create a single interaction profile.

[0361] • Aggregate data for sharing, including styles, sound effects, and timing options.

[0362] 3. Sync devices and share content:

[0363] Input data:

[0364] • Metadata about the synchronization status of devices, including mobile devices, desktop systems, projectors, VR / AR headsets.

[0365] • Settings for shared access to content, including management of the leading user.

[0366] Processing:

[0367] • Implementing shared reading, storytelling, or listening, including synchronizing pacing, story elements, and visual accompaniment.

[0368] • Support for simultaneous operation on multiple devices with adaptation for various platforms.

[0369] 4. Integration with external platforms and devices:

[0370] Input data:

[0371] • Compatibility options with popular platforms (Windows, macOS, Linux, iOS, Android), as well as external system APIs.

[0372] • Information from external devices such as smart home sensors, wearable gadgets, climate and light sensors.

[0373] Processing:

[0374] • Adaptation of data for cross-platform compatibility, including conversion of content parameters for synchronization with external systems.

[0375] • Creation of interfaces for managing the settings of external devices and receiving feedback from them.

5. Joint configuration capabilities:

[0376] Input data:

[0377] • User presets uploaded directly or obtained through the collaboration platform.

[0378] • Dynamic content parameters, including real-time changes to styles and effects.

[0379] Processing:

[0380] • Combine multiple presets to create a unique style.

[0381] • Dynamic adjustment of parameters such as intensity, genre, plot, lighting effects, and integration of changes into the ongoing interaction process.

[0382] Data type:

[0383] • Personal and collective settings (theme, styles, language, etc.).

[0384] • Interaction metadata (synchronization state, device parameters).

[0385] • API data for integration with external devices and systems.

Result:

[0386] • Formation of an adapted personal and collective interaction profile.

[0387] • Synchronized multimodal content for individual or shared use.

[0388] • Transfer of data to the merging module (107) and the integration module with external devices (110) for further processing and use.
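
Combining multiple presets into a unique style could be done, for instance, with a weighted average of their numeric parameters. This is a sketch of that one approach; `blend_presets`, the parameter names, and the weighting scheme are hypothetical.

```python
from typing import Dict, List, Tuple

Preset = Dict[str, float]


def blend_presets(weighted: List[Tuple[Preset, float]]) -> Preset:
    """Weighted average per parameter; a preset only contributes to parameters it defines."""
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for preset, weight in weighted:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        for name, value in preset.items():
            totals[name] = totals.get(name, 0.0) + weight * value
            weights[name] = weights.get(name, 0.0) + weight
    return {name: totals[name] / weights[name] for name in totals if weights[name] > 0}
```

Changing the weights over time gives a gradual transition from one style to another, which fits the dynamic adjustment of parameters during an ongoing interaction.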

[0389] Step 6. Content generation

[0390] Block 206: Creating audio and visual accompaniment (Content Generation Module, 108)

[0391] 1. Processing input parameters:

[0392] • The module accepts integrated data from previous steps, including text information, audio files, multimodal data, user physiological parameters (e.g. heart rate and skin conductivity), environmental data (illumination, noise) and user settings.

[0393] • Additionally, data coming from external devices, such as smart home systems, wearable gadgets, mobile devices, VR / AR headsets and other connected devices, is processed.

[0394] 2. Audio content generation:

[0395] • Musical compositions, sound effects, backgrounds and voice-overs are created, adapted to the theme, plot, style, genre, emotional background and pace of interaction.

[0396] • Generative neural network technologies (e.g. GAN, RNN) and parametric synthesis algorithms are used to generate unique audio content.

• Multi-channel audio generation is supported, including 3D audio, spatial effects, and dynamic volume control for a fully immersive audio experience.

[0397] 3. Generation of visual content:

[0398] • The module generates animations, graphics, video clips, 3D objects and dynamic lighting effects that match the plot twists, theme and emotional background of the data.

[0399] • Neural style transfer algorithms are used to adapt visual content to given parameters and style.

[0400] • Generated content is adapted for display on screens, projectors and VR / AR headsets, supporting immersive experiences.

[0401] 4. Support for multimodal elements:

[0402] • Audio and visual effects are simultaneously created and synchronized to form a comprehensive multimodal experience.

[0403] • Text information is integrated in the form of subtitles, explanations or accompanying descriptions that are relevant to the context.

[0404] 5. Extensibility and customization:

[0405] • It is possible to dynamically adjust the generation parameters, including the choice of styles, effect intensity, tempo and other characteristics.

[0406] • The user can add their own presets and styles that are integrated into the content generation process.

[0407] • The module allows you to combine several presets to create unique styles.

[0408] 6. Integration with external systems:

[0409] • Generate content parameters in an API-compatible format for transmission to external systems.

[0410] • Use of external data (e.g. geolocation, climate conditions, noise levels) to adapt audio and visual effects.

[0411] • Sending data for synchronization with music platforms, media centers and virtual reality applications.

[0412] 7. Realization of joint opportunities:

[0413] • Generate content for individuals or groups, including synchronizing multimodal effects for sharing.

[0414] • Create personalized notifications, system sounds, or other interaction elements.

[0415] • Integration with machine learning mechanisms for adapting content in real time based on user reactions.

Result: Creation of personalized, immersive and thematic multimodal content (audio, video, animations, visual effects) adapted to the physiological, emotional and contextual data of the user, including individual preferences, shared settings, environmental data, with support for style transfer, preset combinations, API integration, adaptation to external devices and automatic learning to further improve user interaction.
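
Before any audio is synthesized, the integrated data has to be turned into concrete generation parameters. The sketch below shows only that parameter-selection step, not the neural generation itself. The `AudioParams` fields, the valence and arousal inputs, and every numeric mapping are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AudioParams:
    tempo_bpm: float
    mode: str      # "major" or "minor"
    volume: float  # 0.0-1.0


def plan_audio(valence: float, arousal: float, ambient_noise_db: float) -> AudioParams:
    """valence in -1..1 (negative to positive), arousal in 0..1; mappings are illustrative."""
    tempo = 60.0 + 80.0 * arousal  # 60 bpm when calm, 140 bpm when excited
    mode = "major" if valence >= 0 else "minor"
    # Raise the volume in noisier rooms, staying within a fixed 0.2..0.9 range.
    noise_level = min(1.0, max(0.0, (ambient_noise_db - 30.0) / 40.0))
    volume = 0.2 + 0.7 * noise_level
    return AudioParams(tempo_bpm=tempo, mode=mode, volume=volume)
```

A generator, whether parametric or neural, could then be conditioned on these values so that the same text yields different accompaniment for different users and rooms.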

[0416] Step 7. Integration with external devices and expansion of system capabilities

[0417] Block 206a: Integration, exchange and adaptation of data for external devices (Expansion Module, 109, and External Device Integration Module, 110)

[0418] 1. Preparing data for external devices:

[0419] From the Audio and Visual Accompaniment Generation Module (108):

[0420] • The generated content (audio and visual data) is converted into a format suitable for external devices such as VR / AR headsets, smart home devices or IoT.

[0421] From Extension Module (109):

[0422] • The possibilities of expanding the system’s functionality through additional plugins or new devices are checked.

[0423] From the Synchronization Module (104):

[0424] • Content adjustments are made taking into account data about the user’s state and external conditions.

[0425] 2. Integration with external systems:

[0426] External Device Integration Module (110):

[0427] • Transfer of synchronized content parameters (e.g. brightness, volume, visual effects) to VR / AR devices, projectors or smart home systems.

[0428] • Synchronization with external APIs for correct transmission of audiovisual data in real time.

[0429] 3. Playing content on external devices:

[0430] Data for virtual and augmented reality (VR / AR) devices:

[0431] • Generation of 3D effects, interactive visualizations and spatial sound.

[0432] • Synchronization of user movements with content (e.g. through pose or gesture tracking).

[0433] Integration with smart devices:

[0434] • Change the ambient lighting or sound atmosphere in the room depending on the context of the content.

[0435] Connecting additional devices via standardized APIs:

[0436] • The ability to use new technologies or services that provide a unique user experience.

[0437] Data type: Synchronized audio and visual data adapted to the format of external devices, external and internal API.

Result: VR / AR devices, IoT and other integrated systems receive, transmit and play content that matches the user's parameters and environment, and interact with other systems via API.

[0438] Step 7. Demonstration

[0439] Block 207: Outputting content to the user (Demonstration Module, 114)

[0440] 1. Content is adapted to the user's device (smartphone, tablet, VR headset, projector, smart speaker, wearable device, TV, smart home devices).

[0441] 2. Playback parameters (volume, intensity, brightness, emotionality, speed, rhythm and other characteristics) are adjusted to external conditions (lighting, noise, time of day, weather, etc.).

[0442] • Data Type: Displayed content.

[0443] • Result: The user sees, hears, and feels adapted (immersive) accompaniment.
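
Adjusting playback to external conditions can be illustrated with display brightness. The sketch below maps ambient illuminance to a brightness level on a logarithmic scale, since perceived brightness grows roughly with the logarithm of light intensity. The function name, the 10,000 lux reference point, the 0.05 floor, and the night-mode cap of 0.4 are assumptions made for this example.

```python
import math


def screen_brightness(ambient_lux: float, night_mode: bool = False) -> float:
    """Map ambient light to a brightness level between 0.05 and 1.0 on a log scale."""
    lux = max(1.0, ambient_lux)
    # 1 lux maps to 0.0 and 10,000 lux to 1.0 before clamping.
    level = math.log10(lux) / 4.0
    level = min(1.0, max(0.05, level))
    # In night mode, cap the brightness regardless of ambient light.
    return min(level, 0.4) if night_mode else level
```

The same pattern (normalize a sensor reading, clamp it, then apply a mode-specific cap) could be reused for other playback parameters such as volume or effect intensity.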

[0444] Step 8. Feedback and self-learning

[0445] Block 208: Collecting Feedback (Feedback and Self-Learning Module, 112)

[0446] 1. The user leaves feedback on the quality of the content.

[0447] 2. The module analyzes reactions and preferences and updates the user profile.

[0448] • Data Type: Correction Information and Preferences.

[0449] • Result: Data is returned to the merge module (107) to update the profile and subsequently generate content.

[0450] In the above description, embodiments of the present invention are set forth for clarity with reference to specific functional circuits and blocks. However, it is understood that any suitable distribution of functionality between different functional circuits or blocks may be used without detriment to the present invention. For example, the illustrated functionality to be implemented by separate computers or blocks may be implemented by the same computer or block. Therefore, references to specific functional blocks or circuits should be considered only as references to suitable means for providing the described functionality, and not as an indication of a strict logical or physical structure of the system. The present invention may be implemented in any suitable form, including hardware, software, or any combination thereof. Although the present invention has been described in connection with certain embodiments, this should not be construed as limiting it to the specific form set forth herein. The scope of the present invention is limited only by the appended claims. Furthermore, although individual features may be included in different claims, they may possibly be effectively combined, and inclusion in different claims does not imply that the combination of features is impracticable and / or disadvantageous. Furthermore, the order of features in the claims does not indicate a specific order in which these features must be applied.

[0451] Taking into account the above, it can be concluded that the essential features of the claimed invention are not known from the prior art and ensure full compliance of the claimed invention with the patentability conditions of “novelty” and “inventive step”.

[0452] The claimed invention can be used in industry and social settings for interactive audio and visual support of technological processes and social events. Thus, the claimed invention satisfies the patentability requirement of "industrial applicability."

[0453] It follows that, in the opinion of the applicant, the claimed invention fully complies with the conditions of patentability.

Claims

1. An interactive audio and visual support system comprising: a data input module capable of receiving text, audio and voice data; a data pre-processing module capable of processing text and voice data for language identification, text normalization, tokenization and lemmatization, noise filtering, speech segmentation, voice parameterization; a data analysis and interpretation module capable of in-depth analysis of text and voice data for semantic analysis, emotional analysis, parametric speech analysis; a synchronization module capable of precisely synchronizing audio and visual support with user actions, adapting the system's operating speed to individual needs for tracking the pace of interaction, processing environmental data and physiological indicators, synchronizing user actions; a physiological data module configured to collect biometric data of the user; an environmental data module configured to collect data from sensors that specify environmental parameters; a data fusion module configured to integrate and analyze physiological data, environmental data, text and audio messages, in order to create a single user profile; an audio and visual support generation module configured to create personalized content including audio, visual effects, animations and images; an expansion module configured to connect other systems; an integration module with external devices configured to connect external devices, including augmented and virtual reality, to expand interaction capabilities; a personalization and settings module configured to input user data to obtain visual or auditory feedback; a feedback and self-learning module configured to collect feedback from users, analyze interactions, and automatically train the system based on the collected data; a collaboration module configured to ensure interaction between users of the system; an audio and visual accompaniment demonstration module configured to output generated audio and visual accompaniment to the user, wherein the data input module is connected to a pre-processing module, which is connected to a data analysis and interpretation module, which is connected to a synchronization module, which is connected to a data fusion module, which is connected to an audio and visual accompaniment generation module, which is connected to a module for demonstrating audio and visual accompaniment, and a module for integrating with external devices, which is connected to the module for demonstrating audio and visual accompaniment, which is connected to the feedback and self-learning module, which is connected to the data fusion module, the collaboration module is connected to the personalization and settings module, which is connected to the data fusion module, the physiological data module and the environmental data module are connected to the synchronization module, the expansion module is connected to the audio and visual accompaniment generation module.

2. A method for interactive audio and visual support using the system according to claim 1, including the steps of receiving input data, cleaning and normalizing data, semantic and emotional analysis, collecting and analyzing data from physiological and environmental data modules, adapting to the user, combining data from different sources, creating audio and visual support, integrating, exchanging and adapting data for external devices, displaying content to the user, and collecting feedback.