Method and system for generating AI dynamic wallpaper based on music emotion features

CN122308997A · Pending · Publication Date: 2026-06-30 · SUPER SENSE DIGITAL TECHNOLOGY (DONGGUAN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SUPER SENSE DIGITAL TECHNOLOGY (DONGGUAN) CO LTD
Filing Date
2026-02-10
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing live wallpaper systems cannot dynamically adjust visual presentation based on the emotional characteristics of music. Music visualization tools lack artistry, have simplistic sound-image mapping relationships, and the generated live wallpapers cannot be adapted to the characteristics of dedicated display hardware.

Method used

An AI-powered live wallpaper generation method based on musical emotional features is adopted. It combines multi-dimensional audio feature extraction, a multimodal emotion computing model, and an image generation model with optimization from user feedback, so that live wallpapers matching the emotion of the music are generated and adapted to the characteristics of different display hardware.

Benefits of technology

It achieves multi-level emotional understanding and intelligent cross-modal mapping, generates highly artistic visual content, adapts precisely to display hardware, runs in real time, and improves both general and personalized user satisfaction.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to an AI-powered dynamic wallpaper generation method and system based on musical emotional features, belonging to the interdisciplinary field of artificial intelligence and multimedia. Key features include: (1) multi-layered emotional understanding: going beyond traditional spectrum analysis, the method extracts three dimensions of features (rhythm, tonality, and emotion) to capture musical emotion more comprehensively; (2) intelligent cross-modal mapping: an attention mechanism learns the non-linear relationship between sound and image, in line with human cognitive patterns of sound-image association; (3) highly artistic output: a conditional GAN generates hand-drawn-style visual content and avoids mechanical repetition; (4) deep hardware adaptation: output is optimized for wide-color-gamut, high-resolution hardware; (5) a balance of real-time performance and personalization: end-to-end processing latency is low, and incremental learning from user feedback supports personalized needs.

Description

Technical Field

[0001] This invention relates to the field of interdisciplinary applications of artificial intelligence, specifically to the integration of computer vision, digital audio processing, and multimodal machine learning technologies, and in particular to a method and system for generating dynamic wallpapers based on the emotional features of music.

Background Technology

[0002] Live wallpapers, as a technology for enhancing the user's visual experience, are widely used in products such as smart TVs, display hardware, and mobile devices. However, existing technologies have significant limitations:
1. Superficial emotional understanding: only low-level audio features such as amplitude and frequency are analyzed; the semantic meaning of musical emotion is not understood, so the visuals cannot closely match it.
2. Simple mapping relationships: most systems use linear mappings and lack a precise logical connection between the emotion of the sound and that of the image.
3. Lack of artistry: the visual effects are mechanically repetitive and uncreative; some designs also lack dynamism and cannot be synchronized with the music in real time.
4. Lack of hardware compatibility: the characteristics of display devices are not considered, and some solutions have complex processes and poor real-time performance, making it difficult to meet the practical needs of live wallpaper use.

Summary of the Invention

[0003] This application aims to address the technical problems of existing dynamic wallpaper systems being unable to dynamically adjust visual presentation based on the emotional characteristics of music, the lack of artistry in music visualization tools, the simplistic sound-image mapping relationship, and the inability of generated dynamic wallpapers to adapt to the characteristics of dedicated display hardware, and to provide an AI dynamic wallpaper generation method based on the emotional characteristics of music.

[0004] This application solves the technical problem by the following technical means. An AI dynamic wallpaper generation method based on musical emotional features includes the following steps: acquiring an audio signal and extracting multi-dimensional audio features from it, including rhythm features, tonality features, and emotional features; mapping the multi-dimensional audio features to visual parameters using a multimodal emotion computing model; generating a dynamic wallpaper image in real time from the visual parameters using an image generation model; and adapting the generated dynamic wallpaper for display on the user terminal.

[0005] Furthermore, in the step of acquiring the audio signal and extracting multi-dimensional audio features from it, the rhythm features include BPM value, rhythm intensity, and beat position; the tonality features include chroma features, key, and chord progressions; and the emotional features include spectral centroid, spectral roll-off point, spectral flux, and Mel-frequency cepstral coefficients.

[0006] Furthermore, in the step of mapping the multi-dimensional audio features to visual parameters through the multimodal emotion computing model, the emotion mapping model employs a multilayer perceptron based on an attention mechanism, including: a feature encoding layer, which encodes audio features into high-dimensional feature vectors; a cross-modal attention layer, which calculates the correlation weights between audio features and visual features; and a mapping layer, which maps the weighted features to the visual parameter space.

[0007] Furthermore, in the step of mapping the multi-dimensional audio features to visual parameters through the multimodal emotion computing model, the visual parameters include: color distribution parameters (primary color tone, secondary color tone, and distribution ratio); texture parameters (texture complexity, texture direction, and texture density); and motion parameters (particle velocity, trajectory curvature, and frequency of change).

[0008] Furthermore, in the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, a conditional generative adversarial network is employed. Its generator is based on the U-Net architecture, receives the visual parameters as conditional input, and outputs a dynamic wallpaper image that matches the emotion of the music through upsampling and convolution operations.

[0009] Furthermore, after the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, the method also includes a feedback optimization step, as follows: obtaining user preference data for dynamic wallpapers through user interaction; adjusting the parameters of the emotion mapping model with an online gradient descent algorithm based on the preference data; and optimizing subsequent dynamic wallpaper generation, with diversity regularization terms added to prevent mode collapse.

[0010] Furthermore, in the step of adapting the generated dynamic wallpaper for display on the user terminal, the hardware adaptation output includes optimization for the characteristics of the display hardware, as follows: adjusting the output image parameters according to the resolution and color gamut of the display screen; optimizing the rendering frame rate and image quality according to the hardware's processing capability; and adjusting the scale of visual elements according to the hardware's mounting mode and the user's viewing distance.

[0011] A system for implementing the method as described in any one of claims 1-7, comprising: an audio feature extraction module, used to extract multi-dimensional audio features from the input audio; an emotion mapping calculation module, used to map audio features to visual parameters; a dynamic wallpaper generation module, used to generate dynamic wallpapers based on the visual parameters; and a hardware adaptation output module, used to adapt the dynamic wallpaper to the hardware display device.

[0012] A computer-readable medium storing instructions that, when executed by a processor, implement the AI dynamic wallpaper generation method based on musical emotional features as described in any one of claims 1-7.

[0013] Beneficial effects:
1. Multi-layered emotional understanding: going beyond traditional spectrum analysis, the method extracts three dimensions of features (rhythm, tonality, and emotion) to capture musical emotion more comprehensively.
2. Intelligent cross-modal mapping: an attention mechanism learns the non-linear relationship between sound and image, matching the cognitive patterns of human sound-image association.
3. Highly artistic output: a conditional GAN generates hand-drawn-style visual content and avoids mechanical repetition, with an audio-visual matching score of 8.3±0.8.
4. Deep hardware adaptation: output is optimized for wide-color-gamut, high-resolution hardware, with color error ΔE<1.5, a stable frame rate of 45-60 fps, and user satisfaction of 8.7/10.
5. Real-time performance and personalization: end-to-end processing latency is 76±12 ms (<100 ms); incremental learning from user feedback is supported, raising personalization satisfaction to 8.9/10.

Attached Figure Description

[0014] Figure 1 is an overall flowchart of the AI dynamic wallpaper generation method based on musical emotional features of the present invention;
Figure 2 is a system architecture diagram of the present invention;
Figure 3 is a flowchart of the audio feature extraction process of the present invention;
Figure 4 is a schematic diagram of the emotion mapping model structure of the present invention;
Figure 5 is a flowchart of the dynamic wallpaper generation module of the present invention;
Figure 6 is a flowchart of the hardware adaptation and output optimization process of the present invention;
Figure 7 is a workflow diagram of the feedback optimization mechanism of the present invention;
Figure 8 is a schematic diagram illustrating the display effect on the hardware of the present invention.

Detailed Implementation

[0015] It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit this application.

[0016] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0017] It should be noted that the terms "comprising," "including," and "having," and any variations thereof, in the specification, claims, and accompanying drawings of this application, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses. Terms such as "first" and "second" in the claims, specification, and accompanying drawings of this application, as well as relational terms, are used merely to distinguish one entity / operation / object from another entity / operation / object, and do not necessarily require or imply any such actual relationship or order between these entities / operations / objects.

[0018] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a mutually exclusive, independent, or alternative embodiment. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0019] Referring to Figures 1-8, Figure 1 is an overall flowchart of an AI dynamic wallpaper generation method based on musical emotional features in one embodiment of this application. The method includes the following steps: acquiring an audio signal and extracting multi-dimensional audio features from it, including rhythm features, tonality features, and emotional features; mapping the multi-dimensional audio features to visual parameters using a multimodal emotion computing model; generating a dynamic wallpaper image in real time from the visual parameters using an image generation model; and adapting the generated dynamic wallpaper for display on the user terminal.

[0020] In this embodiment, in the step of acquiring the audio signal and extracting multi-dimensional audio features from it, the rhythm features include BPM value, rhythm intensity, and beat position; the tonality features include chroma features, key, and chord progressions; and the emotional features include spectral centroid, spectral roll-off point, spectral flux, and Mel-frequency cepstral coefficients.
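As a rough illustration of the rhythm features listed above, the following sketch derives a BPM value and a simple evenness measure from beat timestamps. It assumes the beat positions have already been detected by some upstream step; the function names `bpm_from_beats` and `beat_evenness` are hypothetical and do not come from the application.

```python
import statistics

def inter_beat_intervals(beat_times):
    """Differences between consecutive beat positions, in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat positions")
    return [b - a for a, b in zip(beat_times, beat_times[1:])]

def bpm_from_beats(beat_times):
    """Estimate tempo as 60 divided by the median inter-beat interval."""
    return 60.0 / statistics.median(inter_beat_intervals(beat_times))

def beat_evenness(beat_times):
    """Coefficient of variation of the intervals; 0 means perfectly even beats."""
    intervals = inter_beat_intervals(beat_times)
    return statistics.pstdev(intervals) / statistics.fmean(intervals)

# Illustrative input: eight evenly spaced beats at 70 beats per minute.
beats = [i * 60.0 / 70.0 for i in range(8)]
bpm = bpm_from_beats(beats)
evenness = beat_evenness(beats)
```

The median is used rather than the mean so that a single missed or doubled beat does not shift the tempo estimate much; this is a design choice of the sketch, not something the application specifies.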

[0021] Specifically, consider a scenario in which the music app's dynamic wallpaper is linked to playback. When a user plays the dream-core style track "Piano and Rain's Whispers" in the app, the system extracts multi-dimensional features from the audio signal. For rhythm features, it identifies a BPM value of 70, weak rhythm intensity, and evenly distributed beat positions, matching the music's soothing tone. For tonality features, it extracts chroma features leaning towards A minor, a stable key of A minor, and a chord progression of Am-E-Am, matching the melancholic and dreamy style. For emotional features, it captures a low spectral centroid, a gentle spectral roll-off, low spectral flux, and a smooth curve in the Mel-frequency cepstral coefficients, corresponding to the music's tranquil and subdued emotional expression. Together these provide the data for the subsequent visual mapping.
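The spectral descriptors named in this paragraph can be computed from a magnitude spectrum roughly as follows. This is a minimal sketch using a naive DFT on one short frame; a practical system would use an FFT library and windowed, overlapping frames. The 85% roll-off fraction and all function names are assumptions made for illustration.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n

def spectral_flux(prev_mags, mags):
    """Euclidean distance between two consecutive magnitude spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_mags, mags)))

# Demo: a pure tone sitting exactly on bin 4 of a 64-sample frame at 8 kHz.
SR, N = 8000, 64
tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
mags = magnitude_spectrum(tone)
centroid = spectral_centroid(mags, SR, N)   # close to 4 * 8000 / 64 = 500 Hz
rolloff = spectral_rolloff(mags, SR, N)
flux = spectral_flux(mags, mags)            # identical frames give zero flux
```

A "low spectral centroid" and "low spectral flux" in the sense of the paragraph would show up here as a small `centroid` value relative to the Nyquist frequency and a small `flux` between successive frames.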

[0022] In this embodiment, in the step of mapping the multi-dimensional audio features to visual parameters through the multimodal emotion computing model, the emotion mapping model employs a multilayer perceptron based on an attention mechanism, including: a feature encoding layer, which encodes audio features into high-dimensional feature vectors; a cross-modal attention layer, which calculates the correlation weights between audio features and visual features; and a mapping layer, which maps the weighted features to the visual parameter space.

[0023] Specifically, consider a scenario in which the app generates audio-visual synchronized scenes. When a user plays the Lofi-style track "Midnight Walk," the model completes the mapping through the multilayer perceptron. The feature encoding layer encodes the extracted audio features (slow tempo, minor key, and low spectral flux) into high-dimensional vectors. The cross-modal attention layer finds that the correlation weights between rhythm intensity and particle movement speed, and between emotional features and color distribution, are the highest, and focuses on these core dimensions. The mapping layer transforms the weighted features into visual parameters: a dark blue main color (H=240°), a texture complexity of 0.3, and a slow particle movement speed, so that the visual expression is consistent with the lazy and tranquil mood of the music.
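The cross-modal attention step described above can be sketched as scaled dot-product attention in which each visual parameter acts as a query over the audio features. The key and query vectors below are hand-picked for illustration and are not trained; the feature names, parameter names, and numeric values are all assumptions rather than details from the application.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 2-D key vectors, one per audio feature (stands in for the encoding layer).
AUDIO_KEYS = {
    "rhythm_intensity": (1.0, 0.0),
    "spectral_flux": (0.8, 0.2),
    "minor_key": (0.0, 1.0),
}
# Hypothetical query vectors, one per visual parameter.
VISUAL_QUERIES = {
    "particle_speed": (4.0, 0.0),
    "hue_coolness": (0.0, 4.0),
}

def attention_weights(query, keys):
    """Correlation weights between one visual query and every audio key."""
    scale = math.sqrt(len(query))
    names = list(keys)
    scores = [dot(query, keys[name]) / scale for name in names]
    return dict(zip(names, softmax(scores)))

def map_to_visual(features):
    """Mapping layer: attention-weighted sum of normalised audio feature values."""
    result = {}
    for param, query in VISUAL_QUERIES.items():
        weights = attention_weights(query, AUDIO_KEYS)
        result[param] = sum(weights[name] * features[name] for name in weights)
    return result

# Illustrative Lofi-like input: weak rhythm, low flux, minor key.
lofi = {"rhythm_intensity": 0.2, "spectral_flux": 0.1, "minor_key": 1.0}
speed_weights = attention_weights(VISUAL_QUERIES["particle_speed"], AUDIO_KEYS)
visual = map_to_visual(lofi)
```

With these hand-set vectors, rhythm intensity receives the largest weight for particle speed, which mirrors the pairing the paragraph reports. In a trained model the keys and queries would be learned parameters.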

[0024] In this embodiment, in the step of mapping the multi-dimensional audio features to visual parameters through the multimodal emotion computing model, the visual parameters include: color distribution parameters (primary color tone, secondary color tone, and distribution ratio); texture parameters (texture complexity, texture direction, and texture density); and motion parameters (particle velocity, trajectory curvature, and frequency of change).

[0025] Specifically, consider a scenario in which the app's aesthetic style is customized per playlist. When a user selects music from the "Weird Dream Core" playlist, the color distribution parameters are set to a primary color of deep purple and a secondary color of dark gray, matching the mysterious atmosphere of dream core. The texture parameters use a texture complexity of 0.2, no obvious texture direction, and low texture density to present a blurry, hazy visual texture. The motion parameters are set to slow particle movement, gently curved trajectories, and a low frequency of change, which combine with the rain samples and vinyl noise in the music to create a dynamic effect like "rainy night fog". If the user switches to "Party High" themed music, the visual parameters are adjusted to a primary color of bright orange, a texture complexity of 0.8, and fast particle movement to match the cheerful and exciting mood.
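One way to hold the three parameter groups above is a small validated record. The sketch below is hypothetical: the field names, the [0, 1] normalisation, and every preset value other than the two texture complexities (0.2 and 0.8) are assumptions chosen for illustration, since the text gives only qualitative descriptions such as "slow" or "bright orange".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VisualParams:
    # Color distribution parameters
    primary_hue_deg: float
    secondary_hue_deg: float                 # placeholder for near-neutral colors such as gray
    primary_ratio: float                     # share of the frame given to the primary color
    # Texture parameters
    texture_complexity: float
    texture_direction_deg: Optional[float]   # None means no dominant direction
    texture_density: float
    # Motion parameters
    particle_speed: float
    trajectory_curvature: float
    change_frequency: float

    def __post_init__(self):
        unit_fields = ("primary_ratio", "texture_complexity", "texture_density",
                       "particle_speed", "trajectory_curvature", "change_frequency")
        for name in unit_fields:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        for name in ("primary_hue_deg", "secondary_hue_deg"):
            if not 0.0 <= getattr(self, name) < 360.0:
                raise ValueError(f"{name} must be in [0, 360)")

# Illustrative presets; only the texture complexities come from the text.
PRESETS = {
    "weird_dream_core": VisualParams(270.0, 0.0, 0.7, 0.2, None, 0.2, 0.2, 0.2, 0.2),
    "party_high": VisualParams(30.0, 50.0, 0.6, 0.8, 45.0, 0.7, 0.9, 0.6, 0.8),
}
```

Making the record immutable and validating on construction keeps out-of-range values from reaching the generator, which is a design preference of this sketch.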

[0026] In this embodiment, in the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, a conditional generative adversarial network is employed. Its generator is based on the U-Net architecture, receives the visual parameters as conditional input, and outputs a dynamic wallpaper image that matches the emotion of the music through upsampling and convolution operations.

[0027] Specifically, the dynamic wallpaper generation process can be implemented within the app's core "Music-Wallpaper Linkage" function. The user taps "Generate Music Wallpaper" in the app and selects "Rainy Night City" (a dream-core + Lofi style track) created from an image-to-song format. The U-Net-based generator of the conditional GAN receives the visual parameters as input: a deep blue primary color, a texture complexity of 0.4, and smooth particle motion trajectories. Upsampling operations increase the image resolution, and convolution operations refine the details, generating a dynamic wallpaper: a dark-toned city night scene as background, raindrop particles falling slowly to the beat of the music, subtle blurring and texture changes, and random particle flicker corresponding to the vinyl noise. The overall visual effect matches the melancholy and immersive atmosphere of the music and avoids mechanically repetitive spectrum-visualization effects.
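The two operations the paragraph names, upsampling and convolution applied to a conditioned input, can be shown on a tiny grid. The sketch below is deliberately not a GAN or a U-Net: it has no training, no skip connections, and a single scalar condition. It only illustrates how a condition value can be injected before an upsample-then-convolve block; all names and values are hypothetical.

```python
def upsample_nearest(img, factor=2):
    """Nearest-neighbour upsampling of a 2-D grid given as a list of rows."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def conv3x3(img, kernel):
    """Same-size 3x3 convolution (cross-correlation form) with zero padding."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out

def generator_block(latent, condition, kernel):
    """Add the scalar condition to every latent cell, upsample, then convolve."""
    conditioned = [[value + condition for value in row] for row in latent]
    return conv3x3(upsample_nearest(conditioned), kernel)

# Identity kernel so the effect of conditioning and upsampling is easy to inspect.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
latent = [[0.0, 1.0], [1.0, 0.0]]
frame = generator_block(latent, condition=0.4, kernel=IDENTITY)  # 4x4 output grid
```

In a real conditional generator the condition would be a vector of visual parameters, the kernels would be learned, and several such blocks would be stacked with encoder skip connections.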

[0028] In this embodiment, after the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, the method also includes a feedback optimization step, as follows: obtaining user preference data for dynamic wallpapers through user interaction; adjusting the parameters of the emotion mapping model with an online gradient descent algorithm based on the preference data; and optimizing subsequent dynamic wallpaper generation, with diversity regularization terms added to prevent mode collapse.
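The online gradient descent update in the feedback step can be sketched for a single linear output as follows. The squared-error loss, the learning rate, and the use of a plain L2 penalty are assumptions of this sketch; in particular, the L2 term is a simple stand-in and is not the diversity regularization the application refers to, whose form is not given in the text.

```python
def predict(weights, features):
    """Linear prediction of one visual parameter (for example, brightness)."""
    return sum(w * x for w, x in zip(weights, features))

def online_update(weights, features, target, lr=0.1, l2=0.01):
    """One online gradient step on 0.5 * error**2 + 0.5 * l2 * ||weights||**2."""
    error = predict(weights, features) - target
    return [w - lr * (error * x + l2 * w) for w, x in zip(weights, features)]

# Illustrative feedback loop: the user repeatedly asks for brightness near 0.8
# while listening to tracks with the same (hypothetical) feature vector.
weights = [0.0, 0.0]
features = [1.0, 0.5]
before = predict(weights, features)
for _ in range(200):
    weights = online_update(weights, features, target=0.8)
after = predict(weights, features)
```

Each feedback event changes the weights by one small step, so the model adapts incrementally without retraining from scratch; the penalty term keeps the weights from growing without bound.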

[0029] Specifically, for multi-device usage scenarios: if the user uses the app on a mobile phone, the system adjusts the output image parameters so that visual elements are concentrated in the central area of the vertical screen, the contrast of the main color is enhanced to suit the characteristics of the phone screen, and the rendering frame rate is held at 60 fps. If the user switches to a smart TV (landscape mode, 3840×2160 resolution, 98% DCI-P3 color gamut) and plays the same music, the system increases the resolution of image details, expands the visual elements to cover the entire screen, adjusts the horizontal distribution of particle trajectories, and adapts to viewing distances of 1.5-3 meters so that dynamic effects remain clearly perceptible from a distance. If connected to an aesthetic display device supported by the app, the system adjusts the color parameters according to the wide-color-gamut characteristics of the device and keeps the color error within ΔE<1.5 to meet the artistic display needs of the device.
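The ΔE<1.5 criterion mentioned above can be checked with a colour-difference formula. The text does not say which ΔE variant is meant; the sketch below assumes the simplest one, CIE76, which is the Euclidean distance between two colours in CIELAB space. The Lab values are illustrative only.

```python
import math

def delta_e_cie76(lab_a, lab_b):
    """CIE76 colour difference: Euclidean distance between (L*, a*, b*) triples."""
    return math.dist(lab_a, lab_b)

def within_tolerance(target_lab, measured_lab, limit=1.5):
    """True when the rendered colour is within the stated error limit."""
    return delta_e_cie76(target_lab, measured_lab) < limit

# Illustrative values, not taken from the source.
target = (32.0, 79.0, -108.0)
measured = (32.5, 78.2, -107.1)
error = delta_e_cie76(target, measured)
ok = within_tolerance(target, measured)
```

Later formulas such as CIEDE2000 weight lightness, chroma, and hue differently and would give different numbers for the same pair of colours, so the choice of formula matters when verifying such a limit.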

[0030] In this embodiment, in the step of adapting the generated dynamic wallpaper for display on the user terminal, the hardware adaptation output includes optimization for the characteristics of the display hardware, as follows: adjusting the output image parameters according to the resolution and color gamut of the display screen; optimizing the rendering frame rate and image quality according to the hardware's processing capability; and adjusting the scale of visual elements according to the hardware's mounting mode and the user's viewing distance.
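One plausible way to adjust element scale for viewing distance is to keep each element at a constant visual angle. The sketch below assumes this approach, which the application does not specify; the `DisplayProfile` fields, the 2° default angle, and the 55-inch diagonal are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DisplayProfile:
    width_px: int
    height_px: int
    diagonal_inch: float        # assumed to be known from device metadata
    viewing_distance_m: float
    max_fps: int

    @property
    def pixels_per_metre(self):
        diagonal_px = math.hypot(self.width_px, self.height_px)
        return diagonal_px / (self.diagonal_inch * 0.0254)

def element_size_px(profile, visual_angle_deg=2.0):
    """Pixel size that subtends `visual_angle_deg` at the viewing distance."""
    half_angle = math.radians(visual_angle_deg) / 2
    size_m = 2 * profile.viewing_distance_m * math.tan(half_angle)
    return size_m * profile.pixels_per_metre

def target_fps(profile, preferred=60):
    """Cap the preferred frame rate at what the device supports."""
    return min(preferred, profile.max_fps)

# Same hypothetical TV viewed from the two ends of the 1.5-3 m range.
tv_near = DisplayProfile(3840, 2160, 55.0, 1.5, 60)
tv_far = DisplayProfile(3840, 2160, 55.0, 3.0, 60)
near_px = element_size_px(tv_near)
far_px = element_size_px(tv_far)
```

Under this rule, doubling the viewing distance doubles the on-screen size of an element, so it appears the same size to the viewer.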

[0031] Specifically, the app's audio-visual collaboration function operates as a complete workflow. When a user plays "Midnight Walk" by Dream Weaver on the app's "Explore" page, the audio feature extraction module extracts rhythm (BPM 65), tonality (A minor), and emotion (low spectral flux) features in real time. The emotion mapping calculation module maps these features to visual parameters through the attention mechanism: dark blue as the main color, a texture complexity of 0.3, and slow particle movement. The dynamic wallpaper generation module uses the conditional GAN to generate a dynamic wallpaper of "city night scene + slow particle rain". The hardware adaptation output module adjusts the image scale and frame rate for the user's current tablet device (landscape mode, 2048×1536 resolution). If the user feels the wallpaper is too dark, they can select "increase brightness" through the app's "feedback" function. The feedback data is transmitted to the system, and when similar music is played later, the emotion mapping calculation module automatically adjusts the color parameters to generate a brighter dynamic wallpaper, closing the loop.

[0032] A system for implementing the method as described in any one of claims 1-7, comprising: an audio feature extraction module, used to extract multi-dimensional audio features from the input audio; an emotion mapping calculation module, used to map audio features to visual parameters; a dynamic wallpaper generation module, used to generate dynamic wallpapers based on the visual parameters; and a hardware adaptation output module, used to adapt the dynamic wallpaper to the hardware display device.
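The four modules can be wired together as a simple pipeline, with a timer to compare against the stated 100 ms latency budget. Every function below is a stub standing in for the corresponding module; the returned values echo the "Midnight Walk" example, while the 90 BPM threshold and the warm hue of 30° are assumptions made purely for illustration.

```python
import time

def extract_features(audio):
    """Stub for the audio feature extraction module."""
    return {"bpm": 65, "key": "A minor", "spectral_flux": 0.1}

def map_emotion(features):
    """Stub for the emotion mapping calculation module."""
    slow = features["bpm"] < 90          # hypothetical threshold
    return {"primary_hue_deg": 240.0 if slow else 30.0,
            "texture_complexity": 0.3 if slow else 0.8}

def generate_wallpaper(visual):
    """Stub for the dynamic wallpaper generation module."""
    return {"frame": "placeholder", **visual}

def adapt_to_display(wallpaper, width, height):
    """Stub for the hardware adaptation output module."""
    return {**wallpaper, "resolution": (width, height)}

def run_pipeline(audio, width, height):
    """Run the four stages in order and report end-to-end latency in ms."""
    start = time.perf_counter()
    features = extract_features(audio)
    visual = map_emotion(features)
    wallpaper = generate_wallpaper(visual)
    output = adapt_to_display(wallpaper, width, height)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return output, latency_ms

output, latency_ms = run_pipeline(audio=[0.0] * 1024, width=2048, height=1536)
```

Because the stubs do no real work, the measured latency here says nothing about the figures reported in the application; the timer only shows where such a measurement would sit.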

[0033] The storage medium holds the core data for the app's audio-visual linkage. After a user generates and saves a dynamic wallpaper for "The Piano and the Rain" in the app, the storage medium saves the wallpaper's visual parameters (light blue main color, texture complexity 0.2, and so on), the audio feature data, the user preference records (such as brightness adjustments), and the current parameters of the emotion mapping model. When the user plays the same music again 3 days later, the system reads the historical data directly from the storage medium, without re-extracting audio features or retraining the model, and quickly generates a dynamic wallpaper that matches the user's preferences. If the user is offline, the preset model parameters and basic visual templates saved in the storage medium can still support basic dynamic wallpaper generation, preserving the user experience under weak or absent network conditions while reducing computing power consumption on the device.
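The reuse of stored data described above amounts to a per-track cache. The sketch below keeps records in memory only and models feedback as a single brightness offset; persistence to a storage medium, the record layout, and the class and method names are all hypothetical.

```python
class WallpaperStore:
    """In-memory cache of per-track features, visual parameters, and feedback."""

    def __init__(self, extractor, mapper):
        self._extractor = extractor
        self._mapper = mapper
        self._records = {}
        self.extract_calls = 0   # counts how often feature extraction actually ran

    def visual_params_for(self, track_id, audio):
        record = self._records.get(track_id)
        if record is None:
            # First play: extract features and map them, then remember the result.
            self.extract_calls += 1
            features = self._extractor(audio)
            record = {"features": features,
                      "visual": self._mapper(features),
                      "brightness_offset": 0.0}
            self._records[track_id] = record
        visual = dict(record["visual"])
        adjusted = visual["brightness"] + record["brightness_offset"]
        visual["brightness"] = min(1.0, max(0.0, adjusted))
        return visual

    def record_feedback(self, track_id, brightness_delta):
        """Store a user preference such as 'increase brightness'."""
        self._records[track_id]["brightness_offset"] += brightness_delta

# Illustrative use with stub extractor and mapper functions.
store = WallpaperStore(lambda audio: {"energy": 0.2},
                       lambda features: {"brightness": 0.4})
first = store.visual_params_for("piano-and-rain", audio=[])
store.record_feedback("piano-and-rain", 0.2)
second = store.visual_params_for("piano-and-rain", audio=[])
```

On the second play the extractor is not called again, and the stored preference is applied on top of the cached parameters.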

[0034] A computer-readable medium storing instructions that, when executed by a processor, implement the AI dynamic wallpaper generation method based on musical emotional features as described in any one of claims 1-7.

[0035] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0036] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0037] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0038] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0039] Although embodiments of this application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of this application, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A method for generating AI dynamic wallpapers based on the emotional characteristics of music, characterized in that it includes the following steps: acquiring an audio signal and extracting multi-dimensional audio features from it, including rhythm features, tonality features, and emotional features; mapping the multi-dimensional audio features to visual parameters using a multimodal emotion computing model; generating a dynamic wallpaper image in real time from the visual parameters using an image generation model; and adapting the generated dynamic wallpaper for display on the user terminal.

2. The method as described in claim 1, characterized in that, in the step of acquiring the audio signal and extracting multi-dimensional audio features from it, the rhythm features include BPM value, rhythm intensity, and beat position; the tonality features include chroma features, key, and chord progressions; and the emotional features include spectral centroid, spectral roll-off point, spectral flux, and Mel-frequency cepstral coefficients.

3. The method as described in claim 1, characterized in that, in the step of mapping the multi-dimensional audio features to visual parameters using the multimodal emotion computing model, the emotion mapping model employs a multilayer perceptron based on an attention mechanism, including: a feature encoding layer, which encodes audio features into high-dimensional feature vectors; a cross-modal attention layer, which calculates the correlation weights between audio features and visual features; and a mapping layer, which maps the weighted features to the visual parameter space.

4. The method as described in claim 3, characterized in that, in the step of mapping the multi-dimensional audio features to visual parameters through the multimodal emotion computing model, the visual parameters include: color distribution parameters (primary color tone, secondary color tone, and distribution ratio); texture parameters (texture complexity, texture direction, and texture density); and motion parameters (particle velocity, trajectory curvature, and frequency of change).

5. The method as described in claim 1, characterized in that, in the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, a conditional generative adversarial network is employed, whose generator is based on the U-Net architecture, receives the visual parameters as conditional input, and outputs a dynamic wallpaper image that matches the emotion of the music through upsampling and convolution operations.

6. The method as described in claim 1, characterized in that, after the step of generating the dynamic wallpaper image in real time from the visual parameters using the image generation model, the method also includes a feedback optimization step, as follows: obtaining user preference data for dynamic wallpapers through user interaction; adjusting the parameters of the emotion mapping model with an online gradient descent algorithm based on the preference data; and optimizing subsequent dynamic wallpaper generation, with diversity regularization terms added to prevent mode collapse.

7. The method as described in claim 1, characterized in that, in the step of adapting the generated dynamic wallpaper for display on the user terminal, the hardware adaptation output includes optimization for the characteristics of the display hardware, as follows: adjusting the output image parameters according to the resolution and color gamut of the display screen; optimizing the rendering frame rate and image quality according to the hardware's processing capability; and adjusting the scale of visual elements according to the hardware's mounting mode and the user's viewing distance.

8. A system for implementing the method as described in any one of claims 1-7, characterized in that it comprises: an audio feature extraction module, used to extract multi-dimensional audio features from the input audio; an emotion mapping calculation module, used to map audio features to visual parameters; a dynamic wallpaper generation module, used to generate dynamic wallpapers based on the visual parameters; and a hardware adaptation output module, used to adapt the dynamic wallpaper to the hardware display device.

9. A computer-readable medium storing instructions, characterized in that, when the instructions are executed by a processor, the AI dynamic wallpaper generation method based on musical emotional features as described in any one of claims 1-7 is implemented.