Display method and system for music playback interface, and storage medium

Through collaboration between the music client and the server, the music playback interface can be personalized, which addresses the monotonous content of existing interfaces and improves the user experience.

WO2026123710A1 · PCT designated stage · Publication Date: 2026-06-18 · TENCENT MUSIC ENTERTAINMENT TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: TENCENT MUSIC ENTERTAINMENT TECH (SHENZHEN) CO LTD
Filing Date: 2025-07-30
Publication Date: 2026-06-18

Application Information

IPC: G06F9/451

AI Tagging

Application Domain

Execution for user interfaces

AI Technical Summary

⚠Technical Problem

The existing music playback interface is simplistic and does not support user customization, resulting in a poor user experience.

⚗Method used

The music client provides a player-style settings interface through which users upload target images. The server separates the foreground and background of the image and generates foreground layer information. In the music playback interface, the music client then displays the foreground image on the foreground layer, the background image on the background layer, and the song-related layers between the foreground and background layers.

🎯Benefits of technology

It enables personalized customization of the music playback interface, increases the sense of depth and content richness, and enhances the user experience.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to the technical field of audio and video, and provides a display method and system for a music playback interface, and a storage medium. The music playback interface comprises a foreground layer, a background layer, and a song-related layer, and the method comprises: a music client provides a settings interface for a player style, and sends, via the settings interface, a target image uploaded by a user to a server side; the server side performs background-foreground separation processing on the target image to obtain first foreground layer information of the target image, and sends the first foreground layer information to the music client; the music client determines a first foreground image on the basis of the first foreground layer information and the target image; and the music client displays the target image in the background layer, displays the first foreground image in the foreground layer, and displays the song-related layer between the background layer and the foreground layer. In the method, personalized customization capability is provided to users to personalize a music playback interface, thereby enhancing the richness of the music playback interface.

Description

Music playback interface display methods, systems, and storage media

[0001] This application claims priority to Chinese Patent Application No. 202411842287.2, filed on December 13, 2024, entitled "Method, System and Storage Medium for Displaying Music Playback Interface", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of audio and video technology, and in particular to a method, system and storage medium for displaying a music playback interface.

Background Technology

[0003] With the development of computer and network technologies, music clients have become increasingly popular, allowing users to play music anytime, anywhere. However, the music playback interface of a music client typically includes lyrics and artist information, making the content relatively basic.

Summary of the Invention

[0004] This application provides a method, system, and storage medium for displaying a music playback interface, which can solve the problem of limited display content. The technical solution is as follows:

[0005] In one aspect, a method for displaying a music playback interface is provided, which is applied to a music client. The music playback interface includes a foreground layer, a background layer, and song-related layers. The method includes:

[0006] The music client provides a player-style settings interface for sending user-uploaded target images to the server.

[0007] The server performs background and foreground separation processing on the target image to obtain the first foreground layer information of the target image, and sends the first foreground layer information to the music client;

[0008] The music client determines the first foreground image based on the first foreground layer information and the target image;

[0009] The music client displays the target image on the background layer, the first foreground image on the foreground layer, and the song-related layer between the background layer and the foreground layer.

[0010] In an alternative embodiment, displaying the first foreground image on the foreground layer includes:

[0011] The music client displays the first foreground image in the target area of the foreground layer, or displays the partially transparent portion of the first foreground image in the target area, wherein the target area covers the foreground area corresponding to the target image in the background layer.

[0012] In this way, the foreground image can completely cover the corresponding foreground area in the original image, thus avoiding the display of two foreground images.

[0013] In one alternative approach, the first foreground layer information includes mask information;

[0014] The music client determines the first foreground image based on the first foreground layer information and the target image, including:

[0015] The music client uses the mask information to perform blending processing on the target image to obtain the first foreground image.

[0016] In this way, by using mask information, not only is the amount of data transmitted between the server and the music client reduced, but the calculation method using mask information is also simple, enabling the music client to quickly obtain the foreground image.

[0017] In one alternative approach, the music client uses the mask information to perform blending processing on the target image to obtain the first foreground image, including:

[0018] The music client uses the value at each position in the mask information to determine the alpha value of the corresponding pixel in the target image, and combines the RGB (red, green, blue) values of each pixel in the target image with the alpha value to obtain the pixel value of each pixel in the first foreground image; or

[0019] The music client uses the value of each position in the mask information to determine the alpha value of the corresponding pixel in the target image, combines the RGB value and alpha value of each pixel in the target image to obtain the pixel value of each pixel in the second foreground image, and uses the pixel value of each pixel in the second foreground image to enlarge the second foreground image to obtain the first foreground image.

[0020] In this way, because the first foreground image is represented by RGB values and alpha values, the first foreground image and the target image have the same pixel dimensions, making it easier for the music client to align the target image and the first foreground image when displaying them.
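The blending step described in paragraph [0018] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the image is a nested list of RGB tuples, the mask is a nested list of values from 0 to 255 with the same dimensions, and the function name `apply_mask` is hypothetical.

```python
def apply_mask(image_rgb, mask):
    """Attach an alpha channel taken from the mask to every RGB pixel.

    image_rgb: rows of (r, g, b) tuples; mask: rows of ints in 0..255.
    Both must have the same pixel dimensions (assumption stated in the text).
    """
    same_rows = len(image_rgb) == len(mask)
    same_cols = all(len(i) == len(m) for i, m in zip(image_rgb, mask))
    if not (same_rows and same_cols):
        raise ValueError("mask and image must have the same pixel dimensions")
    return [
        [(r, g, b, alpha) for (r, g, b), alpha in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image_rgb, mask)
    ]


# A 2x2 target image and a mask marking the left column as (partly) foreground.
target = [[(200, 10, 10), (10, 200, 10)],
          [(10, 10, 200), (90, 90, 90)]]
mask = [[255, 0],
        [128, 0]]
foreground = apply_mask(target, mask)
```

Pixels whose mask value is 0 become fully transparent, so only the foreground remains visible when the result is drawn on the foreground layer. The server only needs to send a single-channel mask rather than a full RGBA image.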

[0021] In one optional approach, the server performs background and foreground separation processing on the target image to obtain first foreground layer information of the target image, including:

[0022] The server performs target type identification on the target image to obtain a type identification result; the server performs separation processing on the target image based on the type identification result to obtain first foreground layer information containing content of the target type; or,

[0023] The server inputs the target image into a U-Net (U-shaped network) to obtain first foreground layer information containing content of the target type. The U-Net is used to identify the content of the target type.

[0024] The target types include human types and / or animal types.

[0025] This provides multiple ways to obtain foreground layer information.

[0026] In an alternative approach, the method further includes:

[0027] If the proportion of the occlusion area to the area of the song-related layer exceeds a target threshold, the music client will display the foreground layer between the song-related layer and the background layer, wherein the occlusion area is the area of the song-related layer occluded by the foreground layer.

[0028] In this way, when the foreground image obscures the song-related layers significantly, the song-related layers are placed above the foreground layers to avoid affecting their display.
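The occlusion check in paragraph [0027] might look like the sketch below. The rectangle representation of the song-related layer, the use of the mask to decide which pixels the foreground covers, and the threshold of 0.5 are all assumptions made for illustration; the application does not specify them.

```python
def occlusion_ratio(mask, song_rect):
    """Fraction of the song-related layer's rectangle covered by foreground pixels.

    mask: rows of ints, where a value above 0 counts as foreground.
    song_rect: (left, top, right, bottom) in pixel coordinates, right/bottom exclusive.
    """
    left, top, right, bottom = song_rect
    area = (right - left) * (bottom - top)
    if area <= 0:
        raise ValueError("song_rect must have a positive area")
    covered = sum(
        1
        for y in range(top, bottom)
        for x in range(left, right)
        if mask[y][x] > 0
    )
    return covered / area


def layer_order(mask, song_rect, target_threshold=0.5):
    """Return the layer names from bottom to top."""
    if occlusion_ratio(mask, song_rect) > target_threshold:
        # Heavy occlusion: move the foreground below the song-related layer.
        return ["background", "foreground", "song"]
    return ["background", "song", "foreground"]


# The right half of a 4x4 image is foreground.
mask = [[0, 0, 255, 255] for _ in range(4)]
default_order = layer_order(mask, (1, 1, 3, 3))   # half of the rectangle is covered
swapped_order = layer_order(mask, (2, 0, 4, 4))   # the whole rectangle is covered
```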

[0029] In an alternative approach, the method further includes:

[0030] The music client responds to a layer selection command by selecting one of the foreground layer, the background layer, and the song-related layer as the target layer.

[0031] The music client responds to the adjustment command for the target layer and adjusts the target layer accordingly.

[0032] This also allows for adjustments to the images within the layers and the positions of the layers, thereby enhancing the user experience.

[0033] In one alternative approach, the adjustment process includes adjusting the position of an image within a layer, resizing it, or adjusting its position between layers.

[0034] In an alternative approach, the method further includes:

[0035] In response to receiving an adjustment instruction for the target image, the music client performs the first operation indicated by the adjustment instruction on the target image in the background layer;

[0036] The adjustment commands include commands to adjust size, adjust position, or adjust style.

[0037] This also allows for adjustments to the image in the background layer, enhancing the user experience.

[0038] In an alternative approach, the method further includes:

[0039] In response to performing the first operation on the target image, the music client determines a second operation to be performed on the first foreground image, wherein the second operation corresponds to the first operation;

[0040] The second operation is performed on the first foreground image in the foreground layer.

[0041] In this way, after adjusting the image in the background layer, the foreground image is adjusted so that the images in the foreground layer and the images in the background layer do not conflict in display.
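One way to keep the two layers consistent, as paragraphs [0039] to [0041] describe, is to apply the same transformation to both placements. The sketch below assumes a hypothetical `Placement` record (offset plus scale) and that the "corresponding" second operation is simply the same operation with the same arguments; the application leaves the exact correspondence open.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of where an image sits within its layer."""
    x: float
    y: float
    scale: float = 1.0


def move(placement, dx, dy):
    return replace(placement, x=placement.x + dx, y=placement.y + dy)


def resize(placement, factor):
    return replace(placement, scale=placement.scale * factor)


def adjust_both(background, foreground, operation, *args):
    """Apply the first operation to the background image and mirror it on the foreground."""
    return operation(background, *args), operation(foreground, *args)


background = Placement(x=0, y=0)
foreground = Placement(x=0, y=0)
background, foreground = adjust_both(background, foreground, move, 10, -5)
background, foreground = adjust_both(background, foreground, resize, 2.0)
```

Because the foreground image has the same pixel dimensions as the target image, identical placements keep the foreground exactly over its counterpart in the background layer.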

[0042] In one alternative approach, the song-related layer includes a record player icon and / or a song icon;

[0043] The method further includes:

[0044] The music client controls the rotation of the record player icon and / or song icon in the song-related layer.

[0045] In one alternative approach, the music playback interface further includes a user interface layer;

[0046] The method further includes:

[0047] The music client displays the user interface layer on top of the music playback interface.

[0048] By setting up a user interface layer, users can control the music playback interface, thereby improving the user experience.

[0049] In an alternative approach, the method further includes:

[0050] The music client drives the first foreground image in the foreground layer to change according to rhythm information, wherein the rhythm information is the rhythm information of the music currently being played by the music client.

[0051] In this way, the foreground image can also change in accordance with the rhythm information, enhancing the user experience.

[0052] In one alternative approach, the rhythm information includes one or more of sound loudness, frequency information, or timbre sensitivity information.
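As a rough illustration of paragraphs [0050] to [0052], the sketch below derives a loudness value from a short block of audio samples and maps it to a scale factor for the foreground image. Using root-mean-square loudness, a "pulse" in image size, and the parameters `base` and `max_boost` are assumptions; the application does not say how the foreground image changes or how loudness is measured.

```python
import math


def rms_loudness(samples):
    """Root-mean-square level of audio samples normalized to the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def foreground_scale(loudness, base=1.0, max_boost=0.1):
    """Map loudness in [0, 1] to a scale factor between base and base + max_boost."""
    clamped = min(max(loudness, 0.0), 1.0)
    return base + max_boost * clamped


frame = [0.5, -0.5, 0.5, -0.5]   # one short block of samples from the current song
loudness = rms_loudness(frame)
scale = foreground_scale(loudness)
```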

[0053] In an alternative embodiment, after the first foreground image is displayed on the foreground layer, the method further includes:

[0054] The music client sends a first request to the server, wherein the first request includes the target image;

[0055] The server performs super-resolution processing on the target image to obtain a super-resolution image, and then performs background and foreground separation processing on the super-resolution image to obtain the second foreground layer information of the super-resolution image.

[0056] The server sends the second foreground layer information and the super-resolution image to the music client.

[0057] The music client determines the third foreground image based on the second foreground layer information and the super-resolution image;

[0058] The music client displays the super-resolution image on the background layer and the third foreground image on the foreground layer.

[0059] In this way, even when the image is not clear, the image sharpness can be adjusted to make the displayed image clearer.

[0060] In an alternative approach, before the music client displays the super-resolution image on the background layer and the third foreground image on the foreground layer, the method further includes:

[0061] The music client responds to receiving an operation instruction for the target image by performing the operation indicated by the operation instruction on the target image, wherein the operation instruction includes one or more of a move instruction, a cropping instruction, or an editing instruction;

[0062] The music client performs the corresponding operation on the super-resolution image and the third foreground image.

[0063] In this way, the image can be adjusted while its sharpness is being adjusted.

[0064] In one alternative approach, the music client provides a player-style settings interface for sending user-uploaded target images to the server, including:

[0065] The music client provides a player-style settings interface to offer super-resolution options after receiving the target image uploaded by the user.

[0066] In response to receiving the trigger command of the super-resolution option, the music client sends a second request to the server, wherein the second request includes the target image;

[0067] Before the server performs background and foreground separation processing on the target image, the method further includes: the server performing super-resolution processing on the target image;

[0068] After the server performs background and foreground separation processing on the target image, the method further includes: sending the super-resolution image to the music client;

[0069] The music client uses the super-resolution image as the target image.

[0070] In this way, even when the image is not clear, the image sharpness can be adjusted to make the displayed image clearer.

[0071] In another aspect, a display system for a music playback interface is provided, the display system including a terminal and a server;

[0072] The terminal is equipped with the music client described in the preceding aspect or any of the optional methods described in the preceding aspect;

[0073] The server is equipped with the server-side described in the preceding aspect or any of the optional methods of the preceding aspect.

[0074] In another aspect, this application provides a computer-readable storage medium storing at least one program instruction, which is loaded and executed by a processor to perform the operations performed by the method for displaying a music playback interface as described in the preceding aspect or any of the alternative methods of the preceding aspect.

[0075] The beneficial effects of the technical solutions provided in this application are:

[0076] The music client offers a user-uploaded image feature. The server separates the foreground and background of the uploaded image, allowing the music client to display the foreground image. In the music playback interface, the client displays the foreground image on a foreground layer, the user-uploaded image on a background layer, and song-related layers between the background and foreground layers. This provides users with personalized customization capabilities, allowing the music playback interface to display user-uploaded images. Furthermore, the foreground and uploaded images are displayed on different layers, increasing the sense of depth and enriching the content of the music playback interface.

Attached Figure Description

[0077] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0078] Figure 1 is a flowchart of the method for displaying a music playback interface provided in an embodiment of this application;

[0079] Figure 2 is a schematic diagram of the player style settings interface provided in an embodiment of this application;

[0080] Figure 3 is a schematic diagram of the pixel values of the mask image provided in the embodiment of this application;

[0081] Figure 4 is a schematic diagram of the mask pattern provided in an embodiment of this application;

[0082] Figure 5 is a schematic diagram of adjusting the foreground image provided in an embodiment of this application;

[0083] Figure 6 is a schematic diagram of the pinch gesture marked with a crop symbol provided in an embodiment of this application;

[0084] Figure 7 is a schematic diagram of the display interface of the music client during the image super-resolution process provided in the embodiment of this application;

[0085] Figure 8 is a schematic diagram illustrating the principle of image super-resolution provided in the embodiments of this application;

[0086] Figure 9 is a schematic diagram of the structure of the terminal provided in an embodiment of this application;

[0087] Figure 10 is a schematic diagram of the server structure provided in an embodiment of this application.

Detailed Implementation

[0088] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0089] With the development of computer and network technologies, music clients are becoming increasingly popular, allowing users to play music anytime, anywhere. During music playback, the music client displays a music playback interface, which includes lyrics and artist information. This interface is relatively basic and does not support user customization, resulting in limited flexibility.

[0090] Based on this, this application provides a method for displaying a music playback interface, which is a music playback interface within a music client. The music playback interface includes a foreground layer, a background layer, and song-related layers. In this method, the music client provides a player style settings interface. Through this settings interface, user-uploaded images can be uploaded to the server. The server separates the foreground and background of the image and returns foreground layer information to the music client. The music client uses the foreground layer information and the target image to determine the foreground image. The music client displays the target image on the background layer, the foreground image on the foreground layer, and the song-related layers between the background layer and the foreground layer. In this way, the music playback interface can not only display user-uploaded images but also display the foreground and background of the image in separate layers, resulting in a better sense of depth. This not only provides users with the possibility of customizing the content of the music playback interface but also enriches the content.

[0091] The execution entities of this method are a music client and a server. The music client is a music application installed on the user's terminal, which includes, but is not limited to, mobile phones, tablets, laptops, or desktop computers. The server is a program that interacts with the music client on the server side.

[0092] The music client includes a music playback interface, which is the interface displayed when music is played. This interface includes a foreground layer, a background layer, a song-related layer, and a user interface (UI) layer. The background layer is placed at the bottom, the foreground layer and the song-related layer are placed in the middle, and the UI layer is placed at the top. Optionally, the song-related layer is used to display content related to the song and / or the record player, such as the album art, the artist's image, or an image related to the song. In one example, the song-related layer includes a record player icon and / or a song icon. The record player icon is also called a record icon, which is the icon of a vinyl record. The song icon can be the album art, the artist's image, or an image related to the song, which can be an image from the song's audio. The song icon can be displayed in the center of the record icon.
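To make the layer order concrete, the sketch below composites a single pixel through the stack from bottom to top. It assumes standard "source-over" alpha compositing with alpha values from 0 to 255, which the application does not prescribe; the function names and colour values are hypothetical.

```python
def over(top_rgba, bottom_rgb):
    """Composite one RGBA pixel over an opaque RGB pixel (source-over)."""
    r, g, b, a = top_rgba
    alpha = a / 255
    return tuple(
        round(t * alpha + u * (1 - alpha))
        for t, u in zip((r, g, b), bottom_rgb)
    )


def composite_pixel(background_rgb, layers_rgba):
    """Draw the layers in order, from the one just above the background to the top."""
    result = background_rgb
    for layer in layers_rgba:
        result = over(layer, result)
    return result


background = (0, 0, 255)           # target image in the background layer
song = (255, 255, 255, 255)        # opaque record icon in the song-related layer
ui = (0, 0, 0, 0)                  # UI layer is transparent at this pixel

# Where the mask marks foreground, the foreground pixel hides the record icon.
inside_foreground = composite_pixel(background, [song, (255, 0, 0, 255), ui])
# Where the mask is 0, the foreground is transparent and the record icon shows.
outside_foreground = composite_pixel(background, [song, (255, 0, 0, 0), ui])
```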

[0093] The following describes the process of displaying the music playback interface, as shown in Figure 1. The processing steps of this method are described in steps 101 to 104.

[0094] Step 101: The music client provides a player-style settings interface for sending the target image uploaded by the user to the server.

[0095] In this embodiment, the music playback interface includes a settings option when it is displayed. Users can trigger this settings option to customize the music playback interface. The music client then navigates to a transition screen, which includes a player-style settings option, sharing options, a sleep timer option, and song information. Users can click the player-style settings option, and the music client then navigates to the next settings screen. This screen includes options such as viewing the currently used music playback interface, custom options, vinyl options, or classic options. Users can click the view option to make the music client navigate to the player-style settings screen. Alternatively, the music playback interface may display the player-style settings option directly; clicking it takes the music client straight to the player-style settings screen.

[0096] As shown in Figure 2, the player's settings interface includes an image addition option. Users can click the image addition button to open their local photo album, where they can select an image, referred to as the target image. The music client then sends the target image to the server.

[0097] Optionally, after selecting a target image, the user can adjust it here. For example, they can adjust the size of the target image to better match the music playback interface, or adjust its position so that the foreground is displayed in the desired location (e.g., where the record player icon is), or adjust the style of the target image. Style refers to the specific characteristics and expressive techniques exhibited in visual arts, including color, composition, lines, texture, and overall atmosphere. Style can reflect an artist's personal style, cultural background, historical period, or specific art movement. For example, styles include, but are not limited to, cartoon style, traditional Chinese style, anime style, or pixel art style. After the user completes the adjustments, the music client sends the adjusted target image to the server.

[0098] Here, when the music playback interface is displayed, the music is either in a playing state or a paused state. That is to say, the music playback interface can be configured either during music playback or when no music is playing.

[0099] In an alternative approach, as shown in Figure 2, the settings options for song-related layers are displayed in the player style settings interface.

[0100] 1. This settings interface displays various styles of turntable baseboards (i.e., styles of vinyl baseboards), including but not limited to: no baseboard, stylus without baseboard, and baseboard with stylus. In Figure 2, the baseboard styles from left to right are no baseboard, stylus without baseboard, and baseboard with stylus. Users can click to select a style, and after selection, the music client displays the effect in this settings interface.

[0101] Optionally, the image addition option and the various turntable baseboard styles can be displayed in a single row.

[0102] 2. This settings interface displays options for turntable color (i.e., vinyl record color) and vibrancy, which users can select.

[0103] 3. This settings interface displays various styles of record player icons (i.e., vinyl style), such as simple semi-transparent and complex semi-transparent. Users can click to select a style, and the music client will display the effect in this settings interface.

[0104] In an alternative approach, as shown in Figure 2, the player style settings interface also displays various English font styles and text formats, allowing users to select the text format and English font style for the music playback interface. Users can click to select. Text formats include, but are not limited to, left alignment and centering.

[0105] In one alternative approach, referring to Figure 2, the player style settings interface also displays various button combination styles, which users can click to select. These button combinations include playback control buttons, and include, but are not limited to, simple, rich, and complete styles. Simple displays fewer buttons than rich, and rich displays fewer buttons than complete.

[0106] In an alternative approach, as shown in Figure 2, the player style settings interface also displays various styles of playback control buttons, which users can select. These playback control buttons include, but are not limited to, pause, fast forward, rewind, or loop buttons.

[0107] In an alternative approach, as shown in Figure 2, the player style settings interface also displays various light effect styles, which users can select. Users can choose a light effect style when a solid color background is displayed. Alternatively, the light effect style option can be hidden from the settings interface after the user has selected to upload a target image.

[0108] In an alternative approach, referring to Figure 2, the player style settings interface also displays options for adjusting the blur and transparency of the target image added by the user. Blur indicates the degree of blur of the target image, and transparency indicates the degree of transparency of the target image. These adjustment options include, but are not limited to, progress bars.

[0109] In an alternative approach, as shown in Figure 2, the user can also select only one background image without displaying it in layers. In the options under "Background", the user can select a background color, add a background image, or select a background image provided by the music client.

[0110] In one alternative approach, as shown in Figure 2, the settings interface also displays the current style, labeled A. When the user makes an adjustment, its effect is displayed in A. After the user has finished setting the music playback interface, they can click the "Done" or "Use" option in the upper right corner, and the music client will return to the music playback interface.

[0111] The content shown in Figure 2 is only an example, and the embodiments of this application are not limited thereto. For example, only a part of the content in Figure 2 may be shown.

[0112] Step 102: The server performs background and foreground separation processing on the target image to obtain the first foreground layer information of the target image, and sends the first foreground layer information to the music client.

[0113] The first foreground layer information includes mask information, and may also include foreground position information. The mask information is also called a mask image, which is usually a grayscale image. The size of the mask image is the same as the size of the target image, meaning the pixel resolution is the same. In some cases, the value of each pixel in the mask image is 255 or 0, making the mask image black and white. See Figure 3(a) for pixel values. See Figure 4 for the mask image; the white area represents the foreground, and the black area represents the background. In other cases, to achieve a smoother transition in the mask image, the values of the pixels at the boundary between the background and foreground of the target image are not necessarily 255 or 0. See Figure 3(b) for pixel values. Here, the value of each pixel ranges from 0 to 255, where 0 represents black and 255 represents white. In another approach, white can also be represented by 1.
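The paragraph above mentions optional foreground position information without defining its form. One plausible form is the bounding box of the foreground pixels in the mask, sketched below; treating the position information as a bounding box, and the function name `foreground_bbox`, are assumptions made for illustration only.

```python
def foreground_bbox(mask, threshold=0):
    """Bounding box (left, top, right, bottom) of mask values above threshold.

    right and bottom are exclusive. Returns None if no pixel is foreground.
    """
    rows = [y for y, row in enumerate(mask) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [x for row in mask for x, v in enumerate(row) if v > threshold]
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


# A black-and-white mask: 255 marks foreground, 0 marks background.
mask = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0,   0, 255, 0],
    [0,   0,   0, 0],
]
bbox = foreground_bbox(mask)
```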

[0114] In this embodiment, after the server receives the target image, there are several ways to separate the foreground and background of the target image. Two feasible methods are provided below:

[0115] In Method 1, for different foreground types, the server uses different matting models to separate the foreground and background. The matting model can be U-Net, which can recognize some foreground types, or other neural network models. The server receives the target image, performs target type recognition on the target image to obtain the type recognition result, obtains the matting model corresponding to the type recognition result, inputs the target image into the matting model, and the matting model outputs the first foreground layer information. For example, if the target type includes human and / or animal types, and the type recognition result is human, then the matting model corresponding to the human type is obtained, the target image is input into the matting model, and the matting model outputs the first foreground layer information, which is the foreground layer information containing human type content.
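The control flow of Method 1 (recognize the type, then pick the matching matting model) can be sketched as a simple dispatch. The classifier and the matting models below are stand-in stubs, and every name is hypothetical; a real server would call trained models here.

```python
def separate_by_type(image, classify, matting_models):
    """Run type recognition, then the matting model registered for that type."""
    recognized_type = classify(image)
    try:
        model = matting_models[recognized_type]
    except KeyError:
        raise LookupError(f"no matting model for type {recognized_type!r}") from None
    return model(image)


# Stubs standing in for trained models (assumptions for illustration only).
def classify_stub(image):
    return "human"


def human_matting_stub(image):
    return {"type": "human", "mask": [[255]]}


def animal_matting_stub(image):
    return {"type": "animal", "mask": [[255]]}


models = {"human": human_matting_stub, "animal": animal_matting_stub}
layer_info = separate_by_type("uploaded.png", classify_stub, models)
```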

[0116] In Method 2, an end-to-end U-Net is used. User-uploaded images are complex and diverse. Most of them contain humans or animals, and although matting models may perform well on such images, they may fail to correctly identify the foreground in scenes without humans. Furthermore, for scenes with complex compositions and unclear foreground subjects (such as images containing both blurry human figures and objects), current matting models may not be able to correctly identify the foreground. Therefore, in this embodiment, an end-to-end U-Net can be used for foreground and background separation. The U-Net architecture includes an encoder and a decoder. The encoder is responsible for gradually reducing the resolution of the input image while extracting high-level semantic features. It typically consists of multiple convolutional and pooling layers. Each convolutional layer uses a 3x3 kernel, and the activation function is usually ReLU (rectified linear unit). Pooling layers are used to reduce the size of the feature map. The decoder is symmetrical to the encoder and is responsible for progressively upsampling the feature map to restore it to the same size as the original image. After each upsampling operation, the decoder concatenates the upsampled feature map with the feature map of the corresponding encoder layer to preserve more spatial information and details. This is the key feature of U-Net, the skip connection, which directly connects feature maps of the same resolution in the encoder to those in the decoder. This connection method helps preserve low-level features, thereby improving the accuracy and precision of segmentation.
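The encoder, decoder, and skip-connection data flow can be illustrated by tracing feature-map shapes, without any learned weights. The sketch below assumes 2x2 pooling, channel counts that double at each encoder level, and convolutions that preserve spatial size; these are common conventions rather than details given in the application, and `depth` and `base_channels` are hypothetical parameters.

```python
def unet_shapes(height, width, depth=3, base_channels=16):
    """Trace (stage, channels, height, width) through a U-Net-style network."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")
    trace, skips = [], []
    channels, h, w = base_channels, height, width

    # Encoder: convolutions keep the size, pooling halves it.
    for level in range(depth):
        trace.append((f"enc{level}", channels, h, w))
        skips.append((channels, h, w))
        channels, h, w = channels * 2, h // 2, w // 2
    trace.append(("bottleneck", channels, h, w))

    # Decoder: upsample, concatenate the encoder map of the same resolution.
    for level in reversed(range(depth)):
        skip_channels, skip_h, skip_w = skips[level]
        h, w = h * 2, w * 2
        if (h, w) != (skip_h, skip_w):
            raise RuntimeError("skip connection resolution mismatch")
        upsampled_channels = channels // 2
        trace.append((f"dec{level}_concat", upsampled_channels + skip_channels, h, w))
        channels = skip_channels  # convolutions after the concatenation
        trace.append((f"dec{level}", channels, h, w))

    trace.append(("mask", 1, h, w))  # single-channel output, same size as the input
    return trace


trace = unet_shapes(256, 256)
```

The final entry has the same height and width as the input, which matches the statement that the decoder restores the feature map to the size of the original image.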

[0117] The server inputs the target image into U-Net, and U-Net outputs a matting mask image, which is the first foreground layer information. This first foreground layer information indicates foreground content including, but not limited to, the target type, such as non-human objects and other objects. The matting mask image is a type of mask image, but compared to a standard mask image, it has a smoother transition at the foreground and background boundary. Using U-Net here eliminates the need to distinguish between people and objects beforehand, shortening the foreground and background separation time and reducing the processing time for a single image to less than one second.

[0118] After obtaining the first foreground layer information, the server sends the first foreground layer information to the music client.

[0119] In one alternative approach, the mask image is post-processed, because the mask image output by U-Net may exhibit jagged edges in certain situations. For example, there might be white borders around the edges of hair. If these white borders happen to be located in an area overlaid with song-related layers, they can negatively impact the user experience. Therefore, before the first foreground layer information is finalized, post-processing is performed on the mask image to improve the display effect in the music client. There are several post-processing methods, and two possible methods are provided below. The two methods can be used together, or only one of them can be used.

[0120] 1. Filter out edges with low confidence based on a certain threshold.

[0121] First, confidence is calculated: each alpha value of the mask image output by U-Net lies in the range [0, 1]. The confidence can be the alpha value itself, or it can be an enhanced alpha value obtained in some way, such as through local contrast enhancement.

[0122] Next, edge filtering is performed: A suitable threshold is chosen, which can be an empirical value such as 0.3. For edges below the threshold, their opacity can be set to 0 (completely transparent), or their opacity can be reduced to a lower value so that they are less noticeable during edge compositing. Then, the updated alpha value is mapped to the mask image to obtain an updated mask image. In this way, the first foreground layer information includes the updated mask image.

[0123] 2. Soften edges based on bilateral filter.

[0124] Bilateral filtering is a non-linear filtering method that reduces noise while preserving edge information. It considers both the spatial distance and color similarity between pixels. Therefore, a bilateral filter has two main parameters: a spatial parameter (controlling the size of the filtering window) and a color parameter (controlling the weight of color similarity). The spatial and color parameters can be set empirically, such as a spatial parameter of 75 and a color parameter of 75.

[0125] The server applies a bilateral filter to the mask image to soften the edges, resulting in a filtered mask image. In this way, the first foreground layer information includes the filtered mask image.
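The two post-processing methods above can be sketched as follows. This is a minimal pure-Python sketch, not the implementation of this embodiment. The function names, the representation of the mask as a 2D list of alpha values in [0, 1], and the `radius`, `sigma_space` and `sigma_range` values are assumptions; in particular, the sigma values here are chosen for alpha values in [0, 1] and are not on the same scale as the example parameters of 75 mentioned above. For simplicity, the threshold is applied to every pixel rather than only to pixels detected as edges.

```python
import math

def filter_low_confidence(alpha, threshold=0.3, reduced=0.0):
    """Replace alpha values below `threshold` with `reduced` (0.0 = transparent)."""
    return [[a if a >= threshold else reduced for a in row] for row in alpha]

def bilateral_filter(alpha, radius=1, sigma_space=1.0, sigma_range=0.25):
    """Naive bilateral filter over a 2D alpha matte with values in [0, 1]."""
    h, w = len(alpha), len(alpha[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = alpha[y][x]
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        v = alpha[ny][nx]
                        # Spatial weight: nearer neighbours count more.
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                        # Range weight: neighbours with similar values count more.
                        wr = math.exp(-((v - center) ** 2) / (2 * sigma_range ** 2))
                        total += ws * wr * v
                        weight_sum += ws * wr
            out[y][x] = total / weight_sum
    return out
```

Passing a non-zero `reduced` value corresponds to lowering the opacity of low-confidence pixels instead of making them completely transparent.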

[0126] This post-processing is performed on the server side. In another implementation, the music client performs the same post-processing on the mask image.

[0127] The above explanation uses the example of the first foreground layer information including a mask image. In another implementation, the first foreground layer information includes the alpha value of each pixel in the target image. In this way, the music client does not need to determine the alpha value based on the mask image.

[0128] Step 103: The music client determines the first foreground image based on the first foreground layer information and the target image.

[0129] In this embodiment, the music client receives first foreground layer information. If the first foreground layer information includes a mask image, the music client uses the mask image to perform alpha blending on the target image to obtain the first foreground image.

[0130] In one alternative approach, each pixel of the image is represented using an RGBA (red, green, blue, alpha) value, where the value of A is called the alpha value. The alpha value ranges from 0 to 1, reflecting transparency. A value of 0 indicates complete transparency, a value of 1 indicates complete opacity, and values between 0 and 1 indicate partial transparency. Values closer to 0 are more transparent, and values closer to 1 are less transparent. The music client uses the pixel value of each pixel in the mask image to determine the alpha value of the corresponding pixel in the target image. The RGB values and alpha values of each pixel in the target image are combined to obtain the RGBA values of each pixel in the first foreground image. For example, in the mask image, if a pixel's value is 0, the alpha value of the corresponding pixel in the target image is set to 0; if a pixel's value is 255, the alpha value is set to 1; and if a pixel's value is a value P between 0 and 255, the alpha value is set to P / 255. In this way, the first foreground image has the same size as the target image, except that its background is fully or partially transparent. Because the two images have the same size, they align better during display.
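The mapping from mask pixel values to alpha values described above can be sketched as follows. This is a minimal sketch that assumes the target image is a 2D list of (R, G, B) tuples and the mask image is a 2D list of 8-bit values of the same size; the function name `apply_mask` is hypothetical.

```python
def apply_mask(target_rgb, mask):
    """Combine RGB pixels with an 8-bit mask into RGBA pixels (alpha in [0, 1])."""
    if len(target_rgb) != len(mask) or any(
        len(rgb_row) != len(mask_row) for rgb_row, mask_row in zip(target_rgb, mask)
    ):
        raise ValueError("target image and mask image must have the same size")
    return [
        # Mask value P maps to alpha P / 255 (0 -> 0.0, 255 -> 1.0).
        [(r, g, b, p / 255) for (r, g, b), p in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(target_rgb, mask)
    ]
```

The result has the same dimensions as the target image, so the foreground layer can be placed directly over the background layer without any offset.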

[0131] In another alternative approach, the music client uses the value of each pixel in the mask image to determine the alpha value of the corresponding pixel in the target image. The RGB values of each pixel in the target image are combined with the alpha value to obtain the pixel values of each pixel in the second foreground image. Using the pixel values of each pixel, the second foreground image is magnified to obtain the first foreground image. In this way, the size of the foreground region in the first foreground image is larger than the size of the foreground in the target image.

[0132] When the first foreground layer information includes alpha values, the RGB values of each pixel in the target image are directly combined with the alpha values to obtain the pixel values of each pixel in the first foreground image. In this case, the first foreground image can also be magnified to obtain a magnified first foreground image.

[0133] When the first foreground layer information includes foreground position information, the music client uses this position information to obtain the pixel values of each pixel in the area indicated by the position information in the target image, thus obtaining the first foreground image. In this method, the first foreground image and the target image have different sizes, so it is necessary to establish a positional correspondence between the first foreground image and the target image so that the first foreground image occludes the foreground area in the background layer.
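As an illustration of the position-based variant, the sketch below assumes that the foreground position information is a bounding box given as (left, top, width, height). The text does not specify the format, so this is an assumption, as is the function name `crop_foreground`. The returned offset is one simple way to keep the positional correspondence between the first foreground image and the target image.

```python
def crop_foreground(target, box):
    """Cut the boxed region out of a 2D pixel grid and return it with its offset."""
    left, top, width, height = box
    if left < 0 or top < 0 or top + height > len(target) or left + width > len(target[0]):
        raise ValueError("box must lie inside the target image")
    foreground = [row[left:left + width] for row in target[top:top + height]]
    # The offset records where the crop sits in the target image, so the
    # foreground layer can be drawn exactly over the same region.
    return foreground, (left, top)
```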

[0134] Step 104: The music client displays the target image on the background layer, displays the first foreground image on the foreground layer, and displays the song-related layers between the background layer and the foreground layer.

[0135] In this embodiment, the music client displays the target image on the background layer according to the pixel values of each pixel in the target image, displays the first foreground image on the foreground layer according to the pixel values of each pixel in the first foreground image, and displays the song-related layers between the background layer and the foreground layer, as shown in Figure 5(a). Furthermore, a preview diagram is provided at the top of Figure 2. After the user confirms that it can be used, they can click the "Use" option to redirect the music client to the music playback interface.

[0136] It should be noted that the background layer can display the target image directly, or can display the target image after the foreground area (the area where the foreground is located in the target image) has been removed from it.

[0137] In an alternative approach, in step 104, to prevent the foreground of the target image in the background layer from being displayed, a first foreground image is used to occlude the foreground. If the first foreground image and the target image are the same size, they overlap. The music client then displays the partially transparent portion of the first foreground image in the target area of the foreground layer. This target area covers the corresponding foreground area of the target image in the background layer. The foreground area is the region where the foreground of the target image is displayed in the background layer. For example, if the mask image is black and white, the foreground area is the region displaying a specified pixel in the target image, where the specified pixel is the pixel with a pixel value of 255 in the mask image corresponding to the target image.

[0138] When the first foreground image only includes the foreground pixels in the target image, the music client displays the first foreground image in the target area of the foreground layer, and the target area covers the foreground area of the target image in the background layer.

[0139] In one alternative approach, in step 104, when displaying layers, the music client can directly display the song-related layer between the foreground layer and the background layer. Alternatively, the music client can first determine the area of the song-related layer that is obscured by the foreground layer; this area is called the occluded area. If the proportion of the occluded area to the area of the song-related layer is less than or equal to a target threshold, the music client displays the song-related layer between the background layer and the foreground layer. If the proportion of the occluded area to the area of the song-related layer exceeds the target threshold, the music client displays the foreground layer between the song-related layer and the background layer. Here, the target threshold can be set based on experience, such as 85%, or set by the user.
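The threshold decision above can be sketched as follows. The sketch assumes that the foreground is given as a 2D grid of alpha values, that the song-related layer occupies a rectangle (left, top, width, height) on that grid, and that a pixel counts as obscured when the foreground alpha is greater than 0. These representations and the function name `layer_order` are assumptions for illustration.

```python
def layer_order(foreground_alpha, song_rect, target_threshold=0.85):
    """Return the bottom-to-top layer order and the obscured proportion."""
    left, top, width, height = song_rect
    obscured = sum(
        1
        for y in range(top, top + height)
        for x in range(left, left + width)
        if foreground_alpha[y][x] > 0
    )
    ratio = obscured / (width * height)
    if ratio <= target_threshold:
        # Song-related layer sits between background and foreground.
        return ["background", "song", "foreground"], ratio
    # Too much is hidden: the foreground moves below the song-related layer.
    return ["background", "foreground", "song"], ratio
```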

[0140] In an alternative approach, to make it easier for users to manage the music playback interface, after displaying the layers in step 104, the music client also provides ways to adjust the positional relationship between layers or the images in the layers.

[0141] In one implementation, the music client also displays a UI layer, which is placed on top. The UI layer displays adjustment options for each layer, or has a single adjustment option, but the selection methods differ for different layers. Users can select a specific layer by triggering the adjustment options. The music client responds to the layer selection command, choosing one of the foreground layer, background layer, or song-related layer as the target layer. After selecting the target layer, the user can perform operations on it. Specifically, the user inputs adjustment commands for the target layer, and the music client adjusts the target layer accordingly. Assuming the image in the target layer is image A, this adjustment includes adjusting the position of image A within the target layer, adjusting the size of image A, or adjusting the positional relationship between the target layer and other layers. For example, the user can zoom in on image A, or move the target layer to the previous or next layer.

[0142] In another alternative approach, to facilitate user management of the music playback interface, after displaying the layers in step 104, adjustment options are displayed on the UI layer. These adjustment options are used to adjust the vertical position relationship between the foreground layer and the song-related layer. After the user triggers the adjustment options, if the foreground layer is displayed above the song-related layer, the foreground layer is moved below the song-related layer; if the foreground layer is below the song-related layer, the foreground layer is moved above the song-related layer.

[0143] In one example, as shown in Figures 5(a) and (b), the adjustment option changes according to the position of the foreground layer. For instance, if the foreground layer is above the song-related layer, the adjustment option is displayed as a downward arrow; when the user clicks the option, the music client displays the foreground image below the song-related layer. If the foreground layer is below the song-related layer, the adjustment option is displayed as an upward arrow; when the user clicks the option, the music client displays the foreground image above the song-related layer.

[0144] In another example, the adjustment option changes according to the position of the song-related layer. For instance, if the song-related layer is below the foreground layer, the adjustment option is displayed as an upward arrow; when the user clicks it, the music client displays the song-related layer above the foreground image. If the song-related layer is above the foreground layer, the adjustment option is displayed as a downward arrow; when the user clicks it, the music client displays the song-related layer below the foreground image.
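The toggle behaviour can be sketched as follows, using the arrow convention of the example in which the option follows the position of the foreground layer. The layer names, the bottom-to-top list representation, and the function names are assumptions for illustration.

```python
def arrow_for(order):
    """Arrow shown on the adjustment option for a bottom-to-top layer order."""
    foreground_on_top = order.index("foreground") > order.index("song")
    return "down" if foreground_on_top else "up"

def toggle_foreground(order):
    """Swap the foreground layer and the song-related layer."""
    i, j = order.index("foreground"), order.index("song")
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped
```

Triggering the option twice returns the layers to their original order, and the background layer never moves.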

[0145] In an alternative approach, as described above, the target image can be adjusted when a user uploads an image. To provide more convenient adjustment functionality, after step 104, the music client can display various adjustment buttons for the target image, such as buttons to adjust size, position, or style. To make a certain adjustment to the target image, the user clicks the corresponding adjustment button. After receiving the adjustment instruction, the music client executes the operation indicated by the instruction on the target image. This operation is called the first operation, and the music client displays the adjusted target image on the background layer. For example, if the target image is enlarged, the music client displays the enlarged target image on the background layer.

[0146] Optionally, after adjusting the target image, in order to keep the first foreground image consistent with the target image, the first foreground image is also adjusted accordingly. The process is as follows:

[0147] After performing a first operation on the target image, the music client determines the operation to be performed on the first foreground image. This operation is called the second operation. The second operation corresponds to the first operation, ensuring that after performing the first operation on the target image, a corresponding operation can also be performed on the first foreground image. The music client performs the second operation on the first foreground image in the foreground layer. For example, if the first operation is to enlarge the target image, the foreground area will also be enlarged after the target image is enlarged. To ensure that the first foreground image covers the corresponding foreground area of the target image in the background layer, the first foreground image is also enlarged proportionally. Another example is if the first operation is to change the style of the target image. After changing the style of the target image, the first foreground image is changed to the same style.

[0148] Here, the second operation is performed directly on the first foreground image. In another process, the first foreground image is redefined using the first foreground layer information and the adjusted target image, and the redefined first foreground image is displayed in the foreground layer.

[0149] Optionally, resizing may not involve buttons, but rather gestures. For example, as shown in Figure 6, the settings interface displays a pinch-to-crop icon, allowing users to crop the displayed image using a pinch gesture. A "Done" option can also be displayed; after cropping, clicking the "Done" option will cause the music client to display the cropped image.

[0150] In one alternative approach, the music client controls the rotation of the turntable icon and / or song icon within the song-related layer. In one example, the rotation speed can be a default value; in another, the rotation speed can be configured by the user. For instance, referring to Figure 2, the player style settings interface displays a turntable speed adjustment option, allowing the user to select the turntable speed. Figure 2 shows three rotation speeds: slow, normal, and fast. This is merely an example, and this application embodiment does not limit the method of adjusting the rotation speed.

[0151] In one alternative approach, to enrich the content of the music playback interface, the music client can make the first foreground image change during music playback in accordance with the rhythm information of the currently playing music.

[0152] Optionally, rhythm information includes one or more of sound loudness, frequency information, or timbre-sensitive information.

[0153] When rhythm information includes sound loudness, where loudness refers to the volume of the sound, the music client determines the maximum or average loudness of the sound across different time segments in the currently playing music. Based on the correspondence between loudness ranges and jitter amplitudes, it determines the loudness range to which the maximum or average loudness belongs, and then determines the jitter amplitude corresponding to that loudness range. During music playback, when the music client reaches the corresponding time segment, it displays the first foreground image according to the jitter amplitude.
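The lookup from loudness to jitter amplitude can be sketched as follows. The loudness ranges (in dB), the jitter amplitudes (in pixels) and the function name are hypothetical values chosen only for illustration; the text does not specify the actual correspondence.

```python
import bisect

# Hypothetical correspondence: upper bounds of loudness ranges (dB) and the
# jitter amplitude (pixels) for each range, including the range above the last bound.
LOUDNESS_UPPER_BOUNDS = [-30.0, -20.0, -10.0]
JITTER_AMPLITUDES = [0.0, 2.0, 4.0, 8.0]

def jitter_for_segment(loudness_samples, use_max=True):
    """Pick the jitter amplitude for one time segment of the current song."""
    if use_max:
        loudness = max(loudness_samples)
    else:
        loudness = sum(loudness_samples) / len(loudness_samples)
    # Find the loudness range that contains the segment's loudness.
    index = bisect.bisect_right(LOUDNESS_UPPER_BOUNDS, loudness)
    return JITTER_AMPLITUDES[index]
```

The amplitudes for all segments could be computed ahead of playback and then applied when playback reaches each segment.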

[0154] When rhythm information includes frequency information, the music client acquires the audio data of the currently playing music. After performing a Fourier transform on this audio data, frequency domain data is obtained. N spectral data sources (N greater than 1, such as N = 16) are sampled from the frequency range to which humans are sensitive (e.g., 20 Hz to 700 Hz). These N spectral data sources are then analyzed, and the one with the highest current amplitude is extracted. The change value of this spectral data source (the difference between the current value and the previous value, where the previous value can be understood as the value at the previous time point) is then normalized in a way that magnifies the visual effect, for example by remapping values in the range 0.35 to 0.7 onto the range 0 to 1. The accumulated change value is then used instead of the time parameter for coloring. The larger the change value, the faster the first foreground image distorts, so that the first foreground image responds sensitively to the fluctuating tempo of the music. Alternatively, this change value can be used as the amplitude of the particle fission effect in the image, causing the first foreground image to vibrate in sync with the music.

[0155] The process of sampling N spectral data sources from a frequency band is as follows: the frequency band is divided into N-1 sub-frequency bands, so that N spectral data sources can be sampled (for example, one at each of the N sub-band boundaries).

[0156] It should be noted that change values are used instead of spectrum values because the visual effect needs more intuitive and sensitive feedback. Spectrum values fluctuate within a stable range, whereas change values produce a more exaggerated effect. Therefore, using change values here results in a better visual effect and a better user experience.
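The frequency-based processing can be sketched as follows, starting from amplitudes that have already been obtained by a Fourier transform. Several points are assumptions made for illustration: the amplitudes are taken to be normalized to [0, 1], the N sampling points are placed at the boundaries of N-1 equal-width sub-bands, the change value is taken as an absolute difference, values outside the remapped range are clamped, and all names are hypothetical.

```python
def sample_frequencies(low_hz, high_hz, n):
    """N sampling points at the boundaries of N-1 equal-width sub-bands."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    step = (high_hz - low_hz) / (n - 1)
    return [low_hz + i * step for i in range(n)]

def remap(value, lo=0.35, hi=0.7):
    """Linearly remap [lo, hi] onto [0, 1], clamping values outside the range."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

class ChangeAccumulator:
    """Accumulates the normalized change of the strongest spectral component."""

    def __init__(self, n):
        self.previous = [0.0] * n
        self.accumulated = 0.0

    def update(self, amplitudes):
        # Index of the spectral data source with the highest current amplitude.
        k = max(range(len(amplitudes)), key=amplitudes.__getitem__)
        change = abs(amplitudes[k] - self.previous[k])
        self.previous = list(amplitudes)
        self.accumulated += remap(change)
        # This value would stand in for the time parameter of the visual effect.
        return self.accumulated
```

Because the accumulated value only advances when the strongest component changes noticeably, an effect driven by it moves quickly during busy passages and nearly stops during steady ones.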

[0157] When the rhythm information includes timbre-sensitive information, the music client obtains the audio data of the currently playing music and uses a timbre extraction model to extract the timbre data. From the timbre data, timbre-sensitive information at each playback time point is extracted, and the changes in the first foreground image are controlled according to this timbre-sensitive information.

[0158] Optionally, the change in the first foreground image can be understood as the presence of an animation effect.

[0159] In one alternative approach, the image used as the background of the music playback interface should ideally have sufficient clarity. Considering that user-uploaded images come from different sources and have varying qualities, when the image clarity is low, super-resolution processing can be applied to improve it, thereby enhancing the display effect of the music playback interface. There are two situations where super-resolution processing can be performed on the image, which will be explained below.

[0160] The first method: After step 104, the UI layer of the player's settings interface displays a super-resolution option. If the user believes the target image is not sharp enough, they can click this option. The music client sends a first request to the server, asking it to perform super-resolution and foreground / background separation processing on the target image. This first request includes the target image. The server performs super-resolution processing on the target image to improve its sharpness, resulting in a super-resolution image. The server then separates the background and foreground of the super-resolution image to obtain foreground layer information, referred to as the second foreground layer information. The server sends the second foreground layer information and the super-resolution image to the music client. Based on this information, the music client determines a third foreground image. The music client displays the super-resolution image on the background layer and the third foreground image on the foreground layer.

[0161] Additionally, if the user deems the target image clear enough, they can click the "Use" option or the "Done" option to redirect the music client to the music playback interface, which is the music playback interface that the user has just set up.

[0162] Optionally, during the super-resolution processing, the user can also manipulate the target image. The player's style settings interface displays operation options. The user can click on these options, and the music client receives the operation instructions and executes them on the target image. These instructions include one or more of the following: moving, cropping, or editing. Moving refers to moving the target image's position; cropping refers to cropping the target image; and editing includes, but is not limited to, adjusting the target image's color or style. After the music client determines the third foreground image, it performs the same operation on both the super-resolution image and the third foreground image. Thus, operations on the target image also apply to the super-resolution image and the third foreground image, eliminating the need for secondary user intervention. For example, after cropping the target image, the corresponding super-resolution image is also cropped.

[0163] The second method involves selecting to perform super-resolution processing on the target image when the user initially uploads it. The process is as follows:

[0164] After a user uploads a target image through a player-style settings interface, a super-resolution option is provided. The user can click this option, and the music client, upon receiving the trigger command, sends a second request to the server. This second request asks the server to perform super-resolution and foreground / background separation processing on the target image, and includes the target image itself. Upon receiving the second request, the server retrieves the target image. Before performing foreground / background separation processing, the server performs super-resolution processing on the target image to obtain a super-resolution image. The server then separates the background and foreground of the super-resolution image to obtain foreground layer information, referred to as the second foreground layer information. The server sends the second foreground layer information and the super-resolution image to the music client. Based on this information, the music client determines a third foreground image. The music client then displays the super-resolution image on the background layer and the third foreground image on the foreground layer.

[0165] Here, the separation process is described in step 102, and the process of determining the third foreground image is described in step 103. These will not be repeated here.

[0166] When the third foreground image is displayed later, it can also change according to the rhythm information, the relationship between layers can also be adjusted, and the super-resolution image can also be adjusted. These processes are described above for the first foreground image and the target image, and will not be repeated here.

[0167] Optionally, during the super-resolution process, the music client can display either the original image or the super-resolution image. For example, as shown in Figure 7, original quality corresponds to the original image, and enhanced quality corresponds to the super-resolution image. Before enhancement, the original quality is displayed; during enhancement, the image in progress is displayed; and after enhancement, the user can choose to display either the original image or the super-resolution image. If the user selects original quality, the original image is displayed; if the user selects enhanced quality, the super-resolution image is displayed. This allows the user to stay informed about the super-resolution progress.

[0168] Alternatively, the server may choose not to send the second foreground layer information to the music client. In that case, the first foreground layer information can be used when determining the third foreground image. However, since using a higher-resolution image for foreground and background separation can make the separation results more accurate, the second foreground layer information can still be sent.

[0169] Optionally, the super-resolution processing procedure is as follows:

[0170] The server uses a super-resolution model to perform super-resolution processing on the target image, obtaining the super-resolution image. As shown in Figure 8, the super-resolution model includes an image encoder, a diffusion denoising model, an adaptive enhancement model, and an image decoder. Before processing with the adaptive enhancement model, the image is also processed to predict the degree of blur and / or noise. The image encoder includes multiple convolutional layers, the adaptive enhancement model is an attention model including convolutional layers, and the diffusion denoising model can adopt any architecture, such as the U-Net diffusion denoising model. The decoding process of the image decoder is the inverse of the encoding process of the image encoder.

[0171] First, the training process of the super-resolution model is described. The training principle of the image encoder is as follows: low-quality images have different distributions than normal images. Using an encoder for normal images will mask the underlying features, thus losing the true expression of low-quality images. Therefore, it is necessary to fine-tune the parameters of the encoder for low-quality images.

[0172] The training principle of the adaptive enhancement model is as follows: In the real world, image degradation varies greatly. Using a uniform image enhancement module will not be enough to enhance strong degradation, resulting in the actual result still not meeting the sharpness requirements, or it will enhance weak degradation too much, resulting in bad cases. Therefore, by combining the adaptive enhancement model, we can deal with various user images and achieve a clear effect after enhancement.

[0173] The training principle of the diffusion denoising model is as follows: Traditional image enhancement models can repair images with a certain degree of loss, but the improvement is not significant for images with substantial loss because they lack generative capabilities and cannot recreate the original appearance of the lost parts. In contrast, the diffusion denoising method has inherent generative capabilities. Trained on large datasets, it achieves good generation results across various scenarios, making it well suited to supplementing the lost parts of low-quality images. Therefore, this application uses the diffusion denoising model as the foundation for its generative capabilities. This part also undergoes weight updates, but these updates are performed using LoRA (low-rank adaptation), which is more lightweight.

[0174] The super-resolution model employs an end-to-end design. High-resolution real-world images are acquired, and processes such as blurring, noise addition, and image compression are applied to them to construct low-quality images that simulate real-world low-quality images. These low-quality images are then input into the super-resolution model for training, yielding predicted images. Backpropagation is then used to update the parameters of the image encoder and the adaptive enhancement model so that each predicted image fits as closely as possible to the high-quality image paired with the corresponding low-quality image. Here, "pairing" means that a low-quality image is paired with the high-quality image from which it was generated.
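The construction of paired training samples can be illustrated with the following toy sketch on 2D grayscale grids. It is not the degradation pipeline of this embodiment: the 3x3 box blur, the Gaussian noise level, and the coarse quantization used as a stand-in for compression artifacts are assumptions, as are all function names and parameter values.

```python
import random

def box_blur(image):
    """3x3 box blur on a 2D grayscale grid, averaging only in-bounds neighbours."""
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            values = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(values) / len(values))
        blurred.append(row)
    return blurred

def degrade(image, noise_sigma=5.0, levels=16, rng=None):
    """Blur, add noise, then quantize (a crude stand-in for compression)."""
    rng = rng or random.Random(0)
    step = 256 // levels
    degraded = []
    for row in box_blur(image):
        new_row = []
        for value in row:
            noisy = value + rng.gauss(0.0, noise_sigma)
            clamped = min(255, max(0, int(round(noisy))))
            new_row.append((clamped // step) * step)
        degraded.append(new_row)
    return degraded

def make_pair(high_quality, rng=None):
    """Return (low-quality input, high-quality target) for one training sample."""
    return degrade(high_quality, rng=rng), high_quality
```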

[0175] Here, the device for training the super-resolution model can be a server with server-side software installed, or it can be another server. After training on another server, it is deployed to the server with server-side software installed.

[0176] The server performs image encoding on the target image to obtain an encoded result, which is a multi-dimensional vector. It also predicts the degree of blur and / or noise in the target image to obtain a prediction result, which indicates the degree of image degradation. Based on the prediction result, the server performs adaptive enhancement processing to obtain an enhanced result, which enhances the image according to the degree of degradation, resulting in good enhancement effects. The server inputs the encoded and enhanced results into a Diffusion denoising model to obtain the denoised result. The server then performs image decoding on the denoised result to recover the image, i.e., the super-resolution image.

[0177] In cases where faces or text are particularly small, the server performs the following processing to improve the quality of the super-resolution image:

[0178] The server performs image decoding on the denoised result to obtain an intermediate image. The server then detects faces and / or text in the intermediate image to determine their locations. The server inputs the target image into a GAN (generative adversarial network) model to obtain its output. For the detected face and text regions, the server fuses the detection result with the output to obtain the fused face and text portions. For other portions, the intermediate image is used. This process yields the super-resolution image.

[0179] The fusion process here can be averaging pixel values, using only the pixel values of faces and text in the output, or weighting the pixel values of faces and text in the detection results and the output, etc.
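The fusion options above can be sketched with a single weight parameter. The sketch assumes that the intermediate image and the GAN output are 2D grayscale grids of the same size and that the detected face and text locations are rectangles given as (left, top, width, height); these representations and the function name `fuse_regions` are assumptions for illustration.

```python
def fuse_regions(intermediate, gan_output, regions, gan_weight=0.5):
    """Blend the GAN output into the intermediate image inside each region."""
    fused = [list(row) for row in intermediate]
    for left, top, width, height in regions:
        for y in range(top, top + height):
            for x in range(left, left + width):
                fused[y][x] = (
                    (1 - gan_weight) * intermediate[y][x]
                    + gan_weight * gan_output[y][x]
                )
    return fused
```

A `gan_weight` of 0.5 corresponds to averaging the pixel values, 1.0 corresponds to using only the pixel values of the output, and any other value gives a weighted combination. Pixels outside the regions keep the values of the intermediate image.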

[0180] In this embodiment of the application, a U-Net for foreground and background separation processing is also provided, and the training process is as follows.

[0181] First, a segmentation model (such as a convolutional neural network model in deep learning) is used to process the input image to identify the main subject area, which is the foreground data we need. This process may involve different tasks such as portrait segmentation and object segmentation. Then, the segmentation mask image is manually annotated to filter out the foreground regions that are considered suitable. The pixel values in the segmentation mask image are either 0 or 255, where 0 corresponds to complete transparency and 255 corresponds to complete opacity. However, to make the edges of the foreground regions smoother, the segmentation mask image is converted into a matting mask image. But manually annotating matting mask images is usually difficult, so here the segmentation mask image is input into a pre-trained model called Matting Anything to obtain the matting mask image. In this way, we obtain the image and the corresponding matting mask image, that is, we obtain the training samples.

[0182] Then, the initial architecture of U-Net is trained using training samples to make the output of U-Net close to the matting mask map in the training samples, so as to obtain the final U-Net.
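The application does not state which loss function is used to bring the U-Net output close to the matting mask. As a hedged sketch, assuming a mean absolute error (L1) objective and masks stored as nested lists, the comparison could look like this. The names `normalize_mask` and `l1_loss` are hypothetical.

```python
from typing import List

Mask = List[List[float]]  # alpha values in the range 0.0 to 1.0


def normalize_mask(mask_255: List[List[int]]) -> Mask:
    """Map 0..255 mask values to alpha values (0 = transparent, 255 = opaque)."""
    return [[value / 255.0 for value in row] for row in mask_255]


def l1_loss(predicted: Mask, target: Mask) -> float:
    """Mean absolute difference between predicted and reference matting masks.

    Assumption: L1 is used only as an example of a 'closeness' measure;
    the source does not name a specific loss.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count
```

A training loop would reduce this value over the training samples; the optimizer and network details are outside the scope of this sketch.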

[0183] Here, the device for training the U-Net can be the server on which the server-side software is installed, or it can be another server. When training is completed on another server, the trained U-Net is deployed to the server on which the server-side software is installed.

[0184] This application also provides a music playback interface display system, which includes a terminal and a server. The music client is installed on the terminal, and the server-side software is installed on the server.

[0185] It should be noted that in Figures 5 to 7, for ease of understanding, the foreground and background are marked, but in actual display, the words for foreground and background will not be displayed on the interface.

[0186] Figure 9 shows a structural block diagram of a terminal 900 provided in an exemplary embodiment of this application. The terminal 900 may be a smartphone, tablet computer, MP3 player (Moving Picture Experts Group Audio Layer III), MP4 player (Moving Picture Experts Group Audio Layer IV), laptop computer, or desktop computer. The terminal 900 may also be referred to as a user device, portable terminal, laptop terminal, desktop terminal, or other names.

[0187] Typically, terminal 900 includes a processor 901 and a memory 902.

[0188] Processor 901 may include one or more processing cores, such as a quad-core processor or an octa-core processor. Processor 901 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 901 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 901 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 901 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0189] The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 902 is used to store at least one program instruction, which is executed by the processor 901 to implement the music playback interface display method provided in the method embodiments of this application.

[0190] In some embodiments, the terminal 900 may also optionally include a peripheral device interface 903 and at least one peripheral device. The processor 901, memory 902, and peripheral device interface 903 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of the following: a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.

[0191] Peripheral device interface 903 can be used to connect at least one I / O (Input / Output) related peripheral device to processor 901 and memory 902. In some embodiments, processor 901, memory 902 and peripheral device interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 901, memory 902 and peripheral device interface 903 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.

[0192] The radio frequency (RF) circuit 904 is used to receive and transmit RF signals, also known as electromagnetic signals. The RF circuit 904 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 904 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 904 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module (SIM) card, etc. The RF circuit 904 can communicate with other terminals through at least one wireless communication protocol. Such wireless communication protocols include, but are not limited to: metropolitan area networks (MANs), various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks (WLANs), and / or WiFi (Wireless Fidelity) networks. In some embodiments, the RF circuit 904 may also include circuitry related to NFC (Near Field Communication), which is not limited in this application.

[0193] Display screen 905 is used to display a UI (User Interface). This UI may include graphics, text, icons, videos, and any combination thereof. When display screen 905 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 901 for processing. In this case, display screen 905 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments, there may be one display screen 905, which serves as the front panel of terminal 900; in other embodiments, there may be at least two display screens 905, respectively disposed on different surfaces of terminal 900 or in a folded design; in still other embodiments, display screen 905 may be a flexible display screen, disposed on a curved or folded surface of terminal 900. Furthermore, display screen 905 may be configured as a non-rectangular, irregular shape, i.e., a non-rectangular screen. Display screen 905 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

[0194] The camera assembly 906 is used to acquire images or videos. Optionally, the camera assembly 906 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the terminal, and the rear-facing camera is located on the back of the terminal. In some embodiments, there are at least two rear-facing cameras, each being any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, so as to achieve background blurring by fusing the main camera and the depth-sensing camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash can be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm-light flash and a cool-light flash, which can be used for light compensation at different color temperatures.

[0195] The audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, converting them into electrical signals that are input to the processor 901 for processing, or to the radio frequency circuit 904 for voice communication. For stereo sound acquisition or noise reduction purposes, multiple microphones may be used, each positioned at a different location on the terminal 900. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into audible sound waves but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 907 may also include a headphone jack.

[0196] The positioning component 908 is used to determine the current geographic location of the terminal 900 in order to enable navigation or LBS (Location Based Service). The positioning component 908 can be a positioning component based on the US GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.

[0197] The power supply 909 is used to power the various components in the terminal 900. The power supply 909 can be AC power, DC power, a disposable battery, or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery can also be used to support fast charging technology.

[0198] In some embodiments, the terminal 900 further includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: an accelerometer 911, a gyroscope 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915, and a proximity sensor 916.

[0199] Accelerometer 911 can detect the magnitude of acceleration along the three coordinate axes of a coordinate system established by terminal 900. For example, accelerometer 911 can be used to detect the components of gravitational acceleration along the three coordinate axes. Processor 901 can control touchscreen 905 to display the user interface in landscape or portrait view based on the gravitational acceleration signal acquired by accelerometer 911. Accelerometer 911 can also be used for games or for acquiring user motion data.
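A minimal sketch of the landscape / portrait decision is shown below. It assumes the gravity components are given along the screen's short (x) and long (y) axes and that the axis with the larger magnitude points downward. The function name `choose_orientation` and this decision rule are assumptions, not details taken from the application.

```python
def choose_orientation(gx: float, gy: float) -> str:
    """Pick a view from the gravity components along the screen axes.

    gx: gravity component along the screen's short (x) axis.
    gy: gravity component along the screen's long (y) axis.
    Assumption: whichever axis carries more of the gravity vector
    is the one currently pointing toward the ground.
    """
    return "portrait" if abs(gy) >= abs(gx) else "landscape"
```

For example, a device held upright reports most of the gravitational acceleration on the y axis, so the sketch returns the portrait view.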

[0200] The gyroscope 912 can detect the orientation and rotation angle of the terminal 900. The gyroscope 912, in conjunction with the accelerometer 911, can collect the user's 3D movements on the terminal 900. Based on the data collected by the gyroscope 912, the processor 901 can perform the following functions: motion sensing (e.g., changing the UI based on the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

[0201] The pressure sensor 913 can be disposed on the side bezel of the terminal 900 and / or on the lower layer of the touch display screen 905. When the pressure sensor 913 is disposed on the side bezel of the terminal 900, it can detect the user's grip signal on the terminal 900, and the processor 901 can perform left / right hand recognition or quick operation based on the grip signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed on the lower layer of the touch display screen 905, the processor 901 can control the operable controls on the UI interface based on the user's pressure operation on the touch display screen 905. The operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.

[0202] The fingerprint sensor 914 is used to collect the user's fingerprint. The processor 901 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 itself identifies the user's identity based on the collected fingerprint. When the user's identity is identified as trusted, the processor 901 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 914 can be located on the front, back, or side of the terminal 900. When the terminal 900 has physical buttons or a manufacturer's logo, the fingerprint sensor 914 can be integrated with the physical buttons or manufacturer's logo.

[0203] An optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 can control the display brightness of the touch screen 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch screen 905 is decreased. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 based on the ambient light intensity collected by the optical sensor 915.
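The brightness rule above (brighter surroundings, brighter display) can be illustrated with a simple linear mapping. The function name, the lux range, and the brightness limits below are assumed values chosen for the sketch; the application does not specify them.

```python
def brightness_for_ambient(lux: float, min_level: float = 0.1,
                           max_level: float = 1.0,
                           max_lux: float = 1000.0) -> float:
    """Map ambient light intensity to a display brightness level.

    Assumptions: brightness is a fraction between min_level and
    max_level, and the mapping is linear up to max_lux, after which
    brightness stays at max_level.
    """
    clamped = max(0.0, min(lux, max_lux))
    return min_level + (max_level - min_level) * (clamped / max_lux)
```

A real terminal would likely smooth the sensor readings before applying such a mapping, which this sketch omits.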

[0204] The proximity sensor 916, also known as a distance sensor, is typically located on the front panel of the terminal 900. The proximity sensor 916 is used to detect the distance between the user and the front of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 is gradually decreasing, the processor 901 controls the touchscreen display 905 to switch from a screen-on state to a screen-off state; when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 is gradually increasing, the processor 901 controls the touchscreen display 905 to switch from a screen-off state to a screen-on state.

[0205] Those skilled in the art will understand that the structure shown in Figure 9 does not constitute a limitation on the terminal 900, and that the terminal 900 may include more or fewer components than shown, combine certain components, or use a different component arrangement.

[0206] Figure 10 is a schematic diagram of a server structure provided in an embodiment of this application. The server 1000 can vary significantly depending on its configuration or performance. It may include one or more central processing units (CPUs) 1001 and one or more memories 1002. The memories 1002 store at least one program instruction, which is loaded and executed by the CPUs 1001 to implement the music playback interface display method provided in the above-described method embodiments. Of course, the server may also have wired or wireless network interfaces, a keyboard, and input / output interfaces for input and output. The server may also include other components for implementing device functions, which will not be elaborated here.

[0207] In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including program instructions that can be executed by a processor in a terminal or server to complete the display method of the music playback interface in the above embodiments. This computer-readable storage medium can be non-transitory. For example, the computer-readable storage medium can be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.

[0208] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0209] In the embodiments of this application, "A and / or B" includes three cases: A, B, and A and B.

[0210] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for displaying a music playback interface, applied to a music client, characterized in that, The music playback interface includes a foreground layer, a background layer, and song-related layers; the method includes: The music client provides a player-style settings interface for sending user-uploaded target images to the server. The server performs background and foreground separation processing on the target image to obtain the first foreground layer information of the target image, and sends the first foreground layer information to the music client; The music client determines the first foreground image based on the first foreground layer information and the target image; The music client displays the target image on the background layer, the first foreground image on the foreground layer, and the song-related layer between the background layer and the foreground layer.

2. The method according to claim 1, characterized in that, Displaying the first foreground image in the foreground layer includes: The music client displays the first foreground image in the target area of the foreground layer, or displays the partially transparent portion of the first foreground image in the target area, wherein the target area covers the foreground area corresponding to the target image in the background layer.

3. The method according to claim 1 or 2, characterized in that, The first foreground layer information includes mask information; The music client determines the first foreground image based on the first foreground layer information and the target image, including: The music client uses the mask information to perform blending processing on the target image to obtain the first foreground image.

4. The method according to claim 3, characterized in that, The music client uses the mask information to perform blending processing on the target image to obtain the first foreground image, including: The music client uses the value at each position in the mask information to determine the alpha value of the corresponding pixel in the target image, and combines the RGB value of each pixel in the target image with the alpha value to obtain the pixel value of each pixel in the first foreground image; or, The music client uses the value at each position in the mask information to determine the alpha value of the corresponding pixel in the target image, combines the RGB value and alpha value of each pixel in the target image to obtain the pixel value of each pixel in the second foreground image, and uses the pixel value of each pixel in the second foreground image to enlarge the second foreground image to obtain the first foreground image.

5. The method according to any one of claims 1 to 4, characterized in that, The server performs background and foreground separation processing on the target image to obtain the first foreground layer information of the target image, including: The server performs target type identification on the target image to obtain a type identification result; the server performs separation processing on the target image based on the type identification result to obtain first foreground layer information containing content of the target type; or, The server inputs the target image into a U-Net to obtain first foreground layer information containing content of the target type; the U-Net is used to identify content of the target type. The target types include human types and / or animal types.

6. The method according to any one of claims 1 to 5, characterized in that, The method further includes: If the proportion of the occlusion area to the area of the song-related layer exceeds a target threshold, the music client will display the foreground layer between the song-related layer and the background layer, wherein the occlusion area is the area of the song-related layer occluded by the foreground layer.

7. The method according to any one of claims 1 to 6, characterized in that, The method further includes: The music client responds to a layer selection command by selecting one of the foreground layer, the background layer, and the song-related layer as the target layer. The music client responds to the adjustment command for the target layer and adjusts the target layer accordingly.

8. The method according to claim 7, characterized in that, The adjustment process includes adjusting the position or size of the image within the layer, or adjusting the position between layers.

9. The method according to any one of claims 1 to 8, characterized in that, The method further includes: In response to receiving an adjustment instruction for the target image, the music client performs the first operation indicated by the adjustment instruction on the target image in the background layer; The adjustment commands include commands to adjust size, adjust position, or adjust style.

10. The method according to claim 9, characterized in that, The method further includes: In response to performing the first operation on the target image, the music client determines a second operation to be performed on the first foreground image, wherein the second operation corresponds to the first operation; The second operation is performed on the first foreground image in the foreground layer.

11. The method according to any one of claims 1 to 10, characterized in that, The song-related layers include a record player icon and / or a song icon; The method further includes: The music client controls the rotation of the record player icon and / or song icon in the song-related layer.

12. The method according to any one of claims 1 to 11, characterized in that, The music playback interface also includes a user interface layer; The method further includes: The music client displays the user interface layer at the top of the music playback interface.

13. The method according to any one of claims 1 to 12, characterized in that, The method further includes: The music client drives the first foreground image in the foreground layer to change according to rhythm information, wherein the rhythm information is the rhythm information of the music currently being played by the music client.

14. The method according to claim 13, characterized in that, The rhythm information includes one or more of the following: sound loudness, frequency information, or timbre sensitivity information.

15. The method according to any one of claims 1 to 14, characterized in that, After the first foreground image is displayed on the foreground layer, the method further includes: The music client sends a first request to the server, wherein the first request includes the target image; The server performs super-resolution processing on the target image to obtain a super-resolution image, and then performs background and foreground separation processing on the super-resolution image to obtain the second foreground layer information of the super-resolution image; The server sends the second foreground layer information and the super-resolution image to the music client; The music client determines the third foreground image based on the second foreground layer information and the super-resolution image; The music client displays the super-resolution image on the background layer and the third foreground image on the foreground layer.

16. The method according to claim 15, characterized in that, Before the music client displays the super-resolution image on the background layer and the third foreground image on the foreground layer, the method further includes: In response to receiving an operation instruction for the target image, the music client performs the operation indicated by the operation instruction on the target image, wherein the operation instruction includes one or more of a move instruction, a cropping instruction, or an editing instruction; The music client performs, on the super-resolution image and the third foreground image, the operation corresponding to the operation instruction.

17. The method according to any one of claims 1 to 16, characterized in that, The music client provides a player-style settings interface for sending user-uploaded target images to the server, including: The music client provides a player-style settings interface to offer super-resolution options after receiving the target image uploaded by the user. In response to receiving the trigger command of the super-resolution option, the music client sends a second request to the server, wherein the second request includes the target image; Before the server performs background and foreground separation processing on the target image, the method further includes: the server performing super-resolution processing on the target image; After the server performs background and foreground separation processing on the target image, the method further includes: sending the super-resolution image to the music client; The music client uses the super-resolution image as the target image.

18. A display system for a music playback interface, characterized in that, The display system includes a terminal and a server; The terminal is equipped with a music client as described in any one of claims 1 to 17; The server is equipped with the server-side software as described in any one of claims 1 to 17.

19. A computer-readable storage medium, characterized in that, The storage medium stores at least one program instruction, which is loaded and executed by a processor to perform the operation performed by the display method of the music playback interface as described in any one of claims 1 to 17.
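For illustration only, the mask-based blending of claim 4 (first alternative) and the occlusion check of claim 6 can be sketched as follows. The data layout (nested lists of RGB tuples, a 0 to 255 mask, a rectangular box for the song-related layer), the function names, and the default threshold are all assumptions made for this sketch and are not specified by the claims.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right are exclusive


def apply_mask(target: List[List[RGB]], mask: List[List[int]]) -> List[List[RGBA]]:
    """Use each mask value (0..255) as the alpha value of the matching pixel."""
    return [
        [(r, g, b, alpha) for (r, g, b), alpha in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(target, mask)
    ]


def occlusion_ratio(mask: List[List[int]], layer_box: Box) -> float:
    """Fraction of the song-related layer's box covered by visible foreground pixels.

    Assumption: any pixel with a non-zero mask value counts as occluding.
    """
    top, left, bottom, right = layer_box
    area = (bottom - top) * (right - left)
    covered = sum(
        1
        for y in range(top, bottom)
        for x in range(left, right)
        if mask[y][x] > 0
    )
    return covered / area


def foreground_above_song_layer(mask: List[List[int]], layer_box: Box,
                                threshold: float = 0.5) -> bool:
    """Return False when the occluded proportion exceeds the threshold,
    meaning the foreground layer should be placed below the song-related layer."""
    return occlusion_ratio(mask, layer_box) <= threshold
```

In this sketch, a result of `False` corresponds to the case in claim 6 where the foreground layer is displayed between the song-related layer and the background layer.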