Audio content publishing method, device, equipment, storage medium and program product

By synthesizing and publishing first and second audio content in music applications, the limitations of user interaction and communication are solved, enabling collaborative creation and interaction among users and enhancing communication and competition.

CN122245281A · Pending · Publication Date: 2026-06-19 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2022-01-04
Publication Date: 2026-06-19


IPC: G10L13/02; G10L13/08; G10H1/00; G06F40/30; G06F9/451

CPC: G10L13/02; G10L13/08; G06F40/30; G06F9/451; G10H2220/096; G10H2220/106; G10H2220/116; G10H2210/101

AI Tagging

Application Domain

Electrophonic musical instruments; Semantic analysis




Abstract

This application discloses a method, apparatus, device, storage medium, and program product for publishing audio content, relating to the fields of computer and internet technology. The method includes: displaying a creation interface corresponding to first audio content, the creation interface displaying relevant information about the first audio content; adding second audio content to the creation interface in response to a content addition operation; and displaying the publication result of the first created content in response to a content publication operation. This application provides a technical solution for creating and publishing audio content based on audio content published by others, enabling collaborative creation of audio content among users and enhancing communication and interaction between users. Compared to choral singing, the first and second audio contents in this application are two independent audio contents, allowing users to independently appreciate and evaluate the two audio contents for comparison, thus enhancing competition and communication among users.


Description

[0001] This application is a divisional application of the invention patent application filed on January 4, 2022, with application number 202210002252.2 and titled "Method, Apparatus, Device, Storage Medium and Program Product for Publishing Audio Content".

Technical Field

[0002] This application relates to the fields of computer and internet technology, and in particular to a method, apparatus, device, storage medium, and program product for publishing audio content.

Background Technology

[0003] Some music apps have the function of recording a user's own singing and publishing the recorded song for others to listen to.

[0004] Users can sing and record songs using the accompaniment provided in music apps. After recording, the music app will generate an audio file of the recorded song. Users can then publish the recorded song to personal communities and other places for others to listen to.

[0005] However, the aforementioned music apps offer relatively limited functionality and restrict user interaction.

Summary of the Invention

[0006] This application provides a method, apparatus, device, storage medium, and program product for publishing audio content. The technical solution is as follows:

According to one aspect of the embodiments of this application, a method for publishing audio content is provided, the method comprising:

displaying a creation interface corresponding to first audio content, the creation interface displaying relevant information about the first audio content, the first audio content being already created audio content;

in response to a content addition operation, adding audio data of second audio content to the creation interface, the second audio content and the first audio content being two independent audio files; and

in response to a content publishing operation, displaying a publishing result of first created content, the first created content including the first audio content and the second audio content.

[0007] According to one aspect of the embodiments of this application, an audio content publishing apparatus is provided, the apparatus comprising:

a creation interface display module, used to display the creation interface corresponding to the first audio content, the creation interface displaying relevant information about the first audio content, the first audio content being already created audio content;

an audio creation module, used to add second audio content to the creation interface in response to a content addition operation, the second audio content and the first audio content being two independent audio files; and

a creation and publishing module, used to display, in response to a content publishing operation, the publishing result of the first created content, the first created content including the first audio content and the second audio content.

[0008] According to one aspect of the embodiments of this application, a terminal device is provided, the terminal device including a processor and a memory, the memory storing at least one instruction, at least one program, code set or instruction set, the at least one instruction, the at least one program, the code set or instruction set being loaded and executed by the processor to implement the above-described method for publishing audio content.

[0009] According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, code set or instruction set, the at least one instruction, the at least one program, the code set or instruction set being loaded and executed by a processor to implement the above-described method for publishing audio content.

[0010] According to one aspect of the embodiments of this application, a computer program product is provided, the computer program product including computer instructions stored in a computer-readable storage medium, and a processor reading from the computer-readable storage medium and executing the computer instructions to implement the above-described method for publishing audio content.

[0011] The technical solutions provided in this application embodiment may have the following beneficial effects: This application provides a technical solution for creating and publishing audio content based on audio content published by others. By adding audio data and text data of a second audio content to the creation interface corresponding to the first audio content, the first and second audio content are combined into a single creative content and published. This enables the collaborative creation of audio content between users and enhances communication and interaction between users.

[0012] Meanwhile, compared with choral singing, the first audio content and the second audio content in the technical solution provided in this application are two independent audio contents. Users can appreciate and evaluate the two audio contents independently to compare their merits and demerits, thereby enhancing competition and communication among users.

Attached Figure Description

[0013] Figure 1 is a schematic diagram of the implementation environment of a solution provided in one embodiment of this application;
Figure 2 is a flowchart of an audio content publishing method provided in one embodiment of this application;
Figure 3 is a schematic diagram of a continuation interface provided in one embodiment of this application;
Figure 4 is a schematic diagram of the publishing result interface provided in one embodiment of this application;
Figure 5 is a flowchart of an audio content publishing method provided in another embodiment of this application;
Figure 6 is a schematic diagram of a content selection interface provided in one embodiment of this application;
Figure 7 is a schematic diagram of a listening interface provided in one embodiment of this application;
Figure 8 is a flowchart of an audio content publishing method provided in another embodiment of this application;
Figure 9 is a schematic diagram of a rewrite interface provided in one embodiment of this application;
Figure 10 is a flowchart of an audio content publishing method provided in another embodiment of this application;
Figure 11 is a schematic diagram of a content publishing interface provided in one embodiment of this application;
Figure 12 is a flowchart of an audio content publishing method provided in another embodiment of this application;
Figure 13 is a schematic diagram of a collaborative creation room provided in one embodiment of this application;
Figure 14 is a schematic diagram of a personal information interface provided in one embodiment of this application;
Figure 15 is a schematic diagram of a ranking interface provided in one embodiment of this application;
Figure 16 is a schematic diagram of an IM system interface provided in one embodiment of this application;
Figure 17 is a schematic diagram of the interface of a livehouse provided in one embodiment of this application;
Figure 18 is a flowchart of a "like" function provided in one embodiment of this application;
Figure 19 is a flowchart illustrating a search process provided in one embodiment of this application;
Figure 20 is a schematic diagram of a search interface provided in one embodiment of this application;
Figure 21 is a block diagram of an audio content generation apparatus provided in one embodiment of this application;
Figure 22 is a block diagram of an audio content generation apparatus provided in another embodiment of this application;
Figure 23 is a schematic diagram of the structure of a terminal device provided in one embodiment of this application.

Detailed Implementation

[0014] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0015] Please refer to Figure 1, which illustrates an implementation environment provided by one embodiment of this application. The implementation environment can be implemented as an audio content processing system and may include a terminal device 10 and a server 20.

[0016] The audio content processing system uses terminal device 10 and server 20 to realize functions such as audio content creation, publishing, and storage.

[0017] Terminal device 10 can be an electronic device such as a mobile phone, tablet computer, PC (Personal Computer), wearable device, in-vehicle terminal device, VR (Virtual Reality) device, or AR (Augmented Reality) device; this application is not limited thereto. A client of the target application can be installed and run on terminal device 10. For example, the target application can be an audio content processing application or another application with audio content processing capabilities. Optionally, the target application is an application with audio content generation capabilities, such as a music application, social application, live streaming application, or audio editing application; this application is not limited thereto.

[0018] The following description uses an application with audio content generation capabilities as an example. The client of the target application has the function of generating audio content. Users can generate audio content by uploading data such as audio and lyrics, or create new content based on audio content from others (for example, by continuing it).

[0019] Server 20 can be a standalone physical server, a server cluster or distributed system consisting of multiple physical servers, or a cloud server providing cloud computing services. Server 20 can be the backend server for the aforementioned target application, providing backend services to the client of the target application. Server 20 is used to provide audio content to users, offering corresponding audio content based on user searches and interests, and can also store user-published audio content in a database.

[0020] Optionally, the terminal device 10 runs a client of the target application, and the server 20 can be a server for the target application. The target application can be a separately developed independent app, a mini-program, or a web application, etc., and this application does not limit this. Optionally, when the target application is a separately developed independent app, the type of audio content in the independent app can be rap, pop music, instrumental music, poetry recitation, etc. Optionally, when the independent app is a rap app, users can obtain rap audio content through the rap app. Simultaneously, users can publish audio content (e.g., a verse) or further create content from others' audio content (e.g., a cypher).

[0021] Please refer to Figure 2, which illustrates a flowchart of an audio content publishing method provided in one embodiment of this application. The method may be executed by the terminal device 10 shown in Figure 1; for example, each step may be performed by a client of the target application. The method may include at least one of the following steps (210-230):

Step 210: Display the creation interface corresponding to the first audio content. The creation interface displays relevant information about the first audio content, which is already created audio content.

[0022] The first audio content is used for audio playback and includes at least one segment of audio data. Optionally, the first audio content is generated from text data and its corresponding audio data. The text data is composed of words, such as lyrics, poems, or articles. The audio data is the audio corresponding to the text data, such as a song, poetry recitation, rap, crosstalk, or stand-up comedy. Optionally, the first audio content is generated solely from audio data; this application does not limit the composition of the first audio content.

[0023] The creation interface corresponding to the first audio content is a user interface in which users create content based on the first audio content. Optionally, creating based on the first audio content can be a continuation of the first audio content. A continuation refers to the user producing subsequent audio content based on the relevant information about the first audio content and adding it after the first audio content. In this application, creation based on the first audio content is described using continuation as an example. Optionally, creating based on the first audio content can also be supplementing what precedes the first audio content, which refers to the user producing preceding audio content based on the relevant information and adding it before the first audio content. This application does not limit this. In the embodiments of this application, creation refers to the creation of audio data of audio content. Optionally, the creation interface corresponding to the first audio content displays relevant information about the first audio content, including but not limited to at least one of the following: the name of the first audio content, playback controls, tag information, text data, and audio data.

[0024] Step 220: In response to the content addition operation, add a second audio content in the creation interface. The second audio content and the first audio content are two independent audio files.

[0025] The content addition operation is performed by the user to add second audio content to the creation interface. Optionally, adding second audio content to the creation interface refers to adding the audio data of the second audio content.

[0026] In response to the user's content addition action, the client displays the audio data of the added second audio content in the creation interface. This second audio content is a different audio content from the first audio content. The audio data of the second audio content is generated when the user performs the creation action.

[0027] Optionally, the so-called second audio content and first audio content are two independent audio files, meaning that these two audio contents meet at least one of the following conditions: 1. The first audio content and the second audio content are two audio files with different semantic meanings; 2. The first audio content and the second audio content are two audio files with different melodies; 3. The first audio content and the second audio content are two audio files from different sources.

[0028] In some embodiments, the first audio content and the second audio content are two audio files with different semantic expressions. The aforementioned semantic expression refers to the semantic information displayed by the first audio content and the second audio content, such as descriptions of people, animals, things, and events. For example, the semantic expression of the first audio content might be a declaration of love to someone, while the semantic expression of the second audio content might be a critique of the lack of environmental protection.

[0029] In some embodiments, the first audio content and the second audio content are two audio files with different audio melodies. The audio melody refers to the beat, rhythm, and the like of the audio data; the first and second audio content are generated with different beats and rhythms. For example, the first audio content may be in 3/4 time while the second audio content may be in 3/8 time, so the beats of the two differ.

[0030] In some embodiments, the first audio content and the second audio content are two audio files from different sources. The source refers to the creator of the audio content; the first and second audio content are created by two different authors. For example, the first audio content was created by user A, and the second audio content was created by user B.
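The three optional conditions above (different semantics, different melody, different source) can be read as a simple predicate over two content records. The following is a minimal sketch under stated assumptions: the `AudioContent` record, its fields, and both function names are hypothetical, and a time-signature string is used only as a stand-in for the beat and rhythm of the melody. The application does not prescribe any such data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioContent:
    """Hypothetical record describing one piece of audio content."""
    semantics: str       # what the content expresses
    time_signature: str  # assumed stand-in for the melody (beat and rhythm)
    author: str          # source, i.e. the creator


def differing_aspects(first: AudioContent, second: AudioContent) -> List[str]:
    """Return the names of the aspects in which the two contents differ."""
    aspects = []
    if first.semantics != second.semantics:
        aspects.append("semantics")
    if first.time_signature != second.time_signature:
        aspects.append("melody")
    if first.author != second.author:
        aspects.append("source")
    return aspects


def meets_independence_condition(first: AudioContent, second: AudioContent) -> bool:
    """True if at least one of the three listed conditions holds."""
    return len(differing_aspects(first, second)) > 0
```

Returning the list of differing aspects, rather than only a boolean, makes it easy to see which of the optional conditions a given pair satisfies.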

[0031] Because the creation scenarios of the first and second audio content differ (for example, the first audio content is created by user A and the second by another user B), the expressive semantics, audio melody, and source of the two audio contents can be different. However, in some cases, one or more of the expressive semantics, audio melody, and source of the first and second audio content may be the same, and this application does not limit this.

[0032] In some embodiments, when a complete song file is split into two independent audio files, these two audio files share the same expressive semantics, the same audio melody, and the same source. However, in this application, the first and second audio content do not need to meet these conditions. A newly created audio file can be generated simply by adding second audio content to the existing first audio content. For example, in the scenario where the first audio content is a verse, adding second audio content (another verse) to the first audio content generates a newly created cypher. Here, the first and second verses are two independent audio files.

[0033] In some embodiments, the audio data of the second audio content is audio data selected by the user from a plurality of audio data to be added. Optionally, a first add control is displayed in the creation interface, and the content addition operation is an operation on the first add control. Step 220 may include the following sub-steps: in response to the operation on the first add control, displaying a plurality of audio content to be added; in response to the selection operation on the second audio content among the plurality of audio content to be added, adding the second audio content to the creation interface.

[0034] The first add control is used to display multiple audio contents to be added. Optionally, the first add control can be a button, a text box, or an area; this application does not limit its implementation form. For example, in the creation interface, the first add control is an area. Clicking any point in this area displays multiple candidate audio data to be added, and the audio data of the second audio content selected by the user is added to the creation interface.

[0035] Taking continuation as an example, Figure 3 shows an exemplary diagram of the continuation interface corresponding to the first audio content. Area 31 in Figure 3 displays the text data of the first audio content, and the user can listen to the first audio content using the playback control 32 in area 31. Figure 3 also displays the first add control 33. When the user clicks the first add control 33, multiple candidate audio data are displayed. The user selects the audio data of the second audio content from the candidates, and that audio data is added to the continuation interface and displayed in the first add control 33.

[0036] In some embodiments, the audio data of the second audio content is audio data recorded by the user in the content recording interface. Optionally, a second add control is displayed in the creation interface, and the content add operation is an operation on the second add control. Step 220 may include the following sub-steps: in response to the operation on the second add control, displaying the content recording interface; obtaining the second audio content recorded in the content recording interface; in response to the recording completion operation of the second audio content, adding the second audio content to the creation interface.

[0037] The second add control is used to display the content recording interface. Optionally, the second add control can be a button, a text box, or an area; this application does not limit its implementation form. For example, in the creation interface, the second add control is an area; clicking any point in this area displays the content recording interface. The content recording interface is the interface in which the user records audio data. For example, in the content recording interface, audio data of the second audio content is generated from the user's voice input; the client obtains the audio data of the second audio content and adds it to the creation interface.

[0038] By providing two different methods for uploading audio data, users can directly upload recorded audio data through the control, or upload audio data through real-time recording, thus enriching the product's functionality and enhancing its versatility.
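The two ways of adding the second audio content (selecting from candidates through the first add control, or recording through the second add control) can be sketched as two handlers that update the same interface state. This is only an illustration: `CreationInterface`, `add_by_selection`, and `add_by_recording` are hypothetical names, audio data is represented as raw bytes, and the recording step is modelled as a caller-supplied function rather than a real microphone API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CreationInterface:
    """Hypothetical model of the creation interface state."""
    first_audio_id: str                   # identifies the already created content
    second_audio: Optional[bytes] = None  # audio data added by the user


def add_by_selection(ui: CreationInterface, candidates: List[bytes], index: int) -> None:
    """First add control: candidates are shown, then the selected one is added."""
    ui.second_audio = candidates[index]


def add_by_recording(ui: CreationInterface, record: Callable[[], bytes]) -> None:
    """Second add control: the recording interface runs, then its result is added."""
    ui.second_audio = record()
```

Either path leaves the first audio content untouched, which matches the description of the two contents as independent audio files.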

[0039] Step 230: In response to the content publishing operation, display the publishing result of the first created content, which includes the first audio content and the second audio content.

[0040] The publication result indicates whether the first created content was published successfully. The publication result includes a successful publication result and a failed publication result. A successful publication result indicates that the first created content was published successfully, and a failed publication result indicates that the first created content was not published. Specifically, if the publication is successful, the successful publication result is displayed in the client; if the publication fails, the failed publication result is displayed in the client application.

[0041] The content publishing operation is performed by the user to publish the first created content. For example, when a user initiates a content publishing operation, the client merges the first audio content and the second audio content to generate the first created content. The client then sends a content storage request containing the first created content to the server, and displays the publishing result according to whether the server responds to that request in time. Optionally, when a user initiates a content publishing operation, the client instead sends the second audio content together with the identification information of the first audio content to the server; the identification information instructs the server to merge the first audio content and the second audio content into the first created content. A timer is set in the client and starts counting after the client sends the data to the server. The data can be either the first created content or the second audio content; this application does not limit this. If the server responds within a preset time and indicates that storage of the first created content is complete, the client displays a successful publishing result. If the server does not respond within the preset time, the client displays a failed publishing result.
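The timer-based decision described above can be sketched with a worker thread and a bounded wait. This is a minimal illustration, not the client's actual implementation: `publish`, `fast_server`, and `slow_server` are hypothetical names, the server call is simulated by a local function, and treating a timely reply that does not confirm storage as a failure is an assumption the text does not spell out.

```python
import concurrent.futures
import time
from typing import Callable


def publish(send_to_server: Callable[[], bool], timeout_s: float) -> str:
    """Return "success" if storage is confirmed within timeout_s, else "failure"."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(send_to_server)  # the timer starts once data is sent
    try:
        stored = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "failure"  # no response within the preset time
    finally:
        executor.shutdown(wait=False)  # do not block the UI on a late reply
    # Assumption: a timely reply that does not confirm storage also counts as failure.
    return "success" if stored else "failure"


def fast_server() -> bool:
    """Simulated server that confirms storage immediately."""
    return True


def slow_server() -> bool:
    """Simulated server that replies after the client has given up."""
    time.sleep(0.3)
    return True
```

With these simulated servers, `publish(fast_server, 1.0)` yields the success result and `publish(slow_server, 0.05)` yields the failure result, after which the user could retry.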

[0042] Optionally, the first created content is synthesized by combining the first audio content and the second audio content through a client or server. For example, if the second audio content is a continuation of the first audio content, the client or server synthesizes the first and second audio content into the first created content. In this first created content, the first audio content is played first, followed by the second audio content. The first created content is a single, unified whole. Alternatively, the first created content is not a complete whole. If the second audio content is a continuation of the first audio content, the first created content consists of both the first and second audio content. In this case, the first and second audio content are two independent audio contents, and the first created content plays the first audio content first, followed by the second audio content.
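The two alternatives above (a single merged whole, or two independent contents played in order) differ in representation but not in playback order. The sketch below illustrates this under stated assumptions: `Audio`, `synthesize_merged`, `synthesize_as_pair`, and `playback_order` are hypothetical names, and a tuple of integers stands in for real audio samples.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Audio:
    """Hypothetical audio clip; integers stand in for real audio samples."""
    samples: Tuple[int, ...]


def synthesize_merged(first: Audio, second: Audio) -> Audio:
    """Single unified whole: one clip with the first samples, then the second."""
    return Audio(first.samples + second.samples)


def synthesize_as_pair(first: Audio, second: Audio) -> Tuple[Audio, Audio]:
    """Not a single whole: both clips stay independent, ordered first then second."""
    return (first, second)


def playback_order(created: Union[Audio, Tuple[Audio, Audio]]) -> List[int]:
    """Flatten either representation into the order in which samples are heard."""
    parts = (created,) if isinstance(created, Audio) else created
    return [sample for part in parts for sample in part.samples]
```

Keeping the pair representation preserves each clip as its own object, which is what allows listeners to appreciate and evaluate the two contents separately.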

[0043] Taking continuation as an example, Figure 3 shows an exemplary diagram of the continuation interface corresponding to the first audio content. After the user clicks the publishing control 35 in Figure 3, the client merges the first and second audio content to generate the first continuation content and sends it to the server. Depending on whether the server responds to the submission in time, the client displays the corresponding publishing result. Figure 4 shows exemplary diagrams of the publishing result for the first continuation content. As shown in part (a) of Figure 4, if the server responds within the preset time and indicates that the first continuation content has been stored successfully, the client displays a successful publishing result 41, and the user can view the first continuation content by clicking button 42. As shown in part (b) of Figure 4, if the server does not respond within the preset time, the client displays a failed publishing result 43, and the user can click button 44 to republish the first continuation content.

[0044] This application provides a technical solution for creating and publishing audio content based on audio content published by others. By adding audio data and text data of a second audio content to the creation interface corresponding to the first audio content, the first and second audio content are combined into a single creative content and published. This enables collaborative creation of audio content between users and enhances communication and interaction between users.

[0045] Furthermore, compared to choral singing, the first and second audio contents in this application are two independent audio contents, allowing users to appreciate and evaluate them independently, compare their merits, and enhance competition and communication among users.

[0046] Please refer to Figure 5, which illustrates a flowchart of a method for publishing audio content according to another embodiment of this application. The method may be executed by the terminal device 10 shown in Figure 1; for example, each step may be performed by a client of the target application. The method may include at least one of the following steps (510-570):

Step 510: Display the content selection interface, which shows multiple candidate audio content.

[0047] Optionally, the content selection interface is the client's initial interface, that is, the default interface displayed when the client is opened. The initial interface provides users with the most frequently used information and functions. The content selection interface contains multiple candidate audio content items and brief information about each item; the brief information is a portion of the relevant information of the audio content. Taking continuation as an example, part (a) of Figure 6 shows the content selection interface as the initial interface, and part (b) of Figure 6 shows the content selection interface after the search tag has been changed. Part (a) of Figure 6 is the content selection interface for audio content, and part (b) of Figure 6 is the content selection interface for continuation content. Areas 62 and 63 display multiple audio content items. In areas 62 and 63 of the content selection interface 61, users can obtain multiple audio content items and their basic information; in Figure 6, for example, users can see the lyrics, tag information, playback controls, appreciation controls, favorite controls, play count, appreciation count, and favorite count of the audio content.

[0048] Step 520: In response to the selection operation of the first audio content among multiple candidate audio content, display the listening interface corresponding to the first audio content.

[0049] The first audio content is any audio content in the content selection interface. The user selects the first audio content in the content selection interface and clicks its playback control to display the listening interface. For example, as shown in Figure 6, when the user clicks the playback control 64 in area 63, the audio content can be played and a listening interface showing relevant information about the audio content is displayed. Taking continuation as an example, Figure 7 shows two different listening interfaces: part (a) of Figure 7 is a schematic diagram of the listening interface for audio content, and part (b) of Figure 7 is a schematic diagram of the listening interface for continuation content. Both parts show the complete lyrics, complete tag information, playback controls, appreciation controls, favorite controls, play count, appreciation count, and favorite count of the content. The listening interface 71 also includes a continuation control 72 and a rewrite control 73. The continuation control 72 is used to continue the audio content, merging the continuation with the original audio content to generate the continued audio content. The rewrite control 73 is used to rewrite the audio content, generating the rewritten audio content. The rewritten audio content and the original audio content are two different audio contents.

[0050] Step 530: In response to the operation of the creation controls in the listening interface corresponding to the first audio content, display the creation interface corresponding to the first audio content.

[0051] The content selection interface and / or the listening interface corresponding to the first audio content display relevant information about the first audio content, including at least one of the following: text information of the first audio content, tag data of the first audio content, playback control, appreciation control, favorite control, number of plays, number of appreciations, and number of favorites.

[0052] The creation control is used to display the creation interface. Taking continuation as an example, as shown in Figure 7, users can click the continuation control 72 in the listening interface 71 to display the continuation interface and continue the audio content.

[0053] Step 540: In response to the content addition operation, add a second audio content in the creation interface. The second audio content and the first audio content are two independent audio files.

[0054] Step 550: In response to the text editing operation, the text data of the second audio content is displayed in the creation interface; wherein, the first creation content also includes the text data of the second audio content.

[0055] Text editing is a user-initiated operation used to add text data of the second audio content to the authoring interface. In response to the user's text editing operation, the client displays the added text data of the second audio content in the authoring interface. This text data of the second audio content is the text data generated when the user performs the text editing operation.

[0056] In some embodiments, a text editing control is displayed in the authoring interface, and the text editing operation is an operation performed on the text editing control. Step 550 may include the following sub-steps: in response to the operation on the text editing control, displaying text data in an editing state in the authoring interface; in response to the editing completion operation, determining the text data in the editing state as the text data of the second audio content and displaying it in the authoring interface.

[0057] A text editing control is a control used to edit text data in the authoring interface. Optionally, the text editing control can be a button, a text box, or an area; this application does not limit the implementation form of the text editing control. For example, the text editing control in the authoring interface is an area; clicking any point in this area displays the text data in the editing state in the authoring interface. After the user finishes editing, the client determines the edited text data as the text data of the second audio content and displays this text data in the authoring interface.

[0058] As shown in Figure 3, taking continuation as an example, Figure 3 is an exemplary diagram of the continuation interface corresponding to the first audio content. A text editing control 34 is displayed in Figure 3. The user clicks the text editing control 34 to display the text data in the editing state in the continuation interface. After the user finishes editing the text data, the client determines the edited text data as the text data of the second audio content and displays the text data in the text editing control 34 of the continuation interface.

[0059] Optionally, step 550 further includes: obtaining the speech recognition result of the second audio content by performing speech recognition on the second audio content; and displaying the initial text data generated based on the speech recognition result of the second audio content in the creation interface, wherein the initial text data refers to the text data that is initialized and in the editing state.

[0060] Initial text data refers to text data generated from the speech recognition result obtained by performing AI speech recognition on the audio data of the second audio content. For example, after the client obtains the audio data of the second audio content, the client uses an AI speech recognition system to perform speech recognition on that audio data, generating a recognition result for the second audio content. The client then generates initial text data based on the recognition result and displays it in the client. Optionally, the user can modify the initial text data to obtain the text data of the second audio content, which is then displayed in the authoring interface.

[0061] Through an artificial intelligence speech recognition system, speech recognition is directly performed on audio data to obtain initial text data, and then the user modifies the initial text data to obtain the completed modified text data. Compared with the ordinary method of directly inputting text data, the artificial intelligence speech recognition system simplifies the user's operation and improves the user's product experience.
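The flow from speech recognition to editable initial text data can be sketched as follows. This is a minimal illustration, not the application's implementation: the `TextDraft` type, the function names, and the injected `recognize` callable (a stand-in for whatever speech recognition system the client uses) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TextDraft:
    text: str
    editing: bool = True  # initial text data starts in the editing state


def initial_text_from_audio(audio: bytes,
                            recognize: Callable[[bytes], str]) -> TextDraft:
    """Build initial text data from a speech recognition result.

    `recognize` is a hypothetical stand-in for the recognition system.
    """
    return TextDraft(text=recognize(audio).strip())


def finish_editing(draft: TextDraft, edited_text: str) -> TextDraft:
    """On the editing completion operation, fix the user's text as final."""
    return TextDraft(text=edited_text, editing=False)


def fake_recognizer(audio: bytes) -> str:
    # Placeholder recognizer used only to make the sketch runnable.
    return " recognized lyrics "


draft = initial_text_from_audio(b"\x00\x01", fake_recognizer)
final = finish_editing(draft, "edited lyrics")
```

The user only corrects the recognized text instead of typing it in full, which is the simplification the paragraph above describes.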

[0062] Optionally, step 550 further includes: displaying a rhyme prompt message based on the rhyme detection result corresponding to the text data of the second audio content; wherein, the rhyme prompt message includes at least one of the following: the words in the text data of the second audio content that conform to the rhyme specification, the words in the text data of the second audio content that do not conform to the rhyme specification, the number of rhymes corresponding to the text data of the second audio content, the rhyming finals in the text data of the second audio content that conform to the rhyme specification, the rhyming finals in the text data of the second audio content that do not conform to the rhyme specification.

[0063] Optionally, rhyme detection is performed on the text data of the second audio content to obtain the words in that text data that do not conform to the rhyme specification. When the user edits the text data of the second audio content, the client can perform rhyme detection on the text data, obtain the rhyme prompt message of the text data, mark the unrhymed target words in the edited text data according to the rhyme prompt message, and display them in the creation interface. For example, suppose the last word of the previous sentence in the text data edited by the user is "injury". If the last word of the next sentence is "window", the client determines that the two words rhyme and does not perform a marking operation; if the last word of the next sentence is "bed", then because "injury" and "bed" do not rhyme, the client marks and displays "bed". Optionally, the client can mark and display "injury" and "bed" together. This application does not limit how unrhymed words are marked and displayed. Optionally, step 550 further includes: obtaining at least one recommended replacement word corresponding to the target word, the recommended replacement word being a word that conforms to the rhyme specification; displaying the at least one recommended replacement word in the creation interface; and in response to a selection operation on a target recommended replacement word among the at least one recommended replacement word, replacing the target word with the target recommended replacement word.

[0064] Optionally, when the client detects the above unrhymed words, it can not only mark them but also offer the user rhyming words as replacements. For example, if the last word of the previous sentence in the text data edited by the user is "injury" and the last word of the next sentence is "bed", the client detects that "injury" and "bed" do not rhyme and provides the user with target recommended replacement words such as "help" and "when" to choose from.
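The check and the replacement suggestion described above can be sketched as follows. This is an illustrative simplification under stated assumptions: line-end words are given as toneless pinyin syllables (as in the "shang" and "dang" example in this text), the final is obtained by stripping a leading initial (treating "y" and "w" as initials), and two words are taken to rhyme only when their finals are identical. The application does not specify its rhyme specification, and the names `final_of` and `check_line_end` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Pinyin initials; two-letter initials come first so "sh" wins over "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def final_of(syllable: str) -> str:
    """Return the final of a toneless pinyin syllable."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return syllable[len(initial):]
    return syllable  # zero-initial syllable such as "ang"


def check_line_end(prev_end: str, next_end: str,
                   candidates: Sequence[str]) -> Tuple[bool, List[str]]:
    """Return (rhymes, suggestions) for two consecutive line-end words.

    Suggestions are the candidate words whose final matches `prev_end`;
    they are only produced when the two line ends do not rhyme.
    """
    target = final_of(prev_end)
    if final_of(next_end) == target:
        return True, []
    return False, [w for w in candidates if final_of(w) == target]


rhymes, suggestions = check_line_end("shang", "men", ["bang", "dang", "hua"])
```

Here `rhymes` is `False` and `suggestions` holds the two candidates ending in "ang", which the client could then present as recommended replacement words.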

[0065] In some embodiments, as shown in Figure 3 and taking continuation as an example, if the client detects that the word 37 (OO) does not rhyme with the preceding text, the client displays three target recommended replacement words 38 (AA, BB, and CC) in the creation interface for the user to select.

[0066] Optionally, perform a rhyme detection on the text data of the second audio content to obtain the rhyming finals in the text data of the second audio content that comply with the rhyme rules and the finals in the text data of the second audio content that do not comply with the rhyme rules. When the user edits the text data of the second audio content, the client can perform a rhyme detection on the text data, obtain the rhyme hint information of the text data, and display the finals of the above rhyme feet according to the rhyme hint information. For example, if the last word of a line of lyrics in the text data edited by the user is "injury", the client performs a rhyme detection on the text data, obtains that the final corresponding to "injury" is "ang", and the client will display its final "ang" near the rhyme foot "injury" in the creation interface. Similarly, for the rhyme foot "help", the client performs a rhyme detection on the text data, obtains that the final corresponding to "help" is "ang", and the client will display its final "ang" near the rhyme foot "help" in the creation interface. For the rhyme foot "window", the client performs a rhyme detection on the text data, obtains that the final corresponding to "window" is "uang".

[0067] Optionally, perform a rhyme detection on the text data of the second audio content to obtain the words in the text data of the second audio content that comply with the rhyme rules. When the user edits the text data of the second audio content, the client can perform a rhyme detection on the text data, obtain the rhyme hint information of the text data, obtain the finals corresponding to each rhyme foot according to the rhyme hint information, and display the rhyme feet with the same finals in the text data of the second audio content in the same color. For example, if the rhyme feet of two lines of lyrics in the text data edited by the user are "injury" and "when" respectively, and their corresponding finals are both "ang", the client uses the same color (such as red) to mark and display the two rhyme feet "injury" and "when". Optionally, the text data edited by the user contains two groups of rhyme feet. One group is "injury" and "when", and their corresponding finals are both "ang"; the other group is "window" and "frost", and their corresponding finals are both "uang". At this time, for the two rhyme feet "injury" and "when", the client uses red for marking; for the two rhyme feet "window" and "frost", the client uses different colors for marking, such as the client uses yellow for marking.
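The same-color marking of rhyme feet that share a final can be sketched as below. This is only an illustration: the input is assumed to be a list of (rhyme foot, final) pairs already produced by rhyme detection, and the palette and the function name `color_rhyme_feet` are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Sequence, Tuple

# Hypothetical palette; colors are reused if there are more groups than colors.
PALETTE = ("red", "yellow", "blue", "green")


def color_rhyme_feet(feet: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Assign one color per final, in order of first appearance."""
    colors: Dict[str, str] = {}
    palette = cycle(PALETTE)
    marked = []
    for word, final in feet:
        if final not in colors:
            colors[final] = next(palette)
        marked.append((word, colors[final]))
    return marked


marked = color_rhyme_feet([
    ("injury", "ang"), ("window", "uang"),
    ("when", "ang"), ("frost", "uang"),
])
```

With the two groups from the example above, "injury" and "when" receive the first color and "window" and "frost" the second.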

[0068] Optionally, rhyme detection is performed on the text data of the second audio content to obtain the words that conform to the rhyme specification and the number of rhymes corresponding to the text data. When the user edits the text data of the second audio content, the client can perform rhyme detection on the text data, obtain the rhyme prompt message of the text data, determine the rhyme count for each rhyme foot according to that message, and display the count next to each rhyme foot in the creation interface. For example, the rhyme feet of two lines of lyrics in the text data edited by the user are "shang" and "dang", and their corresponding finals are both "ang", so the two rhyme. Assuming that "shang" appears before "dang", the rhyme count "×1" is displayed near "shang" and the rhyme count "×2" is displayed near "dang". The rhyme counts for different rhymes are displayed in different colors or fonts.
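The running rhyme count in the example above can be sketched as follows, assuming the rhyme feet arrive in text order as (rhyme foot, final) pairs; the function name `rhyme_counts` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def rhyme_counts(feet: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Label each rhyme foot with how many times its final has occurred so far."""
    seen: Counter = Counter()
    labels = []
    for word, final in feet:
        seen[final] += 1
        labels.append((word, f"×{seen[final]}"))
    return labels


labels = rhyme_counts([("shang", "ang"), ("dang", "ang")])
```

Each final keeps its own counter, so a second rhyme group in the same text would start again from "×1".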

[0069] In some embodiments, the server can score the audio content created by the user. For example, the server can score according to the song information of the user's audio content. The song information can be the rhyming method, the number of rhymes, and the mode of expression of the text data, or the rhythm and melody of the audio data. The server scores this song information through a trained machine learning model. Optionally, the server can also select different scoring criteria according to the tags selected by the user to score the audio content. This application does not limit this.

[0070] Step 560, in response to an operation on the tag selection control, display the tag information of the second audio content in the creation interface.

[0071] The tag selection control is a control for selecting tags. By clicking on the tag selection control, multiple tags are displayed for selection. According to the tags selected by the user, the tag information of the second audio content is determined and displayed in the creation interface. There are multiple tag selection controls displayed in the creation interface. The multiple tag selection controls correspond to multiple different tag types. The multiple different tag types include at least two of the following: text theme, audio style, rhyming method.

[0072] The client responds to the user's tag selection operation and displays the tag information of the second audio content in the creation interface. The tag information of the second audio content represents information such as the text theme, audio style, and rhyming method of the audio content.

[0073] In some embodiments, as shown in Figure 3, area 36 in Figure 3 is the tag selection control of the audio content. As shown in the figure, the tag selection control can comprise multiple controls; for example, the controls in Figure 3 include a style control, a theme control, and a lyrics type control. Optionally, the tag selection control can also be other controls such as rhyme type controls and event type controls, which are not limited in this application.

[0074] Step 570: In response to the content publishing operation, display the publishing result of the first created content, which includes the first audio content and the second audio content.

[0075] For a description of steps 540 and 570, please refer to the above embodiment; they will not be repeated here.

[0076] The technical solution provided in this application, through the selection of tags in the creation interface, determines information such as the text theme, audio style, and rhyming method of the audio content. This allows other users to select audio content based on their preferred tags, facilitating the acquisition of their favorite audio content. Compared to choral singing, this application allows users to create both the audio and text aspects of the audio content, enhancing the freedom and flexibility of audio creation and providing users with a broader creative space.

[0077] Simultaneously, by displaying simple information about each audio content in the content selection interface and relevant information in the preview interface, users can understand the information of each audio content in the content selection interface, thus selecting audio content that interests them. Secondly, by displaying detailed information about the audio content in the preview interface, users can obtain detailed information about the audio content while listening, enabling them to perform a series of operations such as creation based on the audio content.

[0078] Please refer to Figure 8, which shows a flowchart of a method for publishing audio content according to another embodiment of this application. The method may be executed by the terminal device 10 shown in Figure 1; for example, each step may be performed by a client of the target application. The method may include at least one of the following steps (810-840): Step 810: Display the interface corresponding to the second created content. The second created content includes at least two audio contents.

[0079] The second created content is the content obtained after creation is performed on the basis of audio content. It includes at least two audio contents, together with the audio data and text data of each of them.

[0080] In some embodiments, as shown in Figure 7, Figure 7(b) shows the display interface of the second continuation content. Users can obtain detailed information about the second continuation content from this interface, including complete lyrics, complete tag information, playback controls, appreciation controls, favorite controls, play count, appreciation count, and favorite count. Users can modify the audio data or text data of any segment of the second continuation content using the rewrite control 73 to generate rewritten continuation content, which is the third continuation content described below. It should be noted that the second continuation content is not deleted; the third continuation content and the second continuation content are two different and independent audio contents.

[0081] Step 820: In response to the rewriting operation of the third audio content among the above at least two audio contents, the rewriting interface corresponding to the third audio content is displayed.

[0082] The rewriting interface corresponding to the third audio content is a user interface for users to rewrite the third audio content. In this embodiment, rewriting refers to rewriting the text data and / or audio data of the audio content to generate the rewritten third audio content. Optionally, the rewriting interface corresponding to the third audio content displays relevant information about the third audio content, such as, but not limited to, at least one of the following: the name of the third audio content, playback controls, tag information, text data, and audio data.

[0083] The third audio content can be any audio content in the second created content; it is the audio content in the second created content that needs to be rewritten. Clicking the rewrite control corresponding to the third audio content displays the rewriting interface for that third audio content. Optionally, clicking the rewrite control of the second created content and then selecting the third audio content also displays the rewriting interface for that third audio content.

[0084] In some embodiments, as shown in Figure 7, the user can rewrite the second created content by clicking the rewrite control 73, which displays the rewriting interface. Optionally, the rewriting interface is as shown in Figure 9, which is the rewriting interface corresponding to the second created content. In the rewriting interface 91, the user can input text data and audio data, select tag controls, and so on, to rewrite the third audio content.

[0085] Step 830: In response to the content rewriting operation, add the rewritten third audio content to the rewriting interface, and / or display the text data of the rewritten third audio content in the rewriting interface.

[0086] In the rewriting interface, the third audio content is rewritten by adding rewritten audio data and / or text data.

[0087] In some embodiments, as shown in Figure 9, modified audio data can be added to area 92 to modify the third audio content. Optionally, modified text data can be added to area 93 to modify the third audio content. Alternatively, modified audio data can be added to area 92 and modified text data can be added to area 93 to modify the third audio content. This application does not limit how the third audio content is modified.

[0088] Optionally, the audio data and text data uploading methods in step 830 can be divided into the following three methods: Method A, Method B, and Method C.

[0089] Method A involves the user uploading not only modified audio data but also modified text data. For a description of Method A, please refer to the above embodiment; it will not be repeated here. Method B involves the user uploading only the modified audio data, while the modified text data is added via steps 831-832. Method C involves the user uploading only the modified text data, while the modified audio data is added via steps 833-835.
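The choice among Methods A, B, and C can be sketched as a small dispatch on what the user uploaded. This is an illustration only: `recognize` and `synthesize` are hypothetical stand-ins for the speech recognition and audio generation components, and the function name `resolve_rewrite` is an assumption.

```python
from typing import Callable, Optional, Tuple


def resolve_rewrite(audio: Optional[bytes], text: Optional[str],
                    recognize: Callable[[bytes], str],
                    synthesize: Callable[[str], bytes]) -> Tuple[str, bytes, str]:
    """Return (method, audio data, text data) for a rewrite.

    A: both uploaded; B: audio only, text from speech recognition;
    C: text only, audio generated from the text.
    """
    if audio is not None and text is not None:
        return "A", audio, text
    if audio is not None:
        return "B", audio, recognize(audio)
    if text is not None:
        return "C", synthesize(text), text
    raise ValueError("a rewrite needs audio data, text data, or both")


# Toy stand-ins so the sketch runs; real components would replace these.
toy_recognize = lambda a: a.decode()
toy_synthesize = lambda t: t.encode()

method_b = resolve_rewrite(b"hi", None, toy_recognize, toy_synthesize)
method_c = resolve_rewrite(None, "yo", toy_recognize, toy_synthesize)
```

In Method B the recognized text would still be shown to the user for correction, and in Method C the generated audio can be previewed and replaced, as the following steps describe.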

[0090] The following is the method for adding the rewritten text data in Method B: Step 831: Obtain the speech recognition result of the rewritten third audio content by performing speech recognition on the rewritten third audio content.

[0091] In the rewriting interface, after the user uploads the audio data of the rewritten third audio content, the client obtains the speech recognition result of the third audio content based on the artificial intelligence speech recognition system.

[0092] Step 832: Based on the speech recognition results of the rewritten third audio content, modify the text data of the third audio content displayed in the rewriting interface so as to display the rewritten text data of the third audio content in the rewriting interface.

[0093] Users modify the speech recognition results of the third audio content to obtain the rewritten text data of the third audio content, which is then displayed in the creation interface.

[0094] By performing AI speech recognition on the rewritten audio data uploaded by the user, the speech recognition result of the text data is obtained. The user can then modify the speech recognition result of the text data to obtain the rewritten text data. The user does not need to input all the text data; they only need to modify the speech recognition result of the text data, which simplifies the user's operation.

[0095] The following is the method for adding the rewritten audio data in Method C: Step 833: If the rewritten third audio content is not added in the rewriting interface, automatically generate the rewritten third audio content based on the text data of the rewritten third audio content.

[0096] After the user adds only the modified text data of the third audio content to the client, the client can automatically generate the modified audio data of the third audio content based on this text data. The modified audio data of the third audio content can be generated based on the target timbre and the modified text data. The target timbre is the timbre used to generate the audio data of the third audio content; the target timbre can be the default timbre in the client; optionally, the target timbre can also be the timbre used by the user modifying the third audio content; this application does not limit this.

[0097] Step 834: Add the modified third audio content to the rewriting interface.

[0098] After the client generates the audio data of the rewritten third audio content, the audio data of the rewritten third audio content is displayed in the audio data addition area of the rewriting interface.

[0099] In some embodiments, as shown in Figure 9, area 92 in Figure 9 displays the audio data of the rewritten third audio content.

[0100] Step 835: In response to the listening operation for the rewritten third audio content, play the rewritten third audio content.

[0101] When the client's rewriting interface displays the audio data of the rewritten third audio content, users can click on the rewritten audio data to preview it. Optionally, if the user is not satisfied with the audio data of the rewritten third audio content, they can replace or modify it.

[0102] By automatically generating corresponding audio data from the rewritten text data uploaded by users, the system eliminates the need for users to record audio data. This simplifies the process of rewriting audio content, protects user privacy, and enhances the versatility of product functionality.

[0103] Step 840: In response to the rewritten content publishing operation, display the publishing result of the third created content. The third created content includes the rewritten third audio content and, of the at least two audio contents, those other than the third audio content.
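How a third created content can be derived from the second without altering it can be sketched with immutable records. The types `AudioContent` and `CreatedContent` and the function `rewrite_segment` are hypothetical names introduced for illustration; the application does not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AudioContent:
    audio: bytes
    text: str


@dataclass(frozen=True)
class CreatedContent:
    segments: Tuple[AudioContent, ...]


def rewrite_segment(original: CreatedContent, index: int,
                    rewritten: AudioContent) -> CreatedContent:
    """Return a new created content with one segment replaced.

    The original is left untouched, so both versions remain available.
    """
    if not 0 <= index < len(original.segments):
        raise IndexError("no such audio content in the created content")
    segments = (original.segments[:index] + (rewritten,)
                + original.segments[index + 1:])
    return CreatedContent(segments)


second = CreatedContent((AudioContent(b"a", "one"), AudioContent(b"b", "two")))
third = rewrite_segment(second, 1, AudioContent(b"c", "three"))
```

Because the records are frozen and a new tuple is built, the second created content and the third created content stay independent, while unchanged segments are shared between them.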

[0104] After the audio and text data of the rewritten third audio content are generated in the rewriting interface, the publishing operation is performed on the user's created content, for example by clicking the publish control 94 in Figure 9. The server first generates the rewritten third audio content based on its audio and text data. Then, based on the rewritten third audio content and the other audio contents of the second continuation content (excluding the original third audio content), the server generates the third continuation content. The client then sends a storage request for the generated third continuation content to the server. Figure 4 is an example diagram of the publishing result corresponding to the third continuation content. As shown in Figure 4(a), if the server responds to the client within the preset time and indicates that the third continuation content has been stored successfully, the client displays a successful publishing result 41; the third continuation content can then be viewed by clicking button 42. Optionally, as shown in Figure 4(b), if the server does not respond to the client within the preset time, the client displays the publishing failure result 43; the user can then click button 44 to republish the third continuation content.
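The preset-time check on the storage request can be sketched as below. This is a minimal illustration using a worker thread and a timeout: `store` is a hypothetical callable standing in for the request to the server, and treating an explicit "not stored" response as a failure is an assumption, since the text only describes the no-response case.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable


def publish(store: Callable[[Any], bool], content: Any, timeout_s: float) -> str:
    """Send a storage request and map the outcome to a publishing result."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(store, content)
    try:
        stored = future.result(timeout=timeout_s)
    except FutureTimeout:
        stored = False  # no response within the preset time
    finally:
        pool.shutdown(wait=False)  # do not block the interface on a slow request
    return "published" if stored else "failed"


result_ok = publish(lambda content: True, "third continuation content", 1.0)
```

A "failed" result corresponds to the failure interface with the republish button; republishing would simply call `publish` again with the same content.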

[0105] In some embodiments, audio content can also be rewritten to generate rewritten audio content, such as rewriting the text data and / or audio data of the first audio content to generate modified first audio data.

[0106] This embodiment introduces a method for rewriting audio content. It involves rewriting the text data and / or subsequent audio data of any audio content displayed on a content creation interface to obtain the rewritten audio content. This rewritten content is then further modified and published. By rewriting the original audio content, various other audio content can be derived from a single audio source, thereby enhancing the diversity of product functionality and the flexibility of audio creation.

[0107] Please refer to Figure 10, which shows a flowchart of a method for publishing audio content according to another embodiment of this application. The method may be executed by the terminal device 10 shown in Figure 1; for example, each step may be performed by a client of the target application. The method may include at least one of the following steps (1010-1060): Step 1010: In response to an operation on the content publishing control, display the content publishing interface.

[0108] In some embodiments, Figure 6 shows an example diagram of a content selection interface. Clicking the content publishing control 65 in Figure 6, which is used to publish audio content, displays the content publishing interface. Figure 11 illustrates an example of a content publishing interface, where users can input text and audio data, select tag controls, and perform other operations to publish audio content.

[0109] Step 1020: In response to the content to be published addition operation, add the fourth audio content in the content publishing interface.

[0110] In some embodiments, as shown in Figure 11, the audio data of the fourth audio content is added in area 111.

[0111] Step 1030: In response to the text editing operation to be published, display the text data of the fourth audio content in the content publishing interface.

[0112] In some embodiments, as shown in Figure 11, the text data of the fourth audio content is added in area 112.

[0113] Step 1040: In response to the tag selection operation, display the tag information of the fourth audio content in the content publishing interface.

[0114] In some embodiments, as shown in Figure 11, the tag information of the fourth audio content is selected in area 113.

[0115] Step 1050: In response to the publication operation of the content to be published, display the publication result of the fourth audio content.

[0116] In the content publishing interface, after the audio and text data are added and the tag information is selected, and in response to the user's content publishing operation, the server generates the fourth audio content from its audio and text data. The client then sends a storage request for the generated fourth audio content to the server. Figure 4 is an example diagram of the publishing result corresponding to the fourth audio content. As shown in Figure 4(a), if the server responds to the client within the preset time and indicates that the fourth audio content has been stored successfully, the client displays a successful publishing result 41; the fourth audio content can then be viewed by clicking button 42. Optionally, as shown in Figure 4(b), if the server does not respond to the client within the preset time, the client displays the publishing failure result 43; the fourth audio content can then be republished by clicking button 44.

[0117] Step 1060: In response to the random splicing operation for the fourth audio content, display the preview interface of the spliced audio content.

[0118] The spliced audio content is generated based on the audio and text data of the fourth audio content, as well as the audio and text data of at least one other audio content that matches the tag information of the fourth audio content.

[0119] Random splicing is used to automatically merge user-published audio content with other audio content to generate original content. The other audio content can be spliced before or after the user-published audio content. Optionally, the random splicing operation takes the audio data, text data, and tag information of the fourth audio content and, through neural network calculations, selects one or more other matching audio contents to generate spliced audio content synthesized from multiple audio data. Optionally, after the neural network calculations select one or more other matching audio contents, the user chooses the audio data they want to splice, and the spliced audio content is generated from that choice. This application does not limit the method of generating spliced audio content. Optionally, the user can choose not to have their published audio content spliced; in this case, that audio content is not randomly spliced with other audio content, and other audio content is not randomly spliced with it.
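Candidate selection for random splicing can be sketched as follows. Note the substitution: the text describes matching by neural network calculations, whereas this sketch uses plain tag overlap as a stand-in so that it stays self-contained. The `Published` type, its `allow_splicing` flag, and the function `pick_splice_partner` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Published:
    content_id: str
    tags: FrozenSet[str]
    allow_splicing: bool = True  # False when the author opted out


def pick_splice_partner(target: Published, pool: Sequence[Published],
                        rng: random.Random) -> Optional[Published]:
    """Pick a random partner that shares a tag and has not opted out."""
    if not target.allow_splicing:
        return None
    matches = [c for c in pool
               if c.allow_splicing
               and c.content_id != target.content_id
               and c.tags & target.tags]
    return rng.choice(matches) if matches else None


target = Published("t", frozenset({"rap", "love"}))
pool = [
    Published("a", frozenset({"rap"})),
    Published("b", frozenset({"folk"})),
    Published("c", frozenset({"love"}), allow_splicing=False),
]
partner = pick_splice_partner(target, pool, random.Random(0))
```

The opt-out is honored in both directions: content whose author declined splicing is neither a target nor a candidate.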

[0120] This embodiment generates spliced audio content by randomly splicing the generated fourth audio content after it has been processed by a neural network, thereby promoting the exchange of audio content between users.

[0121] At the same time, by randomly splicing content, users can get to know other users with similar styles, rather than just the authors of popular audio content displayed on the content display interface. The random splicing method allows all users who want to participate to communicate with other users, enriching the ways users communicate and promoting communication between users.

[0122] Please refer to Figure 12, which shows a flowchart of a collaborative creation method for audio content provided in another embodiment of this application. The method may be executed by the terminal device 10 shown in Figure 1; for example, each step may be executed by a client of the target application. The method may include at least one of the following steps (1210-1230):

Step 1210: In response to an operation of creating or joining a collaborative creation room, display the display interface of the collaborative creation room, which allows multiple user accounts to participate in online collaborative creation of audio content.

[0123] A collaborative creation room is a space where multiple users collaborate online to create audio content. This room facilitates communication between users through the creation and exchange of audio content. In some embodiments, multiple users can upload their initial audio and text data to the collaborative creation room, then exchange and refine these data to obtain finalized audio and text data, ultimately resulting in the completed audio content. Optionally, the online collaborative audio content can be a single audio file or a composite audio file generated from multiple audio files.

[0124] The server receives creation requests for collaborative creation rooms from multiple clients, generates the room, and displays it in the clients. Other users can also join the collaborative creation room via a room join request.

[0125] Step 1220: Display the target audio content for collaborative creation in the collaborative creation room's display interface.

[0126] In some embodiments, multiple users create audio content based on selected tag information, resulting in preliminary audio and text data created by the users. This data is then displayed on the collaborative creation room's interface, allowing each user to view the preliminary audio and text data created by the multiple users on their respective clients.

[0127] Step 1230: In response to the modification operation on the target audio content, modify the target audio content in the display interface of the collaborative creation room.

[0128] In some embodiments, each user modifies the audio and text data initially created by multiple users to obtain the modified audio and text data of the audio content. Finally, the modifications of each user are integrated to obtain the audio content under the specified tag information of each user, thus completing the online collaborative creation.
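The upload, modify, and integrate flow of steps 1210 to 1230 can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class `CollabRoom`, its method names, and the choice to integrate drafts by joining them in the order users joined the room are all hypothetical, and only text data is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CollabRoom:
    tag: str                                               # tag information selected for the room
    members: List[str] = field(default_factory=list)
    drafts: Dict[str, str] = field(default_factory=dict)   # author -> text data

    def join(self, user: str) -> None:
        if user not in self.members:
            self.members.append(user)

    def upload(self, user: str, text: str) -> None:
        """A member uploads preliminary text data to the room."""
        self._require_member(user)
        self.drafts[user] = text

    def modify(self, editor: str, author: str, new_text: str) -> None:
        """Any member may modify a draft shown in the room."""
        self._require_member(editor)
        if author not in self.drafts:
            raise KeyError(f"no draft from {author}")
        self.drafts[author] = new_text

    def integrate(self) -> str:
        """Combine every member's current draft into the final text."""
        return "\n".join(self.drafts[u] for u in self.members if u in self.drafts)

    def _require_member(self, user: str) -> None:
        if user not in self.members:
            raise PermissionError(f"{user} is not in the room")
```

A real-time deployment would also need to broadcast each change to every client in the room, which this sketch leaves out.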

[0129] In some embodiments, Figure 13 shows an exemplary diagram of a collaborative creation room, taking continuation writing as an example. This room contains two users: User Nickname 1 and User Nickname 2. The two users create continuation content based on the selected tag information 131. Area 132 is the text data display area for User Nickname 1, and area 133 is the text data display area for User Nickname 2. After the two users upload their audio and text data, they modify the data to obtain the revised continuation content, which is then published via the publish button 134.

[0130] This embodiment provides an online collaborative creation method. Multiple users initiate collaborative creation room creation requests, generating a collaborative creation room. Through this room, multiple users create the same work or exchange ideas on multiple works, ultimately resulting in audio content that satisfies each user. This transforms one-way, non-real-time communication into real-time, multi-user interaction, promoting communication between users.

[0131] In some embodiments, the target application in the terminal device 10 also has the following functions. In some embodiments, Figure 14 shows a schematic diagram of a user's personal information interface, taking continuation writing as an example. This interface displays the number of accounts the user follows, the number of fans, and the audio content created by the user. Optionally, the audio content can be filtered by audio content tags. As shown in Figure 14, there are two filter conditions: audio content 141 and continuation content 142. Figure 14(a) shows the user's personal information interface when the filter condition is audio content 141, and Figure 14(b) shows the interface when the filter condition is continuation content 142.

[0132] In some embodiments, Figure 15 shows an exemplary diagram of a leaderboard interface, taking continuation writing as an example. Optionally, the leaderboard interface can display a leaderboard of single audio contents or a leaderboard of continuation audio contents generated from multiple audio contents. Figure 15 shows a leaderboard of continuation audio contents. A neural network computes a popularity value from the play count, like count, and favorite count of each continuation audio content on the client, and the contents are ranked by popularity. The leaderboard shows basic information such as the song title and artist of each continuation audio content. Users can click the playback control 151 to enter the listening interface, listen to the continuation audio content, and view its details.
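To make the ranking step concrete, the sketch below orders contents by a popularity score. It substitutes a fixed weighted sum for the neural network calculation mentioned above; the weights, the dictionary keys, and the function names `popularity` and `leaderboard` are assumptions for illustration only.

```python
from typing import Dict, List


def popularity(item: Dict, w_play: float = 1.0, w_like: float = 2.0,
               w_fav: float = 3.0) -> float:
    """Weighted sum of plays, likes, and favorites (stand-in for the model)."""
    return (w_play * item["plays"]
            + w_like * item["likes"]
            + w_fav * item["favorites"])


def leaderboard(items: List[Dict], top_n: int = 10) -> List[Dict]:
    """Return the top_n items, most popular first."""
    return sorted(items, key=popularity, reverse=True)[:top_n]
```

Because `sorted` is stable, items with equal scores keep their input order.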

[0133] In some embodiments, Figure 16 shows a schematic diagram of the interface of an IM (Instant Messenger) system, through which users can communicate with each other using audio. Optionally, group chats can also be created within the IM system to facilitate communication among multiple users. As shown in Figure 16, after selecting User Nickname 2 as the communication partner, the user can communicate with that user by entering text or audio content in the information input control 161.

[0134] In some embodiments, Figure 17 shows a schematic diagram of a livehouse (live music hall) live streaming interface. Audio content is presented one-way through an online live stream. Viewers can perform a series of interactive operations, such as sending bullet comments, liking, commenting, forwarding, and collecting, through the interactive controls 171.

[0135] Through the aforementioned functions, the communication system between users has been improved. By using personal information interfaces, leaderboards, IM systems, and livehouse streaming, collaborative creation of audio content among users has been realized, enhancing communication between users, increasing users' desire to create, and promoting the development of the creation platform.

[0136] In some embodiments, please refer to Figure 18, which shows a flowchart of an audio content liking method according to an embodiment of this application. When a user opens the client, the client displays a content selection interface. Based on the user's browsing history and the tags of the audio content the user has liked and saved, the server provides the client with audio content matching the user's interests, and the client displays this audio content. When a user likes a piece of audio content, the client displays the like count for that audio content incremented by one. The client saves the change in the like count for that audio content and submits this change to a state saving queue. The server processes the data in the state saving queue sequentially and saves the processed like-count changes in a database. If the user cancels the like before the server processes the change in the like count, that change is removed from the state saving queue. Optionally, if the user cancels the like after the server has processed the change, a new like-count change for that audio content is submitted to the state saving queue.
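The state saving queue described above can be sketched as follows. This is a single-process illustration, not the server design of this application: the class `LikeStateQueue`, the tuple layout of queue entries, and the in-memory `database` dictionary are assumptions, and the sketch does not check whether a user had actually liked the content before cancelling.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class LikeStateQueue:
    def __init__(self) -> None:
        # Pending changes as (user, content_id, delta), processed in order.
        self.pending: Deque[Tuple[str, str, int]] = deque()
        # Stand-in for the database: content_id -> saved like count.
        self.database: Dict[str, int] = {}

    def like(self, user: str, content_id: str) -> None:
        self.pending.append((user, content_id, +1))

    def cancel_like(self, user: str, content_id: str) -> None:
        entry = (user, content_id, +1)
        if entry in self.pending:
            # The like has not been processed yet: drop it from the queue.
            self.pending.remove(entry)
        else:
            # The like was already saved: submit a new, opposite change.
            self.pending.append((user, content_id, -1))

    def process_all(self) -> None:
        """Apply pending changes sequentially and save them."""
        while self.pending:
            _, content_id, delta = self.pending.popleft()
            self.database[content_id] = self.database.get(content_id, 0) + delta
```

Dropping an unprocessed like from the queue avoids writing two changes to the database that cancel each other out.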

[0137] The like system prioritizes high-quality audio content in the content display interface and leaderboard, promoting communication among users and stimulating their competitive spirit.

[0138] In some embodiments, please refer to Figure 19, which shows a flowchart of an audio content search method provided in one embodiment of this application. Similarly, when a user opens the client, the client displays a content selection interface. Based on the user's browsing history and the tags of the audio content the user has liked and saved, the server provides the client with audio content that matches the user's interests, and the client displays this audio content. For example, as shown in Figure 20, the search results are displayed in two categories: audio content and creative content. When the user clicks the search control 201 in Figure 20, the user enters keywords into the corresponding text box. The client then sends query parameters generated from the keywords to the server. The server retrieves the matching audio content based on the query parameters and sends it to the client, and the client displays the corresponding search results.
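The keyword-to-results flow above can be sketched in a few lines. The query parameter format, the in-memory `catalog`, the title-substring matching, and the function names `build_query` and `search` are all assumptions for illustration; the application does not specify how the server matches keywords.

```python
from typing import Dict, List


def build_query(keyword: str) -> Dict[str, str]:
    """Client side: turn the entered keyword into query parameters."""
    return {"keyword": keyword.strip().lower()}


def search(catalog: List[Dict[str, str]],
           params: Dict[str, str]) -> Dict[str, List[str]]:
    """Server side: return matching titles grouped into the two categories."""
    results: Dict[str, List[str]] = {"audio": [], "creative": []}
    keyword = params["keyword"]
    for item in catalog:
        # Hypothetical matching rule: case-insensitive substring of the title.
        if keyword and keyword in item["title"].lower():
            results[item["category"]].append(item["title"])
    return results
```

Grouping by category on the server lets the client render the two result lists directly.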

[0139] The search system helps users quickly find the audio data they want, saving them time.

[0140] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.

[0141] Please refer to Figure 21, which shows a block diagram of an audio content publishing apparatus according to an embodiment of this application. The apparatus has the function of implementing the aforementioned audio content publishing method; this function can be implemented in hardware or by hardware executing corresponding software. The apparatus 2100 may include: a creation interface display module 2110, an audio creation module 2120, and a creation and publishing module 2130.

[0142] The creation interface display module 2110 is used to display the creation interface corresponding to the first audio content. The creation interface displays relevant information about the first audio content, which is the created audio content.

[0143] The audio creation module 2120 is used to add a second audio content in the creation interface in response to a content addition operation. The second audio content and the first audio content are two independent audio files.

[0144] The creation and publishing module 2130 is used to respond to the content publishing operation and display the publishing result of the first created content, which includes the first audio content and the second audio content.

[0145] In some embodiments, the creation interface displays a first add control, and the content addition operation is an operation on the first add control; the audio creation module 2120 is used to: in response to the operation on the first add control, display multiple audio contents to be added; and in response to a selection operation on the second audio content among the multiple audio contents to be added, add the second audio content in the creation interface.

[0146] In some embodiments, the creation interface displays a second add control, and the content addition operation is an operation on the second add control; the audio creation module 2120 is used to: in response to the operation on the second add control, display the content recording interface; obtain the second audio content recorded in the content recording interface; and in response to the completion of recording of the second audio content, add the second audio content in the creation interface.

[0147] In some embodiments, as shown in Figure 22, the apparatus 2100 also includes a text creation module 2140.

[0148] The text creation module 2140 is used to display the text data of the second audio content in the creation interface in response to a text editing operation; wherein the first creation content also includes the text data of the second audio content.

[0149] In some embodiments, the creation interface displays a text editing control, and the text editing operation is an operation on the text editing control; the text creation module 2140 is used to: in response to an operation on the text editing control, display text data in an editing state in the creation interface; and in response to an editing completion operation, determine the text data in the editing state as the text data of the second audio content and display it in the creation interface.

[0150] In some embodiments, the audio creation module 2120 is further configured to: obtain the speech recognition result of the second audio content by performing speech recognition on the second audio content; and display, in the creation interface, initial text data generated based on the speech recognition result of the second audio content. The initial text data refers to the text data in the editing state in its initial form.

[0151] In some embodiments, the text creation module 2140 is further configured to: display rhyme prompt information based on the rhyme detection result corresponding to the text data of the second audio content. The rhyme prompt information includes at least one of the following: words in the text data of the second audio content that conform to the rhyming rules, words in the text data of the second audio content that do not conform to the rhyming rules, the number of rhymes corresponding to the text data of the second audio content, the vowels in the text data of the second audio content that conform to the rhyming rules, and the vowels in the text data of the second audio content that do not conform to the rhyming rules.
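A toy version of rhyme detection, producing the kinds of prompt information listed above, might look like the following. It is a heavily simplified sketch: it works on English spelling rather than on phonetic finals, it treats the most common line ending as the rhyming rule, and the names `rhyme_key` and `rhyme_prompt` are hypothetical. The application does not disclose its actual rhyme detection rules.

```python
import re
from collections import Counter
from typing import Dict, List


def rhyme_key(word: str) -> str:
    """Spelling from the last vowel group to the end of the word."""
    match = re.search(r"[aeiou]+[^aeiou]*$", word.lower())
    return match.group(0) if match else ""


def rhyme_prompt(lines: List[str]) -> Dict[str, object]:
    """Classify each line's last word against the most common rhyme key."""
    endings = [ln.split()[-1].strip(".,!?;:") for ln in lines if ln.split()]
    if not endings:
        return {"rhyme": "", "conforming": [], "non_conforming": [], "count": 0}
    keys = [rhyme_key(w) for w in endings]
    # Assumption: the rhyming rule is "match the most frequent ending".
    target, _ = Counter(keys).most_common(1)[0]
    conforming = [w for w, k in zip(endings, keys) if k == target]
    non_conforming = [w for w, k in zip(endings, keys) if k != target]
    return {"rhyme": target, "conforming": conforming,
            "non_conforming": non_conforming, "count": len(conforming)}
```

The non-conforming words are the natural inputs for a later replacement-word recommendation step.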

[0152] In some embodiments, the text creation module 2140 is further configured to: obtain at least one recommended replacement word corresponding to a target word, wherein the recommended replacement word is a word that conforms to the rhyming rules; display the at least one recommended replacement word in the creation interface; and in response to a selection operation on a target recommended replacement word among the at least one recommended replacement word, replace the target word with the target recommended replacement word.

[0153] In some embodiments, the creation interface displays multiple tag selection controls, which correspond to multiple different tag types. These tag types include at least two of the following: text theme, audio style, and rhyme scheme. As shown in Figure 22, the apparatus 2100 also includes a tag selection module 2150.

[0154] Tag selection module 2150 is used to display tag information of the second audio content in the creation interface in response to an operation on the tag selection control.

[0155] In some embodiments, the creation interface display module 2110 is used for: The content selection interface is displayed, which shows multiple candidate audio contents; In response to the selection operation of the first audio content among the plurality of candidate audio content, the listening interface corresponding to the first audio content is displayed; In response to an operation on the creation controls in the listening interface corresponding to the first audio content, the creation interface corresponding to the first audio content is displayed. The content selection interface and / or the listening interface corresponding to the first audio content display relevant information about the first audio content. The relevant information includes at least one of the following: text data of the first audio content, tag information of the first audio content, playback control, appreciation control, favorite control, number of plays, number of appreciations, and number of favorites.

[0156] In some embodiments, as shown in Figure 22, the apparatus 2100 also includes a creation content display module 2160, a rewriting interface display module 2170, a content rewriting module 2180, and a rewriting and publishing module 2190.

[0157] The creation content display module 2160 is used to display the display interface corresponding to the second creation content, which includes at least two audio contents. The rewrite interface display module 2170 is used to display the rewrite interface corresponding to the third audio content in response to the rewrite operation of the third audio content among the at least two audio contents. The content rewriting module 2180 is used to respond to the content rewriting operation by adding rewritten third audio content to the rewriting interface, and / or displaying the text data of the rewritten third audio content in the rewriting interface. The rewrite and publish module 2190 is used to respond to the rewrite content publish operation and display the publish result of the third-creation content; wherein, the third-creation content includes the rewritten third audio content, and other audio content other than the third audio content among the at least two audio contents.

[0158] In some embodiments, the content rewriting module 2180 is further configured to: Obtain the speech recognition result of the rewritten third audio content by performing speech recognition on the rewritten third audio content; Based on the speech recognition results of the rewritten third audio content, the text data of the third audio content displayed in the rewriting interface is modified so that the text data of the rewritten third audio content can be displayed in the rewriting interface.

[0159] In some embodiments, the content rewriting module 2180 is further configured to: If the rewritten third audio content is not added in the rewriting interface, then the rewritten third audio content is automatically generated based on the text data of the rewritten third audio content. Add the rewritten third audio content to the rewriting interface; In response to a listening operation on the rewritten third audio content, the rewritten third audio content is played.

[0160] In some embodiments, as shown in Figure 22, the apparatus 2100 also includes a control interface display module 2210, a publishing interface display module 2220, a publishing audio adding module 2230, a publishing text display module 2240, a publishing tag selection module 2250, and a publishing content publishing module 2260.

[0161] The control interface display module 2210 is used to display the interface containing the content publishing control; The publishing interface display module 2220 is used to display the content publishing interface in response to the operation of the content publishing control; The audio addition module 2230 is used to add audio data of a fourth audio content in the content publishing interface in response to the content addition operation to be published. The text display module 2240 is used to display the text data of the fourth audio content in the content publishing interface in response to the text editing operation to be published; The tag selection module 2250 is used to display the tag information of the fourth audio content in the content publishing interface in response to the tag selection operation; The content publishing module 2260 is used to display the publishing result of the fourth audio content in response to the content publishing operation to be published.

[0162] In some embodiments, the content publishing module 2260 is further configured to display a preview interface of the spliced audio content in response to a random splicing operation of the fourth audio content; wherein the spliced audio content is generated based on the fourth audio content and at least one other audio content that matches the tag information of the fourth audio content.

[0163] In some embodiments, as shown in Figure 22, the apparatus 2100 also includes a room display module 2270, a data display module 2280, and a data modification module 2290.

[0164] The room display module 2270 is used to display the display interface of the collaborative creation room in response to the creation or joining operation of the collaborative creation room, which allows multiple user accounts to participate in online collaborative creation of audio content; The data display module 2280 is used to display the target audio content of the collaborative creation in the display interface of the collaborative creation room; The data modification module 2290 is used to modify the target audio content in the display interface of the collaborative creation room in response to the modification operation on the target audio content.

[0165] This application provides a technical solution for creating and publishing audio content based on audio content published by others. By adding audio data and text data of a second audio content to the creation interface corresponding to the first audio content, the first and second audio content are combined into a single creative content and published. This enables collaborative creation of audio content between users and enhances communication and interaction between users.

[0166] Furthermore, compared to choral singing, the first and second audio contents in this application are two independent audio contents, allowing users to appreciate and evaluate them independently, compare their merits, and enhance competition and communication among users.

[0167] Please refer to Figure 23, which shows a structural block diagram of a terminal device 2300 provided in one embodiment of this application. The terminal 2300 can be a mobile phone, tablet computer, smart TV, multimedia playback device, PC, etc. The terminal 2300 may be the terminal device 10 described in the embodiment of Figure 1.

[0168] Typically, terminal 2300 includes a processor 2301 and a memory 2302.

[0169] Processor 2301 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 2301 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). Processor 2301 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 2301 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 2301 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0170] The memory 2302 may include one or more computer-readable storage media, which may be non-transitory. The memory 2302 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. The memory 2302 stores at least one instruction, at least one program, code set, or instruction set, which is loaded and executed by the processor 2301 to implement the aforementioned audio content publishing method.

[0171] In some embodiments, the terminal 2300 may also optionally include a peripheral device interface 2303 and at least one peripheral device. The processor 2301, memory 2302, and peripheral device interface 2303 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 2303 via a bus, signal line, or circuit board. Specifically, the peripheral device may include at least one of a display screen 2304, audio circuitry 2305, communication interface 2306, and power supply 2307.

[0172] Those skilled in the art will understand that the structure shown in Figure 23 does not constitute a limitation on the terminal 2300, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.

[0173] In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, at least one program, code set, or instruction set implements the above-described method for publishing audio content when executed by the processor of a terminal device.

[0174] Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory), SSD (Solid State Drives), or optical disc, etc. The random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).

[0175] In an exemplary embodiment, a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. The processor of the terminal device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the aforementioned method for publishing audio content.

[0176] It should be understood that "multiple" as used herein refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. Furthermore, the step numbers described herein are merely illustrative of one possible execution order. In some other embodiments, the steps may not be executed in numerical order, such as two steps with different numbers being executed simultaneously, or two steps with different numbers being executed in the reverse order of the illustration. This application does not limit this.

[0177] The above are merely exemplary embodiments of this application and are not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims

1. A method for publishing audio content, characterized in that, The method includes: The creation interface corresponding to the first audio content is displayed. The creation interface displays relevant information about the first audio content, which is the already created audio content. In response to the content addition operation, a second audio content is added to the creation interface, wherein the first audio content and the second audio content differ in at least one aspect, such as semantic expression, audio melody, or source; the second audio content is audio content selected from a plurality of audio content to be added, or the second audio content is audio content recorded by the user. In response to the content publishing operation, the publishing result of the first created content is displayed, the first created content including the first audio content and the second audio content.

2. The method according to claim 1, characterized in that, The creation interface displays a first add control, and the content addition operation is an operation on the first add control. In response to the content addition operation, adding the second audio content to the creation interface includes: In response to the operation on the first add control, multiple audio contents to be added are displayed; In response to the selection operation on the second audio content among the multiple audio contents to be added, the second audio content is added in the creation interface.

3. The method according to claim 1, characterized in that, The creation interface displays a second add control, and the content addition operation is an operation on the second add control. In response to the content addition operation, adding the second audio content to the creation interface includes: In response to the operation on the second add control, the content recording interface is displayed; The second audio content recorded in the content recording interface is obtained; In response to the completion of recording of the second audio content, the second audio content is added in the creation interface.

4. The method according to claim 1, characterized in that, The method further includes: In response to the content publishing operation, the identification information of the first audio content and the second audio content are sent to the server; wherein, the first created content is obtained by the server synthesizing the first audio content and the second audio content after determining the first audio content based on the identification information of the first audio content.

5. The method according to claim 1, characterized in that, After displaying the creation interface corresponding to the first audio content, the method further includes: In response to a text editing operation, the text data of the second audio content is displayed in the creation interface; The first created content also includes the text data of the second audio content.

6. The method according to claim 5, characterized in that, The creation interface displays a text editing control, and the text editing operation is an operation on the text editing control. Displaying the text data of the second audio content in the creation interface in response to the text editing operation includes: In response to an operation on the text editing control, text data in an editing state is displayed in the creation interface; In response to an editing completion operation, the text data in the editing state is determined as the text data of the second audio content and displayed in the creation interface.

7. The method according to claim 6, characterized in that, After adding the second audio content to the creation interface in response to the content addition operation, the method further includes: The speech recognition result of the second audio content is obtained by performing speech recognition on the second audio content; The creation interface displays initial text data generated based on the speech recognition result of the second audio content. The initial text data refers to the text data in the editing state in its initial form.

8. The method according to claim 5, characterized in that, After displaying the text data of the second audio content in the creation interface, the method further includes: Based on the rhyme detection results corresponding to the text data of the second audio content, rhyme prompts are displayed; The rhyming prompt information includes at least one of the following: words in the text data of the second audio content that conform to the rhyming rules, words in the text data of the second audio content that do not conform to the rhyming rules, the number of rhymes corresponding to the text data of the second audio content, the vowels in the text data of the second audio content that conform to the rhyming rules, and the vowels in the text data of the second audio content that do not conform to the rhyming rules.

9. The method according to claim 8, characterized in that, The method further includes: Obtain at least one recommended replacement word corresponding to the target word, wherein the recommended replacement word is a word that conforms to the rhyming rules; The at least one recommended replacement word is displayed in the creation interface; In response to a selection operation for a target recommended replacement word among the at least one recommended replacement word, the target word is replaced with the target recommended replacement word.

10. The method according to claim 1, characterized in that, The creation interface displays multiple tag selection controls, which correspond to various different tag types. These various different tag types include at least two of the following: text theme, audio style, and rhyming method. After displaying the creation interface corresponding to the first audio content, it also includes: In response to an operation on the tag selection control, the tag information of the second audio content is displayed in the creation interface.

11. The method according to claim 1, characterized in that, displaying the creation interface corresponding to the first audio content includes: displaying a content selection interface in which multiple candidate audio contents are displayed; in response to a selection operation on the first audio content among the multiple candidate audio contents, displaying a listening interface corresponding to the first audio content; and in response to an operation on a creation control in the listening interface corresponding to the first audio content, displaying the creation interface corresponding to the first audio content; wherein the content selection interface and/or the listening interface corresponding to the first audio content displays relevant information about the first audio content, the relevant information including at least one of the following: text data of the first audio content, tag information of the first audio content, a playback control, an appreciation control, a favorite control, the number of plays, the number of appreciations, and the number of favorites.

12. The method according to claim 11, characterized in that, the content selection interface displays at least two search tags, and the method further includes: in response to an operation on a target search tag among the at least two search tags, displaying multiple candidate audio contents that match the target search tag.
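
A minimal sketch of matching candidate audio contents against a selected search tag is given below. It assumes each candidate carries a set of tags and that "matching" means the tag is present in that set; the `AudioContent` class and its fields are hypothetical and not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class AudioContent:
    content_id: str                      # hypothetical identifier
    tags: set = field(default_factory=set)


def filter_by_search_tag(candidates: list, target_tag: str) -> list:
    """Keep only the candidates whose tags contain the selected search tag."""
    return [c for c in candidates if target_tag in c.tags]


candidates = [
    AudioContent("a1", {"rap", "love"}),
    AudioContent("a2", {"folk"}),
    AudioContent("a3", {"rap"}),
]
shown = filter_by_search_tag(candidates, "rap")
```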

13. The method according to claim 1, characterized in that, the method further includes: displaying an interface containing a content publishing control; in response to an operation on the content publishing control, displaying a content publishing interface; in response to an operation of adding content to be published, adding a fourth audio content in the content publishing interface; in response to an editing operation on text to be published, displaying text data of the fourth audio content in the content publishing interface; in response to a tag selection operation, displaying tag information of the fourth audio content in the content publishing interface; and in response to a publication operation on the content to be published, displaying a publication result of the fourth audio content.

14. The method according to claim 13, characterized in that, after displaying the publication result of the fourth audio content in response to the publication operation on the content to be published, the method further includes: in response to a random splicing operation on the fourth audio content, displaying a preview interface of spliced audio content; wherein the spliced audio content is generated based on the fourth audio content and at least one other audio content that matches the tag information of the fourth audio content.
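
The random splicing step can be sketched as picking, at random, other audio contents whose tags match those of the fourth audio content. The claim does not define "matches", so this sketch assumes that sharing at least one tag is enough; the `AudioContent` class, the `splice_randomly` function, and the returned list of identifiers are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioContent:
    content_id: str      # hypothetical identifier
    tags: frozenset      # tag information, e.g. theme, style, rhyming method


def splice_randomly(fourth, library, count=1, rng=None):
    """Return the ordered ids of a splice: the fourth content plus random matches."""
    rng = rng or random.Random()
    # Assumption: another content "matches" if it shares at least one tag.
    matching = [
        c for c in library
        if c.content_id != fourth.content_id and c.tags & fourth.tags
    ]
    if not matching:
        raise ValueError("no other audio content matches the tag information")
    picked = rng.sample(matching, k=min(count, len(matching)))
    return [fourth.content_id] + [c.content_id for c in picked]


fourth = AudioContent("a4", frozenset({"rap", "love"}))
library = [
    AudioContent("b1", frozenset({"rap"})),
    AudioContent("b2", frozenset({"jazz"})),
    AudioContent("b3", frozenset({"love"})),
]
preview_order = splice_randomly(fourth, library, count=2, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the preview reproducible in tests; in an application the default unseeded generator would give a different splice on each operation.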

15. The method according to claim 1, characterized in that, the method further includes: in response to an operation of creating or joining a collaborative creation room, displaying a display interface of the collaborative creation room, the collaborative creation room allowing multiple user accounts to participate in online collaborative creation of audio content; displaying, in the display interface of the collaborative creation room, target audio content for collaborative creation; and in response to a modification operation on the target audio content, modifying the target audio content in the display interface of the collaborative creation room.

16. A method for publishing audio content, characterized in that, the method includes: displaying a display interface corresponding to second created content, the second created content including at least two audio contents; in response to a rewriting operation on a third audio content among the at least two audio contents, displaying a rewriting interface corresponding to the third audio content; in response to a content rewriting operation, adding the rewritten third audio content in the rewriting interface, and/or displaying text data of the rewritten third audio content in the rewriting interface; and in response to a rewritten-content publishing operation, displaying a publishing result of third created content; wherein the third created content includes the rewritten third audio content and the audio content other than the third audio content among the at least two audio contents.
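
The composition of the published result after rewriting can be sketched as follows: the rewritten third audio content replaces the original one, and every other audio content is carried over unchanged. The `AudioSegment` class, its fields, and the function name are hypothetical and serve only to illustrate the relationship stated in the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSegment:
    segment_id: str   # hypothetical identifier of one audio content
    text: str         # text data of that audio content


def assemble_after_rewrite(original: list, third_id: str, rewritten: AudioSegment) -> list:
    """Return a new list in which the segment `third_id` is replaced by `rewritten`."""
    if all(seg.segment_id != third_id for seg in original):
        raise KeyError(f"no audio content with id {third_id!r}")
    # The other audio contents keep their order and are left untouched.
    return [rewritten if seg.segment_id == third_id else seg for seg in original]


original = [AudioSegment("s1", "first verse"), AudioSegment("s2", "second verse")]
result = assemble_after_rewrite(original, "s2", AudioSegment("s2-rewritten", "new second verse"))
```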

17. The method according to claim 16, characterized in that, after adding the rewritten third audio content in the rewriting interface, the method further includes: obtaining a speech recognition result of the rewritten third audio content by performing speech recognition on the rewritten third audio content; and modifying, based on the speech recognition result of the rewritten third audio content, the text data of the third audio content displayed in the rewriting interface, so that the text data of the rewritten third audio content is displayed in the rewriting interface.

18. The method according to claim 16, characterized in that, after displaying the text data of the rewritten third audio content in the rewriting interface, the method further includes: if the rewritten third audio content has not been added in the rewriting interface, automatically generating the rewritten third audio content based on the text data of the rewritten third audio content; adding the rewritten third audio content in the rewriting interface; and in response to a listening operation on the rewritten third audio content, playing the rewritten third audio content.
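
The fallback described above (use the audio the user added, otherwise generate it from the rewritten text) can be sketched as below. The claim does not name a generation technique, so `fake_synthesize` is a placeholder standing in for whatever speech or singing synthesis backend is used; both function names are hypothetical.

```python
from typing import Callable, Optional


def resolve_rewritten_audio(
    added_audio: Optional[bytes],
    rewritten_text: str,
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Prefer audio the user added; otherwise generate it from the rewritten text."""
    if added_audio is not None:
        return added_audio
    return synthesize(rewritten_text)


def fake_synthesize(text: str) -> bytes:
    """Placeholder for a synthesis backend (assumption, not a real API)."""
    return f"<audio:{text}>".encode("utf-8")


generated = resolve_rewritten_audio(None, "new second verse", fake_synthesize)
recorded = resolve_rewritten_audio(b"user-recording", "new second verse", fake_synthesize)
```

Injecting the synthesizer as a parameter keeps the decision logic testable without committing to any particular generation model.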

19. An audio content publishing device, characterized in that, the device includes: a creation interface display module, configured to display a creation interface corresponding to a first audio content, wherein the creation interface displays relevant information about the first audio content, and the first audio content is already created audio content; an audio creation module, configured to add a second audio content in response to a content addition operation on the creation interface, wherein the first audio content and the second audio content differ in at least one of the following aspects: semantic expression, audio melody, and source; and the second audio content is audio content selected from a plurality of audio contents to be added, or the second audio content is audio content recorded by the user; and a creation and publishing module, configured to display, in response to a content publishing operation, a publishing result of first created content, the first created content including the first audio content and the second audio content.

20. An audio content publishing device, characterized in that, the device includes: a created content display module, configured to display a display interface corresponding to second created content, the second created content including at least two audio contents; a rewriting interface display module, configured to display a rewriting interface corresponding to a third audio content in response to a rewriting operation on the third audio content among the at least two audio contents; a content rewriting module, configured to, in response to a content rewriting operation, add the rewritten third audio content in the rewriting interface, and/or display text data of the rewritten third audio content in the rewriting interface; and a rewriting and publishing module, configured to display, in response to a rewritten-content publishing operation, a publishing result of third created content; wherein the third created content includes the rewritten third audio content and the audio content other than the third audio content among the at least two audio contents.

21. A terminal device, characterized in that, the terminal device includes a processor and a memory, the memory storing at least one program, the at least one program being loaded and executed by the processor to implement the method as claimed in any one of claims 1 to 15, and/or to implement the method as claimed in any one of claims 16 to 18.

22. A computer-readable storage medium, characterized in that, the storage medium stores at least one program, the at least one program being loaded and executed by a processor to implement the method as claimed in any one of claims 1 to 15, and/or to implement the method as claimed in any one of claims 16 to 18.

23. A computer program product, characterized in that, the computer program product includes computer instructions, the computer instructions being executed by a processor to implement the method as claimed in any one of claims 1 to 15, and/or to implement the method as claimed in any one of claims 16 to 18.