Electronic device, method and non-transitory storage medium for audio editing

The electronic device addresses long processing times and limited usability in audio editing by using an audio classifier to identify and separate sound sources pre-playback, enabling efficient editing and control of individual sound components during playback.

WO2026142383A1 · PCT designated stage · Publication Date: 2026-07-02 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-12-26
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing audio editing devices require long processing times for separating multiple sound sources from audio files, and editing is only possible during playback, limiting usability.

Method used

An electronic device with an audio classifier module to identify multiple sound sources before playback, allowing editing of individual sound sources on a display, and an audio separator module to separate and control volumes of these sources during playback.

Benefits of technology

Enables pre-playback identification and editing of sound sources, reducing processing time and enhancing editing usability by allowing simultaneous display and control of individual sound components.



Abstract

The present document relates to an electronic device, method, and non-transitory storage medium for audio editing. According to an embodiment, the electronic device may comprise: a display; at least one processor including processing circuitry; and a memory storing instructions. The instructions, when individually or collectively executed by the at least one processor, may cause the electronic device to: receive a request for individually editing a plurality of sound sources included in audio data; identify, by using an audio classifier module, whether the audio data includes a plurality of types of sound sources designated by the audio classifier module; on the basis of identifying that the audio data includes a first type of sound source and a second type of sound source among the plurality of types of sound sources, display, through the display, a first object indicating the first type of sound source and a second object indicating the second type of sound source on a screen for editing; and on the basis of a user input for the first object, control the volume of a first audio corresponding to the first type of sound source separated by using an audio separation module, wherein the first object and the second object may be displayed on the screen for audio editing before the audio separation module completes separation of the first audio corresponding to the first type of sound source and a second audio corresponding to the second type of sound source in the audio data. Various other embodiments are also possible.

Description

Electronic device, method, and non-transitory storage medium for audio editing

[0001] The present disclosure relates to an electronic device, a method, and a non-transitory storage medium for audio editing.

[0002] With the advancement of digital technology, electronic devices are being provided in various forms, such as smartphones, tablet PCs, or PDAs. Electronic devices are also being developed in wearable forms to enhance portability and user accessibility.

[0003] An electronic device may use audio separation technology to separate multiple sound sources from an audio file that contains them, so that each source can be edited individually. The device supports editing by separating the sound sources from the audio data through an audio editing module that supports audio separation technology, storing each separated source, and then using the stored sources. With this approach, performing and saving the separation before editing can take a long time; furthermore, since separation is only possible during playback, no information about the separable sources is available until playback begins, which may reduce editing usability.

[0004] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0005] According to one embodiment of the present disclosure, an electronic device comprises a display, at least one processor including a processing circuit, and a memory for storing instructions.

[0006] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device receives a request to individually edit each of the plurality of sound sources included in the audio data.

[0007] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device uses an audio classifier module to identify whether the audio data contains a plurality of types of sound sources designated by the audio classifier module.

[0008] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device causes a first object representing a first type of sound source and a second object representing a second type of sound source to be displayed on the display for editing on a screen, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0009] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device controls the volume of the first audio corresponding to the first type of sound source separated using an audio separator module based on user input for the first object. According to one embodiment, the first object and the second object are displayed on the screen for audio editing before the audio separator module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.
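The ordering described above (objects shown before separation completes) can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `fast_classify`, `slow_separate`, `open_editing_screen`, and the event log are hypothetical stand-ins, and the fixed labels and the short sleep only simulate a classifier that is quicker than the separator.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fast_classify(audio):
    """Hypothetical classifier stand-in: returns detected type labels quickly."""
    return ["voice", "music"]


def slow_separate(audio, types):
    """Hypothetical separator stand-in: slower, returns one track per type."""
    time.sleep(0.01)  # simulate the longer separation time
    return {t: list(audio) for t in types}


def open_editing_screen(audio, events):
    """Show one object per detected type without waiting for separation."""
    types = fast_classify(audio)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Separation starts in the background.
        future = pool.submit(slow_separate, audio, types)
        # The objects are "displayed" here, before future.result() is awaited.
        events.append(("display_objects", tuple(types)))
        tracks = future.result()
    events.append(("separation_done", tuple(sorted(tracks))))
    return tracks
```

In this sketch the display step depends only on the classifier output, so the editing screen never blocks on the separator; the separated tracks are needed only once a volume change must actually be applied.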

[0010] According to one embodiment, the method of operation in an electronic device includes receiving a request to individually edit each of a plurality of sound sources included in audio data.

[0011] According to one embodiment, the method includes an operation of using an audio classification module to determine whether the audio data contains a plurality of sound sources of types specified by the audio classification module.

[0012] According to one embodiment, the method includes the operation of displaying a first object representing a first type of sound source and a second object representing a second type of sound source on a screen for editing through the display of the electronic device, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0013] According to one embodiment, the method includes an operation of controlling the volume of a first audio corresponding to a first type of sound source separated using an audio separation module based on user input for the first object. According to one embodiment, the first object and the second object are displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.
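As a rough illustration of the volume control operation, the Python sketch below remixes separated tracks with a per-type linear gain. The function name `mix_with_gains`, the dictionary layout, and the use of equal-length sample lists are assumptions made for illustration; the disclosure does not specify how the volume is applied, and clipping is not handled here.

```python
def mix_with_gains(tracks, gains):
    """Mix separated tracks sample by sample, scaling each by its gain.

    tracks: dict mapping a source type to a list of float samples
            (all lists are assumed to have the same length).
    gains:  dict mapping a source type to a linear gain; types that
            are missing from the dict keep a gain of 1.0.
    """
    length = len(next(iter(tracks.values()))) if tracks else 0
    mixed = [0.0] * length
    for source_type, samples in tracks.items():
        gain = gains.get(source_type, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```

For example, a user input on the object for one type could map to `gains = {"music": 0.0}` to mute that type while leaving the other tracks unchanged.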

[0014] According to one embodiment, in a non-transitory storage medium storing one or more programs, the one or more programs include instructions that, when executed by at least one processor of an electronic device, cause the electronic device to execute an operation of receiving a request to individually edit each of a plurality of sound sources included in audio data.

[0015] According to one embodiment, the one or more programs include instructions that, when executed by at least one processor of an electronic device, cause the electronic device to execute an operation of using an audio classification module to identify whether the audio data contains a plurality of types of sound sources designated by the audio classification module.

[0016] According to one embodiment, the one or more programs include instructions that, when executed by at least one processor of an electronic device, cause the electronic device to execute an operation of displaying a first object representing a first type of sound source and a second object representing a second type of sound source on a screen for editing through the display of the electronic device, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0017] According to one embodiment, the one or more programs include instructions that, when executed by at least one processor of an electronic device, cause the electronic device to execute an operation of controlling the volume of a first audio corresponding to a first type of sound source separated using an audio separation module, based on user input for the first object. According to one embodiment, the first object and the second object are displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.

[0018] FIG. 1 is a block diagram of an electronic device in a network environment according to various embodiments.

[0019] FIGS. 2a and 2b are drawings illustrating examples of configurations for audio editing of an electronic device according to one embodiment.

[0020] FIG. 3 is a diagram illustrating an example of audio classification for an electronic device according to one embodiment.

[0021] FIGS. 4a and 4b are drawings illustrating examples of audio separation and editing of an electronic device according to one embodiment.

[0022] FIGS. 5a, 5b, and 5c are drawings illustrating examples of screens for audio editing in an electronic device according to one embodiment.

[0023] FIG. 6 is a diagram illustrating an example of audio separation and editing of an electronic device according to one embodiment.

[0024] FIG. 7a is a diagram illustrating an example of selecting a plurality of audio classification modules in an electronic device according to one embodiment.

[0025] FIG. 7b is a diagram illustrating an example for selecting a plurality of audio separation modules in an electronic device according to one embodiment.

[0026] FIG. 7c is a diagram illustrating an example of selecting a plurality of audio classification modules and a plurality of audio separation modules in an electronic device according to one embodiment.

[0027] FIG. 8 is a diagram illustrating an example of a method of operation in an electronic device according to one embodiment.

[0028] FIG. 9 is a diagram showing an example of a method of operation in an electronic device according to one embodiment.

[0029] FIG. 10 is a diagram showing an example of a screen for editing audio separated from audio data in an electronic device according to one embodiment.

[0030] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.

[0031] Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings so that those skilled in the art can easily implement them. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein. Furthermore, in the drawings and related descriptions, descriptions of well-known functions and configurations may be omitted for clarity and brevity. The term "user" as used in the embodiments of the present disclosure may refer to a person using an electronic device or a device using an electronic device (e.g., an artificial intelligence electronic device).

[0032] FIG. 1 is a block diagram of an electronic device (101) in a network environment (100) according to various embodiments.

[0033] Referring to FIG. 1, in a network environment (100), an electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through a server (108). According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).

[0034] The processor (120) can control at least one other component (e.g., a hardware or software component) of the electronic device (101) connected to the processor (120) by executing software (e.g., a program (140)), and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., a sensor module (176) or a communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting data in non-volatile memory (134). According to one embodiment, the processor (120) may include a main processor (121) (e.g., a central processing unit or an application processor) or an auxiliary processor (123) that can operate independently or together with it (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor). For example, if the electronic device (101) includes a main processor (121) and an auxiliary processor (123), the auxiliary processor (123) may be configured to use lower power than the main processor (121) or to be specialized for a designated function. The auxiliary processor (123) may be implemented separately from the main processor (121) or as part thereof.

[0035] The auxiliary processor (123) may control at least some of the functions or states associated with at least one component of the electronic device (101) (e.g., display module (160), sensor module (176), or communication module (190)) on behalf of the main processor (121) while the main processor (121) is in an inactive (e.g., sleep) state, or together with the main processor (121) while the main processor (121) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (123) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (180) or communication module (190)). According to one embodiment, the auxiliary processor (123) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (101) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (108)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. The artificial intelligence model may, additionally or alternatively, include a software structure in addition to the hardware structure.

[0036] The memory (130) can store various data used by at least one component of the electronic device (101) (e.g., processor (120) or sensor module (176)). The data may include, for example, input data or output data for software (e.g., program (140)) and related commands. The memory (130) may include volatile memory (132) or non-volatile memory (134).

[0037] The program (140) may be stored as software in memory (130) and may include, for example, an operating system (142), middleware (144), or an application (146).

[0038] The input module (150) can receive commands or data to be used for a component of the electronic device (101) (e.g., processor (120)) from outside the electronic device (101) (e.g., user). The input module (150) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0039] The sound output module (155) can output a sound signal to the outside of the electronic device (101). The sound output module (155) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0040] The display module (160) can visually provide information to the outside (e.g., a user) of the electronic device (101). The display module (160) may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (160) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.

[0041] The audio module (170) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (170) can acquire sound through the input module (150) or output sound through the sound output module (155) or an external electronic device (e.g., electronic device (102)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (101).

[0042] The sensor module (176) can detect the operating state of the electronic device (101) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (176) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0043] The interface (177) may support one or more specified protocols that can be used for the electronic device (101) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (102)). According to one embodiment, the interface (177) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0044] The connection terminal (178) may include a connector through which the electronic device (101) can be physically connected to an external electronic device (e.g., electronic device (102)). According to one embodiment, the connection terminal (178) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0045] The haptic module (179) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module (179) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0046] The camera module (180) can capture still images and video. According to one embodiment, the camera module (180) may include one or more lenses, image sensors, image signal processors, or flashes.

[0047] The power management module (188) can manage the power supplied to the electronic device (101). According to one embodiment, the power management module (188) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0048] The battery (189) can supply power to at least one component of the electronic device (101). According to one embodiment, the battery (189) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0049] The communication module (190) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between an electronic device (101) and an external electronic device (e.g., electronic device (102), electronic device (104), or server (108)), and the performance of communication through the established communication channel. The communication module (190) may include one or more communication processors that operate independently of the processor (120) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (190) may include a wireless communication module (192) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (194) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (104) through a first network (198) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (199) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (192) can identify or authenticate the electronic device (101) within a communication network such as the first network (198) or the second network (199) using subscriber information (e.g., International Mobile Subscriber Identifier (IMSI)) stored in the subscriber identification module (196).

[0050] The wireless communication module (192) can support 5G networks and next-generation communication technologies following 4G networks, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (192) can support a high-frequency band (e.g., mmWave band), for example, to achieve a high data transmission rate. The wireless communication module (192) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large-scale antenna. The wireless communication module (192) can support various requirements specified for the electronic device (101), an external electronic device (e.g., electronic device (104)), or a network system (e.g., second network (199)). According to one embodiment, the wireless communication module (192) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for realizing URLLC.

[0051] An antenna module (197) can transmit a signal or power to or from an external source (e.g., an external electronic device). According to one embodiment, the antenna module (197) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (197) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as a first network (198) or a second network (199), may be selected from the plurality of antennas, for example, by a communication module (190). A signal or power may be transmitted or received between the communication module (190) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (197).

[0052] According to various embodiments, the antenna module (197) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0053] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0054] According to one embodiment, commands or data may be transmitted or received between the electronic device (101) and an external electronic device (104) through a server (108) connected to a second network (199). Each of the external electronic devices (102 or 104) may be the same type of device as the electronic device (101) or a different type. According to one embodiment, all or part of the operations performed on the electronic device (101) may be performed on one or more of the external electronic devices (102, 104, or 108). For example, if the electronic device (101) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (101) may request one or more external electronic devices to perform at least part of the function or service instead of, or in addition to, performing the function or service itself. The one or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (101). The electronic device (101) may provide the result, as is or after additional processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (101) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (104) may include an Internet of Things (IoT) device. The server (108) may be an intelligent server using machine learning and/or neural networks. According to one embodiment, the external electronic device (104) or the server (108) may be included within a second network (199). The electronic device (101) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.

[0055] FIGS. 2a and 2b are drawings showing examples of configurations for audio editing of an electronic device according to one embodiment, FIG. 3 is a drawing showing an example of audio classification of an electronic device according to one embodiment, FIGS. 4a and 4b are drawings showing examples of audio separation and editing of an electronic device according to one embodiment, and FIGS. 5a, 5b, and 5c are drawings illustrating examples of screens for audio editing of an electronic device according to one embodiment.

[0056] Referring to FIGS. 1, 2a, 2b, 3, 4a, 4b, 5a, 5b, and 5c, an electronic device (101) according to one embodiment (e.g., the electronic device (101) of FIG. 1) may include at least one processor (e.g., the processor (120) of FIG. 1), a memory (e.g., the memory (130) of FIG. 1), a display (e.g., the display module (160) of FIG. 1), an audio circuit (e.g., the audio module (170) of FIG. 1), a communication circuit (e.g., the communication module (190) of FIG. 1), and a speaker (e.g., the sound output module (155) of FIG. 1). Without being limited thereto, the electronic device (101) may be implemented identically or similarly to the electronic device (101) of FIG. 1, may further include other components of the electronic device (101) of FIG. 1, and may be configured to include additional components necessary for the method of operation of the present disclosure. According to one embodiment, the electronic device (101) may be an on-device type device that stores an artificial intelligence model in the memory (130). According to one embodiment, the electronic device (101) may perform audio editing using an audio separation function (e.g., a solution or program) and an audio classification function. Here, because the audio separation function and the audio classification function differ in performance, the information they produce may be inconsistent. The audio classification function may have a faster processing speed than the audio separation function.

[0057] According to one embodiment, the electronic device (101) may include an audio editing module (201) comprising an audio classification module (230) (e.g., an audio classifier) for processing an audio classification function and an audio separation module (240) (e.g., an audio separator) for processing an audio separation function. For example, the audio editing module (201) may be a module (e.g., a function, application, or program) included in memory (130) and may further include other modules (e.g., a function or program) required for audio editing in addition to the audio classification module (230) and the audio separation module (240).

[0058] According to one embodiment, the processor (120) of the electronic device (101) can edit audio data that is input for editing through the audio decoder (210) and the audio buffers (221, 223), using the audio classification module (230) and the audio separation module (240). According to one embodiment, the electronic device (101) can output the entire audio data from the audio decoder (210) and transmit it as input to the audio classification module (230) to obtain information on the sound source types and segments of the entire audio data, or it can output the audio data in parts, store each part in the audio buffer, and obtain audio information on the corresponding part through the audio classification module (230).

[0059] According to one embodiment, the processor (120) can use the audio classification module (230) to repeatedly check, for each time interval (t1 to tn) of the audio data (310-1, 310-2, 310-3, …, 310-n), whether a sound source is detected (e.g., present) for each of the designated sound source types, and based on the result of the check, can store the detected sound source information (e.g., the type of the detected sound source) and interval information for the corresponding time interval in the memory (130). According to one embodiment, when the processor (120) detects at least one sound source corresponding to each of at least one type in the audio data, it checks a probability value (e.g., a score indicating detection reliability) for each detected sound source, and if the probability value is greater than a threshold value, it can confirm that the sound source is detected (e.g., included in the audio data). If the probability value is less than or equal to the threshold value, the processor (120) can treat the sound source as noise. The plurality of types designated as classifiable (e.g., detectable) in the audio classification module (230) are sound source types that can be independently detected, in a number specified according to the performance of the audio classification module (230), and may include at least one of voice, music, wind, noise, laugh, and / or crowd, and may include noise as an other type. The sound source types are not limited thereto and may be designated by replacing or adding other forms of sound sources.
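
As a non-limiting illustration, the per-interval detection check described above might be sketched as follows. The names `SOURCE_TYPES`, `THRESHOLD`, `Detection`, and `classify_intervals` are hypothetical, and the threshold value of 0.5 is an assumption, since the disclosure does not specify a numeric threshold or a data format for the classifier scores.

```python
from dataclasses import dataclass

# Hypothetical type list and threshold (assumptions for illustration only).
SOURCE_TYPES = ("voice", "music", "wind", "laugh", "crowd")
THRESHOLD = 0.5


@dataclass(frozen=True)
class Detection:
    interval: int      # index of the time interval (t1..tn mapped to 0..n-1)
    source_type: str   # detected type, or "noise" when the score is too low
    score: float       # probability value reported by the classifier


def classify_intervals(scores_per_interval, threshold=THRESHOLD):
    """Check every designated type in every interval.

    scores_per_interval is a list with one dict per time interval,
    mapping a sound source type to its probability value.
    """
    detections = []
    for idx, scores in enumerate(scores_per_interval):
        for source_type in SOURCE_TYPES:
            score = scores.get(source_type, 0.0)
            if score > threshold:
                # Greater than the threshold: confirmed as detected.
                detections.append(Detection(idx, source_type, score))
            elif score > 0.0:
                # Less than or equal to the threshold: treated as noise.
                detections.append(Detection(idx, "noise", score))
    return detections


def detected_types(detections):
    """Return the sorted set of types to be stored as sound source information."""
    return sorted({d.source_type for d in detections})
```

In this sketch, a score exactly equal to the threshold is treated as noise, which follows the "less than or equal to" wording of the paragraph above.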

[0060] FIGS. 4a and 4b are drawings illustrating examples of audio separation and editing of an electronic device according to one embodiment.

[0061] Referring to FIGS. 4a and 4b, a processor (120) of an electronic device (101) according to one embodiment can separate audio data being played using an audio separation module (240). A plurality of types separable by the audio separation module (240) may be specified according to the performance of the audio separation module (240), and may include, for example, voice (411) (voice audio data), music (412) (music audio data), wind (413) (wind audio data) and / or other types (414) (others audio data). Examples of sound source types are not limited thereto and may be specified by replacing or adding other forms of sound sources. Due to differences in performance between the audio classification module (230) and the audio separation module (240), the type and number of audio (e.g., sound sources) separated by the audio separation module (240) may not match the type and number of sound sources detected by the audio classification module (230). For example, the number of separable sound source types (e.g., audio types) specified in the audio separation module (240) may be smaller than the number of sound source types specified in the audio classification module (230). According to one embodiment, the processor (120) may separate audio from the entire audio data at specified time intervals (e.g., 2 seconds) through the audio separation module (240). According to one embodiment, the audio separation module (240) may output the audio separated from the audio data to the audio channels of the specified sound source types (e.g., audio types) regardless of whether there is a separated result, and if there is no separated result, it may output a value of 0. Here, the data remaining after the separable sound sources are excluded from the audio data may be separated and output as other audio data. Not limited thereto, the result of the audio separation module (240) may be changed. For example, the audio separation module (240) may separate and output only the separable audio identified in the audio data based on information regarding separable sound sources identified in the audio classification module (230), and may provide functions for controlling or editing the separated audio. In order to separate audio from the audio data based on information regarding separable sound sources identified in the audio classification module (230), the electronic device (101) may apply the model information used by the audio separation module (240) differently at the time it is needed.
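
As a non-limiting illustration, the fixed-channel output behavior described above (a value of 0 on channels with no separated result, and the remainder output as other audio data) might be sketched as follows. The function `to_fixed_channels` and the constant `CHANNELS` are hypothetical, and computing the remainder as the mixture minus the sum of the separated channels is an assumption of this sketch rather than a method stated in the disclosure.

```python
# Hypothetical channel layout following the voice / music / wind / others example.
CHANNELS = ("voice", "music", "wind", "others")


def to_fixed_channels(mixture, separated):
    """Place separator results on a fixed set of channels.

    mixture:   list of float samples for one time interval.
    separated: dict mapping a type to its separated samples; types with
               no separated result are simply absent.
    """
    n = len(mixture)
    named = CHANNELS[:-1]
    out = {}
    for name in named:
        # A channel is always emitted; with no separated result it is all zeros.
        out[name] = list(separated.get(name, [0.0] * n))
    # Assumption: the remainder is the mixture minus the separated channels.
    out["others"] = [
        sample - sum(out[name][i] for name in named)
        for i, sample in enumerate(mixture)
    ]
    return out
```

For example, if only voice is separated from a two-sample mixture, the music and wind channels carry zeros and everything that is not voice appears on the others channel.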

[0062] According to one embodiment, the processor (120) can mix the separated audios reflecting the editing results through an audio mixer (430) and then output the mixed audio data through an audio renderer (440) and an encoder (450) to a speaker (460).
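
As a non-limiting illustration, the mixing of the separated audios with the editing results applied might be sketched as follows. The function `mix` is hypothetical, and representing an edit as a per-channel gain followed by clipping to the range -1.0 to 1.0 is an assumption, since the disclosure does not describe the internal operation of the audio mixer (430).

```python
def mix(channels, gains):
    """Sum the separated channels after applying per-channel volume gains.

    channels: dict mapping a channel name to a list of float samples.
    gains:    dict mapping a channel name to a gain; missing names use 1.0.
    """
    if not channels:
        return []
    length = len(next(iter(channels.values())))
    mixed = [0.0] * length
    for name, samples in channels.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    # Assumption: keep the mixed signal within the nominal sample range.
    return [max(-1.0, min(1.0, value)) for value in mixed]
```

Setting the gain of one channel to 0 removes that sound source from the mixed output while leaving the other channels unchanged.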

[0063] FIGS. 5a, 5b, and 5c are drawings illustrating examples of screens for audio editing in an electronic device according to one embodiment.

[0064] Referring to FIG. 5a, according to one embodiment, when a processor (120) receives a request to execute an audio editing function (e.g., an application related to audio editing), it displays an execution screen (510) for audio editing on a display (161) and can check the audio data to be edited. When an editing menu (511) (e.g., a button or object) included in the execution screen (510) is selected (e.g., when a request is received to individually edit each of the multiple sound sources included in the audio data), the processor (120) can check whether the audio data contains multiple types of sound sources through an audio classification module (230). Information regarding the sound source based on the result of the check and information regarding the time interval for checking whether multiple types of sound sources are included can be stored in memory (130). According to one embodiment, when the processor (120) receives a selection input for an edit menu (511) (e.g., a button or object) included in the execution screen (510), it may display a screen (520) for audio classification processing while obtaining information about a separable sound source through the audio classification module (230) before playing audio data.

[0065] According to one embodiment, the processor (120) can display a first screen (530) for separating and editing audio data through the audio separation module (240) when playing the audio data, based on information about the sound source and time interval detected by the audio classification module (230). According to one embodiment, when the processor (120) receives an input on an audio editing button (e.g., an audio eraser button) included in the execution screen (510), it obtains information about the time interval in which a first sound source (e.g., voice) is detected in the audio data, based on identifying the first sound source of a first type (e.g., voice type) among a plurality of types in the audio data through the audio classification module (230), and while playing the audio data, it can display on the display (161) a first object (531a) representing the first sound source of the first type (e.g., one of the plurality of types) and a second object (531b) representing other types in a specific time interval (e.g., one interval from t1 to tn). Here, the other types may be named noise, but are not limited thereto, and may be named others or given a name designated by the user. For example, the processor (120) may display on the first screen (530) additional objects representing other types of sound sources, in addition to the first object (531a) representing the first sound source of the first type and the second object (531b) representing other types (e.g., noise), in order to edit sound sources separated by the audio separation module (240). For example, as illustrated in FIG. 5b, objects (531a, 531b, 531c and / or 531d) representing types of voice, noise, music and / or wind, respectively, may be displayed on the first screen (530). As illustrated in FIG. 5a, the first screen (530) may include information (533) about a time interval for a sound source, information (535) about the audio data playback time, an object (537) for adjusting the volume, and / or an automatic edit button (539).

[0066] Referring to FIG. 5b, according to one embodiment, the processor (120) can check in advance the types of sound sources for audio that can be separated by time intervals in the entire audio data based on information about sound sources and information about time intervals, so that objects representing the types of sound sources corresponding to the audio that can be separated in the audio data can be displayed on the first screen (530). For example, if it is confirmed that among a plurality of types in a time interval, sound source types for voice, music, wind and / or noise are detected in the audio data using the audio classification module (230), the processor (120) can display objects (531a, 531b, 531c and / or 531d) representing the types of voice, music, wind and / or other types for the detected sound sources on the first screen (530) for editing when playing the audio data. The processor (120) may display information about time intervals (533a, 533b, 533c and / or 533d). According to one embodiment, if the information about a classified sound source identified (e.g., detected) by the audio classification module (230) is the same as the information about separable audio sources (e.g., audio channel) provided by the audio separation module (240), the processor (120) may provide an object (e.g., editing UI) for controlling or editing each separable sound source to the first screen (530) for editing, thereby controlling and editing the audio sources (e.g., audio channels) separated through audio separation in a 1:1 mapping.

[0067] According to one embodiment, based on user input selecting one of the objects (531a, 531b, 531c, and 531d) displayed on the first screen (530), the processor (120) may perform editing (e.g., volume control) on the audio that is separated from the audio data and corresponds to the detected sound source of the selected object. The processor (120) may apply visual effects corresponding to the result of the editing (e.g., volume control) to the selected object. According to one embodiment, the processor (120) may apply visual effects (e.g., volume control numbers and graphs) indicating the result of the volume control to the objects (531a, 531b, 531c, and 531d) representing the sound sources.

[0068] According to one embodiment, if a second sound source of a second type (e.g., music type) among the plurality of types is not detected in a first time interval by the audio classification module (230), but second audio of the second type is nevertheless separated by the audio separation module (240) so that the two results do not match, the processor (120) may refrain from displaying an object (531c) corresponding to the second type for volume control of the second audio and may map the second audio of the second type to a noise channel. Upon receiving user input for the second object (531b) representing noise, the processor (120) can control the volume of the second audio together with the volume of third audio corresponding to a sound source of another type (e.g., noise). According to one embodiment, the processor (120) may obtain information about separable sound sources identified through the audio classification module (230) from memory (130) or an external electronic device (e.g., cloud) before starting audio editing, in order to improve editing usability by reducing the waiting time before audio editing through the audio classification module (230), which has a faster processing time than the audio separation module (240). If the obtained information about the separable sound sources is the same as the separable audio from the audio separation module (240), the processor (120) may control / edit the output of the audio separation module (240) based on the obtained information. For example, if the obtained information about the separable sound sources does not correspond to the separable audio provided by the audio separation module (240), the processor (120) may determine whether the corresponding sound source can be separated into noise and matched to the other audio data of the audio separation module (240), thereby providing a control / editing function.

[0069] According to one embodiment, the processor (120) detects a third sound source of a third type (e.g., laughter type) detected in a first time interval among a plurality of types by the audio classification module (230), and if the third type does not correspond to the types of sound sources separable by the audio separation module (240), the processor (120) separates the third sound source into noise using the audio separation module, and can control the volume of the fourth audio corresponding to the third sound source and the third audio corresponding to the noise together based on user input for a second object representing the noise. Since the sound sources separable by the audio separation module (240) are audio data required for audio editing, the processor (120) can provide objects for control or editing corresponding to each of the separable sound sources on the screen for editing, and since other audio data, which are sound sources excluding the separable sound sources, are separated into noise, the processor (120) can provide an object for the noise on the screen.
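
As a non-limiting illustration, the mapping between sound source types identified by the audio classification module (230) and audio channels provided by the audio separation module (240), including the fallback to noise, might be sketched as follows. All names (`SEPARATOR_CHANNELS`, `control_for_type`, `control_for_channel`, `channel_gains`) are hypothetical, and the default gain of 1.0 is an assumption.

```python
# Hypothetical separator channels; "others" carries the remaining audio.
SEPARATOR_CHANNELS = ("voice", "music", "wind", "others")
NOISE = "noise"


def control_for_type(source_type):
    """Map a classifier type to the on-screen object that controls it.

    A type the separator cannot separate (e.g., laughter) is grouped
    under the noise object.
    """
    if source_type in SEPARATOR_CHANNELS[:-1]:
        return source_type
    return NOISE


def control_for_channel(channel, detected_types):
    """Map a separator channel to the on-screen object that controls it.

    A channel whose type was not detected by the classifier has no object
    of its own, so it is controlled through the noise object.
    """
    if channel != "others" and channel in detected_types:
        return channel
    return NOISE


def channel_gains(control_gains, detected_types):
    """Expand per-object volume settings into per-channel gains."""
    return {
        channel: control_gains.get(control_for_channel(channel, detected_types), 1.0)
        for channel in SEPARATOR_CHANNELS
    }
```

With this mapping, lowering the noise object also lowers any separated channel that has no object of its own, so no audible channel is left without a control.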

[0070] According to one embodiment, if the processor (120) identifies that a third type of third sound source is not separated by the audio separation module (240) and is the dominant sound source in the audio data, it may display a third object representing the third sound source on the display (161). According to one embodiment, the processor (120) may individually control the volume of a fourth audio corresponding to the third sound source based on user input regarding the third object. For example, only the sound of a dog barking may be detected by the audio classification module (230), and information about the sound of a dog barking may be displayed on a second object (531b) representing the noise of the audio separation module (240).

[0071] According to one embodiment, when the types of separable sound sources identified by the audio classification module (230) are fewer than the types of sound sources separable by the audio separation module (240), the processor (120) may display a noise UI or an object corresponding to a specific UI on the first screen (530) for editing to control the audio channels of the separable sound sources not detected during audio separation, as a UI is absolutely necessary for controlling sound sources not detected during audio classification, and since the probability of separable sound sources not detected during audio classification being separated and output is very low even during audio separation. For example, when the first sound source (e.g., voice) is not detected, the processor (120) may display the voice channel of the audio separation as an object representing noise on the first screen for editing, as the probability of the first sound source existing in the audio is very low. The processor (120) may indicate that the likelihood of separation is low in the UI by displaying a different color or applying a different icon to the separable sound sources not detected during audio classification. For example, when the first sound source (e.g., voice) is not detected (e.g., when the score indicating the detection reliability of the first sound source in the first time interval is lower than a specified threshold value), the processor (120) may apply a visually distinguishable effect to a first object representing the first sound source (e.g., an object representing voice (531a)) to indicate that the probability of the first sound source being detected is low.
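
As a non-limiting illustration, marking objects for separable sound sources whose detection score is lower than the specified threshold might be sketched as follows. The names `EditObject` and `build_edit_objects` are hypothetical, and the threshold value of 0.5 is an assumption; how the flag is rendered (e.g., a different color or icon) is left to the UI.

```python
from dataclasses import dataclass


@dataclass
class EditObject:
    channel: str          # separator channel the object controls
    low_likelihood: bool  # True when the UI should apply a distinguishing effect


def build_edit_objects(separator_channels, scores, threshold=0.5):
    """Create one object per separator channel for a given time interval.

    scores maps a sound source type to its detection reliability score in
    that interval; a missing type is treated as a score of 0.0.
    """
    return [
        EditObject(channel, scores.get(channel, 0.0) < threshold)
        for channel in separator_channels
    ]
```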

[0072] Referring to FIG. 5c, according to one embodiment, the processor (120) may display the control result (or edit result) of performing a control / editing function on the output of the audio separation module (240) based on information about the acquired separable sound source by applying a visual effect. For example, a visual effect (541) (e.g., numbers and / or graphs) representing the control result according to volume control of the audio for the first sound source (e.g., voice) may be applied to the object (531a). For example, a visual effect (543) (e.g., numbers and / or graphs) representing the control result according to volume control of the audio for the second sound source (e.g., music) may be applied to the object (531c). For example, a visual effect (545) (e.g., numbers and / or graphs) representing the control result according to volume control of the audio for the third sound source (e.g., wind) may be applied to the object (531d).

[0073] FIG. 6 is a diagram illustrating an example of audio separation and editing of an electronic device according to one embodiment. Referring to FIG. 6, according to one embodiment, when the length of the audio data is longer than a specified length, a processor (120) of an electronic device (101) may sample the entire audio data in the audio classification module (230) and obtain sound source information and section information for the sound sources detected in the sampling sections (611, 612 and / or 613), so that the waiting time before editing does not grow as the processing time increases. Based on the sound source information and section information for the sound sources detected in the sampling sections (611, 612 and / or 613), the processor (120) may reduce the waiting time before editing. When the processor (120) performs audio classification through the audio classification module (230) using only a part rather than the whole of the audio through sampling, it may perform audio control / editing based on a sampling ratio (e.g., probability) for the part of the audio so as not to miss information about separable sound sources that may exist in a part of the audio. When performing audio control / editing based on the sampling ratio (e.g., probability) and the sampling ratio is below a reference value, the processor (120) may display an editing screen (e.g., the first screen (530) of FIG. 5a, 5b, and 5c) that includes objects (e.g., editing UI) corresponding to all separated sound sources output from the audio separation module (240), instead of including only objects corresponding to the sound sources identified as separable by the audio classification module (230). The processor (120) may display, on the editing screen (e.g., the first screen (530) of FIG. 5a, 5b, and 5c), an object (e.g., an icon, or a visual effect that displays a different color on the object of the sound source) indicating that the information about the sound sources detected by the audio classification module (230) may be probabilistically unreliable.
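
As a non-limiting illustration, the choice between showing only the classified sound sources and showing all separator outputs when only sampled sections were classified might be sketched as follows. The functions `sampling_ratio` and `choose_objects` are hypothetical, and defining the sampling ratio as the covered duration divided by the total duration, as well as the reference value of 0.5, are assumptions of this sketch.

```python
def sampling_ratio(sections, total_seconds):
    """Fraction of the audio covered by the sampled sections.

    sections is a list of (start, end) pairs in seconds.
    """
    covered = sum(end - start for start, end in sections)
    return covered / total_seconds


def choose_objects(detected_types, separator_channels, ratio, reference=0.5):
    """Pick the channels to show on the editing screen.

    Returns (channels, unreliable), where unreliable indicates that the
    classification information may be probabilistically unreliable.
    """
    if ratio < reference:
        # Too little audio was classified: show every separator output.
        return list(separator_channels), True
    shown = [
        channel for channel in separator_channels
        if channel in detected_types or channel == "others"
    ]
    return shown, False
```

For example, three 10-second sampling sections in a 120-second clip give a ratio of 0.25, which falls below the assumed reference value, so objects for all separator channels would be shown together with an indication of low reliability.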

[0074] According to one embodiment, if the information regarding the time interval confirmed by the audio classification module (230) is different from the interval information confirmed by the audio separation module (240), the processor (120) may change the information regarding the differing time interval, display the changed (e.g., updated) interval information, and display on the first screen a visual effect or object indicating that the interval information is different.

[0075] According to one embodiment, when an option is selected while the audio data is being played, the processor (120) may additionally display, on the first screen, the audio waveform separated through the audio separation module (240). Here, the audio waveform may be an audio waveform for a corresponding time interval based on interval information, and may include a time-axis audio waveform and / or a frequency-axis audio waveform.

[0076] According to the present disclosure, the electronic device can solve the problem that, when the sound sources separated by the audio separation module (240) are edited using the result information of the audio classification module (230), the results for the separable sound sources may differ because the two modules are different solutions. For example, when voice information is not detected during audio classification but a sound source for voice is separated and output during audio separation, providing the information on separable sound sources solely based on the audio classification result, and allowing only that information to be edited, would make it impossible to control the audio data actually separated and output through audio separation; this problem can be solved. Furthermore, if voice is not detected in the audio classification result and an object corresponding to the voice is not exposed, voice output through audio separation could not be edited, and the user could not control all of the audio data that is separated and output; this problem can also be solved. For example, the electronic device can prevent an error in which a voice sound is output even though the volume in the audio editing UI displayed to the user has been changed to 0.

[0077] According to one embodiment, the processor (120) may display all separable sound source information in the corresponding audio data or only dominant sound source information for the results of applying one or more audio classification modules (230). Additionally, the processor (120) may provide a UI to allow the user to select an audio classification module (230) capable of detecting specific sound sources, and after obtaining the information from the audio classification module (230), apply an audio separation module (240) capable of separating the sound sources to provide editing to the user.

[0078] FIG. 7a is a diagram illustrating an example of selecting a plurality of audio classification modules in an electronic device according to one embodiment.

[0079] Referring to FIG. 7a, according to one embodiment, when there are multiple audio classification modules (230), the electronic device (101) may display a screen (e.g., UI or object) for selecting an audio classification module to perform detection (e.g., checking for existence) of a specific sound source among the multiple audio classification modules (230-1, 230-2, …, 230-n). The electronic device (101) may select an audio classification module (e.g., a second audio classification module (230-2)) to check whether a specific sound source is detected among the multiple audio classification modules (230-1, 230-2, …, 230-n) through the screen. Each of the multiple audio classification modules (230-1, 230-2, …, 230-n) may be assigned a type of sound source to be detected. According to one embodiment, the electronic device (101) separates all sound sources that are not detected by the audio classification module (230) in the audio data into other types (e.g., noise) by the audio separation module (240) so that there is no case where control or editing of the audio does not occur because the separated sound source is not detected during audio classification and the editing UI is not exposed to the user during audio separation, and displays an object (e.g., editing UI) corresponding to the other types (e.g., noise) on a screen for editing (e.g., the first screen (530) of FIG. 5a, 5b and 5c) so that all sound sources that are not detected are controlled or edited.

[0080] FIG. 7b is a diagram illustrating an example for selecting a plurality of audio separation modules in an electronic device according to one embodiment.

[0081] Referring to FIG. 7b, according to one embodiment, when there are multiple audio separation modules (240), the electronic device (101) can select an audio separation module (e.g., a second audio separation module (240-2)) from among the multiple audio separation modules (240-1, 240-2, …, 240-n), separate specific sound sources using the selected module, and configure and display a first screen for editing. Each of the multiple audio separation modules (240-1, 240-2, …, 240-n) can process designated audio channels (e.g., sound source types) or designate and process sub-channels (e.g., detailed sound source types) of each audio channel. For example, if sub-channels are included in an audio channel corresponding to music, the first sub-channel (e.g., piano) can be processed in the first audio separation module (240-1), the second sub-channel (e.g., guitar) can be processed in the second audio separation module (240-2), and the nth sub-channel (e.g., drum) can be processed in the nth audio separation module (240-n). According to one embodiment, when the audio applied to the audio classification module (230) is sampled at a rate greater than a certain ratio to improve processing time, the electronic device (101) can display objects (e.g., editing UI) for all separable sound sources that may exist in a part of the audio on a screen for editing (e.g., the first screen (530) of FIG. 5a, 5b and 5c) so as not to miss information about separable sound sources that may exist in a part of the audio, even if there are sound sources that are not included in the sampled audio data. Accordingly, this can resolve the problem in which, when the electronic device relies solely on the audio classification results and the reliability of those results is low because the sampling exceeds a certain threshold, accurate information regarding the separable sound sources included in the entire audio cannot be conveyed.

[0082] FIG. 7c is a diagram illustrating an example of selecting a plurality of audio classification modules and a plurality of audio separation modules in an electronic device according to one embodiment.

[0083] Referring to FIG. 7c, according to one embodiment, when the electronic device has a plurality of audio classification modules (230) and a plurality of audio separation modules (240), the device can separate sound sources by selecting an audio separation module (e.g., the first audio separation module (240-1)) from among a plurality of audio separation modules (240-1, 240-2, ..., 240-n) capable of separating a specific sound source detected by an audio classification module (e.g., the first audio classification module (230-1)) selected from among the plurality of audio classification modules (230-1, 230-2, ..., 230-n), and can configure and display a first screen for editing. If two or more audio classification modules are selected, two or more audio separation modules can be selected. For example, for each audio classification module (230), an audio separation module capable of separating the sound sources detectable by that audio classification module can be predetermined.
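
As a non-limiting illustration, selecting audio separation modules from a predetermined pairing with the selected audio classification modules might be sketched as follows. The table `CLASSIFIER_TO_SEPARATOR` and the function `select_separators` are hypothetical; the pairing shown (230-1 with 240-1, 230-2 with 240-2) is an assumed example, since the disclosure only states that such a pairing can be predetermined.

```python
# Hypothetical predetermined pairing, keyed by the reference numerals
# used in the description (assumed example values).
CLASSIFIER_TO_SEPARATOR = {
    "230-1": "240-1",
    "230-2": "240-2",
}


def select_separators(selected_classifiers, mapping=CLASSIFIER_TO_SEPARATOR):
    """Return the separation modules paired with the selected classifiers.

    The result preserves selection order and contains no duplicates;
    classifiers without a predetermined pairing are skipped.
    """
    separators = []
    for classifier in selected_classifiers:
        separator = mapping.get(classifier)
        if separator is not None and separator not in separators:
            separators.append(separator)
    return separators
```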

[0084] According to one embodiment, the processor (120) may be a hardware module or a software module (e.g., an application program) and may be a hardware component (function) or a software element (program) comprising at least one component provided in the electronic device (101). According to one embodiment, the processor (120) may include, for example, one or more combinations of hardware, software, or firmware. The processor (120) may be configured to omit at least some of the components or to include additional components for performing image processing operations in addition to the components.

[0085] An electronic device according to one embodiment (e.g., the electronic device (101) of FIG. 1) may implement a software module related to audio editing (e.g., the program (140) of FIG. 1). The memory of the electronic device (e.g., the memory (130) of FIG. 1) may store instructions to implement the software module. At least one processor (e.g., the processor (120) of FIG. 1) may execute the instructions stored in the memory to implement the software module and may control hardware associated with the function of the software module (e.g., the sensor module (176) of FIG. 1, the camera module (180), the communication module (190) of FIG. 1, the display module (160) of FIG. 1).

[0086] A software module of an electronic device (101) according to one embodiment may be configured to include a kernel (or HAL), a framework (e.g., middleware (144) of FIG. 1), and an application (e.g., application (146) of FIG. 1). At least some of the software modules may be preloaded onto the electronic device (101) or downloadable from a server (e.g., server (108)).

[0087] According to one embodiment, the kernel may include, for example, a system resource manager or a device driver, but may be configured to include other modules, not limited thereto. The system resource manager may perform control, allocation, or reclamation of system resources. The device driver may include, for example, a display driver, a camera driver, a Bluetooth driver, a shared memory driver, a USB driver, a keypad driver, a WIFI driver, an audio driver, or an IPC (inter-process communication) driver.

[0088] According to one embodiment, the framework may provide functions commonly required by the application, or provide various functions to the application through an application programming interface (API) (not shown) so that the application can efficiently use limited system resources within the electronic device (101). The framework may include modules that form combinations of various functions of the components. The framework may provide modules specialized for each type of operating system to provide differentiated functions. The framework may dynamically delete some existing components or add new components.

[0089] According to one embodiment, the application may be configured to include an application for audio editing (e.g., a module, a manager, or a program). The application may include an application received from an external electronic device (e.g., a server (108) or an electronic device (102, 104)). According to one embodiment, the application may include a preloaded application or a third-party application downloadable from a server. The components of the software module and the names of the components according to the illustrated embodiments may vary depending on the type of operating system. According to one embodiment, at least a portion of the software module may be implemented as software, firmware, hardware, or a combination of at least two of these. At least a portion of the software module may be implemented (e.g., executed) by a processor (e.g., AP). At least a portion of the software module may include, for example, a module, a program, a routine, a set of instructions, or a process for performing at least one function.

[0090] As such, in one embodiment, the main components of the electronic device (101) of FIG. 1 have been described. However, in various embodiments, the components illustrated in FIG. 1 are not all essential components, and the electronic device (101) may be implemented with more components than those illustrated, or with fewer components. Additionally, the positions of the main components of the electronic device (101) described above in FIG. 1 may be changed according to various embodiments.

[0091] According to one embodiment, an electronic device (e.g., electronic device (101) of FIG. 1) may include a display (e.g., display module (160) of FIG. 1, display (161) of FIG. 5a, FIG. 5b and FIG. 5c), at least one processor (e.g., processor (120) of FIG. 1) including a processing circuit, and a memory (e.g., memory (130) of FIG. 1) for storing instructions.

[0092] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may receive a request to individually edit each of the plurality of sound sources included in the audio data.

[0093] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may use an audio classifier module (e.g., the audio classifier module (230) of FIG. 2a) to determine whether the audio data contains a plurality of sound sources specified by the audio classifier module.

[0094] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be able to display a first object representing a first type of sound source and a second object representing a second type of sound source on a screen for audio editing through the display, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0095] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may control the volume of the first audio corresponding to the first type of sound source separated using an audio separator module based on user input for the first object.

[0096] According to one embodiment, the first object and the second object may be displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.
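As a minimal sketch of the ordering described above (objects shown before separation completes), the following assumes the classifier returns quickly while the separator runs in a background worker. The names `open_editor`, `classify`, `separate`, and `display` are hypothetical and do not come from the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def open_editor(classify, separate, display):
    """Show editing objects from classification without waiting for separation.

    classify, separate and display are hypothetical callables supplied by the
    caller; they stand in for the classification module, the separation module
    and the editing screen.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        types = classify()                      # assumed fast: types only
        future = pool.submit(separate, types)   # assumed slow: runs in background
        display(types)                          # objects drawn before separation is done
        return future.result()                  # separated audio becomes available later
```

In this sketch, volume control for an object would only take effect once the corresponding separated audio is returned.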

[0097] According to one embodiment, based on identifying that the audio data includes the first type of sound source and the second type of sound source among the plurality of types, the audio separation module may be used to separate the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.

[0098] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may cause information about the time interval in which the first audio is detected to be displayed on the screen through the display.

[0099] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to control the volume of the second audio corresponding to the second type of sound source and the volume of the third audio corresponding to the third type of sound source based on user input for the second object, based on confirming that the third type of sound source among the plurality of types is not detected by the audio classification module and confirming that the third audio corresponding to the third type of sound source is at least partially separated using the audio separation module.

[0100] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may cause a visually distinguishable effect to be applied to the first object on the screen based on confirming that a score indicating the detection reliability of the first type of sound source is lower than a specified threshold value.

[0101] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be able to apply a visual effect indicating a controlled volume of the first audio to the first object.

[0102] According to one embodiment, the plurality of types may be sound source types that can be classified through the audio classification module in the audio data.

[0103] According to one embodiment, the audio separation module may specify a number of separable audio types smaller than the number of types specified by the audio classification module, and the plurality of types may include sound source types corresponding to the plurality of audio types.

[0104] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to detect the third type of sound source among the plurality of types by the audio classification module, and based on the fact that the third type does not correspond to a type separable by the audio separation module, to separate the third audio corresponding to the third type of sound source into the second type using the audio separation module, and to control the volume of the third audio corresponding to the third type of sound source based on user input for the second object.

[0105] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to display a third object representing the third type of sound source on the screen through the display based on identifying that the third type of sound source is a dominant sound source in the audio data, and to control the volume of the third audio corresponding to the third type of sound source based on user input regarding the third object.

[0106] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may cause waveform information for sound sources separable from the audio data to be displayed on the screen through the display.

[0107] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to check whether the plurality of sound sources specified by the audio classification module are included in the section sampled from the audio data through the audio classification module.

[0108] FIG. 8 is a diagram illustrating an example of an operation method in an electronic device according to one embodiment. In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0109] Referring to FIG. 8, an electronic device according to one embodiment (e.g., the electronic device (101) of FIG. 1) can check audio data based on receiving a request to individually edit each of the plurality of sound sources included in the audio data in operation 801. The audio data may be stored in memory. According to one embodiment, when the electronic device receives a request to execute an audio editing function (e.g., an application related to audio editing), it displays an execution screen for audio editing (e.g., the execution screen (510) of FIG. 5a) on a display and can check the audio data to be edited through the execution screen.

[0110] In operation 803, the electronic device can check whether the audio data contains multiple types of sound sources specified by the audio classification module through an audio classification module (e.g., the audio classification module (230) of FIG. 2a), obtain information about the detected sound source (e.g., type of sound source) according to the result of the check, and store the obtained information about the sound source in memory (130).

[0111] In operation 805, the electronic device can obtain information about a time interval (e.g., a first time interval) in which a first type of sound source is detected in the audio data, based on identifying a first type of sound source among a plurality of types in the audio data. The first type is a classifiable sound source type designated through the audio classification module and may be at least one of voice, music, wind, noise, laugh, or crowd. The electronic device can identify at least one classifiable sound source (e.g., voice) corresponding to each of at least one sound source type among the plurality of types in the first time interval of the audio data. The electronic device may classify sound sources that are not separated into independently classifiable sound sources as other types.

[0112] In operation 807, the electronic device may display on a display (e.g., the display module (160) of FIG. 1 or the display (161) of FIG. 5A, FIG. 5B, and FIG. 5C) a first screen for editing (e.g., the first screen (530) of FIG. 5A, FIG. 5B, and FIG. 5C) comprising a first object representing a first type of sound source identified in a first time interval (e.g., the object representing the voice (531a) of FIG. 5A, FIG. 5B, and FIG. 5C) and a second object representing a sound source of another type (e.g., the second type) (e.g., the second object (531b) of FIG. 5A, FIG. 5B, and FIG. 5C). Here, the first time interval may refer to an interval in which an audio separation operation is performed at a specified time interval within the total playback time of the audio data. The electronic device can identify (e.g., detect) a sound source of at least one of a plurality of types (e.g., types for designated separable sound sources) in a first time interval of audio data. When the electronic device identifies a sound source of the first type (e.g., speech) among the plurality of types, it can display a first object representing the sound source of the first type (e.g., an object representing speech) and a second object representing other identified types (e.g., second types), excluding the classifiable sound sources, on a first screen for editing. The electronic device can display information regarding the time interval in which the first audio was detected on the first screen. Here, the other types (e.g., second types) may be named "noise," but are not limited thereto, and may be named "others" or a name designated by the user.
For example, to edit the sound sources separated by the audio separation module, the electronic device may additionally display an object representing a different type of sound source on the first screen, in addition to the first object and the object representing other types (e.g., noise) (e.g., the second object). For example, objects representing types of voice, noise, music, and / or wind (e.g., objects of FIG. 5b (531a, 531b, 531c and / or 531d)) may be displayed on the first screen. The first screen may include information about a time interval for a sound source (e.g., information about a time interval in FIG. 5a (533)), information about an audio data playback time (e.g., information about an audio data playback time in FIG. 5a (535)), an object for adjusting the volume (e.g., an object for adjusting the volume in FIG. 5a (537)), and / or an auto-edit button (e.g., an auto-edit button in FIG. 5a (539)).

[0113] In operation 809, the electronic device may control the volume of a first audio (e.g., voice) corresponding to a separated first type of sound source using an audio separation module (e.g., audio separation module (240) of FIG. 2b) based on user input for a first object.
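One way to picture per-source volume control is as a gain applied to each separated audio before the sources are summed for playback. The sketch below assumes linear mixing of equally sampled float signals in the range [-1.0, 1.0]; the disclosure does not specify the mixing method, and `mix_stems` is a hypothetical name.

```python
def mix_stems(stems, gains):
    """Sum separated audio signals after applying a per-source gain.

    stems: dict mapping a source type (e.g., "voice") to a list of float samples.
    gains: dict mapping a source type to a linear gain; missing types use 1.0.
    Linear mixing with hard clipping is an assumption made for illustration.
    """
    length = max((len(samples) for samples in stems.values()), default=0)
    mixed = [0.0] * length
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    # Clip to the nominal range so that boosted sources do not overflow.
    return [max(-1.0, min(1.0, value)) for value in mixed]
```

For example, a user input on the first object that mutes the noise source would correspond to `gains = {"noise": 0.0}` in this sketch.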

[0114] An electronic device according to one embodiment may use an audio classification module to repeatedly check for the detection (e.g., presence) of each type of sound source that can be classified by intervals (t1 to tn) in the entire time of the audio data at specified time intervals (t), and, based on the result of the check, store information about at least one detected sound source (e.g., types of sound sources detected for each time interval) and information about time intervals (e.g., interval information) for each time interval in which at least one sound source is detected in a memory (e.g., memory (130) of FIG. 1). The specified plurality of types are types of sound sources that can be detected (e.g., classified) in a specified number according to the performance of the audio classification module, and may include voice, music, wind, laugh, and / or crowd, and may include noise as an other type. The types of sound sources described in this disclosure are described as examples and are not limited thereto, and may be designated by replacing or adding other types of sound sources. According to one embodiment, the audio separation module may pre-specify a number of sound source types (e.g., audio types) smaller than the number of sound source types specified in the audio classification module according to function (e.g., performance). The sound source types described in this disclosure are described as examples and are not limited thereto, and some of the specified sound source types may be excluded, replaced with, or added to other forms of sound sources.
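The interval-by-interval check described above can be sketched as a loop over fixed-length intervals that records which sound source types were detected in each one. The function name `classify_by_interval` and the `classify(start, end)` callable are hypothetical stand-ins for the audio classification module, which the disclosure does not define at the code level.

```python
def classify_by_interval(total_s, step_s, classify):
    """Check each interval t1..tn and record the detected types per interval.

    total_s: total length of the audio data in seconds.
    step_s: the specified interval length t in seconds.
    classify: hypothetical callable (start_s, end_s) -> iterable of type names.
    Returns a list of ((start_s, end_s), [types]) for intervals with detections.
    """
    if step_s <= 0:
        raise ValueError("step_s must be positive")
    records = []
    start = 0.0
    while start < total_s:
        end = min(start + step_s, total_s)
        types = list(classify(start, end))
        if types:  # keep only intervals in which at least one source is detected
            records.append(((start, end), types))
        start = end
    return records
```

The returned list plays the role of the stored sound source information and interval information in this sketch.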

[0115] According to one embodiment, when an electronic device detects a classifiable sound source (e.g., checks whether sound sources exist), it checks a probability value (e.g., a score indicating detection reliability) of the existence of the sound source, and if the confirmed probability value is greater than a threshold value, it can confirm that a classifiable sound source has been detected (e.g., exists). The electronic device can detect a sound source in which the confirmed probability value is less than or equal to the threshold value as noise.
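The threshold rule in this paragraph can be illustrated as follows. This is only a sketch: `label_sources` is a hypothetical name, and the default threshold of 0.5 is an assumed value, since the disclosure does not state one.

```python
def label_sources(scores, threshold=0.5):
    """Split scored source types into detected types and types treated as noise.

    scores: dict mapping a source type to its detection score (probability).
    threshold: assumed value; the disclosure only refers to "a threshold value".
    A type is detected when its score is greater than the threshold; a score
    less than or equal to the threshold is treated as noise.
    """
    detected = [name for name, score in scores.items() if score > threshold]
    as_noise = [name for name, score in scores.items() if score <= threshold]
    return detected, as_noise
```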

[0116] According to one embodiment, when a third type of sound source (e.g., music) is not detected as a specific type in a first time interval among a plurality of types by an audio classification module, and a third audio (e.g., music) corresponding to the third type of sound source is separated using an audio separation module, the electronic device may separate the third audio corresponding to the third type of sound source into other types to control or edit the third audio, and may control or edit the third audio using an object representing other types (e.g., a second object representing other types in FIG. 5a (531b)). According to one embodiment, the electronic device may control the volume of the second audio corresponding to the second type of sound source and the volume of the third audio corresponding to the third type of sound source based on user input regarding the second object representing the other types. According to one embodiment, when a score representing the detection reliability of the first type of sound source is lower than a specified threshold value, the electronic device may apply a visually distinguishable effect to the first object representing the first type of sound source.

[0117] FIG. 9 is a diagram illustrating an example of a method of operation in an electronic device according to one embodiment. In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel. FIG. 9 describes a specific operation for displaying a first screen for editing audio data in operation 807 of FIG. 8.

[0118] An electronic device according to one embodiment can configure objects representing sound sources included in a first screen for editing (e.g., the first screen (530) of FIG. 5A, FIG. 5B, and FIG. 5C) by mapping sound sources detected for each time interval in an audio classification module (e.g., the audio classification module (230) of FIG. 2A) to audio channels (e.g., types of audio) specified in an audio separation module (e.g., the audio separation module (240) of FIG. 2B), based on sound source information obtained from an audio classification module.

[0119] Referring to FIG. 9, an electronic device according to one embodiment can identify a plurality of designated sound source types (e.g., audio types or audio channels) that are separable from an audio separation module in operation 901. For example, the electronic device can identify sound source types (e.g., audio types) for voice, music, wind, and / or other types (e.g., second types) that are separable from an audio separation module. Here, the number of sound source types designated to be separable from the audio separation module may be predetermined as being smaller than the number of sound source types designated in the audio classification module according to the function (e.g., performance) of the audio separation module. The sound source types (e.g., audio types) described in this disclosure are described as examples and are not limited thereto, and some of the designated sound source types (e.g., audio types) may be excluded or replaced, or other forms of sound sources may be added. In operation 903, when the electronic device separates audio (e.g., a sound source) corresponding to some of the multiple sound source types specified by the audio separation module, it can determine whether to separate the detected or separable audio into other audio by checking, based on the information about at least one sound source (e.g., sound source information) and the information about the time interval, whether the audio classification module detected a sound source of the type corresponding to that audio in the audio data at the time of separation.

[0120] If, as a result of checking in operation 903, a sound source (e.g., music) corresponding to the audio detected or separable during audio separation is not detected during audio classification, the detected or separable audio (e.g., music) may be separated into other types (e.g., other audio data), and operation 905 may be performed. In operation 905, the electronic device may display an object corresponding to the other type (e.g., a second object) as an editing UI for controlling or editing the detected or separable audio. Here, the object corresponding to the other type (e.g., a second object) may be named, for example, others, noise, or a user-specified name (e.g., an icon or a specific color). For example, the object corresponding to the other type (e.g., a second object) may be an object provided to control or edit noise or other unseparated sound sources together with the other type. For example, if the object corresponding to the other type (e.g., a second object) represents multiple audio sources (e.g., sound sources), such as other audio that is not separated into independent audio sources (e.g., sound sources), noise, or other unseparated sound sources, the object corresponding to the other type (e.g., the second object) may include sub-objects such as an object representing the other audio, an object representing noise, or an object representing other unseparated audio. When the object corresponding to the other type (e.g., the second object) is selected by user interaction, an extended object (e.g., a screen or UI) displaying the sub-objects so that each included sub-object can be selected may be displayed on a first screen for editing. Subsequently, upon receiving user input regarding the second object through the first screen, the electronic device may perform editing or control (e.g., volume control) of the other audio, noise, and / or other unseparated audio.

[0121] If, as a result confirmed in the above 903 operation, a sound source (e.g., music) corresponding to the audio (e.g., music) detected or separable during audio separation is detected during audio classification, the audio can be separated into an audio type corresponding to the detected or separable audio (e.g., music) and the 907 operation can be performed. In the 907 operation, the electronic device can display an object corresponding to the separated audio type (e.g., an object corresponding to music) on a first screen for editing. Subsequently, when user input regarding the object corresponding to the audio type is received through the first screen, the electronic device can perform editing or control (e.g., volume control) on the audio (e.g., music) of the separated audio type.
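The branch between operations 905 and 907 can be sketched as a mapping from separated audio types to editing objects. The function name `build_editing_objects` and the label "other" are hypothetical; the disclosure notes that the object may instead be named "noise" or given a user-specified name.

```python
def build_editing_objects(separated_types, classified_types, other_label="other"):
    """Decide which editing object controls each separated audio type.

    separated_types: audio types output by the separation module, in order.
    classified_types: set of types the classification module detected.
    Returns a dict mapping an object label to the audio types it controls.
    """
    objects = {}
    for audio_type in separated_types:
        if audio_type in classified_types:
            key = audio_type    # operation 907: the type gets its own object
        else:
            key = other_label   # operation 905: folded into the "other" object
        objects.setdefault(key, []).append(audio_type)
    return objects
```

With separated types voice, music, and other, and only voice detected by classification, this sketch yields one voice object and one "other" object that controls both the music audio and the remaining audio.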

[0122] According to one embodiment, when an electronic device detects audio of multiple sound source types during a time interval in which audio separation is performed through an audio separation module, it can individually perform an operation to check whether to separate each of the multiple sound source types into other types using the same operation method as described in FIG. 9 above, and display an object for control or editing (e.g., an editing UI) on a first screen for editing according to the result of the check. For example, if the audio separation module is composed of multiple units corresponding to the number of separable sound source types, the operations of FIG. 9 for the multiple sound source types can be performed simultaneously. For example, if the audio separation module is composed of a single unit, the operations of FIG. 9 can be performed sequentially for each of the multiple sound source types.
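The difference between a single unit and multiple units can be illustrated with a worker pool, under the assumption that the per-type check is independent for each sound source type. The names `check_type` and `run_checks` are hypothetical, and threads are used here only to stand in for separate processing units.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def check_type(audio_type, classified_types, other_label="other"):
    """Per-type check: keep the type if classification detected it, else 'other'."""
    return audio_type, (audio_type if audio_type in classified_types else other_label)


def run_checks(separated_types, classified_types, units=1):
    """Run the per-type check sequentially (one unit) or concurrently (several)."""
    if units <= 1:
        return dict(check_type(t, classified_types) for t in separated_types)
    with ThreadPoolExecutor(max_workers=units) as pool:
        # pool.map preserves input order, so both paths return the same mapping.
        return dict(pool.map(check_type, separated_types, repeat(classified_types)))
```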

[0123] According to one embodiment, the electronic device may apply a visually distinguishable effect to a first object representing a first type of sound source (e.g., an object representing speech (531a)) when, for example, in a first time interval, a score representing the detection reliability of a first type of sound source (e.g., speech) is lower than a specified threshold value, thereby indicating that the probability of detecting a first audio corresponding to a first type of sound source is low. According to one embodiment, the electronic device may apply a visual effect (e.g., a number or a graph) indicating a control result to the first object according to volume control of the first audio corresponding to a first type of sound source.

[0124] According to one embodiment, an electronic device may detect a sound source of a third type (e.g., laughter type) in a first time interval among a plurality of types by an audio classification module, and if the third type is identified as not corresponding to an audio type separable by an audio separation module, the electronic device may separate the sound source of the third type into a second type (e.g., other type) using an audio separation module. Based on user input regarding a second object representing the sound source of the second type, the electronic device may control the volume of the third audio corresponding to the third type and the second audio corresponding to the sound source of the second type (e.g., noise and / or other audio not separated) together or individually.

[0125] According to one embodiment, if the electronic device identifies that a sound source of a third type (e.g., laughter type) (e.g., third sound source) is not separated in the audio separation module and is the dominant sound source in the audio data, it may display a third object representing the third type of sound source on a first screen for editing via a display (161). According to one embodiment, the electronic device may individually control the volume of the third audio corresponding to the third sound source based on user input regarding the third object. For example, if only a dog barking sound is detected in the audio classification module, information about the dog barking sound may be displayed on a second object representing a second type (e.g., other type) of sound source in the audio separation module.

[0126] According to one embodiment, in order to prevent the waiting time before editing from becoming longer as the processing time increases when the length of the audio data is longer than a specified length, the electronic device may sample the entire audio data in an audio classification module to obtain information about sound sources detected in the sampling interval and interval information. The electronic device may reduce the waiting time before editing based on the information about sound sources detected in the sampling interval (e.g., sampling intervals (611, 612 and / or 613) of FIG. 6) and interval information. When the electronic device performs audio classification through an audio classification module using only a part of the audio rather than the whole audio through sampling, it may perform audio control / editing based on a sampling ratio (e.g., probability) for a part of the audio so as not to miss information about separable sound sources that may exist in the part of the audio.

[0127] When performing audio control / editing based on a sampling ratio (e.g., probability) for a portion of the audio, and the sampling ratio (e.g., probability) is below a reference value, the electronic device may display an editing screen (e.g., the first screen (530) of FIG. 5a, 5b, and 5c) that includes objects (e.g., editing UI) corresponding to all separated sound sources output from the audio separation module (240), instead of including only objects corresponding to sound sources separated based on the sound source information obtained from the audio classification module. The electronic device may display, on the editing screen (e.g., the first screen (530) of FIG. 5a, 5b, and 5c), an object (e.g., an icon or a visual effect that displays a different color on the object of the sound source) indicating that the information about the sound sources detected by the audio classification module may be probabilistically unreliable.
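The fallback for sparsely sampled audio can be sketched as follows. The function name `choose_objects` and the reference value of 0.5 are assumptions for illustration; the disclosure does not give a concrete reference value or define how the sampling ratio is computed, so the sketch uses sampled duration divided by total duration.

```python
def choose_objects(sampled_s, total_s, classified_types, separator_types,
                   reference_ratio=0.5):
    """Pick the objects to show and whether to flag classification as uncertain.

    sampled_s / total_s is used as the sampling ratio (an assumed definition).
    Below the reference ratio, every separator output gets an object and the
    screen is flagged; otherwise only types confirmed by classification are shown.
    """
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    ratio = sampled_s / total_s
    if ratio < reference_ratio:
        return list(separator_types), True
    return [t for t in separator_types if t in classified_types], False
```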

[0128] According to one embodiment, if the information regarding the time interval confirmed by the audio classification module is different from the interval information confirmed by the audio separation module, the electronic device may change the information regarding the different time intervals, display the changed (e.g., updated) interval information, and display a visual effect or object indicating that the interval information is different on the first screen.
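A minimal sketch of this reconciliation is shown below. It assumes that the interval reported by the separation module replaces the one from the classification module when they differ; the disclosure does not state which value takes precedence, and `reconcile_interval` and `tolerance_s` are hypothetical.

```python
def reconcile_interval(classifier_interval, separator_interval, tolerance_s=0.0):
    """Return the interval to display and whether it differs from classification.

    Each interval is a (start_s, end_s) tuple. Preferring the separator's
    interval on mismatch is an assumption made for this sketch.
    """
    differs = (
        abs(classifier_interval[0] - separator_interval[0]) > tolerance_s
        or abs(classifier_interval[1] - separator_interval[1]) > tolerance_s
    )
    shown = separator_interval if differs else classifier_interval
    return shown, differs
```

The returned flag would drive the visual effect or object indicating that the interval information changed.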

[0129] According to one embodiment, when an option is selected during audio data playback, the electronic device may additionally display an audio waveform separated through an audio separation module on a first screen. Here, the audio waveform may be an audio waveform for a corresponding time interval based on interval information, and may include a time-axis audio waveform and / or a frequency-axis audio waveform.

[0130] FIG. 10 is a diagram showing an example of a screen for editing audio separated from audio data in an electronic device according to one embodiment.

[0131] Referring to FIG. 10, according to one embodiment, an electronic device may display screens (1010, 1020, 1030) for editing audio (e.g., sound sources) separated from audio data on a display. For example, as shown in FIG. 5a, the screen for editing may be configured to individually control the volume of objects for each of the separated audios corresponding to the detected sound sources, or as shown in FIG. 10, the objects for each of the separated audios may be displayed together. In addition, the screen for editing may be configured in various forms.

[0132] According to one embodiment, when the types of sound sources detected by the audio classification module match the specified plurality of types of audio that can be separated by the audio separation module, the electronic device can display a screen (1010) on the display that includes objects corresponding to voice, music, wind, and noise, which are the specified plurality of separable types.

[0133] According to one embodiment, if some of the types of sound sources detected by the audio classification module (e.g., laugh or crowd) do not match a specified plurality of types of audio that can be separated by the audio separation module (e.g., there is no audio channel for laugh), the electronic device may display a screen (1020 or 1030) configured by replacing an object for a sound source of a non-matching type (e.g., laugh or crowd) with a noise object. For example, when a sound source of a non-matching type (e.g., laugh or crowd) is classified as noise and the sound source (e.g., laugh or crowd) is separated into noise (e.g., a noise channel) in the audio data using the audio separation module, the noise object included in the screen (1010) may be expanded to display an object representing a sound source (e.g., laugh or crowd) together with the noise object.
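The screen composition for matching and non-matching types can be sketched as below. The four separable types follow the example given for screen 1010; `compose_screen` and the dictionary layout of an object are hypothetical.

```python
SEPARABLE_TYPES = ("voice", "music", "wind", "noise")  # example channels from screen 1010


def compose_screen(classified_types, separable=SEPARABLE_TYPES, fallback="noise"):
    """Build one object per separable type; attach non-matching types to noise.

    classified_types: types detected by the classification module, in order.
    Types without a matching audio channel (e.g., laugh, crowd) are listed as
    sub-objects of the fallback object so it can be expanded to show them.
    """
    if fallback not in separable:
        raise ValueError("fallback must be one of the separable types")
    objects = [{"label": t, "sub_objects": []} for t in separable]
    fallback_object = next(o for o in objects if o["label"] == fallback)
    for detected in classified_types:
        if detected not in separable:
            fallback_object["sub_objects"].append(detected)
    return objects
```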

[0134] According to one embodiment, a method of operation in an electronic device (e.g., the electronic device (101) of FIG. 1) may include receiving a request to individually edit each of a plurality of sound sources included in audio data.

[0135] According to one embodiment, the method may include an operation of checking whether the audio data contains a plurality of sound sources specified by the audio classification module using an audio classification module (e.g., the audio classification module (230) of FIG. 2a).

[0136] According to one embodiment, the method may include the operation of displaying a first object representing the first type of sound source and a second object representing the second type of sound source on a screen for editing through the display of the electronic device (e.g., the display module (160) of FIG. 1 or the display (161) of FIG. 5a, 5b, and 5c) based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0137] According to one embodiment, the method may include an operation to control the volume of a first audio corresponding to a first type of sound source separated using an audio separation module (e.g., the audio separation module (240) of FIG. 2b) based on user input for the first object.

[0138] According to one embodiment, the first object and the second object may be displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.

[0139] According to one embodiment, based on identifying that the audio data includes the first type of sound source and the second type of sound source among the plurality of types, the method may further include the operation of separating the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data using the audio separation module.

[0140] According to one embodiment, the method may further include an operation of displaying information about the time interval in which the first audio is detected on the screen through the display.

[0141] According to one embodiment, the method may further include an operation of controlling the volume of the second audio corresponding to the second type of sound source and the volume of the third audio corresponding to the third type of sound source based on user input for the second object, based on confirming that the third type of sound source among the plurality of types is not detected by the audio classification module and confirming that the third audio corresponding to the third type of sound source is at least partially separated using the audio separation module.

[0142] According to one embodiment, the method may further include the operation of applying a visually distinguishable effect to the first object on the screen based on confirming that a score representing the detection reliability of the first type of sound source is lower than a specified threshold value.

[0143] According to one embodiment, the method may further include the operation of applying a visual effect indicating the controlled volume of the first audio to the first object.

[0144] According to one embodiment, the plurality of types may be sound source types that can be classified through the audio classification module in the audio data.

[0145] According to one embodiment, the method allows the audio separation module to specify a number of separable audio types smaller than the number of types specified by the audio classification module, and the plurality of types may include sound source types corresponding to the plurality of audio types.

[0146] According to one embodiment, the method may further include the operation of detecting a third type of sound source among the plurality of types by the audio classification module, separating a third audio corresponding to the third type into the second type using the audio separation module based on the fact that the third type does not correspond to a type separable by the audio separation module, and controlling the volume of the third audio based on user input for the second object.
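The relationship between the larger set of classifiable types and the smaller set of separable types in paragraphs [0145] and [0146] can be sketched as a simple lookup. The type names and the choice of "noise" as the type that absorbs non-separable sources are hypothetical examples, not values taken from this document.

```python
from typing import Iterable, List

# Hypothetical type lists: the classifier distinguishes more types than
# the separation module can output as individual audios.
CLASSIFIER_TYPES = ["speech", "music", "noise", "wind", "crowd"]
SEPARABLE_TYPES = ["speech", "music", "noise"]
FALLBACK_TYPE = "noise"  # assumed type that receives non-separable sources


def control_target(detected_type: str) -> str:
    """Return the separable type whose object controls the detected type."""
    return detected_type if detected_type in SEPARABLE_TYPES else FALLBACK_TYPE


def objects_to_display(detected: Iterable[str]) -> List[str]:
    """Return one object per separable type, in order of first detection."""
    shown: List[str] = []
    for detected_type in detected:
        target = control_target(detected_type)
        if target not in shown:
            shown.append(target)
    return shown
```

Under these assumptions, a detected "wind" source would be separated into the "noise" audio and its volume would follow the "noise" object.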

[0147] According to one embodiment, the method may further include an operation of displaying a third object representing the third type of sound source on the screen through the display based on identifying that the third type of sound source is a dominant sound source in the audio data, and an operation of controlling the volume of the third audio corresponding to the third type of sound source based on user input regarding the third object.
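This document does not define how a sound source is judged to be dominant. The sketch below assumes one possible criterion, namely that the type with the longest detected duration is dominant when it covers at least a given share of the audio. The function name `dominant_type` and the `min_share` parameter are hypothetical.

```python
from typing import Dict, Optional


def dominant_type(durations: Dict[str, float], total_seconds: float,
                  min_share: float = 0.5) -> Optional[str]:
    """Return the type detected for the largest share of the audio, if any.

    `durations` maps each detected type to its total detected time in
    seconds. A type counts as dominant only when its share of
    `total_seconds` reaches `min_share` (an assumed criterion).
    """
    if not durations or total_seconds <= 0:
        return None
    label, seconds = max(durations.items(), key=lambda item: item[1])
    return label if seconds / total_seconds >= min_share else None
```

When a type is returned, a dedicated third object could be shown for it so that its volume is controlled separately.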

[0148] According to one embodiment, the method may further include the operation of displaying waveform information for sound sources separable from the audio data on the screen through the display.

[0149] According to one embodiment, the method may further include an operation of identifying, through the audio classification module, whether the plurality of sound sources designated by the audio classification module are included in sections sampled from the audio data.
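Classifying only sampled sections, rather than the entire audio data, is one way the identification could finish quickly. The sketch below assumes fixed-length windows taken at a regular stride and a caller-supplied `classify` function; both the sampling scheme and the names are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Set, Tuple


def sample_sections(num_samples: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of windows taken every `stride` samples."""
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, stride)]


def classify_sampled(audio: Sequence[float], window: int, stride: int,
                     classify: Callable[[Sequence[float]], Set[str]]) -> Set[str]:
    """Union of the types that the classifier reports over the sampled sections."""
    found: Set[str] = set()
    for start, end in sample_sections(len(audio), window, stride):
        found |= classify(audio[start:end])
    return found
```

With a stride larger than the window, only part of the audio is examined, which trades detection coverage for speed.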

[0150] According to one embodiment, in a non-transitory storage medium storing one or more programs, the one or more programs may include instructions that, when executed by at least one processor (e.g., processor (120) of FIG. 1) of an electronic device (e.g., electronic device (101) of FIG. 1), cause the electronic device to execute an operation of receiving a request to individually edit each of a plurality of sound sources included in audio data.

[0151] According to one embodiment, the one or more programs may include instructions that, when executed by at least one processor of an electronic device, cause the electronic device to execute an operation of identifying, using an audio classification module (e.g., the audio classification module (230) of FIG. 2a), whether the audio data includes a plurality of types of sound sources designated by the audio classification module.

[0152] According to one embodiment, the one or more programs may include instructions that, when executed by at least one processor of the electronic device, cause the electronic device to execute an operation of displaying a first object representing a first type of sound source and a second object representing a second type of sound source on a screen for editing through the display of the electronic device (e.g., the display module (160) of FIG. 1 or the display (161) of FIGS. 5a, 5b, and 5c), based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data.

[0153] According to one embodiment, the one or more programs may include instructions that, when executed by at least one processor of the electronic device, cause the electronic device to execute an operation to control the volume of a first audio corresponding to the first type of sound source separated using an audio separation module (e.g., audio separation module (240) of FIG. 2b) based on user input for the first object.

[0154] According to one embodiment, the first object and the second object may be displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data.
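The ordering described in paragraph [0154] can be sketched as follows: the objects are displayed from the classification result first, and separation then proceeds chunk by chunk during playback, with the current volumes applied to each separated audio. The function `run_edit_session`, the callables passed to it, and the representation of each separated audio as a single level per chunk are simplifying assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Chunk = Sequence[float]


def run_edit_session(chunks: List[Chunk],
                     classify: Callable[[List[Chunk]], Set[str]],
                     separate_chunk: Callable[[Chunk], Dict[str, float]],
                     volumes: Dict[str, float]) -> List[Tuple]:
    """Return an event log showing that display precedes separation."""
    log: List[Tuple] = []
    # Fast classification pass before playback: objects can be shown at once.
    detected = classify(chunks)
    log.append(("display_objects", sorted(detected)))
    # Separation runs per chunk during playback, after the objects are shown.
    for index, chunk in enumerate(chunks):
        stems = separate_chunk(chunk)
        # Apply the user-controlled volume of each separated audio and remix.
        mixed = sum(volumes.get(name, 1.0) * level for name, level in stems.items())
        log.append(("play_chunk", index, mixed))
    log.append(("separation_complete",))
    return log
```

In this sketch the "display_objects" event always precedes "separation_complete", which mirrors the behavior in which the user can begin editing without waiting for separation of the whole audio data.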

[0155] According to one embodiment of this document, editing usability can be improved by using an audio classification solution to obtain information on separable audio sources and time intervals of each audio source within a short time prior to editing, and by separating each audio source in real-time using an audio separation module during playback. Since this document supports editing using the separated audio sources by executing audio source separation in real-time during playback, usability can be improved by eliminating the need for waiting time prior to editing. In addition, various effects that can be identified directly or indirectly through this document may be provided. The effects obtainable from this disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which this disclosure belongs from the description below.

[0156] Furthermore, the embodiments disclosed in this document are presented for the purpose of explaining and understanding the disclosed technical content and are not intended to limit the scope of the technology described in this document. Accordingly, the scope of this document should be interpreted to include all modifications or various other embodiments based on the technical concept of this document.

[0157] The electronic device according to the various embodiments disclosed in this document may be of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this document is not limited to the devices described above.

[0158] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of said items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" may each include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "1st" and "2nd," or "first" and "second," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., first) component is referred to as “coupled” or “connected” to another (e.g., second) component, with or without the terms “functionally” or “communicatively,” it means that the component may be connected to the other component directly (e.g., via a wire), wirelessly, or through a third component.

[0159] The term “module” as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0160] Various embodiments of the present document may be implemented as software (e.g., program (140)) comprising one or more instructions stored in a storage medium (e.g., internal memory (136) or external memory (138)) readable by a machine (e.g., electronic device (101)). For example, a processor (e.g., processor (120)) of the machine (e.g., electronic device (101)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' simply means that the storage medium is a tangible device and does not include a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.

[0161] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0162] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities, and some of the multiple entities may be separated and placed in other components. According to various embodiments, one or more of the components or operations of the aforementioned components may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as those performed by the corresponding component among the multiple components prior to integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order, omitted, or one or more other operations may be added.

Claims

1. An electronic device (101) comprising: a display (161); at least one processor (120) including processing circuitry; and a memory (130) storing instructions, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: receive a request to individually edit each of a plurality of sound sources included in audio data; identify, using an audio classification module (230), whether the audio data includes a plurality of types of sound sources designated by the audio classification module; based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data, display, through the display, a first object representing the first type of sound source and a second object representing the second type of sound source on a screen for audio editing; and based on user input for the first object, control the volume of a first audio corresponding to the first type of sound source separated using an audio separation module (240), wherein the first object and the second object are displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and a second audio corresponding to the second type of sound source from the audio data.

2. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on identifying that the audio data includes the first type of sound source and the second type of sound source among the plurality of types, separate the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data using the audio separation module; and display, on the screen through the display, information about the time interval in which the first audio is detected.

3. The electronic device of claim 1 or 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on confirming that a third type of sound source among the plurality of types is not detected by the audio classification module and confirming that a third audio corresponding to the third type of sound source is at least partially separated using the audio separation module, control the volume of the second audio corresponding to the second type of sound source and the volume of the third audio corresponding to the third type of sound source based on user input for the second object.

4. The electronic device of any one of claims 1 to 3, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on confirming that a score indicating the detection reliability of the first type of sound source is lower than a specified threshold value, apply a visually distinguishable effect to the first object on the screen; and apply a visual effect representing the controlled volume of the first audio to the first object, wherein the plurality of types are sound source types that can be classified through the audio classification module in the audio data, wherein the audio separation module designates a plurality of separable audio types whose number is smaller than the number of types designated by the audio classification module, and wherein the plurality of types include sound source types corresponding to the plurality of audio types.

5. The electronic device of any one of claims 1 to 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: detect a third type of sound source among the plurality of types by the audio classification module; based on the third type not corresponding to a type separable by the audio separation module, separate a third audio corresponding to the third type of sound source into the second type using the audio separation module; and control the volume of the third audio corresponding to the third type of sound source based on user input for the second object.

6. The electronic device of any one of claims 1 to 5, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on identifying that the third type of sound source is a dominant sound source in the audio data, display, on the screen through the display, a third object representing the third type of sound source; and control the volume of the third audio corresponding to the third type of sound source based on user input for the third object.

7. The electronic device of any one of claims 1 to 6, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: display, on the screen through the display, waveform information for sound sources separable from the audio data; and identify, through the audio classification module, whether the plurality of sound sources designated by the audio classification module are included in a section sampled from the audio data.

8. A method of operation in an electronic device (101), the method comprising: an operation of receiving a request to individually edit each of a plurality of sound sources included in audio data; an operation of identifying, using an audio classification module (230), whether the audio data includes a plurality of types of sound sources designated by the audio classification module; an operation of displaying, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data, a first object representing the first type of sound source and a second object representing the second type of sound source on a screen for editing through a display (161) of the electronic device; and an operation of controlling, based on user input for the first object, the volume of a first audio corresponding to the first type of sound source separated using an audio separation module (240), wherein the first object and the second object are displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and a second audio corresponding to the second type of sound source from the audio data.

9. The method of claim 8, further comprising: an operation of separating, based on identifying that the audio data includes the first type of sound source and the second type of sound source among the plurality of types, the first audio corresponding to the first type of sound source and the second audio corresponding to the second type of sound source from the audio data using the audio separation module; and an operation of displaying information about the time interval in which the first audio is detected on the screen through the display.

10. The method of claim 8 or 9, further comprising: an operation of controlling, based on confirming that a third type of sound source among the plurality of types is not detected by the audio classification module and confirming that a third audio corresponding to the third type of sound source is at least partially separated using the audio separation module, the volume of the second audio corresponding to the second type of sound source and the volume of the third audio corresponding to the third type of sound source based on user input for the second object.

11. The method of any one of claims 8 to 10, further comprising: an operation of applying a visually distinguishable effect to the first object on the screen based on confirming that a score representing the detection reliability of the first type of sound source is lower than a specified threshold value; and an operation of applying a visual effect indicating the controlled volume of the first audio to the first object, wherein the plurality of types are sound source types that can be classified through the audio classification module in the audio data, wherein the audio separation module designates a plurality of separable audio types whose number is smaller than the number of types designated by the audio classification module, and wherein the plurality of types include sound source types corresponding to the plurality of audio types.

12. The method of any one of claims 8 to 11, further comprising: an operation of detecting a third type of sound source among the plurality of types by the audio classification module; an operation of separating, based on the third type not corresponding to a type separable by the audio separation module, a third audio corresponding to the third type into the second type using the audio separation module; and an operation of controlling the volume of the third audio based on user input for the second object.

13. The method of any one of claims 8 to 12, further comprising: an operation of displaying a third object representing the third type of sound source on the screen through the display, based on identifying that the third type of sound source is a dominant sound source in the audio data; and an operation of controlling the volume of the third audio corresponding to the third type of sound source based on user input for the third object.

14. The method of any one of claims 8 to 13, further comprising: an operation of displaying waveform information for sound sources separable from the audio data on the screen through the display; and an operation of identifying, through the audio classification module, whether the plurality of sound sources designated by the audio classification module are included in a section sampled from the audio data.

15. A non-transitory storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by at least one processor (120) of an electronic device (101), cause the electronic device to execute: an operation of receiving a request to individually edit each of a plurality of sound sources included in audio data; an operation of identifying, using an audio classification module (230), whether the audio data includes a plurality of types of sound sources designated by the audio classification module; an operation of displaying, based on identifying that a first type of sound source and a second type of sound source among the plurality of types are included in the audio data, a first object representing the first type of sound source and a second object representing the second type of sound source on a screen for editing through a display (161) of the electronic device; and an operation of controlling, based on user input for the first object, the volume of a first audio corresponding to the first type of sound source separated using an audio separation module (240), wherein the first object and the second object are displayed on the screen for audio editing before the audio separation module completes the separation of the first audio corresponding to the first type of sound source and a second audio corresponding to the second type of sound source from the audio data.