Electronic device and method for scanning and separating audio data in electronic device

By dynamically adjusting processing schedules and using variable sampling intervals, the method addresses real-time audio separation and scanning challenges, preventing interruptions and improving efficiency in electronic devices.

WO2026142255A1 · PCT designated stage · Publication Date: 2026-07-02 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2025-12-23
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Existing electronic devices face challenges in real-time audio separation and scanning due to varying processing times, leading to potential audio playback interruptions and inefficient scanning operations, especially when dealing with large audio data sizes and varying device performance.

Method used

The solution involves dynamically adjusting the processing schedule for audio playback and separation based on real-time device performance, using variable sampling intervals to ensure timely completion within specified limits, and identifying sound source data during playback.

Benefits of technology

This approach prevents audio playback interruptions and enhances scanning efficiency by optimizing processing times, ensuring consistent separation results and reducing scanning duration.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2025022559_02072026_PF_FP_ABST

Abstract

This electronic device is configured to determine a first scan interval by using a time period corresponding to audio content such that a scan time for the audio content does not exceed a designated maximum scan time. The electronic device is configured to identify a plurality of pieces of first sound source audio data corresponding to first audio data of a first time period in the audio data corresponding to the audio content, wherein the plurality of pieces of first sound source audio data are identified by using a scan result of the audio content while playing back the audio data corresponding to the audio content. The electronic device is configured to acquire the plurality of pieces of first sound source audio data by separating the first audio data of the first time period by using a real-time factor value.

Description

Electronic device and method for scanning and separating audio data in electronic device

[0001] The present disclosure relates to a method for scanning and separating audio data in an electronic device.

[0002] With the development of electronic information and communication technology, various functions are being integrated into communication devices or electronic devices. Additionally, electronic devices are being implemented to perform interoperability functions, enabling them to interact with other electronic devices through communication. For example, portable electronic devices (e.g., mobile terminals, tablet terminals, or wearable electronic devices) include content playback functions in addition to communication functions. Portable electronic devices can play not only content stored at the time of manufacturing but also various received content. With recent advancements in content processing technology, electronic devices can provide not only the function of playing content but also the function of editing content.

[0003] The background technology described above is technical information held by the inventor for the purpose of describing the invention or technical information acquired during the process of deriving the invention, and it is not necessarily considered to be prior art known prior to the application.

[0004] The content editing function may include an audio separation function that separates various categories of sound source data from an audio stream included in the content. Additionally, the content editing function may include an audio scanning function that provides sections within the audio stream included in the content where sound sources of a specific type (e.g., sound type) exist.

[0005] When an electronic device performs audio separation on an audio stream, it may require audio data in units of longer duration than the time unit used to play the audio data. If audio separation is performed in real time while the electronic device plays audio content, the time required to separate (e.g., analyze and separate) audio data (the separation time) may be longer than the playback time of the audio data prepared for playback in the buffer. If the separation time is prolonged so that separation of the next audio data to be played is not completed before playback of the buffered audio data finishes, the next audio data cannot be provided to (e.g., stored in) the buffer, and audio playback may be interrupted, with the sound cutting off until the next audio data reaches the buffer. Therefore, when audio separation is performed in real time during playback, the processing schedule for audio playback and audio separation may need to be adjusted so that the separation time does not exceed the playback time of the buffered audio data. If the separation time were constant each time audio separation is performed, adjusting this schedule would be easy; however, because the separation time can vary in real time with the performance and state of the electronic device, adjusting the schedule may not be easy. Additionally, when audio separation is performed on an electronic device, inference data is generated by accumulating the separation (or analysis) results for a single audio content, and the next audio data can be separated using that inference data. When an electronic device performs audio separation while sequentially playing discontinuous first audio content and second audio content, the separation results may not be consistent: after the last audio data of the first audio content is separated, the inference data accumulated from the first audio content is initialized when the first audio data of the second audio content is separated, and the separation results of the second audio content are accumulated and used as new inference data.
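The playback-interruption condition described in paragraph [0005] can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `find_underruns`, the sequential separation model, and the example timings are hypothetical and are not taken from the disclosure.

```python
def find_underruns(chunk_seconds, separation_seconds, prebuffered_seconds):
    """Return indices of chunks that arrive after the playback buffer has run dry.

    Assumed model (not from the disclosure): chunks are separated one after
    another starting at time 0, while playback starts at time 0 with
    `prebuffered_seconds` of audio already queued.
    """
    underruns = []
    clock = 0.0                           # wall-clock time spent separating so far
    buffered_until = prebuffered_seconds  # time at which queued audio is exhausted
    for index, (duration, separation) in enumerate(
        zip(chunk_seconds, separation_seconds)
    ):
        clock += separation               # chunk `index` is ready at this time
        if clock > buffered_until:
            underruns.append(index)       # sound cuts off until the chunk arrives
            buffered_until = clock        # playback resumes when the chunk arrives
        buffered_until += duration        # the new chunk extends the queued audio
    return underruns


# Three 1-second chunks; the second takes 2 seconds to separate.
print(find_underruns([1.0, 1.0, 1.0], [0.5, 2.0, 0.5], prebuffered_seconds=1.0))
```

In this model a single slow separation is enough to interrupt playback, even when the average separation time is below the chunk duration, which is why a fixed schedule is fragile.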

[0006] Meanwhile, when an electronic device performs audio scanning on an audio stream, scanning is performed on the decoded audio data after decoding the audio stream; however, decoding the entire audio stream can take a long time, and scanning the decoded audio data can also take a long time if the size of the decoded audio data is large. To reduce the scanning time of audio data, a portion of the entire audio data is sampled for scanning; however, if a fixed sampling interval is used regardless of the real-time performance of the electronic device, it can be inefficient.

[0007] According to one embodiment of the present disclosure, when performing audio separation in real time while playing audio content in an electronic device, the separation time is determined in real time, and the processing schedule for audio playback and audio separation is adjusted so that the separation time required for audio separation in real time does not take longer than the playback time corresponding to the audio data prepared to be played in the buffer.
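One plausible way to determine the separation time in real time is to track a real-time factor (RTF), the ratio of processing time to audio duration, and use it to decide how late separation of the next audio data may start. The sketch below assumes this approach; the class `RtfTracker`, the exponential smoothing, and the safety margin are illustrative assumptions and are not described in the disclosure.

```python
class RtfTracker:
    """Tracks a smoothed real-time factor (processing time / audio duration)."""

    def __init__(self, initial_rtf=1.0, smoothing=0.3):
        self.rtf = initial_rtf        # assumed starting estimate
        self.smoothing = smoothing    # weight given to the newest measurement

    def update(self, processing_seconds, audio_seconds):
        """Fold one measured separation into the running RTF estimate."""
        sample = processing_seconds / audio_seconds
        self.rtf = (1 - self.smoothing) * self.rtf + self.smoothing * sample
        return self.rtf

    def latest_start(self, buffered_seconds, next_chunk_seconds, margin=1.2):
        """Seconds from now by which separation of the next chunk must begin.

        A negative result means the expected separation time already exceeds
        the playback time left in the buffer, so the schedule must change
        (for example, by starting earlier or shortening the chunk).
        """
        expected_separation = self.rtf * next_chunk_seconds * margin
        return buffered_seconds - expected_separation


tracker = RtfTracker(initial_rtf=1.0, smoothing=0.5)
tracker.update(processing_seconds=0.5, audio_seconds=1.0)
print(tracker.latest_start(buffered_seconds=2.0, next_chunk_seconds=1.0, margin=1.0))
```

Smoothing keeps the estimate responsive to changes in device load without reacting to every outlier; the margin trades a little extra latency for fewer interruptions.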

[0008] According to one embodiment of the present disclosure, when performing audio scanning on an audio stream in an electronic device, the scanning operation can be completed within a limited scanning time by variably determining and utilizing a sampling interval according to the real-time performance of the electronic device and a limited scanning time.
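A variable sampling interval of this kind can be sketched as a budget calculation: given the content duration, the maximum scan time, and a measured cost per analyzed window, choose the spacing between sampled windows so that the total cost stays within the budget. The function names, the fixed analysis window, and the per-window cost model below are assumptions made for illustration; the disclosure does not specify this formula.

```python
import math


def choose_scan_interval(content_seconds, max_scan_seconds,
                         window_seconds, cost_per_window_seconds):
    """Return the spacing (in seconds of content) between sampled windows.

    `cost_per_window_seconds` is the assumed measured time to decode and scan
    one window on the current device; it stands in for real-time performance.
    """
    if min(content_seconds, max_scan_seconds,
           window_seconds, cost_per_window_seconds) <= 0:
        raise ValueError("all arguments must be positive")
    affordable = math.floor(max_scan_seconds / cost_per_window_seconds)
    if affordable < 1:
        raise ValueError("scan budget is too small for a single window")
    needed_for_full_scan = math.ceil(content_seconds / window_seconds)
    if affordable >= needed_for_full_scan:
        return window_seconds              # fast device: scan every window
    return content_seconds / affordable    # slow device: skip between windows


def window_starts(content_seconds, interval_seconds):
    """Start times of the sampled windows for a given interval."""
    count = math.ceil(content_seconds / interval_seconds)
    return [i * interval_seconds for i in range(count)]


interval = choose_scan_interval(600, 10, 1, 0.25)
print(interval, len(window_starts(600, interval)))
```

With a 10-minute clip, a 10-second budget, and 0.25 seconds per window, only 40 windows are affordable, so one window is sampled every 15 seconds; a faster device with a lower per-window cost would get a denser scan from the same budget.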

[0009] An electronic device according to one embodiment of the present disclosure may include a display, an audio output module including a speaker, a memory for storing commands, and at least one processor. When the commands are executed individually or collectively by the at least one processor, the electronic device may, based on an input for scanning audio content: determine, using a time period corresponding to the audio content and a specified maximum scan time, a first scan interval such that the scan time for the audio content does not exceed the specified maximum scan time; and scan the audio content by sampling audio data corresponding to the audio content using the first scan interval. When the commands are executed individually or collectively by the at least one processor, the electronic device may, based on an input for playing the audio content, identify a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data. The first plurality of sound source audio data are identified using the scan result while playing the audio data of the audio content. When the commands are executed individually or collectively by the at least one processor, the electronic device may separate the first audio data of the first time period using a real-time factor value to acquire the first plurality of sound source audio data, and output the first plurality of sound source audio data through the audio output module.

[0010] A method for scanning and separating audio data in an electronic device according to one embodiment of the present disclosure may include: based on an input for scanning audio content, determining, using a time period corresponding to the audio content and a specified maximum scan time, a first scan interval such that the scan time for the audio content does not exceed the specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval; and based on an input for playing the audio content, identifying a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data corresponding to the audio content. The first plurality of sound source audio data are identified using the scan result of the audio content while playing the audio data of the audio content. The method may include an operation of obtaining the first plurality of sound source audio data by separating the first audio data of the first time period using a real-time factor value, and an operation of outputting the first plurality of sound source audio data through a sound output module.

[0011] In a non-transitory storage medium storing commands according to one embodiment of the present disclosure, the commands are configured such that, when executed by an electronic device, they cause the electronic device to perform at least one operation, wherein the at least one operation may include: based on an input for scanning audio content, determining, using a time period corresponding to the audio content and a specified maximum scan time, a first scan interval such that the scan time for the audio content does not exceed the specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval; and based on an input for playing the audio content, identifying a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data corresponding to the audio content. The first plurality of sound source audio data are identified using the scan result of the audio content while playing the audio data corresponding to the audio content. The at least one operation may include an operation of obtaining the first plurality of sound source audio data by separating the first audio data of the first time period using a real-time factor value, and an operation of outputting the first plurality of sound source audio data through a sound output module.

[0012] FIG. 1 is a block diagram of an electronic device in a network environment according to one embodiment.

[0013] FIG. 2 is a block diagram of an electronic device according to one embodiment.

[0014] FIG. 3 is a configuration diagram showing an audio separator according to one embodiment.

[0015] FIG. 4 is a diagram showing cases of separating and processing content including a plurality of audio contents in an electronic device according to one embodiment.

[0016] FIG. 5 is a flowchart illustrating the audio data separation operation during content playback according to one embodiment.

[0017] FIG. 6a is a flowchart illustrating an audio data separation operation based on whether the audio data requires separation during content playback according to one embodiment.

[0018] FIG. 6b is a flowchart showing the operation continuing from FIG. 6a according to one embodiment.

[0019] FIG. 6c is a flowchart showing the operation continuing from FIG. 6b according to one embodiment.

[0020] FIG. 7 is a configuration diagram showing an audio scanner according to one embodiment.

[0021] FIG. 8 is a diagram showing the decode time and scan time for audio data in one section according to one embodiment.

[0022] FIG. 9 is a diagram showing an example of setting an analysis interval based on a skip interval and a scan interval according to one embodiment.

[0023] FIG. 10 is a diagram showing a specified data format for storing scan result information according to one embodiment.

[0024] FIG. 11 is a diagram showing sampling processing cases during content scanning according to one embodiment.

[0025] FIG. 12 is a flowchart illustrating an audio data scanning operation according to one embodiment.

[0026] FIG. 13a is a flowchart illustrating an audio data scanning operation according to one embodiment, based on the presence or absence of a previously acquired scan interval and a skip interval.

[0027] FIG. 13b is a flowchart showing the operation continuing from FIG. 13a according to one embodiment.

[0028] FIG. 13c is a flowchart showing the operation continuing from FIG. 13b according to one embodiment.

[0029] FIG. 14 is a flowchart illustrating a scanning operation for audio content containing scan result information according to one embodiment.

[0030] FIG. 15 is a drawing showing an example of a screen for audio scanning according to one embodiment.

[0031] FIG. 16 is a drawing showing an example of a screen displaying audio scan results according to one embodiment.

[0032] FIG. 17 is a drawing showing an example of a screen for content editing according to one embodiment.

[0033] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.

[0034] The terms used in this document are used merely to describe specific embodiments and are not intended to limit the scope of other embodiments. Singular expressions may include plural expressions unless the context clearly indicates otherwise. All terms used herein, including technical or scientific terms, may have the same meaning as generally understood by those skilled in the art of the present invention. Terms defined in commonly used dictionaries may be interpreted as having the same or similar meaning as they have in the context of the relevant technology, and are not to be interpreted in an ideal or overly formal sense unless explicitly defined in this document. In some cases, even terms defined in this document shall not be interpreted to exclude embodiments of the present invention.

[0035] FIG. 1 is a block diagram of an electronic device (101) in a network environment (100) according to one embodiment.

[0036] Referring to FIG. 1, in a network environment (100), an electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through a server (108). According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).

[0037] The processor (120) can control at least one other component (e.g., hardware or software component) of the electronic device (101) connected to the processor (120) by executing software (e.g., program (140)), and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., sensor module (176) or communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting data in non-volatile memory (134). According to one embodiment, the processor (120) may include a main processor (121) (e.g., central processing unit or application processor) or an auxiliary processor (123) that can operate independently or together with it (e.g., graphics processing unit, neural processing unit (NPU), image signal processor, sensor hub processor, or communication processor). For example, if the electronic device (101) includes a main processor (121) and an auxiliary processor (123), the auxiliary processor (123) may be configured to use less power than the main processor (121) or to be specialized for a designated function. The auxiliary processor (123) may be implemented separately from the main processor (121) or as part thereof.

[0038] The auxiliary processor (123) may control at least some of the functions or states associated with at least one component of the electronic device (101) (e.g., display module (160), sensor module (176), or communication module (190)) on behalf of the main processor (121) while the main processor (121) is in an inactive (e.g., sleep) state, or together with the main processor (121) while the main processor (121) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (123) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (180) or communication module (190)). According to one embodiment, the auxiliary processor (123) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (101) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (108)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. The artificial intelligence model may, additionally or alternatively, include a software structure other than the hardware structure.

[0039] The memory (130) can store various data used by at least one component of the electronic device (101) (e.g., processor (120) or sensor module (176)). The data may include, for example, input data or output data for software (e.g., program (140)) and related commands. The memory (130) may include volatile memory (132) or non-volatile memory (134).

[0040] The program (140) may be stored as software in memory (130) and may include, for example, an operating system (142), middleware (144), or an application (146).

[0041] The input module (150) can receive commands or data to be used for a component of the electronic device (101) (e.g., processor (120)) from outside the electronic device (101) (e.g., user). The input module (150) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0042] The sound output module (155) can output a sound signal to the outside of the electronic device (101). The sound output module (155) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0043] The display module (160) can visually provide information to the outside (e.g., a user) of the electronic device (101). The display module (160) may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (160) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.

[0044] The audio module (170) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (170) can acquire sound through the input module (150) or output sound through the sound output module (155) or an external electronic device (e.g., electronic device (102)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (101).

[0045] The sensor module (176) can detect the operating state of the electronic device (101) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (176) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0046] The interface (177) may support one or more specified protocols that can be used for the electronic device (101) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (102)). According to one embodiment, the interface (177) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0047] The connection terminal (178) may include a connector through which the electronic device (101) can be physically connected to an external electronic device (e.g., electronic device (102)). According to one embodiment, the connection terminal (178) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0048] The haptic module (179) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module (179) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0049] The camera module (180) can capture still images and video. According to one embodiment, the camera module (180) may include one or more lenses, image sensors, image signal processors, or flashes.

[0050] The power management module (188) can manage power supplied to the electronic device (101). According to one embodiment, the power management module (188) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0051] The battery (189) can supply power to at least one component of the electronic device (101). According to one embodiment, the battery (189) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0052] The communication module (190) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (101) and an external electronic device (e.g., electronic device (102), electronic device (104), or server (108)), and the performance of communication through the established communication channel. The communication module (190) may include one or more communication processors that operate independently of the processor (120) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (190) may include a wireless communication module (192) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (194) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (104) through a first network (198) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (199) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (192) can identify or authenticate the electronic device (101) within a communication network such as the first network (198) or the second network (199) using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module (196).

[0053] The wireless communication module (192) can support a 5G network following a 4G network and next-generation communication technology, for example, new radio (NR) access technology. The NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (192) can support a high-frequency band (e.g., mmWave band) to achieve, for example, a high data transmission rate. The wireless communication module (192) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large-scale antenna. The wireless communication module (192) can support various requirements specified in the electronic device (101), an external electronic device (e.g., electronic device (104)), or a network system (e.g., second network (199)). According to one embodiment, the wireless communication module (192) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or 1 ms or less round trip) for realizing URLLC.

[0054] The antenna module (197) can transmit a signal or power to, or receive a signal or power from, the outside (e.g., an external electronic device). According to one embodiment, the antenna module (197) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (197) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network (198) or the second network (199), may be selected from the plurality of antennas, for example, by the communication module (190). A signal or power may be transmitted or received between the communication module (190) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (197).

[0055] According to one embodiment, the antenna module (197) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0056] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0057] According to one embodiment, commands or data may be transmitted or received between the electronic device (101) and an external electronic device (104) through a server (108) connected to a second network (199). Each of the external electronic devices (102, or 104) may be the same or a different type of device as the electronic device (101). According to one embodiment, all or part of the operations performed on the electronic device (101) may be performed on one or more of the external electronic devices (102, 104, or 108). For example, if the electronic device (101) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (101) may request one or more external electronic devices to perform at least part of the function or service instead of performing the function or service itself or additionally. One or more external electronic devices that receive the above request may execute at least part of the requested function or service, or additional function or service related to the request, and transmit the result of the execution to the electronic device (101). The electronic device (101) may provide the result as is or additionally processed as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (101) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (104) may include an Internet of Things (IoT) device. The server (108) may be an intelligent server using machine learning and / or neural networks. 
According to one embodiment, the external electronic device (104) or the server (108) may be included within a second network (199). The electronic device (101) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.

[0058] FIG. 2 is a block diagram of an electronic device according to one embodiment.

[0059] An electronic device (201) according to one embodiment (e.g., the electronic device (101) of FIG. 1) may include at least one processor (hereinafter also referred to as a processor) (220), memory (230), display (260), and / or audio output module (255). An electronic device (201) according to one embodiment may be configured to include various additional components, not limited thereto, or to exclude some of the components. An electronic device (201) according to one embodiment may further include all or part of the electronic device (101) shown in FIG. 1.

[0060] A processor (220) according to one embodiment (e.g., processor (120) of FIG. 1) may include a central processing unit (CPU), an application processor (AP), and an audio processor. The processor (220) may include a hardware structure specialized for processing an artificial intelligence (AI) model (e.g., an AI chip). A processor (220) according to one embodiment may perform overall control operations of an electronic device (201). A processor (220) according to one embodiment may execute commands stored in memory (230) individually or collectively to cause the electronic device (201) to perform an audio data separation operation (or method) when playing content of the present disclosure. A processor (220) according to one embodiment may execute commands stored in memory (230) individually or collectively to cause the electronic device (201) to perform an audio data scanning operation (or method) of the present disclosure. A processor (220) according to one embodiment can independently perform an audio data separation operation and an audio data scanning operation during content playback. A processor (220) according to one embodiment can perform an audio data separation operation after performing an audio data scanning operation.

[0061] When a processor (220) according to one embodiment performs an audio data separation operation after performing an audio data scanning operation, it may determine a first scan interval (or a first scan interval and a first skip interval) based on an input for scanning audio content, using a time period corresponding to audio content and a specified maximum scan time, such that the scan time for audio content does not exceed the specified maximum scan time. The processor (220) according to one embodiment may scan audio content by sampling audio data corresponding to audio content using the first scan interval (or a first scan interval and a first skip interval). The processor (220) according to one embodiment may identify at least one sound source audio data included in each time interval of the audio content as a result of scanning audio content. A processor (220) according to one embodiment can identify a first plurality of sound source audio data corresponding to a first audio data of a first time period among the audio data by using a scan result for the audio content while playing audio data of the audio content based on an input for playing audio content. A processor (220) according to one embodiment can obtain a first plurality of sound source audio data by performing separation of the first audio data of the first time period using a real time factor value, and can output the first plurality of sound source audio data through an audio output module (255).

[0062] A processor (220) according to one embodiment of the present disclosure may initiate an operation to separate audio data while playing content based on an input for content playback (or separation or editing).

[0063] Audio separation (or sound source separation) according to one embodiment may mean separating, from audio data (e.g., pulse-code modulation (PCM) data) of a certain interval (or a certain time length), at least one sound source audio data corresponding to a sound source (e.g., vocal, instrument, background sound, noise, and / or other sound sources) of at least one designated category (or classification criterion), and obtaining the at least one separated sound source audio data. For example, the audio data may include multiple sound sources of multiple categories. The multiple sound sources may include a first sound source (vocal) and a second sound source (instrument). The processor (220) may separate the first sound source audio data corresponding to vocals and the second sound source audio data corresponding to instruments from the audio data. The processor (220) may obtain the separated first sound source audio data and second sound source audio data. A processor (220) according to one embodiment may use at least one sound source audio data obtained through audio separation when playing content (or editing content). For example, when playing audio content (or editing audio content), the processor (220) may separate the sound source audio data from the audio content and adjust the volume of the sound source audio data (e.g., volume up or volume down) or remove the sound source audio data from the audio content.

[0064] A processor (220) according to one embodiment may obtain audio data by decoding audio content (e.g., an audio stream) through a decoder (232) based on an input for content playback (or editing). For example, the audio stream may be content in a format for continuously transmitting digital audio data over time. Content according to one embodiment may include audio content or may include audio content and video content. Audio content according to one embodiment may include a first audio content and a second audio content. The first audio content and the second audio content according to one embodiment may be continuous and different audio data. A processor (220) according to one embodiment may display a screen for content playback (or editing) on a display (260) based on the execution of a content playback application (or content editing application) (or program). A processor (220) according to one embodiment may identify an input for content playback based on user input to a button (or icon) for a playback request on a screen for content playback (or editing). Based on the identification of the input for content playback, the processor (220) according to one embodiment may obtain audio data (e.g., PCM data) by decoding audio content through a decoder (232). PCM data according to one embodiment may be data in which the amplitude of a sound wave is sampled at specific time intervals and expressed as discrete numbers in order to convert an analog audio signal (sound) into a digital signal. For example, the size of PCM data (e.g., in bytes) for a certain duration (e.g., 1 second) may be calculated as the product of the sampling rate, the sample size, and the number of channels. A processor (220) according to one embodiment can decode audio content through a decoder (232) to continuously output (or acquire) PCM data of a specified duration (e.g., 0.5 seconds).
A processor (220) according to one embodiment can perform synchronous (sync) or asynchronous (async) decoding through the decoder (232). A processor (220) according to one embodiment can output, through the decoder (232), PCM data corresponding to a specified seek time when a seek time is specified by a user during continuous playback of audio content. According to one embodiment, the output time of PCM data through the decoder (232) may vary depending on the composition of the content, and when a seek is requested, the time required to output PCM data may increase due to the flushing of PCM data acquired prior to the seek request. A processor (220) according to one embodiment may perform audio separation and / or audio scanning by taking into account the time taken to output (or acquire) PCM data through a decoder (232).
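As a non-limiting illustration, the PCM size relationship described above (sampling rate times sample size times number of channels) can be sketched as follows. The function names and the example values (48 kHz, 2-byte samples, stereo) are assumptions chosen for illustration and do not appear in the disclosure.

```python
def pcm_bytes_per_second(sample_rate_hz: int, sample_size_bytes: int, channels: int) -> int:
    """Size in bytes of one second of PCM data.

    Product of the sampling rate, the sample size, and the number of channels.
    """
    return sample_rate_hz * sample_size_bytes * channels


def pcm_chunk_bytes(duration_s: float, sample_rate_hz: int,
                    sample_size_bytes: int, channels: int) -> int:
    """Size in bytes of a PCM chunk of the given duration (e.g., 0.5 seconds)."""
    per_second = pcm_bytes_per_second(sample_rate_hz, sample_size_bytes, channels)
    return int(round(duration_s * per_second))


if __name__ == "__main__":
    # Hypothetical stream: 48 kHz, 16-bit (2-byte) samples, 2 channels.
    print(pcm_bytes_per_second(48_000, 2, 2))
    print(pcm_chunk_bytes(0.5, 48_000, 2, 2))
```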

[0065] A processor (220) according to one embodiment can identify whether audio data (PCM data) that is continuously output through a decoder (232) is audio data that requires separation.

[0066] A processor (220) according to one embodiment may store audio data that does not require separation in a designated buffer (e.g., a second buffer or an intermediate buffer) for storing audio data that does not require separation. Audio data stored in the second buffer may be delivered to an audio renderer (236) at the time of audio rendering corresponding to the stored audio data.

[0067] A processor (220) according to one embodiment may store audio data requiring separation in a designated buffer (input buffer (e.g., single buffer or double buffer)) so that it can be transmitted to an audio separator (24). A processor (220) according to one embodiment may use a single buffer or a double buffer depending on the state information of the electronic device (201) and / or whether there is a delay in the separation operation through the audio separator (24). A processor (220) according to one embodiment may process the audio data to be separated in parallel using a double buffer when the state information of the electronic device (201) is in a state where a double buffer can be used and the separation operation is delayed. A processor (220) according to one embodiment may process the audio data to be separated sequentially using a single buffer when the state information of the electronic device (201) is not in a state where a double buffer can be used or the separation operation is not delayed.
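As a non-limiting illustration, the choice between a single buffer and a double buffer described above can be sketched as a small decision function. The names `BufferMode` and `choose_buffer_mode`, and the reduction of the state information to one boolean, are assumptions made for illustration only.

```python
from enum import Enum


class BufferMode(Enum):
    SINGLE = "single"   # sequential processing of the audio data to be separated
    DOUBLE = "double"   # parallel processing of the audio data to be separated


def choose_buffer_mode(double_buffer_allowed: bool, separation_delayed: bool) -> BufferMode:
    """Use a double buffer only when the device state allows it AND separation is delayed.

    `double_buffer_allowed` stands in for whatever check is made on the state
    information of the device; how that check is made is not specified here.
    """
    if double_buffer_allowed and separation_delayed:
        return BufferMode.DOUBLE
    return BufferMode.SINGLE
```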

[0068] A processor (220) according to one embodiment performs a separation operation on audio data that requires separation among the audio data of the audio content, accumulates separation result information for the audio content as inference data, and can use the inference data accumulated so far when performing a separation operation on the next audio data. In the case where the content includes a first audio content and a second audio content, if the processor (220) performs separation on the first audio content and then performs separation on the second audio content, and the first audio content and the second audio content do not have continuity, the quality of the separation result may be poor when the first inference data accumulated for the first audio content is applied to the second audio content.

[0069] When the audio content includes a first audio content and a second audio content, and, among the audio data to be separated, the first audio data of the first time period is included in the first audio content and the second audio data of the second time period is included in the second audio content, a processor (220) according to one embodiment can identify whether there is continuity between the first audio data of the first time period and the second audio data of the second time period. When the first audio data of the first time period and the second audio data of the second time period have continuity, the processor (220) according to one embodiment can update the first inference data, in which separation result information for the first audio content is accumulated, without initializing it, by further accumulating separation result information for the second audio content following the separation result information for the first audio content. When the first audio data of the first time period and the second audio data of the second time period do not have continuity, a processor (220) according to one embodiment may initialize the first inference data in which separation result information for the first audio content is accumulated, and acquire and use second inference data in which separation result information for the second audio content is accumulated.
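As a non-limiting illustration, the handling of inference data across a content boundary can be sketched as follows. The `InferenceState` class, the list-based history, and the `is_continuous` flag are hypothetical; how continuity is detected and what the inference data actually contains are not specified in the text above.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class InferenceState:
    """Accumulated separation result information (placeholder representation)."""
    history: List[Any] = field(default_factory=list)

    def accumulate(self, separation_result: Any) -> None:
        self.history.append(separation_result)

    def reset(self) -> None:
        self.history.clear()


def update_inference_state(state: InferenceState, separation_result: Any,
                           is_continuous: bool) -> InferenceState:
    """Keep accumulating when the new audio data continues the previous data.

    When continuity is absent, the accumulated data is initialized first so
    that results from the earlier content do not influence the new content.
    """
    if not is_continuous:
        state.reset()
    state.accumulate(separation_result)
    return state
```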

[0070] A processor (220) according to one embodiment can identify, while playing audio data, a first separation time required to perform separation through the audio separator (24) of the first audio data of a first time period (e.g., input PCM data of the audio separator (24)) among the audio data to be separated. For example, the processor (220) can collect a specified number of pieces (e.g., the number of PCM data pieces specified for separation) of PCM data of a specified time (e.g., 0.5 seconds) output (or acquired) through the decoder (232), and acquire the collected PCM data of a first time period (e.g., 2 seconds) as the first audio data of the first time period, which is the input PCM data for separation through the audio separator (24).
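As a non-limiting illustration, collecting decoder output chunks (e.g., 0.5 seconds each) into one separator input (e.g., 2 seconds) can be sketched as follows. The `ChunkCollector` class and its method names are hypothetical.

```python
from typing import List, Optional


class ChunkCollector:
    """Collects a specified number of decoded PCM chunks into one separator input."""

    def __init__(self, chunks_per_input: int) -> None:
        if chunks_per_input < 1:
            raise ValueError("chunks_per_input must be at least 1")
        self.chunks_per_input = chunks_per_input
        self._chunks: List[bytes] = []

    def push(self, chunk: bytes) -> Optional[bytes]:
        """Add one decoded chunk; return the joined input once enough are collected."""
        self._chunks.append(chunk)
        if len(self._chunks) < self.chunks_per_input:
            return None
        joined = b"".join(self._chunks)
        self._chunks.clear()
        return joined
```

With 0.5-second chunks and `chunks_per_input=4`, every fourth call to `push` returns 2 seconds of input PCM data.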

[0071] A processor (220) according to one embodiment can obtain (or calculate) a first time period (e.g., PCM input duration) corresponding to input PCM data based on the following mathematical formula 1.

[0072]

[0073] In the above mathematical formula 1, PCM size may be the data size of the input PCM data input to the audio separator (24) for separation. channel may be the channel of the input PCM data. speed may be the speed of the input PCM data. sample rate may be the sample rate of the input PCM data. bit depth may be the bit depth of the input PCM data.
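Mathematical formula 1 itself is not reproduced in this text. As a non-limiting illustration, one plausible form consistent with the listed terms divides the PCM size by the number of bytes consumed per second of playback. The arrangement of the terms is an assumption, in particular the division of bit depth by 8 to obtain bytes and the placement of speed in the denominator, and it should not be read as the formula of the disclosure.

```python
def pcm_input_duration_s(pcm_size_bytes: int, channels: int, speed: float,
                         sample_rate_hz: int, bit_depth: int) -> float:
    """Assumed reconstruction: duration = size / (channels * speed * rate * bytes per sample).

    `speed` is taken to be a playback-speed multiplier (1.0 = normal speed);
    this interpretation is an assumption.
    """
    bytes_per_sample = bit_depth / 8
    bytes_per_second = channels * speed * sample_rate_hz * bytes_per_sample
    if bytes_per_second <= 0:
        raise ValueError("channels, speed, sample rate and bit depth must be positive")
    return pcm_size_bytes / bytes_per_second
```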

[0074] A processor (220) according to one embodiment can identify a first separation time to be taken to perform separation of input PCM data of a first time period based on the following mathematical formula 2.

[0075]

[0076] In the above mathematical formula 2, the separation time may be the actual separation time taken when separation was performed in the electronic device (201) prior to the first audio data of the first time period. A processor (220) according to one embodiment may obtain a real time factor value by using the value obtained by dividing the separation time by the first time period. A processor (220) according to one embodiment may identify the first separation time required to perform separation for the first audio data of the first time period by using the real time factor value. A processor (220) according to one embodiment may obtain the cumulative average value of previous real time factor values obtained when performing separation for each of the plurality of audio data prior to the first audio data of the first time period, and identify the first separation time for the first audio data of the first time period by using the cumulative average value of the real time factor values, or by using the cumulative average value together with the state information of the electronic device (201). The state information of the electronic device (201) according to one embodiment may include the usage and / or occupancy of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).
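As a non-limiting illustration, the real time factor (separation time divided by input duration), its cumulative average, and the resulting estimate of the next separation time can be sketched as follows. The class name and the `load_scale` multiplier are hypothetical; `load_scale` is only one assumed way of folding device state information into the estimate, which the text does not specify.

```python
class RealTimeFactorTracker:
    """Tracks the cumulative average of measured real time factor values."""

    def __init__(self) -> None:
        self._count = 0
        self._mean = 0.0

    def record(self, separation_time_s: float, input_duration_s: float) -> float:
        """Record one measured separation and return its real time factor."""
        if input_duration_s <= 0:
            raise ValueError("input_duration_s must be positive")
        rtf = separation_time_s / input_duration_s
        self._count += 1
        # Incremental update of the cumulative average.
        self._mean += (rtf - self._mean) / self._count
        return rtf

    @property
    def average(self) -> float:
        return self._mean

    def estimate_separation_time(self, input_duration_s: float,
                                 load_scale: float = 1.0) -> float:
        """Estimated time to separate the next input of the given duration."""
        if self._count == 0:
            raise ValueError("no previous separation has been measured")
        return self._mean * input_duration_s * load_scale
```

A real time factor below 1.0 means separation finishes faster than the audio plays back; a value above 1.0 means separation cannot keep up without buffering.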

[0077] A processor (220) according to one embodiment can obtain a first plurality of sound source audio data by performing separation of the first audio data of the first time period through an audio separator (24). A processor (220) according to one embodiment can collect a specified number of pieces of PCM data of a specified time (e.g., 0.5 seconds) obtained through a decoder (232) until the minimum analysis (or separation) duration (e.g., 2 seconds) is reached, transmit the collected PCM data to the audio separator (24), and obtain the first plurality of sound source audio data by performing separation of the first audio data of the first time period through the audio separator (24). A processor (220) according to one embodiment may obtain the first plurality of sound source audio data by performing separation to individually extract multiple sound sources (e.g., vocals, instruments, background sound, noise, and / or other sources) from the first audio data of the first time period through the audio separator (24). For example, the separation may be sound source separation. A processor (220) according to one embodiment may perform sound source separation using a specified number of pieces of classification information through the audio separator (24), and the specified number may not be limited to a specific number.

[0078] A processor (220) according to one embodiment can determine whether the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time (or first playback time) associated with the first separation time. According to one embodiment, the first audio rendering time may be the time required to render (or play) audio data prepared for audio rendering in a buffer (e.g., a first buffer, an output buffer, or a buffer storing audio data to be input to an audio renderer (236)). For example, if the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data available for audio rendering is sufficient while the separation of the second audio data of the second time period following the first time period is being performed, so audio interruption may not occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data.
For example, if the data size of the first plurality of sound source audio data is smaller than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data available for audio rendering becomes insufficient while the separation of the second audio data is being performed, so audio interruption may occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data.
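As a non-limiting illustration, the size comparison described above can be sketched as follows. The function names are hypothetical, and the rendering time is taken as an input because the way it is derived from the separation time is not detailed here.

```python
import math


def required_bytes(rendering_time_s: float, sample_rate_hz: int,
                   bytes_per_sample: int, channels: int) -> int:
    """Bytes of PCM data needed to keep the renderer busy for rendering_time_s."""
    return math.ceil(rendering_time_s * sample_rate_hz * bytes_per_sample * channels)


def can_render_without_gap(separated_bytes: int, rendering_time_s: float,
                           sample_rate_hz: int, bytes_per_sample: int,
                           channels: int) -> bool:
    """True when the separated data is at least as large as the required size."""
    needed = required_bytes(rendering_time_s, sample_rate_hz, bytes_per_sample, channels)
    return separated_bytes >= needed
```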

[0079] A processor (220) according to one embodiment can render the first plurality of sound source audio data through an audio renderer (236) and output it through an audio output module (255) if the data size of the first plurality of sound source audio data is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time. A processor (220) according to one embodiment can adjust the volume of each of the first plurality of sound source audio data. A processor (220) according to one embodiment can adjust the volume of each of the first plurality of sound source audio data to a volume level entered by a user or a volume level designated automatically. For example, if the first plurality of sound source audio data includes voice sound source audio data and instrument sound source audio data, and the volume level is designated by user input or automatically so that the volume of the voice sound source audio data becomes 56%, the processor (220) can adjust the volume of the voice sound source audio data among the first plurality of sound source audio data to 56% and the volume of the instrument sound source audio data to 100%. For example, if the first plurality of sound source audio data includes voice sound source audio data and noise sound source audio data, and noise removal is designated by user input or automatically, the processor (220) can adjust the volume of the voice sound source audio data among the first plurality of sound source audio data to 100% and the volume of the noise sound source audio data to 0%. In one embodiment, the processor (220) can mix the volume-adjusted first plurality of sound source audio data and transmit the mixed first plurality of sound source audio data to an audio renderer (236).
In one embodiment, the processor (220) can audio render the mixed first plurality of sound source audio data through the audio renderer (236) and output it through an audio output module (255).
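As a non-limiting illustration, per-source volume adjustment followed by mixing can be sketched as follows. Samples are assumed to be floats in the range [-1.0, 1.0], mixing is assumed to be a simple gain-weighted sum, and the final clipping step is an added safeguard; none of these details is specified in the text above.

```python
from typing import Dict, List


def mix_sources(sources: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Scale each separated source by its gain (1.0 = 100%) and sum them.

    Sources without an entry in `gains` are mixed at 100%.
    """
    lengths = {len(samples) for samples in sources.values()}
    if len(lengths) > 1:
        raise ValueError("all sources must have the same number of samples")
    n = lengths.pop() if lengths else 0
    mixed = [0.0] * n
    for name, samples in sources.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    # Clip to the valid range (assumption, to avoid overflow after summing).
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

Setting the gain of a noise source to 0.0 corresponds to the noise-removal example, and setting the gain of a voice source to 0.56 corresponds to the 56% example.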

[0080] A processor (220) according to one embodiment may delay audio rendering by storing the first plurality of sound source audio data in a first buffer (e.g., an output buffer, or a buffer storing audio data to be input to the audio renderer (236)) without transmitting the first plurality of sound source audio data to the audio renderer (236) if the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time. A processor (220) according to one embodiment may obtain the second plurality of sound source audio data by performing separation on the second audio data of the second time period following the first time period after storing the first plurality of sound source audio data in the first buffer. The separation of the second audio data may be similar to the separation operation for the first audio data. A processor (220) according to one embodiment may merge the first plurality of sound source audio data and the second plurality of sound source audio data when the second plurality of sound source audio data is obtained and the first plurality of sound source audio data exists in the first buffer. A processor (220) according to one embodiment may transmit the merged first plurality of sound source audio data and second plurality of sound source audio data to an audio renderer (236).
A processor (220) according to one embodiment may adjust the volume of each of the first plurality of sound source audio data and the second plurality of sound source audio data, mix the volume-adjusted first plurality of sound source audio data and second plurality of sound source audio data, and transmit the mixed first plurality of sound source audio data and second plurality of sound source audio data to an audio renderer (236) to output through an audio output module (255). According to one embodiment, when the data size of the first plurality of sound source audio data is smaller than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the processor (220) stores the first plurality of sound source audio data in the first buffer, and then, when the next second plurality of sound source audio data is acquired, merges the first plurality of sound source audio data and the second plurality of sound source audio data and performs audio rendering. This can prevent the audio interruption that would otherwise occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data when the amount of the first plurality of sound source audio data available for audio rendering becomes insufficient while the separation of the second audio data is being performed.
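As a non-limiting illustration, holding separated data in an output buffer until enough has accumulated, and then rendering the merged data, can be sketched as follows. The `DeferredRenderer` class is hypothetical, the `rendered` list stands in for the audio renderer, and the required size is passed in per call because its derivation is not detailed here.

```python
from typing import List


class DeferredRenderer:
    """Buffers separated PCM data and releases it only when it is large enough."""

    def __init__(self) -> None:
        self._pending = bytearray()       # plays the role of the first (output) buffer
        self.rendered: List[bytes] = []   # stand-in for data handed to the renderer

    def submit(self, separated: bytes, required_bytes: int) -> bool:
        """Merge new data with any held data; render if the merged size suffices.

        Returns True if data was handed to the renderer, False if rendering
        was delayed and the data remains buffered.
        """
        self._pending.extend(separated)
        if len(self._pending) < required_bytes:
            return False
        self.rendered.append(bytes(self._pending))
        self._pending.clear()
        return True
```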

[0081] A processor (220) according to one embodiment may repeat the decoding, separation, and audio rendering described above up to the last audio data of the last time period of the audio content, and may terminate the operation when the end of the audio content (e.g., an audio stream) (e.g., EOS (end of stream)) is identified.

[0082] A processor (220) according to one embodiment of the present disclosure may initiate an operation to scan audio content based on an input for scanning audio content.

[0083] An audio scan (or scan operation) according to one embodiment may mean obtaining information about a segment containing a sound source of a specific category among segments of audio content (e.g., an audio stream) of content (e.g., a video or audio file) desired by the user. A video or audio file according to one embodiment may have one or more audio tracks. An audio track according to one embodiment may contain audio content. A processor (220) according to one embodiment may obtain audio data (e.g., PCM data) by decoding (decompressing) audio content through a decoder (232) and scan the audio data through an audio scanner (22) to obtain, as a scan result, information about a segment containing a sound source of a specific category among segments of the audio content. A processor (220) according to one embodiment may display, on a display (260), information regarding a section of audio content that contains a sound source of a specific category among the sections of the audio content when playing (or editing) the audio content. Accordingly, the user may be able to know which section of the audio content contains a sound source of which category.

[0084] When a processor (220) according to one embodiment performs decoding for audio scanning of audio content, the decoding time may vary depending on the length of the audio content and / or real-time status information (or performance) of the electronic device (201). The status information of the electronic device (201) according to one embodiment may include the usage and / or occupancy of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). For example, if the length of the audio content is short, the decoding time may be short, but if the length of the audio content is long, the decoding time may be long. For example, if the real-time status information of the electronic device (201) corresponds to status information exceeding a specified performance level, the decoding time may be short, but if the real-time status information of the electronic device (201) corresponds to status information below a specified performance level, the decoding time may be long. A processor (220) according to one embodiment may perform an analysis to identify whether a sound source of a specific category is included in the decoded audio data for each specified section, and the analysis time may be short when the section to be analyzed is short and long when the section to be analyzed is long.

[0085] A processor (220) according to one embodiment can identify a specified maximum scan time through an audio scanner (22) when scanning audio content and control the scan time of the audio content (e.g., the actual scan time taken from the start of scanning the audio content until completion) so that it does not exceed the specified maximum scan time. A processor (220) according to one embodiment can determine, through an audio scanner (22), a scan interval and a skip interval that prevent the scan time for the audio content from exceeding the specified maximum scan time by using a time period corresponding to the audio content, state information of an electronic device (201), and the specified maximum scan time. A processor (220) according to one embodiment can perform the scanning of the audio content within the limited maximum scan time by using the scan interval and the skip interval through an audio scanner (22) to sample at least a portion of the audio data and analyze the sampled audio data to identify the sound source category to which the audio data belongs.

[0086] A processor (220) according to one embodiment can load audio content based on an input for scanning audio content.

[0087] A processor (220) according to one embodiment can identify whether previously acquired scan interval and skip interval information exists in correspondence with the loaded audio content. When the first scan is performed on the loaded audio content, previously acquired scan interval and skip interval information may not exist (e.g., may not be stored). For example, if a scan has been performed on audio data of at least a portion of the loaded audio content, previously acquired scan interval and skip interval may exist (e.g., stored).

[0088] A processor (220) according to one embodiment can determine a scan interval (e.g., a first scan interval) and a skip interval (e.g., a first skip interval) for audio data in a first section (e.g., an initial scan request section) among audio data included in audio content when there is no previously acquired scan interval and skip interval corresponding to audio content.

[0089] A processor (220) according to one embodiment may obtain a time period (content duration) corresponding to audio content, state information of an electronic device (e.g., first state information), and a specified maximum scan time (max scan time) when there is no previously obtained scan interval and skip interval corresponding to audio content to be scanned. For example, the first state information of the electronic device (101) may include usage and / or occupancy of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230) corresponding to a first time (e.g., the start time of the first segment scan of audio content), power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). For example, the specified maximum scan time may be a scan time limit predefined in the audio scanner (22) of the electronic device (201) or in an application including the audio scanner (22). A processor (220) according to one embodiment may obtain a first estimated scan time for audio content using a time period corresponding to audio content, first state information of the electronic device, and the specified maximum scan time, and determine (or calculate or identify) a first scan interval and a first skip interval such that the first estimated scan time does not exceed the specified maximum scan time.

[0090] A processor (220) according to one embodiment can obtain an estimated scan time (e.g., a first estimated scan time) to be taken to scan audio content by using an estimated decoding time (e.g., a first estimated decoding time) to be taken to decode audio content and a value obtained by multiplying the time taken to scan one block of audio data by the number of blocks included in the time period of the audio content.

[0091] A processor (220) according to one embodiment can obtain an estimated decoding time (e.g., a first estimated decoding time) using the following mathematical formula 3.

[0092]

[0093] Referring to the above mathematical formula 3, the average decoding time may be the average decoding time taken to decode one unit of audio data contained in the audio content through the decoder (232). The content duration may be the time period of the audio content.

[0094] A processor (220) according to one embodiment can obtain an estimated scan time (e.g., a first estimated scan time) using the following mathematical formula 4.

[0095]

[0096] Referring to the above mathematical formula 4, the estimated scan time may be the larger of the estimated decoding time (e.g., the first estimated decoding time) and the value obtained by multiplying the time required to scan one block of audio data (block scan time) by the number of blocks included in the content duration of the audio content (content duration / block).
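Mathematical formulas 3 and 4 are not reproduced in this text. As a non-limiting illustration, the estimates described in prose, and one possible way of choosing a skip interval from them, can be sketched as follows. The unit and block durations, the fixed scan interval, and the assumption that scan cost scales with the scanned fraction of the content are all hypothetical; the disclosure does not state how the intervals are computed.

```python
import math
from typing import Tuple


def estimated_decoding_time(avg_decode_time_per_unit_s: float,
                            content_duration_s: float,
                            unit_duration_s: float) -> float:
    """Average decoding time per unit times the number of units in the content."""
    return avg_decode_time_per_unit_s * (content_duration_s / unit_duration_s)


def estimated_scan_time(est_decode_s: float, block_scan_time_s: float,
                        content_duration_s: float, block_duration_s: float) -> float:
    """Larger of the decoding estimate and (block scan time x number of blocks)."""
    n_blocks = math.ceil(content_duration_s / block_duration_s)
    return max(est_decode_s, block_scan_time_s * n_blocks)


def choose_intervals(est_scan_s: float, max_scan_s: float,
                     scan_interval_s: float) -> Tuple[float, float]:
    """Return (scan interval, skip interval) so the estimate fits the limit.

    Assumes that scanning only a fraction f of the content takes f times the
    full estimated scan time, so f = max_scan_s / est_scan_s is needed.
    """
    if est_scan_s <= max_scan_s:
        return scan_interval_s, 0.0   # the whole content fits; nothing is skipped
    fraction = max_scan_s / est_scan_s
    skip_interval_s = scan_interval_s * (1.0 - fraction) / fraction
    return scan_interval_s, skip_interval_s
```

For example, under these assumptions a 600-second file with an estimated full scan time of 150 seconds and a 37.5-second limit would be sampled at a fraction of 0.25, that is, 2 seconds scanned followed by 6 seconds skipped.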

[0097] A processor (220) according to one embodiment can decode audio data using a decoder (232) based on the determination of a first scan interval and a first skip interval, and store a first decoding time taken to decode audio data of a first section among the audio data.

[0098] A processor (220) according to one embodiment may use a first scan interval and a first skip interval to sample audio data of a first at least partial section from the audio data of a first section decoded through a decoder (232), and analyze the sampled audio data of the first at least partial section to identify the sound source category to which the audio data of the first section belongs. A processor (220) according to one embodiment may store a first scan time taken to sample the audio data of the first at least partial section and to analyze the sampled audio data to identify the sound source category to which the audio data of the first section belongs.

[0099] A processor (220) according to one embodiment may determine (or update) a scan interval (e.g., a second scan interval) and a skip interval (e.g., a second skip interval) for audio data of a second section (e.g., a section after the first section) among audio data included in the audio content, if there is a previously acquired scan interval (e.g., a first scan interval) and a previously acquired skip interval (e.g., a first skip interval) corresponding to the audio content.

[0100] A processor (220) according to one embodiment may obtain a first expected decoding time, a first expected scan time, a specified maximum scan time, a time period of the portion of the audio content that has not yet been scanned, and state information of an electronic device (e.g., second state information) to determine a second scan interval and a second skip interval. For example, the second state information of the electronic device (101) may include the usage and / or occupancy of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230) corresponding to a second time (e.g., the start time of the second segment scan of audio content), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), an application running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).

[0101] A processor (220) according to one embodiment can determine (or calculate or identify) a second scan interval and a second skip interval such that the second expected scan time for audio data in a second segment does not exceed the specified maximum scan time, by using a first expected decoding time, a first expected scan time, the specified maximum scan time, a time period of the portion of the audio content that has not yet been scanned, and state information of an electronic device (e.g., second state information). A processor (220) according to one embodiment can identify a starting point at which to apply the second scan interval and the second skip interval by using the second skip interval.
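The disclosure does not state a formula for the scan interval and skip interval. The following Python sketch shows one possible way to choose them so that the expected scan time stays within the specified maximum scan time. It assumes that scan cost grows in proportion to the fraction of audio that is sampled, and that the state information of the electronic device can be reduced to a single multiplicative `load_factor`. Both assumptions, and all names, are hypothetical.

```python
from typing import Tuple


def choose_intervals(expected_scan_time: float,
                     max_scan_time: float,
                     scan_interval: float = 2.0,
                     load_factor: float = 1.0) -> Tuple[float, float]:
    """Return (scan_interval, skip_interval) in seconds.

    expected_scan_time: estimate for scanning the remaining content in full.
    load_factor: hypothetical scaling derived from device state (1.0 = idle).
    """
    if expected_scan_time <= 0 or max_scan_time <= 0 or scan_interval <= 0:
        raise ValueError("times and intervals must be positive")
    adjusted = expected_scan_time * load_factor
    if adjusted <= max_scan_time:
        return scan_interval, 0.0  # everything fits: no skipping needed
    # Fraction of the audio that can be analysed within the time limit.
    fraction = max_scan_time / adjusted
    # scan / (scan + skip) == fraction  =>  skip = scan * (1 / fraction - 1)
    skip_interval = scan_interval * (1.0 / fraction - 1.0)
    return scan_interval, skip_interval


if __name__ == "__main__":
    print(choose_intervals(12.0, 3.0))  # only a quarter of the audio is sampled
```

Calling the function again before each section with a fresh `expected_scan_time` and `load_factor` corresponds to the per-section update of the second scan interval and second skip interval.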

[0102] A processor (220) according to one embodiment may sample audio data of a second at least partial section from the audio data of a second section decoded through a decoder (232), based on the determination of a second scan interval and a second skip interval. A processor (220) according to one embodiment may identify a sampling type designated for sampling using the second scan interval and the second skip interval. For example, the designated sampling type may include a first sampling type and / or a second sampling type. For example, the first sampling type may include a seek method (or mode or operation), and the second sampling type may include a drop method (or mode or operation). When the first sampling type is specified, a processor (220) according to one embodiment may calculate the starting point of the second section of the audio content based on the second scan interval, perform decoding from the starting point of the second section using the decoder (232) to obtain the audio data of the second section, and sample the audio data of the second at least partial section from the audio data of the second section using the second scan interval and the second skip interval. When the second sampling type is specified, a processor (220) according to one embodiment may obtain the audio data of the second section of the audio content using the decoder (232), drop the audio data of the portion corresponding to the second skip interval from the audio data of the second section, and sample the audio data of the second at least partial section corresponding to the second scan interval.
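The difference between the seek method and the drop method can be illustrated with the following minimal Python sketch. The decoder is modelled as a hypothetical callable `decode(start, end)` that returns one item per second of audio. Times are whole seconds for simplicity, and none of the names come from the disclosure.

```python
from typing import Callable, List, Tuple

Decoder = Callable[[int, int], List[int]]


def scan_windows(start: int, end: int,
                 scan_interval: int, skip_interval: int) -> List[Tuple[int, int]]:
    """Alternate 'scan' and 'skip' stretches across [start, end)."""
    if scan_interval <= 0 or skip_interval < 0:
        raise ValueError("scan_interval must be > 0 and skip_interval >= 0")
    windows, t = [], start
    while t < end:
        windows.append((t, min(t + scan_interval, end)))
        t += scan_interval + skip_interval
    return windows


def sample_section(decode: Decoder, start: int, end: int,
                   scan_interval: int, skip_interval: int,
                   mode: str) -> List[List[int]]:
    windows = scan_windows(start, end, scan_interval, skip_interval)
    if mode == "seek":
        # Seek: jump to each window and decode only that window.
        return [decode(a, b) for a, b in windows]
    if mode == "drop":
        # Drop: decode the whole section, then discard the skipped stretches.
        full = decode(start, end)
        return [full[a - start:b - start] for a, b in windows]
    raise ValueError(f"unknown sampling type: {mode}")


def fake_decode(a: int, b: int) -> List[int]:
    """Stand-in decoder: one integer per second of audio."""
    return list(range(a, b))


if __name__ == "__main__":
    print(sample_section(fake_decode, 0, 10, 2, 3, "seek"))
    print(sample_section(fake_decode, 0, 10, 2, 3, "drop"))
```

Both modes yield the same sampled windows. In this sketch the seek mode decodes only the sampled seconds, while the drop mode decodes the whole section and may suit streams in which seeking is not available.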

[0103] A processor (220) according to one embodiment can analyze the sampled audio data of the second at least partial section to identify the sound source category to which the audio data of the second section belongs.

[0104] A processor (220) according to one embodiment may terminate the audio scan operation when the end of the audio content (e.g., an end of stream (EOS) of the audio stream) is identified after repeatedly performing a scan (or audio source analysis) up to the audio data of the last decoded segment of the audio content. A processor (220) according to one embodiment may store audio scan result information in memory (230) based on the termination of the audio scan operation.

[0105] A processor (220) according to one embodiment can identify whether scan result information of audio content exists in memory (230) based on an input for a scan request of audio content. If scan result information of audio content does not exist, the processor (220) according to one embodiment can perform a scan operation for audio content.

[0106] A processor (220) according to one embodiment can identify whether the version of the scan result information of the audio content stored is a version compatible with the electronic device (201) (e.g., a version available for use in the electronic device (201)) when scan result information of the audio content exists.

[0107] A processor (220) according to one embodiment may perform a scan operation on audio content if scan result information of audio content exists and the version of the scan result information of audio content is not a version compatible with the electronic device (201) (e.g., a version available for use in the electronic device (201)). A processor (220) according to one embodiment may identify whether the scan result information of audio content includes sound source information of a section requested by a user if scan result information of audio content exists and the version of the scan result information of audio content is a version compatible with the electronic device (201) (e.g., a version available for use in the electronic device (201)).

[0108] A processor (220) according to one embodiment can perform a scan of the section requested by the user if there is scan result information of audio content and the version of the scan result information of audio content is a version compatible with the electronic device (201) (e.g., a version available in the electronic device (201)) and does not include sound source information of the section requested by the user.

[0109] A processor (220) according to one embodiment can identify whether the section requested by the user is a section included in the skip interval when there is scan result information of audio content and the version of the scan result information of audio content is a version compatible with the electronic device (201) (e.g., a version available in the electronic device (201)) and includes sound source information of the section requested by the user.

[0110] A processor (220) according to one embodiment can perform a scan operation on audio content if there is scan result information of audio content, the version of the scan result information of audio content is a version compatible with the electronic device (201) (e.g., a version available for use in the electronic device (201)), includes sound source information of a section requested by the user, and the section requested by the user is a section included in a skip interval. A processor (220) according to one embodiment can extract sound source information of the section requested by the user from the scan result information of audio content and display it on a display (260) if there is scan result information of audio content, the version of the scan result information of audio content is a version compatible with the electronic device (201) (e.g., a version available for use in the electronic device (201)), includes sound source information of a section requested by the user, and the section requested by the user is not a section included in a skip interval.
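The decision flow of paragraphs [0105] to [0110] can be summarized by the following Python sketch. The `ScanResult` structure, the `Action` names, and the representation of a section as a `(start, end)` tuple are illustrative assumptions. The disclosure does not specify how scan result information is stored.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple

Section = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


class Action(Enum):
    SCAN_CONTENT = "scan the audio content"
    SCAN_SECTION = "scan only the requested section"
    DISPLAY_CACHED = "display stored sound source information"


@dataclass
class ScanResult:
    version: int
    sources: Dict[Section, str] = field(default_factory=dict)  # section -> category
    skipped: Set[Section] = field(default_factory=set)         # sections in a skip interval


def decide(result: Optional[ScanResult], compatible_version: int,
           requested: Section) -> Action:
    if result is None:                        # [0105] no stored scan result
        return Action.SCAN_CONTENT
    if result.version != compatible_version:  # [0107] incompatible version
        return Action.SCAN_CONTENT
    if requested not in result.sources:       # [0108] section not covered
        return Action.SCAN_SECTION
    if requested in result.skipped:           # [0110] section fell in a skip interval
        return Action.SCAN_CONTENT
    return Action.DISPLAY_CACHED              # [0110] reuse the stored information


if __name__ == "__main__":
    cached = ScanResult(version=2, sources={(0.0, 2.0): "vocal"})
    print(decide(cached, 2, (0.0, 2.0)).value)
```

Version compatibility is modelled here as simple equality. A real implementation might accept a range of versions.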

[0111] A memory (230) according to one embodiment (e.g., memory (130) of FIG. 1) may store a plurality of applications (functions or programs) and data associated with each of the plurality of applications. A memory (230) according to one embodiment may store a program (e.g., program (140) of FIG. 1) used for audio separation operations and / or audio scanning operations during content playback, together with various data generated during the execution of the program (140). A memory (230) according to one embodiment may include a decoder (232), an audio solution (234), and an audio renderer (236) used for audio separation operations and / or audio scanning operations during content playback of the present disclosure. Although the decoder (232), audio solution (234), and / or audio renderer (236) according to one embodiment are described as being stored in the memory (230) as software modules, it is understood that each may instead be separately mounted as a physical component. A memory (230) according to one embodiment may broadly include a program area (e.g., 140) and a data area. The program area (e.g., 140) may store relevant program information for operating the electronic device (201), such as an operating system (OS) that boots the electronic device (201) (e.g., the operating system (142) of FIG. 1). The data area (not shown) may represent at least one buffer according to various embodiments and may store information (or data) acquired (or generated) during audio separation operations and / or audio scan operations during content playback. Additionally, the memory (230) may be configured to include at least one storage medium among flash memory, a hard disk, a multimedia card micro type memory (e.g., secure digital (SD) or extreme digital (XD) memory), RAM, and ROM.

[0112] An audio output module (255) according to one embodiment (e.g., the acoustic output module (155) of FIG. 1) can convert rendered audio data output through an audio renderer (236) into an analog audio signal and output it through a speaker. For example, the audio output module (255) may include a speaker.

[0113] A communication module (290) according to one embodiment (e.g., communication module (190) of FIG. 1) can communicate with a first external electronic device (e.g., electronic device (104) of FIG. 1). For example, the communication module (290) can receive content from the external electronic device or transmit information (e.g., scan result information of audio content) obtained through an audio separation operation and / or an audio scan operation during content playback to the external electronic device. According to one embodiment, the communication module (290) may include a cellular module, a Wi-Fi (wireless-fidelity) module, a Bluetooth module, or a near field communication (NFC) module. In addition, it may further include other modules capable of communicating with the first external electronic device (204).

[0114] A display (260) according to one embodiment (e.g., the display module (160) of FIG. 1) can display various information based on the control of a processor (220). For example, the display (260) can display a screen associated with an audio separation operation and / or a screen associated with the performance of an audio scan operation during the playback of the content of the present disclosure. According to one embodiment, the display (260) can be implemented in the form of a touch screen. When the display (260) is implemented in the form of a touch screen together with an input module, it can display various information generated according to the user's touch operation.

[0115] According to one embodiment, the electronic device (201) is not limited to the configuration shown in FIG. 2 and may be configured to include various additional components. In one embodiment, the main components of the electronic device were described through the electronic device (201) of FIG. 2. However, in various embodiments, not all components shown in FIG. 2 are essential components, and the electronic device (201) may be implemented with more components than shown, or with fewer components. Additionally, the connection relationships of the main components of the electronic device (201) described above through FIG. 2 may be changed according to various embodiments.

[0116] An electronic device according to one embodiment (the electronic device (101) of FIG. 1 or the electronic device (201) of FIG. 2) may include a display (160, 260), an audio output module (155, 255), a memory (130, 230) for storing commands, and at least one processor (120, 220). When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may determine a first scan interval based on an input for scanning audio content, using a time period corresponding to the audio content and a specified maximum scan time, such that the scan time for the audio content does not exceed the specified maximum scan time, and scan the audio content by sampling audio data corresponding to the audio content using the first scan interval. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to identify, based on an input for playing the audio content and while playing the audio data of the audio content, a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data using the scan result, to obtain the first plurality of sound source audio data by performing separation of the first audio data of the first time period using a real time factor value, and to output the first plurality of sound source audio data through the audio output module (255).

[0117] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may identify a first separation time required to perform the separation of the first audio data of the first time period using the real-time factor value. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may transmit the first plurality of sound source audio data to the audio renderer (236) and output it through the audio output module (255) if the data size of the first plurality of sound source audio data obtained by performing the separation of the first audio data of the first time period is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to store the first plurality of sound source audio data in the first buffer of the memory (230) if the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to perform the separation on the second audio data of the second time period following the first time period to obtain the second plurality of sound source audio data. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to merge the first plurality of sound source audio data and the second plurality of sound source audio data when the first plurality of sound source audio data exists in the first buffer, and to transmit the merged first plurality of sound source audio data and second plurality of sound source audio data to the audio renderer (236) for output through the audio output module (255).

[0118] According to one embodiment, when the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may obtain the real time factor value using the value obtained by dividing the separation time taken to perform separation in the electronic device (201) prior to the first audio data of the first time period by the first time period. According to one embodiment, when the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may identify the cumulative average value of a plurality of real time factor values obtained during separation for each of the plurality of audio data prior to the first audio data of the first time period. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may identify the first separation time using the cumulative average value of the plurality of real time factor values and the state information of the electronic device (201).
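A minimal Python sketch of the real time factor bookkeeping described in paragraph [0118] is given below. The class and method names are hypothetical. The way the state information of the electronic device enters the estimate is not specified in this passage, so it is represented here by an assumed multiplicative `load_factor`.

```python
class RealTimeFactorTracker:
    """Keeps a cumulative average of real time factor (RTF) values."""

    def __init__(self) -> None:
        self._count = 0
        self._mean = 0.0

    def record(self, separation_time: float, audio_duration: float) -> float:
        """Record one separation run; RTF = separation time / audio duration."""
        if audio_duration <= 0:
            raise ValueError("audio_duration must be positive")
        rtf = separation_time / audio_duration
        self._count += 1
        # Incremental form of the cumulative average.
        self._mean += (rtf - self._mean) / self._count
        return rtf

    @property
    def average(self) -> float:
        return self._mean

    def estimate_separation_time(self, audio_duration: float,
                                 load_factor: float = 1.0) -> float:
        """Predict the separation time for the next chunk of audio.

        load_factor is an assumed stand-in for the device state information.
        """
        return self._mean * audio_duration * load_factor


if __name__ == "__main__":
    tracker = RealTimeFactorTracker()
    tracker.record(0.5, 2.0)   # 2 s of audio separated in 0.5 s
    tracker.record(1.5, 2.0)   # a slower run on the same device
    print(tracker.average, tracker.estimate_separation_time(2.0))
```

An average real time factor below 1 means separation runs faster than playback. A value approaching 1 leaves little margin before rendering would have to wait for the separator.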

[0119] According to one embodiment, the state information of the electronic device may include the usage of the at least one processor and / or the memory, the occupancy rate of the at least one processor and / or the memory, the power consumption of the battery of the electronic device (201), information on applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).

[0120] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to store the first audio data of the first time period in the second buffer of the memory (130, 230) when the separation of the first audio data of the first time period is not performed. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to, upon acquiring the second plurality of sound source audio data by performing the separation of the second audio data of the second time period, merge the first audio data and the second plurality of sound source audio data when the first audio data exists in the second buffer, and transmit the merged first audio data and second plurality of sound source audio data to the audio renderer (236) for output through the audio output module (255).

[0121] According to one embodiment, the audio content may include a first audio content including the first audio data of the first time period and a second audio content including the second audio data of the second time period. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may identify whether the first audio data of the first time period and the second audio data of the second time period are continuous. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may, when the first audio data of the first time period and the second audio data of the second time period have continuity, update the first inference data, in which separation result information for the first audio content has been accumulated, by accumulating separation result information for the second audio content after the separation result information for the first audio content, without initializing the first inference data. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may, when the first audio data of the first time period and the second audio data of the second time period do not have continuity, initialize the first inference data and acquire second inference data that accumulates separation result information for the second audio content.
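The handling of inference data across continuous and non-continuous audio content can be sketched in Python as follows. The disclosure does not describe the structure of the inference data. A plain list of separation results and a caller-supplied continuity flag are used purely as assumptions, and all names are hypothetical.

```python
from typing import List


class SeparationSession:
    """Accumulates separation result information as 'inference data'."""

    def __init__(self) -> None:
        self.inference_data: List[str] = []

    def accumulate(self, separation_result: str,
                   continuous_with_previous: bool) -> List[str]:
        if not continuous_with_previous:
            # Non-continuous content: initialize and start new inference data.
            self.inference_data = []
        # Continuous content: keep what was accumulated and append to it.
        self.inference_data.append(separation_result)
        return self.inference_data


if __name__ == "__main__":
    session = SeparationSession()
    session.accumulate("content-1 result", continuous_with_previous=False)
    session.accumulate("content-2 result", continuous_with_previous=True)
    print(session.inference_data)
```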

[0122] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may obtain the time period corresponding to the audio content, the first state information of the electronic device (201), and the specified maximum scan time based on the input for scanning the audio content. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may determine the first scan interval and the first skip interval using the time period corresponding to the audio content, the first state information of the electronic device (201), and the specified maximum scan time so that the scan time for the audio content does not exceed the specified maximum scan time. When the above commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may decode the audio content through a decoder to obtain audio data of a first segment among the audio data included in the audio content. When the above commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may sample at least a portion of the audio data of the first segment using the first scan interval and the first skip interval. When the above commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may analyze the at least portion of the audio data of the first segment to identify the sound source category to which the audio data of the first segment belongs.

[0123] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may obtain second state information of the electronic device (201) for scanning audio data of a second section following audio data of the first section among the audio data. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may obtain an expected decoding time required to decode the audio data of the second section. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may obtain a scan time required to scan audio data of a designated time section. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may identify the longer of the expected decoding time and the scan time as the expected scan time for the audio data of the designated time section. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may determine the second scan interval and the second skip interval of the audio data of the second section based on the expected scan time, the second state information of the electronic device (201), and the specified maximum scan time. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may sample at least a portion of the audio data of the second section using the second scan interval and the second skip interval. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may analyze the at least a portion of the audio data of the second section to identify the sound source category to which the audio data of the second section belongs.

[0124] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may calculate the starting point of the second section of the audio content based on the second scan interval when a first sampling type is specified for sampling using the second scan interval and the second skip interval.

[0125] According to one embodiment, when the above commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to obtain the audio data of the second section by performing decoding from the starting point of the second section of the audio content using the decoder. When the above commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to sample at least a portion of the audio data of the second section using the second scan interval and the second skip interval.

[0126] When the commands according to one embodiment are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to acquire audio data of the second section of the audio content using the decoder (232) when a second sampling type is specified for sampling using the second scan interval and the second skip interval. When the commands are executed individually or collectively by the at least one processor (220), the electronic device (201) may be configured to sample at least a portion of the audio data of the second section corresponding to the second scan interval.

[0127] FIG. 3 is a configuration diagram showing an audio separator according to one embodiment.

[0128] Referring to FIG. 3, an audio separator (24) according to one embodiment may be stored in memory (230) as a software module (or program). According to one embodiment, the audio separator (24) may be implemented as a hardware module (or component or hardware element).

[0129] An audio separator (24) according to one embodiment can separate at least one sound source audio data corresponding to a sound source (e.g., vocal, instrument, background sound, noise, and / or other sound sources) of at least one specified category (or classification criterion) from audio data (PCM data) of a certain interval (or time length) obtained from audio content, and can obtain (or output) the at least one separated sound source audio data.

[0130] An audio separator (24) according to one embodiment may include a separation time estimator (310), a separator (320), an audio data scheduler (330), an audio buffer manager (340), device utilities (350), a content manager (360), and / or an audio processor module (360).

[0131] A processor (220) according to one embodiment may use a separation time estimator (310) to measure and store separation times (actually taken) whenever audio separation is performed on audio data in an electronic device (201). A processor (220) according to one embodiment may use the separation time estimator (310) to obtain a real time factor value based on the stored separation times and the above Equations 1 and 2, and may identify a (real-time) separation time (e.g., the separation time required to perform separation on the current audio data to be separated) based on the real time factor value.

[0132] A processor (220) according to one embodiment can perform separation of audio data through a separator (320). A processor (220) according to one embodiment can receive, through the separator (320), PCM data in units of a specified time (e.g., 0.5 seconds) from a decoder (232), collect the PCM data a specified number of times until PCM data of a first time period (e.g., a minimum analysis (or separation) duration, such as 2 seconds) is obtained, and perform separation of the first audio data of the first time period to obtain a first plurality of sound source audio data. A processor (220) according to one embodiment can obtain (or output) the first plurality of sound source audio data by performing separation to individually extract a plurality of sound sources (e.g., vocals, instruments, background sound, noise, and / or other sources) from the first audio data of the first time period through the separator (320). For example, the separation may be sound source separation. A processor (220) according to one embodiment may perform sound source separation using a specified number of classification information through the separator (320), and the specified number may not be limited to a specific number. A processor (220) according to one embodiment may adjust the speed of computation through the separator (320) according to the state information of the electronic device (201).

[0133] A processor (220) according to one embodiment may use the audio data scheduler (330) to determine, based on the separation time (e.g., a first separation time) calculated in real time, whether the size of the first plurality of sound source audio data (current Audio Output PCM data) output from the separator (320) is sufficient for audio rendering until the next, second plurality of sound source audio data (Separation Output data) is output. A processor (220) according to one embodiment can determine, using the audio data scheduler (330), whether the data size of the first plurality of sound source audio data obtained from the separator (320) based on the first separation time is equal to or greater than the size of the audio data corresponding to the first audio rendering time (or first playback time) associated with the first separation time. According to one embodiment, the first audio rendering time may be the time required to render (or play) audio data prepared for audio rendering in a buffer (e.g., a first buffer, an output buffer, or a buffer that stores audio data to be input to an audio renderer (236)).
For example, if the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data being rendered is sufficient while the separation of the second audio data of the second time period following the first time period is being performed, so that no audio interruption occurs between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data. For example, if the data size of the first plurality of sound source audio data is smaller than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data being rendered becomes insufficient while the separation of the second audio data is being performed, and thus an audio interruption may occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data.
A processor (220) according to one embodiment can use the audio data scheduler (330) to perform scheduling (managing the flow of data) based on real-time factor values so that audio output is not interrupted. If the data size of the first plurality of sound source audio data obtained from the separator (320) is smaller than the data size (Audio Output PCM data) corresponding to the first separation time (or the first audio rendering time associated with the first separation time), the processor (220) may hold the first plurality of sound source audio data (store it in a buffer) without transmitting it to the audio renderer (236) (or Rendering Phase), and merge it with the plurality of sound source audio data (separation output) obtained next from the separator (320). A processor (220) according to one embodiment can perform scheduling whenever separation is performed using the audio data scheduler (330). According to one embodiment, the data size of the first plurality of sound source audio data obtained from the separator (320) may be smaller than the data size (Audio Output PCM data) corresponding to the first separation time (or the first audio rendering time associated with the first separation time) during the initial separation operation, when there is no plurality of sound source audio data (separation output) obtained by a previous separation operation, or, during playback of a first audio content and a second audio content that are not continuous, when the size of the last audio data of the first audio content is smaller than the data size corresponding to the separation time according to the real-time factor value.
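The hold-and-merge scheduling described above can be illustrated by the following Python sketch. It assumes a fixed PCM byte rate and applies the size check to the merged data as well, which the disclosure does not state explicitly. The class, method, and parameter names are hypothetical.

```python
from typing import Optional


class AudioDataScheduler:
    """Forwards separated PCM to the renderer only when it can cover the
    time the next separation is expected to take."""

    def __init__(self, bytes_per_second: int) -> None:
        self.bytes_per_second = bytes_per_second
        self.pending = b""  # held separation output (the "first buffer")

    def submit(self, separated_pcm: bytes,
               next_separation_time: float) -> Optional[bytes]:
        """Return data to hand to the renderer, or None if it is being held."""
        merged = self.pending + separated_pcm
        # Bytes needed to keep rendering while the next separation runs.
        required = int(next_separation_time * self.bytes_per_second)
        if len(merged) >= required:
            self.pending = b""
            return merged       # enough audio: transmit to the renderer
        self.pending = merged   # too little: hold and merge with the next output
        return None


if __name__ == "__main__":
    scheduler = AudioDataScheduler(bytes_per_second=1000)
    print(scheduler.submit(b"\x00" * 400, 0.5))       # held: 400 < 500 bytes
    print(len(scheduler.submit(b"\x01" * 400, 0.5)))  # merged 800 bytes released
```

Holding a short first output and releasing it together with the next one trades a small start-up delay for uninterrupted playback, which is the stated purpose of the scheduling.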

[0134] A processor (220) according to one embodiment can determine, through an audio buffer manager (340), the degree to which audio processing operations (e.g., volume control and mixing of the separated sound source audio data) and separation operations using a separator (320) are performed in parallel, depending on the processor (e.g., GPU) overhead and the scan time incurred when performing the separation operations. If the performance of the electronic device (201) is lower than a specified performance and audio processing operations and separation operations are frequently performed at the same time, excessive processor occupancy may affect other operations within the electronic device (201). To prevent this, when the separation time is sufficiently short, the processor (220) according to one embodiment can lower the maximum average usage of the processor (220) by controlling the audio processing operations and the separation operations to be performed sequentially through the audio buffer manager (340). When overhead needs to be reduced, a processor (220) according to one embodiment can control the audio processing operation and the separation operation to be performed in parallel using a buffer handling method through the audio buffer manager (340), and can optimize the execution speed of the separation operation by reducing the usage of the processor (220) (e.g., GPU) to a level that causes no problem with audio rendering.
When playing content containing multiple audio contents, a processor (220) according to one embodiment can select or distinguish, through the audio buffer manager (340), the audio content for which separation is desired (or required, or will be performed) among the multiple audio contents. In one embodiment, when audio data of audio content to be separated and audio data of audio content not to be separated are acquired consecutively, the processor (220) can, through the audio buffer manager (340), place each audio data in a different buffer (e.g., a separable buffer and a non-separable buffer) for processing. This improves processing speed and reduces the average utilization of the processor (220) (e.g., GPU) by preventing audio data of audio content not to be separated from being passed (input) to the separator (350).

[0135] A processor (220) according to one embodiment can obtain real-time (current) status information of an electronic device (201) through device utilities (350). A processor (220) according to one embodiment can obtain, through device utilities (350), the usage and / or share of the CPU, AP, and / or audio processor and / or memory (230) of the electronic device (201), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).

[0136] A processor (220) according to one embodiment may decide whether to retain inference data accumulated and stored for the separation results of audio content in the Separator (330) based on the relationship (e.g., whether there is continuity) of multiple audio data (e.g., consecutive audio content added to a single timeline within an editor) included in the content requested for playback (or editing) through the content manager (360).

[0137] In one embodiment, the processor (220) can perform separation of the first audio content through the content manager (360) and then perform separation of the second audio content. If the first audio content and the second audio content do not have continuity, the processor (220) can set a flag to terminate (or initialize) the first inference data accumulated for the first audio content, and then obtain and use second inference data that accumulates separation result information for the second audio content. If the first audio content and the second audio content have continuity, the first inference data accumulated with the separation result information for the first audio content is not initialized, and the first inference data can be updated by further accumulating the separation result information for the second audio content following the separation result information for the first audio content. A processor (220) according to one embodiment can determine, through a content manager (360), that the first audio content and the second audio content have continuity even though they are split (e.g., determine that the same content has simply been split) when the two have the same content path and the end time of the first audio content and the start time of the following second audio content are within an allowable error value of each other, thereby preventing the first inference data from being terminated or initialized.
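
The continuity test described above can be sketched as follows. This is only an illustrative sketch: the `AudioItem` and `InferenceData` structures, the `has_continuity` helper, and the 0.05-second tolerance are assumptions introduced here, since the text specifies only "the same content path" and "an allowable error value".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AudioItem:
    """One audio content item on a timeline (hypothetical structure)."""
    content_path: str  # path of the source the item was cut from
    start_s: float     # start position within the source, in seconds
    end_s: float       # end position within the source, in seconds


def has_continuity(prev: AudioItem, nxt: AudioItem, tolerance_s: float = 0.05) -> bool:
    """True when both items share a content path and nxt starts where prev
    ends, within an allowable error (tolerance_s is an assumed value)."""
    same_path = prev.content_path == nxt.content_path
    return same_path and abs(nxt.start_s - prev.end_s) <= tolerance_s


@dataclass
class InferenceData:
    """Stand-in for the separation result information accumulated so far."""
    history: List[str] = field(default_factory=list)

    def begin_item(self, prev: Optional[AudioItem], nxt: AudioItem) -> None:
        """Reset on a discontinuity; otherwise keep accumulating."""
        if prev is None or not has_continuity(prev, nxt):
            self.history.clear()
        self.history.append(nxt.content_path)
```

With two items cut from the same file whose end and start times nearly coincide, `begin_item` keeps the accumulated history; an item from a different path clears it first.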

[0138] A processor (220) according to one embodiment can adjust, through an audio processor module (360), the volume of each of the plurality of sound source audio data (e.g., the first plurality of sound source audio data, or the merged first and second pluralities of sound source audio data) obtained by using a separator (330) to a specified volume (e.g., a volume specified by a user or specified automatically, such as a volume of 0 for noise sound source audio data in the case of noise removal). A processor (220) according to one embodiment can mix the volume-adjusted first plurality of sound source audio data through the audio processor module (360) and transmit the mixed first plurality of sound source audio data to an audio renderer (236).

[0139] A processor (220) according to one embodiment may audio render mixed first multiple sound source audio data through an audio renderer (236) and output it through a sound output module (255). A processor (220) according to one embodiment may separate and / or merge the mixed first multiple sound source audio data through an audio renderer (236) to match an audio data unit independent of the separation operation, thereby obtaining audio data for audio rendering, and output the obtained audio data through a sound output module (255). A processor (220) according to one embodiment may change audio attributes including the channel, sampling rate, and / or speed of the audio data for audio rendering if necessary.

[0140] FIG. 4 is a diagram showing cases of separating and processing content including a plurality of audio contents in an electronic device according to one embodiment.

[0141] Referring to FIG. 4, a first case (<case1> normal) (410) according to one embodiment may indicate a case where the content includes item1 audio content (412), item2 audio content (414), and item3 audio content (416) that are continuous and different from one another, and the size of the audio data obtained from each of the item1 audio content (412), item2 audio content (414), and item3 audio content (416) (e.g., duration 2 sec) is not smaller than the size of the data that must be prepared for audio rendering (e.g., duration 2 sec). A processor (220) according to one embodiment may perform separation while playing the item1 audio content (412), item2 audio content (414), and item3 audio content (416) in order according to a content playback request in the first case (410). A processor (220) according to one embodiment can decode the item1 audio content (412), separate the audio data of the first section of the acquired audio data to obtain a first PCM output (e.g., a plurality of sound source audio data), and store the first PCM output in a first buffer to delay its audio rendering, because no data has been prepared for audio rendering when the first PCM output is obtained. From the separation of the audio data of the section following the first section onward, the processor (220) according to one embodiment can audio render the subsequent PCM outputs without the delay operation, because sufficient data may have been prepared for audio rendering. A processor (220) according to one embodiment can initialize inference data through a content manager (360) when the separation of each of item1 audio content (412), item2 audio content (414), and item3 audio content (416) begins, because item1 audio content (412), item2 audio content (414), and item3 audio content (416) are different contents.

[0142] A second case (<case2> non-separable) (420) according to one embodiment may indicate a case where the content includes item1 audio content (422), item2 audio content (424), and item3 audio content (426) that are not continuous and are different from one another, where item1 audio content (422) is specified (or set) not to be separated (e.g., the user specifies that separation is not wanted), item2 audio content (424) and item3 audio content (426) are set to be separated, and the audio data size (e.g., duration 2 sec) from each of item1 audio content (422), item2 audio content (424), and item3 audio content (426) is not smaller than the data size (e.g., duration 2 sec) that must be prepared for audio rendering. A processor (220) according to one embodiment can sequentially play item1 audio content (422), item2 audio content (424), and item3 audio content (426) in response to a content playback request in the second case (420), while storing the audio data of item1 audio content (422) in a buffer (e.g., second buffer) without performing separation, and separating the audio data of item2 audio content (424) and item3 audio content (426) in sequence. A processor (220) according to one embodiment can schedule the audio data of item1 audio content (422) stored in the buffer through an audio data scheduler (330) so that it is delivered to the audio processor module (370) before the PCM outputs (e.g., a plurality of sound source audio data) obtained by separating the audio data of item2 audio content (424).

[0143] A third case (<case3> small item) (430) according to one embodiment may indicate a case where the content includes item1 audio content (432), item2 audio content (434), and item3 audio content (436) that are continuous and different from one another, the audio data size from item1 audio content (432) (e.g., duration 0.5 sec) is smaller than the data size that must be prepared for audio rendering (e.g., duration 2 sec), and the audio data size from each of item2 audio content (434) and item3 audio content (436) (e.g., duration 2 sec) is not smaller than the data size that must be prepared for audio rendering (e.g., duration 2 sec). A processor (220) according to one embodiment may perform separation while playing item1 audio content (432), item2 audio content (434), and item3 audio content (436) in order in response to a content playback request in the third case (430). According to one embodiment, the audio data prepared for rendering may be insufficient while the processor (220) separates the audio data of item2 audio content (434), because the audio data size (e.g., duration 0.5 sec) from item1 audio content (432) is smaller than the data size (e.g., duration 2 sec) that must be prepared for audio rendering. According to one embodiment, the processor (220) may store the PCM outputs (e.g., a plurality of sound source audio data) obtained by separating item1 audio content (432) in a buffer (e.g., first buffer) to delay them, and once the next PCM outputs (e.g., a plurality of sound source audio data) are obtained by the following separation operation on the audio data of item2 audio content (434), may merge the PCM outputs stored in the buffer with the next PCM outputs to enable audio rendering.

[0144] A fourth case (<case4> continuous split item) (440) according to one embodiment may indicate a case where the content includes item1 audio content (442) and item2 audio content (444), which are continuous and identical to each other but split, and item3 audio content (446), which is different from item1 audio content (442) and item2 audio content (444), and where the audio data size (e.g., duration 2 sec) from each of item1 audio content (442), item2 audio content (444), and item3 audio content (446) is not smaller than the data size (e.g., duration 2 sec) that must be prepared for audio rendering. A processor (220) according to one embodiment may perform separation while playing item1 audio content (442), item2 audio content (444), and item3 audio content (446) in order according to a content playback request in the fourth case (440). When a processor (220) according to one embodiment separates item2 audio content (444) after separating item1 audio content (442), the inference data may not be initialized when the separation of item2 audio content (444) begins, because item1 audio content (442) and item2 audio content (444) are the same content that has been split.

[0145] FIG. 5 is a flowchart illustrating the audio data separation operation during content playback according to one embodiment.

[0146] Referring to FIG. 5, a processor (e.g., processor (120) of FIG. 1 or processor (220) of FIG. 2) of an electronic device according to one embodiment (e.g., electronic device (101) of FIG. 1 or electronic device (201) of FIG. 2) can perform at least one of operations 510 to 560.

[0147] In operation 510, a processor (220) according to one embodiment may obtain audio data by decoding audio content (e.g., an audio stream) through a decoder (232) based on an input for content playback. The content according to one embodiment may include audio content or may include audio content and video content. The audio content according to one embodiment may include a first audio content and a second audio content. The first audio content and the second audio content according to one embodiment may be consecutive and different audio content. The processor (220) according to one embodiment may display a screen for content playback (or editing) on a display (260) based on the execution of a content playback application (or content editing application) (or program). A processor (220) according to one embodiment may identify an input for content playback based on user input to a button (or icon) for a playback request on a screen for content playback. Based on the identification of the input for content playback, the processor (220) according to one embodiment may obtain audio data (e.g., audio PCM (pulse code modulation) data) by decoding audio content through a decoder (232). A processor (220) according to one embodiment may continuously output (or obtain) PCM data of a specified duration (e.g., 0.5 seconds) by decoding audio content through a decoder (232).

[0148] In operation 520, a processor (220) according to one embodiment can identify a first separation time required to perform separation, through an audio separator (24), of first audio data of a first time period (e.g., input PCM data of the audio separator (24)) among the audio data. For example, the processor (220) can acquire PCM data of a first time period (e.g., 2 seconds), which is the input PCM data for performing separation through the audio separator (24), by collecting a specified number (e.g., the number of PCM data specified for performing separation) of PCM data of a specified time (e.g., 0.5 seconds) output (or acquired) through a decoder (232). A processor (220) according to one embodiment can obtain (or calculate) a first time period (e.g., PCM input duration) corresponding to the input PCM data based on Mathematical Formula 1 above, and can identify a first separation time required to perform separation of the input PCM data of the first time period based on Mathematical Formula 2 above. A processor (220) according to one embodiment can obtain a real-time factor value by dividing the separation time actually taken when separation was performed in the electronic device (201) prior to the first audio data of the first time period by the first time period. A processor (220) according to one embodiment can identify a first separation time required to perform separation of the first audio data of the first time period using the real-time factor value. A processor (220) according to one embodiment may obtain a cumulative average value of previous real-time factor values obtained when performing separation for each of a plurality of audio data prior to the first audio data of the first time period, and may identify a first separation time for the first audio data of the first time period using the cumulative average value of real-time factor values.
A processor (220) according to one embodiment may obtain a cumulative average value of previous real-time factor values obtained when performing separation for each of a plurality of audio data prior to the first audio data of the first time period, and may identify a first separation time for the first audio data of the first time period using the cumulative average value of real-time factor values and the state information of the electronic device. The state information of the electronic device (201) according to one embodiment may include the usage and / or share of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).
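
As a rough illustration of the estimation described in operation 520, the sketch below keeps a cumulative average of real-time factor values and uses it to predict the next separation time. Mathematical Formulas 1 and 2 are not reproduced in this passage, so the prediction used here (average real-time factor multiplied by the input duration) is an assumption, as are the class name, the `default_rtf` fallback, and the `load_scale` parameter standing in for device state information.

```python
class RealTimeFactorEstimator:
    """Tracks real-time factor (RTF) values: separation time / input duration."""

    def __init__(self, default_rtf: float = 1.0) -> None:
        # default_rtf is an assumed fallback used before any measurement exists.
        self._default_rtf = default_rtf
        self._rtf_sum = 0.0
        self._count = 0

    def record(self, separation_time_s: float, input_duration_s: float) -> float:
        """Store the RTF of one finished separation and return it."""
        if input_duration_s <= 0:
            raise ValueError("input_duration_s must be positive")
        rtf = separation_time_s / input_duration_s
        self._rtf_sum += rtf
        self._count += 1
        return rtf

    def average_rtf(self) -> float:
        """Cumulative average of all recorded RTF values."""
        if self._count == 0:
            return self._default_rtf
        return self._rtf_sum / self._count

    def estimate_separation_time(self, input_duration_s: float,
                                 load_scale: float = 1.0) -> float:
        """Predicted separation time for the next input (assumed formula).

        load_scale is a hypothetical multiplier derived from device state
        (e.g., greater than 1.0 when the processor is busy).
        """
        return self.average_rtf() * input_duration_s * load_scale
```

For instance, after two 2-second inputs that took 1.0 and 0.5 seconds to separate, the average real-time factor is 0.375, so the next 2-second input would be predicted to take 0.75 seconds under these assumptions.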

[0149] In operation 530, a processor (220) according to one embodiment can obtain a first plurality of sound source audio data of a first time period by performing separation of the first audio data through an audio separator (24). A processor (220) according to one embodiment can transmit PCM data of a specified time (e.g., 0.5 seconds) obtained through a decoder (232) to the audio separator (24), and obtain the first plurality of sound source audio data of the first time period by performing separation of the first audio data of the first time period (e.g., minimum analysis (or separation) duration) (e.g., 2 seconds) through the audio separator (24). A processor (220) according to one embodiment may obtain the first plurality of sound source audio data by performing separation to individually extract multiple sound sources (e.g., vocals, instruments, background sound, noise, and / or other sources) from the first audio data of the first time period through the audio separator (24). For example, the separation may be sound source separation. A processor (220) according to one embodiment may perform sound source separation using a specified number of classification information through the audio separator (24), and the specified number is not limited to a specific number.

[0150] In operation 540, a processor (220) according to one embodiment may determine whether the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time (or first playback time) associated with the first separation time. According to one embodiment, the first audio rendering time may be the time required to audio render (or play) audio data prepared for audio rendering in a buffer (e.g., a first buffer, an output buffer, or a buffer storing audio data to be input to the audio renderer (236)). A processor (220) according to one embodiment may determine whether the data size of the first plurality of sound source audio data is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time. For example, if the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data being audio-rendered is not insufficient when the separation of the second audio data in the second time period following the first time period is being performed, so audio interruption may not occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data. 
For example, if the data size of the first plurality of sound source audio data is not equal to or greater than (is smaller than) the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the amount of the first plurality of sound source audio data being audio-rendered becomes insufficient when the separation of the second audio data is being performed, so audio interruption may occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data.

[0151] In operation 550, if the data size of the first plurality of sound source audio data is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time, the processor (220) according to one embodiment can audio render the first plurality of sound source audio data through an audio renderer (236) and output it through a sound output module (255). Before transmitting the first plurality of sound source audio data to the audio renderer (236), the processor (220) according to one embodiment can adjust the volume of each of the first plurality of sound source audio data. The processor (220) according to one embodiment can adjust the volume of each of the first plurality of sound source audio data to a volume size entered by a user or an automatically designated volume size. For example, if the first plurality of sound source audio data includes voice sound source audio data and instrument sound source audio data and the volume size is specified by user input or automatically so that the volume of the voice sound source audio data is 56%, the processor (220) can adjust the volume size of the voice sound source audio data among the first plurality of sound source audio data to 56% and the volume size of the instrument sound source audio data to 100%. For example, if the first plurality of sound source audio data includes voice sound source audio data and noise sound source audio data and the noise is specified to be removed by user input or automatically, the processor (220) can adjust the volume size of the voice sound source audio data among the first plurality of sound source audio data to 100% and the volume size of the noise sound source audio data to 0%. 
A processor (220) according to one embodiment displays a content editing screen for specifying (or changing) the volume size of each of the first plurality of sound source audio data, and can specify (or change) the volume size of at least some or all of the first plurality of sound source audio data by user input or automatically on the content editing screen. A processor (220) according to one embodiment can mix the volume-controlled first plurality of sound source audio data and transmit the mixed first plurality of sound source audio data to an audio renderer (236). A processor (220) according to one embodiment can audio render the mixed first plurality of sound source audio data through the audio renderer (236) and output it through a sound output module (255).
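
The volume adjustment and mixing in operation 550 can be illustrated with a minimal sketch. It assumes each separated source is a list of float samples in the range [-1.0, 1.0] and that volumes are given as fractions (0.56 for 56%); the function name `mix_sources` and the final clipping step are assumptions for illustration, not details taken from the text.

```python
from typing import Dict, List


def mix_sources(sources: Dict[str, List[float]],
                volumes: Dict[str, float]) -> List[float]:
    """Scale each separated source by its volume and sum them sample by sample.

    Sources missing from `volumes` are kept at full volume (1.0).
    The result is clipped to [-1.0, 1.0] (an assumed safeguard).
    """
    if not sources:
        return []
    # Mix only over the length shared by every source.
    length = min(len(samples) for samples in sources.values())
    mixed = [0.0] * length
    for name, samples in sources.items():
        gain = volumes.get(name, 1.0)
        for i in range(length):
            mixed[i] += gain * samples[i]
    return [max(-1.0, min(1.0, value)) for value in mixed]
```

Setting the noise source to a volume of 0.0 and the voice source to 1.0 reproduces the noise-removal example: the mixed output then contains only the voice samples.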

[0152] In operation 560, if the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time, the processor (220) according to one embodiment may delay audio rendering for the first plurality of sound source audio data by storing them in a first buffer of memory (230) (e.g., output buffer, or a buffer that stores audio data to be input to the audio renderer (236)) without transmitting them to the audio renderer (236). After storing the first plurality of sound source audio data in the first buffer, the processor (220) according to one embodiment may obtain the second plurality of sound source audio data by performing separation for the second audio data of the second time period following the first time period. Separation for the second audio data may be similar to the separation operation for the first audio data. A processor (220) according to one embodiment may merge the first plurality of sound source audio data and the second plurality of sound source audio data when the second plurality of sound source audio data is acquired and the first plurality of sound source audio data exists in the first buffer. A processor (220) according to one embodiment may transmit the merged first plurality of sound source audio data and second plurality of sound source audio data to an audio renderer (236).
A processor (220) according to one embodiment may adjust the volume of each of the first plurality of sound source audio data and the second plurality of sound source audio data, mix the volume-adjusted first plurality of sound source audio data and second plurality of sound source audio data, and transmit the mixed data to an audio renderer (236) to output it through a sound output module (255). According to one embodiment, when the data size of the first plurality of sound source audio data is smaller than the size of the audio data corresponding to the first audio rendering time associated with the first separation time, the processor (220) stores the first plurality of sound source audio data in a first buffer, and then, when the next second plurality of sound source audio data is acquired, merges the first plurality of sound source audio data and the second plurality of sound source audio data and performs audio rendering. This prevents the audio interruption that would otherwise occur between the audio output of the first plurality of sound source audio data and the audio output of the second plurality of sound source audio data when the first plurality of sound source audio data runs out during rendering while the separation of the second audio data is still being performed.
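
The decision made in operations 540 to 560 can be sketched as a small hold-and-merge scheduler. The class name `RenderScheduler`, the use of sample counts as the size measure, and the re-check of the merged size before release are assumptions made for illustration; the text itself describes merging the held output with the next separation output and then rendering.

```python
from typing import List, Optional


class RenderScheduler:
    """Holds separation output that is too short to render without a gap."""

    def __init__(self) -> None:
        self._held: List[float] = []  # plays the role of the "first buffer"

    def submit(self, separated: List[float],
               required_samples: int) -> Optional[List[float]]:
        """Return samples to send to the renderer, or None if still holding.

        required_samples is the amount of audio that must be ready to cover
        the rendering time associated with the next separation.
        """
        merged = self._held + separated
        if len(merged) < required_samples:
            # Operation 560: too little data, so delay rendering and keep it.
            self._held = merged
            return None
        # Operation 550: enough data (possibly after merging), so release it.
        self._held = []
        return merged
```

In the small-item case of FIG. 4, a 0.5-second output would be held on the first call, and the following 2-second output would be merged with it and released as a single 2.5-second block.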

[0153] FIG. 6a is a flowchart illustrating an audio data separation operation according to whether the audio data requires separation during content playback according to one embodiment, FIG. 6b is a flowchart illustrating an operation continuing from FIG. 6a according to one embodiment, and FIG. 6c is a flowchart illustrating an operation continuing from FIG. 6b according to one embodiment.

[0154] Referring to FIG. 6a, a processor (e.g., processor (120) of FIG. 1 or processor (220) of FIG. 2) of an electronic device according to one embodiment (e.g., electronic device (101) of FIG. 1 or electronic device (201) of FIG. 2) may perform at least one of operations 612 to 656.

[0155] In operation 612, a processor (220) according to one embodiment may receive input for content playback. A processor (220) according to one embodiment may display a screen for editing (or playing) content on a display (260) based on the execution of a content editing application (or content playback application) (or program). A processor (220) according to one embodiment may identify input for content playback based on user input for a button (or icon) for a playback request on the screen for editing content.

[0156] In operation 614, a processor (220) according to one embodiment may obtain audio data (e.g., PCM data) by decoding audio content (e.g., audio stream) through a decoder (232). For example, the audio stream may be content in a format for continuously transmitting digital audio data over time. Content according to one embodiment may include audio data or may include audio data and video data. Audio data according to one embodiment may include first audio data and second audio data. The first audio data and second audio data according to one embodiment may be continuous but different audio content. PCM data according to one embodiment may be data in which the amplitude of a sound wave is sampled at specific time intervals and expressed as discrete numbers to convert an analog audio signal (sound) into digital. The unit of PCM data according to one embodiment may be a sample. For example, the size of PCM data (e.g., bytes) for a certain duration (e.g., 1 second) can be calculated as the product of the sampling rate, the sample size, and the number of channels. A processor (220) according to one embodiment can obtain PCM data (e.g., input PCM data) of a specified duration (e.g., 0.5 seconds) by decoding audio content through a decoder (232).
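
The PCM size calculation mentioned above (sampling rate multiplied by sample size and number of channels, for a given duration) can be written out as a short sketch. The function name and the example values of 48 kHz, 16-bit, stereo are assumptions chosen for illustration.

```python
def pcm_size_bytes(sample_rate_hz: int, bytes_per_sample: int,
                   channels: int, duration_s: float = 1.0) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration."""
    # Number of sample frames in the duration, rounded to a whole frame.
    frames = round(sample_rate_hz * duration_s)
    return frames * bytes_per_sample * channels
```

Under these example values, one second of audio occupies 48000 x 2 x 2 = 192,000 bytes, and a 0.5-second decoder output occupies 96,000 bytes.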

[0157] In operation 616, a processor (220) according to one embodiment can identify whether audio data (PCM data) obtained through a decoder (232) is audio data that requires separation. A processor (220) according to one embodiment can identify whether audio content is audio content that requires separation through a content manager (360), and if it is audio content that requires separation, it can determine that the audio data (PCM data) needs to be separated.

[0158] In operation 618, the processor (220) according to one embodiment may store audio data in a designated buffer (e.g., a second buffer or an intermediate buffer) to store audio data that does not require separation when audio data separation is not required. The processor (220) according to one embodiment may schedule the audio data stored in the second buffer to be delivered to an audio renderer (236) via an audio data scheduler (330) so that audio can be rendered during a time period corresponding to the stored audio data.
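
Operations 616 and 618 amount to routing each decoded chunk either toward the separator or into a bypass buffer. The sketch below illustrates that routing under assumed names (`PcmChunk`, `BufferRouter`, and the `separable_ids` set are hypothetical); how the content manager actually marks content as requiring separation is not specified in this passage.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List


@dataclass
class PcmChunk:
    content_id: str       # which audio content the chunk was decoded from
    samples: List[float]  # decoded PCM samples


class BufferRouter:
    """Keeps chunks that need separation apart from chunks that do not."""

    def __init__(self, separable_ids: Iterable[str]) -> None:
        self._separable_ids = set(separable_ids)
        self.separable: Deque[PcmChunk] = deque()      # input to the separator
        self.non_separable: Deque[PcmChunk] = deque()  # the "second buffer"

    def route(self, chunk: PcmChunk) -> str:
        """Append the chunk to the matching buffer and report which one."""
        if chunk.content_id in self._separable_ids:
            self.separable.append(chunk)
            return "separable"
        self.non_separable.append(chunk)
        return "non_separable"
```

In the non-separable case of FIG. 4, chunks from item1 would land in the bypass buffer and never reach the separator, while chunks from item2 and item3 would be queued for separation.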

[0159] In operation 620, a processor (220) according to one embodiment can identify whether multiple buffers are available to process audio data to be separated based on state information of the electronic device (201). A processor (220) according to one embodiment can determine whether to use a single buffer or a double buffer depending on the state information of the electronic device (201) and / or the delay of the separation operation through the audio separator (24).

[0160] In operation 622, the processor (220) according to one embodiment can process audio data to be separated in parallel by applying multiple buffers (double buffering) through the audio buffer manager (340) when the state information of the electronic device (201) is in a state where multiple buffers can be used and the separation operation is delayed.

[0161] In operation 624, the processor (220) according to one embodiment can process the audio data to be separated sequentially by applying a single buffering through the audio buffer manager (340) when the state information of the electronic device (201) is not in a state where multiple buffers can be used or the separation operation is not delayed.
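
The choice between single and double buffering in operations 620 to 624 reduces to a two-condition decision, sketched below. The enum and function names are hypothetical, and how "multiple buffers can be used" and "the separation operation is delayed" are derived from device state is left to the caller, since the text does not define thresholds.

```python
from enum import Enum


class BufferingMode(Enum):
    SINGLE = "single"  # sequential processing with one buffer
    DOUBLE = "double"  # parallel processing with multiple buffers


def choose_buffering_mode(multi_buffer_available: bool,
                          separation_delayed: bool) -> BufferingMode:
    """Use double buffering only when it is possible and actually needed."""
    if multi_buffer_available and separation_delayed:
        return BufferingMode.DOUBLE
    return BufferingMode.SINGLE
```

Either condition being false falls back to single buffering, which matches the "not in a state where multiple buffers can be used or the separation operation is not delayed" wording of operation 624.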

[0162] In operation 626, a processor (220) according to one embodiment can identify whether the audio content of the audio data has continuity with the audio content of the previous audio data. A processor (220) according to one embodiment can identify whether the audio content of the audio data has continuity with the audio content of the previous audio data through a content manager (360).

[0163] In operation 628, the processor (220) according to one embodiment may reset the inference data that has accumulated separation result information for the audio content and perform operation 630 if the audio content of the audio data does not have continuity with the audio content of the previous audio data. The processor (220) according to one embodiment may request the separator (320) through the content manager (360) to reset the inference data that has accumulated separation result information for the audio content. The processor (220) according to one embodiment may perform operation 630 without resetting the inference data that has accumulated separation result information for the audio content if the audio content of the audio data has continuity with the audio content of the previous audio data.

[0164] In operation 630, the processor (220) according to one embodiment may initiate a separation operation for audio data. The processor (220) according to one embodiment may initiate a separation operation for audio data through a separator (320).

[0165] In operation 632, a processor (220) according to one embodiment can obtain a real-time factor value and state information of an electronic device (201). A processor (220) according to one embodiment can obtain a real-time factor value by dividing the actual separation time taken to perform separation in the electronic device (201) prior to the current audio data to be separated (e.g., first audio data of a first time period) by the first time period. A processor (220) according to one embodiment can obtain state information of the electronic device (201) (e.g., first state information) through device utilities (350). The status information of the electronic device (201) according to one embodiment may include the usage and / or share of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), the power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201).

[0166] In operation 634, a processor (220) according to one embodiment can identify a first separation time required to perform separation of the first audio data of the first time period using real-time factor values and state information of the electronic device. A processor (220) according to one embodiment can obtain a cumulative average value of the previous real-time factor values obtained when performing separation of each of the plurality of audio data prior to the first audio data of the first time period, and can identify the first separation time for the first audio data of the first time period using the cumulative average value of the real-time factor values. A processor (220) according to one embodiment can also identify the first separation time using both the cumulative average value of the real-time factor values and the state information of the electronic device.
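As a rough sketch of operations 632 and 634, the real-time factor can be computed as the separation time divided by the time period, and a running mean of past values can then be used to predict the next separation time. The class name `RtfTracker` and the `load_factor` multiplier are assumptions made here; the disclosure does not specify how the state information enters the estimate.

```python
class RtfTracker:
    """Keeps a cumulative average of real-time factor (RTF) values."""

    def __init__(self) -> None:
        self._count = 0
        self._mean = 0.0

    def record(self, separation_time_s: float, period_s: float) -> float:
        """Record one finished separation and return its RTF."""
        if period_s <= 0:
            raise ValueError("period_s must be positive")
        rtf = separation_time_s / period_s
        self._count += 1
        # Incremental form of the arithmetic mean over all recorded RTFs.
        self._mean += (rtf - self._mean) / self._count
        return rtf

    def estimate_separation_time(self, period_s: float, load_factor: float = 1.0) -> float:
        """Predict the separation time for the next chunk of length period_s.

        load_factor is a hypothetical multiplier (above 1.0 under heavy load)
        standing in for the device state information.
        """
        if self._count == 0:
            raise RuntimeError("no RTF samples recorded yet")
        return self._mean * period_s * load_factor
```

For example, after separations that took 1.0 s and 0.5 s for 2-second chunks, the average real-time factor is 0.375, so the next 2-second chunk would be expected to take about 0.75 s under unchanged load.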

[0167] In operation 636, a processor (220) according to one embodiment can obtain first multiple sound source audio data by performing separation of first audio data of a first time period through an audio separator (24). A processor (220) according to one embodiment can obtain the first multiple sound source audio data by collecting PCM data of a specified time (e.g., 0.5 seconds) obtained through a decoder (232) a specified number of times (e.g., until a minimum analysis (or separation) duration is reached) for the first time period (e.g., 2 seconds), transmitting the collected PCM data to the audio separator (24), and performing separation of the first audio data of the first time period through the audio separator (24). A processor (220) according to one embodiment may obtain the first multiple sound source audio data by performing separation to individually extract multiple sound sources (e.g., vocals, instruments, background sound, noise, and / or other sources) from the first audio data of the first time period through the audio separator (24). For example, the separation may be sound source separation. A processor (220) according to one embodiment may perform sound source separation using a specified number of pieces of classification information through the audio separator (24), and the specified number may not be limited to a specific number.

[0168] In operation 638, the processor (220) according to one embodiment can identify whether audio data prior to the first time period exists in a designated buffer (e.g., a second buffer).

[0169] In operation 640, if audio data prior to the first time period exists in a designated buffer (e.g., a second buffer), the processor (220) according to one embodiment may store (or merge) the audio data prior to the first time period in a first buffer of memory (230) (e.g., an output buffer, or a buffer that stores audio data to be input to an audio renderer (236)).

[0170] In operation 642, the processor (220) according to one embodiment may store the first plurality of sound source audio data obtained by performing separation of the first audio data of the first time period in the first buffer when there is no audio data prior to the first time period in the designated buffer (e.g., second buffer).

[0171] In operation 644, a processor (220) according to one embodiment may determine whether the data size of the first plurality of sound source audio data is equal to or greater than the size of the audio data corresponding to the first audio rendering time (or first playback time) associated with the first separation time. According to one embodiment, the first audio rendering time may be the time required to audio render (or play) audio data prepared for audio rendering in the first buffer.

[0172] In operation 646, if the data size of the first plurality of sound source audio data is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time, the processor (220) according to one embodiment may transmit the first plurality of sound source audio data to the audio processing module (370) to start audio processing for the first plurality of sound source audio data. If the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time, the processor (220) according to one embodiment may return to operation 616 to perform separation of the second audio data for a subsequent second time period and process the first plurality of sound source audio data by merging them with the second plurality of sound source audio data, which is the result of separation of the audio data for the second time period.
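The size comparison of operations 644 and 646 can be illustrated as below. The sketch assumes uncompressed PCM with a hypothetical format (48 kHz, stereo, 16-bit) and assumes that the required rendering time equals the estimated separation time; neither assumption is stated in the disclosure.

```python
def bytes_for_duration(duration_s: float, sample_rate: int = 48_000,
                       channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Number of PCM bytes needed to play audio for duration_s seconds."""
    return int(duration_s * sample_rate * channels * bytes_per_sample)


def ready_to_render(buffered_bytes: int, separation_time_s: float) -> bool:
    """True when the buffered audio can cover playback while the next chunk is separated."""
    return buffered_bytes >= bytes_for_duration(separation_time_s)
```

When `ready_to_render` is false, the separated data would stay in the buffer and be merged with the result of the next time period before rendering starts, which is what keeps playback from running dry.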

[0173] In operation 648, a processor (220) according to one embodiment can adjust the volume of each of the separated first plurality of sound source audio data. A processor (220) according to one embodiment can adjust the volume of each of the first plurality of sound source audio data through an audio processing module (370) to a volume level entered by a user or specified automatically. For example, if the first plurality of sound source audio data includes voice sound source audio data and instrument sound source audio data, and the volume level is specified by user input or automatically so that the volume of the voice sound source audio data becomes 56%, the processor (220) can adjust the volume of the voice sound source audio data among the first plurality of sound source audio data to 56% and the volume of the instrument sound source audio data to 100%. For example, if the first plurality of sound source audio data includes voice sound source audio data and noise sound source audio data, and noise removal is designated by user input or automatically, the processor (220) can adjust the volume of the voice sound source audio data among the first plurality of sound source audio data to 100% and the volume of the noise sound source audio data to 0%. In one embodiment, the processor (220) can display a content editing screen for specifying (or changing) the volume of each of the first plurality of sound source audio data, and can allow the volume of at least some or all of the first plurality of sound source audio data to be specified (or changed) by user input or automatically on the content editing screen.

[0174] In operation 650, the processor (220) according to one embodiment can mix the first plurality of volume-controlled audio source data. The processor (220) according to one embodiment can mix the volume-controlled first plurality of audio source data into one audio data through an audio processing module (370).
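A minimal sketch of the volume adjustment and mixing of operations 648 and 650 follows. It assumes floating-point samples in the range [-1.0, 1.0], equal-length sources, and simple summation with clipping; the function name `mix_sources` and these choices are illustrative assumptions, not the method of the audio processing module (370).

```python
from typing import Dict, List


def mix_sources(sources: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Scale each separated source by its gain and sum into one signal.

    The result is clipped to [-1.0, 1.0]. Sources without an entry in
    gains keep 100% volume (gain 1.0).
    """
    lengths = {len(samples) for samples in sources.values()}
    if len(lengths) > 1:
        raise ValueError("all sources must have the same number of samples")
    n = lengths.pop() if lengths else 0
    mixed = [0.0] * n
    for name, samples in sources.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return [max(-1.0, min(1.0, x)) for x in mixed]
```

Noise removal as described above corresponds to calling the function with a gain of 0.0 for the noise source and 1.0 for the voice source.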

[0175] In operation 652, a processor (220) according to one embodiment may transmit mixed first multiple sound source audio data to an audio renderer (236). A processor (220) according to one embodiment may split the mixed first multiple sound source audio data to fit the input data size of the audio renderer (236) and transmit it to the audio renderer (236).
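Splitting the mixed data to fit the renderer input size could be as simple as the following sketch, where `chunk_size` is a hypothetical stand-in for the input data size of the audio renderer (236).

```python
from typing import List


def split_for_renderer(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```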

[0176] In operation 654, the processor (220) according to one embodiment can render audio data through an audio renderer (236) and output it through an audio output module (255).

[0177] In operation 656, the processor (220) according to one embodiment may identify whether the audio content has ended (e.g., EOS (end of stream)). If the end of the audio content is identified, the processor (220) according to one embodiment may terminate the playback and separation operations for the audio content. If the end of the audio content is not identified, the processor (220) according to one embodiment may return to operation 616 and repeat decoding, separation, and audio rendering until the last audio data of the last time period of the audio content, and then terminate if the end of the audio content (e.g., audio stream) (e.g., EOS (end of stream)) is identified.

[0178] FIG. 7 is a configuration diagram showing an audio scanner according to one embodiment.

[0179] Referring to FIG. 7, an audio scanner (22) according to one embodiment may be stored in memory (230) as a software module (or program). According to one embodiment, the audio scanner (22) may be implemented as a hardware module (or part or element).

[0180] A processor (220) according to one embodiment can obtain information about a section containing a specific category of sound source (e.g., vocals, instruments, background sound, noise, and / or other sound sources) among sections of audio content (e.g., audio stream) through an audio scanner (22). A processor (220) according to one embodiment can obtain information about sections of audio categories within an audio track by rapidly analyzing a long audio track using an audio scanner (22).

[0181] An audio scanner (22) according to one embodiment may include a decode time estimator (710), a scan time estimator (720), device utilities (730), a scan setting generator (740), a scanner (750), and an analyze result extractor (760). A decode time estimator (710) according to one embodiment may predict (or measure or acquire) the time required to generate input data (e.g., PCM data) of an audio solution (234). A scan time estimator (720) according to one embodiment may predict (or measure or acquire) the time required to analyze audio data. A device utility (730) according to one embodiment can obtain real-time (current) status information of an electronic device (201). A scan setting generator (740) according to one embodiment can obtain an estimated scan time for audio content and determine (or set) a scan interval and a skip interval so that the estimated scan time does not exceed a specified maximum scan time. A scanner (750) according to one embodiment can identify audio categories by analyzing PCM data. An analyze result extractor (760) according to one embodiment can configure analysis result data of an audio solution (234).

[0182] A processor (220) according to one embodiment can obtain and store the seek time required for decoding audio content and the decoding time required to decode audio data in sections at the decoder (232) through a decode time estimator (710), and obtain the average decoding time required to decode audio data in one section.

[0183] A processor (220) according to one embodiment can obtain an estimated scan time to scan audio content by using a scan time estimator (720) to obtain the value obtained by multiplying the time required to scan one block of audio data by the number of blocks included in the time period of the audio content. For example, the scan time estimator (720) can determine the estimated scan time required to scan the entire audio data. The estimated scan time may include block scan time. A block may refer to the size of the minimum scan input of an audio solution (234).
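The estimate described above, block scan time multiplied by the number of blocks, can be sketched as follows. Rounding a partial final block up to a whole block is an assumption made here, and the parameter names are hypothetical.

```python
import math


def estimate_scan_time_ms(content_duration_ms: float, block_duration_ms: float,
                          block_scan_time_ms: float) -> float:
    """Estimated time to scan the whole content: blocks x time per block."""
    if block_duration_ms <= 0:
        raise ValueError("block_duration_ms must be positive")
    num_blocks = math.ceil(content_duration_ms / block_duration_ms)
    return num_blocks * block_scan_time_ms
```

For a 4m30s track (270,000 ms) with 2,000 ms blocks that each take 40 ms to scan, the estimate is 135 blocks, or 5,400 ms.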

[0184] A processor (220) according to one embodiment can obtain real-time (current) status information of an electronic device (201) through device utilities (730). A processor (220) according to one embodiment can obtain current information on hardware elements of the electronic device (201) through device utilities (730) (e.g., usage and / or share of CPU, AP, and / or audio processor and / or memory (230)), power consumption of the battery of the electronic device (201) (e.g., 189 in FIG. 1), information on applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). According to one embodiment, since the decoding time and / or scan time may vary depending on the real-time (current) status information of the electronic device (201), the device utilities (730) can acquire the real-time (current) status information of the electronic device (201) for each audio data segment (or at specified time intervals) and transmit it to the scan setting generator (740).

[0185] A processor (220) according to one embodiment can obtain an estimated scan time for audio content using a time period corresponding to audio content, state information of an electronic device, and a specified maximum scan time through a scan setting generator (740), and determine (or calculate or identify or update) a scan interval and a skip interval so that the estimated scan time does not exceed the specified maximum scan time. For example, a scan setting generator (740) can generate a scan interval and a skip interval that determine the section of audio data to be analyzed (e.g., the section to be analyzed next) based on information transmitted through a decode time estimator (710) and a scan time estimator (720), status information of an electronic device (201) transmitted through device utilities (730), and a maximum scan time set in an audio solution (234) (e.g., App). According to one embodiment, the scan setting generator (740) can specify the maximum scan wait time required by the editor (e.g., an editing application) and the minimum scan interval of the audio solution (234) according to the editor and / or audio solution (234), and can calculate the expected scan time using the above Equation 4.

[0186] A processor (220) according to one embodiment can sample at least a portion of the audio data of one section using a scanner (750) with a scan interval and a skip interval, and analyze the sampled portion to identify the sound source category to which the audio data of the section belongs.

[0187] A processor (220) according to one embodiment can acquire scan result information through an analyze result extractor (760) and store the scan result information in a specified data format. For example, the specified data format may be created as a hierarchical format so that the scan result information can be reused. The data format is described in detail later with reference to FIG. 10.

[0188] FIG. 8 is a diagram showing the decode time and scan time for audio data in one section according to one embodiment.

[0189] Referring to FIG. 8, a processor (220) according to one embodiment may acquire (or measure) and store the decoding time (decode take n(ms)) required to decode one segment of audio data (810) in a decoder (232) through a decode time estimator (710). A processor (220) according to one embodiment may acquire (or measure) and store the scan time (scan take m(ms)) required to scan (or analyze) one segment of audio data (820) through a scan time estimator (720). The stored decoding times and scan times may be transmitted to and used by a scan setting generator (740).

[0190] FIG. 9 is a diagram showing an example of setting an analysis interval based on a skip interval and a scan interval according to one embodiment.

[0191] Referring to FIG. 9, a processor (220) according to one embodiment can calculate a skip interval (e.g., skip: x(ms)) and a scan interval (e.g., scan: y(ms)) through a scan setting generator (740) based on the time taken to analyze and scan an audio stream (910) of a previous segment (e.g., c. decoded & scanned). Based on the skip interval (e.g., skip: x(ms)) and the scan interval (e.g., scan: y(ms)), the processor (220) according to one embodiment can skip the decoding and / or scanning process for an audio stream (e.g., d. skip x(ms) stream) (920) corresponding to the skip interval x(ms) and analyze a new analysis segment (930) corresponding to the scan interval (e.g., e. decoded & scanned y(ms)). A processor (220) according to one embodiment may shorten the new analysis segment (930) if a long time was required to analyze the previous analysis segment (910), thereby ensuring that the scan time for the entire audio data does not exceed the specified maximum scan time. When scanning through the audio scanner (22), a processor (220) according to one embodiment may manage the analysis time in real time based on the status of the electronic device (201), as indicated by the status information of the electronic device (201), and / or the status of the audio solution (234), as indicated by the processing time of each operation of the audio solution (234), so that the scan time is not delayed beyond the specified time even if the performance of the electronic device and / or audio solution differs.
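The alternation of scanned and skipped stretches in FIG. 9 can be sketched as a list of analysis windows. The sketch assumes fixed intervals for the whole stream and a scan-then-skip order; in the disclosure the intervals may be recalculated after each segment, which is not modeled here. The function name is hypothetical.

```python
from typing import List, Tuple


def analysis_windows(duration_ms: int, scan_ms: int, skip_ms: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of the stretches that are decoded and scanned."""
    if scan_ms <= 0 or skip_ms < 0:
        raise ValueError("scan_ms must be positive and skip_ms non-negative")
    windows = []
    pos = 0
    while pos < duration_ms:
        end = min(pos + scan_ms, duration_ms)
        windows.append((pos, end))
        pos = end + skip_ms  # jump over the skipped stretch
    return windows
```

With a 10-second stream, a 2-second scan interval, and a 1-second skip interval, four windows are analyzed and 3 seconds of audio are never decoded or scanned.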

[0192] FIG. 10 is a diagram showing a specified data format for storing scan result information according to one embodiment.

[0193] Referring to FIG. 10, a processor (220) according to one embodiment may obtain scan result information through an analyze result extractor (760) and store the scan result information in a specified data format (1000). A processor (220) according to one embodiment may obtain ANALZYED_INFO, META_DATA_FORMAT_VERSION, SCAN INTERVAL, SKIP INTERVAL, SAMPLING TYPE, CLASSES, TIMELINES, START_TIME_US, END_TIME_US TIME LINE, and / or SOL_NAME as scan result information through an analyze result extractor (760) and store them in a specified data format (1000). For example, ANALZYED_INFO may represent a group of all analyzed information. META_DATA_FORMAT_VERSION may be version information for checking the Metadata Format History at the time the current Scan result was derived. SCAN INTERVAL may be scan interval information. SKIP INTERVAL may be skip interval information. SAMPLING TYPE may be information on the sampling method used (e.g., seek method or drop method). CLASSES and TIMELINES may be information representing the audio source name derived from Scan analysis and the section where the audio source is playing. START_TIME_US and END_TIME_US may be information on the current content. In addition, the data format (1000) may include other information or may not include at least some of the information described above.
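One possible hierarchical layout for the fields listed above is sketched here as a JSON-serializable dictionary. The nesting, the value types, the underscore spelling of the keys written with spaces in the text, and every value shown are assumptions for illustration; FIG. 10 itself is not reproduced in this section. The spelling `ANALZYED_INFO` is kept as it appears in the text.

```python
import json

# Hypothetical scan result for a 5m30s track scanned with scan 2 s / skip 1 s.
scan_result = {
    "ANALZYED_INFO": {
        "META_DATA_FORMAT_VERSION": 1,
        "SOL_NAME": "example_solution",
        "SCAN_INTERVAL": 2000,
        "SKIP_INTERVAL": 1000,
        "SAMPLING_TYPE": "seek",
        "START_TIME_US": 0,
        "END_TIME_US": 330_000_000,
        "CLASSES": ["voice", "instrument"],
        "TIMELINES": {
            "voice": [[0, 2_000_000], [3_000_000, 5_000_000]],
            "instrument": [[0, 2_000_000]],
        },
    }
}

# Round-trip through JSON so the stored result can be reloaded and reused.
serialized = json.dumps(scan_result, indent=2)
restored = json.loads(serialized)
```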

[0194] FIG. 11 is a diagram showing sampling processing cases during content scanning according to one embodiment.

[0195] Referring to FIG. 11, a first case according to one embodiment (<case1> Normal sampling) (1110) may be a case where content 1 (content1, duration 4m30s) (1112) currently used for scanning (e.g., audio analysis) does not require sampling. According to one embodiment, the processor (220) may decide not to perform sampling via the scan setting generator (1114) when the length of content 1 (1112) is short, as in the first case (1110), and the state information of the electronic device (201) measured by the device utilities (730) indicates sufficient resources to process content 1 (1112). Accordingly, the processor (220) may set the skip interval to 0. In the first case (1110) according to one embodiment, the processor (220) may set the scan interval to a small size (e.g., 10s) because sampling may become necessary according to the status information of the electronic device (201), which changes in real time, even if the length of content 1 (1112) is short. In the first case (1110) according to one embodiment, even if the processor (220) has set the scan period in the scan setting generator (1114) to 10s according to the 4m30s length of content 1 (1112), the scan period may be adjusted (or changed) for additional sampling according to the state of the electronic device (201) measured by the device utilities (730).

[0196] A second case according to one embodiment (<case2> Sampling on the long content) (1120) may occur when the length of content 2 (content2, duration 5m30s) (1122) is long and the total estimated time required for scanning (e.g., sound source analysis) exceeds the maximum scan time depending on the performance of the decoder (232) and the audio scanner (22) and the status information of the electronic device (201). A processor (220) according to one embodiment can determine a scan interval (e.g., scan: 2s) and a skip interval (e.g., skip: 1s) so that the scan time for content 2 (1122) does not exceed the specified maximum scan time by using an estimated decoding time and an estimated scan time obtained through a time estimator (715) (e.g., including a decode time estimator (710) and a scan time estimator (720)), a time period corresponding to content 2 (1122), state information of an electronic device (201), and a specified maximum scan time, and can perform sound source analysis by sampling at least a portion of the audio data of content 2 (1122) using the scan interval and the skip interval. The processor (220) can set the scan cycle in the scan setting generator (1124) according to the scan interval and skip interval.

[0197] A third case according to one embodiment (<case3> Sampling on very long content) (1130) may be a case where the length of content 3 (content3, duration 1h30m) (1132) is longer than the length of content 2 (1122). A processor (220) according to one embodiment may determine a skip interval greater than that of the second case (e.g., skip: 18s) through a scan setting generator (1134). A processor (220) according to one embodiment may variably determine (or set) the skip interval in the third case (1130) by comparing the skip interval setting in the second case (1120) with the skip interval setting in the third case (1130).

[0198] A fourth case according to one embodiment (<case4> Sampling on very long content in low-tier device) (1140) may be a case where content 4 (content4, duration 1h30m) (1142) has the same length as in the third case (1130) but the state information of the electronic device (201) indicates lower performance (e.g., performance of hardware elements) than in the third case (1130). In one embodiment, the processor (220) may determine the skip interval to be larger than in the third case (e.g., skip: 24s) through the scan setting generator (1144) when the state information of the electronic device (201) indicates lower performance (e.g., performance of hardware elements) than in the third case.

[0199] According to one embodiment, the processor (220) may determine (or set) the skip interval of the fourth case to be longer than the skip interval of the third case when the performance (e.g., performance of hardware elements and / or software) indicated by the state information of the electronic device (201) is lower than that of the third case (1130). For example, when the resources of the electronic device (201) are constrained, because the performance of the decoder (232) and / or audio solution (234) may be degraded by the performance (e.g., state) of the hardware elements and / or software that changes in real time (e.g., instantaneously) in the electronic device (201), the scan setting generator (740) may set a skip interval different from the previous interval to perform sound source analysis within a limited time.
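The relationship between content length, device performance, and skip interval in the four cases can be illustrated with a simplified budget calculation. This is not the equation used in the disclosure, which is not reproduced in this section. The sketch assumes that scanning costs a fixed `cost_per_audio_s` processing seconds per second of audio, and that only the fraction scan / (scan + skip) of the content is processed. All parameter names and the cost values in the usage note are hypothetical.

```python
def choose_skip_interval_s(duration_s: float, scan_s: float,
                           cost_per_audio_s: float, max_scan_time_s: float) -> float:
    """Smallest skip interval that keeps the estimated scan time within budget.

    cost_per_audio_s: processing seconds needed per second of audio
    (larger on slower or busier devices).
    """
    if scan_s <= 0 or max_scan_time_s <= 0:
        raise ValueError("scan_s and max_scan_time_s must be positive")
    full_scan_time = duration_s * cost_per_audio_s
    if full_scan_time <= max_scan_time_s:
        return 0.0  # short content on a capable device: no sampling needed
    # Require full_scan_time * scan / (scan + skip) <= max_scan_time and solve for skip.
    return scan_s * (full_scan_time / max_scan_time_s - 1.0)
```

Under these assumptions, a 1h30m track (5,400 s) with a 2 s scan interval, a cost of 0.5, and a 270 s budget yields a skip interval of 18 s, while the same track at a cost of 0.65 on a slower device yields about 24 s. The cost values were chosen here only to reproduce the example figures in the text.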

[0200] FIG. 12 is a flowchart illustrating an audio data scanning operation according to one embodiment.

[0201] Referring to FIG. 12, a processor (e.g., processor (120) of FIG. 1 or processor (220) of FIG. 2) of an electronic device according to one embodiment (e.g., electronic device (101) of FIG. 1 or electronic device (201) of FIG. 2) may perform at least one of operations 1210 to 1250.

[0202] In operation 1210, a processor (220) according to one embodiment may obtain a content duration corresponding to audio content, state information of an electronic device (201), and a specified maximum scan time based on an input for scanning audio content. For example, the state information of the electronic device (201) may include information on hardware elements (e.g., usage and / or share of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230)), power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), information on applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). For example, the specified maximum scan time may be a scan time limit predefined for the audio scanner (22) of the electronic device (201) or for an application including the audio scanner (22).

[0203] In operation 1220, a processor (220) according to one embodiment may obtain an estimated scan time for audio content using a time period corresponding to audio content, state information of an electronic device, and a specified maximum scan time, and determine (or calculate or identify) a scan interval and a skip interval so that the estimated scan time does not exceed the specified maximum scan time. A processor (220) according to one embodiment may obtain the larger value between an estimated decoding time (e.g., a first estimated decoding time) expected to be required to decode the audio content and an estimated scan time (e.g., a first estimated scan time) expected to be required to scan the audio content, where the estimated scan time is obtained by multiplying the time taken to scan one block of audio data by the number of blocks included in the time period of the audio content.
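Taking the larger of the two estimates can be sketched as follows. Treating the decoding estimate as a per-block time multiplied by the block count mirrors the scan estimate and is an assumption made here; the function and parameter names are hypothetical.

```python
def estimated_processing_time_ms(num_blocks: int, block_decode_ms: float,
                                 block_scan_ms: float) -> float:
    """Return the larger of the estimated decoding time and estimated scan time."""
    estimated_decode_ms = num_blocks * block_decode_ms
    estimated_scan_ms = num_blocks * block_scan_ms
    # Whichever stage is slower bounds the overall estimate.
    return max(estimated_decode_ms, estimated_scan_ms)
```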

[0204] In operation 1230, the processor (220) according to one embodiment can obtain audio data of a first segment included in the audio data by decoding the audio data with the decoder (232) based on the determination of the scan interval and the skip interval.

[0205] In operation 1240, a processor (220) according to one embodiment can sample at least a portion of the audio data of the first segment using the scan interval and the skip interval.

[0206] In operation 1250, a processor (220) according to one embodiment can analyze the sampled portion of the audio data of the first segment to identify the sound source category to which the audio data of the first segment belongs.

[0207] A method for scanning and separating audio data during content playback in an electronic device (101, 201) according to one embodiment of the present disclosure may include, based on an input for scanning audio content, determining a first scan interval such that the scan time for the audio content does not exceed the specified maximum scan time by using a time period corresponding to the audio content and a specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval. The method may include, based on an input for playing the audio content, identifying a first plurality of sound source audio data corresponding to a first audio data of a first time period among the audio data using the scan result while playing the audio data of the audio content, obtaining the first plurality of sound source audio data by performing separation of the first audio data of the first time period using a real time factor value, and outputting the first plurality of sound source audio data through the sound output module.

[0208] The method according to one embodiment may include an operation of identifying a first separation time required to perform the separation of the first audio data of the first time period using the real-time factor value. The method may include an operation of transmitting the first plurality of audio data to an audio renderer and outputting them through the sound output module (155, 255) if the data size of the first plurality of audio data obtained by performing the separation of the first audio data of the first time period is equal to or greater than the data size corresponding to the first audio rendering time associated with the first separation time. The method may include an operation of storing the first plurality of audio data in a first buffer of the memory (130, 230) if the data size of the first plurality of audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time. The method may include an operation of obtaining the second plurality of audio data by performing the separation of the second audio data of the second time period following the first time interval. The above method may include, when the first plurality of sound source audio data exists in the first buffer, merging the first plurality of sound source audio data and the second plurality of sound source audio data, and transmitting the merged first plurality of sound source audio data and the second plurality of sound source audio data to the audio renderer (236) of the electronic device and outputting through the audio output module (155, 255) of the electronic device.

[0209] The method according to one embodiment may include an operation of obtaining the real-time factor value using a value obtained by dividing the separation time taken to perform separation in the electronic device prior to the first audio data of the first time period by the first time period. The method may include an operation of identifying the first separation time using the real-time factor value.

[0210] The method according to one embodiment may include an operation of identifying the cumulative average value of a plurality of real-time factor values obtained when performing separation for each of the plurality of audio data prior to the first audio data of the first time period. The method may include an operation of identifying the first separation time using the cumulative average value of the plurality of real-time factor values and the state information of the electronic device.

[0211] In the method according to one embodiment, the state information of the electronic device may include the usage of the at least one processor and / or the memory, the occupancy rate of the at least one processor and / or the memory, the power consumption of the battery of the electronic device, information on applications running in the background of the electronic device, and / or network connection status information of the electronic device.

[0212] The method according to one embodiment may include an operation of storing the first audio data of the first time interval in a second buffer of the memory (130, 230) when the separation of the first audio data of the first time interval is not performed. The method may include an operation of merging the first audio data and the second multiple audio data when the first audio data exists in the second buffer when the separation of the second audio data of the second time interval is performed and the second multiple audio data is obtained, and then transmitting the merged first audio data and the second multiple audio data to the audio renderer (236) to output through the audio output module (155, 255).

[0213] The method according to one embodiment may include an operation of obtaining a time period corresponding to the audio content, first state information of the electronic device, and a designated maximum scan time based on the input for scanning the audio content. The method may include an operation of determining a first scan interval and a first skip interval using the time period corresponding to the audio content, the first state information of the electronic device, and the designated maximum scan time so that the scan time for the audio content does not exceed the designated maximum scan time. The method may include an operation of obtaining audio data of a first segment among the audio data included in the audio content by decoding the audio content through a decoder (232) of the electronic device. The method may include an operation of sampling at least a portion of the audio data of the first segment using the first scan interval and the first skip interval. The method may include an operation of identifying the sound source category to which the audio data of the first segment belongs by analyzing at least a portion of the audio data of the first segment.

[0214] The method according to one embodiment may include an operation of obtaining a time period corresponding to the audio content, first state information of the electronic device (201), and a designated maximum scan time based on the input for scanning the content. The method may include an operation of determining a first scan interval and a first skip interval using the time period corresponding to the audio content, the first state information of the electronic device, and the designated maximum scan time so that the scan time for the audio content does not exceed the designated maximum scan time. The method may include an operation of obtaining audio data of a first section among the audio data included in the audio content by decoding the audio content through a decoder (232) of the electronic device (201). The method may include an operation of sampling at least a portion of the audio data of the first section using the first scan interval and the first skip interval. The above method may include an operation of analyzing at least a portion of the audio data of the first section to identify the sound source category to which the audio data of the first section belongs.

[0215] According to one embodiment, the method may include an operation of acquiring second state information of the electronic device for scanning audio data in a second section following the audio data in the first section among the audio data. The method may include an operation of acquiring an estimated decoding time required for the audio data in the second section to be decoded. The method may include an operation of acquiring a scan time required for scanning audio data in the designated time section. The method may include an operation of identifying the longer of the estimated decoding time and the scan time as the estimated scan time for the audio data in the designated time section. The method may include an operation of determining a second scan interval and a second skip interval for the audio data in the second section based on the estimated scan time, the second state information of the electronic device, and the designated maximum scan time. The method may include an operation of sampling at least a portion of the audio data in the second section using the second scan interval and the second skip interval. The method may include an operation of analyzing the at least portion of the audio data in the second section to identify the sound source category to which the audio data in the second section belongs.

[0216] FIG. 13a is a flowchart illustrating an audio data scanning operation according to the presence or absence of a previously acquired scan interval and a skip interval according to one embodiment, FIG. 13b is a flowchart illustrating an operation continuing from FIG. 13a according to one embodiment, and FIG. 13c is a flowchart illustrating an operation continuing from FIG. 13b according to one embodiment.

[0217] Referring to FIGS. 13a through 13c, a processor (e.g., processor (120) of FIG. 1 or processor (220) of FIG. 2) of an electronic device according to one embodiment (e.g., electronic device (101) of FIG. 1 or electronic device (201) of FIG. 2) may perform at least one of operations 1312 to 1352.

[0218] In operation 1312, the processor (220) according to one embodiment may load audio content based on an input for scanning audio content. The processor (220) according to one embodiment may prepare audio to be scanned (or analyzed) based on an input for scanning audio content.

[0219] In operation 1314, a processor (220) according to one embodiment can identify whether there are a previously acquired (or calculated) scan interval and skip interval corresponding to the audio content. For example, when a scan is performed on the audio content for the first time, previously acquired scan interval and skip interval information may not exist (e.g., may not be stored). Conversely, if a scan has already been performed on audio data of at least a portion of the loaded audio content, a previously acquired scan interval and skip interval may exist (e.g., may be stored).

[0220] In operation 1316, if there are no previously acquired scan intervals and skip intervals corresponding to the audio content, the processor (220) according to one embodiment can calculate a scan interval (e.g., first scan interval) and a skip interval (e.g., first skip interval) for audio data in a first section (e.g., first scan section) among the audio data included in the audio content. In this case, the processor (220) according to one embodiment can acquire a time period (content duration) corresponding to the audio content, state information of the electronic device (e.g., first state information), and a specified maximum scan time (max scan time). For example, the first state information of the electronic device (101) may include information on hardware elements at a first time (e.g., the start time of the scan of the first section of the audio content), such as usage and / or share of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), information on applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). For example, the specified maximum scan time may be a scan time limit predefined for the audio scanner (22) of the electronic device (201) or for an application including the audio scanner (22). A processor (220) according to one embodiment can obtain a first expected scan time for the audio content using the time period corresponding to the audio content, the first state information of the electronic device, and the specified maximum scan time, and determine (or calculate or identify) a first scan interval and a first skip interval such that the first expected scan time does not exceed the specified maximum scan time. A processor (220) according to one embodiment can proceed to operation 1328 when the first scan interval and the first skip interval are determined.
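By way of non-limiting illustration, the determination of the first scan interval and the first skip interval may be sketched as follows. The sketch assumes a fixed scan interval and derives the skip interval from the ratio between the scan-time budget and the expected time to scan the whole content. The names `plan_first_intervals`, `cost_per_audio_second_s`, and `load_factor` are hypothetical, and the description above does not specify how the state information is converted into a cost.

```python
from dataclasses import dataclass


@dataclass
class SamplingPlan:
    scan_interval_s: float  # length of each stretch of audio that is analyzed
    skip_interval_s: float  # length of each stretch of audio that is passed over


def plan_first_intervals(content_duration_s: float,
                         max_scan_time_s: float,
                         cost_per_audio_second_s: float,
                         load_factor: float = 1.0,
                         scan_interval_s: float = 1.0) -> SamplingPlan:
    """Choose a skip interval so that the expected scan time fits the budget.

    Assumptions (not taken from the description): the scan cost is linear in
    the amount of audio analyzed, and the device state is summarized by a
    single multiplicative load_factor.
    """
    values = (content_duration_s, max_scan_time_s, cost_per_audio_second_s,
              load_factor, scan_interval_s)
    if any(v <= 0 for v in values):
        raise ValueError("all arguments must be positive")

    # Expected time to scan every second of the content under the current load.
    full_scan_time_s = content_duration_s * cost_per_audio_second_s * load_factor
    if full_scan_time_s <= max_scan_time_s:
        return SamplingPlan(scan_interval_s, 0.0)  # no skipping is needed

    # Fraction of the content that can be analyzed within the budget.
    duty = max_scan_time_s / full_scan_time_s
    # scan / (scan + skip) == duty  =>  skip == scan * (1 / duty - 1)
    return SamplingPlan(scan_interval_s, scan_interval_s * (1.0 / duty - 1.0))
```

For example, with 600 seconds of content, a 30 second budget, and a hypothetical cost of 0.1 seconds per second of audio, half of the content can be analyzed, so each 1 second scan interval is followed by a 1 second skip interval.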

[0221] In operation 1318, if there are a previously acquired scan interval (e.g., the first scan interval) and skip interval (e.g., the first skip interval) corresponding to the audio content, the processor (220) according to one embodiment can identify whether the length of audio data to be scanned obtained from the decoder (232) satisfies the scan interval (e.g., the first scan interval). If the length of audio data to be scanned obtained from the decoder (232) does not satisfy the scan interval (e.g., the first scan interval), the processor (220) according to one embodiment can proceed to operation 1328.

[0222] In operation 1320, if there are a previously acquired scan interval (e.g., a first scan interval) and skip interval (e.g., a first skip interval) corresponding to the audio content, and the length of audio data to be scanned acquired from the decoder (232) satisfies the scan interval (e.g., the first scan interval), a processor (220) according to one embodiment may calculate (or update or determine) a scan interval (e.g., a second scan interval) and a skip interval (e.g., a second skip interval) for audio data in a second section (e.g., a section after the first section) among the audio data included in the audio content. A processor (220) according to one embodiment may obtain a first expected decoding time, a first expected scan time, a specified maximum scan time, a time period of the audio content that has not yet been scanned, and state information of the electronic device (e.g., second state information) to determine the second scan interval and the second skip interval. For example, the second state information of the electronic device (101) may include information on hardware elements at a second time (e.g., the start time of the scan of the second section of the audio content), such as usage and / or share of at least one processor (220) (e.g., CPU, AP, and / or audio processor) and / or memory (230), power consumption of the battery (e.g., 189 in FIG. 1) of the electronic device (201), information on applications running in the background of the electronic device (201), and / or network connection status information of the electronic device (201). A processor (220) according to one embodiment can determine (or calculate or identify), by using these values, the second scan interval and the second skip interval such that the second expected scan time for audio data in the second section does not exceed the specified maximum scan time. A processor (220) according to one embodiment can identify a starting point to apply the second scan interval and the second skip interval by using the second skip interval.
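By way of non-limiting illustration, the update of the scan interval and the skip interval for a following section may be sketched as follows. The sketch takes the longer of the decoding time and the scan time of the previous section as the cost of that section, projects that cost over the audio that has not yet been scanned, and lengthens the skip interval when the projection exceeds the remaining budget. The function name, the linear cost model, and the single `load_factor` are assumptions made for illustration only.

```python
from typing import Tuple


def plan_next_intervals(decode_time_s: float,
                        scan_time_s: float,
                        scanned_audio_s: float,
                        remaining_audio_s: float,
                        remaining_budget_s: float,
                        load_factor: float = 1.0,
                        scan_interval_s: float = 1.0) -> Tuple[float, float]:
    """Return (scan_interval_s, skip_interval_s) for the next section.

    decode_time_s and scan_time_s describe the previous section, which
    covered scanned_audio_s seconds of audio (hypothetical inputs).
    """
    if scanned_audio_s <= 0 or scan_interval_s <= 0 or load_factor <= 0:
        raise ValueError("scanned_audio_s, scan_interval_s and load_factor "
                         "must be positive")
    if remaining_audio_s <= 0:
        return (scan_interval_s, 0.0)    # nothing is left to scan
    if remaining_budget_s <= 0:
        return (0.0, remaining_audio_s)  # budget is used up: skip the rest

    # The longer of decoding and scanning is taken as the section's cost.
    section_cost_s = max(decode_time_s, scan_time_s)
    cost_per_audio_second = section_cost_s / scanned_audio_s * load_factor

    projected_s = remaining_audio_s * cost_per_audio_second
    if projected_s <= remaining_budget_s:
        return (scan_interval_s, 0.0)

    duty = remaining_budget_s / projected_s  # affordable fraction of the rest
    return (scan_interval_s, scan_interval_s * (1.0 / duty - 1.0))
```

Because the plan is recomputed for each section from fresh measurements, a device that slows down partway through the scan ends up with a longer skip interval for the audio that remains.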

[0223] In operation 1322, a processor (220) according to one embodiment may identify a sampling type designated for sampling using the second scan interval and the second skip interval. For example, the designated sampling type may include a first sampling type and / or a second sampling type. For example, the first sampling type may include a seek type (or mode or operation). The second sampling type may include a drop type (or mode or operation).

[0224] In operation 1324, the processor (220) according to one embodiment can identify whether the first sampling type (e.g., seek type) is specified. If the seek type is not specified, the processor (220) according to one embodiment can proceed to operation 1328.

[0225] In operation 1326, the processor (220) according to one embodiment can identify a decoding start point by performing a seek operation based on a second scan interval when a first sampling type (e.g., seek type) is specified.

[0226] In operation 1328, a processor (220) according to one embodiment may decode a corresponding section of audio content (e.g., a first section or a second section) using a decoder (232). When operation 1328 follows operation 1316, the processor (220) according to one embodiment may decode audio data of the first section of the audio content (audio stream). When operation 1328 follows operation 1318 or operation 1326, the processor (220) according to one embodiment may decode audio data of the second section of the audio content (audio stream).

[0227] In operation 1334, a processor (220) according to one embodiment may measure (or measure and store) the decoding time (e.g., first decoding time or second decoding time) taken to decode a corresponding section of content (e.g., first section or second section).

[0228] In operation 1336, the processor (220) according to one embodiment can identify whether a second sampling type (e.g., drop type) is specified. If the drop type is not specified, the processor (220) according to one embodiment can proceed to operation 1340.

[0229] In operation 1338, if the drop type is specified, the processor (220) according to one embodiment can identify whether the decoded audio data corresponds to a section corresponding to a scan interval (a first scan interval or a second scan interval). If the decoded audio data does not correspond to a section corresponding to a scan interval (a first scan interval or a second scan interval), the processor (220) according to one embodiment can drop the audio data corresponding to the skip interval and return to operation 1328.
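By way of non-limiting illustration, the difference between the seek type and the drop type may be sketched as follows. In the sketch, the seek type moves the decoding start point past each skip interval so that only the scan intervals are decoded, whereas the drop type decodes every frame in order and discards the frames that fall in a skip interval. Times are integer milliseconds, the decoder is replaced by simple arithmetic on positions, and the function names and the `frame_ms` parameter are hypothetical.

```python
from typing import List, Tuple


def seek_sampling(duration_ms: int, scan_ms: int,
                  skip_ms: int) -> Tuple[List[Tuple[int, int]], int]:
    """Seek type: return the decoded windows and the total decoded time."""
    if scan_ms <= 0 or skip_ms < 0:
        raise ValueError("scan_ms must be positive and skip_ms non-negative")
    windows = []
    pos = 0
    while pos < duration_ms:
        end = min(pos + scan_ms, duration_ms)
        windows.append((pos, end))  # only this window is decoded and scanned
        pos = end + skip_ms         # seek past the skip interval
    decoded_ms = sum(end - start for start, end in windows)
    return windows, decoded_ms


def drop_sampling(duration_ms: int, scan_ms: int, skip_ms: int,
                  frame_ms: int) -> Tuple[List[int], int]:
    """Drop type: return start times of the kept frames and the decoded time."""
    if scan_ms <= 0 or skip_ms < 0 or frame_ms <= 0:
        raise ValueError("invalid interval or frame length")
    period_ms = scan_ms + skip_ms
    kept = []
    decoded_ms = 0
    for start in range(0, duration_ms, frame_ms):
        decoded_ms += min(frame_ms, duration_ms - start)  # every frame is decoded
        if start % period_ms < scan_ms:
            kept.append(start)      # frame lies in a scan interval: analyze it
        # otherwise the frame lies in a skip interval and is dropped
    return kept, decoded_ms
```

With a 10 second stream, a 1 second scan interval, and a 2 second skip interval, both functions select the same 4 seconds of audio for analysis, but the drop type decodes all 10 seconds while the seek type decodes only 4.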

[0230] In operation 1340, a processor (220) according to one embodiment can identify the sound source category to which the decoded audio data belongs by analyzing at least a portion of the audio data corresponding to the scan interval (first scan interval or second scan interval) among the decoded audio data.

[0231] In operation 1342, a processor (220) according to one embodiment can measure (or measure and store) the scan time taken to analyze at least a portion of the audio data corresponding to the scan interval and to identify the sound source category to which the decoded audio data belongs.

[0232] In operation 1344, the processor (220) according to one embodiment can identify whether the scan (or sound source analysis) of the audio data up to the last decoded segment of the audio content has been completed. If the scan (or sound source analysis) has not been completed, the processor (220) according to one embodiment can repeat operations 1314 through 1342, and can identify that the scan has been completed when the end of the audio content (e.g., audio stream) (e.g., end of stream (EOS)) is identified.

[0233] In operation 1346, a processor (220) according to one embodiment may generate audio scan result information and store the audio scan result information in memory (230). A processor (220) according to one embodiment may acquire scan result information through an analyze result extractor (760) and store the scan result information in a specified data format (e.g., 1000).

[0234] FIG. 14 is a flowchart illustrating a scanning operation for audio content containing scan result information according to one embodiment.

[0235] Referring to FIG. 14, a processor (e.g., processor (120) of FIG. 1 or processor (220) of FIG. 2) of an electronic device according to one embodiment (e.g., electronic device (101) of FIG. 1 or electronic device (201) of FIG. 2) can perform at least one of operations 1412 to 1426.

[0236] In operation 1412, a processor (220) according to one embodiment can receive an input for a scan request of audio content.

[0237] In operation 1414, the processor (220) according to one embodiment can identify whether scan result information of audio content exists in memory (230) based on an input for a scan request of audio content. If scan result information of audio content does not exist, the processor (220) according to one embodiment can proceed to operation 1424.

[0238] In operation 1416, the processor (220) according to one embodiment can identify whether the version of the scan result information of the audio content is compatible with the electronic device (201) (e.g., whether it is a version available to the electronic device (201)) if scan result information of the audio content exists. If the version of the scan result information of the audio content according to one embodiment is not compatible with the electronic device (201), the processor (220) can proceed to operation 1424.

[0239] In operation 1418, the processor (220) according to one embodiment can identify whether the scan result information of the audio content includes the section requested by the user for scanning, if the scan result information of the audio content exists and the version of the scan result information of the audio content is a version compatible with the electronic device (201) (e.g., a version available in the electronic device (201)).

[0240] In operation 1420, if scan result information of the audio content exists, the version of the scan result information is compatible with the electronic device (201) (e.g., a version available in the electronic device (201)), and the section requested by the user is not included in the scan result information, the processor (220) according to one embodiment may request a scan of the section that is not included and proceed to operation 1424.

[0241] In operation 1422, if scan result information of the audio content exists, the version of the scan result information is compatible with the electronic device (201) (e.g., a version available in the electronic device (201)), and the scan result information includes the segment requested by the user, the processor (220) according to one embodiment can identify whether the segment requested by the user is included in the skip interval. If the segment requested by the user is included in the skip interval, the processor (220) according to one embodiment can proceed to operation 1424.

[0242] In operation 1424, the processor (220) according to one embodiment can perform a scan operation for audio content.

[0243] In operation 1426, the processor (220) according to one embodiment can extract sound source information of the section requested by the user. The processor (220) according to one embodiment can extract sound source information of the section requested by the user as a result of the scan operation of operation 1424, or extract sound source information of the section requested by the user from the scan result information of the audio content and display it on the display (260).
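By way of non-limiting illustration, the checks of operations 1414 through 1422 may be sketched as a single decision on whether the scan of operation 1424 is required before the extraction of operation 1426. The `ScanResult` structure, the integer version field, and the interpretation of "included in the skip interval" as any overlap with a stored skip interval are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Section = Tuple[float, float]  # (start_s, end_s)


@dataclass
class ScanResult:
    version: int
    covered: List[Section]  # sections for which scan results are stored
    skipped: List[Section]  # skip intervals that were not analyzed


def _contains(ranges: List[Section], section: Section) -> bool:
    return any(s <= section[0] and section[1] <= e for s, e in ranges)


def _overlaps(ranges: List[Section], section: Section) -> bool:
    return any(s < section[1] and section[0] < e for s, e in ranges)


def needs_scan(result: Optional[ScanResult],
               supported_versions: Set[int],
               requested: Section) -> bool:
    """Return True if a scan (operation 1424) is required."""
    if result is None:                            # operation 1414
        return True
    if result.version not in supported_versions:  # operation 1416
        return True
    if not _contains(result.covered, requested):  # operations 1418 and 1420
        return True
    if _overlaps(result.skipped, requested):      # operation 1422
        return True
    return False  # stored result can be used directly in operation 1426
```

When `needs_scan` returns False, the sound source information for the requested section can be read from the stored scan result information without decoding or analyzing the audio again.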

[0244] FIG. 15 is a drawing showing an example of a screen for audio scanning according to one embodiment.

[0245] Referring to FIG. 15, a processor (220) (e.g., the processor of FIG. 1) of an electronic device (201) (e.g., the electronic device (101) of FIG. 1) according to one embodiment may display a screen (1501) for audio scanning of content through a display (260). In the case where the content includes video content and audio content, the processor (220) according to one embodiment may display an image (1510) corresponding to the video data included in the content on the screen (1501) for scanning the content, display a bar (1520) indicating a section of the audio data, and display a menu (1530) associated with audio scanning. When the menu (1530) associated with audio scanning is selected, a processor (220) according to one embodiment may display a scan request icon (1540) for obtaining information on a segment containing a sound source of a specific category (e.g., vocals, instruments, background sound, noise, and / or other sound sources) among segments of the audio content included in the content (e.g., video or audio file). A processor (220) according to one embodiment may display a scan request icon (1540) for obtaining information on a segment containing specific sound source data (e.g., noise sound source data, auto eraser, or noise deduction). Based on user input on the scan request icon (1540), a processor (220) according to one embodiment can display, through the display (260), a screen (1502) indicating that audio scanning is in progress, display information (1525) indicating that audio analysis is in progress on the screen (1502), and perform the scan within the limited maximum scan time through the scan operation of the present disclosure. A processor (220) according to one embodiment can store scan result information in a memory (230).

[0246] FIG. 16 is a drawing showing an example of a screen displaying audio scan results according to one embodiment.

[0247] Referring to FIG. 16, the processor (220) (e.g., the processor of FIG. 1) of the electronic device (201) (e.g., the electronic device (101) of FIG. 1) according to one embodiment may display screens (e.g., 1601, 1602) that display audio scan results of the content over time through the display (260) while performing a scan of the content. A processor (220) according to one embodiment may display an image (1610) corresponding to video data included in the content on a screen (1601) that displays the audio scan result of the content of the first section (1615) of the audio data, may display a bar (1620) indicating the first section (1615) of the audio data, and may display at least one sound source information icon (1630) (e.g., an icon (1632) indicating vocal sound source audio data and / or an icon (1634) indicating noise sound source audio data) corresponding to a first point (1640) of the first section of the audio content using the audio scan result information. A processor (220) according to one embodiment may display an image (1612) corresponding to video data included in the content on a screen (1602) that displays the audio scan results of the content of the second section (1625) of the audio data, may display a bar (1626) indicating the second section (1625) of the audio data, and may display at least one sound source information icon (1650) (e.g., an icon (1651) indicating vocal sound source audio data and / or an icon (1652) indicating noise sound source audio data) corresponding to a first point (1660) of the second section (1625) of the audio content using the audio scan result information. On the screen (1602) that displays the audio scan results of the content of the second section (1625) of the audio data, input for volume adjustment using each icon (1652 or 1654) can be received from the user, and a volume adjustment value for each sound source audio data can be set (or saved) according to the user input. For example, the processor (220) can set the volume adjustment value for the vocal audio data included in the audio content to 56% when it receives an input from a user to adjust the volume of the vocal audio data to 56% using an icon (1652) representing the vocal audio data. In one embodiment, the processor (220) can apply the stored or set volume adjustment value to the separated audio data when playing the audio content.
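By way of non-limiting illustration, applying stored volume adjustment values to separated sound source audio data may be sketched as follows. The sketch assumes that each separated source is a list of floating-point PCM samples in the range -1.0 to 1.0, that the adjusted sources are summed back into a single output, and that a source without a stored value plays at 100%. The function name and the clipping step are hypothetical.

```python
from typing import Dict, List


def mix_sources(sources: Dict[str, List[float]],
                volume_percent: Dict[str, float]) -> List[float]:
    """Scale each separated source by its stored volume and sum the results."""
    lengths = {len(samples) for samples in sources.values()}
    if len(lengths) > 1:
        raise ValueError("all sources must have the same number of samples")
    n = lengths.pop() if lengths else 0

    mixed = [0.0] * n
    for name, samples in sources.items():
        gain = volume_percent.get(name, 100.0) / 100.0  # default: unchanged
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample

    # Keep the mix inside the valid PCM range.
    return [max(-1.0, min(1.0, value)) for value in mixed]
```

For example, `mix_sources({"vocal": v, "noise": n}, {"vocal": 56.0, "noise": 0.0})` plays the vocal source at 56% of its level and mutes the noise source.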

[0248] FIG. 17 is a drawing showing an example of a screen for content editing according to one embodiment.

[0249] Referring to FIG. 17, a processor (220) (e.g., the processor of FIG. 1) of an electronic device (201) (e.g., the electronic device (101) of FIG. 1) according to one embodiment may display a screen (1701) for editing (and / or playing) content through a display (260). Content according to one embodiment may include audio content and video content. Audio content according to one embodiment may include a first audio content and a second audio content. The first audio content and the second audio content according to one embodiment may be consecutive and different audio content. A processor (220) according to one embodiment may display a first screen (1701) for editing (and / or playing) content on the display (260) based on the execution of a content playback application (or content editing application) (or program). A processor (220) according to one embodiment may display a first image (1711) of video content included in the content on the first screen (1701) for editing (and / or playing) the content, and may display an object (1720) for starting and stopping playback. A processor (220) according to one embodiment may display images (1730) of video content played along a timeline and audio content (1740) played along a timeline on the first screen (1701) for editing (and / or playing) the content. A processor (220) according to one embodiment may play video content and audio content along a timeline, and, while playing the audio content, may perform separation for each section of the audio content where separation of audio data is required and / or adjust the volume of the separated audio data. A processor (220) according to one embodiment may perform scheduling using a real-time factor value when an input for starting playback is received through the object (1720) for starting and stopping playback on the first screen (1701) for editing (and / or playing) content, and may update and display (e.g., 1702, 1703) the screen (1701) for editing (and / or playing) content as audio content is played along with video content. Referring to the first screen (1701) for editing (and / or playing) content according to one embodiment, when multiple sound source audio data (PCM output) are first acquired by separating the audio data of the first section of the audio content, the processor (220) may store the first PCM output in a first buffer and delay audio rendering, because no data has yet been prepared for audio rendering. From the audio data separation operation of the next section following the first section, audio rendering can be performed without delay for the next PCM outputs because there may be sufficient data prepared for audio rendering. Referring to the second screen (1702) for editing (and / or playing) content according to one embodiment, when the audio content includes the first audio content (1740-1) and the second audio content (1740-2), the processor (220) can display the first audio content (1740-1) and the second audio content (1740-2) on a timeline, and can display the current playback position (1750).
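By way of non-limiting illustration, a real-time factor is commonly defined as the processing time divided by the duration of the audio processed, so that a value below 1 means processing is faster than playback. The sketch below uses that common definition to check whether the separation of the next section is expected to finish before the audio already prepared for rendering runs out. The function names and the deadline rule are assumptions and are not taken from the scheduling described in this disclosure.

```python
def real_time_factor(processing_time_s: float, audio_duration_s: float) -> float:
    """Processing time per second of audio (below 1.0 is faster than playback)."""
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return processing_time_s / audio_duration_s


def separation_finishes_in_time(rtf: float, section_s: float,
                                prepared_audio_s: float) -> bool:
    """Check whether separating the next section beats the playback deadline.

    The expected separation time is rtf * section_s, and playback can continue
    for prepared_audio_s seconds from data that is already prepared.
    """
    return rtf * section_s <= prepared_audio_s
```

For example, a separation that took 0.5 seconds for a 2 second section has a real-time factor of 0.25. Separating the next 2 second section is then expected to take 0.5 seconds, which fits within 1 second of prepared audio.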

[0250] When the audio content includes a first audio content (1740-1) and a second audio content (1740-2), and the first audio content (1740-1) and the second audio content (1740-2) are different audio content, a processor (220) according to one embodiment may perform separation of the last audio data of the first audio content (1740-1) while displaying a second image (1712) of the video content, as in the second screen (1702) for editing (and / or playing) the content. After separating the last audio data of the first audio content (1740-1), a processor (220) according to one embodiment may delay audio rendering by storing the resulting first plurality of audio data in a first buffer if the size of the first plurality of audio data is smaller than the size of the data prepared for audio rendering.

[0251] While displaying a third image (1713) of the video content, as in the third screen (1703) for editing (and / or playing) content, a processor (220) according to one embodiment can acquire the second multiple audio source data by performing separation of the first audio data of the second audio content (1740-2) following the last audio data of the first audio content (1740-1). When the second multiple audio source data is acquired, the processor (220) can prevent audio interruption by merging and processing the first multiple audio source data and the second multiple audio source data.
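By way of non-limiting illustration, holding back an undersized separation output and merging it with the next output may be sketched as follows. The sketch treats the separated output as a single list of PCM samples and releases data to the renderer only once a minimum number of samples is available. The `RenderQueue` class and the `min_render_samples` threshold are hypothetical.

```python
from typing import List


class RenderQueue:
    """Holds separated PCM until enough is available for audio rendering."""

    def __init__(self, min_render_samples: int) -> None:
        if min_render_samples <= 0:
            raise ValueError("min_render_samples must be positive")
        self.min_render_samples = min_render_samples
        self._pending: List[float] = []  # plays the role of the buffer

    def push(self, pcm: List[float]) -> List[float]:
        """Add separated PCM; return a block to render, or [] to keep waiting."""
        self._pending.extend(pcm)  # merge with any output that was held back
        if len(self._pending) < self.min_render_samples:
            return []              # too little data: delay rendering
        block, self._pending = self._pending, []
        return block
```

In this sketch, the short output from the tail of the first audio content is held in the buffer, and the first output of the second audio content is appended to it, so the renderer receives one continuous block instead of an undersized block followed by a gap.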

[0252] The electronic device according to the various embodiments disclosed in this document may be a device of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this disclosure is not limited to the devices described above.

[0253] Various embodiments of the present disclosure and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of said items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" each may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "1st" and "2nd," or "first" and "second," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where any (e.g., 1st) component is referred to as “coupled” or “connected” to another (e.g., 2nd) component, with or without the terms “functionally” or “communicatively,” it means that said any component may be connected to said other component directly (e.g., via a wire), wirelessly, or through a third component.

[0254] As used in this document, the term "module" may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0255] Various embodiments of the present disclosure may be implemented as software (e.g., program (140)) comprising one or more instructions stored in a storage medium (e.g., internal memory (136) or external memory (138)) readable by a machine (e.g., electronic device (101)). For example, a processor (e.g., processor (120)) of the machine (e.g., electronic device (101)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' merely means that the storage medium is a tangible device and does not contain a signal (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.

[0256] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0257] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities. According to various embodiments, one or more of the components or operations among the aforementioned components may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., module or program) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as those performed by the corresponding component among the multiple components prior to the integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

[0258] According to one embodiment, in a non-transitory storage medium storing instructions, the instructions are configured to cause the electronic device to perform at least one operation when executed by the electronic device, wherein the at least one operation may include, based on an input for scanning audio content, determining a first scan interval such that the scan time for the audio content does not exceed a specified maximum scan time by using a time period corresponding to the audio content and the specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval. The at least one operation may include, based on an input for playing the audio content, identifying a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data using the scan result while playing the audio data of the audio content, obtaining the first plurality of sound source audio data by performing separation of the first audio data of the first time period using a real-time factor value, and outputting the first plurality of sound source audio data through the audio output module.

[0259] Furthermore, the embodiments of the present invention described in this specification and drawings are merely specific examples provided to facilitate the explanation of the technical content according to the embodiments of the present invention and to aid in understanding the embodiments of the present invention, and are not intended to limit the scope of the embodiments of the present invention. Accordingly, the scope of the various embodiments of the present invention should be interpreted to include all modifications or variations derived based on the technical concept of the various embodiments of the present invention, in addition to the embodiments described herein.

Claims

1. An electronic device comprising: a display; a sound output module including a speaker; memory storing instructions; and at least one processor, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on an input for scanning audio content, determine a first scan interval, by using a time period corresponding to the audio content and a specified maximum scan time, such that a scan time for the audio content does not exceed the specified maximum scan time, and scan the audio content by sampling audio data corresponding to the audio content using the first scan interval; and based on an input for playing the audio content: identify a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data corresponding to the audio content, wherein the first plurality of sound source audio data are identified using a scan result of the audio content while the audio data corresponding to the audio content is played, perform separation of the first audio data of the first time period using a real-time factor value to obtain the first plurality of sound source audio data, and output the first plurality of sound source audio data through the sound output module.

2. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify, using the real-time factor value, a first separation time for performing the separation of the first audio data of the first time period; if a data size of the first plurality of sound source audio data obtained by performing the separation on the first audio data of the first time period is equal to or greater than a data size corresponding to a first audio rendering time associated with the first separation time, transmit the first plurality of sound source audio data to an audio renderer and output the first plurality of sound source audio data through the sound output module; or, if the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time, store the first plurality of sound source audio data in a first buffer of the memory; perform the separation on second audio data of a second time period following the first time period to obtain a second plurality of sound source audio data; and, when the first plurality of sound source audio data is stored in the first buffer, merge the first plurality of sound source audio data and the second plurality of sound source audio data, and transmit the merged audio data to the audio renderer to output it through the sound output module.
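
By way of non-limiting illustration, the buffering rule of claim 2 can be sketched in Python as follows. The class name `StemRouter`, the dictionary layout of the sound source audio data, and the use of a frame count as the "data size" are assumptions made for this example only and do not appear in the specification.

```python
from typing import Dict, List, Optional

Stems = Dict[str, List[float]]  # hypothetical layout: source name -> samples


class StemRouter:
    """Holds back separated output that is too short to render safely."""

    def __init__(self) -> None:
        # Plays the role of the "first buffer" in this sketch.
        self.pending: Optional[Stems] = None

    def push(self, stems: Stems, min_frames: int) -> Optional[Stems]:
        """Return data to hand to the renderer, or None if it was buffered.

        min_frames stands for the data size corresponding to the audio
        rendering time associated with the expected separation time; it is
        assumed to be supplied by the caller.
        """
        if self.pending is not None:
            # Earlier output is waiting: merge it with the new output and release.
            merged = {name: self.pending[name] + stems[name] for name in stems}
            self.pending = None
            return merged
        frames = min(len(samples) for samples in stems.values())
        if frames >= min_frames:
            return stems  # long enough: send straight to the renderer
        self.pending = stems  # too short: keep it for the next round
        return None
```

In this sketch, a return value of `None` means nothing is sent to the renderer in the current round, and the held data is released together with the next separation result.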

3. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain the real-time factor value using a value obtained by dividing a separation time by the first time period, wherein the separation time is a time taken when separation was performed in the electronic device prior to the first audio data of the first time period.
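
As a minimal, non-limiting sketch of the ratio recited in claim 3, the real-time factor can be computed by timing a separation call and dividing the elapsed time by the duration of the audio that was separated. The function names and the `separate` callable are hypothetical placeholders, not part of the specification.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(separation_s: float, period_s: float) -> float:
    """RTF = time spent separating / duration of the audio that was separated."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return separation_s / period_s


def timed_separation(separate: Callable[[Sequence[float]], object],
                     samples: Sequence[float],
                     sample_rate: int) -> Tuple[object, float]:
    """Run a (hypothetical) separation callable and return (result, RTF)."""
    start = time.perf_counter()
    result = separate(samples)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, if separating 2.0 seconds of audio took 0.5 seconds, `real_time_factor(0.5, 2.0)` gives 0.25. A value below 1 indicates that separation completes faster than the audio plays.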

4. The electronic device of claim 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify a cumulative average value of a plurality of real-time factor values obtained when performing separation for each of a plurality of audio data prior to the first audio data of the first time period; and identify the first separation time using the cumulative average value of the plurality of real-time factor values and state information of the electronic device.
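
The estimate described in claim 4 might be sketched as below. The claim does not state how the state information is combined with the cumulative average, so the multiplicative `load_factor` is purely an assumption for illustration, as is the class name `RtfTracker`.

```python
class RtfTracker:
    """Running (cumulative) mean of observed real-time factor values."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def add(self, rtf: float) -> None:
        """Fold one more observed real-time factor value into the mean."""
        self.count += 1
        self.mean += (rtf - self.mean) / self.count  # incremental mean update

    def expected_separation_s(self, period_s: float,
                              load_factor: float = 1.0) -> float:
        """Estimate the separation time for an upcoming chunk of period_s seconds.

        load_factor is a hypothetical multiplier derived from device state
        (for example processor occupancy); values above 1.0 mean that slower
        separation is expected.
        """
        if self.count == 0:
            raise ValueError("no real-time factor values recorded yet")
        return self.mean * period_s * load_factor
```

With observed values of 0.25 and 0.75, the cumulative average is 0.5, so a 2.0-second chunk under a load factor of 1.5 would be expected to take 1.5 seconds to separate.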

5. The electronic device of claim 4, wherein the state information of the electronic device comprises at least one of: usage of the at least one processor and/or the memory, an occupancy rate of the at least one processor and/or the memory, power consumption of a battery of the electronic device, information on applications running in the background of the electronic device, or network connection status information of the electronic device.

6. The electronic device of claim 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: if the separation of the first audio data of the first time period is not performed, store the first audio data of the first time period in a second buffer of the memory; and, when the second plurality of sound source audio data is obtained by performing the separation on the second audio data of the second time period while the first audio data is stored in the second buffer, merge the first audio data and the second plurality of sound source audio data, and transmit the merged first audio data and second plurality of sound source audio data to the audio renderer to output them through the sound output module.

7. The electronic device of claim 2, wherein the audio content includes first audio content including the first audio data of the first time period and second audio content including the second audio data of the second time period, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify whether the first audio data of the first time period and the second audio data of the second time period are continuous; if the first audio data of the first time period and the second audio data of the second time period are continuous, refrain from initializing first inference data in which separation result information for the first audio content is accumulated, and update the first inference data by accumulating separation result information for the second audio content following the separation result information for the first audio content; or, if the first audio data of the first time period and the second audio data of the second time period are not continuous, initialize the first inference data and acquire second inference data in which separation result information for the second audio content is accumulated.
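
A non-limiting sketch of the carry-over or reset decision in claim 7 follows. The claim does not define how continuity is detected. The timestamp comparison, the `Segment` and `InferenceState` types, and the string form of the separation result information are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    content_id: str
    start_s: float  # position on a shared playback timeline (assumed)
    end_s: float


@dataclass
class InferenceState:
    # Accumulated separation result information (placeholder representation).
    history: List[str] = field(default_factory=list)


def is_continuous(prev: Segment, nxt: Segment, tolerance_s: float = 0.05) -> bool:
    """Assumed test: the next segment starts where the previous one ended."""
    return abs(nxt.start_s - prev.end_s) <= tolerance_s


def carry_or_reset(state: InferenceState, prev: Segment, nxt: Segment,
                   result_info: str) -> InferenceState:
    """Keep accumulating across continuous segments, otherwise start fresh."""
    if not is_continuous(prev, nxt):
        state = InferenceState()  # initialize: discard the old accumulation
    state.history.append(result_info)
    return state
```

Under these assumptions, a second content item that begins exactly where the first ended extends the existing inference data, while a jump on the timeline starts a new accumulation.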

8. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on the input for scanning the audio content, obtain the time period corresponding to the audio content, first state information of the electronic device, and the specified maximum scan time; determine the first scan interval and a first skip interval, by using the time period corresponding to the audio content, the first state information of the electronic device, and the specified maximum scan time, such that the scan time for the audio content does not exceed the specified maximum scan time; decode the audio content through a decoder to obtain audio data of a first section among the audio data corresponding to the audio content; sample at least a portion of the audio data of the first section using the first scan interval and the first skip interval; and analyze the at least a portion of the audio data of the first section to identify a sound source category to which the audio data of the first section belongs.
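
One possible way to choose a scan interval and a skip interval that respect the maximum scan time is sketched below. It assumes that the scan interval is the length of each sampled window, that the skip interval is the gap between windows, and that the device state has already been reduced to `cost_per_audio_s`, the processing seconds needed per second of sampled audio. None of these readings, nor the function name `plan_scan`, is confirmed by the claim.

```python
import math
from typing import List, Tuple


def plan_scan(duration_s: float, max_scan_s: float, cost_per_audio_s: float,
              window_s: float = 1.0) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Return (scan_interval_s, skip_interval_s, windows) within the time budget.

    windows is a list of (start_s, end_s) ranges of audio to be analyzed.
    """
    # Seconds of audio that can be analyzed without exceeding max_scan_s.
    budget_audio_s = max_scan_s / cost_per_audio_s
    n = int(budget_audio_s // window_s)  # whole windows that fit the budget
    if n < 1:
        raise ValueError("budget too small for a single window")
    if n * window_s >= duration_s:
        # The whole content fits: scan back to back with no skipping.
        stride = window_s
        n = math.ceil(duration_s / window_s)
    else:
        # Spread the n affordable windows evenly over the content.
        stride = duration_s / n
    windows = [(k * stride, min(k * stride + window_s, duration_s))
               for k in range(n)]
    return window_s, stride - window_s, windows
```

For a 180-second item with a 5-second maximum scan time and a cost of 0.25 processing seconds per audio second, this sketch samples twenty 1-second windows separated by 8-second skips, for an estimated scan time of 5 seconds.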

9. The electronic device of claim 8, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: acquire second state information of the electronic device for scanning audio data of a second section following the audio data of the first section among the audio data corresponding to the audio content; obtain an expected decoding time required to decode the audio data of the second section; acquire a scan time required to scan audio data of a specified time interval; identify the longer of the expected decoding time and the scan time as an expected scan time for the audio data of the specified time interval; determine a second scan interval and a second skip interval for the audio data of the second section based on the expected scan time, the second state information of the electronic device, and the specified maximum scan time; sample at least a portion of the audio data of the second section using the second scan interval and the second skip interval; and analyze the at least a portion of the audio data of the second section to identify a sound source category to which the audio data of the second section belongs.

10. The electronic device of claim 9, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when a first sampling type is specified for sampling using the second scan interval and the second skip interval, calculate a starting point of the second section of the audio content based on the second scan interval; perform decoding, using the decoder, from the starting point of the second section of the audio content to obtain the audio data of the second section among the audio data corresponding to the audio content; and sample at least a portion of the audio data of the second section using the second scan interval and the second skip interval.

11. The electronic device of claim 9, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when a second sampling type is specified for sampling using the second scan interval and the second skip interval, obtain, using the decoder, the audio data of the second section among the audio data corresponding to the audio content; and sample, from the audio data of the second section, at least a portion of the audio data corresponding to the second scan interval.

12. A method for scanning and separating audio data in an electronic device, the method comprising: based on an input for scanning audio content, determining a first scan interval, by using a time period corresponding to the audio content and a specified maximum scan time, such that a scan time for the audio content does not exceed the specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval; and based on an input for playing the audio content: identifying a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data corresponding to the audio content, wherein the first plurality of sound source audio data are identified using a scan result of the audio content while the audio data corresponding to the audio content is played, performing separation of the first audio data of the first time period using a real-time factor value to obtain the first plurality of sound source audio data, and outputting the first plurality of sound source audio data through a sound output module including a speaker.

13. The method of claim 12, further comprising: identifying, using the real-time factor value, a first separation time for performing the separation of the first audio data of the first time period; if a data size of the first plurality of sound source audio data obtained by performing the separation on the first audio data of the first time period is equal to or greater than a data size corresponding to a first audio rendering time associated with the first separation time, transmitting the first plurality of sound source audio data to an audio renderer and outputting the first plurality of sound source audio data through the sound output module; or, if the data size of the first plurality of sound source audio data is smaller than the data size corresponding to the first audio rendering time associated with the first separation time, storing the first plurality of sound source audio data in a first buffer of a memory of the electronic device; obtaining a second plurality of sound source audio data by performing the separation on second audio data of a second time period following the first time period; and, if the first plurality of sound source audio data exists in the first buffer, merging the first plurality of sound source audio data and the second plurality of sound source audio data, and transmitting the merged audio data to the audio renderer and outputting it through the sound output module of the electronic device.

14. The method of claim 12, further comprising obtaining the real-time factor value using a value obtained by dividing, by the first time period, a separation time taken when separation was performed in the electronic device prior to the first audio data of the first time period.

15. A non-transitory storage medium storing instructions, wherein the instructions are configured to cause an electronic device to perform at least one operation when executed by the electronic device, the at least one operation comprising: based on an input for scanning audio content, determining a first scan interval, by using a time period corresponding to the audio content and a specified maximum scan time, such that a scan time for the audio content does not exceed the specified maximum scan time, and scanning the audio content by sampling audio data corresponding to the audio content using the first scan interval; and based on an input for playing the audio content: identifying a first plurality of sound source audio data corresponding to first audio data of a first time period among the audio data corresponding to the audio content, wherein the first plurality of sound source audio data are identified using a scan result of the audio content while the audio data corresponding to the audio content is played, performing separation of the first audio data of the first time period using a real-time factor value to obtain the first plurality of sound source audio data, and outputting the first plurality of sound source audio data through a sound output module including a speaker.