Information processing device, information processing method, and information processing program

The information processing apparatus simplifies music composition by analyzing reference sound sources and providing AI-assisted learning style information, addressing the challenge of manual analysis and retrieval for beginners.

WO2026141037A1 · PCT designated stage · Publication Date: 2026-07-02 · SONY GROUP CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SONY GROUP CORP
Filing Date
2025-12-16
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Music composition using AI assistance is challenging for beginners due to the need for proficiency in music theory to analyze learning style information and search for similar styles, which is difficult to achieve without proper training.

Method used

An information processing apparatus and method that analyzes a reference sound source, extracts similar style information using a pre-trained composition model, and displays it to users, simplifying the music composition process by providing AI-assisted learning style information directly.

Benefits of technology

Facilitates easier music production by automating the analysis and retrieval of suitable learning style information, enabling users to create music more intuitively and effectively using AI assistance.

✦ Generated by Eureka AI based on patent content.


Abstract

An information processing device according to the present disclosure comprises: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts, on the basis of a result of the analysis by the analysis unit, style information similar to the reference sound source from style information created in advance and used as training data when executing machine learning of a composition model; and a display unit that displays the style information extracted by the extraction unit.

Description

Information Processing Apparatus, Information Processing Method, and Information Processing Program

[0001] The present disclosure relates to an information processing apparatus, an information processing method, and an information processing program.

[0002] Techniques for assisting in music production such as composition using AI (Artificial Intelligence) are known. For example, Patent Document 1 discloses a technique for generating a learning model for music generation by performing machine learning using learning style information including element information of a music piece as learning data, and causing a computer to compose a new music piece.

[0003] Patent Document 1: International Publication No. 2022/044646

[0004] In the conventional technique, when composing music using the above-described AI assistance, the user needs to select the learning style information corresponding to the learning model for music generation. Also, when a user wishes to create a music piece inspired by a certain music piece, it is common to use the music piece that provided the inspiration as a reference music piece. In this case, the user has conventionally needed to search for learning style information similar to the reference music piece.

[0005] When searching for learning style information similar to a certain music piece, the music piece must be analyzed in terms that correspond to the learning style information. However, music analysis requires proficiency in music theory and has been difficult, especially for beginners. Furthermore, even when the reference music piece is analyzed successfully, it has been difficult to use the analysis result to search for similar learning style information.

[0006] Therefore, an object of the present disclosure is to provide an information processing apparatus, an information processing method, and an information processing program that enable easier execution of music production using AI assistance.

[0007] The information processing device according to this disclosure includes: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts style information similar to the reference sound source from style information used as training data when performing machine learning of a composition model, which has been prepared in advance, based on the analysis results by the analysis unit; and a display unit that displays the style information extracted by the extraction unit.

[0008] This is a schematic diagram illustrating a production system applicable to this disclosure.
This is a block diagram showing the hardware configuration of an example of a server device applicable to the embodiments of this disclosure.
This is a block diagram showing the hardware configuration of an example of a client device applicable to the embodiments of this disclosure.
This is a block diagram illustrating an example of the configuration of a production system according to an embodiment.
This is a functional block diagram illustrating an example of the functions of a server device according to an embodiment.
This is a functional block diagram illustrating an example of the functions of a music composition support plugin according to an embodiment.
This is a functional block diagram illustrating an example of the functions of a user host application according to an embodiment.
This is a schematic diagram showing an example of information stored in the DB of a server device according to an embodiment.
This is a schematic diagram showing an example of information stored in the DB of a music composition support plugin according to an embodiment.
This is a schematic diagram showing an example of information stored in the DB of a user host application according to an embodiment.
This is a flowchart illustrating an example of processing according to an embodiment.
This is a schematic diagram showing an example of the initial screen in a music composition support plugin according to an embodiment.
This is a schematic diagram showing a screen during the analysis of a reference sound source by the sound source analysis function according to an embodiment.
This is a schematic diagram showing an example of the analysis result display screen according to an embodiment.
This is a schematic diagram showing how to specify the range for a similarity search according to an embodiment.
This is a schematic diagram showing an example of a list display screen for learning style information according to an embodiment.
This is a schematic diagram showing an example of a list display screen for learning style information produced by suggestion-type processing of learning style information according to an embodiment.
This is a schematic diagram showing an example of a list display screen for learning style information produced by filtering-search-type processing of learning style information according to an embodiment.
This is a schematic diagram showing an example of a composition editing screen for composing and playing music using a composition support plugin according to an embodiment.
This is a schematic diagram showing how to edit composed music information according to an embodiment.
This is a schematic diagram for explaining the operation to execute the export process for composed music information according to an embodiment.
This is a schematic diagram showing an example of an editing screen by a user host application according to an embodiment.
This is a flowchart showing an example of a similarity search process according to an embodiment.

[0009] The embodiments of this disclosure will be described in detail below with reference to the drawings. In the following embodiments, the same parts will be denoted by the same reference numerals, and redundant descriptions will be omitted.

[0010] The embodiments of this disclosure will be described below in the following order:
1. Overview of the Disclosure
1-1. Overview of the System Configuration Related to the Disclosure
1-2. Hardware Configuration Applicable to the Embodiments
2. Embodiments of the Disclosure
2-1. Configuration According to the Embodiment
2-2. Processing According to the Embodiment
2-2-1. Similarity Search According to the Embodiment
2-3. Modifications of the Embodiment
3. Summary

[0011] (1. Outline of the Disclosure) First, we will provide an outline of the technology related to this disclosure. This disclosure relates to a technology that uses AI (Artificial Intelligence) to support music production, such as composition. More specifically, based on a reference sound source that the user refers to during composition, this disclosure may present to the user learning style information, which is the training data used in the machine learning of a composition model that composes music using AI. The composition model may, for example, correspond to a Markov model.
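As a rough illustration of what "a composition model corresponding to a Markov model" could look like, the following is a minimal sketch of a first-order Markov chain over MIDI note numbers. It is not the model of this disclosure: the function names `train_markov` and `generate`, the training phrases, and the choice of a first-order chain over pitches only are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def train_markov(melodies):
    """Collect first-order pitch transitions observed in the training melodies."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            transitions[current].append(following)
    return dict(transitions)


def generate(transitions, start, length, seed=0):
    """Walk the chain from `start`, sampling each next pitch from the observed followers."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        followers = transitions.get(melody[-1])
        if not followers:  # no observed continuation for this pitch
            break
        melody.append(rng.choice(followers))
    return melody


# Hypothetical training data: two short phrases written as MIDI note numbers.
style_melodies = [[60, 62, 64, 62, 60], [60, 64, 67, 64, 60]]
model = train_markov(style_melodies)
print(generate(model, start=60, length=8))
```

Under this sketch, a different set of training phrases yields a different transition table, which is one way to picture why the choice of learning style information shapes what the model composes.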

[0012] This disclosure presents learning style information to the user based on reference audio sources, making it easier for users to create music using AI assistance.

[0013] Furthermore, the information processing device that implements the functions related to this disclosure may be provided as a plug-in application for use by being embedded in a host application such as music editing software (DAW (Digital Audio Workstation)) for producing and editing music. By providing the information processing device as a plug-in application, users can easily edit the music data generated by the information processing device on a host application with more functions.

[0014] (1-1. Outline System Configuration Related to This Disclosure) Figure 1 is a schematic diagram illustrating the production system applicable to this disclosure.

[0015] In Figure 1, the production system 1 applicable to this disclosure includes a server device 10 and client devices 20a, 20b, ... used by users. The server device 10 and each client device 20a, 20b, ... are connected to each other via a communication network 2 such as the Internet.

[0016] The server device 10 has a database (DB) 11 that stores learning style information, which is created in advance and stored in a predetermined storage medium (for example, a storage device owned by the server device 10). Details of the learning style information will be described later.

[0017] Each client device 20a, 20b, ... is equipped with a user host application 30a, 30b, ..., respectively. Here, each user host application 30a, 30b, ... is assumed to be a DAW for music production, editing, etc. In addition, in the example in Figure 1, each client device 20a, 20b, ... is equipped with a plug-in application 40a, 40b, 40c, ... or a plug-in application 40a, 40d, 40e, ..., respectively.

[0018] For example, plug-in applications 40a, 40b, 40c, ... are used by being incorporated into user host application 30a. Similarly, plug-in applications 40a, 40d, 40e, ... are used by being incorporated into user host application 30b. Here, plug-in application 40a is a plug-in application as an information processing device related to this disclosure, and plug-in applications 40b, 40c, 40d, 40e, ... are other plug-in applications.

[0019] Thus, each user host application 30a, 30b, ... can incorporate and use multiple plug-in applications. Hereafter, unless otherwise specified, the plug-in application 40a related to this disclosure will be referred to as plug-in application 40. Furthermore, hereafter, among the client devices 20a, 20b, ... and each user host application 30a, 30b, ..., the client device 20a and the user host application 30a will be used as examples, and unless otherwise specified, they will be described as client device 20 and user host application 30.

[0020] For example, when a user composes music using the AI assist function in production system 1, the user starts a user host application 30 on client device 20, and then calls a plugin application 40 from the started user host application 30. The user prepares, as a reference sound source 50, the sound source data of a song that the user wants to draw on (for example, a song about which the user thinks "I want to compose a song inspired by this song"), and loads the reference sound source 50 into the plugin application 40.

[0021] The plug-in application 40 analyzes the loaded reference sound source 50, communicates with the server device 10 based on the analysis results, and searches for learning style information similar to the reference sound source 50 from the learning style information stored in the DB 11 of the server device 10. The plug-in application 40 presents the user with one or more pieces of learning style information retrieved from DB 11 and prompts the user to select one of them. In response to user instructions, the plug-in application 40 composes music using the AI assist function based on the learning style information selected by the user, and presents the music data of the composed song to the user.

[0022] The plug-in application 40, in response to user instructions, composes music based on learning style information using its AI assist function and presents the music data to the user host application 30. The plug-in application 40 may, in response to user instructions, perform editing or other processing on the music data before passing it to the user host application 30. The user host application 30 may, in response to user instructions, perform editing or other processing on the music data received from the plug-in application 40.

[0023] Thus, the production system 1 related to this disclosure can perform composition using an AI-assisted function based on learning style information retrieved from reference sound sources 50 provided by the user. Therefore, users can perform composition according to the reference sound sources 50 more easily.

[0024] (1-2. Hardware Configurations Applicable to Embodiments) Next, an example of a hardware configuration applicable to embodiments of the present disclosure will be described.

[0025] Figure 2 is a block diagram showing an example of the hardware configuration of a server device 10 applicable to the embodiments of this disclosure.

[0026] In Figure 2, the server device 10 includes a CPU (Central Processing Unit) 1000, a ROM (Read Only Memory) 1001, a RAM (Random Access Memory) 1002, a storage device 1004, a data interface (I/F) 1005, and a communication I/F 1006, and these components are connected via a bus 1010 so that they can communicate with one another.

[0027] The storage device 1004 is a non-volatile storage medium such as flash memory or a hard disk drive. The CPU 1000 controls the overall operation of the server device 10 using the RAM 1002 as work memory, according to the programs stored in the storage device 1004 and ROM 1001.

[0028] The data interface 1005 is an interface for communicating with external devices. The server device 10 may send and receive data with the DB 11 via the data interface 1005. The communication interface 1006 controls communication with the communication network 2.

[0029] Furthermore, an input device for receiving user input, and a display and display control unit for presenting information to the user may be connected to the server device 10. Also, although the server device 10 is shown here as being composed of a single computer device, this is not limited to this example, and the server device 10 may be configured in a distributed manner with multiple computers connected to each other in a manner that allows them to communicate with one another, or it may be configured on a cloud network (not shown) connected to the communication network 2.

[0030] Figure 3 is a block diagram showing an example of the hardware configuration of a client device 20 applicable to the embodiments of this disclosure.

[0031] In Figure 3, the client device 20 includes a CPU 2000, a ROM 2001, a RAM 2002, a storage device 2004, a data interface 2005, a communication interface 2006, and a display control unit 2007, with each of these components connected to each other via a bus 2030 for communication. Thus, the client device 20 can be implemented with a configuration equivalent to that of a typical personal computer.

[0032] The storage device 2004 is a non-volatile storage medium such as flash memory or a hard disk drive. The CPU 2000 controls the overall operation of the client device 20 using RAM 2002 as work memory, according to the programs stored in the storage device 2004 and ROM 2001.

[0033] The data interface 2005 is an interface for communication with external devices. In the example in Figure 3, the input device 2020 and the audio interface 2021 are connected to the data interface 2005. The input device 2020 is a device for receiving user input and may be a pointing device such as a mouse or a keyboard.

[0034] The audio interface 2021 has the function of converting the digital audio signal supplied from the data interface 2005 into an analog audio signal. The analog audio signal output from the audio interface 2021 may be amplified by an amplifier and used to drive, for example, the left and right speakers 2022L and 2022R. This allows the digital audio signal generated by, for example, the CPU 2000 according to a program to be output as sound.

[0035] The display control unit 2007 generates display signals that the display 2010 can display based on display information generated by the CPU 2000 according to the program. The display 2010 includes a display device such as an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display panel, and a drive circuit that drives the display device according to the display signals. The display 2010 displays a screen corresponding to the display signals supplied from the display control unit 2007. This makes it possible to display a screen on the display 2010 according to the program.

[0036] The display 2010 may be a touch panel integrated with the input device 2020. Furthermore, the client device 20 is not limited to a general-purpose personal computer; it may also consist of a tablet computer or the like.

[0037] (2. Embodiments of the Disclosure) Next, embodiments of the Disclosure will be described.

[0038] (2-1. Configuration according to the embodiment) Figure 4 is a block diagram showing an example of the configuration of the production system 1 according to the embodiment. The production system 1 shown in Figure 4 corresponds to the production system 1 shown in Figure 1 and includes a server device 10 and a client device 20 that are connected to each other via a communication network 2.

[0039] In Figure 4, the server device 10 includes a control unit 100 and a DB 110. The DB 110 is the server database included in the server device 10. The client device 20 is equipped with a user host application 30, which is, for example, a DAW, and a plug-in application 40 according to the embodiment (labeled as composition support plug-in 40 in the figure). The plug-in application 40 is used by being incorporated into the user host application 30.

[0040] For the purposes of this explanation, it will be assumed that the only plugin application incorporated into the user host application 30 is the plugin application 40 according to this embodiment. Furthermore, from now on, the plugin application 40 will be referred to as the composition support plugin 40.

[0041] The user host application 30 includes the composition support plugin 40 described above, a display operation unit 300, a playback unit 310, a control unit 320, and a DB 330, which is a database within the user host application 30.

[0042] The display operation unit 300 controls the screen display in the user host application 30 and accepts user operations for the user host application 30. The playback unit 310 performs playback of music data and performance using music data in the user host application 30. The control unit 320 controls the overall operation of the user host application 30. The DB 330 stores information about music composed in the user host application 30, for example.

[0043] The composition support plugin 40 includes a display operation unit 400, a control algorithm unit 410, a playback performance unit 420, a control unit 430, and a DB 440 which is a database within the composition support plugin 40.

[0044] The display operation unit 400 controls the screen display in the composition support plugin 40 and accepts user operations on the composition support plugin 40. The control algorithm unit 410 performs processes related to the analysis of the reference sound source 50 by the composition support plugin 40 and composes music by AI. The playback performance unit 420 performs the playback of music data and the performance based on the music data in the composition support plugin 40. The playback performance unit 420 may control the playback and performance of the music data either independently and exclusively, or in conjunction with the playback and performance by the playback unit 310 in the user host application 30. The control unit 430 controls the overall operation of the composition support plugin 40. The DB 440 stores information related to music composed in the user host application 30, for example.

[0045] In the client device 20, when the information processing program according to the embodiment is executed, the CPU 2000 configures the display operation unit 400, the control algorithm unit 410, the playback performance unit 420, the control unit 430, and the DB 440 (data management unit for managing data stored in the DB 440) included in the above-described composition support plugin 40 as modules, for example, in the main memory area in the RAM 2002.

[0046] The information processing program can be obtained from the outside (for example, the server device 10) via the communication network 2 by communication via the communication I/F 2006, or obtained from a storage medium connected to the data I/F 2005, and installed on the client device 20.

[0047] Similarly, in the server device 10, when the server device program according to the embodiment is executed, the CPU 1000 configures the control unit 100 and the DB 110 (data management unit in the DB 110) as modules, for example, in the main memory area in the RAM 1002.

[0048] The program for the server device can be obtained from the outside via the communication network 2 by communication via, for example, the communication I/F 1006, or obtained from a storage medium connected to the data I/F 1005, and installed on the server device 10.

[0049] Here, an example of incorporating the composition support plug-in 40 into the user host application 30 will be schematically described.

[0050] The screen configuration of the user host application 30 generally includes a track display area, a channel strip display area, and a time information display area. The track display area displays tracks that represent musical piece data divided by part in time series. The channel strip display area displays channel strips that control playback in channel units for each track as a channel. The time information display area displays time information including tempo information, playback time, and the like.

[0051] Generally, a plurality of track types are prepared, corresponding to the type of musical piece data to be applied. Examples of track types include an audio track, a MIDI (Musical Instrument Digital Interface) track, and an instrument track. An audio track is a track to which audio data is applied. A MIDI track is a track to which MIDI data for controlling an external MIDI device is applied. An instrument track is a track to which an instrument provided as a plug-in application for the user host application 30, such as a software synthesizer or a sampler, is applied. In an instrument track, the performance of the instrument is controlled internally by MIDI data. Note that the track types are not limited to these.

[0052] A channel strip is assigned to each track. In an instrument track, multiple channels may be assigned to a single instrument. The channel strip is provided with controls for performing basic operations on the track, such as volume, pan, and bus send adjustments, as well as controls for calling up plugin applications. The channel strip of an instrument track is also provided with controls for calling up instruments. The composition support plugin 40 according to this embodiment may be called from the channel strip of an instrument track.

[0053] The functions of the server device 10, user host application 30, and music composition support plugin 40 according to this embodiment will be explained in more detail using Figures 5 to 7.

[0054] Figure 5 is a functional block diagram of an example illustrating the functions of a server device 10 according to an embodiment. In the server device 10, the control unit 100 includes a learning style information management function 101, a user management function 102, and an application communication function 103.

[0055] The learning style information management function 101 is a function that manages learning style information that has been created in advance and stored in the DB 110. The learning style information management function 101 searches for learning style information from the DB 110 and outputs it, for example, in response to a request from the composition support plugin 40.

[0056] The user management function 102 is a function that manages users who use the composition support plugin 40. For example, a user of the composition support plugin 40 accesses the server device 10 and creates their own account using the user management function 102 before using the composition support plugin 40. The user management function 102 is a function that registers the user as a user of the composition support plugin 40 based on the created account. The application communication function 103 is a function that communicates with the composition support plugin 40 via the communication network 2.

[0057] Figure 6 is an example of a functional block diagram illustrating the functions of the composition support plugin 40 according to this embodiment.

[0058] In the composition support plugin 40, the display operation unit 400 includes a sound source upload function 401, a sound source analysis information display function 402, a range/search instruction function 403, a search result list display function 404, a composition instruction function 405, a composition editing function 406, a composition result export function 407, and a search filter function 408. The control algorithm unit 410 includes a sound source analysis function 411, a similarity search function 412, and a composition function 413. The playback performance unit 420 includes a playback information transmission function 421 and a playback sound source function 422. The control unit 430 includes a history management function 431 and a server communication function 432.

[0059] In the control unit 430, the history management function 431 manages, for example, the composition history of the composition support plugin 40 and the editing history of music data. The server communication function 432 controls communication between the composition support plugin 40 and the server device 10.

[0060] In the display operation unit 400, the sound source upload function 401 is a function that acquires a reference sound source 50 provided by the user and stores it in, for example, a storage device 2004 or RAM 2002. In the control algorithm unit 410, the sound source analysis function 411 is a function that analyzes the reference sound source 50 acquired by the sound source upload function 401. In the display operation unit 400, the sound source analysis information display function 402 is a function that generates display information for displaying the analysis results obtained by the sound source analysis function 411 analyzing the reference sound source 50. The sound source analysis information display function 402 may display the analysis results using, for example, the waveform of the reference sound source 50.

[0061] In the display operation unit 400, the range/search instruction function 403 is a function that sets the range to be searched, according to user operation, within the analysis results displayed by the sound source analysis information display function 402. In the control algorithm unit 410, the similarity search function 412 is a function that searches the learning style information stored in the DB 110 of the server device 10 for learning style information similar to the information within the range of the analysis results set by the range/search instruction function 403 according to user operation. In the display operation unit 400, the search result list display function 404 is a function that displays the learning style information retrieved from the DB 110 by the range/search instruction function 403 as a list of search results.
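The range selection and similarity search described above can be pictured with the following minimal sketch. The text at this point does not state which features are compared or how similarity is scored, so everything here is an assumption for illustration: the `Segment` and `Style` records, the use of tempo and key as the compared features, and the fixed 20-point penalty for a key mismatch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:  # hypothetical per-segment analysis result of the reference sound source
    start_sec: float
    tempo_bpm: float
    key: str


@dataclass
class Style:  # hypothetical summary of one piece of learning style information
    name: str
    tempo_bpm: float
    key: str


def summarize(segments, range_start, range_end):
    """Average tempo and most common key of the segments that start inside the range."""
    chosen = [s for s in segments if range_start <= s.start_sec < range_end]
    if not chosen:
        raise ValueError("no analyzed segments in the selected range")
    tempo = sum(s.tempo_bpm for s in chosen) / len(chosen)
    key = Counter(s.key for s in chosen).most_common(1)[0][0]
    return tempo, key


def rank_styles(styles, tempo, key, top_n=3):
    """Lower score means more similar: tempo gap in BPM plus a penalty for a key mismatch."""
    def score(style):
        return abs(style.tempo_bpm - tempo) + (0.0 if style.key == key else 20.0)
    return sorted(styles, key=score)[:top_n]


segments = [
    Segment(0.0, 120.0, "C major"),
    Segment(10.0, 122.0, "C major"),
    Segment(20.0, 96.0, "A minor"),
]
styles = [
    Style("ballad", 96.0, "A minor"),
    Style("pop", 120.0, "C major"),
    Style("rock", 125.0, "C major"),
]

# The user selects the first 20 seconds of the analysis result as the search range.
tempo, key = summarize(segments, 0.0, 20.0)
ranked = [s.name for s in rank_styles(styles, tempo, key)]
print(tempo, key, ranked)
```

Restricting the summary to the selected range matters in this sketch: the third segment, with a different tempo and key, does not influence the ranking because it lies outside the range.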

[0062] In the display operation unit 400, the composition instruction function 405 instructs the composition function 413 of the control algorithm unit 410 to compose music using the learning style information specified in response to user operation from the learning style information displayed in the list by the search result list display function 404. The composition function 413 includes a composition model that has been pre-trained using the learning style information. The control algorithm unit 410 uses the composition model of the composition function 413 to perform composition based on the learning style information specified in response to user operation.

[0063] In the display operation unit 400, the composition editing function 406 is a function that performs editing on the music data composed by the composition function 413 of the control algorithm unit 410 in accordance with user operations. The composition result export function 407 is a function that outputs the music data of the music composed by the composition function 413 from the composition support plugin 40 in a format that can be used outside the composition support plugin 40, for example. The composition result export function 407 may output the music data in the format of a MIDI file, for example.

[0064] MIDI information may include information indicating pitch, note length, note intensity, and the starting position of the note.
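To make the four items above concrete, the following minimal sketch writes notes given as (pitch, start position, note length, intensity) into the bytes of a single-track Standard MIDI File, using only the Python standard library. It is one possible illustration of a MIDI-format export, not the export function of this disclosure; the helper names `varlen` and `build_midi`, the tuple layout, and the resolution of 480 ticks per beat are assumptions.

```python
import struct


def varlen(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_midi(notes, ticks_per_beat=480):
    """Build a format-0 MIDI file from (pitch, start_tick, length_ticks, velocity) tuples."""
    events = []
    for pitch, start, length, velocity in notes:
        events.append((start, 1, bytes([0x90, pitch, velocity])))   # note-on, first channel
        events.append((start + length, 0, bytes([0x80, pitch, 0])))  # note-off
    # Sort by tick; at the same tick, note-offs (0) come before note-ons (1).
    events.sort(key=lambda event: (event[0], event[1]))

    track = bytearray()
    now = 0
    for tick, _, message in events:
        track += varlen(tick - now) + message  # delta time, then the channel message
        now = tick
    track += b"\x00\xFF\x2F\x00"  # end-of-track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# One quarter note of middle C (pitch 60) at intensity 100, starting at tick 0.
data = build_midi([(60, 0, 480, 100)])
print(len(data), data[:4])
```

The resulting bytes could be written to a file with a `.mid` extension and opened in a host application; the pitch and intensity become the note-on data bytes, while the start position and note length become the delta times between events.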

[0065] In the display operation unit 400, the search filter function 408 is a function that performs a search on the learning style information stored in the DB 110 of the server device 10 using keywords or tags specified by the user.
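A keyword or tag search of this kind might be sketched as follows. The matching rules are assumptions made for illustration (case-insensitive matching, keywords matched against the style name, and every requested tag required), as are the function name `filter_styles` and the catalog entries.

```python
def filter_styles(styles, keywords=(), tags=()):
    """Keep styles whose name contains every keyword and whose tags include every requested tag."""
    wanted_tags = {tag.lower() for tag in tags}
    wanted_words = [word.lower() for word in keywords]
    result = []
    for style in styles:
        style_tags = {tag.lower() for tag in style["tags"]}
        name = style["name"].lower()
        if wanted_tags <= style_tags and all(word in name for word in wanted_words):
            result.append(style)
    return result


# Hypothetical catalog entries standing in for stored learning style information.
catalog = [
    {"name": "Late Night Jazz", "tags": ["jazz", "relaxed"]},
    {"name": "Summer Pop", "tags": ["pop", "bright"]},
    {"name": "Night Drive", "tags": ["pop", "relaxed"]},
]

hits = filter_styles(catalog, keywords=["night"], tags=["Relaxed"])
print([style["name"] for style in hits])
```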

[0066] In the playback performance unit 420, the playback information transmission function 421 is a function that sends playback information for playing music data to the user host application 30. For example, the playback performance unit 420, using the playback information transmission function 421, sends playback information for playing the music data of a song composed by the composition function 413 in the control algorithm unit 410 to the user host application 30. The playback information may be, for example, MIDI information.

[0067] In the playback performance unit 420, the playback sound source function 422 is a function that realizes the sound source for each track when playing back a song based on the playback information sent out by the playback information transmission function 421. The playback sound source function 422 may include, for example, a software synthesizer or a sampler.

[0068] Figure 7 is an example of a functional block diagram illustrating the functions of the user host application 30 according to this embodiment. Note that in Figure 7, the composition support plugin 40 is shown in a simplified form to avoid complexity.

[0069] In the user host application 30, the display operation unit 300 includes a playback instruction function 301 and an editing control function 302. The playback unit 310 includes a playback information receiving function 311 and a playback sound source function 312. The control unit 320 also includes a history management function 321.

[0070] In the display operation unit 300, the playback instruction function 301 is a function that generates playback information for each track displayed in the track display area, for example, in response to user operations. The editing control function 302 is a function that controls editing for each track displayed in the track display area, for example, in response to user operations.

[0071] In the playback unit 310, the playback information receiving function 311 is a function that receives playback information generated by the playback instruction function 301 or the composition function 413 in the composition support plugin 40. The playback sound source function 312 is a function that realizes the sound source for each track when playing a song based on the playback information received by the playback information receiving function 311. The playback sound source function 312 may include, for example, a software synthesizer or a sampler.

[0072] In the control unit 320, the history management function 321 is a function that manages the history of edits to music data performed by, for example, the editing control function 302 of the display operation unit 300 in the user host application 30.

[0073] Next, the information stored in the DB 110 of the server device 10, the DB 330 of the user host application 30, and the DB 440 of the composition support plugin 40 according to this embodiment will be described with reference to Figures 8 to 10.

[0074] Figure 8 is a schematic diagram showing an example of information stored in the DB 110 of the server device 10 according to the embodiment. As shown in Figure 8, the DB 110 stores a plurality of pieces of learning style information 1100. The learning style information 1100 includes learning style basic information 1110 and learning style music information 1120.

[0075] The learning style basic information 1110 includes key signature information 1111, tempo information 1112, and musical meta information 1113.

[0076] The key signature information 1111 defines the basic key (C major, A minor, F-sharp minor, etc.) in the learning style basic information 1110. The tempo information 1112 defines the basic tempo (BPM (Beats Per Minute) = 120, BPM = 96, etc.) in the learning style basic information 1110. The musical meta information 1113 is information that gives the learning style basic information 1110 a higher-level conceptual definition, and may, for example, indicate the genre or mood of the music related to the learning style basic information 1110. The key signature information 1111, tempo information 1112, and musical meta information 1113 included in the learning style basic information 1110 may serve as tag information or an index for the learning style information 1100.

[0077] The learning style music information 1120 includes chord progression information 1121, bass progression information 1122, and melody information 1123. The learning style music information 1120 may additionally include rhythm information.

[0078] The chord progression information 1121 and bass progression information 1122 are information that indicates a chord progression and bass progression of a predetermined length, such as eight measures, respectively. The melody information 1123 is information that indicates a melody, and multiple melody information 1123 are provided for one set of chord progression information 1121 and bass progression information 1122. The chord progression information 1121, bass progression information 1122, and melody information 1123 may each include information such as pitch, starting position, and note length, and may be represented as MIDI information.
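
The structure of the learning style information 1100 described in the preceding paragraphs can be sketched as nested records. This is only an illustration under stated assumptions: the class and field names, the `(pitch, start, length)` tuple layout, and the representation of the chord progression as one chord symbol per measure are all hypothetical and are not prescribed by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumed note layout: (pitch, starting position in beats, note length in beats)
Note = Tuple[int, float, float]


@dataclass
class BasicInfo:                      # learning style basic information 1110
    key: str                          # key signature information 1111
    tempo_bpm: float                  # tempo information 1112
    meta: List[str] = field(default_factory=list)  # meta information 1113


@dataclass
class MusicInfo:                      # music information 1120
    chord_progression: List[str]      # 1121, e.g. eight measures
    bass_progression: List[Note]      # 1122
    # Several melodies (1123) share one chord and bass progression.
    melodies: List[List[Note]] = field(default_factory=list)


@dataclass
class LearningStyleInfo:              # learning style information 1100
    name: str
    basic: BasicInfo
    music: MusicInfo

    def tags(self) -> Set[str]:
        """Basic information used as tag information or an index."""
        return {self.basic.key, f"bpm:{int(self.basic.tempo_bpm)}", *self.basic.meta}


style = LearningStyleInfo(
    name="example style",
    basic=BasicInfo(key="C major", tempo_bpm=120, meta=["pop", "bright"]),
    music=MusicInfo(
        chord_progression=["C", "Am", "F", "G", "C", "Am", "F", "G"],
        bass_progression=[(36, 0.0, 4.0), (33, 4.0, 4.0)],
        melodies=[[(60, 0.0, 1.0), (64, 1.0, 1.0)], [(67, 0.0, 2.0)]],
    ),
)
```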

[0079] For example, the composition support plugin 40 trains the composition model included in the composition function 413 of the control algorithm unit 410 based on multiple melody information 1123 included in the learning style information 1100. This training of the composition model may be performed in advance in the composition support plugin 40 before it is installed in the client device 20. The control algorithm unit 410 can generate new melodies related to this information by using the composition model trained by the composition function 413 based on multiple melody information 1123 that share chord progression information 1121 and bass progression information 1122.

[0080] Figure 9 is a schematic diagram showing an example of information stored in the DB 440 of the composition support plugin 40 according to the embodiment. As shown in Figure 9, the DB 440 stores reference sound source analysis information 4400. The reference sound source analysis information 4400 includes key signature information 4401, tempo information 4402, chord progression information 4403, and song metadata 4404.

[0081] As described above, in the composition support plugin 40, the control algorithm unit 410 uses the sound source analysis function 411 to analyze the reference sound source 50 provided by the user and acquired by the sound source upload function 401. The sound source analysis function 411 may perform this analysis using a known 12-tone analysis method.

[0082] In 12-tone analysis, the musical structure, such as tempo, melody structure, chord progression, and number of notes, is analyzed for each of the 12 notes of the scale, each representing a semitone, based on factors such as the intensity and length of the sound. By utilizing 12-tone analysis, it is possible to detect not only the structure of the song data (intro, A section, B section, chorus, etc.) but also the emotion of the song data. The control algorithm unit 410 may, using the sound source analysis function 411, analyze the reference sound source 50 by the 12-tone analysis method to obtain key signature information 4401, tempo information 4402, chord progression information 4403, and song metadata 4404 as analysis results. The control algorithm unit 410 stores the key signature information 4401, tempo information 4402, chord progression information 4403, and song metadata 4404 obtained by analyzing the reference sound source 50 in the DB 440 as the reference sound source analysis information 4400.
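
The idea of accumulating evidence per semitone from sound intensity and length can be illustrated with a small sketch. It is not the known 12-tone analysis method referred to above, which operates on audio; it only folds already-transcribed notes into 12 pitch-class bins weighted by length and intensity. The function name and the `(midi_pitch, length, velocity)` input layout are assumptions.

```python
from typing import Iterable, List, Tuple

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_profile(notes: Iterable[Tuple[int, float, int]]) -> List[float]:
    """Return a normalized 12-bin profile from (midi_pitch, length, velocity) notes."""
    bins = [0.0] * 12
    for pitch, length, velocity in notes:
        # Fold every octave onto one of the 12 semitones and weight the
        # contribution by note length and relative intensity.
        bins[pitch % 12] += length * (velocity / 127.0)
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins


notes = [(60, 2.0, 127), (64, 1.0, 127), (67, 1.0, 127), (72, 4.0, 127)]
profile = pitch_class_profile(notes)
strongest = PITCH_CLASSES[profile.index(max(profile))]
```

A profile of this kind could serve as one input to key or chord estimation, but the estimation step itself is outside this sketch.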

[0083] Figure 10 is a schematic diagram showing an example of information stored in the DB 330 of the user host application 30 according to an embodiment. As shown in Figure 10, the DB 330 stores composed music information 3300. The composed music information 3300 includes chord progression information 3301, bass progression information 3302, and melody information 3303. However, the composed music information 3300 may also include rhythm information.

[0084] The user host application 30 may store the playback information received from the composition support plugin 40 via the playback information receiving function 311 in the playback unit 310, or the edited playback information obtained by editing the playback information using the editing control function 302 in the display operation unit 300, as the composed music information 3300 in the DB 330.

[0085] (2-2. Processing according to the embodiment) Next, processing according to the embodiment will be described.

[0086] Figure 11 is a flowchart illustrating an example of the process according to the embodiment. Figures 12 to 22 are schematic diagrams showing examples of GUI (Graphical User Interface) screens for each process in the flowchart.

[0087] In step S100, the user begins using the composition support plugin 40. For example, the user starts the user host application 30 on the client device 20. The user host application 30 calls the composition support plugin 40 in response to the user's operation and inserts the composition support plugin 40 into a predetermined track on the user host application 30.

[0088] When the composition support plugin 40 is used by a user for the first time, it accesses the server device 10 via the server communication function 432, creates an account for that user via the user management function 102, and registers the user with the user management function 102.

[0089] Figure 12 is a schematic diagram showing an example of the initial screen in the composition support plugin 40 according to the embodiment. For example, in the composition support plugin 40, when the composition support plugin 40 is inserted into the user host application 30, the display operation unit 400 displays the initial screen 60 shown in Figure 12 on the display 2010 of the client device 20. In the following, "displaying (the screen) on the display 2010 of the client device 20" will be simply referred to as "displaying (the screen)".

[0090] In the example shown in Figure 12, the initial screen 60 includes a file upload area 600 and a file selection button 601. The file upload area 600 is an area for loading the reference sound source 50 into the composition support plugin 40. For example, in response to user operation, the reference sound source 50 is uploaded and loaded into the composition support plugin 40 by dragging and dropping an icon representing the reference sound source 50 into the file upload area 600. Alternatively, the reference sound source 50 can be loaded into the composition support plugin 40 by directly specifying the file of the reference sound source 50. The file selection button 601 is a button for selecting learning style information 1100.

[0091] In the next step S101, the composition support plugin 40 determines whether to perform a learning style information suggestion type process or a learning style information refinement search type process, depending on the user operation. For example, if a reference sound source 50 is uploaded on the initial screen 60, the composition support plugin 40 determines to perform a learning style information suggestion type process (step S101, "suggestion"). On the other hand, if the file selection button 601 is operated on the initial screen 60, the composition support plugin 40 determines to perform a learning style information refinement search type process (step S101, "search").
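
The branch in step S101 can be sketched as a small decision function. The names `Mode` and `decide_mode` and the two input flags are hypothetical; the sketch only mirrors the rule that an uploaded reference sound source selects the suggestion type process and an operation of the file selection button 601 selects the refinement search type process.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    SUGGESTION = "suggestion"   # proceeds to steps S102 to S104
    SEARCH = "search"           # proceeds to step S105


def decide_mode(uploaded_path: Optional[str], file_button_pressed: bool) -> Optional[Mode]:
    """Return the process type for step S101, or None if the user has not acted yet."""
    if uploaded_path:
        return Mode.SUGGESTION
    if file_button_pressed:
        return Mode.SEARCH
    return None
```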

[0092] If it is determined in step S101 to perform a learning style information suggestion type process (step S101, "suggestion"), the process proceeds to step S102.

[0093] In the next step S102, the composition support plugin 40 retrieves the reference sound source 50 uploaded to the file upload area 600. The composition support plugin 40 passes the retrieved reference sound source 50 to the control algorithm unit 410. In the next step S103, the control algorithm unit 410 analyzes the reference sound source 50 using the sound source analysis function 411. The analysis of the reference sound source 50 may be performed using 12-tone analysis as described above, or by other methods.

[0094] Figure 13 is a schematic diagram of a screen showing the analysis of a reference sound source 50 by the sound source analysis function 411 according to an embodiment. In Figure 13, the analysis execution screen 61 is provided with a metadata area 610 that displays metadata of the reference sound source 50, an execution indicator 611 that indicates that the analysis is in progress, and a cancel button 612. The metadata area 610 displays at least the data file name of the reference sound source 50, and may also display information such as the song title, jacket image, and artist name if available. The composition support plugin 40 may interrupt the analysis of the reference sound source 50 and switch the display screen to another screen, such as the initial screen 60, if the cancel button 612 is pressed.

[0095] The composition support plugin 40 displays the analysis results of the reference sound source 50 using the sound source analysis information display function 402 of the display operation unit 400. Figure 14 is a schematic diagram showing an example of an analysis result display screen according to the embodiment. In Figure 14, the analysis result display screen 62 is provided with the metadata area 610 described above, the analysis result display area 620, an OK button 625, and a cancel button 626.

[0096] In the example shown in Figure 14, the analysis result display area 620 includes a waveform display 621 showing the waveform of the reference sound source 50, bars 623a and 623b indicating the analysis range 622, and a zoom operation unit 624. Bars 623a and 623b indicate the start and end positions of the analysis in the waveform shown in the waveform display 621, respectively. The zoom operation unit 624 adjusts the time axis scale of the waveform display 621 according to user operation.

[0097] The OK button 625 is a button to proceed to the next process. The Cancel button 626 is a button to discard the analysis results and switch the display screen to another screen, such as the initial screen 60.

[0098] The sound source analysis information display function 402 may also display key signature information 4401, tempo information 4402, chord progression information 4403, and song metadata 4404 obtained as a result of analyzing the reference sound source 50 on the analysis result display screen 62.

[0099] In the next step S104, the composition support plugin 40 performs a similarity search according to the user-specified range using the range/search instruction function 403 of the control algorithm unit 410. Figure 15 is a schematic diagram showing how the range of the similarity search is specified according to the embodiment. Using the cursor 627, the user can change the start or end position of the analysis by, for example, grabbing and moving the knob portion at the upper end of the bar 623a or 623b in the analysis result display area 620 of the analysis result display screen 62, thereby adjusting the analysis range 622 in the waveform display 621.

[0100] If, for example, the OK button 625 is pressed in step S104, a similarity search is performed to find learning style information 1100 similar to the reference sound source 50 based on the specified analysis range 622, and the process moves to step S106. Details of the similarity search will be described later.

[0101] On the other hand, if it is determined in step S101 above to execute a learning style information refinement search type process (step S101, "search"), the process moves to step S105. In step S105, the composition support plugin 40 obtains learning style information 1100 from the DB 110 of the server device 10 using the search results list display function 404 of the display operation unit 400, and displays the obtained learning style information 1100. At this time, the display operation unit 400 may use the search filter function 408 to refine the obtained learning style information 1100 using a filter function according to user operation.

[0102] Figure 16 is a schematic diagram showing an example of a list display screen for learning style information 1100 according to the embodiment.

[0103] In Figure 16, the learning style information list display screen 70 includes a condition setting unit 700 for setting filtering conditions, a tag specification unit 710 for setting filtering conditions based on tag information, and a list display area 720a for displaying a list of filtered learning style information 1100. The learning style information list display screen 70 is further provided with a search term input unit 731, a playback adjustment unit 732, a display switching unit 733, an OK button 734, and a cancel button 735.

[0104] The list display area 720a displays a list of the learning style information 1100 acquired by the search results list display function 404 in step S105. The list display area 720a includes items 721 to 726 that specify the "name," "tempo," "key," "genre," "chord progression," and "user evaluation" corresponding to each piece of information contained in the learning style information 1100. The learning style information 1100 displayed in the list display area 720a is sorted according to the item specified by the user from among the items 721 to 726.

[0105] The condition setting unit 700 sets conditions for narrowing down the learning style information 1100 acquired by the search result list display function 404 in step S105. In the example in Figure 16, the condition setting unit 700 includes a genre specification unit 701 for specifying the genre related to the learning style information 1100, a key specification unit 702 for specifying the key, and a tempo specification unit 703 for specifying the tempo (BPM). In the example in Figure 16, these genre specification unit 701, key specification unit 702, and tempo specification unit 703 are configured to allow the user to specify the genre, key, and tempo, respectively, using drop-down lists. The search result list display function 404 narrows down the learning style information 1100 displayed in the list display area 720a according to the logical AND of the conditions specified in, for example, the genre specification unit 701, the key specification unit 702, and the tempo specification unit 703, and updates the display in the list display area 720a.

[0106] The tag specification unit 710 displays a list of pre-set tag information for each learning style information 1100, which is used to narrow down the learning style information 1100 acquired by the search results list display function 404 in step S105. The tag information may include genre and mood. Multiple tag information items can be specified simultaneously in the tag specification unit 710. When one or more tag information items displayed in the tag specification unit 710 are specified in response to user operation, the search results list display function 404 narrows down the learning style information 1100 displayed in the list display area 720a according to the logical AND of the specified tag information and updates the display in the list display area 720a.

[0107] Furthermore, the display operation unit 400 may, using the search results list display function 404, narrow down the learning style information 1100 in the list display area 720a by performing a logical AND operation between the conditions set by the condition setting unit 700 and the tag information specified by the tag specification unit 710.
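
The narrowing-down by logical AND described above can be sketched as follows. The function name `narrow_down`, the dictionary keys, and the sample entries are assumptions for illustration; a condition left unspecified (`None` or an empty tag set) does not restrict the result.

```python
from typing import Dict, Iterable, List, Optional


def narrow_down(
    styles: Iterable[Dict],
    genre: Optional[str] = None,
    key: Optional[str] = None,
    tempo: Optional[int] = None,
    tags: Iterable[str] = (),
) -> List[Dict]:
    """Keep only entries that satisfy every specified condition and every tag."""
    required = set(tags)
    result = []
    for style in styles:
        if genre is not None and style["genre"] != genre:
            continue
        if key is not None and style["key"] != key:
            continue
        if tempo is not None and style["tempo"] != tempo:
            continue
        if not required <= set(style["tags"]):   # all specified tags must be present
            continue
        result.append(style)
    return result


styles = [
    {"name": "Style A", "genre": "Pop", "key": "C major", "tempo": 120, "tags": {"pop", "bright"}},
    {"name": "Style B", "genre": "Pop", "key": "A minor", "tempo": 96, "tags": {"pop", "dark"}},
    {"name": "Style C", "genre": "Jazz", "key": "C major", "tempo": 120, "tags": {"jazz", "bright"}},
]
pop_and_bright = narrow_down(styles, genre="Pop", tags={"bright"})
```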

[0108] On the learning style information list display screen 70, the search term input unit 731 allows direct input of search terms to narrow down the learning style information 1100. The search results list display function 404 may give the filtering based on the search terms entered into the search term input unit 731 priority over the conditions set by the condition setting unit 700 and the tag information specified by the tag specification unit 710.

[0109] In the list display area 720a, the user can instruct the output of playback sound based on the currently selected learning style information 1100 (shown as a filled area in Figure 16) by operating the playback control button 727. At that time, the display operation unit 400 may display an adjustment screen for adjusting, for example, the sound quality of the playback in response to user operations on the playback adjustment unit 732.

[0110] Furthermore, the display operation unit 400 may, using the search results list display function 404, switch the display format of each learning style information 1100 in the list display area 720a to a list display, grid display, or the like, in response to user operations on the display switching unit 733.

[0111] When the OK button 734 is pressed on the learning style information list display screen 70, the display operation unit 400 determines that the filtering of the learning style information 1100 displayed in the list display area 720a is complete and proceeds to the next step S106 (see Figure 11).

[0112] Returning to Figure 11, in step S106, the display operation unit 400 uses the search result list display function 404 to display a list of the learning style information 1100 obtained from the similarity search in step S104, or the learning style information 1100 narrowed down in step S105. Examples of the learning style information list display screens for the learning style information suggestion type processing and the learning style information refinement search type processing will be described with reference to Figures 17 and 18.

[0113] Figure 17 is a schematic diagram showing an example of a learning style information list display screen 70a based on a learning style information suggestion type processing according to an embodiment. The learning style information list display screen 70a shown in Figure 17 has a metadata area 610, an analysis result summary display area 740, and an analysis result detail display area 750 added to the learning style information list display screen 70 shown in Figure 16. In addition, the search term input section 731 in the learning style information list display screen 70 of Figure 16 has been removed from the learning style information list display screen 70a. The metadata area 610 is the same as the metadata area 610 shown in Figure 13.

[0114] The analysis result summary display area 740 provides a schematic overview of the analysis results of the reference sound source 50 performed by the sound source analysis process in step S103. In the example in Figure 17, the analysis result summary display area 740 displays tempo information, key information, and mood information. The analysis result detail display area 750 includes a waveform display 751 showing the waveform of the reference sound source 50. The upper part 752 of the waveform display 751 shows the song structure (verse A, verse B, chorus, etc.) predicted from the analysis of the reference sound source 50, and the lower part 753 shows information indicating the chord progression predicted from the analysis of the reference sound source 50. Furthermore, a bar 754 indicates the starting position when searching for similar sections.

[0115] Figure 18 is a schematic diagram showing an example of a learning style information list display screen 70b based on a learning style information refinement search type processing according to an embodiment. In the example of Figure 18, the learning style information list display screen 70b is a screen that directly inherits the learning style information list display screen 70 of Figure 16 described above.

[0116] Furthermore, the learning style information suggestion type processing in steps S102 to S104 described above and the learning style information refinement search type processing in step S105 may be executed in combination. For example, the user can upload a reference sound source 50 to the composition support plugin 40 and then use the search filter function 408 on the display operation unit 400 of the composition support plugin 40 to refine the learning style information 1100. In this case as well, the composition support plugin 40 may present the results of the learning style information suggestion type processing and the results of the learning style information refinement search type processing to the user using the search results list display function 404 of the display operation unit 400, on the learning style information list display screen 70a or 70b shown in Figure 17 or Figure 18 described above.

[0117] Returning to Figure 11, when the learning style information list display screen 70a or 70b is displayed in step S106, the process moves to the next step S107. In step S107, the display operation unit 400 plays back the learning style information 1100 in response to user instructions on the learning style information list display screen 70a or 70b.

[0118] In the example shown in Figure 17, the playback control button 727 corresponding to the selected learning style information 1100 in the list display area 720b is operated in response to user input, and playback of the learning style information 1100 is instructed. The same applies to the example shown in Figure 18.

[0119] For example, when the playback control button 727 is operated on the learning style information list display screen 70a or 70b and playback of the corresponding learning style information 1100 is instructed, the display operation unit 400 requests the playback unit 420 to play the learning style information 1100. In response to this request, the playback unit 420 sends the learning style information 1100 to the user host application 30 using the playback information transmission function 421. The user host application 30 receives the learning style information 1100 sent from the composition support plugin 40 by the playback unit 310 and plays the music based on the received learning style information 1100.

[0120] In the next step S108, for example, the display operation unit 400 determines whether the selection of learning style information 1100 has been completed in response to user operation. For example, the display operation unit 400 may determine that the selection of learning style information has been completed when the OK button 734 is pressed on the learning style information list display screen 70a or 70b by user operation.

[0121] If the display operation unit 400 determines that the selection of learning style information 1100 has not been completed (step S108, "No"), it returns the process to step S107. The user can, for example, play other learning style information 1100 displayed on the learning style information list display screen 70a or 70b and decide whether or not to select that learning style information 1100.

[0122] On the other hand, if the display operation unit 400 determines that the selection of learning style information 1100 has been completed in step S108 (step S108, "Yes"), it proceeds to step S109 to determine the learning style information 1100 to be used for composition.

[0123] In the next step S110, the control algorithm unit 410, in response to user operation, uses the composition function 413 to compose music using a pre-learned composition model based on the learning style information 1100 determined in step S109. The control algorithm unit 410 then passes the composed music information 3300 composed by the composition function 413 to the playback unit 420.

[0124] The playback unit 420 can simultaneously play three parts (chords, bass, and melody) based on the chord progression information 3301, bass progression information 3302, and melody information 3303 included in the composed music information 3300. If the composed music information 3300 also includes rhythm information, the playback unit 420 can play the rhythm simultaneously with these three parts.

[0125] For example, the playback unit 420 may use the playback sound source function 422 to assign instruments to chords, bass, melody, and rhythm, and supply the sound data played by the assigned instruments to the audio interface 2021. The audio interface 2021 may convert the supplied sound data into analog audio signals and output them as sound from speakers 2022L and 2022R.

[0126] The playback unit 420 may also convert the composed music information 3300 into a music signal such as MIDI data using the playback information transmission function 421 and send it to the user host application 30. In the user host application 30, the playback unit 310 receives the composed music information 3300 sent from the composition support plugin 40 using the playback information receiving function 311, and plays the music based on the received composed music information 3300 using the playback sound source function 312.

[0127] In this case as well, the playback unit 310 may use the playback sound source function 312 to assign instruments to chords, bass, melody, and rhythm, and supply the sound data played by the assigned instruments to the audio interface 2021, which outputs it as sound from speakers 2022L and 2022R.

[0128] Figure 19 is a schematic diagram showing an example of a composition editing screen for composing and playing music using the composition support plugin 40 according to the embodiment.

[0129] In Figure 19, the composition editing screen 80 provided by the composition support plugin 40 includes a composition control area 810 and a composition assistance area 820. The composition editing screen 80 also includes a style name display area 800 that displays the name of the learning style information 1100 that is being edited and played on the composition editing screen 80, setting buttons 801 for making various settings related to the composition editing screen 80, and volume adjustment buttons 802 for adjusting the volume of the music being played. Furthermore, the composition editing screen 80 includes a history display area 830 that shows the history of composition and editing, a sound source assignment area 831 for assigning sound sources (instruments) to each part such as melody, chords, and rhythm, and output instruction buttons 832a and 832b for instructing the output of music data.

[0130] The composition control area 810 includes a melody editing area 811, a composition instruction button 812, a composition parameter setting area 813, a playback control area 814, a tempo control area 815, a key control area 816, and a volume adjustment knob 817.

[0131] The melody editing area 811 is a so-called piano roll screen, where the vertical axis represents pitch using an image 8110 that mimics piano keys, and the horizontal axis represents time. In the melody editing area 811, each note of the melody information is displayed chronologically by bars 8111 that indicate pitch, note length, and note start position. A bar indicating volume may be further displayed inside the bars 8111. In the melody editing area 811, the vertical bar 8112 indicates the current playback position. In the melody editing area 811, the pitch, note length, and note start position can be changed by changing the position and length of each bar 8111 according to user operations, thereby editing the melody.

[0132] The composition instruction button 812 is a button used to instruct the control algorithm unit 410 to compose music using the composition function 413. In other words, in step S110 described above, when the user operates the composition instruction button 812, the control algorithm unit 410 uses the composition function 413 to compose music using a pre-learned composition model based on the learning style information 1100 determined in step S109, and generates composed music information 3300.

[0133] The composition parameter setting area 813 is an area for setting parameters for composition using the composition function 413. In this example, the composition parameter setting area 813 allows setting the time and complexity of the music to be composed as parameters for composition. The playback control area 814 is provided with controls for controlling the playback of music for which the melody is displayed in the melody editing area 811, and allows instructing the music to start playback, stop playback, fast forward playback, fast rewind playback, and repeat playback for a specified period.

[0134] The tempo control area 815 and the key control area 816 are areas for editing the tempo and key, respectively, of the song whose melody is displayed in the melody editing area 811. The volume adjustment knob 817 is a control for adjusting the volume during playback of the song whose melody is displayed in the melody editing area 811.

[0135] The composition assistance area 820 includes a chord editing area 821 and a rhythm editing area 822.

[0136] The chord editing area 821 includes a playback control area 8210 and a chord display area 8211. The chord display area 8211 shows the chords in chronological order using the piano roll screen described above. The playback control area 8210 is provided with various controls for controlling the playback (solo, mute, volume) of the chords displayed in the chord display area 8211.

[0137] The rhythm editing area 822 includes a playback control area 8220 and a rhythm display area 8221. In the example in Figure 19, the rhythm display area 8221 shows each rhythm instrument on the vertical axis and time on the horizontal axis, with the horizontal axis divided into a grid for each beat. The playback control area 8220 is provided with various controls for controlling the playback of the rhythm displayed in the rhythm display area 8221 (solo, mute, volume).
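
A grid of this kind, with one row per rhythm instrument and one column per beat, can be sketched as a dictionary of boolean lists. The function names, the instrument names, and the text rendering are assumptions for illustration and do not reflect the actual screen of Figure 19.

```python
from typing import Dict, List

Grid = Dict[str, List[bool]]


def make_rhythm_grid(instruments: List[str], beats: int) -> Grid:
    """Create an empty grid: one row per instrument, one cell per beat."""
    return {name: [False] * beats for name in instruments}


def toggle_hit(grid: Grid, instrument: str, beat: int) -> None:
    """Switch a cell on or off, as a click on the grid might do."""
    grid[instrument][beat] = not grid[instrument][beat]


def render(grid: Grid) -> str:
    """Render the grid as text, instruments vertically and time horizontally."""
    width = max(len(name) for name in grid)
    return "\n".join(
        f"{name:<{width}} |" + "".join("x" if hit else "." for hit in row) + "|"
        for name, row in grid.items()
    )


grid = make_rhythm_grid(["kick", "snare", "hihat"], beats=8)
for beat in (0, 4):
    toggle_hit(grid, "kick", beat)
for beat in (2, 6):
    toggle_hit(grid, "snare", beat)
```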

[0138] The chord editing area 821 and the rhythm editing area 822 allow users to edit chords and rhythms, respectively, in response to user operations.

[0139] Returning to Figure 11, the user may repeat the process in step S110 any number of times. That is, the user may operate the composition instruction button 812 again to repeatedly perform the process of generating new composed music information 3300 using the composition function 413, for example, until the desired composed music information 3300 is generated.

[0140] Furthermore, in the composition editing screen 80, the history of compositions made using the composition function 413 is stored in the history display area 830. Therefore, the user may execute the process in step S110 multiple times and select the desired composed music information 3300 from the composed music information 3300 generated in each execution.

[0141] In the next step S111, the display operation unit 400 may use the composition editing function 406 to edit the composed music information 3300 created in the process of step S110 in accordance with user operations.

[0142] Figure 20 is a schematic diagram showing how to edit the composed music information 3300 according to the embodiment. For example, as shown in Figure 20, the user can change the position (pitch, start position) and length of the bar 8111 to be edited, or add a new bar 8111, by operating, for example, the cursor 627 in the melody editing area 811. Similar operations are possible in the chord editing area 821. In the rhythm editing area 822, the user can change and add the timing of each instrument's sound by operating, for example, the cursor 627. The display operation unit 400 updates the composed music information 3300 according to the edits made by the user using the composition editing function 406.

[0143] In the next step S112, the display operation unit 400 uses the composition result export function 407 to export the composed music information 3300 in a format editable by the user host application 30. Here, the format editable by the user host application 30 is assumed to be performance information such as audio data and / or MIDI data.

[0144] Figure 21 is a schematic diagram illustrating the operation of executing the export process for composed music information 3300 according to the embodiment. In the composition editing screen 80, the output instruction button 832a is a button for exporting the composed music information 3300 as an audio file containing audio data based on the composed music information 3300. The output instruction button 832b is a button for exporting the composed music information 3300 as performance information such as MIDI data to, for example, the user host application 30 by drag and drop.

[0145] The control algorithm unit 410, using the composition result export function 407, executes the export of the composed music information 3300 as audio data in response to the user's operation of, for example, the cursor 627 on the output instruction button 832a, as shown in Figure 21. The control algorithm unit 410, using the composition result export function 407, also executes the export of the composed music information 3300 to the user host application 30 by drag-and-drop in response to the user's operation of, for example, the cursor 627 on the output instruction button 832b.

[0146] In the next step, S113, the user can edit the composed music information 3300 exported in step S112 using the user host application 30 as needed.

[0147] The user can obtain multiple pieces of composed music information 3300 by, for example, repeating the process from steps S101 to S112. The user can then combine the multiple pieces of composed music information 3300 obtained in this way on, for example, the user host application 30 to complete a song.

[0148] Figure 22 is a schematic diagram showing an example of an editing screen by the user host application 30 according to the embodiment.

[0149] In Figure 22, the editing screen 90 provided by the user host application 30 includes a track editing area 900 and a playback control area 910. In addition, in the example shown in Figure 22, the composition editing screen 80 provided by the composition support plugin 40, which is called from the user host application 30, is displayed overlaid on the editing screen 90.

[0150] The track editing area 900 may include multiple track displays 9011, 9012, 9013, ... corresponding to each track. In the example in Figure 22, the melody, chord, and bass tracks are assigned to track displays 9011, 9012, and 9013, respectively. Also in the example in Figure 22, the data for each of the melody, chord, and bass tracks may be data exported as MIDI data by the composition support plugin 40.

[0151] Each track display 9011 to 9013 includes a playback control area 902, allowing for playback control of each track. In the track editing area 900, the vertical bar 903 indicates the current playback position.

[0152] The playback control area 910 is provided with controls for controlling the overall playback of the track editing area 900, and can be used to instruct the music to start playback, stop playback, fast forward playback, rewind playback, and repeat playback for a specified period.

[0153] The display operation unit 300 of the user host application 30 can edit the data of each track in response to user operations using the editing control function 302. For example, as shown in Figure 22, the display operation unit 300 may, using the editing control function 302, select a track in response to the user's operation of the cursor 920 and make the information of the selected track editable.

[0154] (2-2-1. Similarity Search According to an Embodiment) Next, the similarity search according to an embodiment, which relates to the processing of step S104 in the flowchart of Figure 11, will be described in more detail. Figure 23 is a flowchart of an example showing the similarity search process according to an embodiment.

[0155] Section (a) of Figure 23 is a flowchart illustrating an example of the similarity search process according to the embodiment. In Section (a) of Figure 23, in step S200, the control algorithm unit 410 of the composition support plugin 40 matches the key signatures of all target musical data using the similarity search function 412.

[0156] In other words, the key signature information 4401 (see Figure 9) of the reference sound source analysis information 4400 obtained by analyzing the reference sound source 50 and the key signature information 1111 (see Figure 8) of each learning style information 1100 stored in the DB 110 of the server device 10 are generally not unified. In this case, since it is difficult to compare the reference sound source analysis information 4400 with each learning style information 1100, all key signatures are transposed to the same key signature. For example, if the key signature information 1111 or 4401 of the learning style information 1100 or reference sound source analysis information 4400 is in a major key, the key signature information is transposed to C major. Also, if the key signature information 1111 or 4401 of the learning style information 1100 or reference sound source analysis information 4400 is in a minor key, the key signature information is transposed to A minor.
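As a non-limiting illustration, the key matching of step S200 may be sketched as follows. The note table, the function names `split_chord` and `normalize_key`, and the simplified chord parsing (sharps only, with five common flat spellings mapped to sharps) are assumptions introduced for this sketch and are not taken from the disclosure.

```python
# Hypothetical sketch of step S200: transpose a chord progression so that
# major keys become C major and minor keys become A minor.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FLATS = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}


def split_chord(chord: str) -> tuple[str, str]:
    """Split a chord symbol such as 'F#m7' into root ('F#') and quality ('m7')."""
    root_len = 2 if len(chord) > 1 and chord[1] in "#b" else 1
    root = chord[:root_len]
    return FLATS.get(root, root), chord[root_len:]


def normalize_key(chords: list[str], tonic: str, is_minor: bool) -> list[str]:
    """Transpose to C major (major keys) or A minor (minor keys)."""
    target = "A" if is_minor else "C"
    tonic = FLATS.get(tonic, tonic)
    # Number of semitones to shift every chord root upward (mod 12).
    shift = (NOTES.index(target) - NOTES.index(tonic)) % 12
    result = []
    for chord in chords:
        root, quality = split_chord(chord)
        result.append(NOTES[(NOTES.index(root) + shift) % 12] + quality)
    return result


print(normalize_key(["G", "C", "D", "G"], tonic="G", is_minor=False))
# ['C', 'F', 'G', 'C']
```

In this sketch only the chord roots are shifted and the chord quality (for example "m" or "7") is carried over unchanged, which is sufficient for comparing progressions after transposition.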

[0157] In the next step S201, the control algorithm unit 410 uses the similarity search function 412 to calculate the similarity of each learning style information 1100 with the reference sound source analysis information 4400, and assigns a score to each learning style information 1100 based on the calculated similarity. In the next step S202, the control algorithm unit 410 uses the similarity search function 412 to collect the scores assigned to each learning style information 1100 in step S201, and ranks each learning style information 1100 based on the collected scores.

[0158] In the next step S203, the display operation unit 400 of the composition support plugin 40 sorts each learning style information 1100 according to the ranking result using the search results list display function 404 and displays it on, for example, the learning style information list display screen 70a shown in Figure 17.

[0159] Section (b) of Figure 23 is a flowchart illustrating in more detail the process of step S201 in section (a) of the same figure. The flowchart in section (b) of Figure 23 shows the calculation of similarity and scoring between the reference sound source analysis information 4400 and one learning style information 1100.

[0160] In the flowchart shown in section (b) of Figure 23, steps S210 to S214 compare the reference sound source analysis information 4400 with one learning style information 1100 for multiple items and calculate the similarity. In step S215, a score evaluation value for the learning style information 1100 is calculated based on the similarity calculated for each item in steps S210 to S214.

[0161] In the flowchart of section (b) of Figure 23, in step S210, the control algorithm unit 410 uses the similarity search function 412 to perform a subcategory comparison of chord progressions between the reference sound source analysis information 4400 and the learning style information 1100.

[0162] In other words, since the key signatures of the reference sound source analysis information 4400 and the learning style information 1100 have been matched in step S200 of the flowchart in section (a) of Figure 23, the chord progression information 4403 in the reference sound source analysis information 4400 and the chord progression information 1121 in the learning style information 1100 can be compared. For example, suppose the chord progression information 4403 in the reference sound source analysis information 4400 is "C-F-G-C" and the chord progression information 1121 in the learning style information 1100 is "C-Dm-Em-G". In this case, the only chord that matches between the two at the same position is the first "C", and the similarity (referred to as similarity (A)) can be calculated by using the number of target chords as the denominator and the number of common chords as the numerator, giving "1 / 4 = 25%".

[0163] This process of comparing the chords themselves that are included in a chord progression is called a sub-category comparison of chord progressions.
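As a non-limiting illustration, the subcategory comparison of step S210 may be sketched as follows. The sketch assumes that chords are compared position by position, which reproduces the "1 / 4 = 25%" figure of the example, and that the denominator is the length of the longer progression. Both points, as well as the function name `subcategory_similarity`, are assumptions of this sketch.

```python
def subcategory_similarity(reference: list[str], candidate: list[str]) -> float:
    """Share of positions at which both progressions hold the same chord.

    Assumption: position-by-position matching; the longer progression
    defines the number of target chords.
    """
    total = max(len(reference), len(candidate))
    if total == 0:
        return 0.0
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matches / total


similarity_a = subcategory_similarity(["C", "F", "G", "C"], ["C", "Dm", "Em", "G"])
print(similarity_a)  # 0.25
```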

[0164] In the next step S211, the control algorithm unit 410 uses the similarity search function 412 to perform a broad classification comparison of the chord progressions between the reference sound source analysis information 4400 and the learning style information 1100.

[0165] Chords in a song's chord progression can be classified into Tonic, Sub-dominant, and Dominant. Specifically, for example, in the key of C major, the chords can be classified as Tonic: C / Em / Am, Sub-dominant: F / Dm, and Dominant: G / Bm7-5. This classification by Tonic, Sub-dominant, and Dominant is called the major classification. Hereafter, for simplicity, Tonic, Sub-dominant, and Dominant will be abbreviated as T, S, and D, respectively, representing the major classification chords.

[0166] Once the processing of step S200 in the flowchart of section (a) of Figure 23 described above is completed, the chord progression information 4403 in the reference sound source analysis information 4400 and the chord progression information 1121 in the learning style information 1100 can be compared in terms of major categories. For example, if the chord progression information 4403 in the reference sound source analysis information 4400 is "C-F-G-C", the major categorized chord progression can be expressed as "T-S-D-T". Also, if the chord progression information 1121 in the learning style information 1100 is "C-Dm-Em-G", the major categorized chord progression can be expressed as "T-S-T-D". In this case, the major category chords common to both are the leading "T" and "S", and the similarity (referred to as similarity (B)) can be calculated by using the number of target chords as the denominator and the number of common chords as the numerator, giving "2 / 4 = 50%".

[0167] In this way, by representing each chord included in a chord progression by its major classification, the chord progression can be expressed as a pattern (type), and a more comprehensive comparison becomes possible than with the subcategory comparison described above.
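As a non-limiting illustration, the major classification comparison of step S211 may be sketched as follows. The mapping table follows the C major classification given above. The placeholder "?" for unlisted chords, the position-by-position matching, and the function names are assumptions of this sketch.

```python
# Tonic / Sub-dominant / Dominant classification in C major, as listed in the text.
FUNCTION_IN_C_MAJOR = {
    "C": "T", "Em": "T", "Am": "T",
    "F": "S", "Dm": "S",
    "G": "D", "Bm7-5": "D",
}


def to_functions(chords: list[str]) -> list[str]:
    """Replace each chord by T, S or D; unlisted chords become '?' (assumption)."""
    return [FUNCTION_IN_C_MAJOR.get(chord, "?") for chord in chords]


def major_class_similarity(reference: list[str], candidate: list[str]) -> float:
    """Position-by-position match ratio of the T/S/D patterns (assumption)."""
    ref, cand = to_functions(reference), to_functions(candidate)
    total = max(len(ref), len(cand))
    if total == 0:
        return 0.0
    # '?' never counts as a match, so two unknown chords are not treated as equal.
    matches = sum(1 for r, c in zip(ref, cand) if r == c and r != "?")
    return matches / total


print(to_functions(["C", "F", "G", "C"]))    # ['T', 'S', 'D', 'T']
print(to_functions(["C", "Dm", "Em", "G"]))  # ['T', 'S', 'T', 'D']
print(major_class_similarity(["C", "F", "G", "C"], ["C", "Dm", "Em", "G"]))  # 0.5
```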

[0168] In the next step S212, the control algorithm unit 410 uses the similarity search function 412 to extract and compare characteristic chord progressions from the reference sound source analysis information 4400 and the learning style information 1100.

[0169] Characteristic chord progressions can sometimes be found in the cadence at the end of a chord progression. Examples of such characteristic chord progressions include "Dm-G-C", "G-Am", and "G-C". For example, if the chord progression information 4403 in the reference sound source analysis information 4400 is "C-F-G-C" and the chord progression information 1121 in the learning style information 1100 is "C-Dm-Em-G", the former ends with "G-C" but the latter contains none of these progressions, so there is no characteristic chord progression common to both. Therefore, the similarity between the two (referred to as similarity (C)) can be calculated by using the number of target chords as the denominator and the number of common chords as the numerator, giving "0 / 4 = 0%".
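As a non-limiting illustration, the characteristic chord progression comparison of step S212 may be sketched as follows. The text does not specify how the number of common chords is counted when a characteristic progression is shared. This sketch assumes that the numerator is the length of the longest characteristic progression found in both inputs, and the names `CHARACTERISTIC`, `find_characteristic` and `characteristic_similarity` are hypothetical.

```python
# Characteristic progressions named in the text, assumed to be in the normalized key.
CHARACTERISTIC = [("Dm", "G", "C"), ("G", "Am"), ("G", "C")]


def find_characteristic(chords: list[str]) -> set[tuple[str, ...]]:
    """Return every listed pattern that occurs as a contiguous run in `chords`."""
    found = set()
    for pattern in CHARACTERISTIC:
        n = len(pattern)
        for start in range(len(chords) - n + 1):
            if tuple(chords[start:start + n]) == pattern:
                found.add(pattern)
    return found


def characteristic_similarity(reference: list[str], candidate: list[str]) -> float:
    """Ratio of chords in the longest shared pattern to the target chord count."""
    total = max(len(reference), len(candidate))
    if total == 0:
        return 0.0
    shared = find_characteristic(reference) & find_characteristic(candidate)
    # Assumed numerator: chord count of the longest pattern present in both.
    common = max((len(pattern) for pattern in shared), default=0)
    return common / total


print(characteristic_similarity(["C", "F", "G", "C"], ["C", "Dm", "Em", "G"]))  # 0.0
```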

[0170] In the next step S213, the control algorithm unit 410 compares the tempo information of the reference sound source analysis information 4400 and the learning style information 1100 using the similarity search function 412.

[0171] The tempo information 4402 in the reference sound source analysis information 4400 and the tempo information 1112 in the learning style information 1100 can be compared. For example, if the tempo information is expressed in BPM, and the tempo information 4402 in the reference sound source analysis information 4400 is BPM = 105 and the tempo information 1112 in the learning style information 1100 is BPM = 125, then the difference in BPM between the two is "20". The similarity between the two (referred to as similarity (D)) can then be calculated by treating this difference of "20" as a percentage value and subtracting it from 100%, which corresponds to the tempo information 1112, giving "100% - 20% = 80%".
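As a non-limiting illustration, the tempo comparison of step S213 may be sketched as follows. The sketch follows the example in treating the BPM difference directly as percentage points. Clamping the result at 0 for differences above 100 BPM is an assumption of this sketch, as is the function name `tempo_similarity`.

```python
def tempo_similarity(reference_bpm: float, style_bpm: float) -> float:
    """Similarity (D) as a fraction: 100% minus the absolute BPM difference.

    Assumption: differences larger than 100 BPM are clamped to a similarity of 0.
    """
    difference = abs(reference_bpm - style_bpm)
    return max(0.0, 100.0 - difference) / 100.0


print(tempo_similarity(105, 125))  # 0.8
```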

[0172] In the next step S214, the control algorithm unit 410 compares additional information (meta-information) between the reference sound source analysis information 4400 and the learning style information 1100 using the similarity search function 412.

[0173] The music metadata 4404 in the reference sound source analysis information 4400 can be compared with the music metadata 1113 in the learning style information 1100. For example, suppose the music metadata includes genre information indicating "genre" and mood information indicating "mood," that the genre information in the music metadata 4404 in the reference sound source analysis information 4400 is "Rock" and the mood information is "Positive," and that the genre information in the music metadata 1113 in the learning style information 1100 is "Rock" and the mood information is "Sad." In this case, since only the genre information "Rock" matches between the two, the similarity between them (referred to as similarity (E)) can be quantified as "1 / 2 = 50%" by using the number of compared metadata items as the denominator and the number of matching metadata items as the numerator.
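As a non-limiting illustration, the metadata comparison of step S214 may be sketched as follows. Representing the metadata as a dictionary with the keys "genre" and "mood", and comparing only the items present on both sides, are assumptions of this sketch.

```python
def metadata_similarity(reference: dict[str, str], style: dict[str, str]) -> float:
    """Similarity (E): matching metadata items divided by compared items."""
    compared = reference.keys() & style.keys()  # items present on both sides
    if not compared:
        return 0.0
    matches = sum(1 for key in compared if reference[key] == style[key])
    return matches / len(compared)


reference_meta = {"genre": "Rock", "mood": "Positive"}
style_meta = {"genre": "Rock", "mood": "Sad"}
print(metadata_similarity(reference_meta, style_meta))  # 0.5
```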

[0174] In the next step S215, the control algorithm unit 410 uses the similarity search function 412 to calculate a score (score evaluation value) for the target learning style information 1100 based on the similarity scores calculated in steps S210 to S214 described above.

[0175] The calculation of the score evaluation value will now be described in more detail. Steps S210 to S214 described above calculate the similarities (A), (B), (C), (D), and (E) for the learning style information 1100. Since these similarities (A), (B), (C), (D), and (E) each take different values, the score evaluation value can be calculated by multiplying each similarity by a predetermined coefficient and summing the results.

[0176] For example, let the coefficients for similarity (A), (B), (C), (D), and (E) be a, b, c, d, and e, respectively (a + b + c + d + e = 1.0), with coefficients a = 0.2, b = 0.3, c = 0.3, d = 0.1, and e = 0.1. In this case, the score evaluation value α for the learning style information 1100 in each example of steps S210 to S214 described above can be calculated by the following formula (1): α = Similarity (A) × a + Similarity (B) × b + Similarity (C) × c + Similarity (D) × d + Similarity (E) × e …(1)

[0177] By substituting the similarity values (A) to (E) for each example in steps S210 to S214, the specific score evaluation value α is calculated using the following formula (2): α = 25% × 0.2 + 50% × 0.3 + 0% × 0.3 + 80% × 0.1 + 50% × 0.1 = 0.33 …(2)
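As a non-limiting illustration, the weighted sum of formulas (1) and (2) may be sketched as follows. The coefficients and similarity values are those of the example, while the dictionary layout and the function name `score_evaluation_value` are assumptions of this sketch.

```python
# Coefficients a to e from the example; they sum to 1.0.
COEFFICIENTS = {"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.1, "E": 0.1}


def score_evaluation_value(similarities: dict[str, float]) -> float:
    """Formula (1): sum of each similarity multiplied by its coefficient."""
    return sum(similarities[key] * COEFFICIENTS[key] for key in COEFFICIENTS)


# Similarities (A) to (E) from steps S210 to S214, expressed as fractions.
example = {"A": 0.25, "B": 0.50, "C": 0.0, "D": 0.80, "E": 0.50}
alpha = score_evaluation_value(example)
print(round(alpha, 2))  # 0.33
```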

[0178] Similarly, the processing described in steps S210 to S214 above is performed on other learning style information 1100 stored in the DB 110 of the server device 10 based on the target reference sound source analysis information 4400.

[0179] The control algorithm unit 410, using the similarity search function 412, collects each score evaluation value α calculated for each learning style information 1100 stored in the DB 110 of the server device 10 during the process of step S202 in the flowchart of section (a) of Figure 23.

[0180] Suppose that, after collecting the score evaluation value α for each learning style information 1100, the score evaluation value α for the first learning style information 1100 (let's call it learning style information #1) is 0.33, the score evaluation value α for the second learning style information 1100 (let's call it learning style information #2) is 0.25, and the score evaluation value α for the third learning style information 1100 (let's call it learning style information #3) is 0.6. When learning style information #1 to #3 are arranged in descending order of similarity based on the score evaluation value α, the order is as shown in equation (3) below: Learning style information #3 (α = 0.6) > Learning style information #1 (α = 0.33) > Learning style information #2 (α = 0.25) ... (3)
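As a non-limiting illustration, the ranking of step S202 may be sketched as follows. The identifiers "#1" to "#3" and the function name `rank_styles` are placeholders introduced for this sketch.

```python
def rank_styles(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort learning style identifiers by score evaluation value, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


collected = {"#1": 0.33, "#2": 0.25, "#3": 0.6}
print(rank_styles(collected))
# [('#3', 0.6), ('#1', 0.33), ('#2', 0.25)]
```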

[0181] The control algorithm unit 410 may, using the similarity search function 412, output a score evaluation value α for each learning style information 1100 relative to the reference sound source analysis information 4400 in step S203 of the flowchart in section (a) of Figure 23. The control algorithm unit 410 may, using the similarity search function 412, describe each learning style information 1100 as a pair of an identifier and a score evaluation value α, with the pairs arranged in order of similarity, for example, in JSON (JavaScript Object Notation) format (JavaScript is a registered trademark).
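As a non-limiting illustration, the JSON description of the ranked result may be sketched as follows. The text specifies only pairs of identifiers and score evaluation values; the field names "id" and "score", the identifiers, and the function name `to_json` are assumptions of this sketch.

```python
import json


def to_json(ranked: list[tuple[str, float]]) -> str:
    """Serialize (identifier, score) pairs, keeping their similarity order."""
    # Field names "id" and "score" are hypothetical.
    records = [{"id": identifier, "score": score} for identifier, score in ranked]
    return json.dumps(records, indent=2)


ranked = [("#3", 0.6), ("#1", 0.33), ("#2", 0.25)]
print(to_json(ranked))
```

Because a JSON array preserves element order, the receiving side can display the list as received without sorting it again.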

[0182] In the above description, the similarity search according to the embodiment is performed using a similar chord progression search function based on comparisons such as subcategory and major category comparisons of chord progressions, and comparisons of characteristic chord progressions, based on reference sound source analysis information 4400 and learning style information 1100. However, this is not limited to this example. For example, the similarity search may be performed using a similar melody search function based on the similarity of changes over time using concat, or a similar sound source search function based on overall comparison using mean, as disclosed in Patent Document 2 "International Publication No. 2024 / 042962".

[0183] (2-3. Modifications of the Embodiments) Next, modifications of the embodiments will be described.

[0184] In the above description, the processing in step S110 of the flowchart in Figure 11 assumes that the melody information 3303 in the composed music information 3300 is the main melody of the song, but this is not limited to this example. For example, the melody information 3303 can also be applied to the performance parts of each instrument in a full orchestra.

[0185] Furthermore, while the above description assumes that the user host application 30 is music editing software (DAW) in steps S111 and S112 of the flowchart in Figure 11, this is not limited to this example. For example, the user host application 30 may be video editing software for performing video editing.

[0186] (3. Summary) By applying the embodiments of this disclosure, the composition support plugin 40 can be inserted into the user host application 30, thereby communicating with the server device 10, downloading each learning style information 1100 stored in the server database DB 110 of the server device 10, and storing it in the plugin database DB 440 of the composition support plugin 40.

[0187] Furthermore, according to the embodiments of this disclosure, the learning style information 1100 can be stored in DB440 in MIDI file format, and the server device 10 can be operated without making any changes to, for example, an existing MIDI file sales site.

[0188] According to the embodiments of this disclosure, the selection of learning style information 1100 can be performed using either the learning style information proposal type processing in steps S102 to S104 of the flowchart in Figure 11, or the learning style information refinement search type processing in step S105. Furthermore, a process using both methods in combination can also be used.

[0189] According to the embodiments of this disclosure, when a learning style information suggestion type processing is used in selecting learning style information 1100, similar learning style information 1100 can be suggested from the reference sound source 50. In this case, the sound source analysis function 411 in the control algorithm unit 410 of the composition support plugin 40 can be used to analyze the reference sound source 50.

[0190] Furthermore, according to the embodiment of this disclosure, the sound source analysis information 4400 is displayed by the sound source analysis information display function 402 in the display operation unit 400 of the composition support plugin 40. This allows the user to specify a range for the reference sound source 50, such as "I want to compose a song inspired by this part of this song," using the sound source range instruction by the range / search instruction function 403 in the display operation unit 400 of the composition support plugin 40, and the learning style information search instruction, and to issue a search instruction to the similarity search function 412 in the control algorithm unit 410 of the composition support plugin 40.

[0191] In this case, the similarity search function 412 can use the similar sound source search function and the similar melody search function described in Patent Document 2, as well as the similar chord progression search process explained using the flowchart in Figure 23, either individually or in combination.

[0192] According to the embodiments of this disclosure, when a learning style information refinement search type processing is used in selecting the learning style information 1100, the display operation unit 400 of the composition support plugin 40 can use the tempo information 1112 in the learning style information 1100, as well as the music metadata 1113 in the learning style information 1100, such as "genre" and "mood," through the search results list display function 404.
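As a non-limiting illustration, a refinement search over tempo, "genre" and "mood" may be sketched as follows. The record type `StyleEntry`, its field names, the sample entries and the function `refine` are hypothetical and are introduced only to show the filtering idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StyleEntry:
    """Hypothetical record holding the searchable items of one learning style."""
    name: str
    bpm: float
    genre: str
    mood: str


def refine(
    entries: list[StyleEntry],
    bpm_range: Optional[tuple[float, float]] = None,
    genre: Optional[str] = None,
    mood: Optional[str] = None,
) -> list[StyleEntry]:
    """Keep only entries satisfying every condition that was specified."""
    result = []
    for entry in entries:
        if bpm_range is not None and not (bpm_range[0] <= entry.bpm <= bpm_range[1]):
            continue
        if genre is not None and entry.genre != genre:
            continue
        if mood is not None and entry.mood != mood:
            continue
        result.append(entry)
    return result


catalog = [
    StyleEntry("style-a", 125, "Rock", "Sad"),
    StyleEntry("style-b", 90, "Jazz", "Positive"),
    StyleEntry("style-c", 110, "Rock", "Positive"),
]
print([e.name for e in refine(catalog, bpm_range=(100, 130), genre="Rock")])
# ['style-a', 'style-c']
```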

[0193] According to the embodiments of this disclosure, in the learning style information suggestion type processing and the learning style information refinement search type processing, the user can search for their preferred learning style information 1100 while playing back and confirming the suggested or searched learning style information 1100. Furthermore, once the user has found their preferred learning style information 1100, they can obtain composed music information 3300 using the composition function 413 in the control algorithm unit 410 of the composition support plugin 40. By repeatedly performing the process of obtaining composed music information 3300 using this composition function 413 while playing back and confirming, the user can obtain their preferred composed music information 3300.

[0194] According to the embodiments of this disclosure, once a user obtains their preferred composed music information 3300 as described above, they can further edit the composed music information 3300 using the composition editing function 406 in the display operation unit 400 of the composition support plugin 40 to obtain edited composed music information 3300.

[0195] According to the embodiment of this disclosure, when a user obtains edited composed music information 3300, the user can export the edited composed music information 3300 for external use using the composition result export function 407 in the display operation unit 400 of the composition support plugin 40. The destination to which the edited composed music information 3300 is exported may be the user host application 30 or another application.

[0196] According to the embodiments of this disclosure, once the edited composed music information 3300 is written to the user host application 30, the user can use the editing control function 302 in the display operation unit 300 of the user host application 30 to further edit the edited composed music information 3300 and obtain the composed music information.

[0197] Furthermore, the effects described herein are merely illustrative and not limiting, and other effects may also occur.

[0198] Furthermore, this technology can also take the following configurations:

(1) An information processing device comprising: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts, based on the analysis results by the analysis unit, style information similar to the reference sound source from style information that has been created in advance and is used as training data when performing machine learning of a composition model; and a display unit that displays the style information extracted by the extraction unit.

(2) The information processing device according to (1), wherein the style information includes at least chord progression information, and the analysis unit analyzes the reference sound source and outputs at least the chord progression information of the reference sound source as an analysis result.

(3) The information processing device according to (2), wherein the extraction unit extracts style information similar to the reference sound source based on the similarity between the chord progression information based on the reference sound source and the chord progression information included in the style information.

(4) The information processing device according to (2) or (3), wherein the extraction unit extracts style information similar to the reference sound source based on the similarity between the pattern shown by the chord progression information of the reference sound source and the pattern shown by the chord progression information included in the style information.

(5) The information processing device according to any one of (2) to (4), wherein the style information further includes tempo information, the analysis unit analyzes the reference sound source and outputs the tempo information of the reference sound source as an analysis result, and the extraction unit further extracts style information similar to the reference sound source based on the similarity between the tempo information of the reference sound source and the tempo information included in the style information.

(6) The information processing device according to any one of (2) to (5), wherein the style information further includes metadata, the analysis unit analyzes the reference sound source and outputs metadata of the reference sound source as an analysis result, and the extraction unit further extracts style information similar to the reference sound source based on the similarity between the metadata of the reference sound source and the metadata included in the style information.

(7) The information processing device according to any one of (1) to (6), further comprising: a playback unit that generates musical data including performance information using the composition model based on the style information, and plays back the generated musical data, wherein the playback unit generates the musical data based on style information selected from the style information displayed on the display unit.

(8) The information processing device according to (7), wherein the playback unit plays back the musical data using a specified sound source.

(9) The information processing device according to (7) or (8), wherein the information processing device is incorporated into and used in another information processing device capable of editing the musical data.

(10) The information processing device according to (9), wherein the playback unit passes the generated musical data to the other information processing device.

(11) The information processing device according to any one of (1) to (10), wherein the extraction unit extracts style information similar to the reference sound source based on a range specified for the analysis result.

(12) The information processing device according to any one of (1) to (11), wherein the display unit displays the style information similar to the reference sound source extracted by the extraction unit according to the degree of similarity with the reference sound source.

(13) The information processing device according to any one of (1) to (12), wherein the extraction unit acquires the style information from a server device via a communication network.

(14) An information processing method executed by a processor, comprising: acquiring a reference sound source; analyzing the acquired reference sound source; extracting, based on the analysis results of the reference sound source, style information similar to the reference sound source from style information that has been created in advance and is used as training data when performing machine learning of a composition model; and displaying the extracted style information.

(15) An information processing program for causing a computer to function as: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts, based on the analysis results by the analysis unit, style information similar to the reference sound source from style information that has been prepared in advance and is used as training data when performing machine learning of a composition model; and a display unit that displays the style information extracted by the extraction unit.

[0199] 1 Production System
10 Server Devices
11, 110, 330, 440 DB
20, 20a, 20b Client Devices
30, 30a, 30b User Host Application
40 Composition Support Plugin
50 Reference Sound Source
60 Initial Screen
61 Analysis Execution Screen
62 Analysis Result Display Screen
70, 70a, 70b Learning Style Information List Display Screen
80 Composition Editing Screen
90 Editing Screen
100, 320, 430 Control Unit
101 Learning Style Information Management Function
102 User Management Function
103 Application Communication Function
300, 400 Display Operation Unit
301 Playback Instruction Function
302 Editing Control Function
310, 420 Playback Performance Unit
311 Playback Information Reception Function
312, 422 Playback Sound Source Function
321, 431 History Management Function
401 Sound Source Upload Function
402 Sound Source Analysis Information Display Function
403 Range / Search Instruction Function
404 Search Results List Display Function
405 Composition Instruction Function
406 Composition Editing Function
407 Composition Result Export Function
408 Search Filter Function
410 Control Algorithm Section
411 Sound Source Analysis Function
412 Similarity Search Function
413 Composition Function
421 Playback Information Sending Function
432 Server Communication Function
600 File Upload Area
601 File Selection Button
610 Meta Information Area
620 Analysis Result Display Area
621 Waveform Display
622 Analysis Range
627, 920 Cursor
700 Condition Setting Section
710 Tag Specification Section
720a, 720b List Display Area
740 Analysis Result Summary Display Area
750 Analysis Result Details Display Area
800 Style Name Display Area
810 Composition Control Area
811 Melody Editing Area
812 Composition command button
813 Composition parameter setting area
814, 902, 910 Playback control area
815 Tempo control area
816 Key control area
817 Volume adjustment knob
820 Composition aid area
821 Chord editing area
822 Rhythm editing area
830 History display area
831 Sound source assignment area
832a, 832b Output command button
900 Track editing area
9011, 9012, 9013 Track display
1100 Learning style information
1111 Key signature information
1112 Tempo information
1113 Song metadata information
1120 Music information within learning style information
1121, 3301 Chord progression information
1122, 3302 Bass progression information
1123, 3303 Melody information
3300 Composed song information

Claims

1. An information processing device comprising: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts, based on a result of the analysis by the analysis unit, style information similar to the reference sound source from style information that is created in advance and used as training data when performing machine learning of a composition model; and a display unit that displays the style information extracted by the extraction unit.

2. The information processing device according to claim 1, wherein the style information includes at least chord progression information, and the analysis unit analyzes the reference sound source and outputs at least the chord progression information of the reference sound source as an analysis result.

3. The information processing device according to claim 2, wherein the extraction unit extracts style information similar to the reference sound source based on the degree of similarity between the chord progression information of the reference sound source and the chord progression information included in the style information.

4. The information processing device according to claim 2, wherein the extraction unit extracts style information similar to the reference sound source based on the degree of similarity between the pattern shown by the chord progression information of the reference sound source and the pattern shown by the chord progression information included in the style information.

5. The information processing device according to claim 2, wherein the style information further includes tempo information, the analysis unit analyzes the reference sound source and outputs the tempo information of the reference sound source as an analysis result, and the extraction unit further extracts style information similar to the reference sound source based on the similarity between the tempo information of the reference sound source and the tempo information included in the style information.

6. The information processing device according to claim 2, wherein the style information further includes metadata, the analysis unit analyzes the reference sound source and outputs metadata of the reference sound source as an analysis result, and the extraction unit further extracts style information similar to the reference sound source based on the similarity between the metadata of the reference sound source and the metadata included in the style information.

7. The information processing device according to claim 1, further comprising: a playback unit that generates music data including performance information using the composition model based on the style information, and plays back the generated music data, wherein the playback unit generates the music data based on style information selected from the style information displayed on the display unit.

8. The information processing device according to claim 7, wherein the playback unit plays back the music data using a specified sound source.

9. The information processing device according to claim 7, wherein the information processing device is used by being incorporated into another information processing device capable of editing the music data.

10. The information processing device according to claim 9, wherein the playback unit passes the generated music data to the other information processing device.

11. The information processing device according to claim 1, wherein the extraction unit extracts style information similar to the reference sound source based on a specified range of the analysis results.

12. The information processing device according to claim 1, wherein the display unit displays the style information extracted by the extraction unit in accordance with its degree of similarity to the reference sound source.

13. The information processing device according to claim 1, wherein the extraction unit acquires the style information from a server device via a communication network.

14. An information processing method comprising: a processor acquiring a reference sound source; analyzing the acquired reference sound source; extracting style information similar to the reference sound source from pre-created style information used as training data when performing machine learning on a composition model, based on the analysis results of the reference sound source; and displaying the extracted style information.

15. An information processing program for causing a computer to function as: an analysis unit that acquires a reference sound source and analyzes the acquired reference sound source; an extraction unit that extracts, based on a result of the analysis by the analysis unit, style information similar to the reference sound source from style information that is created in advance and used as training data when performing machine learning of a composition model; and a display unit that displays the style information extracted by the extraction unit.
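
As a non-limiting illustration of the extraction recited in claims 3 to 5 and the similarity-ordered display of claim 12, the following is a minimal sketch in Python. It is not the disclosed implementation. The names `Style`, `to_pattern`, `chord_similarity`, `tempo_similarity`, and `extract_similar`, the weight `w_chord`, and the choice of a sequence-matching ratio as the similarity measure are all assumptions made for this example. In particular, the "pattern" of a chord progression is assumed here to mean the progression expressed relative to its first chord root, so that transposed progressions compare as equal.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Pitch classes for chord roots (assumed spelling: sharps and flats).
PITCH = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
         "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
         "Bb": 10, "B": 11}


@dataclass
class Style:
    """Hypothetical stand-in for one entry of learning style information."""
    name: str
    chords: list   # chord symbols such as ["C", "G", "Am", "F"]
    tempo: float   # beats per minute


def split_chord(symbol):
    """Split a chord symbol into (root pitch class, quality suffix)."""
    n = 2 if len(symbol) > 1 and symbol[1] in "#b" else 1
    return PITCH[symbol[:n]], symbol[n:]


def to_pattern(chords):
    """Express each chord relative to the first root (assumed 'pattern')."""
    if not chords:
        return []
    base, _ = split_chord(chords[0])
    return [((root - base) % 12, quality)
            for root, quality in map(split_chord, chords)]


def chord_similarity(a, b):
    """Similarity in [0, 1] between two progressions, compared as patterns."""
    return SequenceMatcher(None, to_pattern(a), to_pattern(b)).ratio()


def tempo_similarity(a, b):
    """Similarity in [0, 1] that falls as the relative tempo gap grows."""
    return max(0.0, 1.0 - abs(a - b) / max(a, b))


def extract_similar(ref_chords, ref_tempo, styles, top_k=3, w_chord=0.8):
    """Score every style against the analysis result, most similar first."""
    scored = [
        (w_chord * chord_similarity(ref_chords, s.chords)
         + (1.0 - w_chord) * tempo_similarity(ref_tempo, s.tempo), s)
        for s in styles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s.name, round(score, 3)) for score, s in scored[:top_k]]


if __name__ == "__main__":
    library = [
        Style("ballad", ["C", "Am", "F", "G"], 70),
        Style("blues", ["A7", "D7", "A7", "E7"], 90),
        Style("pop", ["C", "G", "Am", "F"], 118),
    ]
    # Analysis result of a reference sound source (values invented here).
    for name, score in extract_similar(["D", "A", "Bm", "G"], 120, library):
        print(name, score)
```

In this sketch, restricting the search to a specified range of the analysis result, as in claim 11, would amount to passing only the chords that fall within that range as `ref_chords`. The ordered list returned by `extract_similar` corresponds to what a display unit could present in order of similarity.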