Music key estimation method, music key estimation device, and music key estimation program
The method aggregates notes for candidate keys and uses the first and fifth notes to estimate music keys, addressing the challenge of key estimation without training data and providing a visually clear representation of key transitions.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- 藤澤 達矢
- Filing Date
- 2023-10-06
- Publication Date
- 2026-06-11
- Estimated Expiration
- Not applicable · inactive patent
AI Technical Summary
Existing methods struggle to accurately estimate the key of a music piece based on musical scores without training data, particularly in distinguishing between major and minor scales, and face inaccuracies when using electronically recorded metadata.
The method aggregates the constituent notes of each candidate key, selects the pair of relative keys with the largest aggregate result, and estimates the key from the first and fifth notes of each key in that pair, while taking continuity between sections into account and weighting the first and last sections.
Enables quantitative key estimation consistent with music theory, allowing for accurate determination of key changes and visual representation of key transitions.
Description
【Technical Field】
【0001】 The present invention relates to a method, a device, and a program for estimating the key of music.
【Background Art】
【0002】 The key of a piece of music is a term that combines the pitch class of the tonic with the type of scale (major or minor), as in C major or A minor, and represents the set and order of the notes basically used in that music.
【0003】 One method of determining the key is based on the type and number of sharps and flats in the key signature written at the beginning of a musical score. In addition, Patent Document 1 discloses a key estimation method in which the audio signal of a music piece is subjected to short-time frequency analysis to detect pitch feature amounts, the time-series information of the detected pitch feature amounts is learned using learning data, a search is performed using a plurality of hidden Markov models (HMMs), one for each key, each modeled with a plurality of harmonic states, and the key of the HMM with the highest likelihood is estimated as the key of the music piece.
【Prior Art Documents】
【Patent Documents】
【0004】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2007-041234
【Summary of the Invention】
【Problems to be Solved by the Invention】
【0005】 According to the prior art, a key can be estimated from the sound source of a music piece through learning with learning data.
【0006】 On the other hand, when the key is to be estimated from a source other than a sound source, such as a musical score, without using training data, a determination based on the key signature of the score makes it difficult to distinguish between major and minor, because relative major and minor keys share the same key signature. Generally, major keys are described subjectively as sounding bright and minor keys as sounding melancholic, but it is desirable to distinguish major from minor quantitatively.
【0007】 It is also desirable to estimate whether a piece is in a major or minor key from electronically recorded musical note data, such as MIDI® (Musical Instrument Digital Interface) data. While standards such as MIDI® allow the key signature and scale to be recorded as metadata, this information is merely added by the creator and may not be accurate. Moreover, it may not be recorded in the music data at all.
【0008】 The present invention was made in view of the above circumstances, and its object is to provide a novel technique for quantitatively estimating the key, without using training data, on a basis consistent with music theory.
【Means for Solving the Problem】
【0009】 To solve the above problem, the present invention is a method for estimating the key of music, wherein a computer aggregates, from the musical notes indicated by music information, the constituent notes of each candidate key, selects at least one pair of relative keys with the largest aggregate result, aggregates the first and fifth notes of each key in the selected pair of relative keys, and estimates the key with the largest result as the key of the music.
【0010】 According to the present invention, the key of a piece of music can be estimated quantitatively, together with a basis consistent with music theory. Furthermore, the key can be estimated without using training data.
【0011】 Furthermore, in a preferred embodiment of the present invention, the computer divides the music into multiple sections at arbitrary time intervals, estimates the key for each section, and estimates the key of the music from the estimation results for the sections. In some songs the key changes (modulates) partway through; this embodiment makes it possible to estimate the key of each section in chronological order.
【0012】 Furthermore, in a preferred embodiment of the present invention, if the computer selects two or more pairs of relative keys with the largest aggregate result for a given section, it selects the relative keys for that section using information from at least one of the preceding or succeeding sections. There is continuity between sections, and the preceding key can usually be assumed to continue. Therefore, even for a section in which a single pair of relative keys could not be selected, the pair can be estimated from the content of the preceding and succeeding sections.
【0013】 In a preferred embodiment of the present invention, the computer weights the estimation results for the first and last sections twice as heavily as those for the other sections. This allows the key of the music to be estimated with emphasis on the first and last sections.
【0014】 In a preferred embodiment of the present invention, the computer removes octave information from the musical notes indicated by the music information and aggregates the constituent notes for each candidate key. Notes are thus counted by pitch class regardless of octave, which is suitable for key estimation.
【0015】 In a preferred embodiment of the present invention, the computer displays a display screen based on the music information. The display screen comprises a note counting result display area, a constituent note aggregation result display area, and an estimation result display area. The note counting result display area displays the number or proportion of notes of each pitch class included in the music, in a different color for each pitch class. The constituent note aggregation result display area displays the number or proportion of constituent notes for each key candidate, in the color of the pitch class that is the tonic of the key. The estimation result display area displays the number or proportion of the first and fifth notes of each key in the selected pair of relative keys, in the colors of the first and fifth pitch classes.
This allows the key to be estimated quantitatively on a basis consistent with music theory, and the result to be presented in a visually clear manner.
【0016】 Furthermore, in a preferred embodiment of the present invention, the computer divides the music into multiple sections of equal time intervals, and the note counting result display area, the constituent note aggregation result display area, and the estimation result display area display the number or proportion of notes for each section, together with the colors. This allows the key to be estimated for each section in time series and presented in a visually clear manner.
【0017】 In a preferred embodiment of the present invention, the computer divides the music into multiple sections of equal time intervals and displays a display screen based on the music information. The display screen comprises a note counting result display area and a constituent note aggregation result display area. The note counting result display area displays, for each section, the number or proportion of notes of each pitch class included in the music, in a different color for each pitch class. The constituent note aggregation result display area treats the two keys of a relative pair together as one key candidate, displays the number or proportion of constituent notes for each key candidate and each section, and uses the color of the pitch class of the estimated key's tonic, chosen from the two tonics of the relative keys. As a result, the key can be estimated for each section in time series and presented in a visually clear manner.
【Advantages of the Invention】
【0018】 According to the present invention, it is possible to provide a novel technique for quantitatively estimating a key on a basis consistent with music theory.
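The weighting described in paragraph 【0013】 can be illustrated with a minimal Python sketch. The text states only that the first and last sections are weighted twice as much as the others; combining the per-section results by a weighted vote, the function name `estimate_song_key`, and the key labels used in the example are assumptions made for illustration.

```python
from collections import defaultdict


def estimate_song_key(section_keys):
    """Combine per-section key estimates into one key for the whole song.

    Assumption (not stated in the text): the combination is a weighted vote.
    The first and last sections count twice; all other sections count once.
    Sections whose key could not be estimated are passed as None and skipped.
    """
    weights = [1.0] * len(section_keys)
    if weights:
        weights[0] = 2.0
        weights[-1] = 2.0  # same element as weights[0] if there is one section

    totals = defaultdict(float)
    for key, weight in zip(section_keys, weights):
        if key is not None:
            totals[key] += weight
    if not totals:
        return None
    # On a tie, max() returns the key that was seen first.
    return max(totals, key=totals.get)


# Hypothetical per-section results: an unweighted vote would be tied 2:2.
sections = ["A minor", "C major", "C major", "A minor"]
print(estimate_song_key(sections))
```

With equal weights the two keys in this hypothetical example would tie; doubling the first and last sections lets the opening and closing sections decide the result.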
【Brief Description of the Drawings】
【0019】
[Figure 1] A functional block diagram according to an embodiment of the present invention.
[Figure 2] A hardware configuration diagram of a key estimation device according to an embodiment of the present invention.
[Figure 3] An overall processing flowchart of key estimation according to an embodiment of the present invention.
[Figure 4] An example of a display screen according to an embodiment of the present invention.
[Figure 5] An example of a display screen according to an embodiment of the present invention.
[Figure 6] A processing flowchart of major key selection according to an embodiment of the present invention.
[Figure 7] A final processing flowchart of key estimation according to an embodiment of the present invention.
[Figure 8] An example of a display screen according to an embodiment of the present invention.
[Figure 9] An example of a display screen according to an embodiment of the present invention.
[Figure 10] An example of a display screen according to an embodiment of the present invention.
[Figure 11] An example of a display screen according to an embodiment of the present invention.
[Figure 12] An example of a display screen according to an embodiment of the present invention.
[Figure 13] An example of a display screen according to an embodiment of the present invention.
【Modes for Carrying Out the Invention】
【0020】 The following describes a music key estimation device according to an embodiment of the present invention with reference to the drawings. Note that the embodiments shown below are examples of the present invention, and the present invention is not limited to these embodiments.
While this embodiment describes the configuration and operation of the music key estimation device, similar effects can be achieved by a method executed by the key estimation device (music key estimation method), a computer program executed by the key estimation device (music key estimation program), and the like. The computer program may be provided on a computer-readable, non-transitory recording medium.
【0021】 As illustrated in Figure 1, the key estimation device 1 includes, as functional components described in detail later, a music information receiving unit 2, a section setting unit 3, a note counting unit 4, a candidate aggregation unit 5, a key estimation unit 6, and a display processing unit 7.
【0022】 In this embodiment, the key estimation device 1 is a computer such as a personal computer, smartphone, or tablet that functions as these functional components by executing a computer program (music key estimation program) stored on its local drive. Alternatively, the key estimation device 1 may be a server device that functions as these functional components by executing a computer program (music key estimation program) stored on a local drive. In this case, for example, a user of the application can access the server via a network using a specific ID and password and use the application in a designated browser (a so-called web application).
【0023】 More specifically, the key estimation device 1 includes the hardware components illustrated in Figure 2: a CPU (Central Processing Unit) 11 as a processor, RAM (Random Access Memory) 12 as working memory, and ROM (Read Only Memory) 13 that stores the boot program for startup.
【0024】 The key estimation device 1 further comprises a non-volatile flash memory 14 for rewritable storage of the OS (Operating System), application programs, and various information (including data), a communication control unit 15, and a communication interface (IF) unit 16 such as a NIC (Network Interface Card). The key estimation device 1 also further comprises a display control unit 17, a display unit 18, and an information input / specification unit 19, etc. 【0025】 In order to logically implement the above-mentioned functional components (2-7) in the key estimation device 1, the key estimation program is installed as an application program in the flash memory 14. Then, in the key estimation device 1, upon power-on, the processor (CPU) 11 constantly loads this processing program into the RAM 12 and executes it. 【0026】 <Details on how to estimate the key of a song> Next, with reference to Figures 3-13, we will describe in more detail the key estimation device 1, which estimates the key of a song from song information. 【0027】 The display processing unit 7 processes at least one display screen (see reference numeral 41 in Figures 4, 5, 8-13) for display. The display control unit 17 receives the display processing result and causes the display unit 18 to display the display screen 41. In each figure, each component displayed on the display screen 41 is shown virtually separated by a dotted line frame. When the music key estimation device 1 is used in the form of a web application, the display processing unit 7 sends predetermined resource information written in HTML, CSS, JavaScript (registered trademark), etc., back to the web browser of the terminal device used by the application user. 【0028】 The music information receiving unit 2 receives music information (see processing S31 in Figure 3). 
The music information is electronically recorded information containing the musical notes and the timing information for each note, and is included, for example, in an electronic music score data file. In this embodiment, the electronic music score data file is assumed to be MIDI® data, but it is not limited to this and may be, for example, an electronic music score data file in MusicXML or another format.
【0029】 In this embodiment, as shown in Figure 4, the display screen 41 includes a music information registration area 42, through which various files stored in the flash memory 14 or the like can be accessed. The user operates the music information registration area 42 to select the electronic music score data file to be used for key estimation, and the music information receiving unit 2 receives the selected file. When the music information receiving unit 2 reads the MIDI® data (electronic music score data file) for the target music, it obtains, for each note-on event in which a sound is produced, the note number representing the pitch and the time data (timing information).
【0030】 The section setting unit 3 divides the song into predetermined sections so that the key estimation described later can be performed on each section. Here, the section setting unit 3 divides the registered song into 10 time frames (sections) of equal length. The number of time frames is arbitrary, and the song may instead be divided into time frames based on any time interval (e.g., 5 seconds) or number of measures (e.g., 4 measures). However, key estimation is clearly impossible if the number of pitch classes is insufficient, for example, if there is only one note.
Likewise, if there are many notes but they are biased toward particular pitch classes, multiple candidates may arise and make estimation difficult, so dividing the song into very short time frames is not advisable. Conversely, if a modulation occurs in the middle of a time frame and multiple keys are present, their data may influence each other and lead to an inaccurate estimation, so overly long time frames are also undesirable. The song therefore needs to be divided into time frames of appropriate length, but the time frames do not have to be of equal length. If they differ in length, however, the length of each time frame must be taken into account when estimating the key of the entire piece, and the key that appears for the longest duration is used as the key of the entire piece. In other words, when estimating the key of a piece of music, some or all of the time frames (sections) may differ in length. In this case, the section setting unit 3 may accept the length (time) of each time frame as set by the user, or it may set the length (time) of each time frame dynamically according to the number of notes included in the piece. On the other hand, even if a song contains temporary biases in pitch class or modulations, key estimation is considered possible as long as there is a single central key within the time frame and a sufficient number of samples are available.
【0031】 The note counting unit 4 counts the number of each note included in the music (see process S32 in Figure 3).
Here, the note counting unit 4 removes the octave information from the note numbers obtained from the MIDI® data (electronic music score data file) and, for each of the above-mentioned time frames, counts the notes of each of the 12 pitch classes and divides each count by the total number of notes to calculate the note composition ratio of each pitch class.
【0032】 The display processing unit 7 displays the counting results for the corresponding time frames on the display screen. As shown in Figure 4, the display screen 41 includes a note counting result display area 44, which displays, for each section, the composition ratio of each pitch class calculated by the note counting unit 4. In addition, a graph representing the relationship between notes and time is displayed overlaid on the number of notes or the composition ratio. Here, the composition ratio is calculated by dividing the count of each pitch class by the total number of notes, but this step is not essential; the raw note counts of each pitch class may instead be aggregated and compared for the estimation. Also, the number of divisions is not limited to 10.
【0033】 In the note counting result display area 44, time progresses downward on the screen. In the figure, "Time %" indicates the position of each time frame as a percentage when the entire song is taken as 100, with 10 time frames arranged in increments of 10 percent. "Measure," next to the time frame, indicates the measure number within the song; here, the number of the last measure included in the time frame is displayed. In each time frame, notes are drawn as horizontal lines in a graph according to the timing of their appearance. Each horizontal line is drawn in a color that differs for each pitch class.
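The section division and pitch-class counting described in paragraphs 【0030】 to 【0032】 can be sketched in Python as follows. This is a minimal illustration, not the device's implementation: the function names, the `(time, note_number)` event format, and the use of the last note-on time as the end of the song are assumptions. Reducing a MIDI note number modulo 12 is the usual way to discard the octave.

```python
from collections import Counter


def split_into_frames(note_events, num_frames=10):
    """Split (time, midi_note_number) events into equal-length time frames.

    Assumption: the song length is taken as the time of the last note-on.
    """
    frames = [[] for _ in range(num_frames)]
    if not note_events:
        return frames
    end_time = max(t for t, _ in note_events)
    for t, note in note_events:
        if end_time > 0:
            # Events at the very end of the song fall into the last frame.
            index = min(int(t / end_time * num_frames), num_frames - 1)
        else:
            index = 0
        frames[index].append(note)
    return frames


def pitch_class_ratios(notes):
    """Drop the octave (note number mod 12) and return each pitch class's share."""
    counts = Counter(n % 12 for n in notes)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 12
    return [counts.get(pc, 0) / total for pc in range(12)]


# Hypothetical note-on events: (time in seconds, MIDI note number).
events = [(0.0, 60), (1.0, 72), (2.0, 67), (9.0, 62)]
frames = split_into_frames(events, num_frames=3)
ratios = pitch_class_ratios(frames[0])  # index 0 = C, 7 = G
print(frames)
print(ratios)
```

In this hypothetical input, note numbers 60 and 72 are both counted as C because the octave is discarded before counting.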
In the embodiment shown in Figure 4, 12 colors of gradually changing hue (red~yellow~green~blue~purple~pink) are assigned to the 12 pitch classes from C to B, so that the notes appearing in each time frame are color-coded by pitch class. The graph in the note counting result display area 44 therefore makes it possible to see at a glance when notes of each pitch class appear in the song, and which pitch classes appear frequently, appear infrequently, or are not used. The pitch class colors are not limited to these.
【0034】 The candidate aggregation unit 5 aggregates the constituent notes for each key candidate (see processing S33 in Figure 3). Here, for each of the 10 time frames, the candidate aggregation unit 5 sums and displays the proportion of notes that belong to the scale of each pair of relative keys (major scale and minor scale), taking each of the 12 pitch classes in turn as the tonic. Alternatively, the total number of corresponding notes may be aggregated and compared for the estimation. Covering every major and minor key for the 12 pitch classes would give 24 combinations, but because each major key shares its constituent notes with one minor key (its relative key), only 12 calculations are actually performed. In the following, the scales of two keys in a relative key relationship are written in the form "major scale (minor scale)".
【0035】 As shown in Figure 4, the display screen 41 includes a constituent note aggregation result display area 45, which displays, for each section, the composition ratio of the constituent notes of each scale as aggregated by the candidate aggregation unit 5. For example, in the note counting result display area 44, the 1st to 5th, 9th, and 10th time frames (Time %=10~50, 90, and 100) each use only "G, A, B, C, D, E, F#", all of which are notes of the G major scale (Em scale).
When the candidate aggregation unit 5 sums the composition ratios of these notes from the note counting result display area 44, the notes of the G major scale (Em scale) are found to make up 100%. However, for the third time frame (Time %=30), the sum for the constituent notes of the C major scale (Am scale) is also 100%. This is because E and F#, whose composition ratios are positive in the 1st, 4th, 6th, 7th, 8th, 9th, and 10th time frames, both have a ratio of 0 in the third time frame, so the pitch classes used are limited to "C, D, G, A, B" and are therefore biased. As a result, the scale could be either the C major scale (Am scale) or the G major scale (Em scale). In such cases, if one of the candidates matches the key of the previous time frame, the time frame is determined to be in that key. If no determination can be made from the previous time frame, the same check is made against the key of the subsequent time frame. In the second and fifth time frames (Time %=20, 50), the proportion of E is 0 but the proportion of F# is positive. The sum of the proportions of the notes of the G major scale (Em scale) is therefore 100%, and these time frames are determined to be on the G major scale (Em scale). Based on the scale of the second time frame, the third time frame is also determined to be on the G major scale (Em scale). In the 6th and 7th time frames (Time %=60, 70), C, which was used until then, becomes 0, while C#, which was not used until then, becomes positive. The pitch classes used change to "C#, D, E, F#, G, A, B," and only the sum for the constituent notes of the D major scale (Bm scale) reaches 100%, which suggests that a modulation has occurred. In the eighth time frame (Time %=80), two keys are present because of another modulation, but the influence of the key before the modulation is small, and the total proportion of the constituent notes of the G major scale (Em scale) is the single highest at 94%.
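The aggregation of constituent notes over the 12 pairs of relative keys can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names and the tie tolerance are hypothetical, and the note counts in the example are made up to reproduce the situation of the third time frame, where only C, D, G, A, and B are used.

```python
NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitones above the major tonic


def relative_pair_name(major_tonic):
    """Label a pair of relative keys in the 'major (minor)' form used in the text."""
    minor_tonic = (major_tonic + 9) % 12  # relative minor: a major sixth above
    return f"{NAMES[major_tonic]} major ({NAMES[minor_tonic]}m)"


def scale_membership_sums(weights):
    """Sum the weights (counts or ratios) of the notes in each of the 12 scales.

    weights[pc] is the count or composition ratio of pitch class pc (0 = C).
    A major key and its relative minor share one scale, so 12 sums suffice.
    """
    sums = {}
    for tonic in range(12):
        scale = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
        sums[tonic] = sum(weights[pc] for pc in scale)
    return sums


def best_pairs(weights, tol=1e-9):
    """Return the major tonics of every pair that ties for the largest sum."""
    sums = scale_membership_sums(weights)
    top = max(sums.values())
    return [tonic for tonic, s in sums.items() if abs(s - top) <= tol]


# Hypothetical counts for a frame that uses only C, D, G, A, B.
frame3 = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1]
# Hypothetical counts for a frame that uses G, A, B, C, D, E, F#.
g_major_frame = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1]

print([relative_pair_name(t) for t in best_pairs(frame3)])
print([relative_pair_name(t) for t in best_pairs(g_major_frame)])
```

The first frame leaves two candidate pairs, which is exactly the ambiguity that the text resolves by looking at the neighboring time frames; the second frame leaves a single pair.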
【0036】 The pair of relative keys with the largest number or proportion of constituent notes is considered the most suitable. The candidate aggregation unit 5 therefore selects the key (relative keys) with the largest aggregated value and marks it with a colored marker. Here, the display processing unit 7 colors the selected relative keys according to the pitch class of the tonic of the major key. The key transitions within the song can thus be grasped visually from the colors in the constituent note aggregation result display area 45. When multiple pairs of relative keys are candidates, each candidate is colored (see the constituent note aggregation result display area 45 in Figure 4), but the display may instead be configured to color only the relative keys selected by the processing described later.
【0037】 A major key and a minor key that consist of the same combination of notes but differ in their order are called relative keys. It therefore remains to estimate which of the two, major or minor, is more appropriate for the relative keys selected above.
【0038】 As an example, consider C major and A minor, which are relative keys. In C major, with C as the first note, the main chords (primary triads) are the chords formed by stacking two notes, each a third above the one below, on the 1st (C), 4th (F), and 5th (G) notes. All three of these chords are major chords, and the fact that they are primarily used indicates that the key is C major. Likewise, in A minor, with A as the first note, the main chords are those formed by stacking two thirds on the first (A), fourth (D), and fifth (E) notes. All of these are minor chords, and the fact that they are primarily used indicates that the music is in A minor.
【0039】 If a song consisted only of the three main chords, it would have the following characteristics. The three main chords of C major are CEG, FAC, and GBD.
Here, only C and G appear twice, while the other notes appear only once. C and G are the 1st and 5th notes of the C major scale. The three main chords of A minor are ACE, DFA, and EGB. Here, only A and E appear twice, while the other notes appear only once. A and E are the 1st and 5th notes of the A minor scale.
【0040】 Therefore, if a single central key exists and a sufficient number of samples are available, the pitch class composition is expected to show the characteristics described above. By comparing the total number of 1st and 5th notes between two relative keys, which consist of the same combination of notes, it is possible to estimate whether the key is major or minor. However, although the estimation assumes a single key, a song may modulate and thus contain multiple keys. When multiple keys are present, they may influence each other and lead to an incorrect estimation, which must be avoided. Since it cannot be known in advance whether a modulation occurs, the song is mechanically divided into multiple time frames, the key is estimated for each time frame, and the results are combined to estimate the key of the entire song. Furthermore, when there are multiple candidates for the key of a time frame, a candidate that matches the key of the previous time frame is adopted. If none matches the previous time frame, a candidate that matches the key of the next time frame is adopted. This is because there is continuity between time frames, and the preceding key can usually be assumed to continue. If no candidate matches either the preceding or the succeeding time frame, key estimation is performed separately on a time frame formed by connecting the target time frame with the preceding and succeeding time frames.
The estimation results for the undetermined time frame and for the preceding and succeeding time frames are then replaced with the estimation result for the connected time frame. This is because multiple candidates most likely appear when the time frame is short and there is not enough data to estimate the key. However, in some pieces the pitch classes used are biased, so multiple candidate pairs of relative keys remain even when the time frame is extended to cover the entire piece. For example, whereas the diatonic scale consists of seven notes, the pentatonic scale consists of five: it omits the fourth and seventh pitch classes in the case of a major scale, and the second and sixth pitch classes in the case of a minor scale. Multiple candidate pairs of relative keys therefore appear. Many other scales also exist. In such cases, the sum of the proportions of the first and fifth pitch classes is calculated for every key in the candidate pairs of relative keys, and the key with the largest value is estimated as the key of the song (see processing S3410 in Figure 6 below). In this example, the measure numbers of the target range are entered into the text box, and the K button is pressed to run the program (see Figures 10 and 13 below). As an example, consider estimating the key of a song in which only five pitch classes (C, D, F, G, and A) are used throughout the entire piece. There are three candidate pairs of relative keys, and for all three the sum of the constituent notes is 100%: the C major scale (Am scale), the F major scale (Dm scale), and the Bb major scale (Gm scale). Comparing the combined proportions of the first and fifth notes of each key in the three pairs gives F major as the estimated key (see Figure 13 below). This is presumably a major pentatonic scale with the 4th and 7th pitch classes omitted.
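The reasoning of paragraphs 【0038】 to 【0040】 can be checked with a short Python sketch. The first function confirms that the first and fifth notes are the only ones appearing twice in the three primary triads; the second compares the combined weight of the first and fifth notes across the candidate keys. The function names are hypothetical, the natural minor scale is assumed for the minor triads, and the note counts in the pentatonic example are invented for illustration (the actual data of Figure 13 is not reproduced here).

```python
from collections import Counter

NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor (assumption)


def primary_triad_counts(tonic, mode):
    """Count how often each pitch class occurs in the triads on degrees 1, 4, 5."""
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    scale = [(tonic + s) % 12 for s in steps]
    counts = Counter()
    for degree in (0, 3, 4):  # 1st, 4th, 5th notes (zero-based)
        for offset in (0, 2, 4):  # root, third above, third above that
            counts[scale[(degree + offset) % 7]] += 1
    return counts


def estimate_key(weights, major_tonics):
    """Return the key(s) with the largest first-plus-fifth weight.

    weights[pc] is the count or ratio of pitch class pc; major_tonics lists
    the major tonics of the candidate relative pairs. Ties return several keys.
    """
    scores = {}
    for major_tonic in major_tonics:
        minor_tonic = (major_tonic + 9) % 12
        for tonic, mode in ((major_tonic, "major"), (minor_tonic, "minor")):
            fifth = (tonic + 7) % 12
            scores[(NAMES[tonic], mode)] = weights[tonic] + weights[fifth]
    if not scores:
        return []
    top = max(scores.values())
    return [key for key, score in scores.items() if score == top]


c_major = primary_triad_counts(0, "major")
a_minor = primary_triad_counts(9, "minor")
print({NAMES[pc] for pc, n in c_major.items() if n == 2})
print({NAMES[pc] for pc, n in a_minor.items() if n == 2})

# Hypothetical counts for a piece using only C, D, F, G, A.
pentatonic = [25, 0, 15, 0, 0, 30, 0, 15, 0, 15, 0, 0]
print(estimate_key(pentatonic, [0, 5, 10]))  # candidates: C, F, Bb major pairs
```

With these invented counts, F and C carry the most weight, so F major scores highest among the six keys of the three candidate pairs; different counts could of course favor a different key.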
【0041】 Accordingly, the key estimation unit 6 selects the pair of relative keys with the largest aggregate result from the candidate aggregation unit 5 (see process S34 in Figure 3, and processes S3400-S3403 and S3411 in Figure 6). Then, for each key of the pair, it aggregates the number or composition ratio of the first and fifth notes and estimates the key with the largest value to be the key of that time frame (see process S35 in Figure 3, and processes S351-S353 and S359 in Figure 7). When selecting relative keys, if multiple pairs are candidates based on the aggregation results for a given section, the key estimation unit 6 selects the relative keys for that section from the estimation results of the immediately preceding or succeeding section. Specifically, if multiple pairs of relative keys are candidates for a given time frame n (n=1, 2, 3, ...N) (see NO in process S3402 in Figure 6), the key estimation unit 6 first determines whether the relative keys of the immediately preceding time frame (n-1) are among the candidates. If they are (see YES in process S3404 in Figure 6), it selects the relative keys of time frame (n-1) as the relative keys of time frame (n) (see process S3405 in Figure 6). If they are not (see NO in process S3404 in Figure 6), it determines whether the relative keys of the immediately following time frame (n+1) are among the candidates (see process S3406 in Figure 6). If they are (see YES in process S3406 in Figure 6), the relative keys of time frame (n+1) are selected as the relative keys of time frame (n) (see process S3407 in Figure 6).
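The preceding-then-following rule of processes S3404 to S3407 can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name is hypothetical, time frames are processed in order so that a time frame resolved earlier can serve as the preceding result, and only an unambiguous following time frame is used for the second check. Time frames that remain undetermined are returned as `None`, corresponding to the case handled next by expanding the time frame.

```python
def resolve_by_neighbors(candidates_per_frame):
    """Pick one candidate per time frame using the neighboring time frames.

    candidates_per_frame[n] lists the candidate relative pairs of frame n.
    A frame with exactly one candidate is already decided. For an ambiguous
    frame, the result of frame n-1 is preferred, then that of frame n+1.
    """
    selected = [c[0] if len(c) == 1 else None for c in candidates_per_frame]
    for n, candidates in enumerate(candidates_per_frame):
        if selected[n] is not None:
            continue
        prev_key = selected[n - 1] if n > 0 else None
        next_key = selected[n + 1] if n + 1 < len(selected) else None
        if prev_key in candidates:
            selected[n] = prev_key  # corresponds to S3404 YES -> S3405
        elif next_key in candidates:
            selected[n] = next_key  # corresponds to S3406 YES -> S3407
    return selected


# Hypothetical candidates: the third frame is ambiguous between C and G.
frames = [["G"], ["G"], ["C", "G"], ["G"], ["D"]]
print(resolve_by_neighbors(frames))
```

In this hypothetical input the third time frame inherits G from the second, mirroring the example in which the third time frame is assigned the G major scale (Em scale) on the basis of the second.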
If the key estimation unit 6 cannot select the relative keys for a given section from the estimation results of the preceding or succeeding section (see NO in process S3406 in Figure 6), the number of best pairs of relative keys is displayed in the Comments section of the constituent note aggregation result display area 45. In that case, the note counts of the section are added to those of at least one adjacent section (that is, the section is connected with an adjacent section to extend the time frame), the constituent notes are aggregated again for all key candidates, and the relative keys are selected again from this result. If expanding the time frame leaves only one candidate pair of relative keys (see YES in processing S3408 in Figure 6), the selected pair is treated as the relative keys of both the section in question and the connected section (see processing S3409 in Figure 6). If expanding the time frame does not leave only one candidate pair (see NO in processing S3408 in Figure 6), the composition ratio of the first and fifth pitch classes is tallied for every key in the candidate pairs of the expanded time frame, and the key with the largest tally is set as the key of the expanded time frame (see processing S3410 in Figure 6). The time frame may be expanded by connecting it to the previous and / or subsequent time frames, or the entire musical piece may be treated as a single time frame. If the expanded time frame exceeds 50% of the entire song, the key estimation result for the expanded time frame becomes the key for the entire song. If it is less than 50%, the process proceeds to the next step, S35.
Alternatively, for all of the pairs of parallel keys selected as candidates for time frame (n), the composition ratio of the first and fifth pitch classes may be tallied, and the key with the largest tallied result may be used as the key for that time frame. The expansion may also be performed in stages. For example, for the time frame ((n-1)+n) obtained by connecting time frame (n) to the immediately preceding time frame (n-1), it is determined whether only one pair of parallel keys remains as a candidate (see process S3402 in Figure 6), or whether the candidates can be narrowed to one pair by comparison with the parallel keys of the preceding time frame (n-2) and the succeeding time frame (n+1) (see processes S3404-S3406 in Figure 6), and time frames continue to be connected in this way until the entire song becomes the expanded time frame. Alternatively, the section setting unit 3 may divide the song into time frames longer than the current ones, and the processing by the note counting unit 4, the candidate aggregation unit 5, and the key estimation unit 6 may be repeated. As an example, the estimation results are shown for time frames of 20% and 25%. Furthermore, when estimating a key from a pair of parallel keys, if both keys become candidates from the aggregation results for a given interval, that is, if the aggregate results of both keys are the same or can be regarded as the same, the key estimation unit 6 determines the key for that interval from the estimation results of the interval immediately preceding or following it. If both keys become candidates from the aggregate results of a given time frame n (n = 1, 2, 3, ..., N) (see NO in process S353 in Figure 7), the key estimation unit 6 first determines whether the estimation result (key) of the immediately preceding time frame (n-1) is included among the candidates.
If it is included (see YES in process S354 in Figure 7), the estimation result (key) of time frame (n-1) is selected as the estimation result (key) of time frame (n) (see process S355 in Figure 7). On the other hand, if it is not included (see NO in process S354 in Figure 7), it is determined whether the estimation result (key) of the immediately following time frame (n+1) is included among the candidates (see process S356 in Figure 7). If it is included (see YES in process S356 in Figure 7), the estimation result (key) of time frame (n+1) is selected as the estimation result (key) of time frame (n) (see process S357 in Figure 7). If the key estimation unit 6 cannot select a key for a given interval from the estimation results of the immediately preceding or immediately following interval (see NO in process S356 in Figure 7), it instead adds the note totals of that interval to the note totals of at least one of the adjacent intervals (i.e., connects the interval with an adjacent interval), aggregates the number or composition ratio of the first and fifth notes for each key, and estimates the key with the largest value to be the key for the connected time frame. The key estimation unit 6 uses the key determined from this aggregation result as the key of both the given interval and the connected interval. Alternatively, the section setting unit 3 may divide the music into time frames longer than the current ones, and the processing by the note counting unit 4, the candidate aggregation unit 5, and the key estimation unit 6 may be repeated. As an example, the estimation results are shown for time frames of 20% and 25%. 【0042】 As shown in Figure 4, the display screen 41 includes an estimation result display area 46, which displays, for each key, the number or composition ratio of the first and fifth notes aggregated by the key estimation unit 6.
For example, the first and fifth notes of G major are G and D, and the first and fifth notes of E minor are E and B. Referring to the estimation result display area 46 for the first time frame (Time % = 10), the G+D value for the major key is 42.86 and the E+B value for the minor key is 28.57, so the major value is larger by 14.29. Therefore, G major is estimated as the key for the first time frame. Furthermore, in this embodiment, the estimation results of the first and last time slots are counted once more in addition to the 10 time slots, and among the resulting 12 estimation results, the single key that appears most often (i.e., for the longest duration) is estimated to be the key for the entire song (see process S360 in Figure 7). Since the first and last parts of a song are more important than the other parts, this gives the estimation results of the first and last time slots twice the weight of the other time slots. That is, in this embodiment, the key estimation unit 6 determines the key for the entire song by counting the keys estimated for the individual time slots. Since 10 of the 12 results are G major and 2 are D major, the key estimation unit 6 estimates the key for the entire song to be G major. 【0043】 At this time, the display processing unit 7 displays colored graphs corresponding to the first and fifth values, respectively, overlaid on the displayed numbers. A stacked bar graph is displayed using the aggregated results of the first and fifth notes of each scale and the corresponding colors (see "Major 1st+5th" and "Minor 1st+5th" in the figure).
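The comparison of first and fifth notes within a pair of parallel keys and the doubled weighting of the first and last time slots, described in paragraph 【0042】, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: the function names are hypothetical, notes are given as pitch classes 0 to 11 (C = 0), a key is written as a `(tonic, mode)` tuple, and a tie is reported as `None` so that the neighbor-based fallback of process S353 onward could be applied separately.

```python
from collections import Counter


def first_fifth_ratio(pitch_classes, tonic):
    """Percentage of notes that are the 1st or 5th degree of the key on tonic."""
    counts = Counter(pc % 12 for pc in pitch_classes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    fifth = (tonic + 7) % 12  # a perfect fifth above the tonic
    return 100.0 * (counts[tonic % 12] + counts[fifth]) / total


def estimate_key_in_pair(pitch_classes, major_tonic):
    """Pick the major or minor key of one pair by its 1st+5th ratio."""
    minor_tonic = (major_tonic + 9) % 12  # e.g. G major (7) -> E minor (4)
    major = first_fifth_ratio(pitch_classes, major_tonic)
    minor = first_fifth_ratio(pitch_classes, minor_tonic)
    if major > minor:
        return (major_tonic, "major")
    if minor > major:
        return (minor_tonic, "minor")
    return None  # tie: left to the neighboring-frame fallback


def overall_key(frame_keys):
    """Vote over per-frame keys, counting the first and last frames twice."""
    if not frame_keys:
        return None
    votes = Counter(key for key in frame_keys if key is not None)
    for key in (frame_keys[0], frame_keys[-1]):
        if key is not None:
            votes[key] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no single key appears most often
    return ranked[0][0]


# Seven notes with three G/D and two E/B, matching the ratios in the text.
frame = [7, 2, 7, 4, 11, 9, 0]
print(round(first_fifth_ratio(frame, 7), 2))  # 42.86
print(round(first_fifth_ratio(frame, 4), 2))  # 28.57

G_MAJOR, D_MAJOR = (7, "major"), (2, "major")
print(overall_key([G_MAJOR] * 5 + [D_MAJOR] * 2 + [G_MAJOR] * 3))  # (7, 'major')
```

With ten time slots of which eight are G major and two are D major, counting the first and last slots again yields the 10-to-2 result stated in the text.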
In addition, if the estimated key is major, the difference between the aggregated results of the two scales is displayed under "Major" in the figure, and if the estimated key is minor, the difference is displayed under "Minor" in the figure, together with a bar graph in the pitch class color corresponding to the estimated key. The estimated key is also displayed under "Estimated Key" in the figure. 【0044】 In this example, referring to the constituent note aggregation result display area 45, it can be estimated that the first half of the song is in G major, that the song modulates to D major in the second half and returns to G major at the end, and that the overall key of the song is G major. The modulations can be confirmed from the way the combination of notes used changes over time. Furthermore, by referring to the note counting result display area 44, it can be visually confirmed that, in the first half and the last part of the upper graph, if the position of G is replaced with the position of C on a piano keyboard, the notes used have the same positional relationship as the white keys of a piano. 【0045】 Furthermore, the section setting unit 3 may define an arbitrary section within the music by accepting input of the measure numbers at the start and end of the time frame (section). Alternatively, it may define an arbitrary section within the music by accepting input of times instead of measure numbers. 【0046】 In this embodiment, any measure numbers can be specified in the interval designation area 43 shown in Figure 5, and the key can be estimated for the time frame containing the specified start and end measures. To estimate the key for an arbitrary time frame, enter the start and end measure numbers (17 and 23 in Figure 5) into the two text boxes, respectively, and click the "K" button. The key for the time frame containing both measures is then estimated, and the result is displayed.
【0047】 When key estimation is performed by specifying an interval, the display processing unit 7 displays the aggregation results and estimation results for only the specified interval on the display screen 41, as shown in the note counting result display area 44', the constituent note aggregation result display area 45', and the estimation result display area 46' in Figure 5. 【0048】 As shown in Figure 4, the display screen 41 is equipped with a mode change area 47, and the user can change the mode of the display screen 41 by operating the mode change area 47. The display screen 41 shown in Figure 4 is in "Key 10" mode, which divides the song into 10% lengths (10 time slots) to perform key estimation. In this embodiment, by operating the mode change area 47, the user can also use "Key 20" mode, which divides the song into 20% lengths (5 time slots), and "Key 25" mode, which divides the song into 25% lengths (4 time slots). 【0049】 Figure 8 shows an example of the display screen 41 in "Key 20" mode when a single pair of parallel keys and a single key can be estimated in each time frame. The display screen 41 in "Key 20" mode has a note counting result display area 44'', a constituent note aggregation result display area 45'', and an estimation result display area 46'' for key estimation using five time frames in 20 percent increments. Figure 9 shows an example of the display screen 41 in "Key 25" mode when a single pair of parallel keys and a single key can be estimated in each time frame. The display screen 41 in "Key 25" mode has a note counting result display area 44''', a constituent note aggregation result display area 45''', and an estimation result display area 46''' for key estimation using four time frames in 25 percent increments.
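The division of a song into equal time frames used by the "Key 10", "Key 20", and "Key 25" modes can be sketched in Python as follows. This is a minimal illustration, not the section setting unit 3 itself: the function name, the representation of each note as an `(onset, pitch)` pair with integer onsets (for example MIDI ticks), and the rule that a note belongs to the frame in which it starts are all assumptions.

```python
def split_into_frames(notes, total_length, percent):
    """Group pitch classes into equal time frames of `percent` of the song.

    notes        -- iterable of (onset, pitch) pairs with integer onsets
    total_length -- length of the song in the same integer unit (> 0)
    percent      -- frame length as a percentage that divides 100 evenly
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if percent <= 0 or 100 % percent != 0:
        raise ValueError("percent must divide 100 evenly")
    frame_count = 100 // percent  # 10 -> 10 frames, 20 -> 5, 25 -> 4
    frames = [[] for _ in range(frame_count)]
    for onset, pitch in notes:
        # Integer arithmetic avoids rounding problems at frame boundaries;
        # a note starting exactly at the end is placed in the last frame.
        index = min(onset * frame_count // total_length, frame_count - 1)
        frames[index].append(pitch % 12)  # drop octave information
    return frames


notes = [(0, 60), (50, 62), (99, 64), (100, 65)]
print(split_into_frames(notes, total_length=100, percent=25))
# [[0], [], [2], [4, 5]]
```

Each resulting list of pitch classes can then be aggregated per candidate key for its own time frame.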
Figure 10 shows an example of the display screen 41 in "Key 10" mode when a single pair of parallel keys and a single key could not be estimated even after expanding the length of the time frame (see NO in process S3408 in Figure 6). Figure 11 shows an example of the display screen 41 when the mode is switched to "Key 20" and estimation is attempted for the same song as in Figure 10, but a single pair of parallel keys and a single key still could not be estimated. Figure 12 shows an example of the display screen 41 when the mode is switched to "Key 25" and estimation is attempted for the same song as in Figures 10 and 11, but a single pair of parallel keys and a single key still could not be estimated. Figure 13 shows an example of the display screen 41 in "Key 10" mode when, for the song estimated on the screens of Figures 10 and 11, the time frame is expanded (here, the entire song is treated as one time frame), the composition ratio of the first and fifth pitch classes is tallied for all of the pairs of parallel keys selected as candidates for the expanded time frame, and the key with the largest tallied result is used as the key for the expanded time frame (here, for the entire song) (see process S3410 in Figure 6). On the screen 41 of Figure 10, if 0 to 104 (the measure numbers for the entire song) is specified in the interval designation area 43 and the "K" button is clicked, the key is estimated with the entire song as one time frame, and the result is displayed. As a result of the estimation, the second estimation result display area 48 (another aspect of the estimation result display area) on the screen 41 in Figure 13 displays the aggregated composition ratios of the first and fifth pitch classes for each of the three pairs of parallel keys estimated by the key estimation unit 6 for the expanded time frame: C major (A minor), F major (D minor), and Bb major (G minor).
The key estimation unit 6 estimates F major, which appeared most frequently (for the longest duration) among these, as the key of the entire song (see Conclusion in Figure 13). 【0050】 In this embodiment, the target keys are major and natural minor, but the method can also be applied to other keys. 【0051】 In this embodiment, key estimation was performed using music information contained in MIDI® data, but key estimation may also be performed using music information contained in electronic music score data files of other formats. Furthermore, music information may be generated from recorded sound: the music information receiving unit 2 may accept registration of a sound source file, perform signal processing to generate music information that records the notes of the song, and then perform key estimation. Alternatively, music information may be generated from a music score image: the music information receiving unit 2 may accept registration of a music score image file, perform image analysis to generate music information that records the notes of the song, and then perform key estimation. 【0052】 <Experimental Results> Key estimation was performed on 60 classical music MIDI® data sets, and the key was estimated correctly for 55 of them (approximately 91.7%). The five misestimations broke down as follows: parallel key, 1; parallel tonic (key sharing the same tonic), 1; dominant key, 1; subdominant key, 1; and other, 1. 【0053】 <Effects> According to this embodiment, the key of a piece of music can be quantitatively estimated without using training data, on a basis consistent with music theory. Furthermore, the key can be quantitatively estimated over time and represented in a visually easy-to-understand manner.
【Explanation of Symbols】 【0054】
1: Key estimation device
2: Music information receiving unit
3: Section setting unit
4: Note counting unit
5: Candidate aggregation unit
6: Key estimation unit
7: Display processing unit
41: Display screen
42: Song information registration area
43: Interval designation area
44, 44', 44'', 44''': Note counting result display area
45, 45', 45'', 45''': Constituent note aggregation result display area
46, 46', 46'', 46''': Estimation result display area
47: Mode change area
Claims
[Claim 1] A method for estimating the key of a piece of music, in which a computer executes: a parallel key selection step of aggregating, for each candidate key, the constituent notes from the musical notes indicated by the music information, and selecting at least one pair of parallel keys with the largest aggregate result; and a key estimation step of simply summing, for each key constituting the selected pair of parallel keys, the number of occurrences of the first note and the number of occurrences of the fifth note, and estimating the key with the largest sum among the keys constituting the pair of parallel keys as the key of the music.
[Claim 2] The method for estimating the key of a piece of music according to claim 1, wherein the computer divides the music into multiple sections at arbitrary time intervals; in the key estimation step, the key is estimated for each section, and the key of the music is estimated from the estimation results for each section; and in the parallel key selection step, if two or more pairs of parallel keys with the largest aggregate result are selected for a given section, the pair of parallel keys for that section is selected using information from at least one of the sections before or after it.
[Claim 3] The method for estimating the key of a piece of music according to claim 1, wherein the computer divides the music into multiple sections at arbitrary time intervals; and in the key estimation step, the key is estimated for each section, and the key of the music is estimated from the estimation results of the sections, with the estimation results of the first and last sections weighted twice as much as those of the other sections.
[Claim 4] The method for estimating the key of a piece of music according to claim 1, wherein the computer, in the parallel key selection step, removes octave information from the musical notes indicated by the music information and aggregates the constituent notes for each candidate key.
[Claim 5] A method for estimating the key of a piece of music, in which a computer: aggregates, for each candidate key, the constituent notes from the musical notes indicated by the music information; selects at least one pair of parallel keys with the largest aggregate result; aggregates, for the selected pair of parallel keys, the first and fifth notes of each key, and estimates the single key with the largest value as the key of the music; and displays a display screen based on the music information, wherein the display screen comprises a note counting result display area, a constituent note aggregation result display area, and an estimation result display area; the note counting result display area displays the number or proportion of notes for each pitch class included in the music, in a different color for each pitch class; the constituent note aggregation result display area displays the number or proportion of constituent notes for each key candidate, in the color of the pitch class that is the tonic of the key; and the estimation result display area displays the number or proportion of the first and fifth notes of each key for the selected pair of parallel keys, in the colors of the first and fifth pitch classes.
[Claim 6] The method for estimating the key of a piece of music according to claim 5, wherein the computer divides the music into multiple sections of the same time interval; and the note counting result display area, the constituent note aggregation result display area, and the estimation result display area display the number or frequency of occurrence of notes for each section, together with the corresponding color.
[Claim 7] The method for estimating the key of a piece of music according to claim 1, wherein the computer divides the music into multiple sections of the same time interval and displays a display screen based on the music information; the display screen includes a note counting result display area and a constituent note aggregation result display area; the note counting result display area displays, for each section, the number or proportion of notes of each pitch class included in the music, in a different color for each pitch class; and the constituent note aggregation result display area combines the aggregation results for the two parallel keys into one key candidate, displays the number or proportion of constituent notes for each key candidate and for each section, and displays them in the color of the pitch class of the tonic of the estimated key among the two tonics of the parallel keys.
[Claim 8] A music key estimation program that causes a computer to perform the method according to any one of claims 1 to 7.
[Claim 9] A music key estimation device comprising: a candidate aggregation unit that aggregates, for each candidate key, the constituent notes from the musical notes indicated by the music information; and a key estimation unit that selects at least one pair of parallel keys with the largest aggregate result, simply sums, for each key constituting the selected pair of parallel keys, only the number of occurrences of the first note and the number of occurrences of the fifth note, and estimates the key with the largest sum among the keys constituting the pair of parallel keys as the key of the music.
[Claim 10] A music key estimation device comprising: a candidate aggregation unit that aggregates, for each candidate key, the constituent notes from the musical notes indicated by the music information; a key estimation unit that selects at least one pair of parallel keys with the largest aggregate result, aggregates the first and fifth notes of each key for that pair of parallel keys, and estimates the single key with the largest value as the key of the music; and a display processing unit that displays a display screen based on the music information, wherein the display screen comprises a note counting result display area, a constituent note aggregation result display area, and an estimation result display area; the note counting result display area displays the number or proportion of notes for each pitch class included in the music, in a different color for each pitch class; the constituent note aggregation result display area displays the number or proportion of constituent notes for each key candidate, in the color of the pitch class that is the tonic of the key; and the estimation result display area displays the number or proportion of the first and fifth notes of each key for the selected pair of parallel keys, in the colors of the first and fifth pitch classes.