Music analysis device, music analysis method, and program

The music analysis device enhances beat position detection accuracy by synchronously adding audio waveforms and adjusting beat positions and BPM, addressing the inaccuracies in existing techniques and improving the DJ mixing experience.

WO2026126480A1 · PCT designated stage · Publication Date: 2026-06-18 · ALPHATHETA CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
ALPHATHETA CORP
Filing Date
2024-12-13
Publication Date
2026-06-18

Abstract drawing: Figure JP2024044226_18062026_PF_FP_ABST

Abstract

Provided is a music analysis device comprising a processor that executes: processing that synchronously adds the audio waveforms of a plurality of sections, each including a provisional beat position set within a fixed-BPM section of music, with the provisional beat positions aligned; processing that identifies correction amounts for the provisional beat positions from the time difference between the provisional beat positions and the peak position of the waveform obtained by the synchronous addition; and processing that identifies beat positions obtained by adding the correction amounts to the provisional beat positions.

Description

【0001】 The present invention relates to a music analysis device, a music analysis method, and a program.
【0002】 Techniques for detecting beat positions and BPM (beats per minute) from the audio signal of a music piece are known. For example, Patent Document 1 describes a technique for analyzing an audio signal to detect beat positions and the probability that each musical instrument sound is present. Patent Document 2 describes a technique that obtains the beat positions of music data and the sounding positions of a snare drum, and corrects the BPM value of the music data when it determines that the snare drum sounding positions deviate from the beat positions and that the snare drum sounding interval is one beat.
【0003】 Patent Document 1: Japanese Patent Application Laid-Open No. 2010-134231. Patent Document 2: International Publication No. 2019/053765.
【0004】 The beat positions and BPM identified by the above techniques may be accurate enough for general purposes. However, when mixing music in DJ play, for example, even a slight deviation in the beat position of each track can be perceived as audibly unnatural, so greater accuracy in beat position detection is desired.
【0005】 Therefore, an object of the present invention is to provide a music analysis device, a music analysis method, and a program that can further improve the accuracy of detecting the beat positions of a music piece.
【0006】 [1] A music analysis device comprising a processor that performs the following processes: a process of synchronously adding together the audio waveforms of multiple sections, each including a temporary beat position set in a fixed BPM section of a song, with the temporary beat positions aligned; a process of identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and a process of identifying a beat position by adding the correction amount to the temporary beat position.
[2] The music analysis device according to [1], wherein in the synchronous addition process, the multiple sections are synchronously added together for each beat count in the time signature of the song at the temporary beat position, and in the process of identifying the correction amount, the correction amount is selectively identified from the time difference between the peak position of the waveform obtained by the synchronous addition for each beat count and the temporary beat position.
[3] The music analysis device according to [2], wherein in the process of identifying the correction amount, the correction amount is identified from the maximum value of the time differences for each beat count.
[4] A music analysis device comprising a processor that performs the following steps: synchronously adding together the audio waveforms of multiple sections, each including a temporary beat position set according to a temporary BPM in a fixed BPM section of a song, with the temporary beat positions aligned; updating the temporary BPM and re-executing the synchronous addition; identifying the temporary BPM at which the peak level of the waveform obtained by the synchronous addition is maximized as the BPM of the fixed BPM section; synchronously adding together the audio waveforms of multiple sections, each including the temporary beat position reset according to the BPM in the fixed BPM section, with the temporary beat positions aligned; identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and identifying a beat position by adding the correction amount to the temporary beat position.
[5] The music analysis device according to [4], wherein if the peak level exceeds a threshold, the temporary BPM is not updated, and the temporary BPM at that time is identified as the BPM.
[6] A music analysis method in which a processor performs the following steps: synchronously adding together audio waveforms of multiple sections, each including a temporary beat position set in a fixed BPM section of a song, with the temporary beat positions aligned; identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and identifying a beat position by adding the correction amount to the temporary beat position.
[7] A program for causing a computer processor to perform the following processes: synchronously adding together audio waveforms of multiple sections, each including a temporary beat position set in a fixed BPM section of a song, with the temporary beat positions aligned; identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and identifying a beat position by adding the correction amount to the temporary beat position.
【0007】 Figure 1 shows an example of the overall configuration of a system according to an embodiment of the present invention.
Figure 2 is a block diagram showing the schematic functional configuration of the music analysis device in the example of Figure 1.
Figure 3 is a conceptual diagram illustrating the process of determining the correction amount of the beat position by synchronously adding multiple sections that include the temporary beat position.
Figure 4 is a conceptual diagram illustrating the corrected beat position in the example of Figure 3.
Figure 5 is a conceptual diagram illustrating the process of synchronously adding, for each beat count, multiple sections that include the temporary beat position.
Figure 6 is a flowchart illustrating an example of the process for determining the beat position in the music analysis device.
Figure 7 is a conceptual diagram illustrating the process of determining the BPM by synchronously adding multiple sections that include the temporary beat position.
Figure 9 is a diagram illustrating the process of determining the BPM by synchronously adding multiple sections that include the temporary beat position, in the example of Figure 8.
Figure 10 is a flowchart illustrating a first example of the process for determining the BPM in the music analysis device.
Figure 11 is a flowchart illustrating a second example of the process for determining the BPM in the music analysis device.
【0008】 Figure 1 shows an example of the overall configuration of a system according to an embodiment of the present invention. The system 10 according to this embodiment includes a PC (personal computer) 100, a DJ controller 200, and a speaker 300. The PC 100 stores, processes, and plays back audio data; it is not limited to a PC and may be a terminal device such as a tablet or smartphone. The PC 100 includes a display 101 that presents information to the user and an input device, such as a touch panel or mouse, that accepts user operations. The DJ controller 200 is connected to the PC 100 via a communication means such as USB (Universal Serial Bus) and accepts user operations related to music playback through channel faders, a crossfader, performance pads, jog dials, and various knobs and buttons. Audio data is played back through, for example, the speaker 300.
【0009】 In this embodiment, the PC 100 functions as the music analysis device in the system 10 described above. For example, the PC 100 processes stored audio data in response to user input during playback. Alternatively, the PC 100 may process the audio data before playback and save the processed data; in that case, the DJ controller 200 and the speaker 300 need not be connected to the PC 100 while the processing is performed.
Although the PC 100 functions as the music analysis device in this embodiment, in other embodiments DJ equipment such as a mixer or an all-in-one DJ system (a digital audio player with communication and mixing functions) may serve as the music analysis device. A server connected to the PC and the DJ equipment via a network may also serve as the music analysis device.
【0010】 Figure 2 is a block diagram showing the schematic functional configuration of the music analysis device in the example of Figure 1. The PC 100, which functions as the music analysis device, is a computer equipped with a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). By operating according to a program, the processor performs synchronous addition processing 120, correction amount identification processing 130, and beat position identification processing 140, and may further perform peak level determination processing 160 and BPM identification processing 170. The program is read from the storage of the PC 100 or from a tangible, non-transitory medium such as a removable recording medium, or it is downloaded from a server via a network and loaded into the memory of the PC 100.
【0011】 The synchronous addition process 120 synchronously adds the audio waveforms of multiple sections of a song. The input music audio data 110 is time-series data representing the audio waveform of the song, and its provisional beat positions have already been identified by a device other than the PC 100 or by another process performed by the PC 100. They are called provisional because they may be corrected in subsequent processing, but they are identified in the same way as ordinary beat positions: various known methods of analyzing song audio data, including machine learning, can be used. The synchronous addition process 120 may include preprocessing for the synchronous addition.
Preprocessing may include, for example, conversion from stereo to mono, or filtering to extract the frequency bands of percussion sounds, which correlate strongly with beat positions.
【0012】 As shown in Figure 3, the synchronous addition process 120 synchronously adds together the audio waveforms of multiple sections S, each including a temporary beat position TB set in the fixed BPM section of the song, with the temporary beat positions TB aligned. Here, a fixed BPM section is a section in which the BPM is constant, which is common in songs produced by programming, such as EDM (electronic dance music). An entire song may be a fixed BPM section; such a song is also called a fixed BPM song. The sections to be added may be extracted from part or all of a fixed BPM song, or from a fixed BPM section of a song that is not a fixed BPM song as a whole but contains one or more fixed BPM sections.
【0013】 The length ΔT of each section S is set so that it includes the attack waveform appearing before or after the temporary beat position TB. Therefore, if TB is the correct beat position, the peak position at which the attack waveforms are superimposed in the waveform obtained by synchronously adding the sections S coincides with TB. In the illustrated example, TB is shifted from the correct beat position, so there is a time difference d between TB and the peak position in the synchronously added waveform. In a fixed BPM section with a correct BPM, the offset between TB and the attack waveform is the same for every beat, so moving each TB by the same correction amount corresponding to d yields the correct beat positions, as shown in Figure 4.
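The basic correction described in 【0012】 to 【0014】 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the signal is a mono list of samples, positions are integer sample indices, each section S spans a hypothetical `half_window` samples on either side of TB, and samples are rectified with `abs()` before adding, which the text does not specify. The sketch measures d as the peak position minus TB, so the correction is +d; under the opposite convention (TB minus peak) implied by the sign rule in 【0014】, the same correction is written -d. All function names are hypothetical.

```python
def synchronous_add(signal, beats, half_window):
    """Add the sections S around each temporary beat TB, aligned at TB.

    Each section spans half_window samples before and after TB; sections
    that would run past either end of the signal are skipped.
    """
    length = 2 * half_window + 1
    acc = [0.0] * length
    for tb in beats:
        start = tb - half_window
        if start < 0 or start + length > len(signal):
            continue
        for i in range(length):
            acc[i] += abs(signal[start + i])  # rectify (assumption) so attacks add up
    return acc


def peak_offset(summed, half_window):
    """Time difference d = (peak position) - TB, in samples."""
    peak_index = max(range(len(summed)), key=summed.__getitem__)
    return peak_index - half_window


def correct_beats(signal, beats, half_window):
    """Shift every temporary beat by the same correction (fixed BPM)."""
    d = peak_offset(synchronous_add(signal, beats, half_window), half_window)
    correction = d  # with d = peak - TB, adding d lands each TB on the peak
    return correction, [tb + correction for tb in beats]


def make_clicks(n_samples, onsets, decay=10):
    """Synthetic test signal: a linearly decaying attack at each onset."""
    signal = [0.0] * n_samples
    for t in onsets:
        for k in range(decay):
            if t + k < n_samples:
                signal[t + k] += 1.0 - k / decay
    return signal


# 120 BPM at a hypothetical 1000 Hz rate: one beat every 500 samples.
true_beats = [100 + 500 * i for i in range(16)]
temp_beats = [tb - 12 for tb in true_beats]  # provisional grid 12 samples early
signal = make_clicks(8200, true_beats)
correction, beats = correct_beats(signal, temp_beats, half_window=50)
print(correction, beats == true_beats)  # 12 True
```

Because every beat in a fixed BPM section is offset by the same amount, one peak in the summed window is enough to shift the whole grid, which is the situation shown in Figure 4.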
【0014】 Referring again to Figure 2, the correction amount identification process 130 identifies the correction amount for the temporary beat position TB from the time difference d between TB and the peak position of the waveform obtained by the synchronous addition process 120. Specifically, for a time difference d between the peak position and TB, the correction amount is -d: a negative correction amount is identified when d is positive, and a positive correction amount when d is negative. The beat position identification process 140 identifies the beat position by adding the correction amount identified by the correction amount identification process 130 to TB, and outputs the resulting beat position data 150. The beat position data 150 is used, for example, to update the beat positions associated as metadata with the music audio data 110.
【0015】 Figure 5 illustrates an example in which multiple sections including the temporary beat positions are synchronously added for each beat count in the time signature. In the synchronous addition process 120, the sections including the temporary beat positions TB may be synchronously added separately for each beat count of TB in the time signature of the music. In this specification, the beat count is the position of each beat within a measure. In 4/4 time, the beat counts from the first beat of a measure are 1, 2, 3, 4, and the beat following beat count 4, i.e., the first beat of the next measure, again has beat count 1. In this embodiment the music is, for example, in 4/4 time and the beat count is one of {1, 2, 3, 4}, but the invention is not limited to this example.
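Once the time signature and the beat count of the first temporary beat are known (one of the options described for identifying beat counts), assigning a beat count to every temporary beat is modular arithmetic. A minimal sketch with hypothetical function names; in the per-beat-count variant, each resulting group would then be synchronously added on its own to yield d1 to d4.

```python
def beat_counts(n_beats, beats_per_bar=4, first_count=1):
    """Beat count (1..beats_per_bar) of each temporary beat, in order."""
    return [(first_count - 1 + i) % beats_per_bar + 1 for i in range(n_beats)]


def group_by_count(beats, counts):
    """Collect the temporary beats that share each beat count."""
    groups = {}
    for tb, count in zip(beats, counts):
        groups.setdefault(count, []).append(tb)
    return groups


# A 4/4 section whose first temporary beat falls on beat count 3.
beats = [0, 500, 1000, 1500, 2000, 2500]
counts = beat_counts(len(beats), beats_per_bar=4, first_count=3)
print(counts)                         # [3, 4, 1, 2, 3, 4]
print(group_by_count(beats, counts))  # {3: [0, 2000], 4: [500, 2500], 1: [1000], 2: [1500]}
```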
To identify the beat count of each temporary beat position TB, the measure positions of the music in the music audio data 110 can be identified, for example, by various known methods. This is not the only option: even if the measure positions are not identified, the beat count of each temporary beat position can be identified if the time signature and the first beat count of the music are known.
【0016】 When synchronous addition is performed for each beat count, multiple time differences d are calculated, one between the temporary beat position TB and the peak position of the waveform obtained for each beat count. For example, in 4/4 time, a time difference d1 is calculated from the synchronous addition of the sections S with beat count 1, d2 from those with beat count 2, d3 from those with beat count 3, and d4 from those with beat count 4. In such a case, the correction amount identification process 130 may selectively identify the correction amount from d1 to d4. Selectively identifying the correction amount means that, instead of treating d1 to d4 equally, the correction amount is identified from a single value selected from them, such as the maximum, minimum, or mode.
【0017】 Specifically, the correction amount identification process 130 may identify the correction amount from the maximum of the time differences d1 to d4. The maximum here is taken over signed values: if d1 to d4 include both positive and negative values, the maximum is the positive value with the largest absolute value.
For example, if (d1, d2, d3, d4) = (1, -1, 1, 1), the maximum value is 1. In actual music, the instruments used differ for each beat count, and some instruments are intentionally played off the beat position; such instruments are often played ahead of it. In the example of Figure 5, an instrument is played ahead of the beat position on beat count 2, and as a result the provisional beat position TB, which is calculated as an average over all beats regardless of beat count, lies ahead of the correct beat position. In this case, the time differences d1, d3, and d4 calculated by synchronous addition for beat counts 1, 3, and 4 are positive, and the time difference d2 for beat count 2 is negative. The maximum of d1 to d4 is therefore one of d1, d3, or d4, and identifying the correction amount from this maximum yields the correct beat position. Alternatively, the correction amount can be determined from the average of the positive values among d1 to d4 (d1, d3, and d4 in this example).
【0018】 As another example of selectively identifying the correction amount from d1 to d4, the correction amount identification process 130 may identify it from the mode of d1 to d4. As described above, if instruments are played off the beat position on some beat counts, the correct beat position can also be identified by taking the mode of d1 to d4 (d1, d3, and d4 in the example of Figure 5).
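The selection rules (maximum, average of the positive values, mode) can be compared on a small set of per-beat-count time differences. In this sketch d is measured as the peak position minus the temporary beat, which matches the sign pattern described for Figure 5 (positive when TB is ahead of the attack). The function name, the rule labels, the fallback when no value is positive, and the tie-break between several modes are assumptions of the sketch, not part of the text.

```python
from statistics import mean, multimode


def select_correction(diffs, rule="max"):
    """Choose one correction amount from per-beat-count differences.

    diffs maps beat count -> d, where d = peak position - temporary beat
    (positive when the temporary beat is ahead of the attack).
    """
    values = list(diffs.values())
    if rule == "max":
        return max(values)  # largest signed value
    if rule == "mean_positive":
        positives = [v for v in values if v > 0]
        # Falling back to the maximum when nothing is positive is an assumption.
        return mean(positives) if positives else max(values)
    if rule == "mode":
        # Ties between several modes are broken by smallest |d| (assumption).
        return min(multimode(values), key=abs)
    raise ValueError(f"unknown rule: {rule!r}")


print(select_correction({1: 1, 2: -1, 3: 1, 4: 1}))  # 1
# Beat count 2 carries an instrument played ahead of the beat (cf. Figure 5).
diffs = {1: 12, 2: -8, 3: 12, 4: 11}
print(select_correction(diffs, "max"))            # 12
print(select_correction(diffs, "mode"))           # 12
print(select_correction(diffs, "mean_positive"))  # about 11.67
```

All three rules ignore the negative d2 contributed by the off-beat instrument, which is why they recover a correction close to the true offset while a plain average over d1 to d4 would not.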
In addition, depending on the piece, temporary rhythmic changes such as syncopation, anticipation, or a delayed attack may occur on certain beat counts, shifting the attack waveform away from the correct beat position. In such cases, too, the correct beat position may be identified by determining the correction amount from the mode of d1 to d4.
【0019】 Figure 6 is a flowchart illustrating an example of the process for determining beat positions in the music analysis device. As described above, the temporary beat positions of the music audio data 110 are determined by a device other than the PC 100 or by another process performed by the PC 100 (step S11). The synchronous addition process 120 synchronously adds the audio waveforms of multiple sections S that include the temporary beat positions TB (step S12); as shown in Figure 3, each section S includes a temporary beat position TB, and the sections are added with TB aligned. The correction amount determination process 130 determines the correction amount for TB from the time difference d obtained by the synchronous addition (step S13), and the beat position determination process 140 determines the beat position by adding the correction amount to TB (step S14).
【0020】 Referring again to Figure 2, the following describes the case where the processor of the PC 100, which functions as the music analysis device, also determines the BPM of the music. This BPM processing is performed before the beat position correction described above. Because the beat position correction assumes that the BPM of the fixed BPM section has been determined correctly, the beat positions can be corrected after the BPM has first been determined more accurately by the processing described below.
Note that if the BPM has already been determined with sufficient accuracy, for example by a known method, the following processing need not be performed.
【0021】 The synchronous addition process 120 described above is also used to determine the BPM of the music. In this case, the provisional beat positions and provisional BPM of the music are assumed to have already been determined by a device other than the PC 100 or by another process executed by the PC 100. The provisional BPM is determined in the same way as an ordinary BPM; various known methods, including machine learning, can be used.
【0022】 As shown in Figure 7, the synchronous addition process 120 synchronously adds together the audio waveforms of multiple sections S, each including a temporary beat position set according to the temporary BPM in the fixed BPM section of the music, with the temporary beat positions TB aligned. If the temporary BPM matches the correct BPM, the attack waveforms add up and a high peak appears in the synchronously added waveform. In the illustrated example, TB is shifted from the attack waveform, but in a fixed BPM section with a correct BPM the time difference d between TB and the attack waveform is the same in every section S, so a high peak level P1 still appears in the waveform obtained with TB aligned.
【0023】 In contrast, as shown in Figure 8, if the provisional BPM differs from the correct BPM, no high peak appears even when the sections S are synchronously added with TB aligned.
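The contrast between Figure 7 and Figure 8 can be reproduced with a synthetic click track. This minimal sketch assumes a hypothetical 1000 Hz sample rate, a linearly decaying attack at each beat, rectified samples, and an exaggerated BPM error (124 instead of 120) so that the effect is obvious; the function names are hypothetical.

```python
def make_clicks(n_samples, onsets, decay=10):
    """Synthetic percussive track: a linearly decaying attack at each onset."""
    signal = [0.0] * n_samples
    for t in onsets:
        for k in range(decay):
            if t + k < n_samples:
                signal[t + k] += 1.0 - k / decay
    return signal


def beat_grid(first_beat, bpm, sr, n_beats):
    """Temporary beat positions (in samples) implied by a temporary BPM."""
    period = 60.0 * sr / bpm
    return [round(first_beat + i * period) for i in range(n_beats)]


def peak_level(signal, beats, half_window):
    """Peak of the synchronous sum of sections centred on each beat."""
    length = 2 * half_window + 1
    acc = [0.0] * length
    for tb in beats:
        start = tb - half_window
        if start < 0 or start + length > len(signal):
            continue
        for i in range(length):
            acc[i] += abs(signal[start + i])
    return max(acc)


SR = 1000  # hypothetical sample rate, kept low for readability
true_beats = beat_grid(100, 120.0, SR, 16)
signal = make_clicks(8200, true_beats)
right = peak_level(signal, beat_grid(100, 120.0, SR, 16), half_window=50)
wrong = peak_level(signal, beat_grid(100, 124.0, SR, 16), half_window=50)
print(right, wrong)  # 16.0 1.0
```

With the correct BPM all 16 attacks land on the same index of the summed window; at 124 BPM the grid drifts by about 16 samples per beat, so only the first few attacks fall inside the window, each at a different index, and no high peak forms.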
When the provisional BPM is incorrect, the offset between the provisional beat position TB and the attack waveform differs from one section S to the next, so the synchronously added waveform is a superposition of peaks at different positions relative to TB, and no high peak appears. As shown in Figure 9, in a fixed BPM section with an incorrect provisional BPM, even if one provisional beat position TB_n coincides with the attack waveform, the interval INT to the subsequent provisional beat positions TB_(n+1), TB_(n+2), ... differs from the interval of the attack waveforms. The other provisional beat positions therefore drift away from the attack waveforms, and as a result they are also incorrect.
【0024】 As in the example of Figure 8, when the provisional BPM differs from the correct BPM, it may be brought closer to the correct BPM by updating it and re-running the synchronous addition. For example, the provisional BPM may be updated by adding or subtracting a correction value of an appropriate magnitude within a predetermined range, with the synchronous addition re-run after each update. As a non-limiting example, a more appropriate BPM may be searched for by increasing or decreasing the initial provisional BPM in increments of 0.001 within a range of ±0.05 and re-running the synchronous addition. When the provisional BPM is updated, the provisional beat positions TB are also updated, and the sections S for synchronous addition are reset.
【0025】 Referring again to Figure 2, the peak level determination process 160 determines whether the peak level of the waveform obtained by the synchronous addition process 120 is at its maximum, or whether it exceeds a threshold.
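The non-limiting search range of 【0024】 (±0.05 BPM in steps of 0.001) and the reset of the temporary beat positions can be sketched as follows. Keeping the first temporary beat fixed as the anchor of each new grid, the 48 kHz sample rate in the example, and the function names are assumptions; the text only says that TB and the sections S are reset when the BPM changes.

```python
def candidate_bpms(initial_bpm, span=0.05, step=0.001):
    """Temporary BPMs within +/-span of the initial value, step apart."""
    n = round(span / step)
    return [round(initial_bpm + k * step, 6) for k in range(-n, n + 1)]


def reset_grid(anchor, bpm, sr, n_beats):
    """Re-derive temporary beat positions (samples) for an updated BPM.

    The anchor (here, the first temporary beat) is kept fixed (assumption).
    """
    period = 60.0 * sr / bpm
    return [round(anchor + i * period) for i in range(n_beats)]


cands = candidate_bpms(128.0)
print(len(cands), cands[0], cands[-1])  # 101 127.95 128.05
print(reset_grid(0, 128.0, 48000, 4))   # [0, 22500, 45000, 67500]
print(reset_grid(0, 128.05, 48000, 4))  # later beats move earlier as BPM rises
```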
For example, the peak level determination process 160 may compare the peak levels of the waveforms synchronously added for a series of provisional BPMs, updated by adding or subtracting correction values within the predetermined range as described above, and identify the provisional BPM that gave the largest peak level. Alternatively, the peak level determination process 160 may monitor how the peak level changes as the provisional BPM is updated and identify the provisional BPM at which a local maximum of the peak level is observed; in this case, the provisional BPM need not be updated over the entire predetermined range.
【0026】 Furthermore, the peak level determination process 160 may determine whether the peak level exceeds a threshold. For example, if the peak level of the waveform synchronously added for a provisional BPM exceeds the threshold, the peak level determination process 160 may control the BPM identification process 170 so that it stops updating the provisional BPM and identifies the provisional BPM at that point as the BPM of the fixed BPM section. In this case, if the initial provisional BPM was correct, as in the example of Figure 7, the peak level exceeds the threshold, so the provisional BPM is not updated and the synchronous addition is not re-executed.
【0027】 The BPM identification process 170 identifies the BPM of the fixed BPM section of the music according to the result of the peak level determination process 160. Basically, it identifies, as the BPM of the fixed BPM section, the temporary BPM at which the peak level of the waveform is maximized in the synchronous addition performed while updating the temporary BPM.
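The two search strategies of Figures 10 and 11 differ only in whether a threshold ends the loop early. The sketch below takes the peak level as a callable so that any synchronous-addition routine can be plugged in. The demo uses a hypothetical toy curve peaking at 128.012 BPM instead of real audio, and trying the initial BPM first and then stepping outward is an assumption about candidate order, not something the flowcharts specify.

```python
def search_bpm(candidates, peak_level_of, threshold=None):
    """Pick the BPM of a fixed BPM section from candidate temporary BPMs.

    candidates: temporary BPMs in the order they are tried.
    peak_level_of: callable mapping a BPM to the peak level of the
        synchronously added waveform for the grid built from that BPM.
    threshold: if given, accept the first BPM whose level exceeds it.
    Returns (bpm, peak_level, number_of_evaluations).
    """
    best_bpm, best_level = None, float("-inf")
    evaluations = 0
    for bpm in candidates:
        level = peak_level_of(bpm)
        evaluations += 1
        if threshold is not None and level > threshold:
            return bpm, level, evaluations  # stop updating (cf. Figure 11)
        if level > best_level:
            best_bpm, best_level = bpm, level
    # Range exhausted: use the maximum peak level (cf. Figure 10).
    return best_bpm, best_level, evaluations


def toy_level(bpm):
    """Hypothetical peak-level curve standing in for real audio."""
    return 10.0 - 1000.0 * abs(bpm - 128.012)


initial = 128.0
# Initial BPM first, then outward in 0.001 steps within +/-0.05 (assumed order).
cands = sorted((round(initial + k * 0.001, 3) for k in range(-50, 51)),
               key=lambda b: abs(b - initial))
full = search_bpm(cands, toy_level)
early = search_bpm(cands, toy_level, threshold=9.5)
print(full)   # best BPM 128.012 after evaluating all 101 candidates
print(early)  # same BPM, found after far fewer evaluations
```

The threshold variant trades a guaranteed global maximum over the range for fewer synchronous additions, which matches the efficiency argument made for Figure 11.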
As described above, the process of updating the temporary BPM and re-executing the synchronous addition may be terminated when the peak level exceeds a threshold or reaches a local maximum. This reduces the amount of processing and allows an appropriate BPM to be searched for efficiently. The BPM identification process 170 outputs temporary beat position data 180 updated based on the identified BPM, which is used in the beat position correction described above. The BPM associated with the music audio data 110 as metadata may also be updated with the identified BPM.
【0028】 The provisional beat positions, based on the more accurate BPM identified by the synchronous addition process 120, the peak level determination process 160, and the BPM identification process 170, can then be corrected by the correction amount identification process 130 and the beat position identification process 140 to obtain more accurate beat positions. In this case, the result of the synchronous addition computed, during BPM identification, for the provisional BPM whose peak level is maximal or exceeds the threshold may be identical to the synchronous addition result needed for the beat position correction. Where possible, reusing that result in the beat position correction therefore reduces the computational load.
【0029】 Figure 10 is a flowchart showing a first example of the process for determining the BPM in the music analysis device. As described above, the provisional beat positions and provisional BPM of the music audio data 110 are determined by a device other than the PC 100 or by another process performed by the PC 100 (step S101).
The synchronous addition process 120 synchronously adds the audio waveforms of multiple sections S that include the provisional beat positions TB (step S102). As shown in Figures 7 and 8, each section S includes a provisional beat position TB set according to the provisional BPM at that time, and the sections are added with TB aligned. If the range of correction values for the provisional BPM has not yet been fully covered (NO in step S103), the provisional BPM is updated by adding or subtracting the correction value (step S104), and the synchronous addition is re-executed (step S102). Once the predetermined range over which the provisional BPM is varied has been covered (YES in step S103), the re-execution of the synchronous addition ends, and the peak level determination process 160 detects the provisional BPM at which the peak level is maximized (step S105). The BPM identification process 170 identifies that provisional BPM as the BPM of the fixed BPM section (step S106).
【0030】 Figure 11 is a flowchart showing a second example of the process for determining the BPM in the music analysis device. It differs from the example of Figure 10 in that the peak level determination process 160 determines whether the peak level obtained by the synchronous addition exceeds a threshold. Specifically, after multiple sections S including the temporary beat positions TB are synchronously added (step S102), it is determined whether the peak level exceeds the threshold (step S107). If it does, the temporary BPM is not updated, and the temporary BPM at that point is identified as the BPM of the fixed BPM section (step S108). If it does not, the temporary BPM is updated (step S103), and the synchronous addition is re-executed (step S102).
If no temporary BPM yields a peak level exceeding the threshold, the temporary BPM with the maximum peak level is detected (step S105), as in the example of Figure 10, and identified as the BPM of the fixed BPM section (step S106).
【0031】 According to the embodiments of the present invention described above, beat positions can be identified more accurately for a fixed BPM song by: synchronously adding together the audio waveforms of multiple sections, each including a temporary beat position set in the fixed BPM section of the song, with the temporary beat positions aligned; identifying the correction amount for the temporary beat position from the time difference between the peak position of the synchronously added waveform and the temporary beat position; and identifying the beat position by adding the correction amount to the temporary beat position. Alternatively, the sections may be synchronously added for each beat count in the time signature of the song at the temporary beat position, and the correction amount may be selectively identified from the time differences between the peak position of the waveform obtained for each beat count and the temporary beat position. More specifically, the correction amount may be identified from the maximum of the per-beat-count time differences. In this case, the correct beat position can be identified even if, for example, the instruments used differ for each beat count and some are intentionally played off the beat position.
Before the beat position is determined, the following steps may be performed: synchronously adding together the audio waveforms of multiple sections, each containing a provisional beat position set according to a provisional BPM within the fixed BPM section of the music, with the provisional beat positions aligned; updating the provisional BPM and re-executing the synchronous addition; and identifying, as the BPM of the fixed BPM section, the provisional BPM that maximizes the peak level of the synchronously added waveform. Because the provisional beat positions are then set according to a more accurate BPM, the correction amount for them, and thus the beat positions, can be determined more accurately. Furthermore, if the result of the synchronous addition computed while determining the BPM can be reused when correcting the beat positions, the computational load is reduced. In this case, if the peak level of the synchronously added waveform exceeds a threshold, the provisional BPM need not be updated further, and the provisional BPM at that point may be identified as the BPM; this reduces the amount of processing and allows an appropriate BPM to be searched for efficiently.
【0032】 10...System, 101...Display, 110...Music audio data, 120...Synchronous addition process, 130...Correction amount determination process, 140...Beat position determination process, 150...Beat position data, 160...Peak level determination process, 170...BPM determination process, 180...Temporary beat position data, 200...DJ controller, 300...Speaker.

Claims

1. A music analysis device comprising a processor that performs the following processes: 1) synchronously adding together audio waveforms of multiple sections, each including a set temporary beat position, in a fixed BPM section of a song, with the temporary beat positions aligned; 2) identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and 3) identifying a beat position by adding the correction amount to the temporary beat position.

2. The music analysis device according to claim 1, wherein in the synchronous addition process, the plurality of sections are synchronously added together separately for each beat count in the time signature of the song at the temporary beat position, and in the process of identifying the correction amount, the correction amount is selectively identified from the time difference between the peak position of the waveform obtained by the synchronous addition for each beat count and the temporary beat position.

3. The music analysis device according to claim 2, wherein in the process of identifying the correction amount, the correction amount is identified from the maximum of the time differences for the individual beat counts.

4. A music analysis device comprising a processor that performs the following processes: synchronously adding together audio waveforms of multiple sections, each including a temporary beat position set according to a temporary BPM, in a fixed BPM section of a song, with the temporary beat positions aligned; updating the temporary BPM and re-executing the synchronous addition; identifying the temporary BPM at which the peak level of the waveform obtained by the synchronous addition is maximized as the BPM of the fixed BPM section; synchronously adding together audio waveforms of multiple sections, each including a temporary beat position reset according to the identified BPM, in the fixed BPM section, with the temporary beat positions aligned; identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and identifying a beat position by adding the correction amount to the temporary beat position.

5. The music analysis device according to claim 4, wherein, if the peak level exceeds a threshold, the temporary BPM is not updated, and the temporary BPM at that time is identified as the BPM.

6. A music analysis method in which a processor performs the following steps: 1) synchronously adding together audio waveforms of multiple sections, each including a set temporary beat position, in a fixed BPM section of a song, with the temporary beat positions aligned; 2) identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and 3) identifying a beat position obtained by adding the correction amount to the temporary beat position.

7. A program for causing a computer processor to perform the following processes: 1) synchronously adding together audio waveforms of multiple sections, each including a temporary beat position set in a fixed BPM section of a song, with the temporary beat positions aligned; 2) identifying a correction amount for the temporary beat position from the time difference between the peak position of the waveform obtained by the synchronous addition and the temporary beat position; and 3) identifying a beat position by adding the correction amount to the temporary beat position.