Information processing program, information processing device, and information processing method
The information processing program synchronizes time-series data by extracting noise-free intervals and adjusting time shifts, addressing synchronization challenges due to noise and waveform differences.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- KK TOSHIBA
- Filing Date
- 2024-12-02
- Publication Date
- 2026-06-12
AI Technical Summary
Existing techniques struggle to synchronize multiple time-series data with high accuracy, especially when noise is present and waveforms differ significantly.
An information processing program that extracts noise-free intervals in time-series data based on reference interval information, calculates a second difference, and shifts the data to synchronize it with high precision.
Synchronizes multiple time-series data with high accuracy by identifying noise-free intervals and adjusting the time shift, overcoming synchronization challenges due to noise and waveform differences.
Abstract
Description
【Technical Field】 【0001】 Embodiments of the present invention relate to an information processing program, an information processing apparatus, and an information processing method. 【Background Art】 【0002】 Techniques for synchronizing the times of a plurality of time-series data sensed in the same environment have been disclosed. For example, a technique is disclosed in which one of the media signals is shifted by a coarse time shift based on the approximate mismatch between two media signals, a matching time shift between the other media signal and the shifted one of the media signals is obtained, and the shifted one of the media signals is further shifted. 【0003】 However, in the prior art, when noise is included in at least one of the two media signals and the waveforms are extremely different, it may be difficult to synchronize the two media signals with high accuracy. That is, in the prior art, it may be difficult to synchronize a plurality of time-series data such as media signals with high accuracy. 【Prior Art Documents】 【Patent Documents】 【0004】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2013-84334 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0005】 The problem to be solved by the present invention is to provide an information processing program, an information processing apparatus, and an information processing method that can synchronize a plurality of time-series data with high accuracy. 
【Means for Solving the Problems】 【0006】 The information processing program of the embodiment is an information processing program that causes a computer to execute the following steps: an effective interval extraction step, which extracts a section in the second time series data in which the scene sections are arranged in the order represented by the reference interval information, based on reference interval information that includes a reference time which is the start time of the scene section and reference signal information representing the order of the scene sections, for each of a plurality of scene sections included in the first time series data and the second time series data, respectively, and identifies a second estimated time estimated based on the reference interval information for each of the plurality of scene sections included in the effective interval as an effective interval time; a second difference calculation step, which calculates a second difference between the effective interval time of the scene section included in the effective interval of the second time series data and the reference time of the corresponding scene section represented by the reference interval information; and a second shift step, which generates a third time series data obtained by shifting the second time series data by a time corresponding to the second difference. [Brief explanation of the drawing] 【0007】 [Figure 1] A schematic diagram of an information processing device according to an embodiment. [Figure 2] A diagram illustrating the data structure of time-series data. [Figure 3] Diagram illustrating the extraction of valid intervals. [Figure 4] An explanatory diagram for estimating the second estimated time using speech recognition. [Figure 5] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. 
[Figure 6A] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 6B] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 7] A schematic diagram of an information processing device according to an embodiment. [Figure 8] A schematic diagram of an information processing device according to an embodiment. [Figure 9]A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 10A] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 10B] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 11] A schematic diagram of an information processing device according to an embodiment. [Figure 12] A flowchart showing the flow of information processing performed by the information processing device of the embodiment. [Figure 13] Hardware configuration diagram of the information processing device according to the embodiment. [Modes for carrying out the invention] 【0008】 The information processing program, information processing apparatus, and information processing method of this embodiment will be described in detail below with reference to the attached drawings. 【0009】 In the following embodiments, the same reference numerals are used for the same functions, and detailed explanations are omitted. 【0010】 (First embodiment) Figure 1 is a schematic diagram of an example of the information processing device 10A of this embodiment. 【0011】 Information processing device 10A is an example of information processing device 10. Information processing device 10 is composed of one or more dedicated or general-purpose computers. 
【0012】 The information processing device 10 performs a process to synchronize the timestamps of multiple time-series data. 【0013】 Time-series data is data that is continuous in time series. Time-series data is, for example, data obtained by sensing an object included in real space over time using sensors such as microphones and imaging devices. The object may be any of various sensorable elements included in real space. Specifically, time-series data is, for example, audio data, moving image data, and the like. 【0014】 Multiple time-series data is data obtained by sensing the same environment over time using different sensors at the same timing. Here, it should be noted that it is technically difficult to completely synchronize (match) the sensing start time of each sensor and the time of the built-in clock of the sensor. In the present embodiment and other embodiments, a form in which the information processing apparatus 10 executes a process of synchronizing the times of two time-series data, namely, first time-series data and second time-series data, as the multiple time-series data will be described. 【0015】 The information processing apparatus 10A includes a storage unit 12, a UI (user interface) unit 14, a communication unit 16, and a control unit 20. The storage unit 12, the UI unit 14, the communication unit 16, and the control unit 20 are communicably connected via a bus 18 or the like. 【0016】 The storage unit 12 stores various data. The storage unit 12 may be provided outside the information processing apparatus 10A. Also, at least one of one or a plurality of functional units included in the storage unit 12 and the control unit 20 described later may be configured to be mounted on an external information processing apparatus communicably connected to the information processing apparatus 10 via a network or the like. 
【0017】 The UI unit 14 has a display function for displaying various information and an input function for receiving operation inputs by the user. The display function is, for example, a display, a projection device, or the like. The input function is, for example, a pointing device such as a mouse and a touch pad, a keyboard, or the like. It may also be a touch panel in which the display function and the input function are integrally configured. 【0018】 The UI unit 14 can be configured to be connected to the control unit 20 via wired or wireless means in a manner that allows communication. The UI unit 14 may be located outside the information processing device 10A, and the UI unit 14 and the control unit 20 may be connected via a network or the like. 【0019】 The communication unit 16 is a communication interface for communicating with external information processing equipment, etc., of the information processing device 10A. 【0020】 The control unit 20 performs various processes in the information processing device 10A. The control unit 20 includes an acquisition unit 20A, an effective interval extraction unit 20B, a second difference calculation unit 20C, a second shift unit 20D, and an output unit 20E. The acquisition unit 20A, the effective interval extraction unit 20B, the second difference calculation unit 20C, the second shift unit 20D, and the output unit 20E are implemented by, for example, one or more processors. For example, each of the above units may be implemented by having a processor such as a CPU (Central Processing Unit) execute a program, i.e., by software. Each of the above units may also be implemented by a processor such as a dedicated IC (Integrated Circuit) or circuit, i.e., by hardware. Each of the above units may also be implemented by using a combination of software and hardware. When multiple processors are used, each processor may implement one of the above units, or two or more of the above units. 
【0021】 The acquisition unit 20A acquires the first time series data, the second time series data, and the reference interval information. The acquisition unit 20A acquires this information from an external information processing device or the like via the communication unit 16. Alternatively, the acquisition unit 20A may acquire this information by reading it from the storage unit 12. 【0022】 The first time series data and the second time series data are the two time series data to be synchronized. When the first time series data and the second time series data are referred to collectively, they are simply called time series data. This embodiment is described assuming that the time series data is audio data. 【0023】 Figure 2 is an explanatory diagram of an example of the data structure of time series data 30. The first time series data 30A and the second time series data 30B, which are examples of time series data 30, have the same data structure as the time series data 30 shown in Figure 2. 【0024】 The time series data 30 contains multiple scene intervals. A scene interval represents a series of scenes or situations that occur consecutively in time. Figure 2 shows, as an example, the scene intervals for "Scene 5" and "Scene 6". The time series data 30 may of course contain further scene intervals. 【0025】 Estimated signal information 32 is included immediately before, or immediately before and after, each scene interval in the time series data 30. 【0026】 The estimated signal information 32 is information that represents at least the order of the scene interval that is adjacent in time to that estimated signal information 32. The order of scene intervals is the order in which the multiple scene intervals included in the time series data 30 are arranged along the time series. 
The estimated signal information 32 may also further include information that represents the content or overview of scene sections that are consecutive (adjacent) to the estimated signal information 32 in time series. Figure 2 illustrates an example configuration in which estimated signal information 32 representing the start signal of each scene section is placed immediately before each scene section, and estimated signal information 32 representing the end signal of each scene section is placed immediately after each scene section. Note that the estimated signal information 32 may be at least one of the estimated signal information 32 representing the start signal of each scene section and the estimated signal information 32 representing the end signal of each scene section. 【0027】 Specifically, for example, on the set of a drama or movie, immediately before each scene segment, a user or other person speaks an audio signal indicating the start of filming for a particular scene segment, and the estimated cue information 32 is picked up by a sensor such as a microphone. Furthermore, after the speaking of the estimated cue information 32, filming of the scene segment is performed, and when filming of the scene segment ends, an audio signal indicating the end of filming for that scene segment is spoken, and the estimated cue information 32 is picked up by a sensor such as a microphone. By repeating these routines, time-series data 30 is obtained in which the estimated cue information 32 is placed before and after each scene segment. 【0028】 Furthermore, the start time (see "beginning" of (start signal) in Figure 2) or end time (see "end" of (end signal) in Figure 2) of the estimated signal information 32, which represents the start signal of a scene section, is treated as the estimated reference time 34, which represents the time in the time series data 30 of the estimated signal information 32. 【0029】 Returning to Figure 1, let's continue the explanation. 
As mentioned above, the acquisition unit 20A further acquires reference interval information. 【0030】 The reference interval information includes, for each of the multiple scene intervals contained in the first time series data 30A and the second time series data 30B, a reference time, which is the start time of the scene interval, and reference signal information, which represents the order of the scene intervals. 【0031】 The reference signal information is information that represents at least the correct order of the scene intervals included in the time series data 30, that is, the correct position of each of the multiple scene intervals in the sequence. In other words, whereas the estimated signal information 32 is what is actually contained in the time series data 30, the reference signal information represents the correct order of the scene intervals. Like the estimated signal information 32, the reference signal information may also include information that represents the content or overview of the scene intervals. 【0032】 The reference time is information that represents the correct start time of a scene interval. The start time of a scene interval may be either the start time of the reference signal information that represents the start signal of the scene interval (see "Beginning" in Figure 2) or the end time of that reference signal information (see "End" in Figure 2, i.e., the time at which the scene interval itself starts). 
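As an illustration only, the reference interval information described above could be held in a small record type. The following Python sketch is not taken from the embodiment: the class name `ReferenceInterval`, its field names, the use of seconds as the time unit, and the sanity check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReferenceInterval:
    """Reference interval information for one scene interval (hypothetical layout)."""
    order: int              # correct position of the scene interval (1-based)
    reference_time: float   # correct start time of the scene interval, in seconds
    reference_signal: str   # reference signal information, here assumed to be text


def sorted_by_order(entries: List[ReferenceInterval]) -> List[ReferenceInterval]:
    """Return the entries arranged in the correct scene order."""
    return sorted(entries, key=lambda e: e.order)


def times_are_increasing(entries: List[ReferenceInterval]) -> bool:
    """Sanity check (an assumption): later scenes should have later reference times."""
    ordered = sorted_by_order(entries)
    return all(a.reference_time < b.reference_time
               for a, b in zip(ordered, ordered[1:]))
```

A record like this keeps the two items named in the text, the reference time and the reference signal information, together for each scene interval.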
【0033】 In other words, the acquisition unit 20A acquires reference interval information (reference signal information, reference time) for multiple scene intervals, which is information representing the correct estimated signal information 32 and estimated reference time 34 for each of the multiple scene intervals, in addition to the estimated signal information 32 and estimated reference time 34 contained in the first time series data 30A and the second time series data 30B. 【0034】 In this embodiment, the estimated reference time 34 and estimated signal information 32 for each of the multiple scene intervals included in the first time series data 30A will be described assuming that they match the respective reference time and reference signal information for each of the multiple scene intervals. 【0035】 The acquisition unit 20A outputs the reference interval information and the second time series data 30B to the valid interval extraction unit 20B. The acquisition unit 20A also outputs the reference interval information to the second difference calculation unit 20C. The acquisition unit 20A also outputs the second time series data 30B to the second shift unit 20D. The acquisition unit 20A also outputs the first time series data and the reference interval information to the output unit 20E. 【0036】 The valid interval extraction unit 20B extracts, based on the reference interval information, the intervals in which the scene intervals are arranged in the order represented by the reference interval information contained in the second time series data 30B, as noise-free valid intervals. The valid interval extraction unit 20B then identifies the second estimated time, estimated based on the reference interval information, for each of the multiple scene intervals contained in the extracted valid interval, as the valid interval time. 
【0037】 First, the valid interval extraction unit 20B identifies the estimated signal information 32 contained in the second time series data 30B by identifying intervals that match or are similar to the reference signal information contained in each of the reference interval information for each scene interval from the second time series data 30B. 【0038】 For example, as described above, let's assume that the second time series data 30B is audio data, and the section corresponding to the estimated signal information 32 included in the second time series data 30B is also audio data. Let's also assume that the reference signal information included in the reference section information is audio data. 【0039】 In this case, the effective interval extraction unit 20B calculates a correlation function between the reference signal information included in the reference interval information and the second time series data 30B for each of the multiple scene intervals. Then, for each of the multiple scene intervals, the effective interval extraction unit 20B estimates the interval in the second time series data 30B that has the maximum correlation function with the reference signal information as the estimated signal information 32. That is, the effective interval extraction unit 20B estimates the interval in the second time series data 30B that matches or is most similar to the reference signal information of each scene interval as the estimated signal information 32 for each scene interval. 【0040】 Furthermore, the effective interval extraction unit 20B estimates the estimated reference time 34 of the estimated signal information 32 for each scene interval included in the second time series data 30B, which has been estimated for each scene interval, as the second estimated time for each scene interval. 
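The correlation-based search in paragraphs 【0039】 and 【0040】 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes both signals are one-dimensional NumPy arrays sampled at the same rate, uses a plain (unnormalized) cross-correlation, and the function name `locate_reference_signal` is hypothetical.

```python
import numpy as np


def locate_reference_signal(series: np.ndarray,
                            reference: np.ndarray,
                            sample_rate: float) -> float:
    """Return the time (in seconds) at which `reference` best matches `series`.

    The returned value plays the role of the second estimated time when the
    first time of the estimated signal information is used.
    """
    # "valid" mode evaluates only the positions where the reference fully
    # overlaps the series; index k corresponds to the reference starting at
    # sample k of the series.
    corr = np.correlate(series, reference, mode="valid")
    best_offset = int(np.argmax(corr))
    return best_offset / sample_rate
```

Because the correlation here is unnormalized, loud unrelated passages can dominate the maximum; a practical implementation would probably normalize by local signal energy, which this sketch omits for brevity.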
【0041】 In the example shown in Figure 2, for example, the valid interval extraction unit 20B estimates the first time t1 or the last time t2 of the estimated signal information 32 for the start signal of scene section "Scene 5" included in the second time series data 30B as the second estimated time for scene section "Scene 5". Similarly, the valid interval extraction unit 20B estimates the first time t5 or the last time t6 of the estimated signal information 32 for the start signal of scene section "Scene 6" included in the second time series data 30B as the second estimated time for scene section "Scene 6". Below, we will explain an example of how the valid interval extraction unit 20B estimates the first time (time t1 and time t5 in Figure 2) of the estimated signal information 32 for the start signal as the second estimated time for each scene section. 【0042】 The effective interval extraction unit 20B then extracts, from among the multiple scene intervals included in the second time series data 30B, the intervals in which the scene intervals are arranged according to the order represented by the reference interval information, as noise-free effective intervals. 【0043】 As described above, the reference signal information included in the reference interval information of each scene interval includes information representing the correct order of the scene intervals represented by the reference signal information. Therefore, the effective interval extraction unit 20B identifies intervals in which the order of each scene interval obtained by analyzing the reference signal information of each scene interval using a known analysis method matches the arrangement order of multiple scene intervals represented by multiple estimated signal information 32 for each scene interval included in the second time series data 30B. The effective interval extraction unit 20B then extracts the identified interval in the second time series data 30B as a noise-free effective interval. 
In other words, the valid interval extraction unit 20B extracts the intervals in the second time series data 30B in which the scene intervals are arranged in the correct order as noise-free valid intervals. 【0044】 Figure 3 is an explanatory diagram illustrating an example of extracting the valid interval 50. 【0045】 For example, let sk be the reference signal information for the start signal of the k-th scene interval from the beginning, as represented by the reference interval information 40, and let ek be the reference signal information for the end signal of the k-th scene interval from the beginning (where k is an integer greater than or equal to 1). Suppose that the estimated reference times 34 of the start signals and end signals estimated for the scene intervals contained in the second time series data 30B are arranged, from the beginning of the second time series data 30B, in the order s1, s5, e1, s2, e2, s3, e3, s4, e4, s6, e5, e6. In this case, the run s2, e2, s3, e3, s4, e4 is arranged in the correct order. The valid interval extraction unit 20B therefore extracts scene interval 2 through scene interval 4 in the second time series data 30B, that is, the scene intervals corresponding to the reference signal information s2, e2, s3, e3, s4, and e4, as the valid interval 50. The parts corresponding to the reference signal information s1, s5, e1, s6, e5, and e6 form the interval 51, which is considered to contain noise. 【0046】 The explanation continues with reference to Figure 2. For example, assume that the valid interval extraction unit 20B identifies the interval containing Scene 5 and Scene 6 in the second time series data 30B as an interval in which the scene intervals are arranged in the order represented by the reference interval information. 
In this case, for example, the valid interval extraction unit 20B extracts the interval from time t1, which is the estimated reference time 34 of scene interval "Scene 5", to time t8, which is the end time of the estimated reference time 34 adjacent to scene interval "Scene 6", as the valid interval 50. The start time of the valid interval 50 may instead be the start time t2 of scene interval "Scene 5", and the end time of the valid interval 50 may instead be the end time t7 of scene interval "Scene 6". 【0047】 The valid interval extraction unit 20B then identifies, for each scene interval included in the valid interval 50, the second estimated time included in the valid interval 50 as the valid interval time. In the example shown in Figure 2, the valid interval extraction unit 20B identifies the second estimated time t1 of scene interval "Scene 5" and the second estimated time t5 of scene interval "Scene 6", both included in the valid interval 50, as the valid interval times of the respective scene intervals. 【0048】 Alternatively, assume as above that the second time series data 30B is audio data and that the section corresponding to the estimated signal information 32 included in the second time series data 30B is also audio data, but that the reference signal information included in the reference interval information is text data. 【0049】 In this case, the valid interval extraction unit 20B performs speech recognition on the second time series data 30B to obtain the user's utterance intervals included in the second time series data 30B and transcript data, which represents the speech recognition result of each utterance interval as text data. For speech recognition, a known technique such as that described in Reference 1 may be used. 【0050】 (Reference 1) Bain, Max, et al. "WhisperX: Time-accurate speech transcription of long-form audio." arXiv preprint arXiv:2303.00747 (2023). 
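The ordering check illustrated with Figure 3 (paragraph 【0045】) can be sketched as a search for the longest run of detected signals that follows the reference order. The embodiment only says that a known analysis method is used, so the rule below (each start signal must be followed by its own end signal, and each end signal by the next scene's start signal) and the names `Cue`, `expected_next`, and `longest_ordered_run` are assumptions made for this example.

```python
from typing import List, Tuple

# ("s", k) is the start signal of scene k; ("e", k) is its end signal.
Cue = Tuple[str, int]


def expected_next(cue: Cue) -> Cue:
    """Return the cue that should follow `cue` in the correct order."""
    kind, k = cue
    return ("e", k) if kind == "s" else ("s", k + 1)


def longest_ordered_run(detected: List[Cue]) -> List[int]:
    """Return the scene numbers of the longest correctly ordered run."""
    best: List[int] = []
    i = 0
    while i < len(detected):
        # Extend the run while every cue is followed by its expected successor.
        j = i
        while j + 1 < len(detected) and detected[j + 1] == expected_next(detected[j]):
            j += 1
        run = detected[i:j + 1]
        # Keep only scenes whose start and end signals both lie inside the run.
        starts = {k for kind, k in run if kind == "s"}
        ends = {k for kind, k in run if kind == "e"}
        scenes = sorted(starts & ends)
        if len(scenes) > len(best):
            best = scenes
        i = j + 1
    return best


# The detection order used in the Figure 3 example.
figure3_order = [("s", 1), ("s", 5), ("e", 1), ("s", 2), ("e", 2), ("s", 3),
                 ("e", 3), ("s", 4), ("e", 4), ("s", 6), ("e", 5), ("e", 6)]
print(longest_ordered_run(figure3_order))  # [2, 3, 4]
```

Under these assumptions the sketch reproduces the Figure 3 result: scene intervals 2 through 4 form the valid interval 50, and the remaining signals fall into the interval 51.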
【0051】 The valid interval extraction unit 20B then estimates the second estimated time of each scene interval included in the second time series data 30B based on the speech recognition results of the utterance intervals and the reference signal information included in the reference interval information of each of the multiple scene intervals. 【0052】 Figure 4 is an explanatory diagram illustrating an example of estimating the second estimated time using speech recognition. 【0053】 For example, assume that the reference signal information 42 included in the reference interval information 40 of each scene interval is the text data shown in Figure 4. Assume also that the transcript data of the utterance intervals included in the second time series data 30B is the multiple transcript data shown in Figure 4. Each transcript data is assigned a time in the second time series data 30B. 【0054】 In this case, for each of the multiple reference signal information 42, the valid interval extraction unit 20B identifies the matching or most similar transcript data as the estimated signal information 32. The valid interval extraction unit 20B can identify the estimated signal information 32 corresponding to each reference signal information 42 by selecting the transcript data with the smallest edit distance to it. 【0055】 The valid interval extraction unit 20B then takes the time of the identified estimated signal information 32 as the estimated reference time 34, and estimates this estimated reference time 34 as the second estimated time of each scene interval included in the second time series data 30B. 
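The text-based matching in paragraphs 【0054】 and 【0055】 can be sketched with a plain Levenshtein edit distance. The embodiment does not specify which edit distance is used, so the Levenshtein variant, the representation of transcript data as `(time, text)` pairs, and the function names are assumptions made for this example.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_transcripts(references: List[str],
                      transcripts: List[Tuple[float, str]]) -> List[float]:
    """For each reference text, return the time of the closest transcript.

    Each returned time plays the role of the second estimated time.
    """
    return [min(transcripts, key=lambda t: edit_distance(ref, t[1]))[0]
            for ref in references]
```

Choosing the minimum-distance transcript tolerates small recognition errors, such as a dropped comma or a misheard word, while still rejecting unrelated utterances in most cases.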
【0056】 Then, in the same way as when the reference signal information included in the reference interval information 40 is audio data, the valid interval extraction unit 20B extracts, from among the multiple scene intervals included in the second time series data 30B, the interval in which the scene intervals are arranged in the order represented by the reference interval information 40 as a noise-free valid interval 50. 【0057】 Furthermore, the valid interval extraction unit 20B identifies the second estimated time (the estimated reference time 34 estimated above) included in the valid interval 50 as the valid interval time of each scene interval included in the valid interval 50. 【0058】 Returning to Figure 1, the explanation continues. 【0059】 The valid interval extraction unit 20B outputs the valid interval 50 extracted from the second time series data 30B, together with the valid interval time of each scene interval included in the valid interval 50, to the second difference calculation unit 20C. 【0060】 The second difference calculation unit 20C calculates the second difference between the valid interval time of a scene interval included in the valid interval 50 of the second time series data 30B and the reference time of the corresponding scene interval represented by the reference interval information 40. 【0061】 In this embodiment, multiple such differences are obtained, one for each scene interval included in the valid interval 50, and the second difference calculation unit 20C calculates, as the second difference, one of these differences or the average of at least some of them. 
【0062】 In other words, within the valid interval 50 extracted as a noise-free interval of the second time series data 30B, a difference is obtained for each scene interval between the valid interval time in the second time series data 30B and the reference time, which is the correct time of the same scene interval represented by the reference interval information 40, and the second difference calculation unit 20C calculates the average of one or more of these differences as the second difference. 【0063】 The second difference calculation unit 20C outputs the calculated second difference to the second shift unit 20D. 【0064】 The second shift unit 20D generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference. 【0065】 As described above, this embodiment assumes that the estimated reference time 34 and the estimated signal information 32 of each of the multiple scene intervals included in the first time series data 30A match the reference time and the reference signal information of that scene interval. For this reason, the third time series data generated by the time shift that the second shift unit 20D applies to the second time series data 30B is time series data 30 corrected so that it is synchronized with the first time series data 30A, with the same scene intervals at the same times. 【0066】 The second shift unit 20D outputs the generated third time series data to the output unit 20E. 【0067】 The output unit 20E extracts, from the first time series data 30A received from the acquisition unit 20A and the third time series data received from the second shift unit 20D, each pair of corresponding scene intervals, and outputs the extracted pairs. 
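The second difference calculation and the shift described in paragraphs 【0060】 to 【0064】 can be sketched as below. The sign convention (valid interval time minus reference time, then subtracting the result from every timestamp) is an assumption, since the text only says that the data is shifted by a time corresponding to the second difference. The function names and the representation of the time series as a list of timestamps in seconds are also hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def second_difference(valid_times: Sequence[float],
                      reference_times: Sequence[float]) -> float:
    """Average offset between the valid interval times and the reference times.

    Both sequences list the scene intervals of the valid interval in the
    same order, so element i of each refers to the same scene interval.
    """
    return mean(v - r for v, r in zip(valid_times, reference_times))


def shift_timestamps(timestamps: Sequence[float], difference: float) -> List[float]:
    """Shift every timestamp so that the valid interval times land on the reference times."""
    return [t - difference for t in timestamps]
```

For instance, with valid interval times of 12.5 s and 97.5 s and reference times of 10.0 s and 95.0 s, the second difference is 2.5 s, and shifting the second time series by that amount moves both scene starts onto their reference times.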
【0068】 As described above, in this embodiment, the estimated reference time 34 and estimated signal information 32 for each of the multiple scene intervals included in the first time series data 30A are described assuming that they match the respective reference time and reference signal information for each of the multiple scene intervals. Furthermore, the third time series data is time series data 30 synchronized with the first time series data 30A so that the same scene intervals are at the same time. 【0069】 Therefore, the output unit 20E can use the reference time included in the reference interval information 40 for each scene interval to identify the same scene interval from the first time series data 30A and the third time series data, and extract pairs of the identified scene intervals. The output unit 20E can then output the pairs of scene intervals extracted from the two time series data 30 (first time series data 30A and third time series data) by storing each pair in the storage unit 12, displaying them on the UI unit 14, or transmitting them to an external information processing device via the communication unit 16. For example, the output unit 20E may display pairs of the same scene intervals extracted from the two time series data 30 (first time series data 30A and third time series data) side by side in different areas on the display screen of the UI unit 14. 【0070】 Next, an example of the information processing flow performed by the information processing device 10A of this embodiment will be described. 【0071】 Figure 5 is a flowchart showing an example of the information processing flow performed by the information processing device 10A of this embodiment. 【0072】 The acquisition unit 20A acquires the first time series data 30A, the second time series data 30B, and reference interval information (step S100). 
【0073】 The valid interval extraction unit 20B extracts a noise-free valid interval 50 based on the reference interval information obtained in step S100, which is an interval in which scene intervals are arranged in the order represented by the reference interval information contained in the second time series data 30B obtained in step S100 (step S102). The valid interval extraction unit 20B also identifies the second estimated time, estimated based on the reference interval information, for each of the multiple scene intervals contained in the extracted valid interval 50 as the valid interval time (step S102). 【0074】 The second difference calculation unit 20C calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B extracted in step S102, and the reference time of the corresponding scene interval represented by the reference interval information 40 (step S104). 【0075】 The second shift unit 20D generates a third time series data by shifting the second time series data 30B acquired in step S100 by a time corresponding to the second difference calculated in step S104 (step S106). 【0076】 The output unit 20E extracts and outputs the scene intervals contained in the first time series data 30A acquired in step S100 and the third time series data generated in step S106, for each pair of corresponding scene intervals between the first time series data 30A and the third time series data (step S108). Then, this routine ends. 【0077】 Next, we will explain in detail the process of extracting valid intervals and identifying valid interval times, which is performed in step S102 of Figure 5. 【0078】 Figure 6A is a flowchart illustrating an example of the information processing flow performed by the information processing device 10A of this embodiment. Figure 6A shows an example of the detailed processing flow of step S102 in Figure 5. 
Figure 6A also shows the case where the second time series data 30B is audio data, the section corresponding to the estimated signal information 32 included in the second time series data 30B is also audio data, and the reference signal information 42 included in the reference section information 40 is also audio data. 【0079】 The valid interval extraction unit 20B calculates a cross-correlation function between the reference signal information 42 included in the reference interval information 40 and the second time series data 30B for each of the multiple scene intervals (step S200). 【0080】 Then, for each of the multiple scene intervals, the valid interval extraction unit 20B estimates the interval in the second time series data 30B where the cross-correlation function with the reference signal information 42 is maximized as the estimated signal information 32 (step S202). 【0081】 Next, the valid interval extraction unit 20B takes the estimated reference time 34 of the estimated signal information 32 estimated for each scene interval included in the second time series data 30B as the second estimated time of that scene interval (step S204). 【0082】 Then, the valid interval extraction unit 20B extracts from among the multiple scene intervals included in the second time series data 30B the interval in which the scene intervals are arranged in the order represented by the reference interval information 40, as a noise-free valid interval 50 (step S206). 【0083】 Furthermore, the valid interval extraction unit 20B identifies the second estimated time included in the valid interval 50 as the valid interval time for each scene interval included in the valid interval 50 (step S208). Then, this routine ends. 【0084】 Next, another example of the process of extracting valid intervals and identifying valid interval times, which is performed in step S102 of Figure 5, will be explained in detail. 
【0085】 Figure 6B is a flowchart illustrating an example of the information processing flow performed by the information processing device 10A in this embodiment. Figure 6B shows an example of the detailed processing flow of step S102 in Figure 5. Figure 6B also assumes that the second time series data 30B is audio data, and that the section corresponding to the estimated signal information 32 included in the second time series data 30B is also audio data. Furthermore, it assumes that the reference signal information 42 included in the reference section information 40 is text data. 【0086】 The valid interval extraction unit 20B performs speech recognition on the second time series data 30B to obtain the user's utterance intervals included in the second time series data 30B and the transcript data representing the speech recognition results of those utterance intervals as text data (step S300). 【0087】 The valid interval extraction unit 20B estimates the second estimated time for each scene interval included in the second time series data 30B based on the speech recognition result of the utterance interval and the reference signal information 42 included in the reference interval information 40 of each of the multiple scene intervals (step S302). 【0088】 The valid interval extraction unit 20B extracts, as in the case where the reference signal information 42 is audio data, the interval in which the scene intervals are arranged in the order represented by the reference interval information 40 from among the multiple scene intervals included in the second time series data 30B, as a noise-free valid interval 50 (step S304). 【0089】 The valid interval extraction unit 20B then identifies the second estimated time (the estimated reference time 34) included in the valid interval 50 as the valid interval time for each scene interval included in the valid interval 50 (step S306). Then, this routine ends. 
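The extraction of the valid interval 50 in steps S206 and S304 can be sketched as follows. The description does not state how "arranged in the order represented by the reference interval information" is tested, so this sketch assumes one possible reading: the valid interval is the longest run of consecutive scene intervals, taken in reference order, whose second estimated times strictly increase. The function name and data layout are hypothetical.

```python
def extract_valid_interval(scene_order, estimated_times):
    """Return the longest run of consecutive scenes with increasing times.

    scene_order     -- scene labels in the order given by the reference
                       interval information
    estimated_times -- scene label -> second estimated time in seconds;
                       a scene that could not be located is simply absent
    """
    best, run = [], []
    for scene in scene_order:
        t = estimated_times.get(scene)
        if t is not None and (not run or t > estimated_times[run[-1]]):
            run.append(scene)  # order is preserved, extend the current run
        else:
            # order is broken (or the scene is missing), start a new run
            run = [scene] if t is not None else []
        if len(run) > len(best):
            best = list(run)
    return best


# Hypothetical estimates: scenes A and E were mislocated because of noise.
order = ["A", "B", "C", "D", "E"]
estimates = {"A": 40.0, "B": 12.0, "C": 22.0, "D": 31.0, "E": 5.0}
valid = extract_valid_interval(order, estimates)
valid_times = {s: estimates[s] for s in valid}
```

Scenes whose estimated position breaks the expected order are treated here as noise-affected and left out, so only the times in `valid_times` would feed the second difference calculation.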
【0090】 As described above, the information processing device 10A of this embodiment comprises an effective interval extraction unit 20B, a second difference calculation unit 20C, and a second shift unit 20D. The effective interval extraction unit 20B extracts a section in the second time series data 30B in which scene sections are arranged in the order represented by the reference interval information 40, based on reference interval information 40 which includes a reference time that is the start time of the scene section and reference signal information 42 that represents the order of the scene sections, for each of the multiple scene sections included in the first time series data 30A and the second time series data 30B, as a noise-free effective interval 50. The effective interval extraction unit 20B also identifies a second estimated time, estimated based on the reference interval information 40, for each of the multiple scene sections included in the effective interval 50 as the effective interval time. The second difference calculation unit 20C calculates a second difference between the effective interval time of the scene section included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene section represented by the reference interval information 40. The second shift unit 20D generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference. 【0091】 In conventional technology, if noise is present in at least one of the two media signals, causing the waveforms to differ drastically, it can be difficult to synchronize the two media signals with high precision. For example, time-series data sensed in a factory may contain noise due to the influence of sensors such as imaging equipment and microphones, as well as environmental fluctuations. 
In such cases, conventional technology can make it difficult to synchronize multiple time-series data 30. 【0092】 On the other hand, in this embodiment, the information processing device 10A extracts the section in which the scene sections are arranged in the order represented by the reference section information 40, which is included in the second time series data 30B, as a noise-free effective section 50. The information processing device 10A then generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference between the effective section time of each of the multiple scene sections included in the effective section 50 and the reference time of the corresponding scene section represented by the reference section information 40. As a result, the third time series data becomes time series data 30 in which the second time series data 30B has been shifted in time to synchronize with the first time series data 30A based on the noise-free effective section 50. 【0093】 Thus, in this embodiment, the information processing device 10A performs time shifting using an effective interval 50 that does not contain noise, making it possible to synchronize multiple time-series data 30 with high accuracy. 【0094】 Therefore, the information processing device 10A of this embodiment can synchronize multiple time-series data 30 with high precision. 【0095】 (Second embodiment) In this embodiment, a method for calculating the second difference using a method different from the above embodiment will be described as an example. 【0096】 In this embodiment, as in the above embodiment, the description will assume that the first time series data 30A and the second time series data 30B are audio data. 
Also, as in the above embodiment, the description will assume that the estimated reference time 34 and estimated signal information 32 for each of the multiple scene sections included in the first time series data 30A match the respective reference time and reference signal information for each of the multiple scene sections. 【0097】 Figure 7 is a schematic diagram of an example of the information processing device 10B of this embodiment. The information processing device 10B is an example of the information processing device 10. 【0098】 The information processing device 10B comprises a storage unit 12, a UI unit 14, a communication unit 16, and a control unit 22. The storage unit 12, the UI unit 14, the communication unit 16, and the control unit 22 are communicably connected to each other via a bus 18 or the like. The information processing device 10B is the same as the information processing device 10A of the above embodiment, except that it includes a control unit 22 instead of a control unit 20. 【0099】 The control unit 22 performs various processes in the information processing device 10B. The control unit 22 includes an acquisition unit 22A, an effective interval extraction unit 20B, a second difference calculation unit 22C, a second shift unit 20D, and an output unit 20E. The control unit 22 is the same as the control unit 20 of the information processing device 10A in the above embodiment, except that it includes an acquisition unit 22A and a second difference calculation unit 22C instead of the acquisition unit 20A and the second difference calculation unit 20C. 【0100】 The acquisition unit 22A acquires the first time series data 30A, the second time series data 30B, and the reference interval information 40, similar to the acquisition unit 20A in the above embodiment. 【0101】 The acquisition unit 22A outputs the reference interval information 40 and the second time series data 30B to the valid interval extraction unit 20B, similar to the acquisition unit 20A. 
The acquisition unit 22A also outputs the second time series data 30B to the second shift unit 20D, similar to the acquisition unit 20A. The acquisition unit 22A also outputs the first time series data 30A and the reference interval information 40 to the output unit 20E, similar to the acquisition unit 20A. In this embodiment, the acquisition unit 22A further outputs the first time series data 30A to the second difference calculation unit 22C. 【0102】 The second difference calculation unit 22C calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval represented by the reference interval information 40. 【0103】 In this embodiment, the second difference calculation unit 22C calculates the second difference between the effective interval time of a scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval in the first time series data 30A. The corresponding scene interval is the scene interval in the first time series data 30A whose estimated signal information has the greatest correlation with that of the scene interval in the second time series data 30B. 【0104】 As described above, this embodiment is described on the assumption that the estimated reference time 34 and estimated signal information 32 of each of the multiple scene intervals included in the first time series data 30A match the reference time and reference signal information of that scene interval. For this reason, the second difference calculation unit 22C uses the reference time included in the reference interval information 40 of the scene interval as the time of the corresponding scene interval, which is a scene interval in the first time series data 30A that corresponds to a scene interval included in the effective interval 50 of the second time series data 30B. 
【0105】 The scene segments in the first time series data 30A that correspond to the scene segments included in the effective interval 50 of the second time series data 30B can be identified using the cross-correlation function. 【0106】 In detail, the second difference calculation unit 22C calculates a cross-correlation function between each scene interval included in the effective interval 50 of the second time series data 30B and the first time series data 30A. Then, for each scene interval included in the effective interval 50 of the second time series data 30B, the second difference calculation unit 22C identifies the interval included in the first time series data 30A that maximizes the cross-correlation function with that scene interval as the corresponding scene interval to the scene interval included in the effective interval 50. That is, the second difference calculation unit 22C uses the cross-correlation function to identify the interval in the first time series data 30A that matches or is most similar to each scene interval included in the effective interval 50 of the second time series data 30B as the corresponding scene interval to each of those scene intervals. 【0107】 The second difference calculation unit 22C then calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the identified corresponding scene interval. 【0108】 The second difference calculation unit 22C calculates the average of one or at least some of the differences between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval in the first time series data 30A, as the second difference. 
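The identification of the corresponding scene interval by cross-correlation, as described in paragraphs 0105 and 0106, can be sketched with a plain sliding dot product. This is a simplified illustration under stated assumptions: the signals are short lists of samples, the correlation is not normalized (so it favors high-energy regions), and the function name and sampling rate are hypothetical. Mapping the peak position to a scene label and its reference time is omitted.

```python
def best_lag(signal, template):
    """Return the sample offset where the cross-correlation peaks.

    The score at each lag is the dot product of `template` with the
    equally long window of `signal` starting at that lag.
    """
    if not template or len(template) > len(signal):
        raise ValueError("template must be non-empty and no longer than signal")
    width = len(template)
    scores = [
        sum(s * t for s, t in zip(signal[lag:lag + width], template))
        for lag in range(len(signal) - width + 1)
    ]
    # max() returns the first lag that attains the highest score
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical samples: `scene` is one scene interval cut from the effective
# interval of the second series, `first` is the whole first series.
first = [0, 0, 1, 3, 1, 0, 0, 0, 2, -2, 2, 0]
scene = [1, 3, 1]
sample_rate_hz = 4  # assumed sampling rate

lag = best_lag(first, scene)
corresponding_start_s = lag / sample_rate_hz
```

A practical implementation would more likely use a normalized or FFT-based correlation; the exhaustive loop is kept only because it makes the "maximize the cross-correlation function" step explicit.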
【0109】 In other words, for each scene section included in the effective section 50 extracted as a noise-free section of the second time series data 30B, a difference is obtained between the effective section time in the second time series data 30B and the reference time of the corresponding scene section in the first time series data 30A. Because the first time series data 30A includes estimated signal information 32 and estimated reference times 34 that match the accurate reference section information 40, this reference time is the estimated reference time 34 of the corresponding scene section. The second difference calculation unit 22C calculates the average of one or more of these per-scene differences as the second difference. The second difference calculation unit 22C outputs the calculated second difference to the second shift unit 20D. 【0110】 The information processing performed in the information processing device 10B is the same as the information processing performed in the information processing device 10A of the above embodiment, except that the second difference calculation unit 22C of this embodiment performs the calculation of the second difference instead of the second difference calculation processing performed by the second difference calculation unit 20C. 【0111】 As described above, the information processing device 10B of this embodiment, similar to the information processing device 10A of the above embodiment, extracts the section in which the scene sections are arranged in the order represented by the reference section information 40 included in the second time series data 30B as a noise-free effective section 50. The information processing device 10B then generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference between the effective section time of the scene section included in the effective section 50 of the second time series data 30B and the reference time of the corresponding scene section of the first time series data 30A. 
Therefore, the third time series data is time series data 30 in which the second time series data 30B has been shifted in time to synchronize with the first time series data 30A based on the noise-free effective section 50. 【0112】 Thus, in the information processing device 10B of this embodiment, similar to the above embodiment, time shifting is performed using an effective interval 50 that does not contain noise, so multiple time-series data 30 can be synchronized with high accuracy. 【0113】 Therefore, the information processing device 10B of this embodiment can synchronize multiple time-series data 30 with high accuracy, similar to the embodiment described above. 【0114】 (Third embodiment) In this embodiment, similar to the above embodiment, the description will assume that the first time series data 30A and the second time series data 30B are audio data. Furthermore, in this embodiment, the description will assume that at least a portion of the estimated reference times 34 of each of the multiple scene intervals included in both the first time series data 30A and the second time series data 30B do not match the reference times of each of the multiple scene intervals. 【0115】 Figure 8 is a schematic diagram of an example of the information processing device 10C of this embodiment. The information processing device 10C is an example of the information processing device 10. 【0116】 The information processing device 10C comprises a storage unit 12, a UI unit 14, a communication unit 16, and a control unit 23. The storage unit 12, the UI unit 14, the communication unit 16, and the control unit 23 are communicably connected to each other via a bus 18 or the like. The information processing device 10C is the same as the information processing device 10A of the above embodiment, except that it includes a control unit 23 instead of a control unit 20. 【0117】 The control unit 23 performs various processes in the information processing device 10C. 
The control unit 23 includes an acquisition unit 23A, an effective interval extraction unit 20B, a second difference calculation unit 20C, a second shift unit 20D, an output unit 23E, a first difference calculation unit 23F, and a first shift unit 23G. 【0118】 The control unit 23 is the same as the control unit 20 of the information processing device 10A of the above embodiment, except that it includes an acquisition unit 23A and an output unit 23E instead of the acquisition unit 20A and the output unit 20E, and further includes a first difference calculation unit 23F and a first shift unit 23G. 【0119】 The acquisition unit 23A acquires the first time series data 30A, the second time series data 30B, and the reference interval information 40, similar to the acquisition unit 20A in the above embodiment. 【0120】 The acquisition unit 23A outputs the reference interval information 40 and the second time series data 30B to the effective interval extraction unit 20B, similar to the acquisition unit 20A. The acquisition unit 23A also outputs the second time series data 30B to the second shift unit 20D, similar to the acquisition unit 20A. In this embodiment, the acquisition unit 23A outputs the reference interval information 40 to the output unit 23E. In this embodiment, the acquisition unit 23A also outputs the first time series data 30A and the reference interval information 40 to the first difference calculation unit 23F. The acquisition unit 23A also outputs the first time series data 30A to the first shift unit 23G. 【0121】 The first difference calculation unit 23F calculates the first difference between the first estimated time, which is estimated based on the reference interval information 40 for each scene interval included in the first time series data 30A, and the reference time of the corresponding scene interval included in the first time series data 30A, which is represented by the reference interval information 40. 
【0122】 For example, let's assume that the first time series data 30A is audio data, and the section corresponding to the estimated signal information 32 included in the first time series data 30A is also audio data. Let's also assume that the reference signal information 42 included in the reference section information 40 is audio data. 【0123】 In this case, the first difference calculation unit 23F calculates the cross-correlation function between the reference signal information 42 for each of the multiple scene intervals and the first time series data 30A. Then, for each of the multiple scene intervals, the first difference calculation unit 23F estimates the interval in the first time series data 30A that has the maximum cross-correlation function with the reference signal information 42 as the estimated signal information 32 for the first time series data 30A. That is, the first difference calculation unit 23F identifies the interval in the first time series data 30A that matches or is most similar to each of the reference signal information 42 for each scene interval as the estimated signal information 32 for each scene interval. 【0124】 Furthermore, the first difference calculation unit 23F estimates the estimated reference time 34 of the estimated signal information 32 for each scene interval included in the first time series data 30A, which has been estimated for each scene interval, as the first estimated time for each scene interval included in the first time series data 30A. 【0125】 The first difference calculation unit 23F then calculates the first difference as the average of one or at least some of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data 30A and the reference time of the corresponding scene interval, which is represented by the reference interval information 40 and is included in the first time series data 30A. 
【0126】 Furthermore, as mentioned above, we assume that the first time series data 30A is audio data, and the section corresponding to the estimated signal information 32 included in the first time series data 30A is also audio data. We also assume that the reference signal information 42 included in the reference section information 40 is text data. 【0127】 In this case, the first difference calculation unit 23F performs speech recognition on the first time series data 30A to obtain the user's utterance intervals included in the first time series data 30A and the transcript data representing the speech recognition results of those utterance intervals as text data. For speech recognition, known techniques such as those described in Reference 1 may be used. 【0128】 The first difference calculation unit 23F then estimates the first estimated time for each scene segment included in the first time series data 30A based on the speech recognition result of the utterance segment and the reference signal information included in the reference segment information of each of the multiple scene segments. Specifically, the first difference calculation unit 23F estimates the estimated reference time of the estimated signal information 32, which is the speech recognition result of the utterance segment corresponding to the reference signal information 42, as the first estimated time for the scene segment. 【0129】 The first difference calculation unit 23F then calculates the first difference as the average of one or at least some of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data 30A and the reference time of the corresponding scene interval, which is represented by the reference interval information 40 and is included in the first time series data 30A. 【0130】 The first difference calculation unit 23F outputs the calculated first difference to the first shift unit 23G. 
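The text-based estimation of the first estimated time in paragraphs 0127 and 0128 can be sketched as follows. Speech recognition itself is assumed to have already produced utterance start times and transcripts. The description does not say how a transcript is matched to the reference signal information 42, so string similarity from Python's `difflib` is used here as a stand-in, and all names and data are hypothetical.

```python
from difflib import SequenceMatcher


def estimate_scene_times(utterances, reference_texts):
    """Map each scene to the start time of its most similar utterance.

    utterances      -- list of (start_time_seconds, transcript) pairs
    reference_texts -- scene label -> reference text for that scene
    """
    if not utterances:
        raise ValueError("at least one recognized utterance is required")
    estimated = {}
    for scene, ref_text in reference_texts.items():
        start, _ = max(
            utterances,
            key=lambda u: SequenceMatcher(
                None, u[1].lower(), ref_text.lower()
            ).ratio(),
        )
        estimated[scene] = start
    return estimated


# Hypothetical speech recognition output for the first time series data.
utterances = [
    (3.2, "start the inspection"),
    (15.8, "tighten the bolts"),
    (27.1, "close the cover"),
]
reference_texts = {"inspect": "Start inspection", "bolt": "Tighten bolts"}

first_estimated_times = estimate_scene_times(utterances, reference_texts)
```

The resulting times play the role of the first estimated times, which are then compared with the reference times to obtain the first difference.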
【0131】 The first shift unit 23G generates fourth time series data by time-shifting the first time series data 30A by a time corresponding to the first difference. Therefore, the fourth time series data becomes time series data 30 generated by time-shifting the first time series data 30A so that its estimated reference time 34 matches the reference time. The first shift unit 23G then outputs the generated fourth time series data to the output unit 23E. 【0132】 As described in the above embodiment, the third time series data generated by the time shift processing of the second time series data 30B by the second shift unit 20D is data obtained by time shifting the second time series data 30B so that the estimated reference time of the included scene section coincides with the reference time. Similarly, the fourth time series data generated by the time shift processing of the first time series data 30A by the first shift unit 23G is data obtained by time shifting the included scene section so that the estimated reference time coincides with the reference time. Therefore, the first shift unit 23G and the second shift unit 20D synchronize the first time series data 30A and the second time series data 30B so that the same included scene section is at the same time. 【0133】 The output unit 23E extracts and outputs the scene intervals contained in the fourth time series data received from the first shift unit 23G and the third time series data received from the second shift unit 20D, for each pair of corresponding scene intervals between the fourth time series data and the third time series data. 【0134】 The output unit 23E can use the reference time contained in the reference interval information 40 for each scene interval to identify the same scene interval from the fourth time series data and the third time series data, and extract pairs of the identified scene intervals. 
The output unit 23E can then output the pairs of scene intervals extracted from the two time series data 30 (fourth time series data and third time series data) by storing each pair in the storage unit 12, displaying them on the UI unit 14, or transmitting them to an external information processing device via the communication unit 16. For example, the output unit 23E may display pairs of the same scene intervals extracted from the two time series data 30 (fourth time series data and third time series data) side by side in different areas on the display screen of the UI unit 14. 【0135】 Next, an example of the information processing flow performed by the information processing device 10C of this embodiment will be described. 【0136】 Figure 9 is a flowchart showing an example of the information processing flow performed by the information processing device 10C of this embodiment. 【0137】 The acquisition unit 23A acquires the first time series data 30A, the second time series data 30B, and reference interval information (step S400). 【0138】 Then, the control unit 23 executes the processing of steps S402 to S406 in the same manner as steps S102 to S106 in Figure 5 of the above embodiment. 【0139】 Specifically, the valid interval extraction unit 20B extracts the valid interval 50 included in the second time series data 30B acquired in step S400 based on the reference interval information acquired in step S400, and identifies the valid interval time of the scene interval included in the valid interval 50 (step S402). Then, the second difference calculation unit 20C calculates the second difference between the valid interval time of the scene interval included in the valid interval 50 of the second time series data 30B extracted in step S402, and the reference time of the corresponding scene interval represented by the reference interval information 40 (step S404). 
The second shift unit 20D generates third time series data by shifting the second time series data 30B acquired in step S400 by a time corresponding to the second difference calculated in step S404 (step S406). 【0140】 Meanwhile, the first difference calculation unit 23F calculates the first difference between the first estimated time, which is estimated based on the reference interval information 40 for each scene interval included in the first time series data 30A acquired in step S400, and the reference time of the corresponding scene interval included in the first time series data 30A, which is represented by the reference interval information 40 (step S408). 【0141】 The first shift unit 23G generates a fourth time series data by time-shifting the first time series data 30A acquired in step S400 by a time corresponding to the first difference (step S410). 【0142】 The output unit 23E extracts and outputs the scene intervals contained in the third time series data generated in step S406 and the fourth time series data generated in step S410, for each pair of corresponding scene intervals between the third and fourth time series data (step S412). Then, this routine ends. 【0143】 Next, we will explain the details of the first difference calculation process performed in step S408 of Figure 9. 【0144】 Figure 10A is a flowchart showing an example of the information processing flow performed by the information processing device 10C of this embodiment. Figure 10A assumes that the first time series data 30A is audio data, and that the section corresponding to the estimated signal information 32 included in the first time series data 30A is also audio data. It also assumes that the reference signal information 42 included in the reference section information 40 is audio data. 
【0145】 In this case, the first difference calculation unit 23F calculates the cross-correlation function between the reference signal information 42 for each of the multiple scene intervals and the first time series data 30A (step S500). Then, for each of the multiple scene intervals, the first difference calculation unit 23F estimates the interval in the first time series data 30A where the cross-correlation function with the reference signal information 42 is maximized as the estimated signal information 32 in the first time series data 30A (step S502). 【0146】 Furthermore, the first difference calculation unit 23F estimates the estimated reference time 34 of the estimated signal information 32 for each scene section included in the first time series data 30A, which has been identified for each scene section, as the first estimated time for each scene section included in the first time series data 30A (step S504). 【0147】 The first difference calculation unit 23F then calculates the average of one or at least some of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data 30A and the reference time of the corresponding scene interval, which is represented by the reference interval information 40 and is included in the first time series data 30A, as the first difference (step S506). Then, this routine ends. 【0148】 Figure 10B is a flowchart showing an example of the information processing flow performed by the information processing device 10C of this embodiment. Figure 10B assumes that the first time series data 30A is audio data, and that the section corresponding to the estimated signal information 32 included in the first time series data 30A is also audio data. Furthermore, it assumes that the reference signal information 42 included in the reference section information 40 is text data. 
【0149】 In this case, the first difference calculation unit 23F performs speech recognition on the first time series data 30A to obtain the user's utterance intervals included in the first time series data 30A and the transcript data representing the speech recognition results of those utterance intervals as text data (step S600). 【0150】 Then, the first difference calculation unit 23F estimates the first estimated time for each scene segment included in the first time series data 30A based on the speech recognition result of the utterance segment and the reference signal information 42 included in the reference segment information 40 of each of the multiple scene segments (step S602). 【0151】 The first difference calculation unit 23F then calculates, as the first difference, the average of one or more of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data 30A and the reference time of the corresponding scene interval, as represented by the reference interval information 40 (step S604). Then, this routine ends. 【0152】 As described above, the information processing device 10C of this embodiment, similar to the information processing device 10A of the above embodiment, extracts the section of the second time series data 30B in which the scene sections are arranged in the order represented by the reference section information 40 as a noise-free effective section 50. The information processing device 10C then generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference between the effective section time of the scene section included in the effective section 50 of the second time series data 30B and the reference time of the corresponding scene section of the first time series data 30A. 
【0153】 Furthermore, the information processing device 10C of this embodiment generates a fourth time series data by shifting the first time series data 30A by a time corresponding to the first difference between the first estimated time of each scene interval included in the first time series data 30A and the reference time of the scene interval. 【0154】 In this embodiment, the information processing device 10C extracts and outputs the scene intervals contained in the fourth time series data and the third time series data, for each pair of corresponding scene intervals between the fourth time series data and the third time series data. 【0155】 Therefore, in the information processing device 10C of this embodiment, even if at least a portion of the reference times included in both the first time series data 30A and the second time series data 30B do not match the reference times included in the reference interval information 40, the multiple time series data 30 can be synchronized with high accuracy. 【0156】 (Fourth embodiment) In this embodiment, a method for calculating the second difference, different from that of the third embodiment described above, will be explained as an example. 【0157】 In this embodiment, as in the above embodiment, the description will assume that the first time series data 30A and the second time series data 30B are audio data. Also, as in the third embodiment, the description will assume that at least a portion of the estimated reference times 34 of each of the multiple scene intervals included in both the first time series data 30A and the second time series data 30B do not match the reference times of the corresponding scene intervals. 【0158】 Figure 11 is a schematic diagram of an example of the information processing device 10D of this embodiment. The information processing device 10D is an example of the information processing device 10. 
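As an illustration only, the first difference that the third embodiment calculates in steps S500 to S506, and on which this embodiment also relies, could be sketched as follows. The sketch assumes audio sampled at a fixed rate and uses NumPy's `np.correlate`. The function names `estimate_start` and `first_difference`, the layout of the `references` list, and the sign convention (estimated time minus reference time) are assumptions made for this example and do not appear in the specification.

```python
import numpy as np

def estimate_start(signal, template, rate):
    """Start time (s) in `signal` where the cross-correlation with `template` peaks."""
    # mode="valid": output index k is the correlation of template with signal[k:k+len(template)]
    corr = np.correlate(signal, template, mode="valid")
    return int(np.argmax(corr)) / rate

def first_difference(signal, references, rate):
    """Average of (estimated time - reference time) over the scene intervals.

    references: list of (reference_time_s, template) pairs, one per scene
    interval (hypothetical layout standing in for reference interval information 40).
    """
    diffs = [estimate_start(signal, tpl, rate) - ref_t for ref_t, tpl in references]
    return float(np.mean(diffs))
```

Here every scene interval contributes to the average; restricting the list passed as `references` would correspond to averaging only some of the differences.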
【0159】 The information processing device 10D comprises a storage unit 12, a UI unit 14, a communication unit 16, and a control unit 24. The storage unit 12, the UI unit 14, the communication unit 16, and the control unit 24 are communicably connected via a bus 18 or the like. The information processing device 10D is the same as the information processing device 10A of the above embodiment, except that it includes a control unit 24 instead of a control unit 20. 【0160】 The control unit 24 performs various processes in the information processing device 10D. The control unit 24 includes an acquisition unit 23A, an effective interval extraction unit 20B, a second difference calculation unit 24C, a second shift unit 20D, an output unit 23E, a first difference calculation unit 23F, and a first shift unit 24G. 【0161】 The control unit 24 is the same as the control unit 23 of the information processing device 10C of the third embodiment, except that it includes a first shift unit 24G instead of the first shift unit 23G, and a second difference calculation unit 24C instead of the second difference calculation unit 20C. 【0162】 The first shift unit 24G is the same as the first shift unit 23G in the third embodiment described above, except that it outputs the generated fourth time series data to the second difference calculation unit 24C. 【0163】 The second difference calculation unit 24C calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval represented by the reference interval information 40. 
【0164】 In this embodiment, the second difference calculation unit 24C calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval, that is, the scene interval included in the fourth time series data, as represented by the reference interval information 40. 【0165】 As described above, the fourth time series data is time series data 30 generated by time-shifting the first time series data 30A so that its estimated reference time 34 matches the reference time. Therefore, as the time (estimated reference time 34) of the corresponding scene section in the fourth time series data, the second difference calculation unit 24C uses the reference time included in the reference section information 40 for that scene section. 【0166】 The scene segments in the fourth time series data that correspond to the scene segments included in the effective interval 50 of the second time series data 30B can be identified using the cross-correlation function. 【0167】 In detail, the second difference calculation unit 24C calculates a cross-correlation function between each of the scene intervals included in the effective interval 50 of the second time series data 30B and the fourth time series data. Then, for each of the scene intervals included in the effective interval 50 of the second time series data 30B, the second difference calculation unit 24C identifies the interval of the fourth time series data that maximizes the cross-correlation function with that scene interval as the scene interval corresponding to the scene interval included in the effective interval 50. 
That is, the second difference calculation unit 24C uses the cross-correlation function to identify the interval of the fourth time series data that matches or is most similar to each of the scene intervals included in the effective interval 50 of the second time series data 30B as the corresponding scene interval to each of those scene intervals. 【0168】 The second difference calculation unit 24C then calculates a second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time, which is the estimated reference time 34 of the identified corresponding scene interval. 【0169】 The second difference calculation unit 24C calculates the average of one or at least some of the differences between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval in the first time series data 30A, as the second difference. 【0170】 In other words, the second difference calculation unit 24C calculates the average of one or more differences among the differences for each scene section, which are the effective section time in the second time series data 30B and the reference time, which is the estimated reference time 34 of the corresponding scene section in the fourth time series data, within the effective section 50 extracted as a noise-free section in the second time series data 30B. The second difference calculation unit 24C outputs the calculated second difference to the second shift unit 20D. The processing of the second shift unit 20D and the output unit 23E is the same as in the above embodiment. 【0171】 Next, an example of the information processing flow performed by the information processing device 10D of this embodiment will be described. 
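The second difference calculation described above for the second difference calculation unit 24C could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: both series are assumed to be NumPy arrays sampled at the same rate, the scene intervals of the effective interval 50 are assumed to be given as sample index ranges, and the names `second_difference` and `segments` as well as the sign convention (effective interval time minus matched time) are hypothetical.

```python
import numpy as np

def second_difference(second, fourth, segments, rate):
    """Average offset (s) between scene intervals of `second` and their best match in `fourth`.

    segments: list of (start_index, end_index) ranges of scene intervals inside
    the effective interval of `second` (hypothetical layout).
    """
    diffs = []
    for start, end in segments:
        scene = second[start:end]
        # Interval of the fourth series that maximizes the cross-correlation with this scene
        corr = np.correlate(fourth, scene, mode="valid")
        matched = int(np.argmax(corr))
        # Effective interval time minus time of the corresponding scene interval
        diffs.append((start - matched) / rate)
    return float(np.mean(diffs))
```

Because only intervals inside the effective interval are used as templates, noisy parts of the second time series data never enter the correlation.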
【0172】 Figure 12 is a flowchart showing an example of the information processing flow performed by the information processing device 10D of this embodiment. 【0173】 The control unit 24 of the information processing device 10D executes the processing in steps S800 to S802 in the same manner as steps S400 to S402 shown in Figure 9. Specifically, the acquisition unit 23A acquires the first time series data 30A, the second time series data 30B, and the reference interval information 40 (step S800). The valid interval extraction unit 20B extracts the valid interval 50 included in the second time series data 30B acquired in step S800 based on the reference interval information 40 acquired in step S800, and identifies the valid interval time of the scene interval included in the valid interval 50 (step S802). 【0174】 Then, the control unit 24 of the information processing device 10D executes the processes of steps S804 to S806 in the same manner as steps S408 to S410 shown in Figure 9. Specifically, the first difference calculation unit 23F calculates the first difference between the first estimated time, which is estimated based on the reference interval information 40 for each scene interval included in the first time series data 30A acquired in step S800, and the reference time of the corresponding scene interval, as represented by the reference interval information 40 (step S804). The first shift unit 24G generates fourth time series data by time-shifting the first time series data 30A acquired in step S800 by a time corresponding to the first difference (step S806). 【0175】 The second difference calculation unit 24C calculates the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B extracted in step S802 and the reference time of the corresponding scene interval included in the fourth time series data generated in step S806 (step S808). 
【0176】 Then, the control unit 24 of the information processing device 10D executes the processing in steps S810 and S812 in the same manner as steps S406 and S412 shown in Figure 9. Specifically, the second shift unit 20D generates third time series data by shifting the second time series data 30B acquired in step S800 by a time corresponding to the second difference calculated in step S808 (step S810). The output unit 23E extracts and outputs the scene intervals contained in the fourth time series data generated in step S806 and the third time series data generated in step S810, for each pair of corresponding scene intervals between the fourth time series data and the third time series data (step S812). Then, this routine ends. 【0177】 As described above, the information processing device 10D of this embodiment, similar to the information processing device 10A of the above embodiment, extracts the section of the second time series data 30B in which the scene sections are arranged in the order represented by the reference section information 40 as a noise-free effective section 50. Furthermore, the information processing device 10D of this embodiment generates fourth time series data by shifting the first time series data 30A by a time corresponding to the first difference between the first estimated time of each scene section included in the first time series data 30A and the reference time of the scene section. 【0178】 Then, the information processing device 10D generates third time series data by shifting the second time series data 30B by a time corresponding to the second difference between the effective interval time of the scene interval included in the effective interval 50 of the second time series data 30B and the reference time of the corresponding scene interval of the fourth time series data. 
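The shifting of steps S806 and S810 and the pairwise output of step S812 might look like the following sketch. It assumes sampled data held in NumPy arrays, a difference defined as observed time minus reference time, and scene intervals described by a reference start time and a duration. The helper names `shift_series` and `extract_pairs` and the zero padding are illustrative assumptions; the specification does not prescribe how the shift is realized.

```python
import numpy as np

def shift_series(samples, difference, rate):
    """Shift `samples` so that an event at time t moves to t - difference (assumed convention)."""
    n = int(round(difference * rate))
    if n > 0:
        # Content arrives late: drop the first n samples
        return samples[n:]
    # Content arrives early (or on time): pad the front with -n zeros
    return np.concatenate([np.zeros(-n, dtype=samples.dtype), samples])

def extract_pairs(third, fourth, scenes, rate):
    """Return (fourth_clip, third_clip) for each scene given as (reference_start_s, duration_s)."""
    pairs = []
    for start, duration in scenes:
        a = int(round(start * rate))
        b = int(round((start + duration) * rate))
        pairs.append((fourth[a:b], third[a:b]))
    return pairs
```

After both series have been shifted onto the reference timeline, the same index range selects the corresponding scene interval in each, which is why `extract_pairs` can slice both arrays with identical bounds.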
【0179】 In this embodiment, the information processing device 10D extracts and outputs the scene intervals included in the fourth time series data and the third time series data, for each pair of corresponding scene intervals between the fourth time series data and the third time series data. 【0180】 Therefore, in the information processing device 10D of this embodiment, even if at least a portion of the reference times included in both the first time series data 30A and the second time series data 30B do not match the reference times included in the reference interval information 40, the multiple time series data 30 can be synchronized with high accuracy. 【0181】 Next, an example of the hardware configuration of the information processing device 10 (information processing devices 10A to 10D) of the above embodiment will be described. 【0182】 Figure 13 is a hardware configuration diagram of an example of the information processing device 10 of the above embodiment. 【0183】 The information processing device 10 of the above embodiment includes a control device such as a CPU (Central Processing Unit) 90D, storage devices such as a ROM (Read Only Memory) 90E, RAM (Random Access Memory) 90F, and HDD (Hard Disk Drive) 90G, an I / F unit 90B which serves as an interface to various devices, an output unit 90A which outputs various information, an input unit 90C which accepts user operations, and a bus 90H which connects each unit, and has a hardware configuration that uses a normal computer. 【0184】 In the information processing device 10 of the above embodiment, each of the above components is realized on the computer by the CPU 90D reading a program from the ROM 90E onto the RAM 90F and executing it. 【0185】 The program for executing each of the above processes performed by the information processing device 10 in the above embodiment may be stored in the HDD 90G. 
Alternatively, the program for executing each of the above processes performed by the information processing device 10 in the above embodiment may be pre-installed and provided in the ROM 90E. 【0186】 Furthermore, the program for executing the above-described process performed by the information processing device 10 of the above embodiment may be stored in an installable or executable file format on a computer-readable storage medium such as a CD-ROM, CD-R, memory card, DVD (Digital Versatile Disc), or flexible disk (FD), and provided as a computer program product. Alternatively, the program for executing the above-described process performed by the information processing device 10 of the above embodiment may be stored on a computer connected to a network such as the Internet and provided by allowing download via the network. Alternatively, the program for executing the above-described process performed by the information processing device 10 of the above embodiment may be provided or distributed via a network such as the Internet. 【0187】 Although this embodiment has been described above, it is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the invention. This embodiment and its variations are included in the scope and spirit of the invention, as well as in the claims of the invention and its equivalents. [Explanation of Symbols] 【0188】 10, 10A, 10B, 10C, 10D Information Processing Device 20C, 22C, 24C 2nd difference calculation section 20B Effective interval extraction unit 20D Second Shift Section 20E, 23E Output Section 23F 1st difference calculation section 23G, 24G First Shift Section
Claims
[Claim 1] An information processing program that causes a computer to execute: an effective interval extraction step of extracting, based on reference interval information for each of multiple scene intervals contained in first time series data and second time series data, the reference interval information including a reference time that is the start time of the scene interval and reference signal information that represents the order of the scene intervals, the interval in the second time series data in which the scene intervals are arranged in the order represented by the reference interval information as an effective interval free of noise, and identifying a second estimated time, estimated based on the reference interval information, for each of the multiple scene intervals included in the effective interval, as the effective interval time; a second difference calculation step of calculating a second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval represented by the reference interval information; and a second shift step of generating third time series data by shifting the second time series data by a time corresponding to the second difference. 
[Claim 2] The information processing program according to claim 1, further comprising an output step of extracting and outputting the scene intervals included in each of the first time series data and the third time series data, for each pair of corresponding scene intervals between the first time series data and the third time series data. [Claim 3] The information processing program according to claim 1, wherein the effective interval extraction step: estimates, for each of the multiple scene intervals, the estimated reference time of the estimated signal information included in the second time series data at which the cross-correlation function between the reference signal information included in the reference interval information and the second time series data is maximized, as the second estimated time for each of the multiple scene intervals; extracts, from among the multiple scene intervals included in the second time series data, the interval in which the scene intervals are arranged in the order represented by the reference interval information as the noise-free effective interval; and identifies the second estimated time for each of the multiple scene intervals included in the effective interval as the effective interval time. [Claim 4] The information processing program according to claim 1, wherein the effective interval extraction step estimates the second estimated time of the scene section included in the second time series data, based on the reference signal information included in the reference section information of each of the multiple scene sections and the speech recognition results of the speech sections included in the second time series data, which includes audio data. 
[Claim 5] The information processing program according to claim 1, wherein the second difference calculation step calculates, as the second difference, the average of one or more of the multiple differences between the effective interval time of each of the multiple scene intervals included in the effective interval of the second time series data and the reference time of the corresponding scene interval represented by the reference interval information. [Claim 6] The information processing program according to claim 1, wherein the second difference calculation step calculates the second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval of the first time series data. [Claim 7] The information processing program according to claim 6, wherein the second difference calculation step calculates the second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval included in the first time series data, the corresponding scene interval being, for each of the scene intervals included in the effective interval of the second time series data, the scene interval included in the first time series data that maximizes the cross-correlation function with that scene interval. 
[Claim 8] The information processing program according to claim 1, further comprising: a first difference calculation step of calculating a first difference between a first estimated time, estimated based on the reference interval information for each of the scene intervals included in the first time series data, and the reference time of the corresponding scene interval included in the first time series data, as represented by the reference interval information; a first shift step of generating fourth time series data by shifting the first time series data by a time corresponding to the first difference; and an output step of extracting and outputting the scene intervals included in each of the fourth time series data and the third time series data, for each pair of corresponding scene intervals between the fourth time series data and the third time series data. [Claim 9] The information processing program according to claim 8, wherein the first difference calculation step: estimates the estimated reference time of the estimated signal information in the first time series data at which the cross-correlation function between the reference signal information of each of the multiple scene intervals and the first time series data is maximized, as the first estimated time for each of the multiple scene intervals included in the first time series data; and calculates, as the first difference, the average of one or more of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data and the reference time of the corresponding scene interval included in the first time series data, as represented by the reference interval information. 
[Claim 10] The information processing program according to claim 8, wherein the first difference calculation step: estimates the first estimated time of the scene section included in the first time series data, based on the reference signal information included in the reference section information of each of the multiple scene sections and the speech recognition result of the utterance section included in the first time series data, which includes audio data; and calculates, as the first difference, the average of one or more of the differences between the first estimated time of each of the multiple scene intervals included in the first time series data and the reference time of the corresponding scene interval included in the first time series data, as represented by the reference interval information. [Claim 11] The information processing program according to claim 8, wherein the second difference calculation step calculates the second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval included in the fourth time series data, as represented by the reference interval information. [Claim 12] The information processing program according to claim 11, wherein the second difference calculation step calculates the second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval, the corresponding scene interval being the scene interval in the fourth time series data that maximizes the cross-correlation function between the effective interval of the second time series data and the fourth time series data. [Claim 13] The information processing program according to claim 1, wherein at least one of the first time series data and the second time series data is audio data or video data. 
[Claim 14] An information processing device comprising: an effective interval extraction unit that extracts, based on reference interval information for each of multiple scene intervals contained in first time series data and second time series data, the reference interval information including a reference time that is the start time of the scene interval and reference signal information that represents the order of the scene intervals, the interval in the second time series data in which the scene intervals are arranged in the order represented by the reference interval information as an effective interval free of noise, and identifies a second estimated time, estimated based on the reference interval information, for each of the multiple scene intervals included in the effective interval, as the effective interval time; a second difference calculation unit that calculates a second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval represented by the reference interval information; and a second shift unit that generates third time series data by shifting the second time series data by a time corresponding to the second difference. 
[Claim 15] An information processing method performed by an information processing device, the method including: an effective interval extraction step of extracting, based on reference interval information for each of multiple scene intervals contained in first time series data and second time series data, the reference interval information including a reference time that is the start time of the scene interval and reference signal information that represents the order of the scene intervals, the interval in the second time series data in which the scene intervals are arranged in the order represented by the reference interval information as an effective interval free of noise, and identifying a second estimated time, estimated based on the reference interval information, for each of the multiple scene intervals included in the effective interval, as the effective interval time; a second difference calculation step of calculating a second difference between the effective interval time of the scene interval included in the effective interval of the second time series data and the reference time of the corresponding scene interval represented by the reference interval information; and a second shift step of generating third time series data by shifting the second time series data by a time corresponding to the second difference.