Information processing method and information processing system
Using computer-executed information processing and neural network models, the image information of the performer needed for training is determined from instrument information and sound information and then transmitted. This addresses the problem of determining and transmitting such image information during training and improves the efficiency of instrument performance training.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patent (China)
- Current Assignee / Owner
- YAMAHA CORP
- Filing Date
- 2021-09-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to effectively identify and transmit image information of instrumentalists required for training, especially when instruction is being given between students and teachers in different locations.
Using computer-based information processing, the system identifies the parts of the performer's body to focus on from instrument and sound information, acquires image information of those parts, and uses neural network models and correspondence tables to identify and transmit images of the performer.
It enables the effective identification and transmission of performer image information required for training between students and teachers in different locations, supports self-study and instruction, and improves the efficiency of instrument performance training.
Abstract figure: CN116324932B_ABST
Description
Technical Field
[0001] This invention relates to information processing methods and information processing systems.
Background Technology
[0002] Patent Document 1 discloses a performance evaluation device that automatically evaluates performance.
[0003] Patent Document 1: Japanese Patent Application Publication No. 10-63175
Summary of the Invention
[0004] When instrument performance training is conducted using images, it is important to identify which images of the performer are needed for the training.
[0005] An object of this invention is to provide a technique for identifying the images of a performer that are needed for training.
[0006] One aspect of the present invention relates to an information processing method executed by a computer. The method determines, based on instrument information representing an instrument, a part of interest of the body of a performer playing the instrument represented by the instrument information, and acquires image information representing an image of the determined part of interest.
[0007] Another aspect of the present invention relates to an information processing method executed by a computer. The method determines, based on sound information representing sound output from an instrument, a part of interest of the body of a performer playing the instrument, and acquires image information representing an image of the determined part of interest.
[0008] Another aspect of the present invention relates to an information processing system comprising: a decision unit that determines, based on instrument information representing an instrument, a part of interest of the body of a performer playing the instrument represented by the instrument information; and an acquisition unit that acquires image information representing an image of the part of interest determined by the decision unit.
[0009] Another aspect of the present invention relates to an information processing system comprising: a decision unit that determines, based on sound information representing sound output from an instrument, a part of interest of the body of a performer playing the instrument; and an acquisition unit that acquires image information representing an image of the part of interest determined by the decision unit.
Attached Figure Description
[0010] Figure 1 is a diagram showing an example of the information provision system 1.
[0011] Figure 2 is a diagram showing an example of the student training system 100.
[0012] Figure 3 is a diagram showing an example of the correspondence table Ta.
[0013] Figure 4 is a diagram illustrating the operation of the student training system 100.
[0014] Figure 5 is a diagram showing an example of the student image G3.
[0015] Figure 6 is a diagram illustrating the operation of the student training system 100.
[0016] Figure 7 is a diagram showing an example of the correspondence table Ta1.
[0017] Figure 8 is a diagram showing the student training system 101.
[0018] Figure 9 is a diagram illustrating cropped images, each showing a part of a performer's body.
[0019] Figure 10 is a diagram showing the student training system 102.
[0020] Figure 11 is a diagram showing an example of a TAB score (tablature).
[0021] Figure 12 is a diagram showing an example of a guitar chord progression.
[0022] Figure 13 is a diagram showing an example of a drum score.
[0023] Figure 14 is a diagram showing an example of a combined musical score.
[0024] Figure 15 is a diagram showing an example of notation for multiple sounds produced simultaneously.
[0025] Figure 16 is a diagram showing an example of the progress information.
[0026] Figure 17 shows other examples of graphs representing the progress information.
[0027] Figure 18 is a diagram showing the student training system 103.
[0028] Figure 19 is a diagram showing the student training system 104.
[0029] Figure 20 is a diagram showing an example of a user interface.
[0030] Figure 21 is a diagram showing the student training system 105.
[0031] Figure 22 is a diagram showing an example of the learning processing unit 191.
[0032] Figure 23 is a diagram showing an example of learning processing.
[0033] Figure 24 is a diagram showing another example of the processing device 180.
Detailed Implementation
[0034] A: Implementation Method 1
[0035] A1: Information Provision System 1
[0036] Figure 1 is a diagram showing an example of the information provision system 1 of the present invention. The information provision system 1 is an example of an information processing system. It includes a student training system 100 and a teacher guidance system 200, which can communicate with each other via a network NW. The teacher guidance system 200 has the same structure as the student training system 100.
[0037] The student training system 100 is used by student 100B, who learns to play musical pieces on instrument 100A. The student training system 100 is installed in a student room in a music classroom. It can also be installed somewhere other than the student room, such as student 100B's home.
[0038] Instrument 100A is a piano or a flute. The piano and the flute are each an example of a type of instrument and also an example of an instrument. Hereafter, the phrase "type of instrument" can be read as "instrument". Student 100B is an example of a performer. The position where student 100B plays instrument 100A is fixed in advance within the room where the student training system 100 is installed. Therefore, student 100B during a performance, just before a performance, and immediately after a performance can be photographed by fixed cameras.
[0039] The teacher guidance system 200 is used by teacher 200B who provides instruction on the performance of a piece of music using instrument 200A. The type of instrument 200A is the same as that of instrument 100A. For example, if instrument 100A is a piano, then instrument 200A is also a piano. The teacher guidance system 200 is installed in a teacher's room located in a music classroom. The teacher guidance system 200 can also be installed in a location other than the teacher's room in the music classroom, such as teacher 200B's home.
[0040] Teacher 200B is an example of a performer. The position where teacher 200B plays instrument 200A is fixed in advance within the room where the teacher guidance system 200 is installed. Therefore, teacher 200B during a performance, just before a performance, and immediately after a performance can be photographed by fixed cameras.
[0041] The student training system 100 sends student performance information a to the teacher guidance system 200. Student performance information a indicates the performance status of student 100B on instrument 100A. Student performance information a includes student image information a1 and student voice information a2.
[0042] Student image information a1 represents an image of student 100B playing instrument 100A (hereinafter the "student image"). Student voice information a2 represents the sound output from instrument 100A while student 100B plays it (hereinafter the "student playing sound").
[0043] The teacher guidance system 200 receives student performance information a from the student training system 100. Based on the student image information a1 contained in the student performance information a, the teacher guidance system 200 displays the student image. Based on the student voice information a2 contained in the student performance information a, the teacher guidance system 200 outputs the student's performance sound.
[0044] The teacher guidance system 200 sends teacher performance information b to the student training system 100. Teacher performance information b shows the performance status of teacher 200B on instrument 200A. Teacher performance information b includes teacher image information b1 and teacher voice information b2.
[0045] Teacher image information b1 represents an image of teacher 200B playing instrument 200A (hereinafter the "teacher image"). Teacher voice information b2 represents the sound output from instrument 200A while teacher 200B plays it (hereinafter the "teacher playing sound").
[0046] The student training system 100 receives teacher performance information b from the teacher guidance system 200. Based on the teacher image information b1 contained in the teacher performance information b, the student training system 100 displays the teacher image. Based on the teacher voice information b2 contained in the teacher performance information b, the student training system 100 outputs the teacher performance sound.
[0047] A2: Student Training System 100
[0048] Figure 2 is a diagram showing an example of the student training system 100. The student training system 100 includes cameras 111 to 115, a microphone 120, a display unit 130, a speaker 140, an operation unit 150, a communication unit 160, a storage device 170, and a processing device 180.
[0049] Cameras 111 to 115 each include an image sensor that converts light into electrical signals. The image sensor is, for example, a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
[0050] Camera 111 generates student finger information a11 by photographing the fingers of each hand of student 100B operating instrument 100A. Student finger information a11 represents an image of student 100B's fingers together with instrument 100A.
[0051] Camera 112 generates student foot information a12 by photographing both feet of student 100B operating instrument 100A. Student foot information a12 represents an image of student 100B's feet together with instrument 100A.
[0052] Camera 113 generates student full-body information a13 by photographing the whole body of student 100B operating instrument 100A. Student full-body information a13 represents an image of student 100B's whole body together with instrument 100A.
[0053] Camera 114 generates student mouth information a14 by photographing the mouth of student 100B operating instrument 100A. Student mouth information a14 represents an image of student 100B's mouth together with instrument 100A.
[0054] Camera 115 generates student upper body information a15 by photographing the upper body of student 100B operating instrument 100A. Student upper body information a15 represents an image of student 100B's upper body together with instrument 100A.
[0055] Student image information a1 includes at least one of student finger information a11, student foot information a12, student full-body information a13, student mouth information a14, and student upper body information a15. The orientation and posture of cameras 111 to 115 can be adjusted. Cameras 111 to 115 are also referred to as imaging units.
[0056] Microphone 120 picks up the student playing sound and generates student voice information a2 from it. Microphone 120 is also called a pickup unit.
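Paragraphs [0041] and [0055] state that student performance information a consists of student image information a1 and student voice information a2, and that a1 contains at least one of a11 to a15. The following is a minimal sketch of one possible data model for this. The class names, the body-part keys, and the choice to store frames as bytes and a2 as a list of PCM samples are all hypothetical and are not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keys standing for student image information a11-a15.
BODY_PART_KEYS = ("fingers", "feet", "full_body", "mouth", "upper_body")


@dataclass
class StudentImageInformation:
    """Student image information a1: encoded frames keyed by body part."""
    frames: Dict[str, bytes] = field(default_factory=dict)

    def __post_init__(self) -> None:
        unknown = set(self.frames) - set(BODY_PART_KEYS)
        if unknown:
            raise ValueError(f"unknown body-part keys: {sorted(unknown)}")
        # a1 must include at least one of a11-a15.
        if not self.frames:
            raise ValueError("a1 must contain at least one of a11-a15")


@dataclass
class StudentPerformanceInformation:
    """Student performance information a: image information a1 plus voice information a2."""
    image: StudentImageInformation
    voice: List[float]  # a2 as mono PCM samples (assumed representation)
```

For a flute, for example, a1 would carry only the mouth (a14) and upper-body (a15) entries.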
[0057] Display unit 130 is a liquid crystal display (LCD). Display unit 130 is not limited to an LCD; for example, it could be an OLED (Organic Light Emitting Diode) display. Display unit 130 can be a touch panel. Display unit 130 displays various types of information. For example, display unit 130 displays a teacher image based on teacher image information b1. Display unit 130 can also display student images based on student image information a1.
[0058] Speaker 140 outputs various sounds. For example, speaker 140 outputs a teacher's playing sound based on teacher's voice information b2. Speaker 140 can also output a student's playing sound based on student's voice information a2.
[0059] The operation unit 150 is a touch panel. The operation unit 150 is not limited to a touch panel; for example, it can be various operation buttons. The operation unit 150 receives various information from the user, such as student 100B. The operation unit 150 receives, for example, student instrument information c1 from the user. Student instrument information c1 indicates the type of instrument 100A. Student instrument information c1 is an example of instrument information indicating the type of instrument.
[0060] The communication unit 160 communicates with the teacher guidance system 200 via the network NW in a wired or wireless manner. The communication unit 160 can also communicate with the teacher guidance system 200 via a wired or wireless manner without using the network NW. The communication unit 160 sends student performance information a to the teacher guidance system 200. The communication unit 160 receives teacher performance information b from the teacher guidance system 200.
[0061] Storage device 170 is a computer-readable recording medium (e.g., a computer-readable non-transitory recording medium). Storage device 170 includes one or more memories. Storage device 170 includes, for example, non-volatile memory and volatile memory. Non-volatile memory includes, for example, ROM (Read Only Memory), EPROM (Erasable Programmable Read Only Memory), and EEPROM (Electrically Erasable Programmable Read Only Memory). Volatile memory includes, for example, RAM (Random Access Memory).
[0062] Storage device 170 stores a processing program, a calculation program, and various data. The processing program defines the operation of the student training system 100. The calculation program defines the computation that derives output Y1 from input X1.
[0063] Storage device 170 may store a processing program and a calculation program read from a storage device of a server (not shown). In this case, the server's storage device is an example of a computer-readable recording medium (e.g., a computer-readable non-transitory recording medium). The various data include the multiple variables K1 described later.
[0064] The processing device 180 includes one or more CPUs (Central Processing Units). One or more CPUs are an example of one or more processors. The processing device, processor, and CPU are each an example of a computer. Some or all of the functions of the processing device 180 can be implemented by circuits such as DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), PLD (Programmable Logic Device), and FPGA (Field Programmable Gate Array).
[0065] The processing device 180 reads the processing program and the calculation program from the storage device 170. The processing device 180 functions as a determination unit 181, a decision unit 183, an acquisition unit 184, a transmission unit 185, and an output control unit 186 by executing the processing program. The processing device 180 functions as a trained model 182 by executing the calculation program and using multiple variables K1. The processing device 180 is an example of an information processing device.
[0066] The determination unit 181 uses the student voice information a2 to determine the student instrument information c2, which represents the type of instrument 100A. The student instrument information c2 is an example of instrument information representing a type of instrument. Instrument information representing a type of instrument (e.g., piano) is in turn an example of instrument information representing an instrument (e.g., a piano). The student voice information a2 is an example of association information associated with the type of instrument. Association information associated with a type of instrument (e.g., piano) is an example of association information related to an instrument (e.g., a piano). For example, when the student voice information a2 represents piano sound, the determination unit 181 determines student instrument information c2 indicating that instrument 100A is a piano. The determination unit 181 determines the student instrument information c2 by using, for example, a trained model 182.
[0067] The trained model 182 is composed of a neural network. For example, the trained model 182 is composed of a deep neural network (DNN). The trained model 182 can also be composed of a convolutional neural network (CNN). Deep neural networks and convolutional neural networks are examples of neural networks. The trained model 182 can be composed of a combination of various neural networks. The trained model 182 can include additional mechanisms such as self-attention. The trained model 182 can also be composed of a hidden Markov model (HMM) or a support vector machine (SVM) instead of a neural network.
[0068] The trained model 182 has learned the relationship between first information associated with a type of musical instrument and second information representing the type of musical instrument associated with the first information. The first information is an example of learning association information related to a musical instrument. The second information is an example of learning instrument information representing the musical instrument determined based on the learning association information. The trained model 182 uses, as the first information, output sound information representing the sound output by a musical instrument. As the second information, it uses information representing the type of the instrument that output the sound represented by the output sound information. The trained model 182 is an example of a first trained model.
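The patent does not say how the student voice information a2 is converted into the input X1 of the trained model. The following is a minimal sketch of one possible front end, offered only as an assumption. It computes log energies in equal-width bands of the magnitude spectrum, and the function name `sound_to_features` and the `n_bands` parameter are hypothetical. A real system might instead feed spectrograms or raw audio to a CNN.

```python
import numpy as np


def sound_to_features(samples: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Convert a mono clip (student voice information a2) into a feature vector X1.

    Hypothetical front end: log energy of n_bands equal-width bands of the
    magnitude spectrum, standardized to zero mean and unit variance.
    """
    spectrum = np.abs(np.fft.rfft(samples))
    bands = np.array_split(spectrum, n_bands)
    energies = np.log1p(np.array([np.sum(band ** 2) for band in bands]))
    std = energies.std()
    return (energies - energies.mean()) / (std if std > 0 else 1.0)


# Usage: a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
x1 = sound_to_features(np.sin(2 * np.pi * 440 * t))
```

Because the output length is fixed, clips of any duration map to vectors of the same size, which a downstream classifier needs.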
[0069] The multiple variables K1 used to achieve the trained model 182 are determined through machine learning using multiple training data T1. Training data T1 contains a combination of input data and output data for training. Training data T1 includes first information as input data for training. Training data T1 includes second information as output data for training. An example of training data T1 is a combination of output sound information representing the sound produced by an instrument (first information) and information representing the type of instrument that produces the sound represented by the output sound information (second information).
[0070] The trained model 182 generates an output Y1 corresponding to the input X1. The trained model 182 uses "association information associated with the type of instrument (e.g., student voice information a2)" as input X1 and "information representing the type of instrument that produces the sound represented by the association information" as output Y1.
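To illustrate how the variables K1 might be learned from training data T1 and then used to map input X1 to output Y1, here is a minimal sketch. It uses a single-layer softmax classifier trained by gradient descent on cross-entropy. This is a deliberately simple stand-in for the DNN or CNN named in [0067], not the patent's model. The label set, learning rate, epoch count, and synthetic features are all hypothetical.

```python
import numpy as np

INSTRUMENT_TYPES = ("piano", "flute")  # second information (labels of output Y1)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train(X: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 300):
    """Fit variables K1 = (W, b) from training data T1 = pairs (X[i], y[i])."""
    n, d = X.shape
    k = len(INSTRUMENT_TYPES)
    W, b = np.zeros((d, k)), np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(epochs):
        grad = (softmax(X @ W + b) - onehot) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def infer(K1, x1: np.ndarray) -> str:
    """Trained model 182: map input X1 to output Y1 (an instrument type)."""
    W, b = K1
    return INSTRUMENT_TYPES[int(np.argmax(softmax(x1 @ W + b)))]


# Synthetic training data T1: pretend feature clusters for piano (0) and flute (1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 0.0], 0.5, size=(50, 2)),
               rng.normal([-2.0, 0.0], 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
K1 = train(X, y)
```

Swapping in a deeper network would change only `train` and `infer`. The contract stays the same: X1 goes in and an instrument type comes out.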
[0071] Alternatively, the multiple training data T1 may contain only training input data (first information) without training output data (second information). In this case, the multiple variables K1 are determined by machine learning that divides the multiple training data T1 into multiple clusters according to their similarity. A person then associates each cluster in the trained model 182 with the second information appropriate for that cluster. The trained model 182 determines the cluster corresponding to the input X1 and generates, as the output Y1, the second information associated with that cluster.
[0072] Based on the instrument information (student instrument information c1 or c2), the decision unit 183 determines a part of interest of the body of the student using the instrument (e.g., student 100B). The student using the instrument is an example of a performer playing the instrument represented by the instrument information. The part of interest is the part of the body that the teacher focuses on for the instrument type represented by the instrument information. The decision unit 183 determines the part of interest by referring to a correspondence table Ta, which represents the correspondence between instrument types and body parts (parts of interest). The part of interest may include, for example, at least one of the following: the fingers of student 100B's hands, both feet of student 100B, the whole body of student 100B, the mouth of student 100B, and the upper body of student 100B. The correspondence table Ta is stored in the storage device 170.
[0073] The acquisition unit 184 acquires various types of information. For example, from among student finger information a11, student foot information a12, student full-body information a13, student mouth information a14, and student upper body information a15, it acquires as object image information the image information representing images of the parts of interest determined by the decision unit 183. Object image information is an example of image information. The acquisition unit 184 generates student image information a1 by using the object image information. For example, it generates student image information a1 that includes the object image information.
[0074] The transmission unit 185 transmits the student image information a1 generated by the acquisition unit 184 from the communication unit 160 to the teacher guidance system 200. The teacher guidance system 200 is an example of a transmission destination, and the transmission destination is an example of an external device.
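For the unlabeled variant in [0071], the following is a minimal sketch. It clusters training inputs with k-means (farthest-point initialization, then Lloyd iterations) and then has a person attach an instrument type to each cluster. The patent does not name a clustering algorithm, so k-means is a swapped-in assumption. The helper names and the `cluster_to_type` mapping are hypothetical.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster unlabeled training inputs (first information only); return centers."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: a random first center, then the point
    # farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign every point to its nearest center, then move each center to its mean.
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers


def nearest_cluster(centers: np.ndarray, x1: np.ndarray) -> int:
    """Determine the cluster corresponding to input X1."""
    return int(np.linalg.norm(centers - x1, axis=1).argmin())


rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2.0, 0.0], 0.5, size=(40, 2)),
               rng.normal([-2.0, 0.0], 0.5, size=(40, 2))])
centers = kmeans(X, k=2)
# A person inspects each cluster and assigns its second information (hypothetical step).
piano_cluster = nearest_cluster(centers, np.array([2.0, 0.0]))
cluster_to_type = {piano_cluster: "piano", 1 - piano_cluster: "flute"}
y1 = cluster_to_type[nearest_cluster(centers, np.array([-2.2, 0.1]))]
```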
[0075] The output control unit 186 controls the display unit 130 and the speaker 140. For example, the output control unit 186 displays a teacher image on the display unit 130 based on teacher image information b1. In this case, the acquisition unit 184 first acquires the teacher image information b1 from the communication unit 160 and provides it to the output control unit 186. The output control unit 186 uses the teacher image information b1 to display the teacher image on the display unit 130.
[0076] The output control unit 186 can also display a student image on the display unit 130 based on the student image information a1. In this case, the acquisition unit 184 provides the student image information a1 to the output control unit 186, which uses it to display the student image on the display unit 130. This lets student 100B practice instrument 100A while watching the student image (the image of the part of interest) even when teacher 200B is not present. Likewise, in a configuration that includes the student training system 100 but no teacher guidance system 200, student 100B can practice instrument 100A while watching the student image (the image of the part of interest) represented by the student image information a1.
[0077] The output control unit 186 can display the teacher image and student image on the display unit 130 in an alternating arrangement based on the teacher image information b1 and the student image information a1. In this case, the acquisition unit 184 acquires each of the teacher image information b1 and the student image information a1 in the manner described above. The acquisition unit 184 provides the teacher image information b1 and the student image information a1 to the output control unit 186. The output control unit 186 displays the teacher image and student image on the display unit 130 in an alternating arrangement based on the teacher image information b1 and the student image information a1.
[0078] The output control unit 186 outputs the teacher playing sound from the speaker 140 based on the teacher voice information b2. In this case, the acquisition unit 184 first acquires the teacher voice information b2 from the communication unit 160 and provides it to the output control unit 186. The output control unit 186 uses it to output the teacher playing sound from the speaker 140.
[0079] The output control unit 186 can also output the student playing sound from the speaker 140 based on the student voice information a2. In this case, the acquisition unit 184 first acquires the student voice information a2 from the microphone 120 and provides it to the output control unit 186. The output control unit 186 uses it to output the student playing sound from the speaker 140.
[0080] The output control unit 186 can alternately output the teacher playing sound and the student playing sound from the speaker 140 based on the teacher voice information b2 and the student voice information a2. In this case, the acquisition unit 184 acquires the teacher voice information b2 and the student voice information a2 as described above and provides both to the output control unit 186. The output control unit 186 then alternately outputs the teacher playing sound and the student playing sound from the speaker 140.
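The patent does not define how the two playing sounds alternate. The following is a minimal sketch under one assumed interpretation: fixed-length segments of the teacher and student sample streams are interleaved, teacher first. The function name and the `seg_len` parameter are hypothetical.

```python
import numpy as np


def alternate_playback(teacher: np.ndarray, student: np.ndarray, seg_len: int) -> np.ndarray:
    """Interleave teacher and student playing sound in segments of seg_len samples."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    pieces = []
    for start in range(0, max(len(teacher), len(student)), seg_len):
        pieces.append(teacher[start:start + seg_len])  # teacher segment first
        pieces.append(student[start:start + seg_len])  # then the matching student segment
    return np.concatenate(pieces) if pieces else np.zeros(0)
```

Segment-by-segment alternation lets a listener compare the same passage back to back. Phrase-aligned segmentation would be a natural refinement, but it would require score alignment that the passage does not describe.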
[0081] A3: Teacher Guidance System 200
[0082] The teacher guidance system 200 differs from the student training system 100 in that it is used by teacher 200B rather than student 100B. As described above, the teacher guidance system 200 has the same structure as the student training system 100.
[0083] The structure of the teacher guidance system 200 is obtained by making the following substitutions in the above description of the student training system 100. "Instrument 100A" is replaced with "Instrument 200A". "Student 100B" is replaced with "Teacher 200B". "Student performance information a" is replaced with "Teacher performance information b". "Student image information a1" is replaced with "Teacher image information b1". "Student finger information a11" is replaced with "Teacher finger information b11". "Student foot information a12" is replaced with "Teacher foot information b12". "Student full-body information a13" is replaced with "Teacher full-body information b13". "Student mouth information a14" is replaced with "Teacher mouth information b14". "Student upper body information a15" is replaced with "Teacher upper body information b15". "Student voice information a2" is replaced with "Teacher voice information b2". "Student instrument information c1, c2" is replaced with "Teacher instrument information d1, d2". "Teacher performance information b" is replaced with "Student performance information a". "Teacher image information b1" is replaced with "Student image information a1". "Teacher voice information b2" is replaced with "Student voice information a2". A detailed description of the structure of the teacher guidance system 200 is therefore omitted.
A4: Correspondence Table Ta
[0085] Figure 3 is a diagram showing an example of the correspondence table Ta. The correspondence table Ta shows the correspondence between instrument types and body parts (parts of interest). Its "instrument type" column indicates the type of instrument being trained; in this example, "piano" and "flute". Its "body part (part of interest)" column indicates the body parts of the performer whose images are needed for training on that instrument type.
[0086] In piano training, the student faces the piano in a preferred posture, pressing the keys with the fingers and operating the pedals with the feet. The teacher guides the student by focusing on the student's fingers, feet, and whole body (e.g., posture). For example, the teacher focuses on the finger movements of the student's hands to guide a passage of a piece. The teacher focuses on the student's feet to guide pedal operation and on the positional relationship between the student's fingers and the keys to guide correct keystrokes. The teacher also focuses on the student's overall posture to guide the performance. In addition, the teacher guides the student by demonstrating with at least one of the teacher's own fingers, feet, or whole-body posture. Therefore, in the correspondence table Ta, the instrument type "piano" is associated with the body parts "fingers of the hands, feet, and overall body."
[0087] In flute training, the student holds the flute near the upper body, blows into it with the mouth, and operates the keys with the fingers. The teacher guides the student by focusing on the student's mouth and upper body (e.g., the student's posture, the angle between the student and the flute, and finger placement). For example, the teacher focuses on the student's mouth to guide lip shape during playing and on the student's upper body to guide the positional relationship between the student and the flute. In addition, the teacher guides the student by demonstrating with at least one of the teacher's own mouth or upper body. Therefore, in the correspondence table Ta, the instrument type "flute" is associated with the body parts "mouth and upper body."
A5: Operation of Student Training System 100
[0089] Figure 4 is a diagram illustrating the operation by which the student training system 100 transmits student performance information a. The storage device 170 stores shooting object information representing the subject photographed by each of the cameras 111 to 115.
[0090] Student 100B plays instrument 100A so that the student training system 100 can determine the type of instrument 100A. In step S101, microphone 120 generates student voice information a2 based on the sound output from instrument 100A.
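The correspondence table Ta in Figure 3 maps each instrument type to its parts of interest, and the decision unit 183 looks a type up in it. The following is a minimal sketch of that table and lookup. The string labels ("fingers", "full_body", and so on) and the function name are illustrative only. Rejecting an unknown instrument type with an error is an assumed policy, since the patent does not describe that case.

```python
from typing import Dict, Tuple

# Correspondence table Ta (Figure 3): instrument type -> parts of interest.
# The string labels are illustrative names, not identifiers from the patent.
CORRESPONDENCE_TABLE_TA: Dict[str, Tuple[str, ...]] = {
    "piano": ("fingers", "feet", "full_body"),
    "flute": ("mouth", "upper_body"),
}


def determine_parts_of_interest(
        instrument_type: str,
        table: Dict[str, Tuple[str, ...]] = CORRESPONDENCE_TABLE_TA) -> Tuple[str, ...]:
    """Decision unit 183 (step S103): look up the parts of interest for an instrument type."""
    try:
        return table[instrument_type]
    except KeyError:
        raise ValueError(f"instrument type not in table Ta: {instrument_type!r}") from None
```

Keeping the table as data rather than code means a new instrument can be supported by adding a row, which matches the role of storing Ta in storage device 170.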
[0091] Next, in step S102, the determination unit 181 uses the student voice information a2 to determine the student instrument information c2 representing the type of instrument 100A.
[0092] In step S102, the determination unit 181 first inputs student voice information a2 into the trained model 182. The determination unit 181 then takes the information that the trained model 182 outputs in response to student voice information a2 as student instrument information c2.
[0093] Next, in step S103, the decision unit 183 determines, based on student instrument information c2, the part of interest of the performer's body, i.e., the body of student 100B.
[0094] In step S103, the decision unit 183 determines, as the parts of interest, the body parts that correspond in correspondence table Ta to the instrument type represented by student instrument information c2. For example, if student instrument information c2 represents a piano, the decision unit 183 determines each finger of student 100B's hands, both of student 100B's feet, and student 100B's whole body as the parts of interest of student 100B.
[0095] When the operation unit 150 accepts, from a user such as student 100B, student instrument information c1 indicating the type of instrument 100A, the decision unit 183 may determine the part of interest of student 100B in step S103 based on student instrument information c1.
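The table lookup in step S103 can be pictured with a short Python sketch. It assumes that correspondence table Ta is held as a dictionary keyed by instrument-type strings and that student instrument information c2 arrives as one of those strings; the names `CORRESPONDENCE_TABLE_TA` and `determine_parts_of_interest` are hypothetical and do not come from the patent.

```python
# Hypothetical model of correspondence table Ta (paragraphs [0086] and [0087]):
# instrument type -> body parts that a teacher focuses on.
CORRESPONDENCE_TABLE_TA: dict[str, tuple[str, ...]] = {
    "piano": ("fingers", "feet", "whole body"),
    "flute": ("mouth", "upper body"),
}


def determine_parts_of_interest(
    instrument_type: str,
    table: dict[str, tuple[str, ...]] = CORRESPONDENCE_TABLE_TA,
) -> tuple[str, ...]:
    """Return the parts of interest associated with an instrument type (step S103).

    Raises KeyError if the table has no entry for the instrument type.
    """
    return table[instrument_type]


if __name__ == "__main__":
    c2 = "piano"  # student instrument information c2, assumed to be a plain string
    print(determine_parts_of_interest(c2))  # ('fingers', 'feet', 'whole body')
```

Keeping the table as data rather than code means that adding an instrument (as in the first variation) only changes the dictionary.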
[0096] Next, in step S104, the acquisition unit 184 determines, based on the part of interest, the camera to be used for photographing student 100B (hereinafter the "camera used") from among cameras 111 to 115.
[0097] In step S104, the acquisition unit 184 refers to the shooting object information of each of cameras 111 to 115 and determines the camera that photographs the part of interest as the camera used.
[0098] Next, in step S105, the acquisition unit 184 acquires the information generated by the camera used as object image information.
[0099] Next, in step S106, the acquisition unit 184 generates student image information a1 using the object image information.
[0100] For example, when cameras 114 and 115 are the cameras used, the acquisition unit 184 generates student image information a1 that includes student mouth information a14 generated by camera 114 and student upper body information a15 generated by camera 115. Figure 5 is a diagram showing an example of a student image G3 represented by student image information a1. Student image G3 includes image G1, represented by student mouth information a14, and image G2, represented by student upper body information a15.
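A minimal sketch of the camera selection in step S104 follows, assuming that the shooting object information is a mapping from camera number to the body part that camera photographs. Only the subjects of cameras 114 and 115 (mouth and upper body) are taken from paragraph [0100]; the subjects given for cameras 111 to 113 and the function name `select_cameras_used` are illustrative assumptions.

```python
# Hypothetical shooting object information: camera number -> subject photographed.
# Cameras 114 and 115 follow paragraph [0100]; the subjects of 111 to 113 are assumed.
SHOOTING_OBJECT_INFO: dict[int, str] = {
    111: "fingers",
    112: "feet",
    113: "whole body",
    114: "mouth",
    115: "upper body",
}


def select_cameras_used(
    parts_of_interest: tuple[str, ...],
    shooting_object_info: dict[int, str] = SHOOTING_OBJECT_INFO,
) -> list[int]:
    """Return the numbers of the cameras whose subject is a part of interest (step S104)."""
    wanted = set(parts_of_interest)
    return sorted(
        camera for camera, subject in shooting_object_info.items() if subject in wanted
    )


if __name__ == "__main__":
    print(select_cameras_used(("mouth", "upper body")))  # [114, 115]
```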
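The patent does not say how images G1 and G2 are laid out within student image G3. As one hypothetical layout, the sketch below tiles the images side by side, treating each image as a list of rows of grayscale pixel values and padding shorter images with a fill value; a real system would more likely use an imaging library.

```python
Image = list[list[int]]  # rows of grayscale pixel values (assumed representation)


def compose_side_by_side(images: list[Image], fill: int = 0) -> Image:
    """Tile images left to right into one image, padding shorter images at the bottom."""
    height = max(len(img) for img in images)  # raises ValueError if `images` is empty
    composed: Image = []
    for y in range(height):
        row: list[int] = []
        for img in images:
            width = len(img[0]) if img else 0
            # Copy row y of this image, or a row of fill pixels past its bottom edge.
            row.extend(img[y] if y < len(img) else [fill] * width)
        composed.append(row)
    return composed


if __name__ == "__main__":
    g1 = [[1, 1], [1, 1]]       # stand-in for image G1 (mouth), 2 x 2
    g2 = [[2], [2], [2]]        # stand-in for image G2 (upper body), 3 x 1
    g3 = compose_side_by_side([g1, g2])
    print(g3)  # [[1, 1, 2], [1, 1, 2], [0, 0, 2]]
```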
[0101] Next, in step S107 of Figure 4, the sending unit 185 sends student performance information a, which includes student image information a1 and student voice information a2, from the communication unit 160 to the teacher guidance system 200.
[0102] The teacher guidance system 200 also sends the teacher performance information b to the student training system 100 by performing the same actions as the student training system 100.
[0103] Figure 6 is a diagram illustrating the operation of the student training system 100 for outputting the teacher image and the teacher's playing sound based on teacher performance information b.
[0104] In step S201, the communication unit 160 receives teacher performance information b. Teacher performance information b includes teacher image information b1 and teacher voice information b2.
[0105] Next, in step S202, the output control unit 186 displays the teacher image based on the teacher image information b1 on the display unit 130.
[0106] Next, in step S203, the output control unit 186 outputs, from the speaker 140, the teacher's playing sound based on teacher voice information b2. Step S203 may be executed before step S202.
[0107] Likewise, by performing the same operations as the student training system 100, the teacher guidance system 200 displays the student image based on student image information a1 and outputs the student's playing sound based on student voice information a2.
[0108] According to this embodiment, the image of the performer (student or teacher) required for instrument training can be determined according to the type of instrument. This embodiment can also transmit that image to a destination. Therefore, even if teacher 200B is in a different room from the one where student 100B plays instrument 100A, teacher 200B can observe the image of student 100B needed to instruct the student in playing instrument 100A. Likewise, even if student 100B is in a different room from the one where teacher 200B plays instrument 200A, student 100B can see the image of a model performance on instrument 200A, that is, the performance by teacher 200B.
[0109] The decision unit 183 of the student training system 100 may determine the part of interest using teacher instrument information d1 or d2 instead of student instrument information c1 or c2. For example, the communication unit 160 of the teacher guidance system 200 sends teacher instrument information d1 or d2 to the student training system 100, and the decision unit 183 of the student training system 100 obtains it via the communication unit 160 of the student training system 100. In this case, the determination unit 181 and the trained model 182 can be omitted from the student training system 100.
[0110] Likewise, the decision unit 183 of the teacher guidance system 200 may determine the part of interest using student instrument information c1 or c2 instead of teacher instrument information d1 or d2. For example, the communication unit 160 of the student training system 100 sends student instrument information c1 or c2 to the teacher guidance system 200, and the decision unit 183 of the teacher guidance system 200 obtains it via the communication unit 160 of the teacher guidance system 200. In this case, the determination unit 181 and the trained model 182 can be omitted from the teacher guidance system 200.
[0111] B: Variations
[0112] Variations of the above embodiment are described below. Two or more aspects selected from the following variations may be combined as appropriate, provided they do not contradict each other.
[0113] B1: First Variation
[0114] In the above embodiment, the instrument types are not limited to piano and flute; any two or more instrument types may be used. For example, the instrument types may include two or more of piano, flute, electronic keyboard (Electone (registered trademark)), violin, guitar, saxophone, and drums. Piano, flute, electronic keyboard, violin, guitar, saxophone, and drums are examples of musical instruments.
[0115] Figure 7 is a diagram showing an example of correspondence table Ta1, which is used when the instrument types are piano, flute, electronic keyboard, violin, guitar, saxophone, and drums.
[0116] In electronic keyboard training, the student faces the electronic keyboard in a posture of their choosing, plays the upper and lower keyboards with their fingers, operates the pedals with their feet (toes and heels), and operates the expression pedal with their right foot.
[0117] In electronic keyboard training, teachers focus on the student's fingers, feet (especially the right foot), and overall posture (e.g., body position) to guide the student. Teachers instruct students by demonstrating at least one of these elements: the teacher's fingers, the teacher's feet (especially the right foot), or the teacher's overall posture.
[0118] Therefore, in the correspondence table Ta1, the type of musical instrument "electronic keyboard" is associated with the body parts "the fingers of the hand, both feet, the right foot and the whole body".
[0119] In violin training, the student supports the violin with the chin, shoulder, and left hand, holds the bow with the right hand, and presses the strings with the fingers of the left hand. The student plays while continually adjusting the angle of the violin relative to their body, the angle of the bow relative to the violin, and the positions of the left and right fingers relative to the strings.
[0120] In violin training, teachers focus on the student's upper body (positional relationship between the student and the violin) and the student's left hand to guide the student. Teachers guide the student by showing them at least one of their upper body (positional relationship between the teacher and the violin) and their left hand.
[0121] Therefore, in the correspondence table Ta1, the type of musical instrument "violin" is associated with the body parts "upper body and left hand".
[0122] In guitar training, students press the strings with their left hand and pluck them with their right. The teacher focuses on both the student's right and left hands to guide them. The teacher instructs the student by demonstrating at least one of their own hands (either right or left).
[0123] Therefore, in the correspondence table Ta1, the type of musical instrument "guitar" is associated with the body parts "left hand and right hand".
[0124] In saxophone training, students position the saxophone near their upper body, holding the reed in their mouth and manipulating the keys and levers with their fingers. The instructor focuses on the student's mouth and upper body to guide them (e.g., how the reed is held, the mouth's contact with the mouthpiece, the student's posture, the angle between the student and the saxophone, and the student's finger movements). The instructor guides the student by demonstrating at least one of their own mouth or upper body posture.
[0125] Therefore, in the correspondence table Ta1, the type of musical instrument "saxophone" is associated with the body parts "mouth and upper body".
[0126] In drum training, students use their hands and feet to play the drums. The teacher focuses on the students' hands, feet, and entire body to guide them (e.g., instructing on the timing of hand and foot movements). The teacher guides the students by demonstrating the movements of their own hands and feet and their entire body.
[0127] Therefore, in the correspondence table Ta1, the type of musical instrument "drum" is associated with the body parts "hands, feet and whole body".
[0128] The student training system 100 and the teacher guidance system 200 each include cameras for photographing the body parts shown in correspondence table Ta1.
[0129] According to the first variation, the image of the performer required for instrument training can be switched according to instrument types other than the piano and the flute as well, and that image can be transmitted to the destination.
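As a rough sketch, correspondence table Ta1 can be written as a dictionary built from the associations stated in paragraphs [0086], [0087], and [0116] to [0127], and the set of body parts that the cameras must cover follows from the union of its entries. This assumes one camera per distinct body part label; the label strings are illustrative paraphrases, and `required_camera_subjects` is a hypothetical name.

```python
# Hypothetical encoding of correspondence table Ta1 (Figure 7).
CORRESPONDENCE_TABLE_TA1: dict[str, tuple[str, ...]] = {
    "piano": ("fingers", "feet", "whole body"),
    "flute": ("mouth", "upper body"),
    "electronic keyboard": ("fingers", "feet", "right foot", "whole body"),
    "violin": ("upper body", "left hand"),
    "guitar": ("left hand", "right hand"),
    "saxophone": ("mouth", "upper body"),
    "drums": ("hands", "feet", "whole body"),
}


def required_camera_subjects(table: dict[str, tuple[str, ...]]) -> list[str]:
    """Distinct body parts that need a camera, assuming one camera per body part."""
    return sorted({part for parts in table.values() for part in parts})


if __name__ == "__main__":
    subjects = required_camera_subjects(CORRESPONDENCE_TABLE_TA1)
    print(len(subjects))  # 9
```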
[0130] B2: Second Variation
[0131] In the above embodiment and the first variation, the decision unit 183 may determine the part of interest of the performer's body without using either correspondence table Ta or Ta1. For example, the decision unit 183 may determine the part of interest using a trained model that has learned the relationship between instrument types and body parts.
[0132] Figure 8 is a diagram showing a student training system 101 that includes a trained model 187 that has learned the relationship between instrument types and body parts.
[0133] The trained model 187 is composed of a neural network, for example a deep neural network. The trained model 187 may be a convolutional neural network or a combination of multiple neural networks, and may include additional elements such as self-attention. The trained model 187 may also be composed of a hidden Markov model or a support vector machine instead of a neural network.
[0134] The processing device 180 functions as the trained model 187 based on a combination of a prescribed computational procedure for determining output Y1 from input X1 and multiple variables K2. The multiple variables K2 are determined by machine learning using multiple sets of training data T2. Each set of training data T2 contains a combination of information representing an instrument type (training input data) and information representing body parts (training output data). The information representing the instrument type in training data T2 represents, for example, an instrument type shown in Figure 7, and the information representing body parts represents, for example, body parts shown in Figure 7. Each combination of instrument type and body parts in training data T2 corresponds to a combination of instrument type and body parts shown in Figure 7. Therefore, the information representing body parts in training data T2 indicates the parts of a player's body (parts of interest) on which a teacher of the instrument type given by the training input data focuses.
[0135] The decision unit 183 inputs student instrument information c1 or c2 into the trained model 187 and determines, as the part of interest of the performer's body, the part represented by the information that the trained model 187 outputs in response.
[0136] Alternatively, the multiple sets of training data T2 may contain only training input data and no training output data. In this case, the multiple variables K2 are determined by machine learning that groups the training data T2 into multiple clusters based on their similarity. In the trained model 187, each cluster is associated, by a person, with information representing the body parts (parts of interest) suitable for that cluster. The trained model 187 determines the cluster corresponding to input X1 and generates the information associated with that cluster as output Y1.
[0137] According to the second variation, the decision unit 183 can determine the part of interest of the performer's body without using either correspondence table Ta or Ta1.
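The unsupervised alternative can be illustrated with k-means clustering, used here purely as a stand-in: the patent names no clustering algorithm. The sketch assumes that instrument-type information has already been turned into 2-D feature vectors (the values are made up), treats the learned centroids as variables K2, and attaches body parts to each cluster by hand after training, as the paragraph describes. All names are hypothetical.

```python
import math

Vector = tuple[float, float]


def nearest(point: Vector, centroids: list[Vector]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: list[Vector], k: int, iterations: int = 10) -> list[Vector]:
    """Lloyd's k-means with the first k points as initial centroids; returns the centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Hypothetical 2-D feature vectors for training input data T2 (no output data).
training_inputs: list[Vector] = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centroids = kmeans(training_inputs, k=2)  # plays the role of variables K2

# After clustering, a person associates each cluster with parts of interest.
cluster_parts: dict[int, tuple[str, ...]] = {
    nearest((0.0, 0.0), centroids): ("fingers", "feet", "whole body"),
    nearest((5.0, 5.0), centroids): ("mouth", "upper body"),
}


def predict(x1: Vector) -> tuple[str, ...]:
    """Output Y1: the parts of interest associated with the cluster of input X1."""
    return cluster_parts[nearest(x1, centroids)]
```

With these toy vectors the two clusters settle around (0.1, 0.05) and (5.1, 5.0), so `predict((0.1, 0.2))` returns the piano-style parts.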
[0138] B3: Third Variation
[0139] In the above embodiment and the first and second variations, when the part of interest is a part of the body (e.g., both feet), the acquisition unit 184 may acquire image information representing the part of interest from full-body image information representing the performer's whole body.
[0140] Figure 9 is a diagram showing an example of the relationship between image G11, represented by the full-body image information, and image G12, which represents part of the performer's body. In this example, image G12 shows both of the performer's feet. Image G12 may instead show a part of the performer's body other than the feet.
[0141] The position of image G12 within image G11 is preset, in pixel units, for each instrument type, so the position of image G12 can differ depending on the instrument type. The acquisition unit 184 acquires, from the full-body image information representing image G11, the preset portion corresponding to the instrument type indicated by student instrument information c1 or c2, and uses it as image information representing image G12.
[0142] The position of image G12 within image G11 need not be preset for each instrument type. For example, the acquisition unit 184 may first identify the region showing the part of interest in image G11 using image recognition technology, and then acquire the portion representing the part of interest from the full-body image information.
[0143] The acquisition unit 184 may use image recognition technology to determine the position of image G12 within image G11 specifically for instruments such as the flute, violin, guitar, and saxophone, for which the positional relationship between the player and the instrument easily varies. In this case, image information representing the part of interest can be acquired more easily than in a structure where the position of image G12 within image G11 is fixed.
[0144] For instruments such as pianos, keyboards, and drums, where the positional relationship between the player and the instrument is not easily changed, the acquisition unit 184 acquires a pre-defined portion corresponding to the type shown in the student instrument information c1 or c2 from the full-body image information, as image information representing image G12. In this case, the acquisition unit 184 can easily determine the position of image G12 without using image recognition technology.
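The two strategies of the third variation can be combined in one dispatch function, sketched below under several assumptions: images are lists of pixel rows, preset regions are (left, top, width, height) boxes in pixels with made-up values for a toy 8 x 8 image, and image recognition is represented by a caller-supplied `recognize` function rather than any particular library.

```python
from typing import Callable, Optional

Image = list[list[int]]
Box = tuple[int, int, int, int]  # (left, top, width, height) in pixels

# Hypothetical preset regions of image G12 within full-body image G11, used for
# instruments whose position relative to the player rarely changes.
PRESET_REGIONS: dict[str, Box] = {
    "piano": (0, 6, 8, 2),
    "electronic keyboard": (0, 6, 8, 2),
    "drums": (0, 6, 8, 2),
}


def crop(image: Image, box: Box) -> Image:
    """Return the pixels of `image` inside `box`."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def extract_part_of_interest(
    full_body: Image,
    instrument_type: str,
    recognize: Optional[Callable[[Image], Box]] = None,
) -> Image:
    """Use a preset region when one exists; otherwise ask the recognizer for the region."""
    if instrument_type in PRESET_REGIONS:
        return crop(full_body, PRESET_REGIONS[instrument_type])
    if recognize is None:
        raise ValueError(f"no preset region and no recognizer for {instrument_type!r}")
    return crop(full_body, recognize(full_body))


if __name__ == "__main__":
    g11 = [[10 * y + x for x in range(8)] for y in range(8)]  # toy full-body image
    print(extract_part_of_interest(g11, "flute", recognize=lambda img: (2, 0, 3, 1)))  # [[2, 3, 4]]
```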
[0145] According to the third variation, the number of cameras can be reduced compared with a structure in which cameras correspond one-to-one to the multiple body parts (parts of interest).
[0146] B4: Fourth Variation
[0147] In the above embodiment and the first to third variations, the destination of teacher performance information b is not limited to the student training system 100 and may be an electronic device used by a guardian of student 100B (e.g., a parent of student 100B). The electronic device is, for example, a smartphone, tablet, or laptop computer. Teacher performance information b may also be sent to both the student training system 100 and the electronic device used by the guardian of student 100B.
[0148] According to the fourth variation, the guardian of student 100B is able to guide student 100B while watching the teacher's video.
[0149] B5: Fifth Variation
[0150] In the above embodiment and the first to fourth variations, the association information associated with the instrument type (association information related to the instrument) is not limited to student voice information a2. The association information may be image information representing instrument 100A (image information showing an image of instrument 100A).
[0151] When image information representing instrument 100A is used as the association information, the determination unit 181 determines the instrument information (student instrument information c2) using a trained model that has learned the relationship between information representing an instrument as an image and information representing the type of the instrument shown in that image.
[0152] Figure 10 is a diagram showing a student training system 102 that includes a trained model 188, which has learned the relationship between information representing instruments as images and information representing instrument types. The trained model 188 is an example of a first trained model.
[0153] The trained model 188 is composed of a neural network, for example a deep neural network. The trained model 188 may be a convolutional neural network or a combination of various neural networks, and may include additional elements such as self-attention. The trained model 188 may also be composed of a hidden Markov model or a support vector machine instead of a neural network.
[0154] The processing device 180 functions as the trained model 188 based on a combination of a prescribed computational procedure for determining output Y1 from input X1 and multiple variables K3. The multiple variables K3 are determined by machine learning using multiple sets of training data T3. Each set of training data T3 contains a combination of information representing an instrument as an image (training input data) and information representing the type of the instrument shown in that image (training output data).
[0155] The determination unit 181 inputs the image information representing instrument 100A into the trained model 188 and takes the information that the trained model 188 outputs in response as student instrument information c2.
[0156] Alternatively, the multiple sets of training data T3 may contain only training input data and no training output data. In this case, the multiple variables K3 are determined by machine learning that groups the training data T3 into multiple clusters based on their similarity. In the trained model 188, each cluster is associated, by a person, with information representing the instrument type suitable for that cluster. The trained model 188 determines the cluster corresponding to input X1 and generates the information associated with that cluster as output Y1.
[0157] According to the fifth variation, the image information representing the musical instrument 100A can be used as associated information representing the musical instrument.
[0158] B6: Sixth Variation
[0159] In the fifth variation, the determination unit 181 may use information generated by any of cameras 111 to 115 (hereinafter "camera image information") as the image information representing instrument 100A.
[0160] Camera image information sometimes shows, in addition to instrument 100A and student 100B, an instrument of a type different from instrument 100A. When camera image information showing multiple types of instruments is input into the trained model 188, the information output from the trained model 188 may not represent the type of instrument 100A. Therefore, the determination unit 181 first extracts, from the camera image information, partial image information representing only instrument 100A, and then inputs this partial image information into the trained model 188.
[0161] For example, the determination unit 181 first identifies a person (student 100B) from the image shown by the camera image information. People are easier to identify than musical instruments. Next, the determination unit 181 identifies the object with the shortest distance to the person (student 100B) in the image shown by the camera image information as musical instrument 100A. Then, the determination unit 181 extracts partial image information from the camera image information, representing only the object identified as musical instrument 100A. Finally, the determination unit 181 inputs this partial image information into the trained model 188.
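The nearest-object rule can be sketched as follows. The sketch assumes that an upstream detector (not shown) has already produced labeled bounding boxes, and it measures the distance between box centers; the patent does not specify the distance measure, and `Detection`, `find_instrument_box`, and `crop` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional

Box = tuple[int, int, int, int]  # (left, top, width, height) in pixels


@dataclass
class Detection:
    """One detected region; `label` is "person" or a generic object label (assumed)."""
    label: str
    box: Box


def center(box: Box) -> tuple[float, float]:
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def find_instrument_box(detections: list[Detection]) -> Box:
    """Treat the non-person object whose center is closest to the person as instrument 100A."""
    person: Optional[Detection] = next((d for d in detections if d.label == "person"), None)
    if person is None:
        raise ValueError("no person detected")
    objects = [d for d in detections if d.label != "person"]
    if not objects:
        raise ValueError("no candidate object detected")
    person_center = center(person.box)
    return min(objects, key=lambda d: math.dist(person_center, center(d.box))).box


def crop(image: list[list[int]], box: Box) -> list[list[int]]:
    """Partial image information: the pixels inside `box`."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]
```

The cropped region would then be passed to the instrument classifier (trained model 188) in place of the full camera image.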
[0162] According to the sixth variation, camera image information generated by any of the cameras 111-115 can be used as association information related to the type of musical instrument. Therefore, any of the cameras 111-115 can be used as a device for generating association information.
[0163] B7: Seventh Variation
[0164] In the above embodiment and the first to sixth variations, the association information associated with the instrument type may be sheet music information representing a score corresponding to the instrument type. A score corresponding to an instrument type (e.g., guitar) is an example of a score corresponding to a musical instrument (e.g., guitar). A score is also called musical notation. The sheet music information is generated, for example, by a camera that photographs the score. When any of cameras 111 to 115 generates the sheet music information, that camera can be used as the device that generates the sheet music information.
[0165] The determination unit 181 determines student instrument information c2 based on the score represented by the sheet music information, for example based on the type of score.
[0166] When the score represented by the sheet music information is TAB notation, the determination unit 181 determines student instrument information c2 indicating the guitar as the instrument type. As shown in Figure 11, TAB notation represents the guitar strings with six parallel lines. Therefore, when the score represented by the sheet music information consists of six parallel lines, the determination unit 181 determines that the score is TAB notation.
[0167] When the score represented by the sheet music information is a guitar chord chart, the determination unit 181 determines student instrument information c2 indicating the guitar as the instrument type. As shown in Figure 12, a guitar chord chart arranges guitar chords along the lyrics. Therefore, when the score represented by the sheet music information consists of guitar chords arranged along lyrics, the determination unit 181 determines that the score is a guitar chord chart.
[0168] When the score represented by the sheet music information is a drum score, the determination unit 181 determines student instrument information c2 indicating drums as the instrument type. As shown in Figure 13, a drum score uses notation corresponding to each instrument in the drum set. Therefore, when the score represented by the sheet music information uses notation corresponding to each instrument in the drum set, the determination unit 181 determines that the score is a drum score.
[0169] When the score represented by the sheet music information is a duet score, the determination unit 181 determines student instrument information c2 indicating the piano as the instrument type. As shown in Figure 14, a duet score includes, for example, a symbol 14a indicating duet performance. Therefore, when the score represented by the sheet music information includes the symbol 14a indicating duet performance, the determination unit 181 determines that the score is a duet score.
[0170] The determination unit 181 may also determine student instrument information c2 based on the arrangement of notes in the score represented by the sheet music information. For example, as shown in Figure 15, when the score includes notes 15a indicating that multiple tones sound simultaneously, the determination unit 181 determines that the score is for a keyboard instrument (e.g., a piano or an electronic keyboard). In this case, the determination unit 181 determines student instrument information c2 indicating a piano or an electronic keyboard as the instrument type.
[0171] When the score represented by the sheet music information shows a symbol that identifies the instrument type (e.g., a character string representing the instrument name, or a symbol related to the instrument type), the determination unit 181 may determine information indicating the instrument type identified by that symbol as student instrument information c2. For example, when the storage device 170 stores an instrument table showing the correspondence between information indicating instrument types and symbols related to those types, the determination unit 181 refers to the instrument table and determines the information corresponding to the symbol shown in the score (information indicating the instrument type) as student instrument information c2. In this case, the symbol related to the instrument type is an example of association information. The instrument table is an example of a table showing the correspondence between information associated with instrument types and information indicating instrument types. The information associated with the instrument type is an example of reference association information related to the instrument, and the information indicating the instrument type is an example of reference instrument information indicating the instrument.
[0172] The sheet music information is not limited to information generated by a camera that photographs the score; it may also be so-called electronic sheet music. When the electronic sheet music contains category data indicating the instrument type, the determination unit 181 may use that category data as student instrument information c2.
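The score-type rules in paragraphs [0166] to [0170] can be expressed as a rule cascade. This is a sketch only: it assumes an upstream recognition step has already reduced the sheet music information to a few boolean and numeric features, and the order in which the rules are checked is an assumption (a duet score may also contain simultaneous notes, so the more specific duet rule is checked first). `ScoreFeatures` and `candidate_instruments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoreFeatures:
    """Features assumed to be extracted upstream from the sheet music information."""
    staff_line_count: int          # parallel lines per staff (6 suggests TAB notation)
    chords_along_lyrics: bool      # chord names arranged along lyrics (chord chart)
    drum_kit_notation: bool        # notation mapped to drum-set instruments
    has_duet_symbol: bool          # a duet symbol such as 14a
    has_simultaneous_notes: bool   # notes sounding together, such as notes 15a


def candidate_instruments(f: ScoreFeatures) -> tuple[str, ...]:
    """Map score features to candidate instrument types, checking rules in a fixed order."""
    if f.staff_line_count == 6 or f.chords_along_lyrics:
        return ("guitar",)                          # TAB notation or guitar chord chart
    if f.drum_kit_notation:
        return ("drums",)                           # drum score
    if f.has_duet_symbol:
        return ("piano",)                           # duet score
    if f.has_simultaneous_notes:
        return ("piano", "electronic keyboard")     # keyboard-instrument score
    return ()                                       # undetermined
```

An empty result leaves the decision to another source of association information, such as the symbol lookup in the instrument table.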
[0173] According to the seventh variation, musical score information can be used as association information related to the type of musical instrument.
[0174] B8: Eighth Variation
[0175] In the above embodiment and the first to seventh variations, when progress information representing the training progress of student 100B indicates the instrument type, the progress information can be used as association information associated with the instrument type. As long as the progress information indicates combinations of an instrument type and the training progress for that type, it may represent the progress of any one of student 100B, teacher 200B, the student room of the music classroom, or the teacher room of the music classroom. A combination of an instrument type (e.g., piano) and the training progress for that type is an example of a combination of a musical instrument (e.g., piano) and the training progress for that instrument.
[0176] Figure 16 is a diagram showing an example of the progress represented by the progress information. Figure 16 shows the instrument type (piano, flute, or violin) for each time period of the training (course). The determination unit 181 first uses the progress information to identify the training time period that includes the current time. Next, the determination unit 181 identifies the instrument type targeted by the training in that time period. The determination unit 181 then uses the information representing that instrument type as student instrument information c2.
[0177] Figure 17 is a diagram showing another example of the progress represented by the progress information. Figure 17 shows the instrument type for each training date. The determination unit 181 first uses the progress information to identify the instrument type targeted by the training on the current date, and then uses the information representing that instrument type as student instrument information c2.
[0178] According to the eighth variation, progress information can also be used as association information related to the instrument type.
[0179] B9: Ninth Variation
[0180] In the above embodiment and the first to eighth variations, the decision unit 183 may determine the part of interest based on student instrument information c1 or c2 and student voice information a2.
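Both forms of progress information can be sketched as simple schedules. The time periods, dates, and function names below are hypothetical; only the idea of looking up the instrument type for the current time period (Figure 16) or the current date (Figure 17) comes from the text.

```python
from datetime import date, datetime, time
from typing import Optional

# Hypothetical progress information in the style of Figure 16:
# (start time, end time, instrument type) for each training time period.
PERIOD_PROGRESS: list[tuple[time, time, str]] = [
    (time(10, 0), time(11, 0), "piano"),
    (time(11, 0), time(12, 0), "flute"),
    (time(13, 0), time(14, 0), "violin"),
]

# Hypothetical progress information in the style of Figure 17: training date -> instrument type.
DATE_PROGRESS: dict[date, str] = {
    date(2024, 4, 1): "piano",
    date(2024, 4, 2): "flute",
}


def instrument_for_period(
    now: datetime,
    progress: list[tuple[time, time, str]] = PERIOD_PROGRESS,
) -> Optional[str]:
    """Instrument type of the time period containing `now` (periods are [start, end))."""
    current = now.time()
    for start, end, instrument in progress:
        if start <= current < end:
            return instrument
    return None  # no training scheduled at this time


def instrument_for_date(
    now: datetime,
    progress: dict[date, str] = DATE_PROGRESS,
) -> Optional[str]:
    """Instrument type scheduled for the date of `now`, if any."""
    return progress.get(now.date())
```

Treating periods as half-open intervals means a boundary such as 11:00 belongs to the later period, so adjacent periods never both match.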
[0181] In piano training, teacher 200B focuses mainly on the movement of each finger of student 100B's hands, especially in the fast-paced sections of the piece used for instruction. Therefore, in piano training, when the student's playing notes indicated by student voice information a2 represent the section immediately preceding a fast-paced section of the piece, the decision unit 183 determines only the fingers of the hand as the focus area. When the student's playing notes indicated by student voice information a2 represent the section immediately following the fast-paced section, the decision unit 183 determines the fingers of the performer's hand, the performer's feet, and the performer's entire body as focus areas.
[0182] In this case, the storage device 170 stores score data representing the portions immediately preceding and immediately following the fast-paced section. The decision unit 183 generates note data representing the student's playing notes from student voice information a2. If the note data matches the portion of the score data immediately preceding the fast-paced section, the decision unit 183 determines that the student's playing notes represent that preceding portion. The decision unit 183 can also make this determination when the consistency between the note data and the preceding portion is at least a first threshold (e.g., 90%). The first threshold is not limited to 90% and can be changed as appropriate. Likewise, if the note data matches the portion of the score data immediately following the fast-paced section, the decision unit 183 determines that the student's playing notes represent that following portion, and it can also make this determination when the consistency between the note data and the following portion is at least a second threshold (e.g., 90%). The second threshold is not limited to 90% and can be changed as appropriate.
[0183] For the piano, the timing for changing the focus area is not limited to the moment when the student's playing notes represent the portion immediately preceding or following a fast-paced section of the piece, and can be changed as appropriate. Likewise, the way the focus area shifts for the piano is not limited to the shifts described above and can be changed as appropriate.
[0184] For instruments different from the piano, the decision unit 183 can also determine the focus area based on the student's instrument information c1 or c2 and the student's voice information a2.
[0185] For example, in flute training, teacher 200B focuses mainly on the shape of student 100B's lips at the beginning of a piece. Therefore, in flute training, when the student's playing notes indicated by student voice information a2 represent the beginning of a piece, the decision unit 183 determines only the lips as the focus area. When the student's playing notes indicated by student voice information a2 represent the section immediately following the beginning of the piece, the decision unit 183 determines the performer's lips and upper body as focus areas.
[0186] In this case, the storage device 170 stores score data representing the beginning of the piece and the portion immediately following the beginning. The decision unit 183 generates note data representing the student's playing notes from student voice information a2. If the note data matches the beginning of the piece in the score data, the decision unit 183 determines that the student's playing notes represent the beginning of the piece. The decision unit 183 can also make this determination when the consistency between the note data and the beginning is at least a third threshold (e.g., 90%). The third threshold is not limited to 90% and can be changed as appropriate. Likewise, if the note data matches the portion of the score data immediately following the beginning of the piece, the decision unit 183 determines that the student's playing notes represent that following portion, and it can also make this determination when the consistency between the note data and that portion is at least a fourth threshold (e.g., 90%). The fourth threshold is not limited to 90% and can be changed as appropriate.
[0187] For the flute, the timing for changing the focus area is not limited to when the student's playing notes represent the beginning of a piece or the section immediately following it, and can be changed as appropriate. Likewise, the way the focus area shifts for the flute is not limited to the shifts described above and can be changed as appropriate.
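The piano and flute rules in [0181] and [0185] amount to a lookup from an (instrument, score segment) pair to focus areas. A minimal sketch, assuming hypothetical segment labels and body-part names, and assuming (the text does not say) that the current focus areas are kept when no rule applies.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule table: (instrument, segment label) -> focus areas.
FOCUS_RULES: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("piano", "before_fast"): ("fingers",),
    ("piano", "after_fast"): ("fingers", "feet", "whole_body"),
    ("flute", "beginning"): ("lips",),
    ("flute", "after_beginning"): ("lips", "upper_body"),
}


class FocusTracker:
    """Keeps the current focus areas and switches them when a rule fires."""

    def __init__(self, instrument: str, initial: Tuple[str, ...]) -> None:
        self.instrument = instrument
        self.current = initial

    def update(self, segment: Optional[str]) -> Tuple[str, ...]:
        # Assumption: with no recognised segment, or no rule for it, keep the previous focus.
        if segment is not None:
            self.current = FOCUS_RULES.get((self.instrument, segment), self.current)
        return self.current
```

Keeping the rules in a table rather than in branching code makes it easy to change the switching timing and the shifted areas, which [0183] and [0187] say may be changed as appropriate.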
[0188] The decision unit 183 can use a trained model to determine the focus areas. This trained model learns the relationship between information including instrument type information (representing the type of instrument) and instrument sound information (representing the sound output from the instrument of the type indicated by the instrument type information), and information representing the focus areas of the performer's body. Instrument type information is an example of learning instrument information. Instrument sound information is an example of learning sound information (representing the sound output from the instrument of the type indicated by the learning instrument information). Information including instrument type information and instrument sound information is an example of learning input information. Information representing focus areas in the performer's body refers to the parts of the performer's body that are focused on by the teacher of that instrument when the instrument of the type indicated by the instrument type information outputs the sound shown by the instrument sound information. Information representing focus areas in the performer's body is an example of learning output information representing the parts of the performer's body that are focused on when playing the instrument of the type indicated by the learning instrument information, i.e., the instrument that outputs the sound shown by the learning sound information.
[0189] Figure 18 shows a student training system 103 containing a trained model 189, which has learned the correspondence between combinations of instrument type information and instrument sound information and information representing parts of interest. The trained model 189 is an example of a second trained model.
[0190] The trained model 189 is composed of neural networks. For example, the trained model 189 is composed of deep neural networks. The trained model 189 can be composed of convolutional neural networks, for example. The trained model 189 can be composed of a combination of various neural networks, for example. The trained model 189 can have additional features such as self-attention. The trained model 189 can also be composed of hidden Markov models or support vector machines instead of neural networks.
[0191] The processing device 180 functions as the trained model 189 by executing a prescribed computational procedure that determines the output Y1 from the input X1 using a set of multiple variables K4. The multiple variables K4 are determined through machine learning using multiple training data T4. Each training data T4 is a combination of instrument type information and instrument sound information (input data for training) and focus area information representing the body parts of interest (output data for training). The focus area information indicates the parts of the player's body that the teacher of the instrument focuses on when the player produces the sound indicated by the instrument sound information on an instrument of the type indicated by the instrument type information.
[0192] The instrument sound information is used in units of one measure of the music being played. However, the unit is not limited to one measure; for example, it can be every four measures. The focus area information (output data for training) indicates the focus area of the body of a player of the instrument indicated by the instrument type information when playing the measure immediately following the measure indicated by the instrument sound information in the input data for training.
[0193] For each measure, the decision unit 183 inputs the student instrument information c1 or c2 and the student voice information a2 into the trained model 189. To delimit the measures, the decision unit 183 generates note data representing the student's playing notes from student voice information a2 and identifies one measure's worth of student voice information a2 from the arrangement of the note data. The decision unit 183 then determines the parts indicated by the information that the trained model 189 outputs in response to the input of student instrument information c1 or c2 and student voice information a2 as the focus areas.
[0194] Alternatively, the multiple training data T4 may contain only input data for training and no output data. In this case, the multiple variables K4 are determined through machine learning that groups the multiple training data T4 into multiple clusters based on their similarity. In the trained model 189, each cluster is associated with information representing the body parts (focus areas) suitable for that cluster. The trained model 189 identifies the cluster corresponding to the input X1 and generates the information associated with that cluster as the output Y1.
[0195] According to the 9th variation, the image required for instruction can be determined based on the sound being played and the type of instrument indicated by student instrument information c1 or c2.
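The per-measure flow in [0192] and [0193] can be sketched as below. This is a minimal sketch under several assumptions that are not in the text: note data is a list of (onset seconds, MIDI pitch) pairs, tempo and time signature are constant and known, and `toy_model` is a hypothetical hand-written rule standing in for trained model 189, not a learned model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Note = Tuple[float, int]  # (onset time in seconds, MIDI pitch): assumed note-data format
Model = Callable[[str, List[int]], Tuple[str, ...]]


def split_into_measures(notes: Sequence[Note], bpm: float, beats_per_measure: int) -> Dict[int, List[int]]:
    """Group note pitches by measure index, assuming a constant tempo and metre."""
    measure_len = beats_per_measure * 60.0 / bpm
    measures: Dict[int, List[int]] = defaultdict(list)
    for onset, pitch in sorted(notes):
        measures[int(onset // measure_len)].append(pitch)
    return dict(measures)


def focus_per_measure(instrument: str, notes: Sequence[Note], model: Model,
                      bpm: float = 120.0, beats_per_measure: int = 4) -> Dict[int, Tuple[str, ...]]:
    """Run the model once per measure; as in [0192], its output applies to the next measure."""
    measures = split_into_measures(notes, bpm, beats_per_measure)
    return {m + 1: model(instrument, pitches) for m, pitches in sorted(measures.items())}


def toy_model(instrument: str, pitches: List[int]) -> Tuple[str, ...]:
    """Hypothetical stand-in for trained model 189: dense piano measures -> fingers only."""
    if instrument == "piano" and len(pitches) >= 8:
        return ("fingers",)
    return ("fingers", "feet", "whole_body")
```

Passing the model as a callable keeps the measure segmentation independent of whichever learned model (neural network, hidden Markov model, or support vector machine) is actually used.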
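The unlabeled alternative in [0194] can be illustrated with k-means, one common similarity-based clustering method; the text does not name a specific algorithm, so k-means is an assumption here. The two-dimensional features (e.g., note density and a normalized loudness) and the per-cluster focus labels, which a person would assign after clustering, are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain k-means; the first k points are used as initial centroids for determinism."""
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[idx].append(p)
        new = [tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
               for i, members in enumerate(clusters)]
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    return centroids


def nearest_cluster(x: Vector, centroids: Sequence[Vector]) -> int:
    """Identify the cluster corresponding to an input (the role of input X1)."""
    return min(range(len(centroids)), key=lambda i: _dist2(x, centroids[i]))
```

At inference time, the label attached to `nearest_cluster(x, centroids)` plays the role of the output Y1.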
[0196] B10: 10th Variation
[0197] In the 9th variation, the student training system 100 and the teacher guidance system 200 may be used for training in playing only a single type of musical instrument (e.g., the piano). The single type of instrument is not limited to the piano and can be changed as appropriate. In this case, the decision unit 183 determines the focus area of the performer's body based on student voice information a2. For example, for each measure, the decision unit 183 inputs student voice information a2 into a trained model that has learned the relationship between instrument sound information (input data for training) and focus area information (output data for training) representing the body part to focus on. Here, the focus area information indicates the body part of a performer producing the sound indicated by the instrument sound information that the teacher of the instrument focuses on. The decision unit 183 then determines the body part indicated by the information that the trained model outputs in response to student voice information a2 as the focus area. According to the 10th variation, the image required for instrument instruction can be determined based on the sound being played.
[0198] B11: 11th Variation
[0199] In the embodiments and the 1st to 10th variations described above, the decision unit 183 can determine the focus area of the body based on the correspondence between student voice information a2 and score information representing the musical score. This correspondence is an example of the relationship between student voice information a2 and the score information.
[0200] The decision unit 183 determines the consistency between the sound shown by the student's voice information a2 and the sound shown by the musical score information.
[0201] For example, in piano training, when a student's playing is disorganized, teacher 200B focuses mainly on the movement of the student's fingers. Therefore, in piano training, when the consistency is below a threshold, the decision unit 183 determines only the fingers of the hand as the focus area. When the consistency is above the threshold, the decision unit 183 determines the player's fingers, feet, and entire body as focus areas.
[0202] In flute training, when a student's playing is disorganized, teacher 200B focuses mainly on the student's lips and upper body. Therefore, in flute training, when the consistency is below a threshold, the decision unit 183 determines the lips and upper body as focus areas. When the consistency is above the threshold, the decision unit 183 determines the performer's upper body as the focus area.
[0203] The decision unit 183 can use a trained model to determine the parts of interest. This trained model has learned the relationship between information including output sound information, representing the sound output from the instrument, and score relation information, representing the score, and information representing parts of the performer's body. Output sound information is an example of learning sound information representing the sound output from the instrument. Score relation information is an example of learning score information representing the score. Information including output sound information and score relation information is an example of learning input information. The information representing parts of the performer's body indicates the parts of the performer's body that are focused on (parts of interest) when the sound represented by the output sound information is output from the instrument according to the score indicated by the score relation information. It is an example of learning output information representing the parts of the performer's body that are focused on when playing an instrument that outputs the sound represented by the learning sound information according to the score indicated by the learning score information.
[0204] Figure 19 shows a student training system 104 containing a trained model 190, which has learned the relationship between sets of output sound information and score relation information and information representing the body parts of the performer that are focused on. The trained model 190 is an example of a third trained model.
[0205] The trained model 190 is composed of neural networks. For example, the trained model 190 is composed of deep neural networks. The trained model 190 may be composed of convolutional neural networks, for example. The trained model 190 may be composed of a combination of various neural networks, for example. The trained model 190 may have additional features such as self-attention. The trained model 190 may be composed of hidden Markov models or support vector machines instead of neural networks.
[0206] The processing device 180 functions as the trained model 190 by executing a prescribed computational procedure that determines the output Y1 from the input X1 using a set of multiple variables K5. The multiple variables K5 are determined through machine learning using multiple training data T5. Each training data T5 is a combination of a set of output sound information and score relation information (input data for training) and focus area information representing the body parts of interest (output data for training). The focus area information indicates the body part that the teacher of the instrument focuses on when the performer produces the sound indicated by the output sound information on the instrument according to the score indicated by the score relation information.
[0207] The output sound information is used in units of one measure of the music being played. However, the unit is not limited to one measure; for example, it can be every four measures. The focus area information (output data for training) indicates the focus area in the measure immediately following the measure indicated by the output sound information in the input data for training.
[0208] For each measure, the decision unit 183 inputs a set of student voice information a2 and musical score information into the trained model 190. This set is an example of input information containing both voice information and musical score information. To delimit the measures, the decision unit 183 generates note data representing the student's playing notes from student voice information a2 and identifies one measure's worth of student voice information a2 from the arrangement of the note data. The decision unit 183 then determines the parts indicated by the information that the trained model 190 outputs in response to the set of student voice information a2 and musical score information as the focus areas.
[0209] Alternatively, the multiple training data T5 may contain only input data for training and no output data. In this case, the multiple variables K5 are determined through machine learning that groups the multiple training data T5 into multiple clusters based on their similarity. In the trained model 190, each cluster is associated with information representing the body parts (focus areas) suitable for that cluster. The trained model 190 identifies the cluster corresponding to the input X1 and generates the information associated with that cluster as the output Y1.
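The consistency rules in [0200] to [0202] can be sketched with Python's standard `difflib.SequenceMatcher`, whose `ratio()` returns 2*M/T for M matched elements out of T total elements. Using this alignment-tolerant similarity as the "consistency" measure, the 0.8 threshold, and the body-part labels are all assumptions for illustration; the text does not specify how consistency is computed. The text also leaves open what happens exactly at the threshold.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence, Tuple

# Hypothetical encoding of the piano and flute rules.
FOCUS_BY_CONSISTENCY: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "piano": {"low": ("fingers",), "high": ("fingers", "feet", "whole_body")},
    "flute": {"low": ("lips", "upper_body"), "high": ("upper_body",)},
}


def score_consistency(played: Sequence[int], score: Sequence[int]) -> float:
    """Similarity in [0, 1] between the played and score pitch sequences (2*M/T)."""
    return SequenceMatcher(None, played, score, autojunk=False).ratio()


def focus_from_consistency(instrument: str, played: Sequence[int], score: Sequence[int],
                           threshold: float = 0.8) -> Tuple[str, ...]:
    # Assumption: a value exactly at the threshold is treated as "high".
    level = "low" if score_consistency(played, score) < threshold else "high"
    return FOCUS_BY_CONSISTENCY[instrument][level]
```

Unlike a position-by-position comparison, `SequenceMatcher` tolerates an inserted or skipped note without penalizing every note after it.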
[0210] According to the 11th variation, the images required for instruction can be switched in accordance with the correspondence between the student's playing notes and the musical score.
[0211] B12: 12th Variation
[0212] In the embodiments and the 1st to 11th variations described above, the decision unit 183 of the student training system 100 can further determine focus areas based on recorded information. The recorded information represents points of caution written down for the performance, expressed as characters or symbols. The recorded information is an example of attention information representing points of caution recorded for the performance.
[0213] For example, the decision unit 183 of the student training system 100 determines focus areas based on teacher input information. The teacher input information represents annotations that teacher 200B has written on the score. It is generated by one of the cameras 111 to 115 of the teacher guidance system 200 photographing the score containing the annotations. The communication unit 160 of the teacher guidance system 200 sends the teacher input information to the student training system 100, and the decision unit 183 of the student training system 100 receives it via the communication unit 160. The storage device 170 of the student training system 100 stores an annotation table indicating the correspondence between annotations and body parts. The decision unit 183 of the student training system 100 further determines, as focus areas, the body parts that the annotation table associates with the annotations indicated by the teacher input information.
[0214] The decision unit 183 of the student training system 100 can also determine focus areas based on the positions of the annotations in the musical score. In this case, the storage device 170 of the student training system 100 stores in advance a position table indicating the correspondence between positions in the musical score and body parts. The decision unit 183 of the student training system 100 further determines, as focus areas, the body parts that the position table associates with the positions of the annotations in the score.
[0215] The annotations can be written on an object other than the sheet music (e.g., sticky notes, notebooks, or whiteboards).
[0216] According to the 12th variation, focus areas can be added based on the annotations recorded for the performance.
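The annotation table of [0213] and the position table of [0214] are both lookups whose results are added to the existing focus areas. A minimal sketch, assuming the annotations have already been recognised from the camera image as (text, measure number) pairs; the table contents, the measure-range encoding of "position", and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Annotation = Tuple[str, int]  # (recognised annotation text, measure number where it is written)

# Hypothetical annotation table (annotation -> body parts) and position table
# (inclusive measure range -> body parts).
ANNOTATION_TABLE: Dict[str, Tuple[str, ...]] = {
    "pedal": ("feet",),
    "relax wrist": ("wrists",),
}
POSITION_TABLE: List[Tuple[int, int, Tuple[str, ...]]] = [
    (1, 8, ("fingers",)),
    (9, 16, ("whole_body",)),
]


def parts_from_text(annotations: Iterable[Annotation]) -> List[str]:
    """Body parts associated with the annotation contents."""
    parts: List[str] = []
    for text, _measure in annotations:
        parts.extend(ANNOTATION_TABLE.get(text, ()))
    return parts


def parts_from_position(annotations: Iterable[Annotation]) -> List[str]:
    """Body parts associated with where the annotations sit in the score."""
    parts: List[str] = []
    for _text, measure in annotations:
        for start, end, body_parts in POSITION_TABLE:
            if start <= measure <= end:
                parts.extend(body_parts)
    return parts


def merge_focus(base: Sequence[str], extra: Iterable[str]) -> Tuple[str, ...]:
    """Append extra parts to the base focus areas, dropping duplicates and keeping order."""
    return tuple(dict.fromkeys([*base, *extra]))
```

`dict.fromkeys` is used for de-duplication because it preserves insertion order, so the original focus areas stay first.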
[0217] B13: 13th Variation
[0218] In the embodiments and the 1st to 12th variations described above, the decision unit 183 of the student training system 100 can further determine focus areas based on performer information related to the performer. The performer information is, for example, the identification information of teacher 200B.
[0219] In musical instrument training, the focus areas may differ from one teacher 200B to another. For example, in piano training, teacher 200B1 may focus on student 100B's right wrist in addition to the student's fingers, feet, and whole body, while teacher 200B2 may focus on student 100B's left wrist in addition to the fingers, feet, and whole body. Therefore, the decision unit 183 of the student training system 100 further determines focus areas based on the identification information (e.g., an identification code) of teacher 200B.
[0220] The identification information of teacher 200B is input by a user such as student 100B through the operation unit 150. Alternatively, it can be sent from the teacher guidance system 200 to the student training system 100. The storage device 170 of the student training system 100 stores an identification information table indicating the correspondence between the identification information of teachers 200B and body parts. The decision unit 183 of the student training system 100 further determines, as focus areas, the body parts that the identification information table associates with the identification information of teacher 200B.
[0221] The performer information is not limited to the identification information of teacher 200B; for example, it can be movement information indicating the movement of teacher 200B. The movement information can be generated, for example, by one of the cameras 111 to 115 of the teacher guidance system 200 capturing images of teacher 200B. The communication unit 160 of the teacher guidance system 200 sends the movement information to the student training system 100, and the decision unit 183 of the student training system 100 receives it via the communication unit 160. The storage device 170 of the student training system 100 stores a movement table indicating the correspondence between a person's movements and body parts. The decision unit 183 of the student training system 100 further determines, as focus areas, the body parts that the movement table associates with the movement indicated by the movement information. Teacher 200B can therefore specify focus areas through his or her own movements. The performer information can also be the identification information of student 100B or movement information indicating the movement of student 100B. In that case, the decision unit 183 can determine focus areas suited to student 100B.
[0222] According to the 13th variation, focus areas on the performer's body can be added based on performer information related to the performer.
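The identification information table of [0220] and the movement table of [0221] can be sketched the same way: each lookup adds body parts to a base set of focus areas. The identification codes, movement labels, and table contents below are hypothetical, and the sketch assumes movements have already been recognised from the camera images as text labels.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical identification information table and movement table.
ID_TABLE: Dict[str, Tuple[str, ...]] = {
    "T001": ("right_wrist",),
    "T002": ("left_wrist",),
}
MOVEMENT_TABLE: Dict[str, Tuple[str, ...]] = {
    "points_at_feet": ("feet",),
    "mimes_breathing": ("upper_body",),
}


def focus_for_session(base: Sequence[str], teacher_id: Optional[str] = None,
                      movements: Iterable[str] = ()) -> Tuple[str, ...]:
    """Add teacher-specific and movement-specific body parts to the base focus areas."""
    parts = list(base)
    if teacher_id is not None:
        parts.extend(ID_TABLE.get(teacher_id, ()))
    for movement in movements:
        parts.extend(MOVEMENT_TABLE.get(movement, ()))
    return tuple(dict.fromkeys(parts))  # de-duplicate, preserving order
```

Unknown codes or movements simply add nothing, so the base focus areas are always kept.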
[0223] B14: 14th Variation
[0224] In the embodiments and the 1st to 13th variations described above, the operation unit 150, which is a touch panel, may display the user interface shown in Figure 20 as the interface for receiving student instrument information c1. Touching the piano button 151 inputs student instrument information c1 indicating the piano as the instrument type. Touching the flute button 152 inputs student instrument information c1 indicating the flute as the instrument type. The user interface for receiving student instrument information c1 is not limited to the one shown in Figure 20. According to the 14th variation, the user can input student instrument information c1 intuitively.
[0225] B15: 15th Variation
[0226] In the embodiments and the 1st to 14th variations described above, the communication unit 160 of the teacher guidance system 200 can send the teacher's instrument information d1 or d2 to the student training system, and the decision unit 183 of the student training system can determine the focus area based on the teacher's instrument information d1 or d2. Furthermore, the communication unit 160 of the student training system can send the student's instrument information c1 or c2 to the teacher guidance system, and the decision unit 183 of the teacher guidance system can determine the focus area based on the student's instrument information c1 or c2. Additionally, the teacher guidance system 200 can have the same structure as any of the student training systems 101 to 105.
[0227] B16: 16th Variation
[0228] In the above-described embodiments and the 1st to 15th variations, the processing device 180 can generate a trained model 182.
[0229] Figure 21 shows the student training system 105 according to the 16th variation. The student training system 105 differs from the student training system 104 shown in Figure 19 in that it includes a learning processing unit 191. The learning processing unit 191 is implemented by the processing device 180 executing a machine learning program stored in the storage device 170.
[0230] Figure 22 shows an example of the learning processing unit 191. The learning processing unit 191 includes a data acquisition unit 192 and a training unit 193. The data acquisition unit 192 acquires multiple training data T1, for example via the operation unit 150 or the communication unit 160. When the storage device 170 stores the multiple training data T1, the data acquisition unit 192 acquires them from the storage device 170.
[0231] The training unit 193 generates the trained model 182 by performing a process using the multiple training data T1 (hereinafter, the "learning process"). The learning process is supervised machine learning using the multiple training data T1. The training unit 193 trains the learning object model 182a using the multiple training data T1, thereby turning the learning object model 182a into the trained model 182.
[0232] The learning object model 182a is generated by a processing device 180 using a set of provisional variables K1 and an operational program. The provisional variables K1 are stored in a storage device 170. The learning object model 182a differs from the trained model 182 in that it uses the provisional variables K1. The learning object model 182a generates information (output data) corresponding to the input information (input data).
[0233] The training unit 193 determines the value of the loss function L, which represents the error between the output data that the learning object model 182a generates when the input data of training data T1 is input to it and the output data of that training data T1. The training unit 193 updates the set of provisional variables K1 so as to reduce the value of the loss function L, and performs this update for each training data T1. The multiple variables K1 are fixed when the training by the training unit 193 is complete. The learning object model 182a trained by the training unit 193, that is, the trained model 182, outputs statistically reasonable output data for unknown input data.
[0234] Figure 23 illustrates an example of the learning process. The learning process may start, for example, in response to a user instruction.
[0235] In step S301, the data acquisition unit 192 acquires a training data T1 that has not yet been acquired from among the multiple training data T1. Next, in step S302, the training unit 193 trains the learning object model 182a using that training data T1, updating the multiple provisional variables K1 so as to reduce the value of the loss function L determined using the training data T1. The update of the multiple provisional variables K1 according to the value of the loss function L can be performed, for example, by error backpropagation.
[0236] Next, in step S303, the training unit 193 determines whether the termination condition related to the learning process is met. The termination condition is, for example, that the value of the loss function L is less than a predetermined threshold, or that the change in the value of the loss function L is less than a predetermined threshold. If the termination condition is not met, the process returns to step S301. Therefore, until the termination condition is met, the acquisition of training data T1 and the updating of multiple provisional variables K1 using this training data T1 are repeated. If the termination condition is met, the learning process ends.
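Steps S301 to S303 can be sketched as a supervised training loop. This is a minimal illustration, not the patent's learning process: a linear model with squared error stands in for the learning object model 182a (for a linear model, backpropagation reduces to the hand-derived gradient below), the training data are cycled when exhausted, the termination check uses the mean loss over all training data, and `max_steps` is an added safeguard. All of these choices are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input data, output data) of one training sample


def predict(weights: List[float], bias: float, x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def dataset_loss(weights: List[float], bias: float, data: Sequence[Sample]) -> float:
    """Mean squared error over all training data (stands in for loss function L)."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def train(data: Sequence[Sample], lr: float = 0.05, loss_threshold: float = 1e-4,
          change_threshold: float = 1e-12, max_steps: int = 10_000) -> Tuple[List[float], float, float]:
    weights = [0.0] * len(data[0][0])  # provisional variables (the role of K1)
    bias = 0.0
    prev_loss: Optional[float] = None
    loss = dataset_loss(weights, bias, data)
    for step in range(max_steps):
        x, y = data[step % len(data)]        # S301: take the next training sample (cycling)
        err = predict(weights, bias, x) - y  # S302: forward pass
        for i, xi in enumerate(x):           # S302: gradient step on the squared error
            weights[i] -= lr * 2.0 * err * xi
        bias -= lr * 2.0 * err
        loss = dataset_loss(weights, bias, data)  # S303: termination condition
        if loss < loss_threshold:
            break
        if prev_loss is not None and abs(prev_loss - loss) < change_threshold:
            break
        prev_loss = loss
    return weights, bias, loss
```

The two `break` conditions mirror the two example termination conditions in [0236]: the loss falling below a threshold, or the change in the loss falling below a threshold.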
[0237] The learning processing unit 191 can be implemented in a processing device different from the processing device 180. The processing device different from the processing device 180 includes at least one computer.
[0238] The data acquisition unit 192 can also acquire training data other than the multiple training data T1, namely one or more of the four types of multiple training data T2, T3, T4, and T5. The training unit 193 trains the learning object model corresponding to each type of training data acquired by the data acquisition unit 192. The learning object models corresponding to the multiple training data T2, T3, T4, and T5 are generated by the processing device 180 using an operation program and the multiple provisional variables K2, K3, K4, and K5, respectively.
[0239] A separate data acquisition unit 192 can be provided for each type of multiple training data. In this case, each data acquisition unit 192 acquires the corresponding multiple training data.
[0240] Likewise, a separate training unit 193 can be provided for each type of multiple training data. In this case, each training unit 193 uses its corresponding multiple training data to train the learning object model corresponding to that training data.
[0241] According to the 16th variation, the learning processing unit 191 can generate at least one trained model.
[0242] B17: 17th Variation
[0243] In the embodiments and the 1st to 16th variations described above, the processing device 180 may function only as the decision unit 183 and the acquisition unit 184, as shown in Figure 24. The decision unit 183 shown in Figure 24 determines, based on instrument information, the focus area of the body of a player using the instrument of the type indicated by the instrument information. The acquisition unit 184 shown in Figure 24 acquires image information representing an image of the focus area determined by the decision unit 183. According to the 17th variation, an image of the performer required for training in playing a musical instrument can be determined according to the type of the instrument.
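The two-unit structure of the 17th variation can be sketched as two functions: one maps instrument information to focus areas, the other crops the corresponding image regions. This is a minimal sketch assuming a grayscale frame stored as nested lists and per-part bounding boxes supplied by some pose estimator (not described in the text); the instrument table, the fallback to the whole body, and the box format are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Frame = List[List[int]]          # grayscale image as rows of pixel values

# Hypothetical instrument -> focus-area table used by the decision step.
FOCUS_BY_INSTRUMENT: Dict[str, Tuple[str, ...]] = {
    "piano": ("fingers", "feet"),
    "flute": ("lips", "upper_body"),
}


def decide(instrument: str) -> Tuple[str, ...]:
    """Role of decision unit 183: instrument information -> focus areas."""
    return FOCUS_BY_INSTRUMENT.get(instrument, ("whole_body",))  # fallback is an assumption


def acquire(frame: Frame, boxes: Dict[str, Box], parts: Tuple[str, ...]) -> Dict[str, Frame]:
    """Role of acquisition unit 184: crop the image region of each focus area."""
    crops: Dict[str, Frame] = {}
    for part in parts:
        if part in boxes:  # parts that were not located in this frame are skipped
            top, left, bottom, right = boxes[part]
            crops[part] = [row[left:right] for row in frame[top:bottom]]
    return crops
```

For the 18th variation, only `decide` would change, taking sound information instead of instrument information; `acquire` stays the same.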
[0244] B18: 18th Variation
[0245] In the 17th variation, the decision unit 183 shown in Figure 24 can determine the focus area of the body of the player using the instrument based not on instrument information indicating the type of instrument, but on sound information indicating the sound output from the instrument. In that case, the acquisition unit 184 shown in Figure 24 acquires image information representing an image of the focus area that the decision unit 183 determined based on the sound information. According to the 18th variation, an image of the performer required for training in playing a musical instrument can be determined according to the sound output from the instrument.
C: Methods that can be understood from the above
[0247] The following methods can be understood from at least one of the embodiments and variations described above.
[0248] C1: Method 1
[0249] The first method, an aspect of the present invention, is an information processing method executed by a computer, which determines, based on instrument information indicating an instrument, a part of interest of the body of a performer playing the instrument indicated by the instrument information, and obtains image information representing the determined part of interest. According to this method, an image of the performer required for training in playing an instrument can be determined according to the instrument.
[0250] C2: Method 2
[0251] In an example of the first method (the second method), the acquired image information is further sent to an external device. According to this method, it is possible to transmit images of the performer required for training in playing an instrument to an external device.
C3: Method 3
[0253] In an example of the first or second method (the third method), the method further includes determining the instrument information using association information related to the instrument, and determining the part of interest includes determining it based on the determined instrument information. According to this method, an image of the performer required for training in playing an instrument can be determined based on association information related to the instrument.
[0254] C4: Method 4
[0255] In an example of the third method (the fourth method), the association information is information representing the sound output by the instrument, information representing an image of the instrument, information representing a musical score corresponding to the instrument, or information representing a combination of the instrument and the training progress for that instrument. According to this method, various types of information can be used as the association information.
[0256] C5: Method 5
[0257] In an example of the third or fourth method (the fifth method), the determination of the instrument information includes: inputting the association information into a first trained model, which has learned the relationship between learning association information related to the instrument and learning instrument information representing the instrument determined based on the learning association information; and determining the information output by the first trained model corresponding to the association information as the instrument information. According to this method, the instrument information is determined using a trained model, thus enabling the instrument information to represent the instrument played by the performer with high precision.
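The fifth method only requires a model that has learned a mapping from learning association information to learning instrument information. The sketch below uses a nearest-centroid classifier as a deliberately simple stand-in for the first trained model; the specification does not say which model type is used, and the two-dimensional toy features are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (learning association information as a feature vector,
#  learning instrument information as a label)
Sample = Tuple[List[float], str]


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """'Learn' one mean feature vector (centroid) per instrument."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, instrument in samples:
        acc = sums.setdefault(instrument, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[instrument] += 1
    return {inst: [v / counts[inst] for v in acc] for inst, acc in sums.items()}


def predict_instrument(centroids: Dict[str, List[float]], features: List[float]) -> str:
    """Return the instrument whose centroid is nearest to the input features."""
    return min(centroids, key=lambda inst: math.dist(centroids[inst], features))


# Toy two-dimensional features; the feature design is hypothetical.
model = train_centroids([
    ([0.9, 0.1], "piano"), ([1.1, 0.2], "piano"),
    ([3.0, 0.8], "trumpet"), ([3.2, 1.0], "trumpet"),
])
print(predict_instrument(model, [2.9, 0.7]))  # trumpet
```

A neural network could replace `train_centroids` and `predict_instrument` without changing how the result is used: the predicted label is taken as the instrument information.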
[0258] C6: Method 6
[0259] In an example of the fifth method (the sixth method), the association information and the learning association information represent the sound output by the musical instrument, and the learning instrument information represents the musical instrument that outputs the sound indicated by the learning association information. According to this method, the musical instrument can be determined based on the sound output by the musical instrument.
C7: Method 7
[0261] In an example of the fifth method (the seventh method), the association information and the learning association information represent an image showing the musical instrument, and the learning instrument information represents the musical instrument shown in the image represented by the learning association information. According to this method, the musical instrument can be determined based on the image showing the musical instrument.
[0262] C8: Method 8
[0263] In an example of the third method (the eighth method), the determination of the instrument information includes determining, as the instrument information, the reference instrument information corresponding to the association information by referring to a table showing the correspondence between reference association information related to an instrument and reference instrument information representing that instrument. According to this method, the instrument information can be determined without using a trained model.
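Because the eighth method replaces the trained model with a correspondence table, the determination reduces to a lookup. The sketch below assumes the association information has already been reduced to a discrete key; the keys, the table entries, and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical correspondence table: reference association information
# (here a coarse timbre label) -> reference instrument information.
REFERENCE_TABLE = {
    "bowed_string": "violin",
    "brass": "trumpet",
    "hammered_string": "piano",
}


def determine_instrument(association_info: str) -> Optional[str]:
    """Look up instrument information; None if the table has no entry."""
    return REFERENCE_TABLE.get(association_info)


print(determine_instrument("brass"))  # trumpet
```

Returning `None` for an unknown key leaves it to the caller to decide whether to fall back to a trained model or ask the user.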
C9: Method 9
[0265] In any example of methods 1 through 8 (method 9), the determination of the area of interest includes determining the area of interest based on both the instrument information and sound information representing the sound output from the instrument indicated by the instrument information. According to this method, it is possible to determine an image of the performer required for training in playing the instrument based on the sound output from the instrument.
[0266] C10: Method 10
[0267] In an example of the 9th method (the 10th method), the determination of the area of interest includes: inputting input information, which includes the instrument information and the sound information, into a second trained model that has learned the relationship between learning input information and learning output information; and determining the area of interest based on the output information that the second trained model outputs in response to the input information. The learning input information includes learning instrument information representing an instrument and learning sound information representing the sound output from the instrument indicated by the learning instrument information. The learning output information represents the area of interest on the body of a performer playing the instrument indicated by the learning instrument information, that is, the instrument that outputs the sound indicated by the learning sound information. According to this method, because a trained model is used to determine the area of interest, an image of the performer required for training in playing an instrument can be determined with high accuracy based on the sound output from the instrument.
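The input/output structure of the second trained model (instrument information plus sound information in, area of interest out) can be sketched with a 1-nearest-neighbour lookup as a stand-in. This is an assumption made for brevity, not the model type used in the specification; the feature names `pitch_error` and `timing_error` and the area labels are hypothetical.

```python
import math
from typing import List, Tuple

# (learning instrument information, learning sound features,
#  learning output information = area of interest)
Example = Tuple[str, List[float], str]


class NearestNeighborAreaModel:
    """Stand-in for the second trained model: 1-nearest-neighbour lookup."""

    def __init__(self, examples: List[Example]) -> None:
        self.examples = examples

    def predict(self, instrument: str, sound: List[float]) -> str:
        # Only compare against examples recorded for the same instrument.
        candidates = [e for e in self.examples if e[0] == instrument]
        if not candidates:
            raise ValueError(f"no training examples for {instrument!r}")
        _, _, area = min(candidates, key=lambda e: math.dist(e[1], sound))
        return area


# Hypothetical sound features: [pitch_error, timing_error], each in 0..1.
model = NearestNeighborAreaModel([
    ("violin", [0.9, 0.1], "left_hand"),
    ("violin", [0.1, 0.9], "right_arm"),
    ("piano", [0.5, 0.5], "hands"),
])
print(model.predict("violin", [0.8, 0.2]))  # left_hand
```

Filtering by instrument first reflects that the learning input information pairs each sound with the instrument that produced it.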
[0268] C11: Method 11
[0269] The 11th method of the present invention is an information processing method executed by a computer that determines, based on sound information representing the sound output from an instrument, a part of interest on the body of a performer playing the instrument, and obtains image information representing an image of the determined part of interest. According to this method, it is possible to determine an image of a performer required for training in playing the instrument, corresponding to the sound output from the instrument.
[0270] C12: Method 12
[0271] In an example of the 9th or 11th method (the 12th method), the determination of the area of interest includes determining the area of interest based on the relationship between musical score information representing a musical score and the sound information. According to this method, it is possible to determine an image of the performer required for training in playing an instrument based on the relationship between the score information and the sound information.
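One way to read "the relationship between the score information and the sound information" is to compare the notes the score expects with the notes actually played and focus on where they disagree. The sketch below is a hypothetical, piano-oriented rule of that kind, not the rule in the specification: it assumes both inputs are already aligned sequences of MIDI note numbers and picks the hand whose register contains more wrong notes.

```python
from typing import List, Optional

MIDDLE_C = 60  # MIDI note number of middle C


def area_from_score_and_sound(score: List[int], played: List[int]) -> Optional[str]:
    """Hypothetical rule: focus on the hand whose notes were played wrongly.

    score and played are aligned sequences of MIDI note numbers; the
    alignment step (e.g. score following) is assumed to be done already.
    """
    mismatches = [s for s, p in zip(score, played) if s != p]
    if not mismatches:
        return None  # performance matches the score; no specific focus
    low = sum(1 for note in mismatches if note < MIDDLE_C)
    return "left_hand" if low > len(mismatches) - low else "right_hand"


print(area_from_score_and_sound([48, 52, 64, 67], [48, 53, 64, 67]))  # left_hand
```

Ties go to the right hand here purely as an arbitrary tie-break.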
[0272] C13: Method 13
[0273] In an example of the 11th method (the 13th method), the determination of the area of interest includes: inputting input information, which includes musical score information representing a musical score and the sound information, into a third trained model that has learned the relationship between learning input information and learning output information; and determining the area of interest based on the output information that the third trained model outputs in response to the input information. The learning input information includes learning sound information representing the sound output from an instrument and learning musical score information representing a musical score. The learning output information indicates the area of interest on the body of a performer playing an instrument that outputs the sound represented by the learning sound information according to the musical score indicated by the learning musical score information. According to this method, because the area of interest is determined using a trained model, an image of a performer required for training in playing an instrument can be determined with high accuracy.
[0274] C14: Method 14
[0275] In any example of methods 1 through 13 (method 14), the determination of the area of interest includes determining the area of interest based on attention information indicating points to note in the performance. According to this method, the image of the performer required for training in playing an instrument can be switched according to the points to note in the performance.
[0276] C15: Method 15
[0277] In any example of methods 1 through 14 (method 15), the determination of the area of interest includes determining the area of interest based on performer information associated with the performer. According to this method, the image of the performer required for training in playing an instrument can be switched according to the performer information associated with the performer.
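Methods 14 and 15 both switch an already determined area of interest using extra information. The sketch below shows one possible precedence scheme, assuming attention information overrides the base area and performer information is consulted only when no attention information applies. The attention keywords, the `beginner` attribute, and the "whole_body" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from attention information (a point to watch
# during the performance) to the area of interest it calls for.
ATTENTION_TO_AREA = {
    "breathing": "chest",
    "embouchure": "mouth",
    "posture": "upper_body",
}


@dataclass
class PerformerInfo:
    # Hypothetical attribute; the specification only refers to
    # "performer information associated with the performer".
    beginner: bool = False


def refine_area(base_area: str, attention: Optional[str], performer: PerformerInfo) -> str:
    """Switch the area of interest using attention and performer information."""
    if attention in ATTENTION_TO_AREA:
        # Attention information takes priority in this sketch.
        return ATTENTION_TO_AREA[attention]
    if performer.beginner:
        # Hypothetical rule: widen the view for beginners so posture is visible.
        return "whole_body"
    return base_area


print(refine_area("hands", "posture", PerformerInfo()))  # upper_body
```

The order of the two checks is a design choice; a real system might instead combine both sources, for example by showing multiple camera views.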
[0278] C16: Method 16
[0279] The information processing system according to the present invention (the 16th aspect) includes: a decision unit that determines, based on instrument information representing an instrument, a part of interest on the body of a performer playing the instrument indicated by the instrument information; and an acquisition unit that acquires image information representing an image of the part of interest determined by the decision unit. According to this aspect, it is possible to determine an image of a performer required for training in playing an instrument, corresponding to the instrument.
[0280] C17: Method 17
[0281] The information processing system according to the present invention (the 17th aspect) includes: a decision unit that determines, based on sound information representing the sound output from an instrument, a part of interest on the body of a performer playing the instrument; and an acquisition unit that acquires image information representing an image of the part of interest determined by the decision unit. According to this aspect, an image of a performer required for training in playing the instrument can be determined in correspondence with the sound output from the instrument.
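The 16th and 17th aspects share the same unit structure and differ only in what the decision unit consumes (instrument information or sound information). The sketch below wires the units together with a pluggable decision function. The class names echo reference numerals 183, 184, and 185, but their interfaces are assumptions; the transmission unit is included only to show where sending to an external device (method 2) would fit, and here it merely records payloads.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DecisionUnit:
    """Counterpart of decision unit 183: maps input information to an area."""
    decide: Callable[..., str]


@dataclass
class AcquisitionUnit:
    """Counterpart of acquisition unit 184: one capture callback per area."""
    cameras: Dict[str, Callable[[], bytes]]

    def acquire(self, area: str) -> bytes:
        return self.cameras[area]()


@dataclass
class TransmissionUnit:
    """Counterpart of transmission unit 185; records payloads instead of sending."""
    sent: List[bytes] = field(default_factory=list)

    def send(self, image: bytes) -> None:
        self.sent.append(image)


def run(decision: DecisionUnit, acquisition: AcquisitionUnit,
        transmission: TransmissionUnit, **inputs) -> str:
    """Decide the area, acquire its image, and pass the image on for sending."""
    area = decision.decide(**inputs)
    transmission.send(acquisition.acquire(area))
    return area


# Usage: a 16th-aspect-style system keyed on instrument information.
decision = DecisionUnit(decide=lambda instrument: {"piano": "hands"}[instrument])
acquisition = AcquisitionUnit(cameras={"hands": lambda: b"hands-frame"})
transmission = TransmissionUnit()
print(run(decision, acquisition, transmission, instrument="piano"))  # hands
```

A 17th-aspect-style system would pass a `decide` function that takes sound information instead, leaving the acquisition and transmission units unchanged.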
Explanation of Reference Numerals
[0283] 1…Information providing system, 100…Student training system, 100A…Musical instrument, 100B…Student, 111-115…Camera, 120…Microphone, 130…Display unit, 140…Speaker, 150…Operation unit, 160…Communication unit, 170…Storage device, 180…Processing device, 181…Determination unit, 182…Trained model, 182a…Model to be trained, 183…Decision unit, 184…Acquisition unit, 185…Transmission unit, 186…Output control unit, 187-190…Trained model, 191…Learning processing unit, 192…Data acquisition unit, 193…Training unit, 200…Teacher guidance system, 200A…Musical instrument, 200B…Teacher.
Claims
1. An information processing method executed by a computer, comprising: determining, based on instrument information representing an instrument, an area of interest on the body of a performer playing the instrument indicated by the instrument information; and obtaining image information representing an image of the determined area of interest, wherein determining the area of interest includes: inputting input information, which includes the instrument information and sound information representing the sound output from the instrument indicated by the instrument information, into a second trained model that has learned a relationship between learning input information and learning output information, the learning input information including learning instrument information representing an instrument and learning sound information representing the sound output from the instrument indicated by the learning instrument information, and the learning output information representing the body part of interest of a performer playing the instrument indicated by the learning instrument information, that is, the instrument that outputs the sound indicated by the learning sound information; and determining the area of interest based on the output information that the second trained model outputs in response to the input information.
2. The information processing method according to claim 1, further comprising transmitting the acquired image information to an external device.
3. The information processing method according to claim 1 or 2, further comprising determining the instrument information using association information related to the instrument, wherein determining the area of interest includes determining the area of interest based on the determined instrument information.
4. The information processing method according to claim 3, wherein the association information is: information representing the sound output by the musical instrument; information representing an image showing the musical instrument; information representing a musical score corresponding to the instrument; or information indicating a combination of the musical instrument and the training progress for that instrument.
5. The information processing method according to claim 3, wherein determining the instrument information includes: inputting the association information into a first trained model that has learned the relationship between learning association information related to an instrument and learning instrument information representing the instrument determined based on the learning association information; and determining, as the instrument information, the information that the first trained model outputs in response to the association information.
6. The information processing method according to claim 5, wherein the association information and the learning association information represent the sound output by the musical instrument, and the learning instrument information represents the instrument that outputs the sound represented by the learning association information.
7. The information processing method according to claim 5, wherein the association information and the learning association information represent an image showing the musical instrument, and the learning instrument information represents the instrument shown in the image represented by the learning association information.
8. The information processing method according to claim 3, wherein determining the instrument information includes determining, as the instrument information, the reference instrument information corresponding to the association information by referring to a table showing the correspondence between reference association information related to a musical instrument and reference instrument information representing the musical instrument.
9. The information processing method according to claim 1 or 2, wherein determining the area of interest includes determining the area of interest based on the sound information and the instrument information.
10. The information processing method according to claim 9, wherein determining the area of interest includes determining the area of interest based on the relationship between musical score information representing a musical score and the sound information.
11. The information processing method according to claim 1 or 2, wherein determining the area of interest includes determining the area of interest based on attention information indicating points to note in the performance.
12. The information processing method according to claim 1 or 2, wherein determining the area of interest includes determining the area of interest based on performer information related to the performer.
13. An information processing method executed by a computer, comprising: determining, based on sound information representing the sound output from an instrument, an area of interest on the body of a performer playing the instrument; and obtaining image information representing an image of the determined area of interest, wherein determining the area of interest includes: inputting input information, which includes musical score information representing a musical score and the sound information, into a third trained model that has learned a relationship between learning input information and learning output information, the learning input information including learning sound information representing the sound output from an instrument and learning musical score information representing a musical score, and the learning output information representing the body part of interest of a performer playing an instrument that outputs the sound represented by the learning sound information according to the musical score indicated by the learning musical score information; and determining the area of interest based on the output information that the third trained model outputs in response to the input information.
14. The information processing method according to claim 13, wherein determining the area of interest includes determining the area of interest based on the relationship between the musical score information and the sound information.
15. The information processing method according to claim 13 or 14, wherein determining the area of interest includes determining the area of interest based on attention information indicating points to note in the performance.
16. The information processing method according to claim 13 or 14, wherein determining the area of interest includes determining the area of interest based on performer information related to the performer.
17. An information processing system comprising: a decision unit that determines, based on instrument information representing an instrument, an area of interest on the body of a performer playing the instrument indicated by the instrument information; and an acquisition unit that acquires image information representing an image of the area of interest determined by the decision unit, wherein the decision unit inputs input information, which includes the instrument information and sound information representing the sound output from the instrument indicated by the instrument information, into a second trained model that has learned a relationship between learning input information and learning output information, the learning input information including learning instrument information representing an instrument and learning sound information representing the sound output from the instrument indicated by the learning instrument information, and the learning output information representing the body part of interest of a performer playing the instrument indicated by the learning instrument information, that is, the instrument that outputs the sound indicated by the learning sound information; and determines the area of interest based on the output information that the second trained model outputs in response to the input information.
18. An information processing system comprising: a decision unit that determines, based on sound information representing the sound output from an instrument, an area of interest on the body of a performer playing the instrument; and an acquisition unit that acquires image information representing an image of the area of interest determined by the decision unit, wherein the decision unit inputs input information, which includes musical score information representing a musical score and the sound information, into a third trained model that has learned a relationship between learning input information and learning output information, the learning input information including learning sound information representing the sound output from an instrument and learning musical score information representing a musical score, and the learning output information representing the parts of the performer's body that are of interest when playing an instrument that outputs the sound represented by the learning sound information according to the musical score indicated by the learning musical score information; and determines the area of interest based on the output information that the third trained model outputs in response to the input information.