Data processing device, data processing method, and data processing program

The data processing system enhances learning efficiency by using earphones with synchronized vibrations to highlight important content points, addressing the limitations of one-way audio delivery in environments without visual information.

JP2026096290A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Conventional learning methods using earphones are limited to one-way audio information delivery, especially in environments where visual information is unavailable, leading to inefficiencies in learning and retention of content.

Method used

A data processing system that utilizes earphones equipped with vibration generating units, which synchronize vibrations with important parts of learning content playback, combining auditory and tactile senses to enhance learning efficiency.

Benefits of technology

Enables effective learning by emphasizing important content points through synchronized vibrations, improving retention and engagement, especially in environments lacking visual information.


Abstract

This invention provides a data processing device, a data processing method, and a program that enable efficient learning using learning content. [Solution] The data processing device comprises an input unit that acquires user data; a processing unit that performs specific processing using a data generation model that generates predetermined inference results according to the user data; and an output unit that uses the results of the specific processing to play sound from two earphones that are equipped with a vibration generating unit and worn on the user's ears. The input unit acquires, as the user data, learning content that includes learning content information. The processing unit generates vibration information by inputting to the data generation model a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which important parts of the learning content indicated by the learning content information are played back as sound. The output unit plays the learning content from the earphones and vibrates the vibration generating unit using the vibration information.

Description

Technical Field

[0001] The technology of the present disclosure relates to a data processing device, a data processing method, and a data processing program.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Conventional learning using earphones is limited to one-way provision of information by voice, and there is room for improvement in learning efficiently in environments where visual information cannot be used, such as while on the move. The object of the present disclosure is to provide a data processing device, a data processing method, and a data processing program capable of efficient learning using learning content.

Means for Solving the Problems

[0005] A first aspect of the technology of this disclosure is a data processing device comprising: an input unit that acquires user data; a processing unit that performs specific processing using a data generation model that generates a predetermined inference result corresponding to the user data; and an output unit that uses the result of the specific processing to play sound from the speakers of two earphones that are worn on the user's left and right ears, at least one of which is equipped with a vibration generating unit. The input unit acquires, as the user data, learning content including learning content information that can be played back as sound. As the specific processing, the processing unit generates vibration information by inputting to the data generation model a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which important parts of the learning content indicated by the learning content information are played back as sound. The output unit plays back the learning content indicated by the learning content information from the speakers of the earphones and vibrates the vibration generating unit using the vibration information.
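
The first aspect leaves the format of the vibration information and the wording of the prompt open. The following Python sketch shows one way a processing unit could turn a model reply into vibration events aligned with playback of the important parts. The timed transcript segments, the JSON reply format, the VibrationEvent fields (side, intensity) and the stand-in model are all assumptions for illustration, not details taken from this disclosure.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A timed piece of lecture audio (hypothetical: produced by speech recognition)."""
    start_s: float
    end_s: float
    text: str


@dataclass(frozen=True)
class VibrationEvent:
    """One entry of the vibration information (hypothetical format)."""
    start_s: float     # playback time at which vibration starts
    duration_s: float  # how long the vibration generating unit vibrates
    side: str          # "left", "right" or "both" earphone
    intensity: float   # 0.0 to 1.0


def build_prompt(segments: List[Segment]) -> str:
    """Prompt asking the data generation model to mark the important parts."""
    lines = [f"{i}: [{s.start_s:.1f}-{s.end_s:.1f}s] {s.text}"
             for i, s in enumerate(segments)]
    return ("Identify the important parts of this lecture transcript. "
            'Reply only with JSON of the form {"important": [segment indices]}.\n'
            + "\n".join(lines))


def to_vibration_info(reply: str, segments: List[Segment],
                      side: str = "both", intensity: float = 0.8) -> List[VibrationEvent]:
    """Turn the model reply into vibration events aligned with segment playback."""
    data = json.loads(reply)
    # Keep only plain integer indices that refer to real segments; ignore the rest.
    valid = {i for i in data.get("important", [])
             if type(i) is int and 0 <= i < len(segments)}
    return [VibrationEvent(segments[i].start_s,
                           segments[i].end_s - segments[i].start_s,
                           side, intensity)
            for i in sorted(valid)]


def active_events(events: List[VibrationEvent], t: float) -> List[VibrationEvent]:
    """Events the output unit should be driving at playback time t (seconds)."""
    return [e for e in events if e.start_s <= t < e.start_s + e.duration_s]


if __name__ == "__main__":
    lecture = [Segment(0.0, 4.0, "Today we look at the Meiji Restoration."),
               Segment(4.0, 9.5, "It began in 1868."),
               Segment(9.5, 12.0, "Any questions so far?")]
    fake_model = lambda prompt: '{"important": [1, 7]}'  # stand-in for the model
    events = to_vibration_info(fake_model(build_prompt(lecture)), lecture)
    print(events)
```

Validating the reply before building events is a deliberate choice in this sketch: a generative model may return indices that do not exist, and discarding them keeps the earphones from vibrating at arbitrary moments.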

[0006] A second aspect of the technology of the present disclosure is a data processing method in which a computer acquires user data, performs specific processing using a data generation model that generates a predetermined inference result corresponding to the user data, and uses the result of the specific processing to play sound from the speakers of two earphones, each equipped with a vibration generating unit and worn one on each of the user's left and right ears. The computer acquires, as the user data, learning content including learning content information that can be played back as sound; performs, as the specific processing, the process of generating vibration information by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in synchronization with the timing at which important parts of the learning content indicated by the learning content information are played back as sound; plays back the learning content indicated by the learning content information from the speakers of the earphones; and uses the vibration information to vibrate the vibration generating unit.

[0007] A third aspect of the technology of this disclosure is a data processing program that causes a computer to execute a process of acquiring user data, performing specific processing using a data generation model that generates a predetermined inference result corresponding to the user data, and using the result of the specific processing to play sound from the speakers of two earphones that are worn one on each of the user's left and right ears, at least one of which is equipped with a vibration generating unit. The specific processing includes acquiring, as the user data, learning content including learning content information that can be played back as sound; inputting to the data generation model a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the learning content indicated by the learning content information is played back as sound; playing back the learning content indicated by the learning content information from the speakers of the earphones; and vibrating the vibration generating unit using the vibration information.

Brief Description of Drawings

[0008]
[Figure 1] Figure 1 is a conceptual diagram showing an example of the configuration of a data processing system.
[Figure 2] Figure 2 is a conceptual diagram showing an example of the main functions of a data processing device and earphones.
[Figure 3A] Figure 3A shows an example of an earphone configuration.
[Figure 3B] Figure 3B shows the user wearing the earphones.
[Figure 3C] Figure 3C is a diagram illustrating the field of view of camera 42.
[Figure 3D] Figure 3D shows the user wearing the earphones.
[Figure 3E] Figure 3E shows the user wearing the earphones.
[Figure 3F] Figure 3F shows the user wearing the earphones.
[Figure 4] Figure 4 schematically shows the functional configuration of the specific processing unit of the data processing device.
[Figure 5] Figure 5 schematically shows an example of the operation flow of the specific processing performed by the data processing device according to the first embodiment.
[Figure 6] Figure 6 is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 7] Figure 7 schematically shows an example of the operation flow of the specific processing performed by the data processing device according to the second embodiment.

Modes for Carrying Out the Invention

[0009] Hereinafter, an example of an embodiment of the data processing device, data processing method, and program relating to the technology of this disclosure will be described with reference to the attached drawings.

[0010] First, the terminology used in the following description is explained.

[0011] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Furthermore, the processor may consist of a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0012] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used as work memory by the processor.

[0013] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0014] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between a plurality of computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0015] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B". That is, "A and/or B" means only A, only B, or a combination of A and B. Also, in this specification, the same concept as "A and/or B" applies when three or more matters are connected by "and/or".

[0016] [First Embodiment] FIG. 1 shows an example of the configuration of a data processing system 10 according to the first embodiment.

[0017] As shown in FIG. 1, the data processing system 10 includes a data processing device 12 and earphones 14. An example of the data processing device 12 is a server. In the present embodiment, the data processing device 12 is an example of the "data processing device" according to the technology of the present disclosure.

[0018] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0019] The earphone 14 includes a computer 36, a microphone 38, a speaker 40, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 38, speaker 40, and camera 42 are also connected to the bus 52.

[0020] The microphone 38 receives voice from the user 20 and accepts instructions from the user 20. The microphone 38 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 40 outputs audio according to instructions from the processor 46. Hereafter, the microphone 38 may also be referred to simply as the "microphone."

[0021] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0022] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0023] Figure 2 shows an example of the main functions of the data processing device 12 and the earphone 14.

[0024] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "data processing program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0025] The storage 32 stores the data generation model 58. The data generation model 58 is used by the specific processing unit 290.

[0026] (Earphones 14) In the earphone 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0027] The earphone 14 may be, for example, a canal-type earphone that is fitted into the ear canal of the user 20, as shown in Figure 3A. However, the earphone 14 is not limited to the canal type; it may also be an inner-ear type earphone that rests at the entrance of the ear canal of the user 20, or a headphone-type earphone that covers the entire ear of the user 20. Each of the two earphones 14 is equipped with a microphone 38, a speaker 40, and a camera 42. The sound and images collected by the two earphones 14 fitted into the ears of the user 20 may be recorded as a life log in the database 24.

[0028] The life log can be interpreted as a history of the user 20's actions in daily life, and may include sounds and images associated with the user 20, specifically sounds collected by the microphone 38 and images taken by the camera 42 during daily life. The life log may record sounds and images associated with the user 20, along with the date, time, and location in which they were acquired.

[0029] The sounds collected by the microphone 38 may include the voice of the person the user 20 is talking to, and sounds that occur around the user 20 while walking or cycling (such as the sound of cars driving, birds chirping, the babbling of a stream, and the sound of trees swaying in the wind).

[0030] As shown in Figure 3C, the camera 42 may capture images of the scenery within its field of view that is in front of the user 20, or it may capture images of scenery within its field of view that is not in front of the user 20, for example, to the side, behind, below, or above the user 20. The images captured by the camera 42 may include images of the person the user 20 is talking to, the scenery around the user 20 when they are walking or cycling, and images of the pet the user 20 is walking with.

[0031] Each of the two earphones 14 is equipped with a camera 42, and the two earphones 14 worn on the ears of the user 20 are positioned a certain distance apart, one on the left ear and the other on the right ear, as shown in Figure 3B. The two cameras 42 are therefore separated by that distance. Compared with a configuration in which two cameras are arranged side by side in a single housing, such as in a video camera, the spacing between the two cameras 42 can be increased, making 3D sensing easier. Here, 3D sensing refers to measuring three-dimensional shapes.
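
The benefit of wider camera spacing can be made concrete with the standard pinhole stereo model, which the disclosure does not spell out: depth is Z = f·B/d for focal length f (pixels), baseline B and disparity d (pixels), so a disparity error Δd produces a depth error of roughly ΔZ ≈ Z²·Δd/(f·B). This Python sketch assumes rectified cameras with overlapping forward views; the 1000-pixel focal length and the 3 cm and 15 cm baselines are illustrative values only.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo depth Z = f * B / d for rectified, forward-facing cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_err_px: float = 1.0) -> float:
    """First-order depth uncertainty dZ ~= Z**2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)


if __name__ == "__main__":
    f_px = 1000.0  # assumed focal length in pixels
    for label, baseline in [("single-housing camera (assumed 3 cm)", 0.03),
                            ("ear-to-ear earphones (assumed 15 cm)", 0.15)]:
        print(f"{label}: depth error at 5 m = {depth_error(f_px, baseline, 5.0):.3f} m")
```

At 5 m these assumed values give a depth error of about 0.83 m for the 3 cm baseline versus about 0.17 m for the 15 cm baseline; the error falls in inverse proportion to the baseline.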

[0032] Furthermore, when the two earphones 14 are placed in the user 20's ears, the two cameras 42 are positioned close to the user 20's left and right eyes, allowing images (captured images) that are nearly identical to those seen with the naked eye to be recorded as a life log in the database 24. Consequently, in specific processing, it becomes easier to reproduce information corresponding to inquiries from the user 20, that is, information corresponding to the content of the user 20's speech.

[0033] While the two earphones 14 are attached to the user 20, all or part of the images captured by the camera 42 may be recorded in the database 24 as a life log. Specifically, when the two earphones 14 are attached to the user 20, the recording of images captured by the camera 42 to the database 24 may begin, and when the two earphones 14 are removed from the user 20, the recording of those images to the database 24 may end.

[0034] While the two earphones 14 are worn by the user 20, all or part of the sound collected by the microphone 38 may be recorded as a life log in the database 24. Specifically, when the two earphones 14 are worn by the user 20, the recording of the sound collected by the microphone 38 to the database 24 may begin, and when the two earphones 14 are removed from the user 20, the recording of the sound to the database 24 may end.
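
The wear-dependent start and end of life-log recording can be sketched as a small state holder. How wearing is detected is not specified in the disclosure; this Python sketch simply receives the wear state of each earphone and assumes that recording runs only while both earphones are worn. Each record keeps the acquisition time and, if known, the location, in line with the description of the life log. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class LifeLogRecord:
    timestamp: datetime
    kind: str                # "audio" (microphone 38) or "image" (camera 42)
    payload: bytes
    location: Optional[str]  # where the data was acquired, if known


@dataclass
class LifeLogRecorder:
    """Stores captures only while recording is active (assumed policy below)."""
    left_worn: bool = False
    right_worn: bool = False
    records: List[LifeLogRecord] = field(default_factory=list)

    @property
    def recording(self) -> bool:
        # Assumption: recording begins once both earphones are worn and
        # ends as soon as either one is removed.
        return self.left_worn and self.right_worn

    def set_worn(self, left: bool, right: bool) -> None:
        self.left_worn, self.right_worn = left, right

    def capture(self, timestamp: datetime, kind: str, payload: bytes,
                location: Optional[str] = None) -> bool:
        """Store a capture if recording is active; return whether it was stored."""
        if not self.recording:
            return False
        self.records.append(LifeLogRecord(timestamp, kind, payload, location))
        return True
```

Whether removing a single earphone should stop recording is a design choice the text leaves open; the property above makes it a one-line change.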

[0035] Next, we will describe the processing of the specific processing unit 290 when the data processing device 12 receives an utterance from the user 20 wearing the earphones 14 regarding the user 20's memories or actions, and performs specific processing to propose information corresponding to the content of the user 20's utterance to the user 20.

[0036] (Specific processing) In this embodiment, the specific processing takes user data as input and uses a data generation model that generates predetermined inference results corresponding to the input user data. Specifically, when an utterance related to the memories or actions of the user 20 wearing the earphones 14 is received as user data, the specific processing refers to the database 24 and proposes information corresponding to the content of the utterance to the user 20. For example, after a life log has been recorded in the database 24, if the user 20 wearing the earphones 14 makes an utterance related to their memories or actions, the specific processing may refer to the database 24 and propose information corresponding to the content of the utterance to the user 20.

[0037] (Example 1 of specific processing) If the user 20 wearing the earphones 14 requests a message that will trigger recall of a specific memory, the specific processing unit 290 may propose to the user 20 one or more messages selected based on the life log, as information corresponding to the content of the utterance (request).

[0038] For example, if the user 20 wearing the earphones 14 tries to recall a memory and asks, "What did I say to person A around [date] at [time]?", the specific processing unit 290, as part of the specific processing, inputs this message as a prompt to the data generation model 58. The specific processing unit 290 may refer to the life log in the database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "I think you said, 'I found a nice restaurant, let's make a reservation.'" This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.

[0039] For example, if the user 20 wearing the earphones 14 tries to recall a memory and asks, "Who was I talking to around [date] at [time]?", the specific processing unit 290, as part of the specific processing, inputs this message as a prompt to the data generation model 58. The specific processing unit 290 may refer to the life log in the database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "It seems you were talking with two friends at that time, probably B and C." This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.

[0040] For example, if the user 20 wearing the earphones 14 tries to recall their emotions and asks, "How did I feel when I was talking to person A around [date] at [time]?", the specific processing unit 290, as part of the specific processing, inputs this message as a prompt to the data generation model 58. The specific processing unit 290 may refer to the life log in the database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "At that time, you were laughing a lot, so it seems you had a good impression of your friend and were very happy." This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.
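
The examples above do not say how the life log is handed to the data generation model 58. One common approach, assumed here rather than taken from the disclosure, is to retrieve the entries recorded near the time the user asks about and include them in the prompt (retrieval-augmented prompting). The LifeLogEntry type, the 30-minute window and the prompt wording in this Python sketch are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class LifeLogEntry:
    timestamp: datetime
    location: str
    transcript: str  # assumed: microphone audio already converted to text


def entries_near(log: List[LifeLogEntry], when: datetime,
                 window: timedelta = timedelta(minutes=30)) -> List[LifeLogEntry]:
    """Entries recorded within +/- window of the time the user asks about, oldest first."""
    return sorted((e for e in log if abs(e.timestamp - when) <= window),
                  key=lambda e: e.timestamp)


def build_recall_prompt(question: str, entries: List[LifeLogEntry]) -> str:
    """Combine the user's question with the retrieved life-log excerpts."""
    context = "\n".join(f"- {e.timestamp:%Y-%m-%d %H:%M} at {e.location}: {e.transcript}"
                        for e in entries) or "- (no matching entries)"
    return ("Answer the user's question using only the life-log excerpts below. "
            "If they are not sufficient, say so.\n"
            f"Life log:\n{context}\n"
            f"Question: {question}")
```

Restricting the model to the retrieved excerpts is meant to reduce invented answers; the hedged phrasing of the example replies ("I think…", "It seems…") fits the same concern.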

[0041] (Example 2 of specific processing) If the user 20 wearing the earphones 14 mutters about a specific matter as part of an utterance, the specific processing unit 290 may suggest to the user 20, based on the life log, recommended actions regarding that matter, as information corresponding to the content of the utterance (muttering).

[0042] For example, when user 20 wearing earphones 14 is shopping at a specific retail store and says, "What should I buy?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as a specific processing step. The specific processing unit 290 may refer to the life log in the database 24 and, based on the output obtained by the data generation model 58, generate a message such as, "A few months ago, you purchased product A at this store and commented that it wasn't very tasty, so how about purchasing recently released products B and C this time?" This message may be interpreted as an example of information corresponding to the content of user 20's utterance.

[0043] (Example 3 of specific processing) As shown in Figure 3D, when the user 20 wearing the earphones 14 is operating a PC and asks, "What was the name of product A that I searched for the day before yesterday?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in the database 24 and analyzes video of the PC screen captured while the user 20 was operating the PC in the past to generate a specific output. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as "Product A is ○○○". This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.

[0044] (Example 4 of specific processing) As shown in Figure 3E, if the user 20 wearing the earphones 14 says "There was a place nearby with a great view, but I wonder where it is?" while cycling, the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in the database 24 and analyzes places previously visited by the user 20 and the routes to those places to generate a specific output. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as "I think it's Cape XX, about 500m from here." This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.
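
A reply such as "about 500m from here" requires a distance between the current position and a previously visited place. This Python sketch assumes the life log stores a latitude/longitude for each visited place and uses the haversine great-circle formula on a spherical Earth; neither the coordinate storage nor this helper is described in the disclosure, and the computed distance could, for example, be passed to the data generation model 58 together with the user's utterance.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


@dataclass(frozen=True)
class VisitedPlace:
    """Hypothetical life-log summary of a place the user has been."""
    name: str
    lat: float
    lon: float


def nearest_place(places: Iterable[VisitedPlace], lat: float, lon: float
                  ) -> Optional[Tuple[VisitedPlace, float]]:
    """Closest previously visited place and its distance from (lat, lon), or None."""
    return min(((p, haversine_m(lat, lon, p.lat, p.lon)) for p in places),
               key=lambda pair: pair[1], default=None)
```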

[0045] (Example 5 of specific processing) As shown in Figure 3F, when the user 20 wearing the earphones 14 meets Mr. X at company A, which the user is visiting, and asks, "Can you tell me this person's name?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in the database 24 and generates a specific output from the history of people the user 20 met when visiting company A. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as, "I think his name is ○○." This message may be interpreted as an example of information corresponding to the content of the utterance of the user 20.

[0046] As shown in Figure 4, the specific processing unit 290 includes an input unit 291, a processing unit 292, and an output unit 293.

[0047] The input unit 291 acquires user input received through the earphone 14. Specifically, it acquires the user's voice received through the earphone 14.

[0048] The processing unit 292 performs specific processing using the data generation model 58. Specifically, it inputs voice from the user into the data generation model 58 and obtains a generation result. More specifically, when it receives an utterance from the user 20 wearing the earphones 14 regarding the user 20's memories or actions, it performs a specific processing step of proposing information corresponding to the content of the utterance to the user 20.

[0049] The output unit 293 transmits the result of the specific processing to the earphone 14. In the earphone 14, the control unit 46A causes the speaker 40 to output the result of the specific processing. The microphone 38 acquires audio representing the user's input in response to that result. The control unit 46A transmits the audio data acquired by the microphone 38 to the data processing device 12, where the specific processing unit 290 acquires it.
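
The division of the specific processing unit 290 into the input unit 291, processing unit 292 and output unit 293 can be pictured as the following Python sketch. The model signature, the prompt wording, the empty-audio check and the send_to_earphone callback (standing in for transmission over the communication interfaces 26 and 44 and the network 54) are assumptions, not interfaces defined in the disclosure.

```python
from typing import Callable

# Hypothetical signature for the data generation model 58: (prompt, audio) -> text reply.
Model = Callable[[str, bytes], str]


class SpecificProcessingUnit:
    """Sketch of unit 290 split into input (291), processing (292) and output (293)."""

    def __init__(self, model: Model, send_to_earphone: Callable[[str], None]) -> None:
        self.model = model
        self.send_to_earphone = send_to_earphone

    def input_unit(self, audio: bytes) -> bytes:
        # 291: acquire the user's voice received through the earphone 14.
        # The empty-input check is an added safeguard, not part of the description.
        if not audio:
            raise ValueError("no audio received from earphone")
        return audio

    def processing_unit(self, audio: bytes) -> str:
        # 292: pass the voice to the model together with an instruction prompt.
        prompt = ("The attached audio is the user's question about their own memories "
                  "or actions. Propose information that answers it.")
        return self.model(prompt, audio)

    def output_unit(self, result: str) -> None:
        # 293: send the result to the earphone, whose control unit plays it on the speaker.
        self.send_to_earphone(result)

    def handle(self, audio: bytes) -> None:
        """One round trip: earphone audio in, proposed information back out."""
        self.output_unit(self.processing_unit(self.input_unit(audio)))
```

Injecting the model and the transport as callables keeps the sketch testable with stand-ins, for example `SpecificProcessingUnit(lambda p, a: "...", print)`.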

[0050] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference result in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0051] Next, the operation of the data processing system 10 will be explained.

[0052] An example of the flow of a specific processing method will be explained with reference to Figure 5. Note that the flow of a specific processing method shown in Figure 5 is an example of a "data processing method" related to the technology disclosed herein.

[0053] In step S300, the data processing device 12 receives user data, including sound and images collected by the two earphones 14.

[0054] In step S302, if the data processing device 12 receives an utterance from the user wearing the earphones 14 regarding the user's memories or actions, it executes a specific process to propose information corresponding to the content of the utterance to the user 20 based on the user's life log.

[0055] In step S303, the data processing device 12 executes a process to play back the result of a specific process from the speaker 40.

[0056] [Second Embodiment] This embodiment describes an example in which the data processing system 10 enables the user 20 of the earphones 14 to learn efficiently using learning content.

[0057] Traditionally, learning using earphones while on the go or in environments where visual information is unavailable has been limited to one-way audio information delivery, posing challenges in terms of learning efficiency. Furthermore, there were few means to highlight important points, sometimes resulting in insufficient understanding and retention of the learned content.

[0058] The data processing system 10 according to this embodiment therefore aims to solve these problems and enable the user 20 to learn efficiently using learning content. In this embodiment, the learning content includes information representing the audio of lectures at the school that the user 20 attends (corresponding to the "learning content information" described later), but it is not limited to this form. For example, the learning content may include electronic books for learning, or digitized versions of textbooks and teaching materials used in school lectures.

[0059] Figure 6 shows an example of the configuration of the data processing system 10 according to the second embodiment. As shown in Figure 6, the data processing system 10 according to this embodiment differs from the data processing system 10 according to the first embodiment in that a sensor group 41A and a vibration generating unit 41B are added to the earphone 14. In this embodiment, the sensor group 41A and vibration generating unit 41B are provided in both of the two earphones 14 for the left and right ears, but the system is not limited to this configuration. For example, the sensor group 41A and vibration generating unit 41B may be provided in only one of the two earphones 14.

[0060] The sensor group 41A according to this embodiment includes a sensor for detecting biometric data of a user 20 wearing earphones 14 in their ears, a sensor for detecting movement data of the user 20, and a sensor for detecting environmental data around the user 20.

[0061] Sensors that detect biological data include, for example, heart rate sensors and blood oxygen sensors. Sensors that detect motion data include, for example, acceleration sensors and angular acceleration sensors. Furthermore, sensors that detect environmental data include, for example, temperature sensors and humidity sensors.

[0062] Furthermore, the earphone 14 according to this embodiment has a noise-canceling function. In this function, the microphone 38 of the earphone 14 collects external sound, and an internal digital circuit (not shown) generates sound with the opposite phase to the collected sound, which the speaker 40 of the earphone 14 reproduces together with the sound to be played back. This significantly reduces ambient noise, so that the user hears almost only the sound being played back. In addition, the intensity of the noise-canceling function, i.e., the amount of ambient noise reduction, is adjustable. In this embodiment, the noise-canceling function is implemented in both of the two earphones 14, but the embodiment is not limited to this configuration. For example, the noise-canceling function may be implemented in only one of the two earphones 14.
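
The opposite-phase principle and the adjustable intensity can be illustrated numerically. This is an idealised Python sketch: it assumes the ambient sound reaches the ear unchanged and exactly in step with the speaker output, whereas practical noise-canceling earphones must compensate for the acoustic path and processing latency, typically with adaptive filters such as filtered-x LMS. The function names and the 0 to 1 intensity scale are hypothetical.

```python
import math
from typing import List


def anti_noise(noise: List[float], intensity: float) -> List[float]:
    """Opposite-phase copy of the collected external sound, scaled by the intensity."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    return [-intensity * x for x in noise]


def speaker_output(program: List[float], noise: List[float], intensity: float) -> List[float]:
    """What the speaker plays: the sound to be reproduced plus the anti-noise."""
    return [p + a for p, a in zip(program, anti_noise(noise, intensity))]


def heard_at_ear(program: List[float], noise: List[float], intensity: float) -> List[float]:
    # Idealised: ambient noise arrives unchanged and exactly in step with the
    # speaker output (no acoustic path modelling, no processing latency).
    return [s + n for s, n in zip(speaker_output(program, noise, intensity), noise)]


def rms(signal: List[float]) -> float:
    return math.sqrt(sum(x * x for x in signal) / len(signal))


if __name__ == "__main__":
    fs = 48_000
    t = [i / fs for i in range(fs // 10)]
    program = [0.1 * math.sin(2 * math.pi * 440 * x) for x in t]  # lecture audio stand-in
    noise = [0.3 * math.sin(2 * math.pi * 100 * x) for x in t]    # traffic hum stand-in
    for k in (0.0, 0.5, 1.0):
        residual = [h - p for h, p in zip(heard_at_ear(program, noise, k), program)]
        print(f"intensity {k}: residual noise RMS = {rms(residual):.4f}")
```

In this model the sound at the ear is the program plus (1 - k) times the collected noise for intensity k, so k = 1 cancels the noise entirely and k = 0 leaves it untouched.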

[0063] In addition, in the data processing system 10 according to this embodiment, a learning database 59 for the user 20 of the data processing system 10 is constructed in the storage 32 of the data processing device 12. In the learning database 59 according to this embodiment, the various types of learning content that each user 20 uses for learning are registered for that user, such as content related to history (hereinafter referred to as "history content"), content related to language listening (hereinafter referred to as "language content"), and content related to science (hereinafter referred to as "science content").

[0064] The learning content according to this embodiment includes learning content information that can be reproduced as audio. In this embodiment, audio information is used as the learning content information, but the embodiment is not limited to this form. For example, text information representing the material to be learned may be used as the learning content information.

[0065] As described above, in the data processing system 10 according to this embodiment, the learning content is registered in the data processing device 12, but the system is not limited to this configuration. For example, the learning content may be acquired by downloading it from an external device connected to the network 54.

[0066] Furthermore, in the learning database 59 according to this embodiment, the learning content also includes sound effect information related to the learning content indicated by the learning content information. The sound effects indicated by this information can be played during learning to improve learning efficiency.

[0067] For example, for learning with history content, sound effects that can improve learning efficiency are registered, such as music that evokes the historical context of the time, battle sounds, and the hustle and bustle of a city. Similarly, for learning with language content, sound effects such as ambient sounds and music from the region where the language is spoken are registered. For learning with science content, sound effects such as sounds of space and natural phenomena are registered.
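One possible way to organize the learning database 59 described in [0063] to [0067] is sketched below in Python. The class and field names, the category strings, and the use of file paths for the learning content information and the sound effect information are hypothetical; the source does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearningContent:
    """One item of learning content as it might be registered in the learning database 59."""
    content_id: str
    category: str       # e.g. "history", "language", "science"
    content_info: str   # learning content information (here: a hypothetical audio file path)
    sound_effects: list[str] = field(default_factory=list)  # sound effect information (hypothetical paths)


class LearningDatabase:
    """In-memory stand-in for the learning database 59, keyed by user."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[LearningContent]] = {}

    def register(self, user_id: str, content: LearningContent) -> None:
        """Register a content item for the given user."""
        self._by_user.setdefault(user_id, []).append(content)

    def contents_for(self, user_id: str, category: Optional[str] = None) -> list[LearningContent]:
        """Return the user's content, optionally filtered by category."""
        items = self._by_user.get(user_id, [])
        return [c for c in items if category is None or c.category == category]
```

For example, registering a history item with `["fx/battle.wav", "fx/city.wav"]` as its sound effects keeps the effects attached to the content they accompany, which matches [0066]; [0102] notes they could equally live in a separate database.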

[0068] The specific processing unit 290 according to this embodiment, like the specific processing unit 290 according to the first embodiment, includes an input unit 291, a processing unit 292, and an output unit 293, as shown in Figure 4 as an example.

[0069] The input unit 291 in this embodiment acquires user data. Specifically, it acquires, as user data, the learning content used for the learning performed by the user 20, which includes learning content information that can be played back as audio.

[0070] Furthermore, the processing unit 292 according to this embodiment performs specific processing using a data generation model 58 that generates predetermined inference results according to user data. Specifically, as the specific processing, the processing unit 292 generates vibration information by inputting to the data generation model 58, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit 41B in synchronization with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio.

[0071] For example, along with the learning content used by user 20, the data generation model 58 is given the prompt: "This is the learning content used by the user. From the content of this learning material, detect important parts such as keywords and sections, and generate vibration information that synchronizes with the timing of the audio playback of the detected parts." As a result, the data generation model 58 generates vibration information.
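The source does not specify the format of the vibration information. The sketch below assumes, hypothetically, that the learning content is available as a timestamped transcript and that the data generation model 58 is asked to answer with a JSON array of events (`start_s`, `duration_s`, `label`). It builds the prompt from [0071] with that extra format instruction and validates the model's answer; the model call itself is omitted because no specific model or API is named.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VibrationEvent:
    start_s: float     # playback time at which the important part begins
    duration_s: float  # how long the vibration generating unit should vibrate
    label: str         # keyword or section the model flagged as important


def build_prompt(segments: list[tuple[float, str]]) -> str:
    """Prompt text from [0071] plus a hypothetical output-format instruction."""
    transcript = "\n".join(f"[{t:.1f}s] {text}" for t, text in segments)
    return (
        "This is the learning content used by the user. From the content of this "
        "learning material, detect important parts such as keywords and sections, "
        "and generate vibration information that synchronizes with the timing of "
        "the audio playback of the detected parts. "
        "Answer only with a JSON array of objects with keys start_s, duration_s, label.\n\n"
        + transcript
    )


def parse_vibration_info(model_output: str, content_length_s: float) -> list[VibrationEvent]:
    """Validate the model's JSON answer and return events sorted by start time."""
    events: list[VibrationEvent] = []
    for item in json.loads(model_output):
        start, duration = float(item["start_s"]), float(item["duration_s"])
        if start < 0 or duration <= 0 or start >= content_length_s:
            continue  # discard events that fall outside the content
        duration = min(duration, content_length_s - start)  # clip at the end of the content
        events.append(VibrationEvent(start, duration, str(item.get("label", ""))))
    return sorted(events, key=lambda e: e.start_s)
```

Validating the answer rather than trusting it matters here because a generative model can return times outside the content or in arbitrary order.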

[0072] The output unit 293 in this embodiment then uses the results of the specific processing to play audio from the speakers 40 of the two earphones 14. Specifically, it plays the learning content indicated by the acquired learning content information from the speakers 40 of the earphones 14, and uses the generated vibration information to vibrate the vibration generating unit 41B.

[0073] In this embodiment, as vibration information, instruction information is generated that instructs the earphone 14 to vibrate the vibration generating unit 41B in synchronization with the timing of playing the important parts when the content of the learning is played back as audio through the speaker 40 of the earphone 14. The output unit 293 then transmits the generated instruction information to the earphone 14. When the processor 46 of the earphone 14 in this embodiment receives the instruction information, it vibrates the vibration generating unit 41B according to the instruction information. However, the embodiment is not limited to this form, and the data generation model 58 may generate information to vibrate the vibration generating unit 41B as the vibration information, and the data processing device 12 may directly vibrate the vibration generating unit 41B of the earphone 14.
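On the earphone side, the processor 46 has to turn the received instruction information into vibration at the right playback time. A minimal Python sketch, assuming the instruction information is a list of non-overlapping `(start_s, duration_s, intensity)` tuples (a hypothetical format), looks up the active event for the current playback position with a binary search:

```python
import bisect


class VibrationScheduler:
    """Decide how strongly to vibrate at a given playback time.

    Each instruction is a hypothetical (start_s, duration_s, intensity) tuple;
    instructions are assumed not to overlap.
    """

    def __init__(self, instructions: list[tuple[float, float, float]]) -> None:
        self._events = sorted(instructions)
        self._starts = [start for start, _, _ in self._events]

    def intensity_at(self, t: float) -> float:
        """Return the vibration intensity at playback time t (0.0 means no vibration)."""
        i = bisect.bisect_right(self._starts, t) - 1  # last event starting at or before t
        if i >= 0:
            start, duration, intensity = self._events[i]
            if t < start + duration:
                return intensity
        return 0.0
```

A hypothetical control loop on the processor 46 could call `intensity_at` with the current playback position every few milliseconds and drive the vibration generating unit 41B with the returned value.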

[0074] In this embodiment, the output unit 293 plays the learning content from the speaker 40 of one earphone 14 and plays a sound effect related to the learning content from the speaker 40 of the other earphone 14. This embodiment describes the case in which the learning content is played from the speaker 40 of the earphone 14 worn on the right ear of the user 20 and the sound effect is played from the speaker 40 of the earphone 14 worn on the left ear, but the embodiment is not limited to this form. The learning content may instead be played from the speaker 40 of the earphone 14 worn on the left ear of the user 20 and the sound effect from the speaker 40 of the earphone 14 worn on the right ear.
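The left/right split in [0074] can be illustrated by building stereo frames in which one channel carries the learning content and the other carries the sound effects. This Python sketch assumes both signals are mono lists of samples at the same sample rate and pads the shorter one with silence; the function name and the `content_ear` parameter are illustrative.

```python
from itertools import zip_longest


def route_to_earphones(content: list[float], effects: list[float],
                       content_ear: str = "right") -> list[tuple[float, float]]:
    """Return (left, right) sample frames: learning content on one ear, sound effects on the other.

    The shorter signal is padded with silence (0.0).
    """
    if content_ear not in ("left", "right"):
        raise ValueError("content_ear must be 'left' or 'right'")
    frames = []
    for c, e in zip_longest(content, effects, fillvalue=0.0):
        # Frames are (left, right); swap channels when the content goes to the left ear.
        frames.append((e, c) if content_ear == "right" else (c, e))
    return frames
```

The default matches the embodiment (content on the right ear); passing `content_ear="left"` gives the alternative arrangement the paragraph allows.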

[0075] Furthermore, the input unit 291 according to this embodiment acquires the aforementioned biometric data, motion data, and environmental data from the sensor group 41A as user data. Accordingly, the processing unit 292 according to this embodiment inputs the learning content, biometric data, motion data, and environmental data acquired by the input unit 291 into the data generation model 58 and executes a process to generate vibration information as a specific process.

[0076] More specifically, the processing unit 292 according to this embodiment inputs to the data generation model 58 a prompt that includes the learning content, biometric data, motion data, and environmental data acquired by the input unit 291 and instructs the model to generate vibration information, and then acquires the generation result, i.e., the vibration information.

[0077] For example, along with the learning content used by the user 20 for learning, biometric data, motion data, and environmental data, the following prompt is input to the data generation model 58: "This is the learning content that the user will learn, biometric data showing the user's heart rate and blood oxygen level, motion data showing the user's acceleration and angular acceleration, and environmental data showing the user's surrounding environment. From the content of this learning material, detect important parts for learning, such as keywords and sections, and generate vibration information that synchronizes with the timing of playing the detected parts aloud, so as to match the user's situation." As a result, the data generation model 58 generates vibration information corresponding to the situation of the user 20.

[0078] As described above, the processing unit 292 according to this embodiment uses biometric data, motion data, and environmental data related to the user 20, acquired by the sensor group 41A, in addition to the learning content, but is not limited to this configuration. For example, it may use only the learning content, or it may use the learning content and a combination of one or two of the biometric data, motion data, and environmental data.
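Paragraphs [0075] to [0078] allow the prompt to include any subset of the biometric, motion, and environmental data. The following Python sketch assembles a prompt in the spirit of [0077], adding only the sensor data that is present. Representing each data type as a JSON-serializable dictionary is an assumption, as are the key names used in the test values.

```python
import json
from typing import Optional


def build_situational_prompt(content_text: str,
                             biometric: Optional[dict] = None,
                             motion: Optional[dict] = None,
                             environmental: Optional[dict] = None) -> str:
    """Assemble a prompt like the one in [0077], including only the sensor data that is available."""
    lines = ["This is the learning content that the user will learn:", content_text]
    sensor_blocks = [
        ("Biometric data (heart rate, blood oxygen level)", biometric),
        ("Motion data (acceleration, angular acceleration)", motion),
        ("Environmental data (surrounding environment)", environmental),
    ]
    used_sensor_data = False
    for title, data in sensor_blocks:
        if data is not None:
            lines.append(f"{title}: {json.dumps(data, sort_keys=True)}")
            used_sensor_data = True
    instruction = ("From the content of this learning material, detect important parts for "
                   "learning, such as keywords and sections, and generate vibration information "
                   "that synchronizes with the timing of playing the detected parts aloud")
    # Only ask the model to adapt to the user's situation when situational data was supplied.
    lines.append(instruction + (", so as to match the user's situation." if used_sensor_data else "."))
    return "\n".join(lines)
```

Calling it with only `content_text` reproduces the content-only case of [0078]; supplying one or two of the dictionaries covers the partial combinations.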

[0079] Furthermore, the input unit 291 according to this embodiment also acquires, as user data, the sounds surrounding the user 20 picked up by the microphone 38. As the specific processing, the processing unit 292 according to this embodiment adjusts the intensity of the output of the output unit 293, specifically the playback volume of the audio of the learning content and of the sound effects, and the vibration intensity of the vibration generating unit 41B, according to the acquired sounds surrounding the user 20.

[0080] Furthermore, the processing unit 292 according to this embodiment performs a specific process to adjust the intensity of the noise cancellation function according to the ambient sound surrounding the user 20 acquired from the microphone 38. Specifically, the louder the ambient sound surrounding the user 20, the greater the amount of ambient sound reduction by the noise cancellation function. This makes it possible to further improve the learning efficiency of the user 20 using the learning content.
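The source states only that the output intensity and the noise-cancellation strength are adjusted according to the ambient sound, with stronger noise reduction for louder surroundings. The Python sketch below picks one simple policy for illustration: a linear ramp between a hypothetical quiet level (40 dB) and loud level (80 dB) that raises playback volume, vibration strength, and noise cancellation together. The thresholds, the offsets, and the assumption that volume and vibration also increase with ambient loudness are not from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSettings:
    volume: float        # playback gain for learning content and sound effects (0..1)
    vibration: float     # drive strength for the vibration generating unit 41B (0..1)
    noise_cancel: float  # amount of ambient-noise reduction (0 = off, 1 = maximum)


def _ramp(level_db: float, quiet_db: float, loud_db: float) -> float:
    """0.0 at or below quiet_db, 1.0 at or above loud_db, linear in between."""
    return min(1.0, max(0.0, (level_db - quiet_db) / (loud_db - quiet_db)))


def settings_for(ambient_db: float, quiet_db: float = 40.0, loud_db: float = 80.0) -> OutputSettings:
    """Map an ambient sound level to output settings (hypothetical policy)."""
    r = _ramp(ambient_db, quiet_db, loud_db)
    return OutputSettings(volume=0.4 + 0.6 * r,
                          vibration=0.5 + 0.5 * r,
                          noise_cancel=r)
```

Keeping a nonzero floor for volume and vibration means the learning audio and the emphasis cue stay perceptible even in quiet surroundings, while the noise-canceling amount can drop to zero.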

[0081] Next, the operation of the data processing system 10 according to this embodiment will be described.

[0082] An example of the flow of the specific processing will be explained with reference to Figure 7. The specific processing flow shown in Figure 7 is an example of a "data processing method" according to the technology disclosed herein. To avoid complication, the case described here is one in which the learning content to be studied by the user 20 (hereinafter simply referred to as the "user") is known in advance.

[0083] In step S400, the data processing device 12 obtains the learning content by reading it from the learning database 59.

[0084] In step S402, the data processing device 12 receives and acquires the aforementioned biometric data, motion data, and environmental data from the earphones 14 worn by the user.

[0085] In step S404, the data processing device 12 generates the above-mentioned prompt using the acquired learning content, biometric data, motion data, and environmental data.

[0086] In step S406, the data processing device 12 inputs the generated prompt to the data generation model 58, thereby causing the data generation model 58 to generate vibration information.

[0087] In step S408, the data processing device 12 starts playing, from the speaker 40 of the earphone 14 worn in the user's right ear, the audio of the learning content indicated by the learning content information included in the acquired learning content. The data processing device 12 also starts playing, from the speaker 40 of the earphone 14 worn in the user's left ear, the sound effects indicated by the sound effect information included in the acquired learning content. Furthermore, the data processing device 12 transmits the generated vibration information to both earphones 14 worn by the user, so that the earphones 14 start vibrating in parallel with the playback of the learning content and sound effects, in synchronization with the timing at which the important parts of the learning content are played.

[0088] In step S410, the data processing device 12 receives audio data indicating the sounds around the user from the earphones 14 worn by the user.

[0089] In step S412, the data processing device 12 determines whether the sounds around the user indicated by the acquired audio data require adjustment of the playback volume of the learning content and the sound effects. If the determination is negative, the processing proceeds to step S416; if it is positive, the processing proceeds to step S414.

[0090] In step S414, the data processing device 12 adjusts the playback volume of the audio of the learning content and of the sound effects, and also adjusts the intensity of the vibration of the vibration generating unit 41B, according to the loudness of the sounds surrounding the user as indicated by the acquired audio data.

[0091] In step S416, the data processing device 12 determines whether the sounds surrounding the user indicated by the acquired audio data require adjustment of the noise cancellation function's intensity. If the determination is negative, the device proceeds to step S420; if the determination is positive, the device proceeds to step S418.

[0092] In step S418, the data processing device 12 adjusts the intensity of the noise cancellation function according to the volume of ambient noise around the user indicated by the acquired audio data, and then proceeds to step S420.

[0093] The audio data acquired in step S410 consists of two audio data sets, one from the earphone 14 worn in the user's right ear and one from the earphone 14 worn in the user's left ear, and both are time-series data. Therefore, in this embodiment, the average of the peak volumes of the two audio data sets within a predetermined period (in this embodiment, the most recent 1 second) is used as the ambient sound around the user in steps S412 to S418. However, the configuration is not limited to this; for example, the peak volume of each of the two audio data sets within the predetermined period may be used individually.
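The ambient-sound measure described in [0093] (the mean, over the two earphones, of each earphone's peak volume within the most recent second) can be computed as follows. This Python sketch assumes each audio data set is a list of normalized amplitude samples at a known sample rate and treats the peak absolute amplitude as the "peak volume"; conversion to a loudness scale such as dB is left out.

```python
def recent_peak(samples: list[float], sample_rate_hz: int, window_s: float = 1.0) -> float:
    """Peak absolute amplitude within the most recent window_s seconds of samples."""
    n = int(sample_rate_hz * window_s)
    if n <= 0 or not samples:
        return 0.0
    return max(abs(s) for s in samples[-n:])  # slicing handles buffers shorter than the window


def ambient_level(left: list[float], right: list[float], sample_rate_hz: int,
                  window_s: float = 1.0) -> float:
    """Average of the two earphones' recent peaks, used as the ambient sound level."""
    return (recent_peak(left, sample_rate_hz, window_s)
            + recent_peak(right, sample_rate_hz, window_s)) / 2.0
```

The alternative mentioned at the end of [0093] corresponds to using the two `recent_peak` values separately instead of averaging them.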

[0094] In step S420, the data processing device 12 determines whether a condition for terminating this specific processing has been met (in this embodiment, the user 20 has given an instruction to terminate the specific processing, or all of the learning content has been played back). If the determination is negative, the processing returns to step S410; if it is positive, the data processing device 12 terminates this specific processing.
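The control flow of Figure 7 (steps S400 to S420) can be summarized in Python as below. The `EarphoneLearningSystem` interface and its method names are hypothetical stand-ins for the operations described in [0083] to [0094]; a small test double records the order in which the steps run.

```python
from typing import Any, Protocol


class EarphoneLearningSystem(Protocol):
    """Hypothetical interface; each method stands in for one step of Figure 7."""
    def read_learning_content(self) -> Any: ...
    def read_sensor_data(self) -> Any: ...
    def build_prompt(self, content: Any, sensors: Any) -> str: ...
    def generate_vibration_info(self, prompt: str) -> Any: ...
    def start_playback(self, content: Any, vibration: Any) -> None: ...
    def read_ambient_audio(self) -> float: ...
    def volume_adjust_needed(self, ambient: float) -> bool: ...
    def adjust_volume_and_vibration(self, ambient: float) -> None: ...
    def noise_cancel_adjust_needed(self, ambient: float) -> bool: ...
    def adjust_noise_cancel(self, ambient: float) -> None: ...
    def should_terminate(self) -> bool: ...


def run_specific_processing(system: EarphoneLearningSystem) -> None:
    content = system.read_learning_content()             # S400: read from learning database 59
    sensors = system.read_sensor_data()                  # S402: biometric, motion, environmental data
    prompt = system.build_prompt(content, sensors)       # S404
    vibration = system.generate_vibration_info(prompt)   # S406: data generation model 58
    system.start_playback(content, vibration)            # S408: content, sound effects, vibration
    while True:
        ambient = system.read_ambient_audio()            # S410
        if system.volume_adjust_needed(ambient):         # S412
            system.adjust_volume_and_vibration(ambient)  # S414
        if system.noise_cancel_adjust_needed(ambient):   # S416
            system.adjust_noise_cancel(ambient)          # S418
        if system.should_terminate():                    # S420
            break


class FakeSystem:
    """Test double that records which processing steps ran."""

    def __init__(self, ambient_levels: list[float]) -> None:
        self.levels = list(ambient_levels)
        self.log: list[str] = []

    def read_learning_content(self) -> str:
        self.log.append("S400")
        return "content"

    def read_sensor_data(self) -> dict:
        self.log.append("S402")
        return {}

    def build_prompt(self, content: Any, sensors: Any) -> str:
        self.log.append("S404")
        return "prompt"

    def generate_vibration_info(self, prompt: str) -> list:
        self.log.append("S406")
        return []

    def start_playback(self, content: Any, vibration: Any) -> None:
        self.log.append("S408")

    def read_ambient_audio(self) -> float:
        self.log.append("S410")
        return self.levels.pop(0)

    def volume_adjust_needed(self, ambient: float) -> bool:
        return ambient > 0.5

    def adjust_volume_and_vibration(self, ambient: float) -> None:
        self.log.append("S414")

    def noise_cancel_adjust_needed(self, ambient: float) -> bool:
        return ambient > 0.7

    def adjust_noise_cancel(self, ambient: float) -> None:
        self.log.append("S418")

    def should_terminate(self) -> bool:
        return not self.levels  # stop once the simulated ambient readings run out
```

Running `run_specific_processing(FakeSystem([0.2, 0.8]))` performs S400 to S408 once, then loops: the quiet reading triggers no adjustment, and the loud reading triggers both S414 and S418 before termination.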

[0095] Through the above specific processing, the user can have learning content and sound effects played through the earphones 14 they are wearing, and the earphones 14 vibrate when important parts of the learning content are played. As a result, the user can learn by combining the auditory and tactile senses, and can learn efficiently with learning content even when no visual information is available. In addition, the associated sound effects increase interest and engagement in learning, which helps maintain concentration. Furthermore, because important points are emphasized by the vibration of the earphones 14, the user can reliably recognize the important parts. For example, effective learning is possible even when the hands and eyes are occupied, such as while commuting to work or school.

[0096] The following scenarios are specific examples of the data processing system 10 according to this embodiment.
• Scenario 1: Studying history
Situation: A user is listening to history lectures during their commute.
Right ear: Plays audio of a lecture about historical events.
Left ear: Plays music and sound effects that evoke the historical context of the time (sounds of battle, city noise, etc.).
Vibration: The earphones vibrate when important dates or events are played.

[0097] • Scenario 2: Learning a language
Situation: A user is practicing language listening while taking a walk.
Right ear: Plays audio of a conversation in a foreign language.
Left ear: Plays ambient sounds and music from the region where the language is spoken.
Vibration: The earphones vibrate when new words or important phrases are played.

[0098] • Scenario 3: Learning science
Situation: A user is listening to a science lecture while jogging.
Right ear: Plays audio explanations of scientific theories and experiments.
Left ear: Plays sounds of space and natural phenomena.
Vibration: The earphones vibrate when key concepts and formulas are explained.

[0099] Although not mentioned in this embodiment, the playback of sound effects may be adjusted so as not to interfere with the audio that indicates the learning content.
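One common way to keep sound effects from interfering with speech is ducking, that is, lowering the effect level while the primary audio is active. The source does not say how the adjustment in [0099] would be made, so the following Python sketch only illustrates that technique. The threshold and attenuation values are arbitrary, and the sample-by-sample gate is deliberately crude; a practical implementation would smooth the gain with an envelope follower to avoid audible artifacts.

```python
def duck_effects(content: list[float], effects: list[float],
                 threshold: float = 0.05, duck_gain: float = 0.3) -> list[float]:
    """Attenuate sound-effect samples wherever the learning-content audio is active.

    A content sample counts as "active" when its absolute amplitude exceeds threshold
    (a hypothetical value); the matching effect sample is then scaled by duck_gain.
    """
    return [e * (duck_gain if abs(c) > threshold else 1.0)
            for c, e in zip(content, effects)]
```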

[0100] Alternatively, the learning content and the output settings of the output unit 293 may be adjusted according to the importance of the learning content, settings made by the user 20, and the user's operations and responses.

[0101] Alternatively, the data generation model 58 may be used to provide additional explanations and answer questions in real time according to the user's level of understanding, and the difficulty level of the learning content and the vibration pattern of the earphones 14 may be dynamically adjusted based on the user's learning history, level of understanding, and progress.

[0102] Furthermore, although this embodiment describes a case in which the learning content information and sound effect information are registered in a single learning database 59, the embodiment is not limited to this form; the learning content information and the sound effect information may be registered in different databases. In this case, the database for registering learning content information and the database for registering sound effect information may be built on different devices.

[0103] Alternatively, instead of preparing sound effect information in advance, the system may use the data generation model 58 to search for and apply sound effect information related to the learning content indicated by the learning content information.

[0104] In this case, for example, along with the learning content used by user 20, the data generation model 58 is input with the following prompt: "This is the learning content used by the user. From the content of this learning material, detect important parts such as keywords and sections, and generate vibration information that will vibrate the earphones in sync with the timing of the audio playback of the detected parts. Also, search for sound effect information that can play sound effects that will allow learning with this learning content to be done efficiently, and generate the results."

[0105] Furthermore, when biometric data, motion data, and environmental data are also used in this case, the following prompt, for example, is input to the data generation model 58 along with the learning content used by the user 20 for learning, the biometric data, the motion data, and the environmental data: "This is the learning content that the user will learn, along with biometric data showing the user's heart rate and blood oxygen level, motion data showing the user's acceleration and angular acceleration, and environmental data showing the user's surrounding environment. From the content of this learning material, detect important parts for learning, such as keywords and sections, and generate vibration information that vibrates the earphones in sync with the timing of playing the detected parts as audio. Also, considering this biometric data, motion data, and environmental data, search for sound effect information that has the effect of enabling the user to learn the learning content in a way that is optimal for the situation they are in, and generate the results."

[0106] Furthermore, the system may monitor heart rate, blood oxygen levels, etc., obtained by a biosensor installed in the earphone 14, estimate the user's level of concentration and fatigue, and prompt them to take a break at an appropriate time.
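Paragraph [0106] does not describe how concentration or fatigue is estimated. Purely as an illustration of the monitoring loop, the following Python sketch suggests a break when the recent mean heart rate rises well above a user baseline or the recent mean blood oxygen level falls below a floor. The baseline, ratio, floor, and window size are placeholder values, not validated physiological thresholds.

```python
from collections import deque
from statistics import mean


class BreakAdvisor:
    """Hypothetical heuristic for suggesting a break from heart rate and SpO2 readings."""

    def __init__(self, baseline_hr: float, window: int = 5,
                 hr_rise_ratio: float = 1.2, spo2_floor: float = 94.0) -> None:
        self.baseline_hr = baseline_hr      # user's resting heart rate (bpm), assumed known
        self.hr_rise_ratio = hr_rise_ratio  # placeholder: break if mean HR exceeds baseline * ratio
        self.spo2_floor = spo2_floor        # placeholder: break if mean SpO2 (%) drops below this
        self._hr: deque[float] = deque(maxlen=window)
        self._spo2: deque[float] = deque(maxlen=window)

    def update(self, heart_rate_bpm: float, spo2_percent: float) -> bool:
        """Add one reading from the earphone biosensor; return True when a break should be suggested."""
        self._hr.append(heart_rate_bpm)
        self._spo2.append(spo2_percent)
        if len(self._hr) < self._hr.maxlen:
            return False  # not enough readings yet to judge a trend
        return (mean(self._hr) > self.baseline_hr * self.hr_rise_ratio
                or mean(self._spo2) < self.spo2_floor)
```

Averaging over a short window keeps a single noisy reading from triggering a break prompt.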

[0107] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0108] In the embodiments described above, the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto; the specific processing may be distributed across multiple computers, including the computer 22.

[0109] In the embodiments described above, the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0110] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0111] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.

[0112] The following types of processors can be used as hardware resources that perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples are dedicated electric circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed specifically to perform the specific processing. All of these processors have built-in or connected memory and perform the specific processing by using that memory.

[0113] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0114] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0115] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0116] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that are not needed to implement those aspects have been omitted from the descriptions and illustrations presented above.

[0117] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0118] Furthermore, the following additional information is disclosed regarding the above explanation.

[0119] <Note 1>
A data processing device comprising:
an input unit that acquires user data;
a processing unit that performs specific processing using a data generation model that generates predetermined inference results according to the user data; and
an output unit that uses the results of the specific processing to play sound from the speakers of two earphones that are worn one on each of the user's left and right ears, at least one of which has a vibration generating unit,
wherein the input unit acquires, as the user data, learning content that is based on the learning performed by the user and includes learning content information that can be reproduced as audio,
the processing unit performs, as the specific processing, a process of generating vibration information by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the output unit reproduces the learning content indicated by the learning content information from the speaker of the earphone and vibrates the vibration generating unit using the vibration information.
<Note 2>
The data processing device according to Note 1, wherein
the learning content further includes sound effect information capable of playing sound effects related to the learning content, and
the output unit plays the learning content indicated by the learning content information from the speaker of one of the earphones, and plays the sound effect indicated by the sound effect information from the speaker of the other earphone.
<Note 3>
The data processing device according to Note 1 or Note 2, wherein
at least one of the two earphones is equipped with a sensor that detects at least one of biometric data of the user, motion data of the user, and environmental data of the user's surroundings, and
the input unit further acquires, as the user data, at least one of the biometric data, the motion data, and the environmental data from the sensor.
<Note 4>
The data processing device according to any one of Notes 1 to 3, wherein
at least one of the two earphones is equipped with a microphone,
the input unit further acquires, as the user data, the sound surrounding the user from the microphone, and
the processing unit further performs, as the specific processing, a process of adjusting the intensity of the output of the output unit according to the ambient sound.
<Note 5>
The data processing device according to Note 4, wherein
at least one of the two earphones further includes a noise-canceling function, and
the processing unit further performs, as the specific processing, a process of adjusting the intensity of the noise-canceling function according to the ambient sound.
<Note 6>
A data processing method in which a computer performs processing comprising:
acquiring user data;
performing specific processing using a data generation model that generates predetermined inference results according to the user data; and
using the results of the specific processing to play sound from the speakers of two earphones that are worn one on each of the user's left and right ears, at least one of which is equipped with a vibration generating unit,
wherein learning content that includes learning content information that can be played back as audio is acquired as the user data of the user,
as the specific processing, a process of generating vibration information is performed by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the learning content indicated by the learning content information is reproduced from the speaker of the earphone, and the vibration generating unit is vibrated using the vibration information.
<Note 7>
A data processing program for causing a computer to execute processing comprising:
acquiring user data;
performing specific processing using a data generation model that generates predetermined inference results according to the user data; and
using the results of the specific processing to reproduce sound from the speakers of two earphones, each equipped with a vibration generating unit and worn one on each of the user's left and right ears,
wherein learning content that includes learning content information that can be played back as audio is acquired as the user data of the user,
as the specific processing, a process of generating vibration information is performed by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the learning content indicated by the learning content information is reproduced from the speaker of the earphone, and the vibration generating unit is vibrated using the vibration information.

[Explanation of Symbols]

[0120]
10 Data processing system
12 Data processing device
14 Earphone
41A Sensor group
41B Vibration generating unit
59 Learning database
290 Specific processing unit
291 Input unit
292 Processing unit
293 Output unit

Claims

1. A data processing device comprising:
an input unit that acquires user data;
a processing unit that performs specific processing using a data generation model that generates predetermined inference results according to the user data; and
an output unit that uses the results of the specific processing to play sound from the speakers of two earphones that are worn one on each of the user's left and right ears, at least one of which has a vibration generating unit,
wherein the input unit acquires, as the user data, learning content that is based on the learning performed by the user and includes learning content information that can be reproduced as audio,
the processing unit performs, as the specific processing, a process of generating vibration information by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the output unit reproduces the learning content indicated by the learning content information from the speaker of the earphone and vibrates the vibration generating unit using the vibration information.

2. The data processing device according to claim 1, wherein
the learning content further includes sound effect information capable of playing sound effects related to the learning content, and
the output unit plays the learning content indicated by the learning content information from the speaker of one of the earphones, and plays the sound effect indicated by the sound effect information from the speaker of the other earphone.

3. The data processing device according to claim 1 or 2, wherein at least one of the two earphones is equipped with a sensor that detects at least one of biometric data of the user, motion data of the user, and environmental data of the user's surroundings, and the input unit further acquires, as the user data, at least one of the biometric data, the motion data, and the environmental data from the sensor.
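Claim 3 adds sensor readings to the user data. The sketch below assumes the sensor reports each category as a small dictionary and shows how missing categories can be tolerated, since the claim requires only at least one of them. `UserData`, `collect_user_data`, `sensor_context` and the example key `heart_rate_bpm` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UserData:
    """User data gathered for the data generation model (hypothetical structure)."""
    learning_content: str
    biometric: Optional[dict[str, Any]] = None    # e.g. heart rate
    motion: Optional[dict[str, Any]] = None       # e.g. head movement
    environment: Optional[dict[str, Any]] = None  # e.g. ambient temperature


def collect_user_data(learning_content: str,
                      sensor_readings: dict[str, dict[str, Any]]) -> UserData:
    """Merge whichever categories the earphone sensor reported; absent ones stay None."""
    return UserData(
        learning_content=learning_content,
        biometric=sensor_readings.get("biometric"),
        motion=sensor_readings.get("motion"),
        environment=sensor_readings.get("environment"),
    )


def sensor_context(data: UserData) -> str:
    """Summarise the available sensor data as text that could be appended to a prompt."""
    parts = [f"{name}: {getattr(data, name)}"
             for name in ("biometric", "motion", "environment")
             if getattr(data, name) is not None]
    return "; ".join(parts) or "no sensor data"


data = collect_user_data("lesson-01", {"biometric": {"heart_rate_bpm": 72}})
print(sensor_context(data))  # biometric: {'heart_rate_bpm': 72}
```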

4. The data processing device according to claim 1 or 2, wherein at least one of the two earphones is equipped with a microphone, the input unit further acquires, as the user data, ambient sound around the user picked up by the microphone, and the processing unit further performs, as the specific processing, a process of adjusting the intensity of the output produced by the output unit in accordance with the ambient sound.
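Claim 4 does not specify how ambient sound maps to output intensity. One simple possibility, sketched below under assumed thresholds, is to measure the microphone's RMS level in dBFS and raise a gain linearly between a quiet and a loud level; the same gain could then scale both speaker volume and vibration intensity. The function names and all numeric thresholds are illustrative assumptions.

```python
import math


def ambient_dbfs(samples: list[float]) -> float:
    """RMS level of microphone samples (floats in [-1, 1]) in dB relative to full scale."""
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else -math.inf


def output_gain(level_db: float, quiet_db: float = -60.0, loud_db: float = -20.0,
                min_gain: float = 0.3, max_gain: float = 1.0) -> float:
    """Raise output intensity linearly as the surroundings get louder.

    Below quiet_db the gain stays at min_gain; above loud_db it saturates at max_gain.
    """
    t = (level_db - quiet_db) / (loud_db - quiet_db)
    t = min(max(t, 0.0), 1.0)  # clamp; also maps -inf (silence) to 0
    return min_gain + t * (max_gain - min_gain)


mic = [0.01, -0.01] * 160   # a quiet room, about -40 dBFS
gain = output_gain(ambient_dbfs(mic))
volume = 0.5 * gain         # the same gain could scale speaker volume...
vibration = 0.8 * gain      # ...and vibration intensity
print(round(gain, 2))
```

A floor of `min_gain` keeps content audible and vibration perceptible even in a silent room; a real device would likely also smooth the measured level over time.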

5. The data processing device according to claim 4, wherein at least one of the two earphones further has a noise-canceling function, and the processing unit further performs, as the specific processing, a process of adjusting the strength of the noise-canceling function in accordance with the ambient sound.
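For claim 5, the noise-canceling strength can likewise follow the ambient level. This sketch assumes a linear ramp between two hypothetical thresholds (in dBFS) plus exponential smoothing; the smoothing is a swapped-in design choice, not something the claim specifies. `NoiseCancelController` and its parameters are hypothetical, and the level input could come from an RMS measurement of the microphone signal.

```python
class NoiseCancelController:
    """Noise-canceling strength driven by the ambient sound level (a sketch).

    The target strength ramps linearly from 0 at threshold_db to 1 at full_db;
    each update moves only part of the way toward that target, so a brief noise
    does not make the strength jump abruptly.
    """

    def __init__(self, threshold_db: float = -50.0, full_db: float = -25.0,
                 smoothing: float = 0.2) -> None:
        self.threshold_db = threshold_db
        self.full_db = full_db
        self.smoothing = smoothing  # fraction of the remaining gap closed per update
        self.strength = 0.0         # current noise-canceling strength in [0, 1]

    def target(self, level_db: float) -> float:
        t = (level_db - self.threshold_db) / (self.full_db - self.threshold_db)
        return min(max(t, 0.0), 1.0)

    def update(self, level_db: float) -> float:
        self.strength += self.smoothing * (self.target(level_db) - self.strength)
        return self.strength


anc = NoiseCancelController()
for level in (-60.0, -20.0, -20.0, -20.0):  # quiet, then a noisy train carriage
    print(round(anc.update(level), 3))
```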

6. A data processing method in which a computer executes processing comprising:
acquiring user data;
performing specific processing using a data generation model that generates a predetermined inference result in accordance with the user data; and
using the result of the specific processing, playing sound from the speakers of two earphones worn one on each of the user's left and right ears, at least one of which is equipped with a vibration generating unit,
wherein learning content that is based on the learning performed by the user and that includes learning content information playable as audio is acquired as the user data,
a process of generating vibration information is performed as the specific processing by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the learning content indicated by the learning content information is played back from the speakers of the earphones, and the vibration generating unit is vibrated using the vibration information.
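At playback time, the vibration information has to be applied in step with the audio. A minimal sketch, assuming the vibration information is a list of non-overlapping (start, duration, intensity) events, uses binary search (`bisect`) to find the event active at the current playback position. `VibrationEvent`, `VibrationScheduler` and `playback_ticks` are hypothetical, and the simulated clock stands in for a real audio player's position.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class VibrationEvent:
    """Hypothetical vibration-information entry: when, how long, how strong."""
    start_s: float
    duration_s: float
    intensity: float


class VibrationScheduler:
    """Looks up the vibration intensity for the current playback position.

    Assumes events do not overlap, so only the latest event that has
    already started can still be active.
    """

    def __init__(self, events: list[VibrationEvent]) -> None:
        self.events = sorted(events, key=lambda e: e.start_s)
        self.starts = [e.start_s for e in self.events]

    def intensity_at(self, t: float) -> float:
        i = bisect.bisect_right(self.starts, t) - 1  # last event starting at or before t
        if i >= 0:
            event = self.events[i]
            if t < event.start_s + event.duration_s:
                return event.intensity
        return 0.0


def playback_ticks(scheduler: VibrationScheduler, total_s: float, tick_ms: int = 100):
    """Simulated playback clock: yields (position, intensity) once per tick.

    On a device, each intensity would be sent to the vibration generating unit
    while the audio at the same position plays from the speaker.
    """
    for k in range(round(total_s * 1000 / tick_ms)):
        t = k * tick_ms / 1000  # integer step count avoids accumulating float error
        yield t, scheduler.intensity_at(t)


scheduler = VibrationScheduler([VibrationEvent(3.0, 0.3, 0.6), VibrationEvent(1.0, 0.5, 0.8)])
active = [round(t, 1) for t, level in playback_ticks(scheduler, 4.0) if level > 0]
print(active)
```

Each lookup costs O(log n) in the number of events, so the scheduler can be queried on every audio buffer without scanning the whole list.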

7. A data processing program for causing a computer to execute processing comprising:
acquiring user data;
performing specific processing using a data generation model that generates a predetermined inference result in accordance with the user data; and
using the result of the specific processing, playing sound from the speakers of two earphones worn one on each of the user's left and right ears, at least one of which is equipped with a vibration generating unit,
wherein learning content that is based on the learning performed by the user and that includes learning content information playable as audio is acquired as the user data,
a process of generating vibration information is performed as the specific processing by inputting to the data generation model, based on the user data, a prompt instructing it to generate vibration information that vibrates the vibration generating unit in sync with the timing at which the important learning parts of the learning content indicated by the learning content information are played back as audio, and
the learning content indicated by the learning content information is played back from the speakers of the earphones, and the vibration generating unit is vibrated using the vibration information.