Data processing device, data processing system, data processing method, and program

The data processing system uses earphones to capture surroundings, filter irrelevant visual information, and generate audio messages based on user profiles, addressing visual overload and enhancing concentration.

JP2026096809A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-03
Publication Date
2026-06-15

AI Technical Summary

Technical Problem

In modern urban environments, visual information such as digital signage and advertisements distracts users, leading to decreased concentration.

Method used

A data processing system that includes earphones with cameras to capture surroundings, extracts relevant visual information based on user profiles, and generates audio messages to provide useful information while filtering out irrelevant visual content.

Benefits of technology

Enhances user concentration by providing relevant information and reducing visual overload, with the system capable of real-time processing and notification in noisy environments.

✦ Generated by Eureka AI based on patent content.


Abstract

[Problem] To solve the problem of visual information overload. [Solution] The system includes an acquisition unit that acquires video data including images of the user's surroundings, an extraction unit that extracts visual information directed at an unspecified number of people from the video data, a database that stores user profile data indicating the user's characteristics or attributes, a selection unit that selects the visual information based on the user profile data, and a message generation unit that generates a message containing the content of the selected visual information using a data generation model.

Description

【Technical Field】 【0001】 The technology of the present disclosure relates to a data processing device, a data processing system, a data processing method, and a program. 【Background Art】 【0002】 Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance. 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2022-180282 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 In modern urban environments, visual information such as digital signage, advertisements, and neon signs is overflowing, distracting users' attention and contributing to a decline in concentration. 【Means for Solving the Problems】 【0005】 The data processing device relating to the disclosed technology includes: an acquisition unit that acquires video data including images of the user's surroundings; an extraction unit that extracts visual information directed at an unspecified number of people from the video data; a database storing user profile data indicating the user's characteristics or attributes; a selection unit that selects the visual information based on the user profile data; and a message generation unit that generates a message containing the content of the selected visual information using a data generation model. 【0006】 The extraction unit may extract the visual information from the video of the information transmission medium included in the video data. 【0007】 The user profile data may include at least one of the user's hobbies, interests, behavioral history, web browsing history, application usage history, schedule, age, gender, and affiliation. 
【0008】 The data processing system relating to the disclosed technology includes the above-described data processing device and an earphone worn by the user. The earphone includes a camera that photographs the user's surroundings and generates the video data, and a speaker that outputs the message. The data processing device includes a communication unit that transmits audio data, including the content of the message, to the earphone. 【0009】 The earphone may include a microphone and a vibration motor. The communication unit may transmit a control signal to the earphone, along with the audio data, to vibrate the vibration motor if the volume of ambient sound input to the microphone exceeds a threshold. 【0010】 The earphone may include a sensor for acquiring biometric information. The selection unit may select the visual information based on the user's emotions estimated based on the biometric information. 【0011】 The data processing method related to the disclosed technology involves a computer acquiring video data including images of the user's surroundings, extracting visual information directed at an unspecified number of people from the video data, selecting the visual information based on user profile data indicating the user's characteristics or attributes, and using a data generation model to generate a message containing the content of the selected visual information. 【0012】 The program relating to the disclosed technology is a program that causes a computer to perform the following processes: acquire video data including images of the user's surroundings, extract visual information directed at an unspecified number of people from the video data, select the visual information based on user profile data indicating the user's characteristics or attributes, and generate a message containing the content of the selected visual information using a data generation model. 
【Brief Description of the Drawings】 【0013】 [Figure 1] Figure 1 is a conceptual diagram showing an example of the configuration of a data processing system. [Figure 2] Figure 2 is a conceptual diagram showing an example of the main functions of a data processing device and earphones. [Figure 3A] Figure 3A shows an example of an earphone configuration. [Figure 3B] Figure 3B shows the user wearing earphones. [Figure 3C] Figure 3C is a diagram illustrating the field of view of camera 42. [Figure 3D] Figure 3D shows the user wearing the earphones. [Figure 3E] Figure 3E shows the user wearing the earphones. [Figure 3F] Figure 3F shows the user wearing earphones. [Figure 4] Figure 4 schematically shows the functional configuration of a specific processing unit of the data processing device. [Figure 5] Figure 5 schematically shows an example of the operation flow of a specific process performed by a data processing device. [Figure 6] Figure 6 is a diagram showing the flow of various data transmitted and received between the earphone and the data processing device. [Figure 7] Figure 7 is a diagram schematically showing the functional configuration of a specific processing unit of the data processing device. [Figure 8] Figure 8 is a diagram schematically showing an example of the operation flow of specific processing by the data processing device. [Figure 9] Figure 9 is a diagram showing an example of the configuration of the earphone. 【Embodiments for Carrying out the Invention】 【0014】 Hereinafter, an example of an embodiment of a data processing device, a data processing method, and a program according to the technology of the present disclosure will be described with reference to the accompanying drawings. 【0015】 First, the terms used in the following description will be described. 【0016】 In the following embodiments, a processor with a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. 
Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit). 【0017】 In the following embodiments, a RAM (Random Access Memory) with a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor. 【0018】 In the following embodiments, a storage with a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes. 【0019】 In the following embodiments, a communication interface (I/F) with a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark). 【0020】 In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or." 【0021】 [First Embodiment] Figure 1 shows an example of the configuration of the data processing system 10 according to the embodiment. 【0022】 As shown in Figure 1, the data processing system 10 includes a data processing device 12 and earphones 14. An example of the data processing device 12 is a server. 
In this embodiment, the data processing device 12 is an example of a "data processing device" according to the technology of this disclosure. 【0023】 The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network). 【0024】 The earphone 14 includes a computer 36, a microphone 38, a speaker 40, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 38, speaker 40, and camera 42 are also connected to the bus 52. 【0025】 The microphone 38 receives voice signals from the user 20 and accepts instructions from the user 20. The microphone 38 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 40 outputs audio according to the instructions from the processor 46. Hereafter, the microphone 38 may be simply referred to as the mic 38. 【0026】 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision). 【0027】 Communication interface 44 is connected to network 54. 
Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54. 【0028】 Figure 2 shows an example of the main functions of the data processing device 12 and the earphone 14. 【0029】 As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "data processing program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30. 【0030】 The storage 32 stores the data generation model 58. The data generation model 58 is used by the specific processing unit 290. 【0031】 (Earphones 14) In the earphone 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48. 【0032】 The earphone 14 may be interpreted as a canal-type earphone that is fitted into the ear canal of the user 20, as shown in Figure 3A. However, the earphone 14 is not limited to a canal type; it may also be an inner-ear type earphone that sits in the outer ear of the user 20, or a headphone type earphone that covers the entire ear of the user 20. Each of the two earphones 14 is equipped with a microphone 38, a speaker 40, and a camera 42. 
The sound and images collected by the two earphones 14 fitted into the ears of the user 20 may be recorded as a life log in the database 24. 【0033】 The life log can be interpreted as a history of the user 20's actions in daily life, and may include sounds and images associated with the user 20, specifically sounds collected by the microphone 38 and images taken by the camera 42 during daily life. The life log may record sounds and images associated with the user 20, along with the date, time, and location at which they were acquired. 【0034】 The sounds collected by the microphone 38 may include the voice of the person the user 20 is talking to, and sounds that occur around the user 20 while walking or cycling (such as the sound of cars driving, birds chirping, the babbling of a stream, and the sound of trees swaying in the wind). 【0035】 As shown in Figure 3C, the camera 42 may capture images of the scenery within its field of view that is in front of the user 20, or it may capture images of scenery within its field of view that is not in front of the user 20, for example, to the side, behind, below, or above the user 20. The images captured by the camera 42 may include images of the person the user 20 is talking to, the scenery around the user 20 when they are walking or cycling, and images of the pet the user 20 is walking with. 【0036】 Since each of the two earphones 14 is equipped with a camera 42, the two earphones 14 worn on the ears of the user 20 are positioned at a specific distance apart, one on the left ear and the other on the right ear, as shown in Figure 3B. Therefore, compared to cases where two cameras are arranged side by side in a single housing, such as in a video camera, the spacing between the two cameras 42 can be increased, making 3D sensing easier. 3D sensing can be interpreted as measuring three-dimensional shapes. 
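The statement in 【0036】 that a wider camera spacing makes 3D sensing easier can be illustrated with the standard pinhole stereo relation Z = f·B/d, which the disclosure itself does not state. The following is a minimal sketch under assumed conditions: a rectified camera pair, and hypothetical values for the focal length, the two baselines, and the disparity error. None of these numbers or function names come from the source.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified pinhole stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_err_px: float = 1.0) -> float:
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Hypothetical values, chosen only for illustration.
FOCAL_PX = 800.0          # assumed focal length in pixels
SINGLE_HOUSING_B = 0.03   # assumed 3 cm spacing inside one housing
EAR_TO_EAR_B = 0.15       # assumed 15 cm spacing between the two earphones
TARGET_DEPTH_M = 5.0      # e.g. a sign 5 m away

narrow_err = depth_error(FOCAL_PX, SINGLE_HOUSING_B, TARGET_DEPTH_M)
wide_err = depth_error(FOCAL_PX, EAR_TO_EAR_B, TARGET_DEPTH_M)
print(f"depth error at 5 m: narrow={narrow_err:.3f} m, wide={wide_err:.3f} m")
```

Under these assumptions the depth uncertainty for a fixed disparity error scales inversely with the baseline, so a fivefold wider spacing gives a fivefold smaller error at the same distance.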
【0037】 Furthermore, when the two earphones 14 are placed in the user 20's ears, the two cameras 42 are positioned close to the user 20's left and right eyes, allowing images (captured images) that are nearly identical to those seen with the naked eye to be recorded as a life log in the database 24. Consequently, in specific processing, it becomes easier to reproduce information corresponding to inquiries from the user 20, that is, information corresponding to the content of the user 20's speech. 【0038】 While the two earphones 14 are attached to the user 20, all or part of the images captured by the camera 42 may be recorded in the database 24 as a life log. Specifically, when the two earphones 14 are attached to the user 20, the recording of images captured by the camera 42 to the database 24 may begin, and when the two earphones 14 are removed from the user 20, the recording of those images to the database 24 may end. 【0039】 While the two earphones 14 are worn by the user 20, all or part of the sound collected by the microphone 38 may be recorded as a lifelog in the database 24. Specifically, when the two earphones 14 are worn by the user 20, the recording of the sound collected by the microphone 38 to the database 24 may begin, and when the two earphones 14 are removed from the user 20, the recording of the sound to the database 24 may end. 【0040】 Next, we will describe the processing of the specific processing unit 290 when the data processing device 12 receives an utterance from the user 20 wearing the earphones 14 regarding the user 20's memories or actions, and performs specific processing to propose information corresponding to the content of the user 20's utterance to the user 20. 【0041】 (Specific processing) In this embodiment, the specific processing involves inputting user data and performing specific processing using a data generation model that generates predetermined inference results corresponding to the input user data. 
Specifically, in the specific processing, when utterances related to the user's memories or actions are received as user data from a user 20 wearing earphones 14, the system refers to the database 24 and performs processing to propose information corresponding to the content of the utterances to the user 20. More specifically, after a life log is recorded in the database 24, if the user 20 wearing earphones 14 makes an utterance related to the user's memories or actions, the specific processing may involve referring to the database 24 and proposing information corresponding to the content of the utterances to the user 20. 【0042】 (First example of specific processing) If the user 20 wearing the earphones 14 requests a message that will trigger the recall of a specific memory, the specific processing unit 290 may propose one or more messages selected based on the life log to the user 20 who made the request, as information corresponding to the content of the utterance (request). 【0043】 For example, if user 20, wearing earphones 14, tries to recall their memory and asks, "What did I say to person A around [date] at [time]?", the specific processing unit 290, as part of the specific processing, inputs this message as a prompt to the data generation model 58. The specific processing unit 290 may refer to the life log in database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "I think you said, 'I found a nice restaurant, let's make a reservation.'" This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0044】 For example, if user 20 wearing earphones 14 tries to recall their memory and asks, "Who was I talking to around [date] at [time]?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. 
The specific processing unit 290 may refer to the life log in database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "It seems you were talking with two friends at that time, probably B and C." This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0045】 For example, if user 20, wearing earphones 14, tries to recall their emotions and says, "How did I feel when I was talking to person A around [date] at [time]?", the specific processing unit 290, as part of the specific processing, inputs this message as a prompt to the data generation model 58. The specific processing unit 290 may refer to the life log in database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "At that time, you were laughing a lot, so it seems you had a good impression of your friend and were very happy." This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0046】 (Second example of specific processing) If a user 20 wearing earphones 14 mutters about a specific matter, the specific processing unit 290 may, based on the life log, suggest to the user 20 who made the utterance recommended actions regarding that matter, as information corresponding to the content of the utterance (muttering). 【0047】 For example, when user 20 wearing earphones 14 is shopping at a specific retail store and says, "What should I buy?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. 
The specific processing unit 290 may refer to the life log in the database 24 and, based on the output obtained from the data generation model 58, generate a message such as, "A few months ago, you purchased product A at this store and commented that it wasn't very tasty, so how about purchasing recently released products B and C this time?" This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0048】 (Third example of specific processing) As shown in Figure 3D, when user 20, wearing earphones 14, is operating a PC and says, "What was the name of product A that I searched for the day before yesterday?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in the database 24 and analyzes the video of the PC screen recorded when user 20 operated the PC in the past to generate a specific output. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as "Product A is ○○○". This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0049】 (Fourth example of specific processing) As shown in Figure 3E, if user 20, wearing earphones 14, says "There was a place nearby with a great view, but I wonder where it is?" while cycling, the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in database 24 and analyzes places previously visited by user 20 and the routes to those places to generate a specific output. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as "I think it's Cape XX, about 500m from here." 
This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0050】 (Fifth example of specific processing) As shown in Figure 3F, when user 20, wearing earphones 14, meets Mr. X at company A, the company being visited, and says, "Can you tell me this person's name?", the specific processing unit 290 inputs this message as a prompt to the data generation model 58 as part of the specific processing. The data generation model 58 refers to the life log in database 24 and generates a specific output from the history of people that user 20 met on previous visits to company A. Based on the output obtained from the data generation model 58, the specific processing unit 290 may generate a message such as, "I think his name is ○○." This message may be interpreted as an example of information corresponding to the content of user 20's utterance. 【0051】 As shown in Figure 4, the specific processing unit 290 includes an input unit 291, a processing unit 292, and an output unit 293. 【0052】 The input unit 291 acquires user input received through the earphone 14. Specifically, it acquires the user's voice received through the earphone 14. 【0053】 The processing unit 292 performs specific processing using the data generation model 58. Specifically, it inputs voice from the user into the data generation model 58 and obtains a generation result. More specifically, when it receives an utterance from the user 20 wearing the earphones 14 regarding the user 20's memories or actions, it performs a specific processing step of proposing information corresponding to the content of the utterance to the user 20. 【0054】 The output unit 293 transmits the result of the specific processing to the earphone 14. In the earphone 14, the control unit 46A causes the speaker 40 to output the result of the specific processing. The microphone 38 acquires audio indicating user input for the result of the specific processing. 
The control unit 46A transmits the audio data indicating user input acquired by the microphone 38 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data. 【0055】 Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. 【0056】 Next, the operation of the data processing system 10 will be explained. 【0057】 An example of the flow of the specific processing will be explained with reference to Figure 5. Note that the flow of the specific processing shown in Figure 5 is an example of a "data processing method" related to the technology disclosed herein. 【0058】 In step S300, the data processing device 12 receives user data, including sound and images collected by the two earphones 14. 【0059】 In step S302, if the data processing device 12 receives an utterance from the user 20 wearing the earphones 14 regarding the user's memories or actions, it executes the specific processing to propose information corresponding to the content of the utterance to the user 20 based on the user's life log. 
【0060】 In step S303, the data processing device 12 executes a process to play back the result of the specific processing from the speaker 40. 【0061】 [Second Embodiment] In modern urban environments, visual information such as digital signage, advertisements, and neon signs is abundant, distracting users and leading to decreased concentration. The data processing system according to the second embodiment of the disclosed technology solves the problem of visual information overload. Details of the data processing system according to the second embodiment are described below. 【0062】 Figure 6 shows the flow of various data transmitted and received between the earphone 14 and the data processing device 12. The camera 42 in the earphone 14 captures images of the area around the user 20 and generates video data. The video captured by the camera 42 includes images of information transmission media that provide visual information to an unspecified number of people, such as digital signage, advertisements, billboards, and signs, that are present around the user 20. The video data generated by the camera 42 is transmitted to the data processing device 12 via the communication interface 44 in the earphone 14. 【0063】 The processor 28 of the data processing device 12 operates as a specific processing unit 290A by executing a specific processing program 56 on the RAM 30. As shown in Figure 7, the specific processing unit 290A includes an acquisition unit 501, an extraction unit 502, a selection unit 503, a message generation unit 504, and a communication unit 505. 【0064】 The acquisition unit 501 acquires video data transmitted from the earphone 14. The extraction unit 502 identifies video of information transmission media such as digital signage, advertisements, billboards, and signs in the video data acquired by the acquisition unit 501. 
The extraction unit 502 extracts visual information intended for an unspecified number of people from the video of the identified information transmission media. Visual information is information represented by characters, symbols, figures, marks, etc. that are recognized by sight. Visual information may include, for example, sales information, store information, weather information, current events information, time information, traffic information, warning information, guidance information, etc. The extraction unit 502 extracts visual information from the video data using image recognition technology. 【0065】 The selection unit 503 selects from the visual information extracted by the extraction unit 502 based on user profile data that indicates the characteristics or attributes of user 20. The user profile data is recorded in the database 24. The user profile data includes user 20's hobbies, interests, behavioral history, web browsing history, product purchase history, application usage history, schedule, age, gender, and affiliation. The user profile data may also be acquired by the data processing device 12 in cooperation with user terminals such as smartphones and personal computers used by user 20. For example, the histories of web browsing, product purchases, and application usage performed on the user terminal may be transmitted from the user terminal to the data processing device 12. Some or all of the user profile data may be provided to the data processing device 12 through input operations on the user terminal. The processor 28 of the data processing device 12 records the acquired user profile data in the database 24. 【0066】 The selection unit 503 identifies the user 20's requests, desires, interests, and concerns based on the user profile data. The selection unit 503 then selects, from the visual information extracted by the extraction unit 502, the visual information that corresponds to the user 20's requests, desires, interests, and concerns. 
In other words, from the visual information, information that the user 20 is interested in, information that is useful to the user 20, and information that the user 20 needs are selected. For example, if the user profile data identifies that the user 20 is interested in a particular event, then visual information related to that event is selected. 【0067】 The message generation unit 504 generates a message containing the content of the visual information selected by the selection unit 503 using the data generation model 58. The message generation unit 504 inputs a prompt statement to the data generation model 58 instructing it to generate audio to explain the content of the visual information selected by the selection unit 503. The data generation model 58 generates audio to explain the content of the visual information selected based on the prompt statement. 【0068】 The communication unit 505 transmits the audio data of the message voice generated by the message generation unit 504 to the earphone 14. 【0069】 Figure 8 is a flowchart showing an example of the flow of a specific processing operation performed by the specific processing unit 290A of the data processing device 12. 【0070】 In step S310, the acquisition unit 501 acquires video data transmitted from the earphone 14. The video data is generated when the camera 42 of the earphone 14 takes pictures of the area around the user 20. The video data includes images from information transmission media that provide visual information to an unspecified number of people, such as digital signage, advertisements, billboards, and signs. 【0071】 In step S311, the extraction unit 502 extracts visual information directed at an unspecified number of people from the video data acquired in step S310. 【0072】 In step S312, the selection unit 503 selects the visual information extracted in step S311 based on the user profile data recorded in the database 24. 
The user profile data includes at least one of the user 20's hobbies, interests, behavioral history, web browsing history, application usage history, schedule, age, gender, and affiliation. 【0073】 In step S313, the message generation unit 504 uses the data generation model 58 to generate a message containing the content of the visual information selected in step S312. Specifically, the message generation unit 504 inputs a prompt statement to the data generation model 58 instructing it to generate audio to explain the content of the visual information selected by the selection unit 503. The data generation model 58 generates audio to explain the content of the selected visual information based on the prompt statement. 【0074】 In step S314, the communication unit 505 transmits the audio data of the message generated in step S313 to the earphone 14. The earphone 14 outputs an audio message containing the content of the visual information selected based on the user profile data. The processing from step S310 to step S314 is performed in real time. That is, the visual information captured by the camera 42 of the earphone 14 is immediately selected by the data processing device 12, and audio data corresponding to that content is provided to the user 20. 【0075】 As described above, the data processing device 12 according to this embodiment includes an acquisition unit 501 that acquires video data including video of the user 20's surroundings, an extraction unit 502 that extracts visual information directed at an unspecified number of people from the video data, a database 24 that stores user profile data indicating the user's characteristics or attributes, a selection unit 503 that selects visual information based on the user profile data, and a message generation unit 504 that generates a message containing the content of the selected visual information using a data generation model 58. 
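The flow of steps S312 and S313 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `VisualInfo` and `UserProfile` structures, the keyword-overlap selection rule, and the wording produced by `build_prompt` are hypothetical, since the disclosure does not specify how the selection is computed or how the prompt statement is phrased. The image recognition of step S311 and the data generation model 58 are not implemented here.

```python
from dataclasses import dataclass, field


@dataclass
class VisualInfo:
    """One piece of extracted visual information (hypothetical structure)."""
    category: str                      # e.g. "sales", "traffic", "warning"
    text: str                          # text recognized on the signage
    keywords: set = field(default_factory=set)


@dataclass
class UserProfile:
    """A small subset of the user profile data (hypothetical structure)."""
    interests: set = field(default_factory=set)


def select_visual_info(items, profile):
    """Step S312 (assumed rule): keep items sharing a keyword with the interests."""
    return [item for item in items if item.keywords & profile.interests]


def build_prompt(selected):
    """Step S313 (assumed wording): ask the model to explain the selected items."""
    header = "Explain the following signage content in a short spoken message:"
    lines = [f"- [{item.category}] {item.text}" for item in selected]
    return "\n".join([header] + lines)


# Visual information as it might look after extraction in step S311.
extracted = [
    VisualInfo("sales", "Running shoes 30% off today", {"running", "shoes"}),
    VisualInfo("sales", "New energy drink", {"drink"}),
    VisualInfo("traffic", "Line 2 delayed 15 minutes", {"train", "commute"}),
]
profile = UserProfile(interests={"running", "commute"})

selected = select_visual_info(extracted, profile)
prompt = build_prompt(selected)
print(prompt)
```

In this sketch the energy drink advertisement is dropped because it shares no keyword with the profile, which mirrors the blocking of information the user 20 does not need. A real system would presumably use a richer relevance measure than keyword overlap.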
【0076】 According to the data processing system 10 of this embodiment, video data captured at a field of view that substantially matches the user 20's visual range is acquired by two cameras 42 on two earphones 14 attached to the user 20. The video data includes visual information provided through information transmission media such as digital signage, advertisements, billboards, and signs. The visual information surrounding the user 20 includes a lot of information that the user 20 is not interested in or does not need. According to the data processing system 10 of this embodiment, information that the user 20 is interested in, information that is useful to the user 20, and information that the user 20 needs are selected from the visual information and provided to the user 20. In other words, information that the user 20 is not interested in or does not need is blocked out. This solves the problem of visual information overload, which can distract the user 20 and lead to decreased concentration. 【0077】 As shown in Figure 9, the earphone 14 may have a vibration motor 43. The communication unit 505 of the data processing device 12 may send a control signal to the earphone 14, along with the message audio, to vibrate the vibration motor 43 if the volume of ambient sound input to the microphone 38 of the earphone 14 exceeds a threshold. This makes it possible for the user 20 to be notified of the arrival of a message containing selected visual information, even if the user 20 is in a noisy environment. 【0078】 Furthermore, as shown in Figure 9, the earphone 14 may have a sensor 45 for acquiring the user 20's biometric information. The biometric information acquired by the sensor 45 may be, for example, blood pressure, heart rate, body temperature, or electroencephalogram (EEG). The data processing device 12 may estimate the user's emotions by inputting the biometric information acquired by the sensor 45 into a data generation model 58. 
The selection unit 503 may select visual information based on the estimated emotions of the user 20. For example, if the estimated emotions of the user 20 are negative, visual information that would reinforce negative emotions may be excluded by the selection process of the selection unit 503. 【0079】 The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format. 【0080】 In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across the computer 22 and multiple other computers. 【0081】 In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
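The optional behaviors described in paragraphs 【0077】 and 【0078】, namely the vibration notification in noisy environments and the emotion-based exclusion, can be sketched in Python as follows. This is only an illustration under stated assumptions: the decibel unit, the 70.0 threshold value, the `OutgoingMessage` structure, and the per-item tone tag are all hypothetical, because the disclosure says only that a threshold is used and that visual information reinforcing negative emotions may be excluded.

```python
from dataclasses import dataclass

NOISE_THRESHOLD_DB = 70.0  # assumed value; the disclosure only mentions "a threshold"


@dataclass
class OutgoingMessage:
    """Audio data plus an optional vibration request (hypothetical structure)."""
    audio: bytes
    vibrate: bool


def package_message(audio, ambient_level_db, threshold_db=NOISE_THRESHOLD_DB):
    """Request vibration only when the ambient sound level exceeds the threshold."""
    return OutgoingMessage(audio=audio, vibrate=ambient_level_db > threshold_db)


def exclude_for_emotion(items, estimated_emotion):
    """Drop negatively toned items when the estimated emotion is negative.

    `items` is a list of (text, tone) pairs; the tone tag is an assumed
    annotation, not something the disclosure defines.
    """
    if estimated_emotion != "negative":
        return list(items)
    return [(text, tone) for text, tone in items if tone != "negative"]


quiet = package_message(b"audio-bytes", ambient_level_db=55.0)
noisy = package_message(b"audio-bytes", ambient_level_db=82.5)

items = [("Concert tickets on sale", "positive"), ("Accident ahead", "negative")]
kept = exclude_for_emotion(items, "negative")
print(quiet.vibrate, noisy.vibrate, kept)
```

The comparison is strict, so a level exactly equal to the threshold does not trigger vibration; this matches the wording "exceeds a threshold" but is otherwise a design choice of the sketch.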
【0082】 Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12. 【0083】 Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56. 【0084】 The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using this memory. 【0085】 The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor. 
【0086】 Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources. 【0087】 Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose. 【0088】 The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. 
Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge that require no special explanation to enable implementation of those aspects have been omitted from the descriptions and illustrations presented above. 【0089】 All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference. 【0090】 Furthermore, the following additional notes are disclosed regarding the above explanation.
(Note 1) A data processing device comprising: an acquisition unit that acquires video data including video of a user's surroundings; an extraction unit that extracts visual information directed at an unspecified number of people from the video data; a database that stores user profile data indicating the characteristics or attributes of the user; a selection unit that selects the visual information based on the user profile data; and a message generation unit that generates a message containing the content of the selected visual information using a data generation model.
【0091】 (Note 2) The data processing device according to Note 1, wherein the extraction unit extracts the visual information from video of an information transmission medium included in the video data.
【0092】 (Note 3) The data processing device according to Note 1 or Note 2, wherein the user profile data includes at least one of the user's hobbies, interests, behavioral history, web browsing history, application usage history, schedule, age, gender, and affiliation.
【0093】 (Note 4) A data processing system comprising: the data processing device according to any one of Notes 1 to 3; and an earphone worn by the user, wherein the earphone includes a camera that captures the user's surroundings and generates the video data, and a speaker that outputs the message, and the data processing device includes a communication unit that transmits audio data including the content of the message to the earphone.
【0094】 (Note 5) The data processing system according to Note 4, wherein the earphone includes a microphone and a vibration motor, and the communication unit transmits to the earphone, together with the audio data, a control signal that vibrates the vibration motor when the volume of ambient sound input to the microphone exceeds a threshold.
【0095】 (Note 6) The data processing system according to Note 4 or Note 5, wherein the earphone includes a sensor for acquiring biometric information, and the selection unit selects the visual information based on the emotions of the user estimated from the biometric information.
【0096】 (Note 7) A data processing method in which a computer performs processing comprising: acquiring video data including video of a user's surroundings; extracting, from the video data, visual information directed at an unspecified number of people; selecting the visual information based on user profile data indicating the characteristics or attributes of the user; and generating, using a data generation model, a message containing the content of the selected visual information.
【0097】 (Note 8) A program that causes a computer to execute processing comprising: acquiring video data including video of a user's surroundings; extracting, from the video data, visual information directed at an unspecified number of people; selecting the visual information based on user profile data indicating the characteristics or attributes of the user; and generating, using a data generation model, a message containing the content of the selected visual information.
[Explanation of symbols] 【0098】
10 Data processing system
12 Data processing device
14 Earphone
290 Specific processing unit
291 Input section
292 Processing unit
293 Output section

Claims

[Claim 1] A data processing device comprising: an acquisition unit that acquires video data including video of a user's surroundings; an extraction unit that extracts visual information directed at an unspecified number of people from the video data; a database that stores user profile data indicating the characteristics or attributes of the user; a selection unit that selects the visual information based on the user profile data; and a message generation unit that generates a message containing the content of the selected visual information using a data generation model.
[Claim 2] The data processing device according to claim 1, wherein the extraction unit extracts the visual information from video of an information transmission medium included in the video data.
[Claim 3] The data processing device according to claim 1, wherein the user profile data includes at least one of the user's hobbies, interests, behavioral history, web browsing history, application usage history, schedule, age, gender, and affiliation.
[Claim 4] A data processing system comprising: the data processing device according to any one of claims 1 to 3; and an earphone worn by the user, wherein the earphone includes a camera that captures the user's surroundings and generates the video data, and a speaker that outputs the message, and the data processing device includes a communication unit that transmits audio data including the content of the message to the earphone.
[Claim 5] The data processing system according to claim 4, wherein the earphone includes a microphone and a vibration motor, and the communication unit transmits to the earphone, together with the audio data, a control signal that vibrates the vibration motor when the volume of ambient sound input to the microphone exceeds a threshold.
[Claim 6] The data processing system according to claim 4, wherein the earphone includes a sensor for acquiring biometric information, and the selection unit selects the visual information based on the emotions of the user estimated from the biometric information.
[Claim 7] A data processing method in which a computer performs processing comprising: acquiring video data including video of a user's surroundings; extracting, from the video data, visual information directed at an unspecified number of people; selecting the visual information based on user profile data indicating the characteristics or attributes of the user; and generating, using a data generation model, a message containing the content of the selected visual information.
[Claim 8] A program that causes a computer to execute processing comprising: acquiring video data including video of a user's surroundings; extracting, from the video data, visual information directed at an unspecified number of people; selecting the visual information based on user profile data indicating the characteristics or attributes of the user; and generating, using a data generation model, a message containing the content of the selected visual information.