system

The system addresses communication challenges for individuals with language disorders by capturing and converting user gestures and objects into images or comics, facilitating effective communication.

JP7880387B2 · Active · Publication Date: 2026-06-25 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-09-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Individuals with language disorders face challenges in effectively communicating using body movements and objects.

Method used

A system comprising a shooting unit, analysis unit, and conversion unit that captures user gestures and objects, analyzes them using AI, and converts the results into images, illustrations, or comics for communication support.

Benefits of technology

Enables effective communication for individuals with speech impairments by visually presenting options and converting gestures and objects into understandable forms.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a system that supports effective communication for people with language disorders by utilizing gestures and objects.

SOLUTION: A system comprises a capturing unit, an analysis unit, a presentation unit, and a conversion unit. The capturing unit captures the user's gestures or objects. The analysis unit analyzes the video captured by the capturing unit. The presentation unit presents candidates based on the analyzed result. The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics.

SELECTED DRAWING: Figure 1
Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the prior art, there is a problem that it is difficult for a person with a language disorder to effectively communicate using body movements and objects.

[0005] The system according to the embodiment aims to enable a person with a language disorder to effectively communicate using body movements and objects.

Means for Solving the Problems

[0006] The system according to this embodiment comprises a shooting unit, an analysis unit, a presentation unit, and a conversion unit. The shooting unit captures the user's gestures or objects. The analysis unit analyzes the images captured by the shooting unit. The presentation unit presents candidates based on the analysis results. The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics.

Effects of the Invention

[0007] The system according to this embodiment allows individuals with speech impairments to communicate effectively using gestures and objects.

Brief Description of the Drawings

[0008]

[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.

Modes for Carrying Out the Invention

[0009] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0010] First, the terminology used in the following description is explained.

[0011] In the following embodiments, a processor denoted by a reference sign (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Furthermore, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and a TPU (Tensor Processing Unit).

[0012] In the following embodiments, a RAM (Random Access Memory) denoted by a reference sign is a memory that temporarily stores information and is used as work memory by the processor.

[0013] In the following embodiments, a storage denoted by a reference sign is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0014] In the following embodiments, a communication I/F (Interface) denoted by a reference sign is an interface including a communication processor, an antenna, and the like. The communication I/F manages communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0015] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B". That is, "A and/or B" may mean only A, only B, or a combination of A and B. In this specification, the same interpretation applies when three or more items are connected by "and/or".

First Embodiment

[0016] Figure 1 shows an example of the configuration of a data processing system 10 according to the first embodiment.

[0017] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0018] The data processing device 12 includes a computer 22, a database 24, and a communication I/F 26. The computer 22 includes a processor 28, a RAM 30, and a storage 32. The processor 28, the RAM 30, and the storage 32 are connected to a bus 34. The database 24 and the communication I/F 26 are also connected to the bus 34. The communication I/F 26 is connected to a network 54. Examples of the network 54 include a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0019] The smart device 14 comprises a computer 36, a receiving device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The receiving device 38, output device 40, and camera 42 are also connected to the bus 52.

[0020] The receiving device 38 is equipped with a touch panel 38A and a microphone 38B, and accepts user input. The touch panel 38A accepts user input via touch by detecting contact with an object (e.g., a pen or finger). The microphone 38B accepts user input via voice by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 (see Figure 2) acquires the data indicating the user input.

[0021] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user by outputting the data in a form perceptible to the user (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0022] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0023] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0024] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0025] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290. The specific processing unit 290 can estimate the user's emotions using the emotion identification model 59 and perform the specific processing using the estimated emotions. The emotion estimation function (emotion identification function) using the emotion identification model 59 covers various kinds of estimation and prediction regarding the user's emotions, and is not limited to these examples. Emotion estimation and prediction also include, for example, emotion analysis.

[0026] In the smart device 14, specific processing is performed by the processor 46. The storage 50 stores a specific processing program 60. The specific processing program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the specific processing program 60 from the storage 50 and executes the read specific processing program 60 on the RAM 48. The specific processing is realized by the processor 46 operating as a control unit 46A according to the specific processing program 60 executed on the RAM 48. The smart device 14 also has a data generation model and an emotion identification model similar to the data generation model 58 and the emotion identification model 59, and can use these models to perform processing similar to that of the specific processing unit 290.

[0027] Furthermore, other devices besides the data processing device 12 may also have the data generation model 58. For example, a server device (e.g., a generation server) may have the data generation model 58. In this case, the data processing device 12 obtains processing results (such as prediction results) using the data generation model 58 by communicating with the server device having the data generation model 58. The data processing device 12 may also be a server device or a terminal device owned by a user (e.g., a mobile phone, robot, home appliance, etc.). Next, an example of processing by the data processing system 10 according to the first embodiment will be described.

[0028] (Example of form 1) The communication support system according to an embodiment of the present invention is a system used as an assistive tool for people with language disorders such as aphasia. This system uses a camera to capture gestures and objects that the user uses in their daily life, and a generative AI analyzes the captured video and provides several options, supporting communication even when words and objects do not match. The generative AI can also convert these options into images, illustrations, or comics for communication with the other party. For example, the camera captures an object that the user is pointing to or a gesture they are making with their hand. This video is input to the generative AI. Next, the generative AI analyzes the input video. The generative AI recognizes objects and actions in the video and presents corresponding words as candidates. For example, if the object the user is pointing to is "Object A," the generative AI presents candidates such as "Object A" or "Object B." Furthermore, the generative AI converts the presented candidates into images, illustrations, or comics. For example, for the word "Object A," it generates an image, illustration, or comic of Object A. This allows the user to visually confirm the information. This mechanism supports communication by providing several options even when words and objects do not match. Furthermore, by converting information into images, illustrations, and comics, it can be conveyed visually, enabling communication even when the other person's words cannot be understood. For example, if a person with aphasia is pointing to "Object A" but cannot find the word, the generative AI can suggest candidates such as "Object A" or "Object B," and then generate an image, illustration, or comic of Object A to convey "Object A" to the other person. In this way, the communication support system can assist communication for people with language impairments.

[0029] The communication support system according to the embodiment comprises a shooting unit, an analysis unit, a presentation unit, and a conversion unit. The shooting unit captures the user's gestures and objects. The user's gestures and objects include, but are not limited to, hand movements, facial expressions, and specific objects. The shooting unit captures the user's gestures and objects using, for example, a camera. The analysis unit analyzes the video captured by the shooting unit using a generative AI. The analysis unit recognizes objects and actions in the video using, for example, an image recognition algorithm or a motion analysis method. The presentation unit presents candidates based on the results of the analysis by the analysis unit. The presentation unit presents, for example, a list of words or a selection of images. The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit converts the presented candidates into a visually easy-to-understand form using, for example, image processing technology or an illustration generation algorithm. Thus, the communication support system according to the embodiment can support communication for people with speech impairments by capturing the user's gestures and objects, analyzing the captured video, presenting candidates, and converting those candidates into visual form.
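The four-unit flow described above can be sketched as a simple pipeline. This is only an illustrative sketch, not the patent's implementation: the `Candidate` type, the `run_pipeline` function, and the stub units (`fake_capture`, `fake_analyze`, `top_candidates`, `fake_convert`) are all hypothetical names, and the stubs stand in for the camera and the generative AI.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    word: str
    score: float  # hypothetical confidence in [0, 1]; not specified in the source


def run_pipeline(
    capture: Callable[[], bytes],
    analyze: Callable[[bytes], List[Candidate]],
    present: Callable[[List[Candidate]], List[Candidate]],
    convert: Callable[[Candidate], str],
) -> List[str]:
    """Run the units in order: shooting -> analysis -> presentation -> conversion."""
    video = capture()
    candidates = analyze(video)
    shown = present(candidates)
    return [convert(c) for c in shown]


# Stub units (assumptions for illustration only).
def fake_capture() -> bytes:
    return b"frame-data"


def fake_analyze(video: bytes) -> List[Candidate]:
    # A real analysis unit would run recognition on the video.
    return [Candidate("Object B", 0.3), Candidate("Object A", 0.7)]


def top_candidates(cands: List[Candidate]) -> List[Candidate]:
    # Present the two highest-scoring candidates, best first.
    return sorted(cands, key=lambda c: c.score, reverse=True)[:2]


def fake_convert(c: Candidate) -> str:
    # A real conversion unit would generate an image, illustration, or comic.
    return f"<illustration of {c.word}>"


result = run_pipeline(fake_capture, fake_analyze, top_candidates, fake_convert)
print(result)
```

Passing the units in as callables keeps each stage replaceable, which mirrors the way the description treats the four units as separate functional blocks.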

[0030] The shooting unit (hereinafter also referred to as the camera unit) captures the user's gestures and objects. These include, but are not limited to, hand movements, facial expressions, and specific objects. The camera unit uses cameras, for example, to capture the user's gestures and objects. Specifically, the cameras are high-resolution and can accurately capture even subtle movements and changes in the user's facial expressions. The cameras may be fixed or movable, and feature automatic focus adjustment based on the user's movements. Furthermore, the cameras incorporate infrared and depth sensors, enabling accurate shooting even in dark environments or against complex backgrounds. This allows the camera unit to acquire high-quality images regardless of the user's environment. Additionally, by linking multiple cameras, the camera unit can simultaneously acquire images from different angles, enabling three-dimensional analysis. This allows for a more accurate understanding of the user's movements and the spatial relationships of objects.

[0031] The analysis unit uses generative AI to analyze video footage captured by the camera unit. The analysis unit recognizes objects and actions within the video using, for example, image recognition algorithms and motion analysis techniques. Specifically, the generative AI employs a deep learning model and is pre-trained on a large dataset. This allows for high-precision recognition of hand movements, facial expressions, and specific objects. For example, when analyzing sign language movements, the generative AI analyzes the hand's position, shape, and movement speed in real time to identify corresponding sign language words and phrases. Similarly, when analyzing facial expressions, the generative AI analyzes the movement of various facial features to infer emotions and intentions. Furthermore, in object recognition, the generative AI analyzes the shape, color, and texture of objects to identify specific objects. This enables the analysis unit to accurately analyze user gestures and object information, providing the data necessary for subsequent processing.

[0032] The presentation unit presents candidates based on the results analyzed by the analysis unit. For example, the presentation unit presents a list of words or a selection of images. Specifically, it presents the user with the most suitable means of communication based on the data obtained from the analysis unit. For example, when the analysis result concerns sign language movements, it displays a list of corresponding words for the user to select from. When the analysis result concerns facial expressions, it presents images or icons corresponding to emotions, allowing the user to visually express their feelings. Furthermore, when an object has been recognized, it presents information and options related to that object, enabling the user to communicate smoothly about it. The presentation unit uses a touchscreen or voice assistant as a user interface, providing intuitive and easy-to-use operation. This allows the user to communicate quickly and accurately based on the analysis results.

[0033] The conversion unit transforms the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit uses image processing technology and illustration generation algorithms, for example, to transform the presented candidates into a visually easy-to-understand form. Specifically, the conversion unit uses a generative AI to generate corresponding illustrations or comics based on the words and images selected by the user. For example, once sign language movements have been analyzed, it illustrates the corresponding words and presents them in a visually easy-to-understand format. Once facial expressions have been analyzed, it draws the corresponding character expressions in a comic style, allowing users to express their emotions more richly. Furthermore, based on object recognition, it draws scenes and stories related to specific objects as comics, enabling users to understand information about those objects in an enjoyable way. The conversion unit performs the conversion process in real time, allowing users to see the results immediately. This allows the conversion unit to visually support user communication, enabling smooth communication even for those with language impairments.

[0034] The analysis unit can recognize objects or actions within a video and suggest corresponding words as candidates. For example, the analysis unit can recognize objects within a video using an image recognition algorithm. For instance, the analysis unit recognizes that an object in the video is "Object A" and suggests the word "Object A" as a candidate. The analysis unit can also recognize actions within a video using motion analysis techniques. For example, the analysis unit recognizes that a user is waving and suggests the word "waving" as a candidate. In this way, the analysis unit can support communication by recognizing objects and actions within a video and suggesting corresponding words as candidates.

[0035] The presentation unit can convert the presented candidates into images, illustrations, or comics. For example, the presentation unit can convert the presented candidates into images using image processing technology. For instance, in response to the word "object A," the presentation unit generates an image of object A. The presentation unit can also convert the presented candidates into illustrations using an illustration generation algorithm. For example, in response to the word "object A," the presentation unit generates an illustration of object A. Furthermore, the presentation unit can also convert the presented candidates into comics using a comic generation algorithm. For example, in response to the word "object A," the presentation unit generates a comic of object A. In this way, the presentation unit can support communication by converting the presented candidates into a visually easy-to-understand form.

[0036] The camera unit can analyze the user's past behavior patterns during shooting and select an appropriate shooting angle or distance. For example, the camera unit can automatically select the optimal angle based on the user's preferred shooting angles in the past. For example, the camera unit can analyze footage previously shot by the user and select the optimal shooting angle. The camera unit can also set the optimal shooting distance based on the user's past behavior patterns. For example, the camera unit can analyze footage previously shot by the user and set the optimal shooting distance. Furthermore, the camera unit can analyze footage previously shot by the user and suggest optimal shooting conditions. In this way, the camera unit can automatically set the optimal shooting conditions by analyzing the user's past behavior patterns. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.

[0037] The camera unit can automatically adjust shooting conditions based on the user's current environmental information during shooting. For example, if the user is outdoors, the camera unit adjusts shooting conditions considering natural light. For example, if the user is outdoors, the camera unit adjusts the exposure according to the intensity of natural light. The camera unit can also set shooting conditions based on the brightness of the lighting if the user is indoors. For example, the camera unit adjusts the white balance according to the brightness of the indoor lighting. Furthermore, if the user is moving, the camera unit can automatically adjust shooting conditions according to the movement. For example, if the user is moving, the camera unit enables image stabilization. In this way, the camera unit can automatically set optimal shooting conditions by considering the user's current environmental information. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.
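The environment-dependent adjustment described above could be expressed as a small rule table. The sketch below is an assumption-laden illustration: the `Environment` fields, the setting names, and the numeric thresholds are all hypothetical and do not come from the source, which only states that exposure, white balance, and image stabilization are adjusted.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Environment:
    outdoors: bool
    light_level: float  # hypothetical normalized brightness: 0.0 (dark) to 1.0 (bright)
    moving: bool


def shooting_conditions(env: Environment) -> Dict[str, Union[float, str, bool]]:
    """Derive illustrative camera settings from the user's current environment."""
    settings: Dict[str, Union[float, str, bool]] = {}
    if env.outdoors:
        # Outdoors: adjust exposure to the intensity of natural light
        # (brighter light -> lower exposure compensation).
        settings["exposure_compensation"] = round(0.5 - env.light_level, 2)
    else:
        # Indoors: pick a white balance preset from the lighting brightness
        # (the 0.5 threshold is an arbitrary assumption).
        settings["white_balance"] = "warm" if env.light_level < 0.5 else "neutral"
    # Enable image stabilization whenever the user is moving.
    settings["image_stabilization"] = env.moving
    return settings


print(shooting_conditions(Environment(outdoors=True, light_level=1.0, moving=True)))
print(shooting_conditions(Environment(outdoors=False, light_level=0.2, moving=False)))
```

As the paragraph notes, such rules could equally be replaced by an AI model; the rule-based form is used here only because it is easy to read.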

[0038] The camera unit can prioritize photographing highly relevant subjects by considering the user's geographical location information during shooting. For example, if the user is in a tourist area, the camera unit will prioritize photographing tourist attractions. For example, if the user is in a tourist area, the camera unit will automatically detect tourist attractions and prioritize photographing them. The camera unit can also prioritize photographing everyday objects if the user is at home. For example, if the user is at home, the camera unit will automatically detect everyday objects and prioritize photographing them. Furthermore, if the user is at work, the camera unit can also prioritize photographing work-related objects. For example, if the user is at work, the camera unit will automatically detect work-related objects and prioritize photographing them. In this way, the camera unit can prioritize photographing highly relevant subjects by considering the user's geographical location information. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.

[0039] The camera unit can analyze the user's social media activity during shooting and photograph relevant subjects. For example, the camera unit can photograph relevant subjects based on the content the user frequently posts on social media. For example, the camera unit can analyze the content the user frequently posts on social media, automatically detect relevant subjects, and photograph them. The camera unit can also prioritize photographing subjects that the user's followers are likely to be interested in. For example, the camera unit can automatically detect and photograph subjects that the user's followers are likely to be interested in. Furthermore, the camera unit can also photograph relevant subjects based on the content the user has previously "liked". For example, the camera unit can analyze the content the user has previously "liked", automatically detect relevant subjects, and photograph them. In this way, the camera unit can prioritize photographing relevant subjects by analyzing the user's social media activity. Some or all of the above processing in the camera unit may be performed using AI, for example, or without AI.

[0040] The analysis unit can optimize its analysis algorithm by referring to past data of objects and actions within the video during analysis. For example, the analysis unit can optimize its analysis algorithm based on past data of objects within the video, or based on past data of actions within the video. Furthermore, the analysis unit can optimize its analysis algorithm by comprehensively referring to past data of both objects and actions within the video. In this way, the analysis unit can improve the accuracy of the analysis by optimizing its analysis algorithm with reference to past data.

[0041] The analysis unit can apply different analysis methods depending on the category of objects and actions in the video during analysis. For example, if an object in the video is stationary, the analysis unit applies an analysis method specialized for stationary objects. For example, the analysis unit recognizes that an object in the video is stationary and applies an analysis method specialized for stationary objects. The analysis unit can also apply an analysis method specialized for dynamic actions if the action in the video is dynamic. For example, the analysis unit recognizes that the action in the video is dynamic and applies an analysis method specialized for dynamic actions. Furthermore, the analysis unit can select the optimal analysis method according to the category of objects and actions in the video. This allows the analysis unit to improve analysis accuracy by applying an analysis method appropriate to the category.

[0042] The analysis unit can determine the priority of analysis based on when the video was shot. For example, the analysis unit can prioritize the analysis of the most recent video. For example, the analysis unit can automatically detect the most recent video and prioritize its analysis. The analysis unit can also prioritize the analysis of video shot within a specific period. For example, the analysis unit can automatically detect video shot within a specific period and prioritize its analysis. Furthermore, the analysis unit can prioritize the analysis of video within a period specified by the user. For example, the analysis unit can automatically detect video within a period specified by the user and prioritize its analysis. In this way, the analysis unit can prioritize the analysis of the most recent information by determining the priority of analysis based on when the video was shot.
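One way to picture this time-based prioritization is as a sort over the captured clips. In the sketch below, the `Clip` tuple, the `analysis_order` function, and the optional `start` / `end` window are hypothetical names chosen for illustration; clips inside the window (if one is given) are analyzed first, and within each group the most recent clip comes first.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Clip = Tuple[str, datetime]  # (hypothetical clip id, time the clip was shot)


def analysis_order(
    clips: List[Clip],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[str]:
    """Return clip ids in the order they should be analyzed."""

    def in_window(shot_at: datetime) -> bool:
        return (start is None or shot_at >= start) and (end is None or shot_at <= end)

    # First sort newest-first, then stably move in-window clips to the front.
    newest_first = sorted(clips, key=lambda c: c[1], reverse=True)
    ordered = sorted(newest_first, key=lambda c: not in_window(c[1]))
    return [clip_id for clip_id, _ in ordered]


clips = [
    ("a", datetime(2024, 1, 1)),
    ("b", datetime(2024, 3, 1)),
    ("c", datetime(2024, 2, 1)),
]
print(analysis_order(clips))
print(analysis_order(clips, start=datetime(2024, 1, 15), end=datetime(2024, 2, 15)))
```

The second sort relies on Python's `sorted` being stable, so the newest-first order is preserved within the in-window and out-of-window groups.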

[0043] The analysis unit can improve the accuracy of its analysis by referring to relevant literature on the video during the analysis. For example, the analysis unit can improve the accuracy of its analysis by referring to literature related to objects in the video, or by referring to literature related to actions in the video. In addition, the analysis unit can improve the accuracy of its analysis by comprehensively referring to literature related to both objects and actions in the video. In this way, the analysis unit can improve the accuracy of its analysis by referring to relevant literature.

[0044] The presentation unit can adjust the level of detail in its presentation based on the importance of the candidates. For example, it can present highly important candidates in detail. It can also present less important candidates concisely. Furthermore, the presentation unit can adjust the level of detail in stages according to importance. This allows the presentation unit to provide more appropriate information by adjusting the level of detail in its presentation based on the importance of the candidates.
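The staged adjustment of detail can be illustrated with a simple threshold scheme. The three levels, the thresholds 0.7 and 0.3, and the `detail_level` and `render` functions are assumptions made for this sketch; the source only says that important candidates are presented in detail and less important ones concisely.

```python
def detail_level(importance: float) -> str:
    """Map an importance score in [0, 1] to a presentation stage (thresholds are assumed)."""
    if importance >= 0.7:
        return "detailed"
    if importance >= 0.3:
        return "summary"
    return "brief"


def render(word: str, description: str, importance: float) -> str:
    """Render a candidate with an amount of text that matches its importance."""
    level = detail_level(importance)
    if level == "detailed":
        return f"{word}: {description}"
    if level == "summary":
        # Keep only the first sentence of the description.
        return f"{word}: {description.split('.')[0]}."
    return word


desc = "A cup. Used for drinking."
print(render("Object A", desc, 0.9))
print(render("Object A", desc, 0.5))
print(render("Object A", desc, 0.1))
```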

[0045] The presentation unit can apply different presentation algorithms depending on the category of the candidate during presentation. For example, when presenting object candidates, the presentation unit applies an object-specific presentation algorithm. Similarly, when presenting action candidates, the presentation unit can apply an action-specific presentation algorithm. Furthermore, the presentation unit can select the optimal presentation algorithm depending on the category. This allows the presentation unit to improve the accuracy of presentations by applying a presentation algorithm appropriate to the category.

[0046] The presentation unit can determine the priority of presentations based on the submission date of the candidates. For example, the presentation unit can prioritize the presentation of the most recent candidates. For example, the presentation unit can automatically detect and prioritize the most recent candidates. The presentation unit can also prioritize the presentation of candidates submitted within a specific period. For example, the presentation unit can automatically detect and prioritize the presentation of candidates submitted within a specific period. Furthermore, the presentation unit can prioritize the presentation of candidates submitted within a period specified by the user. For example, the presentation unit can automatically detect and prioritize the presentation of candidates submitted within a period specified by the user. In this way, the presentation unit can prioritize the presentation of the most recent information by determining the priority of presentations based on the submission date.

[0047] The presentation unit can adjust the order of presentation based on the relevance of the candidates. For example, the presentation unit can automatically detect highly relevant candidates and present them first. The presentation unit can also automatically detect less relevant candidates and postpone their presentation. Furthermore, the presentation unit can adjust the order of presentation in stages according to relevance. This allows the presentation unit to provide more appropriate information by adjusting the order of presentation based on relevance.
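
One possible way to realize this ordering is sketched below. Candidates are represented as hypothetical `(label, relevance)` pairs, and the 0.5 cut-off that separates candidates shown now from postponed ones is an assumption, not a value given in the disclosure.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (label, relevance score), hypothetical representation


def order_by_relevance(
    candidates: List[Scored], threshold: float = 0.5
) -> Tuple[List[Scored], List[Scored]]:
    """Return (shown_now, postponed), each sorted from most to least relevant."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    shown_now = [c for c in ranked if c[1] >= threshold]
    postponed = [c for c in ranked if c[1] < threshold]
    return shown_now, postponed
```

For instance, `order_by_relevance([("glass", 0.4), ("cup", 0.9)])` places the cup in the first list and defers the glass to the second.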

[0048] The conversion unit can optimize the conversion algorithm by referring to past conversion data of candidates during the conversion process. For example, the conversion unit can refer to past conversion data and select the optimal conversion algorithm. The conversion unit can also analyze past conversion data, or refer to it comprehensively, to optimize the conversion algorithm. In this way, the conversion unit can optimize the conversion algorithm by referring to past conversion data and improve conversion accuracy.

[0049] The conversion unit can apply different conversion methods depending on the candidate category during conversion. For example, when converting an object candidate, the conversion unit applies an object-specific conversion method. Similarly, when converting an action candidate, the conversion unit can apply an action-specific conversion method. Furthermore, the conversion unit can select the optimal conversion method according to the category. This allows the conversion unit to improve conversion accuracy by applying a conversion method appropriate to the category.
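
As a rough illustration of selecting a conversion method by category, the sketch below uses a dispatch table. The converter functions are placeholders that return descriptive strings rather than real images, and the category names `"object"` and `"action"` are assumed keys; an actual system would call an image or comic generation backend here.

```python
from typing import Callable, Dict


def convert_object(label: str) -> str:
    # Placeholder for an object-specific conversion method.
    return f"[illustration of {label}]"


def convert_action(label: str) -> str:
    # Placeholder for an action-specific conversion method.
    return f"[comic strip showing {label}]"


# Hypothetical registry mapping a candidate category to its converter.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "object": convert_object,
    "action": convert_action,
}


def convert(label: str, category: str) -> str:
    """Apply the conversion method registered for the candidate's category."""
    converter = CONVERTERS.get(category)
    if converter is None:
        raise ValueError(f"no conversion method for category: {category}")
    return converter(label)
```

The same table-driven pattern would apply to the category-specific presentation and analysis methods described elsewhere in this description.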

[0050] The conversion unit can determine the conversion priority based on when the candidates were submitted. For example, the conversion unit can automatically detect the most recent candidates and convert them preferentially. The conversion unit can also automatically detect candidates submitted within a specific period, or within a period specified by the user, and convert them preferentially. In this way, the conversion unit can prioritize the conversion of the latest information by determining the conversion priority based on the submission date.

[0051] The conversion unit can improve the accuracy of the conversion by referring to related literature for the candidate during the conversion process. For example, the conversion unit can refer to literature related to the candidate. The conversion unit can also refer to past conversion data for the candidate. Furthermore, the conversion unit can comprehensively refer to both related literature and past conversion data for the candidate. In this way, the conversion unit can improve the accuracy of the conversion by referring to related literature.

[0052] The system according to the embodiment is not limited to the example described above, and various modifications are possible, for example, as follows.

[0053] The analysis unit can improve the accuracy of its analysis by referring to the user's past behavior data. For example, the analysis unit can more accurately recognize objects and actions in the current video based on data of objects and actions that the user has previously instructed. Furthermore, the analysis unit can learn the user's past behavior patterns and build a predictive model. For example, it can prioritize presenting objects and actions that the user has frequently instructed in the past as candidates. In this way, the analysis unit can improve the accuracy of its analysis and present more appropriate candidates by referring to the user's past behavior data.
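
The idea of favoring objects and actions that the user has frequently indicated in the past can be sketched as a simple frequency-based re-ranking. The score dictionary, the history list, and the `weight` parameter are assumptions for illustration; the disclosure does not describe the predictive model in this detail.

```python
from collections import Counter
from typing import Dict, List


def rerank_with_history(
    scores: Dict[str, float], history: List[str], weight: float = 0.1
) -> List[str]:
    """Boost each recognition score by the label's share of past selections.

    scores:  recognition score per candidate label (hypothetical values)
    history: labels the user selected in the past
    weight:  assumed strength of the history bonus
    """
    counts = Counter(history)
    total = sum(counts.values()) or 1  # avoid division by zero on empty history
    adjusted = {
        label: score + weight * counts[label] / total
        for label, score in scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

With an empty history the ranking falls back to the raw recognition scores, so the bonus only matters once past selections have accumulated.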

[0054] The presentation unit can customize the presentation format based on the user's visual preferences. For example, if the user prefers illustrations, the presentation unit can present options in illustration format. Similarly, if the user prefers photographs, the presentation unit can present options in photographic format. Furthermore, if the user prefers comics, the presentation unit can present options in comic book format. This allows the presentation unit to support more effective communication by customizing the presentation format based on the user's visual preferences.

[0055] The camera unit can automatically select the optimal shooting settings based on the user's past shooting data. For example, it can learn the shooting settings the user has preferred in the past and apply them to the current shoot. The camera unit can also analyze the user's past shooting data and automatically set the optimal shooting angle and distance. Furthermore, the camera unit can suggest optimal shooting conditions based on the user's past shooting data. As a result, by referring to the user's past shooting data, the camera unit can automatically select the optimal shooting settings and acquire more effective footage.

[0056] The camera unit can automatically adjust the exposure settings based on the user's current environmental information. For example, if the user is outdoors, the exposure can be adjusted according to the intensity of natural light. If the user is indoors, the exposure can be adjusted according to the brightness of the lighting. Furthermore, if the user is moving, the exposure can be automatically adjusted according to the movement. In this way, the camera unit can automatically adjust the optimal exposure settings by considering the user's current environmental information, enabling the acquisition of more appropriate images.

[0057] The camera unit can overlay relevant information based on the user's geographical location. For example, if the user is in a tourist area, information about tourist attractions can be overlaid. If the user is at home, information about nearby shops and facilities can be overlaid. Furthermore, if the user is at work, work-related information can be overlaid. In this way, the camera unit can provide richer information by overlaying relevant information based on the user's geographical location.

[0058] The camera unit can automatically generate relevant hashtags based on the user's social media activity. For example, it can automatically generate relevant hashtags based on hashtags the user frequently uses on social media. It can also automatically generate hashtags that the user's followers are likely to be interested in. Furthermore, it can automatically generate relevant hashtags based on hashtags from posts the user has previously "liked". In this way, the camera unit can support more effective social media posts by automatically generating relevant hashtags based on the user's social media activity.

[0059] The following briefly describes the processing flow for Example of form 1.

[0060] Step 1: The camera unit captures the user's gestures and objects. These include, but are not limited to, hand movements, facial expressions, and specific objects. The camera unit captures the user's gestures and objects, for example, using a camera.
Step 2: The analysis unit uses a generative AI to analyze the video captured by the camera unit. The analysis unit recognizes objects and actions in the video, for example, using image recognition algorithms and motion analysis methods.
Step 3: The presentation unit presents candidates based on the results analyzed by the analysis unit. For example, the presentation unit presents a list of words or a selection of images.
Step 4: The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit uses, for example, image processing technology or illustration generation algorithms to convert the presented candidates into a visually easy-to-understand form.
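
The four steps above can be sketched as a simple pipeline of functions. All function names below are hypothetical, and the `fake_*` stubs merely stand in for the camera, the generative AI analysis, and the image generation, none of which are specified at the code level in this description.

```python
from typing import Callable, List, Tuple

Recognized = Tuple[str, float]  # (word, confidence), hypothetical representation


def run_pipeline(
    capture: Callable[[], List[str]],
    analyze: Callable[[List[str]], List[Recognized]],
    present: Callable[[List[Recognized]], List[str]],
    convert: Callable[[str], str],
) -> List[str]:
    frames = capture()                       # Step 1: capture gestures and objects
    recognized = analyze(frames)             # Step 2: recognize objects and actions
    candidates = present(recognized)         # Step 3: present candidate words
    return [convert(c) for c in candidates]  # Step 4: convert to visual form


# Stub implementations used only to exercise the flow.
def fake_capture() -> List[str]:
    return ["frame: user points at a cup"]


def fake_analyze(frames: List[str]) -> List[Recognized]:
    return [("glass", 0.6), ("cup", 0.8)]


def fake_present(recognized: List[Recognized]) -> List[str]:
    # Most confident candidate first.
    return [word for word, _ in sorted(recognized, key=lambda r: r[1], reverse=True)]


def fake_convert(word: str) -> str:
    return f"[illustration of {word}]"
```

Passing each unit in as a function keeps the units independently replaceable, which mirrors the way the description treats each unit's behavior as separately adjustable.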

[0061] (Example of form 2) The communication support system according to an embodiment of the present invention is a system used as an assistive tool for people with language disorders such as aphasia. This system uses a camera to capture gestures and objects that the user uses in their daily life, and a generative AI analyzes the captured video to provide several options to support communication even when words and objects do not match. The generative AI can also convert these into images, illustrations, or comics for communication with the other party. For example, the camera captures an object that the user is pointing to or a gesture they are making with their hand. This video is input to the generative AI. Next, the generative AI analyzes the input video. The generative AI recognizes objects and actions in the video and presents corresponding words as candidates. For example, if the object the user is pointing to is "Object A," the generative AI will present candidates such as "Object A" or "Object B." Furthermore, the generative AI converts the presented candidates into images, illustrations, or comics. For example, for the word "Object A," it generates an image, illustration, or comic of Object A. This allows the user to visually confirm the information. This mechanism enables communication support by providing several options even when words and objects do not match. Furthermore, by converting information into images, illustrations, and comics, it can be conveyed visually, enabling communication even when the other person's words cannot be understood. For example, if a person with aphasia is pointing to "Object A" but cannot find the words, the generative AI can suggest candidates such as "Object A" or "Object B," and then generate an image, illustration, or comic of Object A to convey "Object A" to the other person. In this way, the communication support system can assist communication for people with language impairments.

[0062] The communication support system according to the embodiment comprises a camera unit, an analysis unit, a presentation unit, and a conversion unit. The camera unit captures the user's gestures and objects. The user's gestures and objects include, but are not limited to, hand movements, facial expressions, and specific objects. The camera unit captures the user's gestures and objects using, for example, a camera. The analysis unit analyzes the video captured by the camera unit using a generative AI. The analysis unit recognizes objects and actions in the video using, for example, an image recognition algorithm or a motion analysis method. The presentation unit presents candidates based on the results of the analysis by the analysis unit. The presentation unit presents, for example, a list of words or a selection of images. The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit converts the presented candidates into a visually easy-to-understand form using, for example, image processing technology or an illustration generation algorithm. Thus, the communication support system according to the embodiment can support communication for people with speech impairments by capturing, analyzing, presenting, and converting the user's gestures and objects.

[0063] The camera unit captures the user's gestures and objects. These include, but are not limited to, hand movements, facial expressions, and specific objects. The camera unit uses cameras, for example, to capture the user's gestures and objects. Specifically, the cameras are high-resolution and can accurately capture even subtle movements and changes in the user's facial expressions. Both fixed and movable cameras are available and feature automatic focus adjustment based on the user's movements. Furthermore, the cameras incorporate infrared and depth sensors, enabling accurate shooting even in dark environments or against complex backgrounds. This allows the camera unit to acquire high-quality images regardless of the user's environment. Additionally, by linking multiple cameras, the camera unit can simultaneously acquire images from different angles, enabling three-dimensional analysis. This allows for a more accurate understanding of the user's movements and the spatial relationships of objects.

[0064] The analysis unit uses generative AI to analyze video footage captured by the camera unit. The analysis unit recognizes objects and actions within the video using, for example, image recognition algorithms and motion analysis techniques. Specifically, the generative AI employs a deep learning model and is pre-trained on a large dataset. This allows for high-precision recognition of hand movements, facial expressions, and specific objects. For example, when analyzing sign language movements, the generative AI analyzes the hand's position, shape, and movement speed in real time to identify corresponding sign language words and phrases. Similarly, when analyzing facial expressions, the generative AI analyzes the movement of various facial features to infer emotions and intentions. Furthermore, in object recognition, the generative AI analyzes the shape, color, and texture of objects to identify specific objects. This enables the analysis unit to accurately analyze user gestures and object information, providing the data necessary for subsequent processing.

[0065] The presentation unit presents candidates based on the results analyzed by the analysis unit. For example, the presentation unit presents a list of words or a selection of images. Specifically, it presents the user with the most suitable means of communication based on the data obtained from the analysis unit. For example, when the analysis result concerns sign language movements, it displays a list of corresponding words for the user to select from. When the analysis result concerns facial expressions, it presents images or icons corresponding to emotions, allowing the user to visually express their feelings. Furthermore, when the analysis result is a recognized object, it presents information and options related to that object, enabling the user to communicate smoothly about it. The presentation unit uses a touchscreen or voice assistant as a user interface, providing intuitive and easy-to-use operation. This allows the user to communicate quickly and accurately based on the analysis results.

[0066] The conversion unit transforms the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit uses image processing technology and illustration generation algorithms, for example, to transform the presented candidates into a visually easy-to-understand form. Specifically, the conversion unit uses a generative AI to generate corresponding illustrations or comics based on the words and images selected by the user. For example, after sign language movements have been analyzed, it illustrates the corresponding words and presents them in a visually easy-to-understand format. Likewise, after facial expressions have been analyzed, it draws the corresponding character expressions in a comic style, allowing users to express their emotions more richly. Furthermore, based on object recognition, it draws scenes and stories related to specific objects as comics, enabling users to understand information about those objects in an enjoyable way. The conversion unit performs the conversion process in real time, allowing users to see the results immediately. This allows the conversion unit to visually support user communication, enabling smooth communication even for those with language impairments.

[0067] The analysis unit can recognize objects or actions within a video and suggest corresponding words as candidates. For example, the analysis unit can recognize objects within a video using an image recognition algorithm. For instance, the analysis unit recognizes that an object in the video is "Object A" and suggests the word "Object A" as a candidate. The analysis unit can also recognize actions within a video using motion analysis techniques. For example, the analysis unit recognizes that a user is waving and suggests the word "waving" as a candidate. In this way, the analysis unit can support communication by recognizing objects and actions within a video and suggesting corresponding words as candidates.

[0068] The presentation unit can convert the presented candidates into images, illustrations, or comics. For example, the presentation unit can convert the presented candidates into images using image processing technology. For instance, in response to the word "object A," the presentation unit generates an image of object A. The presentation unit can also convert the presented candidates into illustrations using an illustration generation algorithm. For example, in response to the word "object A," the presentation unit generates an illustration of object A. Furthermore, the presentation unit can also convert the presented candidates into comics using a comic generation algorithm. For example, in response to the word "object A," the presentation unit generates a comic of object A. In this way, the presentation unit can support communication by converting the presented candidates into a visually easy-to-understand form.

[0069] The camera unit can estimate the user's emotions and adjust the timing of shooting based on those emotions. For example, the camera unit can capture the user's facial expressions with a camera and estimate their emotions using an emotion estimation algorithm. For instance, if the user is tense, the camera unit can delay shooting until the user relaxes. Conversely, if the user is excited, it can start shooting immediately. Furthermore, if the user is tired, it can resume shooting after a break. This allows the camera unit to acquire more appropriate footage by adjusting the timing of shooting according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, such as an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.
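
The timing rule described above amounts to a mapping from an estimated emotion to a shooting action, sketched below. The emotion labels, the action names, and the default of starting immediately for any unlisted emotion are all assumptions; the emotion estimation itself is outside this sketch and is represented only by its string output.

```python
# Hypothetical policy table: estimated emotion -> shooting action.
TIMING_POLICY = {
    "tense": "wait_until_relaxed",
    "excited": "start_now",
    "tired": "resume_after_break",
}


def shooting_action(estimated_emotion: str) -> str:
    """Choose when to shoot based on the user's estimated emotion."""
    # Assumed default: shoot immediately when the emotion is not in the table.
    return TIMING_POLICY.get(estimated_emotion, "start_now")
```

Keeping the policy in a table makes it easy to extend with further emotions without changing the selection logic.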

[0070] The camera unit can analyze the user's past behavior patterns during shooting and select an appropriate shooting angle or distance. For example, the camera unit can analyze footage previously shot by the user and automatically select the optimal shooting angle based on the angles the user has preferred in the past. The camera unit can likewise analyze that footage to set the optimal shooting distance. Furthermore, the camera unit can analyze footage previously shot by the user and suggest optimal shooting conditions. In this way, the camera unit can automatically set the optimal shooting conditions by analyzing the user's past behavior patterns. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.

[0071] The camera unit can automatically adjust shooting conditions based on the user's current environmental information during shooting. For example, if the user is outdoors, the camera unit adjusts the exposure according to the intensity of natural light. If the user is indoors, the camera unit adjusts the white balance according to the brightness of the indoor lighting. Furthermore, if the user is moving, the camera unit enables image stabilization. In this way, the camera unit can automatically set optimal shooting conditions by considering the user's current environmental information. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.
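
A minimal sketch of these environment-dependent adjustments follows. The `Environment` fields, the linear mapping from light level to exposure compensation, and the white balance preset names are placeholders chosen for illustration; they are not taken from the disclosure or from any camera API.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Environment:
    outdoors: bool
    light_level: float  # hypothetical normalized brightness in [0, 1]
    moving: bool


def choose_conditions(env: Environment) -> Dict[str, Union[float, str, bool]]:
    """Pick shooting conditions from the user's current environment."""
    settings: Dict[str, Union[float, str, bool]] = {
        "exposure_ev": 0.0,
        "white_balance": "auto",
        "stabilization": env.moving,  # enable stabilization while moving
    }
    if env.outdoors:
        # Brighter natural light -> lower exposure (assumed linear mapping
        # from light_level 0..1 to +1..-1 EV).
        settings["exposure_ev"] = round(1.0 - 2.0 * env.light_level, 2)
    else:
        # Indoor lighting brightness selects a placeholder preset.
        settings["white_balance"] = (
            "indoor_dim" if env.light_level < 0.5 else "indoor_bright"
        )
    return settings
```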

[0072] The camera unit can estimate the user's emotions and determine the priority of subjects to film based on the estimated emotions. For example, the camera unit can film the user's facial expressions and estimate their emotions using an emotion estimation algorithm. For instance, if the user is excited, the camera unit will prioritize filming subjects that interest them. If the user is relaxed, the camera unit can also prioritize filming calming subjects. Furthermore, if the user is tense, the camera unit can prioritize filming subjects that provide a sense of security. In this way, the camera unit can acquire more appropriate footage by determining the priority of subjects to film according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, with an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0073] The camera unit can prioritize photographing highly relevant subjects by considering the user's geographical location information during shooting. For example, if the user is in a tourist area, the camera unit will automatically detect tourist attractions and prioritize photographing them. If the user is at home, the camera unit can automatically detect everyday objects and prioritize photographing them. Furthermore, if the user is at work, the camera unit can automatically detect work-related objects and prioritize photographing them. In this way, the camera unit can prioritize photographing highly relevant subjects by considering the user's geographical location information. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.

[0074] The camera unit can analyze the user's social media activity during shooting and photograph relevant subjects. For example, the camera unit can analyze the content the user frequently posts on social media, automatically detect relevant subjects, and photograph them. The camera unit can also automatically detect and prioritize photographing subjects that the user's followers are likely to be interested in. Furthermore, the camera unit can analyze the content the user has previously "liked", automatically detect relevant subjects, and photograph them. In this way, the camera unit can prioritize photographing relevant subjects by analyzing the user's social media activity. Some or all of the above processing in the camera unit may be performed using AI, for example, or without using AI.

[0075] The analysis unit can estimate the user's emotions and adjust the accuracy of the analysis based on the estimated emotions. For example, the analysis unit can capture the user's facial expressions with a camera and estimate the emotions using an emotion estimation algorithm. For instance, if the user is relaxed, the analysis unit can perform a detailed analysis. The analysis unit can also perform a rapid analysis if the user is in a hurry. Furthermore, if the user is excited, the analysis unit can provide visually stimulating analysis results. In this way, the analysis unit can provide more appropriate analysis results by adjusting the accuracy of the analysis according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, using an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0076] The analysis unit can optimize its analysis algorithm by referring to past data of objects and actions within the video during analysis. For example, the analysis unit can optimize its analysis algorithm by referring to past data of objects within the video. The analysis unit can also optimize its analysis algorithm by referring to past data of actions within the video. Furthermore, the analysis unit can optimize its analysis algorithm by comprehensively referring to past data of both objects and actions within the video. In this way, the analysis unit can optimize its analysis algorithm by referring to past data and improve the accuracy of the analysis.

[0077] The analysis unit can apply different analysis methods depending on the category of objects and actions in the video during analysis. For example, the analysis unit recognizes that an object in the video is stationary and applies an analysis method specialized for stationary objects. Similarly, the analysis unit recognizes that an action in the video is dynamic and applies an analysis method specialized for dynamic actions. Furthermore, the analysis unit can select the optimal analysis method according to the category of objects and actions in the video. This allows the analysis unit to improve analysis accuracy by applying an analysis method appropriate to the category.

[0078] The analysis unit can estimate the user's emotions and adjust the display method of the analysis results based on the estimated emotions. For example, the analysis unit can capture the user's facial expressions with a camera and estimate the emotions using an emotion estimation algorithm. For instance, if the user is tense, the analysis unit can provide a simple and highly visible display method. If the user is relaxed, the analysis unit can also provide a display method that includes detailed information. Furthermore, if the user is in a hurry, the analysis unit can provide a display method that gets straight to the point. In this way, the analysis unit can provide more appropriate information by adjusting the display method of the analysis results according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, using an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0079] The analysis unit can determine the priority of analysis based on when the video was shot. For example, the analysis unit can automatically detect the most recent video and prioritize its analysis. The analysis unit can also automatically detect video shot within a specific period, or within a period specified by the user, and prioritize its analysis. In this way, the analysis unit can prioritize the analysis of the most recent information by determining the priority of analysis based on when the video was shot.
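
One way to express this prioritization is sketched below: clips inside an optional period come first, and within each group newer clips precede older ones. The `(name, shot_at)` tuple representation and the `start`/`end` parameters are assumptions for illustration. The sketch relies on the fact that Python's `sorted` is stable.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Clip = Tuple[str, datetime]  # (name, shot_at), hypothetical representation


def prioritize_videos(
    videos: List[Clip],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Clip]:
    """Order clips for analysis: in-period clips first, newest first within groups."""

    def in_period(shot_at: datetime) -> bool:
        return (start is None or shot_at >= start) and (end is None or shot_at <= end)

    # First sort newest-first, then stably move in-period clips to the front.
    newest_first = sorted(videos, key=lambda v: v[1], reverse=True)
    return sorted(newest_first, key=lambda v: not in_period(v[1]))
```

When no period is given, every clip counts as in-period and the result is simply newest-first, which corresponds to prioritizing the most recent video.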

[0080] The analysis unit can improve the accuracy of its analysis by referring to relevant literature on the video during the analysis. For example, the analysis unit can refer to literature related to objects in the video. The analysis unit can also refer to literature related to actions in the video. In addition, the analysis unit can comprehensively refer to literature related to both objects and actions in the video. In this way, the analysis unit can improve the accuracy of its analysis by referring to relevant literature.

[0081] The presentation unit can estimate the user's emotions and adjust the presentation style based on the estimated emotions. For example, the presentation unit can capture the user's facial expressions with a camera and estimate the emotions using an emotion estimation algorithm. For instance, if the user is tense, the presentation unit can provide a simple and highly visible presentation style. If the user is relaxed, the presentation unit can also provide a presentation style that includes detailed information. Furthermore, if the user is in a hurry, the presentation unit can provide a concise presentation style. In this way, the presentation unit can provide more appropriate information by adjusting the presentation style according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, with an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0082] The presentation unit can adjust the level of detail in its presentation based on the importance of the candidates. For example, it can present highly important candidates in detail. It can also present less important candidates concisely. Furthermore, the presentation unit can adjust the level of detail in stages according to importance. This allows the presentation unit to provide more appropriate information by adjusting the level of detail in its presentation based on the importance of the candidates.

[0083] The presentation unit can apply different presentation algorithms depending on the category of the candidate during presentation. For example, when presenting object candidates, the presentation unit applies an object-specific presentation algorithm. Similarly, when presenting action candidates, the presentation unit can apply an action-specific presentation algorithm. Furthermore, the presentation unit can select the optimal presentation algorithm depending on the category. This allows the presentation unit to improve the accuracy of presentations by applying a presentation algorithm appropriate to the category.

[0084] The presentation unit can estimate the user's emotions and adjust the length of the presentation based on the estimated emotions. For example, the presentation unit can capture the user's facial expressions with a camera and estimate the emotions using an emotion estimation algorithm. For example, if the user is in a hurry, the presentation unit will provide a short, concise presentation. If the user is relaxed, the presentation unit can also provide a longer presentation with detailed explanations. Furthermore, if the user is excited, the presentation unit can add visually stimulating effects to the presentation. In this way, the presentation unit can provide more appropriate information by adjusting the length of the presentation according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, using an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0085] The presentation unit can determine the priority of presentations based on the submission date of the candidates. For example, the presentation unit can automatically detect the most recent candidates and present them first. The presentation unit can also automatically detect and prioritize candidates submitted within a specific period. Furthermore, the presentation unit can automatically detect and prioritize candidates submitted within a period specified by the user. In this way, the presentation unit can present the most recent information first by determining the priority of presentations based on the submission date.
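
One way to read paragraph [0085] is as a two-level ordering: candidates inside an optional period come first, and within each group newer candidates come first. The sketch below makes that reading concrete; the `Candidate` type and the `start`/`end` parameters are assumptions made for illustration.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical candidate record with a submission timestamp."""
    label: str
    submitted_at: datetime


def prioritize_by_date(
    candidates: List[Candidate],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Candidate]:
    """Order candidates newest first, with those inside [start, end] ahead of the rest."""

    def in_window(c: Candidate) -> bool:
        after_start = start is None or c.submitted_at >= start
        before_end = end is None or c.submitted_at <= end
        return after_start and before_end

    # First sort by recency, then stably move in-window candidates to the front.
    newest_first = sorted(candidates, key=lambda c: c.submitted_at, reverse=True)
    return sorted(newest_first, key=lambda c: not in_window(c))
```

With no period given, every candidate counts as in-window, so the result is simply newest first.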

[0086] The presentation unit can adjust the order of presentation based on the relevance of the candidates. For example, the presentation unit can automatically detect highly relevant candidates and present them first. The presentation unit can also automatically detect less relevant candidates and postpone their presentation. Furthermore, the presentation unit can adjust the order of presentation in stages according to relevance. This allows the presentation unit to provide more appropriate information by adjusting the order of presentation based on relevance.

[0087] The conversion unit can estimate the user's emotions and adjust the conversion method based on the estimated emotions. For example, the conversion unit can capture the user's facial expressions with a camera and estimate the emotions using an emotion estimation algorithm. If the user is relaxed, the conversion unit can perform a detailed conversion. If the user is in a hurry, it can perform a rapid conversion. Furthermore, if the user is excited, the conversion unit can perform a visually stimulating conversion. In this way, the conversion unit can provide more appropriate conversion results by adjusting the conversion method according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, using an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0088] The conversion unit can optimize the conversion algorithm by referring to past conversion data of candidates during the conversion process. For example, the conversion unit can refer to past conversion data and select the optimal conversion algorithm. The conversion unit can also analyze past conversion data and optimize the conversion algorithm. Furthermore, the conversion unit can comprehensively refer to past conversion data and optimize the conversion algorithm. In this way, the conversion unit can optimize the conversion algorithm by referring to past conversion data and improve conversion accuracy.

[0089] The conversion unit can apply different conversion methods depending on the candidate category during conversion. For example, when converting an object candidate, the conversion unit applies an object-specific conversion method. Similarly, when converting an action candidate, the conversion unit can apply an action-specific conversion method. Furthermore, the conversion unit can select the optimal conversion method according to the category. This allows the conversion unit to improve conversion accuracy by applying a conversion method appropriate to the category.
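
Applying a category-specific conversion method, as in paragraph [0089], amounts to dispatching on the candidate's category. The following sketch assumes two categories, "object" and "action", and uses placeholder converters that only return a description string; a real system would call an image or comic generator instead. All function names here are hypothetical.

```python
from typing import Callable, Dict


def convert_object(label: str) -> str:
    """Placeholder for an object-specific conversion method."""
    return f"illustration of the object '{label}'"


def convert_action(label: str) -> str:
    """Placeholder for an action-specific conversion method."""
    return f"comic panel showing the action '{label}'"


# Registry mapping a candidate category to its conversion method.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "object": convert_object,
    "action": convert_action,
}


def convert(category: str, label: str) -> str:
    """Select the conversion method for the category and apply it."""
    try:
        converter = CONVERTERS[category]
    except KeyError:
        raise ValueError(f"no conversion method registered for category {category!r}") from None
    return converter(label)
```

The same registry pattern would apply to the category-specific presentation algorithms of paragraph [0083].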

[0090] The conversion unit can estimate the user's emotions and adjust the display method of the conversion results based on the estimated emotions. For example, the conversion unit can capture the user's facial expression with a camera and estimate the emotions using an emotion estimation algorithm. If the user is tense, the conversion unit can provide a simple and highly visible display method. If the user is relaxed, the conversion unit can provide a display method that includes detailed information. Furthermore, if the user is in a hurry, the conversion unit can provide a display method that gets straight to the point. In this way, the conversion unit can provide more appropriate information by adjusting the display method of the conversion results according to the user's emotions. Emotion estimation is achieved using an emotion estimation function, for example, using an emotion engine or generative AI. Generative AI includes, but is not limited to, text generation AI (e.g., LLM) or multimodal generation AI.

[0091] The conversion unit can determine the conversion priority based on when the candidates were submitted. For example, the conversion unit can automatically detect the most recent candidates and convert them preferentially. The conversion unit can also automatically detect candidates submitted within a specific period and convert them preferentially. Furthermore, the conversion unit can automatically detect candidates submitted within a period specified by the user and convert them preferentially. In this way, the conversion unit can prioritize the conversion of the latest information by determining the conversion priority based on the submission date.

[0092] The conversion unit can improve the accuracy of the conversion by referring to related literature for the candidate during the conversion process. The conversion unit can also improve the accuracy of the conversion by referring to past conversion data for the candidate. Furthermore, the conversion unit can improve the accuracy of the conversion by comprehensively referring to both related literature and past conversion data for the candidate. In this way, the conversion unit can improve the accuracy of the conversion by referring to related literature.

[0093] The system according to the embodiment is not limited to the example described above, and various modifications are possible, for example, as follows.

[0094] The communication support system can also be equipped with a speech recognition unit. The speech recognition unit recognizes the user's speech in real time and transmits it to the analysis unit. For example, if the user says, "Please give me some water," the speech recognition unit converts the speech into text and transmits it to the analysis unit. Based on the text transmitted from the speech recognition unit, the analysis unit can analyze the user's intent and present appropriate options. The speech recognition unit can also analyze the speed and tone of the user's speech and estimate the user's emotions. For example, if the user is excited, the analysis unit can quickly present options. In this way, the speech recognition unit can support communication by recognizing the user's speech in real time and transmitting it to the analysis unit.

[0095] The analysis unit can improve the accuracy of its analysis by referring to the user's past behavior data. For example, the analysis unit can more accurately recognize objects and actions in the current video based on data of objects and actions that the user has previously instructed. Furthermore, the analysis unit can learn the user's past behavior patterns and build a predictive model. For example, it can prioritize presenting objects and actions that the user has frequently instructed in the past as candidates. In this way, the analysis unit can improve the accuracy of its analysis and present more appropriate candidates by referring to the user's past behavior data.
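
Paragraph [0095] mentions prioritizing objects and actions that the user has frequently indicated in the past. A minimal sketch of that idea is a frequency count over a history log. It does not attempt the predictive model the paragraph also mentions, and the function name and the flat list-of-labels history format are assumptions.

```python
from collections import Counter
from typing import List


def rank_by_history(candidates: List[str], history: List[str]) -> List[str]:
    """Order candidates so that labels seen more often in the history come first.

    `history` is assumed to be a flat log of labels the user indicated in the past.
    """
    counts = Counter(history)
    # sorted() is stable, so candidates with equal counts keep their original order.
    return sorted(candidates, key=lambda label: -counts[label])
```

For example, with a history of `["book", "phone", "book"]`, the candidates `["cup", "book", "phone"]` are reordered so that "book" is presented first.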

[0096] The presentation unit can customize the presentation format based on the user's visual preferences. For example, if the user prefers illustrations, the presentation unit can present options in illustration format. Similarly, if the user prefers photographs, the presentation unit can present options in photographic format. Furthermore, if the user prefers comics, the presentation unit can present options in comic book format. This allows the presentation unit to support more effective communication by customizing the presentation format based on the user's visual preferences.

[0097] The camera unit can estimate the user's emotions and adjust the frame rate based on those emotions. For example, if the user is excited, the camera unit can shoot at a high frame rate to smoothly record fast-moving scenes. If the user is relaxed, it can shoot at a low frame rate to record quiet scenes in more detail. Furthermore, if the user is tense, the camera unit can shoot at a medium frame rate to provide a balanced image. In this way, the camera unit can obtain more appropriate footage by adjusting the frame rate according to the user's emotions.

[0098] The camera unit can automatically select the optimal shooting settings based on the user's past shooting data. For example, it can learn the shooting settings the user has preferred in the past and apply them to the current shoot. The camera unit can also analyze the user's past shooting data and automatically set the optimal shooting angle and distance. Furthermore, the camera unit can suggest optimal shooting conditions based on the user's past shooting data. As a result, by referring to the user's past shooting data, the camera unit can automatically select the optimal shooting settings and acquire more effective footage.

[0099] The camera unit can automatically adjust the exposure settings based on the user's current environmental information. For example, if the user is outdoors, the exposure can be adjusted according to the intensity of natural light. If the user is indoors, the exposure can be adjusted according to the brightness of the lighting. Furthermore, if the user is moving, the exposure can be automatically adjusted according to the movement. In this way, the camera unit can automatically adjust the optimal exposure settings by considering the user's current environmental information, enabling the acquisition of more appropriate images.

[0100] The camera unit can estimate the user's emotions and adjust the zoom level based on those emotions. For example, if the user is excited, the camera unit can zoom in to capture detailed footage. If the user is relaxed, it can zoom out to capture a wider view. Furthermore, if the user is tense, it can shoot at an appropriate zoom level to provide balanced footage. In this way, the camera unit can capture more appropriate footage by adjusting the zoom level according to the user's emotions.

[0101] The camera unit can overlay relevant information based on the user's geographical location. For example, if the user is in a tourist area, information about tourist attractions can be overlaid. If the user is at home, information about nearby shops and facilities can be overlaid. Furthermore, if the user is at work, work-related information can be overlaid. In this way, the camera unit can provide richer information by overlaying relevant information based on the user's geographical location.

[0102] The camera unit can automatically generate relevant hashtags based on the user's social media activity. For example, it can automatically generate relevant hashtags based on hashtags the user frequently uses on social media. It can also automatically generate hashtags that the user's followers are likely to be interested in. Furthermore, it can automatically generate relevant hashtags based on hashtags from posts the user has previously "liked". In this way, the camera unit can support more effective social media posts by automatically generating relevant hashtags based on the user's social media activity.

[0103] The analysis unit can estimate the user's emotions and filter the analysis results based on those emotions. For example, if the user is relaxed, it can provide detailed analysis results. If the user is in a hurry, it can filter and provide only the most important information. Furthermore, if the user is excited, it can provide visually stimulating analysis results. In this way, the analysis unit can provide more appropriate information by filtering the analysis results according to the user's emotions.

[0104] The following briefly describes the processing flow for example form 2.

[0105] Step 1: The camera unit captures the user's gestures and objects, for example, using a camera. These include, but are not limited to, hand movements, facial expressions, and specific objects.
Step 2: The analysis unit uses generative AI to analyze the video captured by the camera unit. The analysis unit recognizes objects and actions in the video, for example, using image recognition algorithms and motion analysis methods.
Step 3: The presentation unit presents candidates based on the results analyzed by the analysis unit. For example, the presentation unit presents a list of words or a selection of images.
Step 4: The conversion unit converts the candidates presented by the presentation unit into images, illustrations, or comics. The conversion unit uses, for example, image processing technology or illustration generation algorithms to convert the presented candidates into a visually easy-to-understand form.
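
The four steps above can be sketched as a pipeline in which each unit is a replaceable function. The sketch only shows how data flows from capture to conversion: the `Pipeline` class, its field types, and the stub functions are assumptions, and the stubs stand in for the camera, the AI-based analysis, and the image generation that the disclosure describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of the four units; each field is a pluggable function."""
    capture: Callable[[], bytes]                # Step 1: capture gestures and objects
    analyze: Callable[[bytes], List[str]]       # Step 2: recognize objects and actions
    present: Callable[[List[str]], List[str]]   # Step 3: choose candidates to present
    convert: Callable[[str], str]               # Step 4: convert a candidate to a visual

    def run(self) -> List[str]:
        video = self.capture()
        labels = self.analyze(video)
        candidates = self.present(labels)
        return [self.convert(candidate) for candidate in candidates]


# Stub units for illustration only; none of them performs real capture or inference.
pipeline = Pipeline(
    capture=lambda: b"raw-video-bytes",
    analyze=lambda video: ["water", "cup"],
    present=lambda labels: labels[:1],
    convert=lambda label: f"<illustration:{label}>",
)
```

Keeping each unit behind a function boundary matches the statement in the description that the units may be implemented on either the user-side device or the data processing device.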

[0106] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0107] The data generation model 58 is a form of so-called generative AI (Artificial Intelligence). An example of the data generation model 58 is ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>). Examples of generative AI include text generation AI, image generation AI, and multimodal generation AI. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images (e.g., still image data or video data). The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference result in one or more data formats among audio data, text data, and image data. The data generation model 58 includes, for example, text generation AI, image generation AI, and multimodal generation AI. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. The specific processing unit 290 performs the specific processing described above using the data generation model 58. The data generation model 58 may be a fine-tuned model, in which case it can output inference results from prompts that do not contain instructions. Multiple types of data generation model 58 exist in the data processing device 12 and elsewhere, and the data generation model 58 includes AI other than generative AI. AI other than generative AI includes, for example, linear regression, logistic regression, decision trees, random forests, support vector machines (SVMs), k-means clustering, convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), or naive Bayes, and can perform various processes, but is not limited to these examples. Also, the AI may be an AI agent. Furthermore, when the processing of each of the above units is performed by the AI, the processing may be performed by the AI in part or in whole, but is not limited to this example. Furthermore, processing performed by AI, including generative AI, may be replaced with rule-based processing, and rule-based processing may be replaced with processing performed by AI, including generative AI.

[0108] Furthermore, the processing performed by the data processing system 10 described above is carried out by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart device 14, but it may also be carried out by the specific processing unit 290 of the data processing device 12 and the control unit 46A of the smart device 14. In addition, the specific processing unit 290 of the data processing device 12 acquires or collects information necessary for processing from the smart device 14 or an external device, and the smart device 14 acquires or collects information necessary for processing from the data processing device 12 or an external device.

[0109] Each of the multiple elements described above, including the shooting unit, analysis unit, presentation unit, and conversion unit, is implemented in at least one of the smart device 14 and the data processing device 12. For example, the shooting unit is implemented by the camera 42 of the smart device 14 and captures the user's gestures or objects. The analysis unit is implemented by the specific processing unit 290 of the data processing device 12 and analyzes the captured images. The presentation unit is implemented by the display 40A of the smart device 14 and presents candidates based on the analysis results. The conversion unit is implemented by the specific processing unit 290 of the data processing device 12 and converts the presented candidates into images, illustrations, or comics. The correspondence between each unit and the device or control unit is not limited to the example described above and can be modified in various ways.

[0110] [Second Embodiment] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0111] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0112] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN and / or LAN.

[0113] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0114] The microphone 238 receives voice signals from the user and accepts instructions from the user. The microphone 238 captures the voice signals from the user, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0115] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, which captures images of the area around the user (for example, an imaging range defined by a field of view equivalent to the field of vision of a typical healthy person).

[0116] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0117] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing by the processor 28. The storage 32 stores the specific processing program 56.

[0118] The processor 28 reads a specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 acting as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0119] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290. The specific processing unit 290 can estimate the user's emotions using the emotion identification model 59 and perform the specific processing using the user's emotions. The emotion estimation function (emotion identification function) using the emotion identification model 59 performs various estimations and predictions regarding the user's emotions, including but not limited to these examples. Furthermore, emotion estimation and prediction also include, for example, emotion analysis.

[0120] In the smart glasses 214, specific processing is performed by the processor 46. The storage 50 stores a specific processing program 60. The processor 46 reads the specific processing program 60 from the storage 50 and executes the read specific processing program 60 on the RAM 48. The specific processing is realized by the processor 46 acting as a control unit 46A according to the specific processing program 60 executed on the RAM 48. The smart glasses 214 may also have a data generation model and an emotion identification model similar to the data generation model 58 and the emotion identification model 59, and can use these models to perform processing similar to that of the specific processing unit 290.

[0121] Furthermore, other devices besides the data processing device 12 may also have the data generation model 58. For example, a server device may have the data generation model 58. In this case, the data processing device 12 obtains processing results (such as prediction results) using the data generation model 58 by communicating with the server device that has the data generation model 58. Also, the data processing device 12 may be a server device or a terminal device owned by the user (for example, a mobile phone, robot, home appliance, etc.).

[0122] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0123] The data generation model 58 is a so-called generative AI. An example of the data generation model 58 is a generative AI such as ChatGPT. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images (e.g., still image data or video data). The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference result in one or more data formats such as audio data, text data, and image data. The data generation model 58 includes, for example, text generation AI, image generation AI, and multimodal generation AI. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. The specific processing unit 290 performs the specific processing described above using the data generation model 58. The data generation model 58 may be a fine-tuned model, in which case it can output inference results from prompts that do not contain instructions. Multiple types of data generation model 58 exist in the data processing device 12 and elsewhere, and the data generation model 58 includes AI other than generative AI. AI other than generative AI includes, for example, linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), k-means clustering, convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), or naive Bayes, and can perform various processes, but is not limited to these examples. Also, the AI may be an AI agent. Furthermore, when the processing of each unit described above is performed by the AI, the processing may be performed by the AI in part or in whole, but is not limited to this example. Also, processing performed by an AI including a generative AI may be replaced by rule-based processing, and rule-based processing may be replaced by processing performed by an AI including a generative AI.

[0124] The data processing system 210 according to the second embodiment performs the same processing as the data processing system 10 according to the first embodiment. The processing by the data processing system 210 is performed by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214, but it may also be performed by the specific processing unit 290 of the data processing device 12 and the control unit 46A of the smart glasses 214. In addition, the specific processing unit 290 of the data processing device 12 acquires or collects information necessary for processing from the smart glasses 214 or an external device, and the smart glasses 214 acquires or collects information necessary for processing from the data processing device 12 or an external device.

[0125] Each of the multiple elements described above, including the shooting unit, analysis unit, presentation unit, and conversion unit, is implemented in at least one of the smart glasses 214 and the data processing device 12. For example, the shooting unit is implemented by the camera 42 of the smart glasses 214 and captures the user's gestures or objects. The analysis unit is implemented by the specific processing unit 290 of the data processing device 12 and analyzes the captured images. The presentation unit is implemented by the display of the smart glasses 214 and presents candidates based on the analysis results. The conversion unit is implemented by the specific processing unit 290 of the data processing device 12 and converts the presented candidates into images, illustrations, or comics. The correspondence between each unit and the device or control unit is not limited to the example described above and can be modified in various ways.

[0126] [Third Embodiment] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0127] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0128] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN and / or LAN.

[0129] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0130] The microphone 238 receives voice signals from the user and accepts instructions from the user. The microphone 238 captures the voice signals from the user, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0131] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, which captures images of the area around the user (for example, an imaging range defined by a field of view equivalent to the field of vision of a typical healthy person).

[0132] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0133] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0134] The processor 28 reads a specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 acting as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0135] The storage 32 stores the data generation model 58 and the emotion identification model 59, both of which are used by the specific processing unit 290. The specific processing unit 290 can estimate the user's emotions using the emotion identification model 59 and perform the specific processing using the estimated emotions. The emotion estimation function (emotion identification function) based on the emotion identification model 59 performs various estimations and predictions regarding the user's emotions, but is not limited to these examples. Emotion estimation and prediction also include, for example, emotion analysis.

[0136] In the headset terminal 314, specific processing is performed by the processor 46. The storage 50 stores a specific program 60. The processor 46 reads the specific program 60 from the storage 50 and executes it on the RAM 48. The specific processing is realized by the processor 46 operating as a control unit 46A according to the specific program 60 executed on the RAM 48. The headset terminal 314 also has a data generation model and an emotion identification model similar to the data generation model 58 and the emotion identification model 59, and can use these models to perform processing similar to that of the specific processing unit 290.

[0137] Furthermore, other devices besides the data processing device 12 may also have the data generation model 58. For example, a server device may have the data generation model 58. In this case, the data processing device 12 obtains processing results (such as prediction results) using the data generation model 58 by communicating with the server device that has the data generation model 58. Also, the data processing device 12 may be a server device or a terminal device owned by the user (for example, a mobile phone, robot, home appliance, etc.).

[0138] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and the display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0139] The data generation model 58 is a so-called generative AI. An example of the data generation model 58 is a generative AI such as ChatGPT. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images (e.g., still image data or video data). The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference result in one or more data formats such as audio data, text data, and image data. The data generation model 58 includes, for example, text generation AI, image generation AI, and multimodal generation AI. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. The specific processing unit 290 performs the specific processing described above using the data generation model 58. The data generation model 58 may be a fine-tuned model, in which case it can output inference results from prompts that do not contain instructions. In the data processing device 12 and the like, there are multiple types of data generation models 58, and the data generation model 58 includes AI other than generative AI. AI other than generative AI includes, for example, linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), k-means clustering, convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), or naive Bayes, and can perform various processes, but is not limited to these examples. The AI may also be an AI agent. Furthermore, when the processing of each unit described above is performed by AI, that processing may be performed by AI in part or in whole, but is not limited to this example. Processing performed by AI, including generative AI, may be replaced by rule-based processing, and rule-based processing may be replaced by processing performed by AI, including generative AI.
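The interchangeability of AI-based and rule-based processing described above can be sketched as two backends behind one interface. This is a minimal illustration, not the disclosed implementation: the names `InferenceRequest`, `Backend`, `RuleBasedBackend`, `StubGenerativeBackend`, and `run_inference` are all hypothetical, and the generative backend is a placeholder that calls no real model.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class InferenceRequest:
    """A prompt plus optional inference data (all field names are assumptions)."""
    prompt: str                      # instruction text; may be empty for a fine-tuned model
    text: Optional[str] = None       # text data
    audio: Optional[bytes] = None    # audio data representing speech
    image: Optional[bytes] = None    # still image or video data


class Backend(Protocol):
    def infer(self, request: InferenceRequest) -> str: ...


class RuleBasedBackend:
    """Hypothetical rule-based stand-in: keyword lookup on the text input."""

    def __init__(self, rules: dict[str, str]) -> None:
        self.rules = rules

    def infer(self, request: InferenceRequest) -> str:
        text = (request.text or "").lower()
        for keyword, label in self.rules.items():
            if keyword in text:
                return label
        return "unknown"


class StubGenerativeBackend:
    """Placeholder for a generative model; a real one would call a model here."""

    def infer(self, request: InferenceRequest) -> str:
        return f"[generated for prompt: {request.prompt!r}]"


def run_inference(backend: Backend, request: InferenceRequest) -> str:
    # The caller does not depend on which kind of backend produced the result.
    return backend.infer(request)
```

Because both classes satisfy the same `Backend` protocol, swapping one for the other requires no change in the calling code, which is the property the paragraph describes.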

[0140] The data processing system 310 according to the third embodiment performs the same processing as the data processing system 10 according to the first embodiment. The processing by the data processing system 310 is performed by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the headset terminal 314, but may also be performed by the specific processing unit 290 of the data processing device 12 and the control unit 46A of the headset terminal 314. In addition, the specific processing unit 290 of the data processing device 12 acquires or collects information necessary for processing from the headset terminal 314 or an external device, and the headset terminal 314 acquires or collects information necessary for processing from the data processing device 12 or an external device.

[0141] Each of the elements described above, including the imaging unit, analysis unit, presentation unit, and conversion unit, is implemented in at least one of the headset terminal 314 and the data processing device 12. For example, the imaging unit is implemented by the camera 42 of the headset terminal 314 and captures the user's gestures or objects. The analysis unit is implemented by the specific processing unit 290 of the data processing device 12 and analyzes the captured video. The presentation unit is implemented by the display 343 of the headset terminal 314 and presents candidates based on the analysis results. The conversion unit is implemented by the specific processing unit 290 of the data processing device 12 and converts the presented candidates into images, illustrations, or comics. The correspondence between each unit and the device or control unit is not limited to the example described above and can be modified in various ways.
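The four units can be read as stages of a pipeline. The sketch below is only an illustration under stated assumptions: `Frame`, `capture`, `analyze`, `present`, `convert`, and `run_pipeline` are hypothetical names, the analysis is a keyword lookup rather than AI, and the conversion returns a text placeholder instead of image data. It also assumes the user picks one candidate before conversion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Frame:
    """Stand-in for captured video; a real system would hold pixel data."""
    description: str


def capture(camera_read: Callable[[], Frame]) -> Frame:
    # Imaging unit (the camera 42 of the terminal in the example above).
    return camera_read()


def analyze(frame: Frame, vocabulary: dict[str, list[str]]) -> list[str]:
    # Analysis unit (the specific processing unit 290 in the example above).
    # Hypothetical stub: look up candidate words by keywords found in the frame.
    candidates: list[str] = []
    for keyword, words in vocabulary.items():
        if keyword in frame.description:
            candidates.extend(words)
    return candidates


def present(candidates: list[str], choose: Callable[[list[str]], int]) -> str:
    # Presentation unit (the display 343): show candidates, take the user's pick.
    return candidates[choose(candidates)]


def convert(candidate: str, style: str = "illustration") -> str:
    # Conversion unit: a real system would return image data, not a string.
    return f"<{style} of '{candidate}'>"


def run_pipeline(camera_read, vocabulary, choose, style="illustration") -> str:
    frame = capture(camera_read)
    candidates = analyze(frame, vocabulary)
    if not candidates:
        return "<no candidates>"
    return convert(present(candidates, choose), style)
```

Each stage is an ordinary function, so any of them could run on the terminal or on the data processing device, in line with the statement that the correspondence between units and devices can be modified.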

[0142] [Fourth Embodiment] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0143] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0144] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN and/or LAN.

[0145] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0146] The microphone 238 receives voice signals from the user and accepts instructions from the user. The microphone 238 captures the voice signals from the user, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0147] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS image sensor or CCD image sensor, which captures images of the area around the user (for example, an imaging range defined by a field of view equivalent to the field of vision of a typical healthy person).

[0148] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0149] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. The robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
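One way to picture the control of the eye LEDs and limb motors is a lookup from an emotion to actuator targets. The table below is purely hypothetical: the source gives no colors, angles, or emotion names for this purpose, and `Expression`, `EXPRESSIONS`, `NEUTRAL`, and `express` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    led_rgb: tuple[int, int, int]   # eye LED color
    arm_angle_deg: float            # target angle for the arm motors


# Hypothetical emotion-to-actuator table; the source gives no concrete values.
EXPRESSIONS: dict[str, Expression] = {
    "joy": Expression(led_rgb=(255, 200, 0), arm_angle_deg=60.0),
    "sadness": Expression(led_rgb=(0, 0, 255), arm_angle_deg=-20.0),
    "calm": Expression(led_rgb=(0, 255, 128), arm_angle_deg=0.0),
}
NEUTRAL = Expression(led_rgb=(255, 255, 255), arm_angle_deg=0.0)


def express(emotion: str) -> Expression:
    """Return actuator targets for an emotion, falling back to neutral."""
    return EXPRESSIONS.get(emotion, NEUTRAL)
```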

[0150] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0151] The processor 28 reads a specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 acting as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0152] The storage 32 stores the data generation model 58 and the emotion identification model 59, both of which are used by the specific processing unit 290. The specific processing unit 290 can estimate the user's emotions using the emotion identification model 59 and perform the specific processing using the estimated emotions. The emotion estimation function (emotion identification function) based on the emotion identification model 59 performs various estimations and predictions regarding the user's emotions, but is not limited to these examples. Emotion estimation and prediction also include, for example, emotion analysis.

[0153] In the robot 414, specific processing is performed by the processor 46. The storage 50 stores a specific program 60. The processor 46 reads the specific program 60 from the storage 50 and executes it on the RAM 48. The specific processing is realized by the processor 46 operating as a control unit 46A according to the specific program 60 executed on the RAM 48. The robot 414 also has a data generation model and an emotion identification model similar to the data generation model 58 and the emotion identification model 59, and can use these models to perform processing similar to that of the specific processing unit 290.

[0154] Furthermore, other devices besides the data processing device 12 may also have the data generation model 58. For example, a server device may have the data generation model 58. In this case, the data processing device 12 obtains processing results (such as prediction results) using the data generation model 58 by communicating with the server device that has the data generation model 58. Also, the data processing device 12 may be a server device or a terminal device owned by the user (for example, a mobile phone, robot, home appliance, etc.).

[0155] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0156] The data generation model 58 is a so-called generative AI. An example of the data generation model 58 is a generative AI such as ChatGPT. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images (e.g., still image data or video data). The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference result in one or more data formats such as audio data, text data, and image data. The data generation model 58 includes, for example, text generation AI, image generation AI, and multimodal generation AI. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. The specific processing unit 290 performs the specific processing described above using the data generation model 58. The data generation model 58 may be a fine-tuned model, in which case it can output inference results from prompts that do not contain instructions. In the data processing device 12 and the like, there are multiple types of data generation models 58, and the data generation model 58 includes AI other than generative AI. AI other than generative AI includes, for example, linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), k-means clustering, convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), or naive Bayes, and can perform various processes, but is not limited to these examples. The AI may also be an AI agent. Furthermore, when the processing of each unit described above is performed by AI, that processing may be performed by AI in part or in whole, but is not limited to this example. Processing performed by AI, including generative AI, may be replaced by rule-based processing, and rule-based processing may be replaced by processing performed by AI, including generative AI.

[0157] The data processing system 410 according to the fourth embodiment performs the same processing as the data processing system 10 according to the first embodiment. The processing by the data processing system 410 is performed by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the robot 414, but it may also be performed by the specific processing unit 290 of the data processing device 12 and the control unit 46A of the robot 414. In addition, the specific processing unit 290 of the data processing device 12 acquires or collects information necessary for processing from the robot 414 or an external device, and the robot 414 acquires or collects information necessary for processing from the data processing device 12 or an external device.

[0158] Each of the elements described above, including the imaging unit, analysis unit, presentation unit, and conversion unit, is implemented in at least one of the robot 414 and the data processing device 12. For example, the imaging unit is implemented by the camera 42 of the robot 414 and captures the user's gestures or objects. The analysis unit is implemented by the specific processing unit 290 of the data processing device 12 and analyzes the captured video. The presentation unit is implemented by the display device included in the controlled object 443 of the robot 414 and presents candidates based on the analysis results. The conversion unit is implemented by the specific processing unit 290 of the data processing device 12 and converts the presented candidates into images, illustrations, or comics. The correspondence between each unit and the device or control unit is not limited to the example described above and can be modified in various ways.

[0159] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0160] Figure 9 shows the emotion map 400, in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
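The layout described above lends itself to polar coordinates: an angle around the circle and a ring index counted from the center. The sketch below assumes that convention (0 degrees at 3 o'clock, counterclockwise, ring 0 innermost); `MappedEmotion`, its fields, and `distance` are hypothetical names, and no emotion placements are taken from Figure 9.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedEmotion:
    name: str
    angle_deg: float   # 0 = 3 o'clock, 90 = 12 o'clock (counterclockwise)
    ring: int          # 0 = innermost (most primitive), larger = further out

    def xy(self) -> tuple[float, float]:
        r = self.ring + 1
        a = math.radians(self.angle_deg)
        return (r * math.cos(a), r * math.sin(a))

    def side(self) -> str:
        # Left half: reaction-driven; right half: situation-driven;
        # straight above or below the center: both.
        x, _ = self.xy()
        if abs(x) < 1e-9:
            return "both"
        return "situation" if x > 0 else "reaction"

    def valence(self) -> str:
        # Upper side: pleasure; lower side: displeasure.
        _, y = self.xy()
        if abs(y) < 1e-9:
            return "neutral"
        return "pleasure" if y > 0 else "displeasure"


def distance(a: MappedEmotion, b: MappedEmotion) -> float:
    """Euclidean distance on the map; small values mean likely co-occurrence."""
    (ax, ay), (bx, by) = a.xy(), b.xy()
    return math.hypot(ax - bx, ay - by)
```

With this representation, "emotions that are likely to occur simultaneously are mapped close together" becomes a statement about small values of `distance`.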

[0161] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0162] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further an emotion lies toward the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).

[0163] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, and motorcycles, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated based, for example, on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
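The balance rule stated above (moving away from the ideal gives discomfort, moving toward it gives pleasure) can be written as a comparison of two deviations. This is a minimal sketch for a single scalar balance such as battery level; the function name `valence_from_balance` and the "neutral" case for no change are assumptions.

```python
def valence_from_balance(previous: float, current: float, ideal: float) -> str:
    """Classify a change in one balance value (e.g. battery level).

    Moving toward the ideal reads as pleasure, moving away as discomfort.
    """
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"
```

For example, with an ideal battery level of 1.0, charging from 0.4 to 0.7 moves toward the ideal, while draining from 0.7 to 0.4 moves away from it.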

[0164] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0165] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
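One possible way to obtain training targets in which nearby emotions have similar values is to let the target fall off with map distance from the labeled emotion. The source does not say how the targets are built; the Gaussian falloff, the `sigma` parameter, and the names `soft_targets` and `determine_emotion` are assumptions made for this sketch, and choosing the highest value as the determined emotion is likewise an assumption.

```python
import math


def soft_targets(
    positions: dict[str, tuple[float, float]],
    labeled: str,
    sigma: float = 1.0,
) -> dict[str, float]:
    """Build target emotion values that fall off with map distance.

    The labeled emotion gets 1.0; emotions near it on the map get values
    close to 1.0, and distant ones approach 0.0 (Gaussian falloff).
    """
    lx, ly = positions[labeled]
    targets = {}
    for name, (x, y) in positions.items():
        d2 = (x - lx) ** 2 + (y - ly) ** 2
        targets[name] = math.exp(-d2 / (2.0 * sigma ** 2))
    return targets


def determine_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value."""
    return max(values, key=values.get)
```

A network trained against such targets would tend to output similar values for emotions such as "reassured," "calm," and "confident" whenever they sit close together on the map.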

[0166] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing method for the specific process may be used, which includes computer 22 and multiple other computers.

[0167] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0168] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0169] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0170] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0171] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0172] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0173] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0174] Furthermore, although the above-described examples were divided into four embodiments, some or all of these embodiments may be combined. Also, the smart device 14, smart glasses 214, headset terminal 314, and robot 414 are just examples, and they may be combined, or other devices may be used. Also, although the above-described examples were divided into two embodiments, Embodiment 1 and Embodiment 2, these may be combined.

[0175] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and other things that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0176] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0177]
(Note 1) A system comprising: an imaging unit that captures the user's gestures or objects; an analysis unit that analyzes the video captured by the imaging unit; a presentation unit that presents candidates based on the analysis results; and a conversion unit that converts the candidates presented by the presentation unit into images, illustrations, or comics.
(Note 2) The system according to Note 1, wherein the analysis unit recognizes objects or actions in the video and presents corresponding words as candidates.
(Note 3) The system according to Note 1, wherein the presentation unit converts the presented candidates into images, illustrations, or comics.
(Note 4) The system according to Note 1, wherein the imaging unit estimates the user's emotions and adjusts the timing of capture based on the estimated emotions.
(Note 5) The system according to Note 1, wherein the imaging unit, during capture, analyzes the user's past behavior patterns to select an appropriate shooting angle or distance.
(Note 6) The system according to Note 1, wherein the imaging unit, during capture, automatically adjusts the capture conditions based on the user's current environmental information.
(Note 7) The system according to Note 1, wherein the imaging unit estimates the user's emotions and determines the priority of subjects to capture based on the estimated emotions.
(Note 8) The system according to Note 1, wherein the imaging unit, during capture, prioritizes capturing highly relevant subjects by taking the user's geographical location into account.
(Note 9) The system according to Note 1, wherein the imaging unit, during capture, analyzes the user's social media activity and captures relevant subjects.
(Note 10) The system according to Note 1, wherein the analysis unit estimates the user's emotions and adjusts the accuracy of the analysis based on the estimated emotions.
(Note 11) The system according to Note 1, wherein the analysis unit, during analysis, optimizes the analysis algorithm by referring to past data on objects and actions within the video.
(Note 12) The system according to Note 1, wherein the analysis unit, during analysis, applies different analysis methods depending on the category of the objects and actions in the video.
(Note 13) The system according to Note 1, wherein the analysis unit estimates the user's emotions and adjusts how the analysis results are displayed based on the estimated emotions.
(Note 14) The system according to Note 1, wherein the analysis unit, during analysis, determines the priority of the analysis based on when the video was captured.
(Note 15) The system according to Note 1, wherein the analysis unit, during analysis, improves the accuracy of the analysis by referring to literature related to the video.
(Note 16) The system according to Note 1, wherein the presentation unit estimates the user's emotions and adjusts the presentation method based on the estimated emotions.
(Note 17) The system according to Note 1, wherein the presentation unit, when presenting, adjusts the level of detail based on the importance of the candidates.
(Note 18) The system according to Note 1, wherein the presentation unit, when presenting, applies different presentation algorithms depending on the candidate category.
(Note 19) The system according to Note 1, wherein the presentation unit estimates the user's emotions and adjusts the length of the presentation based on the estimated emotions.
(Note 20) The system according to Note 1, wherein the presentation unit, when presenting, determines the priority of presentation based on when the candidates were submitted.
(Note 21) The system according to Note 1, wherein the presentation unit, when presenting, adjusts the order of presentation based on the relevance of the candidates.
(Note 22) The system according to Note 1, wherein the conversion unit estimates the user's emotions and adjusts the conversion method based on the estimated emotions.
(Note 23) The system according to Note 1, wherein the conversion unit, during conversion, optimizes the conversion algorithm by referring to past conversion data of the candidates.
(Note 24) The system according to Note 1, wherein the conversion unit, during conversion, applies different conversion methods depending on the candidate category.
(Note 25) The system according to Note 1, wherein the conversion unit estimates the user's emotions and adjusts how the conversion results are displayed based on the estimated emotions.
(Note 26) The system according to Note 1, wherein the conversion unit, during conversion, determines the conversion priority based on when the candidates were submitted.
(Note 27) The system according to Note 1, wherein the conversion unit, during conversion, improves the accuracy of the conversion by referring to literature related to the candidates.

[Explanation of Symbols]

[0178]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
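
As an illustration only, the location-based prioritization described in Note 8 and in claims 1 and 7 could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in this publication: the category labels, the `Subject` type, the `PRIORITY_BY_LOCATION` table, and the `prioritize_subjects` function are all hypothetical names introduced here, and the sketch assumes that subjects have already been detected and that the location has already been classified.

```python
from dataclasses import dataclass

# Hypothetical mapping from a location category to the subject categories
# the imaging unit would prioritize. All labels are assumptions made for
# this sketch; the publication does not define them.
PRIORITY_BY_LOCATION = {
    "tourist_destination": ["tourist_attraction"],
    "home": ["everyday_object"],
    "workplace": ["work_item"],
}


@dataclass
class Subject:
    name: str      # label of a detected subject
    category: str  # assumed category assigned by an upstream detector


def prioritize_subjects(subjects, location_category):
    """Return the subjects with location-relevant categories first.

    Subjects whose category matches the location come first; all others
    follow. Within each group the original detection order is kept.
    """
    preferred = PRIORITY_BY_LOCATION.get(location_category, [])
    # sorted() is stable, and False sorts before True, so preferred
    # subjects move to the front without reordering the rest.
    return sorted(subjects, key=lambda s: s.category not in preferred)


if __name__ == "__main__":
    detected = [
        Subject("mug", "everyday_object"),
        Subject("castle", "tourist_attraction"),
        Subject("laptop", "work_item"),
    ]
    for subject in prioritize_subjects(detected, "tourist_destination"):
        print(subject.name)
```

A lookup table is used here so that a further location category could be added without changing the sorting logic; an unrecognized location simply leaves the detected order unchanged.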

Claims

1. A system comprising: an imaging unit that captures a user's gestures or objects; an analysis unit that analyzes video captured by the imaging unit; a presentation unit that presents candidates based on a result of the analysis; and a conversion unit that uses generative AI to generate an image, illustration, or comic representing a candidate selected from the candidates presented by the presentation unit, wherein the imaging unit estimates the user's emotion by capturing the user's facial expression before capturing the user's gestures or objects, and adjusts the timing of capturing the user's gestures or objects based on the estimated emotion, and wherein the imaging unit further prioritizes capturing tourist attractions if geographical location information indicates a tourist destination, and prioritizes capturing everyday objects located within the user's home if the geographical location information indicates the user's home.

2. The system according to claim 1, wherein the analysis unit recognizes objects or actions within the video and presents suggested words corresponding to the recognized objects or actions.

3. The system according to claim 1, wherein, if the estimated emotion is tension, the imaging unit delays the timing of capture until the user is relaxed.

4. The system according to claim 1, wherein, during capture, the imaging unit analyzes the user's past behavior patterns to select an appropriate capture angle or distance.

5. The system according to claim 1, wherein, during capture, the imaging unit automatically adjusts capture conditions based on the user's current environmental information.

6. The system according to claim 1, wherein, before capturing the user's gestures or objects, the imaging unit determines the priority of subjects to be captured based on the emotion estimated by capturing the user's facial expression.

7. The system according to claim 1, wherein, if the geographical location information indicates a workplace, the imaging unit prioritizes capturing work-related items.