system

JP2026105507A | Pending | Publication Date: 2026-06-26 | SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

Application Information

IPC: G06F40/56; G06F40/40; G06F40/58; G06F40/30

Abstract

[Problem] To provide a system. [Solution] A system including: an input receiving means for receiving natural language information, in voice or text form, entered by a user; a data analysis means for analyzing the format of the received natural language information; a context analysis means for understanding the context based on the analyzed information and grasping cultural backgrounds and unique meanings; a translation generation means for generating a translation and a supplementary explanation based on the analysis results; a data output means for presenting the translation and the supplementary explanation; an audio output means for outputting the translation and the supplementary explanation as audio; and a means for converting audio information into text.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] The problem to be solved by the present invention is to provide an understanding that takes into account context and background information, rather than just a language translation, for foreigners who have difficulty understanding Japanese-specific expressions and cultural backgrounds. In particular, there is a need to provide an environment that avoids misunderstandings between different cultures and realizes smooth communication.

Means for Solving the Problems

[0005] To solve the above problems, the present invention receives natural language data input by the user, analyzes the format of the data, and provides a context analysis means that understands the context and grasps the cultural background and unique meaning. Based on the analysis results, it generates and presents a translation and a supplementary explanation that are easy for foreigners to understand. Furthermore, a feedback collection means allows continuous performance improvement.

[0006] A "user" is an entity that provides input to a system and receives the results.

[0007] "Natural language data" refers to data in everyday language formats, such as sentences and phrases entered by users.

[0008] An "input receiving means" is a function that receives natural language data entered by the user and passes it on to subsequent processing.

[0009] A "data analysis means" is a function that understands the structure and format of received natural language data and processes the data to adapt it to the next processing stage.

[0010] A "context analysis means" is a function that takes into account context and background information to gain a deeper understanding of the meaning of natural language data.

[0011] A "translation generation means" is a function for generating translations and their supplementary explanations based on analyzed data.

[0012] "Data output means" refers to a function for presenting the generated translation and supplementary explanations to the user.

[0013] A "feedback collection means" is a function that collects feedback from users regarding their satisfaction with and understanding of the output for their inputs, and uses this feedback to improve the system.

[0014] "Encryption means" refers to a function that encrypts data in order to securely transfer input data to a server.

Brief Description of the Drawings

[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Embodiments for Carrying out the Invention

[0016] An example of an embodiment of the system according to the technology of the present disclosure will be described below with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be one arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0020] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] One embodiment of the present invention begins with the user inputting a Japanese sentence or phrase to be translated into the terminal. The terminal converts the input natural language data into a digital format and securely transmits this data to the server using encryption. The server analyzes the received data using the data analysis means and extracts basic information for understanding the context.

[0037] Subsequently, the server uses the context analysis means to gain a deep understanding of the natural language data, taking into account the meaning of words and their cultural context. Based on this analysis, the server's translation generation means generates an appropriate translation and its supplementary explanation. The generated translation and explanation are then transmitted from the server to the terminal via the data output means and presented to the user.

[0038] As a concrete example, consider a case where a user inputs the phrase "Gokurosama" (Thank you for your hard work). This phrase is generally used by superiors to subordinates, but a direct translation would not fully convey its nuance. The server uses context analysis to recognize this difference and simultaneously provides the user with the translation and explanation: "Thank you for your effort, typically said to someone of lesser status." In this way, the present invention goes beyond mere translation of words and phrases, and can present information that takes into account the underlying meaning and cultural context.
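The pairing of a translation with a supplementary explanation can be sketched as a small data structure. The following is a minimal illustration only: the names `TranslationResult`, `CULTURAL_NOTES`, `translate_with_context`, and `render` are hypothetical, and the lookup table is a toy stand-in for the context analysis means, which the specification does not describe at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    source: str        # phrase as entered by the user
    translation: str   # translated text
    explanation: str   # supplementary cultural note (empty if none)


# Hypothetical stand-in for the context analysis means: a tiny lookup table.
CULTURAL_NOTES = {
    "gokurosama": (
        "Thank you for your effort",
        "typically said to someone of lesser status",
    ),
}


def translate_with_context(phrase: str) -> TranslationResult:
    """Return a translation together with its cultural note, if one is known."""
    key = phrase.strip().lower()
    if key in CULTURAL_NOTES:
        translation, note = CULTURAL_NOTES[key]
        return TranslationResult(phrase, translation, note)
    # Unknown phrase: return it unchanged with no note rather than guess.
    return TranslationResult(phrase, phrase, "")


def render(result: TranslationResult) -> str:
    """Format the translation and explanation for simultaneous presentation."""
    if result.explanation:
        return f"{result.translation}, {result.explanation}."
    return result.translation


print(render(translate_with_context("Gokurosama")))
```

Keeping the explanation as a separate field lets the terminal show it next to the translation or omit it when no cultural note applies.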

[0039] Furthermore, users can send feedback on the provided translations and explanations to the server via their device. The server utilizes feedback collection methods to accumulate this information and use it to improve the model, thereby enhancing the accuracy of translations and the quality of understanding. In this way, a system is realized that can overcome language barriers between cultures and support smooth communication.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The user inputs Japanese sentences or phrases that need translation or understanding into the terminal. The terminal confirms this input and prepares the system to begin analysis.

[0043] Step 2:

[0044] The terminal converts the received natural language data into packet format and encrypts it for secure transmission to the server. The converted data is then sent to the server via the network.

[0045] Step 3:

[0046] The server receives data from the terminal. After verifying the integrity of the received data, it preprocesses the natural language data using the data analysis means. Here, the sentence structure is analyzed, and unnecessary spaces and grammatical errors are corrected as needed.

[0047] Step 4:

[0048] The context analysis means within the server uses the preprocessed data to gain a deep understanding of the text's context. This includes grasping the meaning of words within the text and their underlying cultural background. In this process, the server consults an internal database and draws on similar past cases to perform semantic analysis.

[0049] Step 5:

[0050] The server generates a translation using the translation generation means based on the results of the context analysis. The translation goes beyond a simple literal rendering and includes a supplementary explanation that is easy for foreign users to understand.

[0051] Step 6:

[0052] The generated translation and its explanation are sent from the server to the terminal. The terminal decodes this data upon receipt and displays it in a user-readable format.

[0053] Step 7:

[0054] Regarding the translation results, users can send feedback to the server via their device based on their level of understanding and satisfaction. The feedback collection system stores this information and uses it to improve the system and increase accuracy in the future.
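Steps 2 and 3 above (conversion to packet format, verification on receipt, and whitespace cleanup) can be sketched as follows. This is a minimal sketch under stated assumptions: the specification does not define the packet layout or the verification method, so the JSON envelope, the SHA-256 digest, and the function names `build_packet` and `receive_packet` are all hypothetical. Encryption is deliberately left out of this sketch.

```python
import hashlib
import json
import re


def build_packet(text: str) -> bytes:
    """Terminal side: wrap the text in a JSON packet with a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    packet = {"text": text, "sha256": digest}
    return json.dumps(packet, ensure_ascii=False).encode("utf-8")


def receive_packet(raw: bytes) -> str:
    """Server side: verify the digest, then normalize whitespace."""
    packet = json.loads(raw.decode("utf-8"))
    text = packet["text"]
    if hashlib.sha256(text.encode("utf-8")).hexdigest() != packet["sha256"]:
        raise ValueError("packet failed integrity check")
    # Preprocessing: collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(receive_packet(build_packet("  Gokurosama   desu ")))
```

The digest only detects accidental corruption; it is not a substitute for the encryption and secure transport the steps call for.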

[0055] (Example 1)

[0056] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0057] In today's world, with the increasing frequency of intercultural communication, natural language translation is required to go beyond mere word-for-word conversion and encompass a deeper understanding that includes context and cultural background. Furthermore, there is a need for mechanisms to effectively utilize user feedback to improve translation accuracy, and for means of securely exchanging data while protecting personal information.

[0058] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0059] In this invention, the server includes an input receiving means that receives natural language information entered by a user and converts it into a digital format, an encryption means that encrypts the received natural language information to maintain confidentiality, and a context analysis means that understands the context from the analyzed information and grasps the cultural background and specific meanings. This allows the user to securely transmit natural language information to the server and receive an appropriate translation and supplementary explanation that takes cultural background into account.

[0060] An "input receiving means" is a mechanism for receiving natural language information entered by a user into a terminal.

[0061] "Encryption means" refers to a technology that encrypts received natural language information in order to maintain confidentiality.

[0062] "Communication means" refers to protocols and technologies for securely transmitting encrypted natural language information to a server.

[0063] "Data analysis means" refers to techniques or tools for analyzing received natural language information to understand the structure and basic meaning of a text.

[0064] "Context analysis means" refers to a technology that performs processing to understand the background, context, and cultural meaning of analyzed information.

[0065] "Translation generation means" refers to a technology that generates translations based on the analysis results and adds supplementary cultural explanations using a generative AI.

[0066] "Data output means" refers to means for displaying the generated translation and supplementary explanations to the user.

[0067] "Feedback collection means" refers to a technology used to collect user feedback on translations and explanations and use it to improve the system.

[0068] A "communication protocol" is a set of rules and procedures used to securely transmit encrypted information to a server.

[0069] This system accurately translates natural language information entered by users and provides supplementary explanations that take into account the cultural background and context. Users input Japanese sentences or phrases they wish to translate using a terminal. The terminal converts this input information into a digital format and encrypts it using encryption methods, thereby maintaining the confidentiality of the information.

[0070] The terminal sends encrypted information to the server using a secure communication protocol. For example, the security of the information is ensured by using the HTTPS protocol. The receiving server analyzes the input information using data analysis tools to understand its grammar and basic meaning.
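The HTTPS transfer mentioned above can be illustrated with the Python standard library. This is a sketch under assumptions: the endpoint URL, the content type, and the helper name `build_request` are hypothetical, since the specification names only the HTTPS protocol. The request is constructed but not sent.

```python
import urllib.request
from urllib.parse import urlparse

# Hypothetical endpoint; the specification does not name one.
ENDPOINT = "https://translation.example.com/api/translate"


def build_request(payload: bytes, url: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request carrying the encrypted payload, HTTPS only."""
    if urlparse(url).scheme != "https":
        raise ValueError("refusing to send over a non-HTTPS URL")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


request = build_request(b"<encrypted bytes>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the transfer; rejecting non-HTTPS URLs up front keeps a misconfigured endpoint from silently downgrading the connection.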

[0071] Next, the server uses the context analysis means to gain a deep understanding of the information's context. Specifically, it uses generative AI models (such as natural language processing models) to analyze the cultural background and specific meanings of the input phrases. Based on this analysis, the translation generation means generates the optimal translation and adds supplementary explanations as needed.

[0072] Subsequently, the generated translation and supplementary explanations are sent to the terminal by the data output means and presented to the user. The user can review them and provide feedback as needed. The feedback is sent to the server through the feedback collection means and used to improve the model.

[0073] For example, if a user enters the phrase "Gokurosama," the system will not simply provide a direct translation of the words, but will also consider that the expression is generally used by superiors to subordinates, and will generate a translation and explanation such as "Thank you for your effort, typically said to someone of lesser status."

[0074] This allows users to gain valuable information that goes beyond simple translation, leading to a deeper understanding of its meaning. This system can be used to facilitate smoother intercultural communication.

[0075] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0076] Step 1:

[0077] The user inputs the natural language sentences or phrases they want to translate into the terminal. The input information is saved on the terminal in text format. The terminal then converts this text data into a digital format and encrypts it using the encryption means. This protects the data from unauthorized external access. The input is natural language text data, and the output is encrypted digital data.

[0078] Step 2:

[0079] The terminal sends encrypted digital data to the server using a secure communication protocol (e.g., HTTPS). This protocol ensures the confidentiality and integrity of the data. The input is encrypted digital data, and the output is a secure data transfer to the server.

[0080] Step 3:

[0081] The server decrypts the received encrypted data, returning it to its original digital format. It then uses the data analysis means to analyze the text structure. This analysis identifies grammatical elements and tokens within the text. The input is encrypted digital data, and the output is the analyzed text data.

[0082] Step 4:

[0083] The server uses the context analysis means to gain a deep understanding of the context of the analyzed text data. Specifically, it uses generative AI models to extract the cultural background and specific meanings of the input text. The information obtained through this process is used as the context necessary for translation generation. The input is the analyzed text data, and the output is contextual information.

[0084] Step 5:

[0085] The server uses the translation generation means to generate the optimal translation and its supplementary explanation based on the acquired contextual information. A generative AI model is used in this process to create natural and culturally appropriate translations. The input is contextual information, and the output is the translation result and its supplementary explanation.

[0086] Step 6:

[0087] The server sends the generated translation and supplementary explanations to the terminal via a data output mechanism. The terminal receives this data and renders it in an appropriate format for visual presentation to the user. The input is the translation result and its supplementary explanations, and the output is the visual display on the terminal.

[0088] Step 7:

[0089] If a user has feedback on the presented translation results and explanations, they send that feedback to the server via their device. The server receives this information using feedback collection tools and uses it to improve the translation model and analysis methods. This improves the overall quality of the system. The input is user feedback, and the output is the accumulation of feedback data to the server.
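The feedback accumulation in Step 7 can be sketched as a small in-memory store. The specification does not define a feedback schema, so the 1-to-5 rating, the `Feedback` and `FeedbackStore` names, and the review threshold are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    phrase: str
    rating: int        # hypothetical 1-5 satisfaction score
    comment: str = ""


class FeedbackStore:
    """Accumulates feedback per phrase so weak translations can be revisited."""

    def __init__(self):
        self._by_phrase = defaultdict(list)

    def add(self, fb: Feedback) -> None:
        if not 1 <= fb.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._by_phrase[fb.phrase].append(fb)

    def average_rating(self, phrase: str):
        items = self._by_phrase.get(phrase)
        return mean(f.rating for f in items) if items else None

    def needs_review(self, threshold: float = 3.0):
        """Phrases whose average rating falls below the threshold."""
        return sorted(
            phrase
            for phrase, items in self._by_phrase.items()
            if mean(f.rating for f in items) < threshold
        )


store = FeedbackStore()
store.add(Feedback("Gokurosama", 2, "explanation unclear"))
store.add(Feedback("Gokurosama", 3))
print(store.average_rating("Gokurosama"), store.needs_review())
```

How the accumulated data feeds back into the translation model is outside the scope of this sketch; the store only identifies which phrases are candidates for improvement.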

[0090] (Application Example 1)

[0091] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0092] Overcoming language barriers that people from different linguistic and cultural backgrounds face in natural communication is crucial. This challenge cannot be solved without translations that consider context, cultural background, and linguistic nuances, rather than simply providing literal translations. However, current technology suffers from a lack of naturalness and accuracy, particularly in real-time, two-way oral communication.

[0093] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0094] In this invention, the server includes input receiving means for receiving natural language information in the form of voice or text entered by a user, data analysis means for analyzing the format of the received natural language information, and context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning. This enables natural and highly accurate communication between people with different languages and cultures.

[0095] "Input receiving means" refers to a device or function that has the role of receiving natural language information, such as voice or text, entered by a user.

[0096] "Data analysis means" refers to a device or function for analyzing the format of received natural language information and extracting necessary information.

[0097] "Context analysis means" refers to a device or function used to understand context based on analyzed information and to grasp cultural background and unique meanings.

[0098] "Translation generation means" refers to a device or function for generating a translation and its supplementary explanation based on the analyzed results.

[0099] "Data output means" refers to a device or function that is responsible for presenting the generated translation and supplementary explanations to the user.

[0100] "Audio output means" refers to a device or function for outputting translations and supplementary explanations as audio.

[0101] "Means for converting speech to text" refers to a device or function for converting speech information into text.

[0102] "Feedback collection means" refers to a device or function that receives feedback from users and uses it to improve the performance of the system.

[0103] "Encryption means" refers to a device or function that encrypts information in order to securely transfer received natural language information to a server.

[0104] To implement this invention, first, the user inputs natural language information to be translated in voice or text format into a terminal. The terminal is equipped with an input receiving means to receive the input information and processes this information digitally. When using voice input, a voice-to-text conversion means such as Google® Cloud Speech-to-Text API is used to convert the voice information into text data.

[0105] The converted text information is encrypted using AES encryption with Python's cryptography library and securely transmitted to the server. The server analyzes the received information using data analysis tools such as the BERT model from the Transformers library to understand the context, and generates a contextually appropriate translation and supplementary explanation using a translation generation tool such as the DeepL API. The generated information is sent to the terminal by a data output tool and presented to the user, while the generated translation and supplementary explanation are simultaneously output as audio using a speech output tool that utilizes the Google Text-to-Speech API.
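The AES encryption step can be sketched with the `cryptography` package named above. The specification does not state the AES mode or how the key is shared, so AES-GCM, the 256-bit key, the nonce-prefixed layout, and the helper names `encrypt_text` and `decrypt_text` are assumptions for illustration. The key is assumed to be provisioned to both terminal and server by some separate mechanism.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_text(key: bytes, text: str) -> bytes:
    """Terminal side: encrypt UTF-8 text with AES-GCM, prefixing the nonce."""
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key
    ciphertext = AESGCM(key).encrypt(nonce, text.encode("utf-8"), None)
    return nonce + ciphertext


def decrypt_text(key: bytes, blob: bytes) -> str:
    """Server side: split off the nonce, then decrypt and authenticate."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


key = AESGCM.generate_key(bit_length=256)  # assumed shared out of band
blob = encrypt_text(key, "Gokurosama")
print(decrypt_text(key, blob))
```

GCM is chosen here because it authenticates the ciphertext as well as encrypting it, so a modified message fails decryption instead of yielding corrupted text.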

[0106] A concrete example is a user who wants to converse with a local resident while traveling. The user could type "Thank you for your help" and receive a translation that accurately expresses its nuance. The system would generate a translation and explanation such as "Thank you for the care and support you have provided," supporting a natural flow of conversation. An example of a prompt to the generative AI model in this case would be text like, "Please enter the Japanese sentence or phrase you would like to translate. Then, please provide an appropriate translation and explanation, taking its cultural context into consideration."
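One way to assemble such a prompt programmatically is a simple template. This is a hypothetical sketch: the template wording is adapted from the example prompt, and the `build_prompt` name and the `target` parameter are not taken from the specification.

```python
# Template adapted from the example prompt; the exact wording is an assumption.
PROMPT_TEMPLATE = (
    "Translate the following Japanese sentence or phrase into {target}. "
    "Then provide an appropriate explanation, taking its cultural context "
    "into consideration.\n\nPhrase: {phrase}"
)


def build_prompt(phrase: str, target: str = "English") -> str:
    """Embed the user's phrase in the instruction text for the model."""
    phrase = phrase.strip()
    if not phrase:
        raise ValueError("phrase must not be empty")
    return PROMPT_TEMPLATE.format(target=target, phrase=phrase)


print(build_prompt("ご苦労様"))
```

Keeping the instruction text separate from the user's phrase makes it straightforward to change the target language or adjust the instruction wording without touching the calling code.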

[0107] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0108] Step 1:

[0109] The terminal receives voice or text input from the user. Specifically, the user inputs the phrase they want to translate, by voice or as text, on a smartphone or tablet. In the case of voice input, the Google Cloud Speech-to-Text API is used to convert the voice data into text. At this point, the input is voice or text information, and the output is digital text data.

[0110] Step 2:

[0111] The terminal encrypts the received text information using AES encryption. Specifically, it uses a cryptography library to encrypt the text data and ensure its security. This encrypted text data is then sent to the server.

[0112] Step 3:

[0113] The server receives encrypted data sent from the terminal and decrypts it. Using the decrypted text data as input, it performs data analysis using models such as the BERT model from the Transformers library. Through data analysis, it extracts basic information for context and semantic understanding, and outputs contextual information as the analysis result.

[0114] Step 4:

[0115] The server performs contextual analysis based on the extracted contextual information. This analysis utilizes contextual analysis tools to understand cultural backgrounds and language-specific nuances from the extracted information. At this stage, the input is contextual information, and the output is detailed contextual information derived from further analysis.

[0116] Step 5:

[0117] The server generates translations using translation generation methods such as the DeepL API, based on detailed contextual information, and also creates supplementary explanations. These translations include contextually appropriate expressions. The input is detailed contextual information, and the output is the translated result and its supplementary explanations.

[0118] Step 6:

[0119] The server sends the generated translation and supplementary explanations to the terminal, which displays the translation results on the screen as a data output. Simultaneously, it uses the Google Text-to-Speech API to convert the translated content into speech and play it back. The input is the translation results and supplementary explanations, and the output is the screen display and audio output.

[0120] Step 7:

[0121] The user inputs feedback on the provided translation into the terminal. The terminal sends the feedback information to the server, which stores it using a feedback collection mechanism and uses it to improve context analysis and translation generation. The input in this step is user feedback, and the output is feedback data used for improvement.
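The server-side portion of the flow above (Steps 3 to 5) can be sketched as a chain of stages, each consuming the output of the previous one. This is only an illustrative skeleton: the names `run_translation_flow`, `TranslationOutput`, and the `demo_*` placeholder stages are assumptions, and the placeholders do not call BERT, DeepL, or any other real service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TranslationOutput:
    translation: str
    explanation: str
    context_notes: List[str]


def run_translation_flow(
    text: str,
    analyze: Callable[[str], List[str]],
    analyze_context: Callable[[List[str]], List[str]],
    generate: Callable[[str, List[str]], Tuple[str, str]],
) -> TranslationOutput:
    """Chain Steps 3 to 5. Input, transport, output and feedback are out of scope."""
    basic_info = analyze(text)                                 # Step 3: data analysis
    context_notes = analyze_context(basic_info)                # Step 4: context analysis
    translation, explanation = generate(text, context_notes)   # Step 5: generation
    return TranslationOutput(translation, explanation, context_notes)


# Placeholder stages for demonstration only (hypothetical, no real model or API).
def demo_analyze(text: str) -> List[str]:
    return text.split()


def demo_context(tokens: List[str]) -> List[str]:
    return [f"token count: {len(tokens)}"]


def demo_generate(text: str, notes: List[str]) -> Tuple[str, str]:
    return f"[translation of] {text}", "; ".join(notes)


result = run_translation_flow(
    "Thank you for your help", demo_analyze, demo_context, demo_generate
)
```

Passing the stages in as callables keeps the sketch independent of any particular analysis model or translation service.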

[0122] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0123] In an embodiment of the present invention, first, the user inputs Japanese text or phrases they wish to translate or understand using a terminal. This input natural language data is received by the terminal and securely transmitted to a server using encryption. The server analyzes the received data and obtains basic information to facilitate contextual understanding.

[0124] The analyzed data is then analyzed by a context analysis tool on the server to understand its context and cultural background. Subsequently, a translation generation tool generates appropriate translations and supplementary explanations. This process incorporates mechanisms to present the information requested by the user as accurately and naturally as possible.

[0125] Furthermore, the server uses an emotion engine to recognize the user's emotions from the analyzed data. Based on this emotion recognition, the translation and supplementary explanations are adjusted. For example, if the emotion engine detects that the user is surprised or dissatisfied with a particular phrase, it will present a translation result that takes into account an appropriate response to that emotion.

[0126] For example, if a user enters the phrase "Why is it so cold?", the emotion engine will detect feelings of surprise or dissatisfaction. As a result, the generated translation will include an explanation that reflects the emotion, such as "It's surprisingly cold today, isn't it?".

[0127] Finally, the translation and explanation generated by the server are sent to the terminal and displayed to the user. Users can provide feedback on the information provided, and this feedback is collected by a feedback collection mechanism and used to improve the overall system, including the emotion engine. Continuous system optimization based on feedback makes it possible to provide a more accurate and personalized translation service.

[0128] The following describes the processing flow.

[0129] Step 1:

[0130] The user inputs Japanese text that needs to be translated or understood into the terminal. The terminal receives this natural language data and prepares it for processing.

[0131] Step 2:

[0132] The terminal encrypts the entered data to securely transmit it to the server, then packets it and sends it to the server via the network.

[0133] Step 3:

[0134] The server receives data from the terminal for analysis. It checks the format of the received data, parses the sentence structure as a preprocessing step, and performs sorting and necessary transformations.

[0135] Step 4:

[0136] The context analysis mechanism within the server uses the analyzed data to understand the context. At this stage, cultural backgrounds and specific nuances are also grasped, which influence subsequent processing.

[0137] Step 5:

[0138] The emotion engine recognizes the user's emotions from the preprocessed data. For example, emotions such as joy, anger, sadness, and pleasure can be inferred from the tone of the text and specific keywords.

[0139] Step 6:

[0140] The server uses a translation generation mechanism to generate translations and supplementary explanations that take into account the user's emotional state. Based on the recognized emotion, the most appropriate expression is selected.

[0141] Step 7:

[0142] After formatting and encrypting the generated translation and supplementary explanations, the server sends this data back to the terminal.

[0143] Step 8:

[0144] The terminal receives and decrypts the translation results sent from the server and displays them to the user. The user reviews the translation and enters feedback if necessary.

[0145] Step 9:

[0146] User feedback is sent to the server via the terminal and collected by the feedback collection mechanism. This feedback is used to improve the emotion engine and translation accuracy.
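Step 5 of the flow above states that emotions can be inferred from specific keywords. A minimal keyword-based sketch of that idea follows. The keyword lists, the emotion labels, and the "neutral" fallback are assumptions for illustration only; an actual emotion engine would more likely rely on a trained model such as the emotion identification model 59.

```python
import re
from typing import Dict, Tuple

# Hypothetical keyword lists (illustrative only, not taken from the publication).
EMOTION_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "joy": ("glad", "great", "wonderful"),
    "anger": ("angry", "furious", "unacceptable"),
    "sadness": ("sad", "sorry", "lonely"),
    "surprise": ("why", "surprisingly", "unbelievable"),
}


def infer_emotion(text: str) -> str:
    """Return the emotion whose keywords occur most often, or "neutral" if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {
        emotion: sum(1 for word in words if word in keywords)
        for emotion, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first listed emotion
    return best if scores[best] > 0 else "neutral"


emotion = infer_emotion("Why is it so cold?")
```

The inferred label could then be handed to the translation generation step so that the wording is selected with the user's emotional state in mind.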

[0147] (Example 2)

[0148] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0149] Conventional machine translation systems often fail to adequately consider context and cultural background, and cannot reflect the user's emotions, making it difficult to provide the information the user truly needs. There is a need to overcome this lack of accuracy and unnaturalness, and to provide a translation service that satisfies users.

[0150] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0151] In this invention, the server includes an input receiving means for receiving natural language information entered by the user, an information analysis means for analyzing the form of the received natural language information, a context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning, a translation generation means for generating translations and supplementary information based on the analyzed results, and an emotion recognition means for analyzing the user's emotions using generative AI technology and adjusting the translated content based on those emotions. This makes it possible to provide a natural and accurate translation service that takes into account the user's context and emotions.

[0152] "Natural language information" refers to data expressed in human language that users input into a computer system.

[0153] "Input receiving means" refers to a function or device for receiving natural language information sent by a user.

[0154] "Information analysis means" refers to functions or technologies for examining and analyzing the form and structure of received natural language information.

[0155] "Contextual analysis means" refers to functions and methods for understanding the context, background, and cultural elements of analyzed information.

[0156] "Translation generation means" refers to a function or technology that generates translations and supplementary information in a natural way based on the results of contextual analysis.

[0157] "Emotion recognition means" refers to functions and technologies that use generative AI technology to analyze a user's emotions and adjust information and services accordingly.

[0158] "Encryption methods" refer to technologies and devices that encrypt data in order to securely protect information received or transmitted.

[0159] "Information output means" refers to methods or devices for providing the generated translation or supplementary explanation to the user.

[0160] "Feedback collection methods" refer to functions or technologies that collect user reactions and opinions and use them to improve the system.

[0161] First, the user uses their device to input natural language information in Japanese that they want to translate or understand. This input information is received by the device and securely transmitted to the server using encryption. Common encryption protocols are used for encryption.

[0162] The server first analyzes the received information using information analysis tools. Natural language processing techniques are used for this analysis. Specifically, morphological and syntactic analysis are performed to understand the form and structure of the information. At this stage, contextual analysis tools consider the cultural background and emotions of the information to understand its context and grasp its meaning.

[0163] Next, the server uses translation generation tools to generate a translation based on the analyzed context. This process utilizes a generative AI model. The generated translation includes supplementary information for the user. At this stage, the user's emotions are analyzed by emotion recognition tools, and the translation is adjusted accordingly. Specifically, appropriate nuances of emotion are incorporated into the translation.

[0164] For example, if a user types "Why is it so cold?" into their terminal, the server instructs the generative AI model using the prompt "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'". Based on this instruction, the system generates an emotionally sensitive translation such as "It's surprisingly cold today, isn't it?".

[0165] Finally, the server sends the generated translation and supplementary explanations to the terminal via an information output device and displays them to the user. The user can provide feedback based on the presented information, and this feedback is used to improve the system through a feedback collection device. In this way, a more accurate and personalized translation service can be provided.
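The emotion-aware prompt quoted in paragraph [0164] could be assembled as in the sketch below. The function name `build_emotion_prompt` and the optional `emotion_hint` argument are assumptions added for illustration; the publication only gives the prompt wording itself.

```python
from typing import Optional


def build_emotion_prompt(sentence: str, emotion_hint: Optional[str] = None) -> str:
    """Build the emotion-aware translation prompt quoted in the text.

    emotion_hint is a hypothetical extension: when an emotion label is already
    available, it is appended so the model can take it into account.
    """
    prompt = (
        "Translate the following sentence and consider any emotional cues: "
        f"'{sentence}'"
    )
    if emotion_hint:
        prompt += f" (detected emotion: {emotion_hint})"
    return prompt


basic_prompt = build_emotion_prompt("Why is it so cold?")
hinted_prompt = build_emotion_prompt("Why is it so cold?", emotion_hint="surprise")
```

Without a hint, the function reproduces the example prompt verbatim; with a hint, the label estimated by an emotion recognition step is made explicit to the model.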

[0166] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0167] Step 1:

[0168] The user uses a terminal to input Japanese natural language information that they wish to translate or understand. This input information is received by the terminal's input receiving mechanism. The received data is prepared for processing as string data. At this stage, the input is the user's natural language sentence, and the output is that sentence data.

[0169] Step 2:

[0170] The terminal encrypts the received natural language information using encryption methods and securely transmits it to the server. Encryption protects the information from being intercepted by third parties. This encrypted data is received by the server and ready for analysis. At this stage, the input is the user's encrypted text data, and the output is the encrypted data transferred to the server.

[0171] Step 3:

[0172] The server analyzes the received data using information analysis tools. First, it decrypts the data by removing the encryption. Next, it performs morphological analysis to break it down into individual words. It uses syntactic analysis to understand the structure of the entire sentence. This analysis extracts grammatical information and semantic information of words, which becomes the basis for the next processing. At this stage, the input is the decrypted text data, and the output is the analyzed grammatical information and semantic information of words.

[0173] Step 4:

[0174] The server uses contextual analysis tools to understand the context and cultural background from the analyzed information. Contextual understanding may involve retrieving relevant information from external databases. This allows the server to grasp the cultural meanings and usage contexts of words and sentences. The input for this process is the grammatical and semantic information obtained in the previous stage, while the output is contextual and cultural background information.

[0175] Step 5:

[0176] The server generates translations using a translation generation method that leverages a generative AI model. In this process, a prompt is formed that instructs the AI model to translate the text while considering emotion. An example prompt is: "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'" The generated translation will reflect nuances that match the user's expectations. The input at this stage is contextual and cultural background information, and the output is the generated translation.

[0177] Step 6:

[0178] The server analyzes the user's emotions through emotion recognition mechanisms. Based on the analysis results, it adjusts the translated text. For example, if the user expresses surprise or dissatisfaction, the server incorporates those nuances into the translated text. In this process, the input is the user's emotional information, and the output is the adjusted translated text.

[0179] Step 7:

[0180] The server transmits the generated translation and supplementary information to the terminal via an information output device. The terminal makes the translation available for the user to review by displaying the translation results. Finally, the user can provide feedback on this translation. The input at this processing stage is the adjusted translation, and the output is the information displayed to the user.

[0181] (Application Example 2)

[0182] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0183] In modern society, there is a growing demand for natural communication between robots and humans within the home. However, conventional technologies have not adequately achieved the ability to provide appropriate translations and explanations based on context and emotions, making it difficult to realize dialogues that accurately reflect the user's intentions. Solving these challenges is essential.

[0184] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0185] In this invention, the server includes data receiving means for receiving language data input by the user, data analysis means for analyzing the received language data, and translation generation means for generating translations and supplementary explanations based on the analysis results and making adjustments that take emotions into consideration. This makes it possible for a home robot to provide natural dialogue that is in tune with the context and the user's emotions.

[0186] A "data receiving means" is a mechanism for receiving language data transmitted by a user from a terminal.

[0187] "Data analysis means" refers to a function that performs a process to analyze the structure and meaning of received linguistic data and to understand the context.

[0188] A "contextual analysis tool" is a mechanism that uses analyzed data to understand the cultural background and specific meanings of language, thereby improving the accuracy of translation.

[0189] The "translation generation method" is a function that generates appropriate translations and supplementary explanations using contextual analysis results, and makes adjustments that take into account the user's emotions.

[0190] A "data presentation method" is a mechanism for providing the generated translation and explanation to the user in audio or visual form.

[0191] A "feedback collection mechanism" is a system that collects user feedback and uses it to improve the system's translation accuracy and user satisfaction.

[0192] "Information protection measures" are mechanisms designed to prevent unauthorized access to or leakage of data when securely transferring received data to an external information processing device.

[0193] The system for implementing this invention is primarily designed for home robots. The core of the invention lies in its ability to receive natural language data input by the user in real time and to analyze it appropriately. The system includes the following main functions:

[0194] The home robot receives user speech and text input via a data receiving device. The hardware used is a processor unit equipped with a high-performance speech recognition sensor, specifically utilizing a general-purpose speech technology platform for speech recognition. The received data is securely transferred to a server. During this process, the data is encrypted using information protection measures and designed to prevent unauthorized external access.

[0195] The server analyzes the received data using data analysis tools. A natural language processing engine is used for analysis to understand the format and context of the linguistic data. Next, contextual analysis tools are used to generate appropriate translations and supplementary explanations, taking into account the user's cultural background and emotions. A sentiment analysis module is also incorporated to include emotions in the translation generation process. The generated results are provided to the user via voice or display.

[0196] As a concrete example, if a user asks the robot, "Which fruit is the sweetest?", the server analyzes the input and generates and presents a friendly, emotion-conscious answer such as, "Bananas are generally considered very sweet, but it depends on how ripe the fruit is." Through this process, feedback collection mechanisms ensure that user responses and additional questions are continuously used to improve the system.

[0197] An example of a prompt using a generative AI model is as follows: "Analyze the user's question and, based on contextual understanding, generate a natural translation and a sentiment-sensitive explanation. Question: 'Which fruit is the sweetest?'"

[0198] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0199] Step 1:

[0200] The terminal receives user input as speech or text. If the natural language input is speech, it is converted into text using speech recognition technology. The resulting text then becomes the input for subsequent processes.

[0201] Step 2:

[0202] The terminal encrypts the received text data and sends it to the server. The encryption process protects the data from unauthorized access. The server receives this encrypted data and decrypts it to obtain analyzable text.

[0203] Step 3:

[0204] The server analyzes the structure of text data using data analysis tools. This process includes grammatical analysis and word semantic analysis, laying the foundation for contextual understanding. The result is structured language data.

[0205] Step 4:

[0206] The server understands the context and cultural background based on the analyzed data, using contextual analysis tools. Cultural elements and emotional information related to the data are then added. This contributes to improving the accuracy of translation and explanation.

[0207] Step 5:

[0208] The server uses translation generation tools to generate context-aware translations and explanations. Leveraging a generative AI model, it obtains output that appropriately reflects emotions. This results in a natural translation that includes supplementary information and nuances.

[0209] Step 6:

[0210] The server sends the generated translation results to the terminal. The terminal presents this to the user either verbally or by displaying it on the screen. The output information is based on the user's emotions and intentions.

[0211] Step 7:

[0212] Users evaluate the provided translations and explanations and provide feedback. This feedback is sent to the server via the terminal and used by the feedback collection mechanism to improve the entire system.
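The feedback collection in Step 7 could, for example, be realized as a simple store that accumulates evaluations and exposes aggregates for later improvement work. The sketch below is an assumption-laden illustration: the `Feedback` record, the 1 to 5 rating scale, and the `FeedbackCollector` class are hypothetical and are not described in the publication.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class Feedback:
    source_text: str
    translation: str
    rating: int        # hypothetical scale: 1 (poor) to 5 (good)
    comment: str = ""


class FeedbackCollector:
    """In-memory stand-in for the feedback collection mechanism."""

    def __init__(self) -> None:
        self._items: List[Feedback] = []

    def add(self, feedback: Feedback) -> None:
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._items.append(feedback)

    def average_rating(self) -> Optional[float]:
        # None signals that no feedback has been collected yet.
        if not self._items:
            return None
        return mean(item.rating for item in self._items)

    def low_rated(self, threshold: int = 2) -> List[Feedback]:
        # Low-rated items are candidates for review when improving the system.
        return [item for item in self._items if item.rating <= threshold]


collector = FeedbackCollector()
collector.add(Feedback("Which fruit is the sweetest?", "(translation A)", 5))
collector.add(Feedback("Why is it so cold?", "(translation B)", 2, "too literal"))
```

In a deployed system the store would presumably be persistent (for instance, the database 24), but the aggregation logic would be similar.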

[0213] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0214] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
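The input and output relationship described for the data generation model 58 (a prompt plus inference data in, an inference result out) can be expressed as a small interface. The sketch below is illustrative only: `InferenceRequest`, `DataGenerationModel`, `EchoModel`, and `run_inference` are hypothetical names, only text data is modeled, and the stand-in model performs no actual inference.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InferenceRequest:
    prompt: str          # instruction to the model
    inference_data: str  # text data (audio and image data are omitted in this sketch)


class DataGenerationModel(Protocol):
    """Anything that maps a request to a text result can play the role of model 58."""

    def infer(self, request: InferenceRequest) -> str:
        ...


class EchoModel:
    """Stand-in model for demonstration: it only echoes its inputs."""

    def infer(self, request: InferenceRequest) -> str:
        return f"[{request.prompt}] {request.inference_data}"


def run_inference(model: DataGenerationModel, prompt: str, data: str) -> str:
    return model.infer(InferenceRequest(prompt=prompt, inference_data=data))


output = run_inference(EchoModel(), "Summarize", "hello")
```

Coding against such an interface would allow a different generative AI service to be substituted without changing the calling code.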

[0215] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0216] [Second Embodiment]

[0217] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0218] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0219] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0220] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0221] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0222] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0223] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0224] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0225] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0226] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0227] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0228] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0229] One embodiment of the present invention begins with the user inputting a Japanese sentence or phrase to be translated into a terminal. The terminal converts the input natural language data into a digital format and securely transmits this data to a server using encryption. The server analyzes the received data using data analysis means and extracts basic information for understanding the context.

[0230] Subsequently, the server uses context analysis to gain a deep understanding of the natural language data, taking into account the meaning of words and their cultural context. Based on this analysis, the server's translation generation system generates an appropriate translation and its supplementary explanation. The generated translation and explanation are then transmitted from the server to the terminal via a data output system and presented to the user.

[0231] As a concrete example, consider a case where a user inputs the phrase "Gokurosama" (Thank you for your hard work). This phrase is generally used by superiors to subordinates, but a direct translation would not fully convey its nuance. The server uses context analysis to recognize this difference and simultaneously provides the user with the translation and explanation: "Thank you for your effort, typically said to someone of lesser status." In this way, the present invention goes beyond mere translation of words and phrases, and can present information that takes into account the underlying meaning and cultural context.

[0232] Furthermore, users can send feedback on the provided translations and explanations to the server via their device. The server utilizes feedback collection methods to accumulate this information and use it to improve the model, thereby enhancing the accuracy of translations and the quality of understanding. In this way, a system is realized that can overcome language barriers between cultures and support smooth communication.

[0233] The following describes the processing flow.

[0234] Step 1:

[0235] The user inputs Japanese sentences or phrases that need translation or understanding into the terminal. The terminal confirms this input and prepares the system to begin analysis.

[0236] Step 2:

[0237] The terminal converts the received natural language data into packet format and encrypts it for secure transmission to the server. The converted data is then sent to the server via the network.

[0238] Step 3:

[0239] The server receives data from the terminal. After verifying the accuracy of the received data, it performs preprocessing of the natural language data using data analysis tools. Here, the sentence structure is analyzed, and unnecessary spaces and grammatical errors are corrected as needed.

[0240] Step 4:

[0241] The context analysis mechanism within the server uses pre-processed data to gain a deep understanding of the text's context. This includes grasping the meaning of words within the text and their underlying cultural background. In this process, the server consults an internal database and leverages past similar cases to perform semantic analysis.

[0242] Step 5:

[0243] The server generates translations using translation generation tools based on the results of context analysis. The translations include supplementary explanations that are easy for foreign users to understand, going beyond simple literal translations.

[0244] Step 6:

[0245] The generated translation and its explanation are sent from the server to the terminal. The terminal decodes this data upon receipt and displays it in a user-readable format.

[0246] Step 7:

[0247] Regarding the translation results, users can send feedback to the server via their device based on their level of understanding and satisfaction. The feedback collection system stores this information and uses it to improve the system and increase accuracy in the future.
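The preprocessing in Step 3 of the flow above mentions removing unnecessary spaces. A minimal sketch of that part is shown below, assuming Unicode NFKC normalization is acceptable for the input; the function name `preprocess` is hypothetical, and grammatical error correction is not covered.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Normalize width variants and collapse redundant whitespace.

    NFKC maps the ideographic space (U+3000) and full-width ASCII letters
    to their standard forms, which is often useful for Japanese input.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # Collapse any run of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", normalized).strip()


cleaned = preprocess("  ご苦労さま\u3000 です \n")
```

Note that NFKC also rewrites other compatibility characters (for example, half-width katakana), so it should only be applied where that is acceptable.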

[0248] (Example 1)

[0249] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0250] In today's world, with the increasing frequency of intercultural communication, natural language translation is required to go beyond mere word-for-word conversion and encompass a deeper understanding that includes context and cultural background. Furthermore, there is a need for mechanisms to effectively utilize user feedback to improve translation accuracy, and for means of securely exchanging data while protecting personal information.

[0251] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0252] In this invention, the server includes an input receiving means that receives natural language information entered by a user and converts it into a digital format, an encryption means that encrypts the received natural language information to maintain confidentiality, and a context analysis means that understands the context from the analyzed information and grasps the cultural background and specific meanings. This allows the user to securely transmit natural language information to the server and receive an appropriate translation and supplementary explanation that takes cultural background into account.

[0253] An "input receiving means" is a mechanism for receiving natural language information entered by a user into a terminal.

[0254] "Encryption methods" refer to technologies that encrypt received natural language information in order to maintain confidentiality.

[0255] "Communication methods" refer to protocols and technologies for securely transmitting encrypted natural language information to a server.

[0256] "Data analysis means" refers to techniques or tools for analyzing received natural language information to understand the structure and basic meaning of a text.

[0257] "Contextual analysis methods" are technologies that perform processing to understand the background, context, and cultural meaning of analyzed information.

[0258] A "translation generation method" is a technology that generates translations based on the analyzed results and adds cultural supplementary explanations using a generative AI.

[0259] "Data output means" refers to means for displaying the generated translation and supplementary explanations to the user.

[0260] "Feedback collection methods" refer to technologies used to collect user feedback on translations and explanations and utilize it to improve the system.

[0261] A "communication protocol" is a set of rules and procedures used to securely transmit encrypted information to a server.

[0262] This system accurately translates natural language information entered by users and provides supplementary explanations that take into account the cultural background and context. Users input Japanese sentences or phrases they wish to translate using a terminal. The terminal converts this input information into a digital format and encrypts it using encryption methods, thereby maintaining the confidentiality of the information.

[0263] The terminal sends encrypted information to the server using a secure communication protocol. For example, the security of the information is ensured by using the HTTPS protocol. The receiving server analyzes the input information using data analysis tools to understand its grammar and basic meaning.
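As a rough illustration of the transmission described above, the following sketch builds an HTTPS POST request that carries already-encrypted bytes. The endpoint URL, the JSON field name `payload`, and the Base64 wrapping are assumptions made for this example; the description names only HTTPS.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the description names HTTPS but no URL or payload format.
SERVER_URL = "https://translation.example.com/api/translate"


def build_request(ciphertext: bytes) -> urllib.request.Request:
    """Wrap already-encrypted bytes in a JSON body for an HTTPS POST."""
    # Base64 keeps arbitrary ciphertext bytes safe inside a JSON string.
    body = json.dumps({"payload": base64.b64encode(ciphertext).decode("ascii")})
    return urllib.request.Request(
        SERVER_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(b"\x00\x01encrypted-bytes")
# urllib.request.urlopen(request) would send it; omitted because the endpoint is fictional.
```

The request is only constructed, not sent, so the sketch runs without network access.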

[0264] Next, the server uses context analysis tools to gain a deep understanding of the information's context. Specifically, it utilizes generative AI models (such as natural language processing models) to analyze the cultural background and specific meanings of the input phrases. Based on this analysis, the translation generation tool generates the optimal translation and adds supplementary explanations as needed.

[0265] Subsequently, the generated translation and supplementary explanations are sent to the terminal by the data output means and presented to the user. The user can review them and provide feedback as needed. The feedback is sent to the server through the feedback collection means and used to improve the model.

[0266] For example, if a user enters the phrase "Gokurosama," the system will not simply provide a direct translation of the words, but will also consider that the expression is generally used by superiors to subordinates, and will generate a translation and explanation such as "Thank you for your effort, typically said to someone of lesser status."

[0267] This allows users to gain valuable information that goes beyond simple translation, leading to a deeper understanding of its meaning. This system can be used to facilitate smoother intercultural communication.

[0268] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0269] Step 1:

[0270] The user inputs natural language sentences or phrases they want to translate into the device. The input information is saved to the device in text format. The device then converts this text data into a digital format and encrypts it using encryption methods. This ensures that the data is protected from unauthorized external access. The input is natural language text data, and the output is encrypted digital data.

[0271] Step 2:

[0272] The terminal sends encrypted digital data to the server using a secure communication protocol (e.g., HTTPS). This protocol ensures the confidentiality and integrity of the data. The input is encrypted digital data, and the output is a secure data transfer to the server.

[0273] Step 3:

[0274] The server decrypts the received encrypted data, returning it to its original digital format. It then uses data analysis tools to analyze the text structure. This analysis identifies grammatical elements and tokens within the text. The input is encrypted digital data, and the output is the analyzed text data.

[0275] Step 4:

[0276] The server uses contextual analysis tools to gain a deep understanding of the context of the analyzed text data. Specifically, it leverages generative AI models to extract the cultural background and specific meanings of the input text. The information obtained through this process is used as the context necessary for translation generation. The input is the analyzed text data, and the output is contextual information.

[0277] Step 5:

[0278] The server uses a translation generation mechanism to generate the optimal translation and its supplementary explanation based on the acquired contextual information. A generative AI model participates in this process to create natural and culturally appropriate translations. The input is contextual information, and the output is the translation result and its supplementary explanation.

[0279] Step 6:

[0280] The server sends the generated translation and supplementary explanations to the terminal via a data output mechanism. The terminal receives this data and renders it in an appropriate format for visual presentation to the user. The input is the translation result and its supplementary explanations, and the output is the visual display on the terminal.

[0281] Step 7:

[0282] If the user has feedback on the presented translation result and explanation, the user transmits the feedback to the server via the terminal. The server utilizes feedback collection means to receive this information and uses it to improve the translation model and analysis method. This improves the overall quality of the system. The input is the user's feedback, and the output is the accumulation of feedback data to the server.
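Steps 3 to 5 above can be sketched as a small pipeline. This is a minimal sketch with placeholder logic: a lookup table stands in for the generative AI model, and all function and class names are hypothetical. The single table entry restates the "Gokurosama" example given earlier in the description.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the context analysis means: a tiny lookup table.
# A real system would query a generative AI model instead.
CULTURAL_NOTES = {
    "gokurosama": (
        "Thank you for your effort",
        "typically said to someone of lesser status",
    ),
}


@dataclass
class TranslationResult:
    translation: str
    explanation: str


def analyze(text: str) -> list[str]:
    """Step 3 stand-in: split the decrypted text into lowercase tokens."""
    return [token.strip(".,!?\"'").lower() for token in text.split()]


def analyze_context(tokens: list[str]) -> list[tuple[str, str]]:
    """Step 4 stand-in: collect cultural notes for the tokens that have one."""
    return [CULTURAL_NOTES[t] for t in tokens if t in CULTURAL_NOTES]


def generate_translation(text: str, context: list[tuple[str, str]]) -> TranslationResult:
    """Step 5 stand-in: join the context-aware renderings and their notes."""
    if not context:
        # No cultural note found: fall back to the input with no explanation.
        return TranslationResult(translation=text, explanation="")
    return TranslationResult(
        translation="; ".join(rendering for rendering, _ in context),
        explanation="; ".join(note for _, note in context),
    )


text = "Gokurosama"
result = generate_translation(text, analyze_context(analyze(text)))
print(f"{result.translation}, {result.explanation}.")
```

Keeping each stage as a separate function mirrors the input/output pairs listed for each step, so any stage could be swapped for a model-backed implementation.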

[0283] (Application Example 1)

[0284] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0285] People with different languages and cultural backgrounds need to overcome language barriers in order to communicate naturally. This requires translations that take into account context, cultural background, and linguistic nuances, rather than merely literal translations. However, current technologies lack naturalness and accuracy, especially in spoken and real-time two-way communication.

[0286] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0287] In this invention, the server includes an input receiving means for receiving natural language information of voice or text input by the user, a data analysis means for analyzing the format of the received natural language information, and a context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning. This enables communication between people with different languages and cultures in a natural and highly accurate manner.

[0288] The "input receiving means" is a device or function having the role of receiving natural language information of voice or text input by the user.

[0289] "Data analysis means" refers to a device or function for analyzing the format of received natural language information and extracting necessary information.

[0290] A "context analysis tool" is a device or function used to understand context based on analyzed information and to grasp cultural background and unique meanings.

[0291] "Translation generation means" refers to a device or function for generating a translation and its supplementary explanation based on the analyzed results.

[0292] "Data output means" refers to a device or function that is responsible for presenting the generated translation and supplementary explanations to the user.

[0293] "Audio output means" refers to a device or function for outputting translations and supplementary explanations as audio.

[0294] "Means for converting speech to text" refers to a device or function for converting speech information into text.

[0295] "Feedback collection means" refers to a device or function that receives feedback from users and uses it to improve the performance of the system.

[0296] "Encryption means" refers to a device or function that encrypts information in order to securely transfer received natural language information to a server.

[0297] To implement this invention, first, the user inputs natural language information to be translated in voice or text format into a terminal. The terminal is equipped with an input receiving means to receive the input information and processes this information digitally. When using voice input, a voice-to-text conversion means such as the Google Cloud Speech-to-Text API is used to convert the voice information into text data.

[0298] The converted text information is encrypted using AES encryption with Python's cryptography library and securely transmitted to the server. The server analyzes the received information using data analysis tools such as the BERT model from the Transformers library to understand the context, and generates a contextually appropriate translation and supplementary explanation using a translation generation tool such as the DeepL API. The generated information is sent to the terminal by a data output tool and presented to the user, while the generated translation and supplementary explanation are simultaneously output as audio using a speech output tool that utilizes the Google Text-to-Speech API.
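The encryption step could look like the following sketch, which uses the `AESGCM` class from Python's cryptography library. The description specifies only AES and the cryptography library; the GCM mode, the 256-bit key, the layout that prepends the nonce to the ciphertext, and the assumption that terminal and server already share the key are choices made for this illustration.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # bytes; a common nonce length for AES-GCM


def encrypt_text(key: bytes, text: str) -> bytes:
    """Encrypt UTF-8 text; the random nonce is prepended to the ciphertext."""
    nonce = os.urandom(NONCE_SIZE)
    return nonce + AESGCM(key).encrypt(nonce, text.encode("utf-8"), None)


def decrypt_text(key: bytes, blob: bytes) -> str:
    """Split off the nonce, then authenticate and decrypt the remainder."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


# Assumption: terminal and server already share this key; key exchange is out of scope.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_text(key, "Thank you for your help")
restored = decrypt_text(key, blob)
```

An authenticated mode is used here so that tampering with the ciphertext causes decryption to fail rather than yield altered text.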

[0299] As a concrete example, consider a user who wants to converse with a local resident while traveling. The user could type "Thank you for your help" and receive a translation that accurately expresses its nuance. The system would generate a translation and explanation such as "Thank you for the care and support you have provided," supporting a naturally flowing conversation. An example of a prompt to the generative AI model in this case would be text like, "Please enter the Japanese sentence or phrase you would like to translate. Then, please provide an appropriate translation and explanation, taking its cultural context into consideration."

[0300] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0301] Step 1:

[0302] The device receives voice or text input from the user. Specifically, the user inputs the phrase they want to translate by voice or as text on their smartphone or tablet. In the case of voice input, the Google Cloud Speech-to-Text API is used to convert the voice data into text. At this point, the input is voice or text information, and the output is digital text data.

[0303] Step 2:

[0304] The terminal encrypts the received text information using AES encryption means. Specifically, it encrypts the text data using the cryptography library to ensure security. This encrypted text data serves as the input to the server.

[0305] Step 3:

[0306] The server receives the encrypted data sent from the terminal and decrypts it. Using the decrypted text data as input, it performs data analysis using models such as the BERT model of the Transformers library. It extracts basic information for context and meaning understanding from the data analysis and outputs context information as the analysis result.

[0307] Step 4:

[0308] The server conducts context analysis based on the extracted context information. Context analysis means are used in this analysis to understand cultural backgrounds and language-specific nuances from the extracted information. The input at this stage is the context information, and the output is detailed context information from further analysis.

[0309] Step 5:

[0310] The server uses the detailed context information to generate a translation using translation generation means such as the DeepL API and also creates supplementary explanations for it. This translation includes appropriate expressions based on the context. The input is the detailed context information, and the output is the translation result and its supplementary explanations.

[0311] Step 6:

[0312] The server sends the generated translation and supplementary explanations to the terminal, and the terminal displays the translation result on the screen as data output means. At the same time, it uses the Google Text-to-Speech API to vocalize and play the translation content through voice output means. The input is the translation result and supplementary explanations, and the output is screen display and voice output.

[0313] Step 7:

[0314] The user inputs feedback on the provided translation into the terminal. The terminal sends the feedback information to the server, which stores it using a feedback collection mechanism and uses it to improve context analysis and translation generation. The input in this step is user feedback, and the output is feedback data used for improvement.
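The feedback collection in Step 7 might be sketched as a simple in-memory store. The 1-to-5 rating scale, the field names, and the idea of queueing poorly rated translations for review are assumptions for this example; the description does not specify how feedback is structured or applied.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Feedback:
    source_text: str
    translation: str
    rating: int  # assumed scale: 1 (poor) to 5 (good)
    comment: str = ""


@dataclass
class FeedbackStore:
    """In-memory stand-in for the feedback collection means."""

    items: list[Feedback] = field(default_factory=list)

    def add(self, feedback: Feedback) -> None:
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.items.append(feedback)

    def low_rated(self, threshold: int = 2) -> dict[str, list[Feedback]]:
        """Group poorly rated translations by source text for later review."""
        grouped: dict[str, list[Feedback]] = defaultdict(list)
        for item in self.items:
            if item.rating <= threshold:
                grouped[item.source_text].append(item)
        return dict(grouped)


store = FeedbackStore()
store.add(Feedback("Thank you for your help",
                   "Thank you for the care and support you have provided",
                   rating=2, comment="too formal"))
store.add(Feedback("Thank you for your help",
                   "Thank you for the care and support you have provided",
                   rating=5))
review_queue = store.low_rated()
```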

[0315] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0316] In an embodiment of the present invention, first, the user inputs Japanese text or phrases they wish to translate or understand using a terminal. This input natural language data is received by the terminal and securely transmitted to a server using encryption. The server analyzes the received data and obtains basic information to facilitate contextual understanding.

[0317] The data is then processed by a context analysis tool on the server to understand its context and cultural background. Subsequently, a translation generation tool generates appropriate translations and supplementary explanations. This process incorporates mechanisms to present the information requested by the user as accurately and naturally as possible.

[0318] Furthermore, the server uses an emotion engine to recognize the user's emotions from the analyzed data. Based on this emotion recognition, the translation and supplementary explanations are adjusted. For example, if the emotion engine detects that the user is surprised or dissatisfied with a particular phrase, it will present a translation result that takes into account an appropriate response to that emotion.

[0319] For example, if a user enters the phrase "Why is it so cold?", the emotion engine will detect feelings of surprise or dissatisfaction. As a result, the generated translation will include an explanation that reflects the emotion, such as "It's surprisingly cold today, isn't it?".

[0320] Finally, the translation and explanation generated by the server are sent to the terminal and displayed to the user. Users can provide feedback on the information provided, and this feedback is collected by a feedback collection mechanism and used to improve the overall system, including the emotion engine. Continuous system optimization based on feedback makes it possible to provide a more accurate and personalized translation service.

[0321] The following describes the processing flow.

[0322] Step 1:

[0323] The user inputs Japanese text that needs to be translated or understood into the device. The device receives this natural language data and prepares it for processing.

[0324] Step 2:

[0325] The terminal encrypts the entered data to securely transmit it to the server, then packets it and sends it to the server via the network.

[0326] Step 3:

[0327] The server receives data from the terminal for analysis. It checks the format of the received data, parses the sentence structure as a preprocessing step, and performs sorting and necessary transformations.

[0328] Step 4:

[0329] The context analysis mechanism within the server uses the analyzed data to understand the context. At this stage, cultural backgrounds and specific nuances are also grasped, which influence subsequent processing.

[0330] Step 5:

[0331] The emotion engine recognizes the user's emotions from the pre-processed data. For example, emotions such as joy, anger, sadness, and pleasure can be inferred from the tone of the text and specific keywords.

[0332] Step 6:

[0333] The server uses a translation generation mechanism to generate translations and supplementary explanations that take into account the user's emotional state. Based on the recognized emotion, the most appropriate expression is selected.

[0334] Step 7:

[0335] After formatting and encrypting the generated translation and supplementary explanations, the server sends this data back to the terminal.

[0336] Step 8:

[0337] The terminal receives and decrypts the translation results sent from the server and displays them to the user. The user reviews the translation and enters feedback if necessary.

[0338] Step 9:

[0339] User feedback is sent to the server via the device and collected by feedback collection tools. This feedback is used to improve the emotion engine and translation accuracy.
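Steps 5 and 6 of this flow, emotion recognition from keywords followed by emotion-aware selection of the translation, can be illustrated with a deliberately simple keyword counter. The cue words, the emotion labels, and the neutral variant "It is cold today." are hypothetical; the description states only that emotions can be inferred from tone and specific keywords, and an actual emotion engine would likely be model-based.

```python
# Hypothetical keyword cues; the description names no concrete rules or model.
EMOTION_CUES = {
    "surprise": ("why", "so", "really"),
    "joy": ("glad", "great", "thank"),
    "anger": ("annoying", "unacceptable"),
    "sadness": ("sad", "sorry", "miss"),
}


def estimate_emotion(text: str) -> str:
    """Step 5 stand-in: pick the emotion with the most cue-word matches."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = {
        emotion: sum(word in cues for word in words)
        for emotion, cues in EMOTION_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def adjust_translation(candidates: dict[str, str], emotion: str) -> str:
    """Step 6 stand-in: choose the variant for the emotion, else the neutral one."""
    return candidates.get(emotion, candidates["neutral"])


candidates = {
    "neutral": "It is cold today.",  # invented neutral variant for contrast
    "surprise": "It's surprisingly cold today, isn't it?",
}
emotion = estimate_emotion("Why is it so cold?")
adjusted = adjust_translation(candidates, emotion)
print(emotion, "->", adjusted)
```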

[0340] (Example 2)

[0341] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0342] Conventional machine translation systems often fail to adequately consider context and cultural background, and cannot reflect the user's emotions, making it difficult to provide the information the user truly needs. There is a need to overcome this lack of accuracy and unnaturalness, and to provide a translation service that satisfies users.

[0343] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0344] In this invention, the server includes an input receiving means for receiving natural language information entered by the user, an information analysis means for analyzing the form of the received natural language information, a context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning, a translation generation means for generating translations and supplementary information based on the analyzed results, and an emotion recognition means for analyzing the user's emotions using generative AI technology and adjusting the translated content based on those emotions. This makes it possible to provide a natural and accurate translation service that takes into account the user's context and emotions.

[0345] "Natural language information" refers to data expressed in human language that users input into a computer system.

[0346] "Input receiving means" refers to a function or device for receiving natural language information sent by a user.

[0347] "Information analysis means" refers to functions or technologies for examining and analyzing the form and structure of received natural language information.

[0348] "Contextual analysis means" refers to functions and methods for understanding the context, background, and cultural elements of analyzed information.

[0349] "Translation generation means" refers to a function or technology that generates translations and supplementary information in a natural way based on the results of contextual analysis.

[0350] "Emotion recognition means" refers to functions and technologies that use generative AI technology to analyze a user's emotions and adjust information and services accordingly.

[0351] "Encryption methods" refer to technologies and devices that encrypt data in order to securely protect information received or transmitted.

[0352] "Information output means" refers to methods or devices for providing the generated translation or supplementary explanation to the user.

[0353] "Feedback collection methods" refer to functions or technologies that collect user reactions and opinions and use them to improve the system.

[0354] First, the user uses their device to input natural language information in Japanese that they want to translate or understand. This input information is received by the device and securely transmitted to the server using encryption. Common encryption protocols are used for this encryption.

[0355] The server first analyzes the received information using information analysis tools. Natural language processing techniques are used for this analysis. Specifically, morphological and syntactic analysis are performed to understand the form and structure of the information. At this stage, contextual analysis tools consider the cultural background and emotions of the information to understand its context and grasp its meaning.

[0356] Next, the server uses translation generation tools to generate a translation based on the analyzed context. This process utilizes a generative AI model. The generated translation includes supplementary information for the user. At this stage, the user's emotions are analyzed by emotion recognition tools, and the translation is adjusted accordingly. Specifically, appropriate nuances of emotion are incorporated into the translation.

[0357] For example, if a user types "Why is it so cold?" into their device, the system will instruct the generative AI model using the prompt "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'". Based on this instruction, the system will generate an emotionally sensitive translation such as "It's surprisingly cold today, isn't it?".
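Forming the prompt from the user's sentence can be sketched as simple template filling. The template text is the example prompt quoted above; the function name and the escaping of single quotes are assumptions added for this illustration.

```python
PROMPT_TEMPLATE = (
    "Translate the following sentence and consider any emotional cues: '{sentence}'"
)


def build_prompt(sentence: str) -> str:
    """Insert the user's sentence into the example prompt from the description."""
    # Assumption: single quotes in the input are escaped so the quoted span
    # inside the prompt stays unambiguous.
    return PROMPT_TEMPLATE.format(sentence=sentence.replace("'", "\\'"))


prompt = build_prompt("Why is it so cold?")
print(prompt)
```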

[0358] Finally, the server sends the generated translation and supplementary explanations to the terminal via the information output means and displays them to the user. The user can provide feedback based on the presented information, and this feedback is used to improve the system through the feedback collection means. In this way, a more accurate and personalized translation service can be provided.

[0359] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0360] Step 1:

[0361] The user uses a terminal to input Japanese natural language information that they wish to translate or understand. This input information is received by the terminal's input receiving mechanism. The received data is prepared for processing as string data. At this stage, the input is the user's natural language sentence, and the output is that sentence data.

[0362] Step 2:

[0363] The terminal encrypts the received natural language information using encryption methods and securely transmits it to the server. Encryption protects the information from being intercepted by third parties. The encrypted data is received by the server and is then ready for analysis. At this stage, the input is the user's encrypted text data, and the output is the encrypted data transferred to the server.

[0364] Step 3:

[0365] The server analyzes the received data using information analysis tools. First, it decrypts the data by removing the encryption. Next, it performs morphological analysis to break it down into individual words. It uses syntactic analysis to understand the structure of the entire sentence. This analysis extracts grammatical information and semantic information of words, which becomes the basis for the next processing. At this stage, the input is the decrypted text data, and the output is the analyzed grammatical information and semantic information of words.

[0366] Step 4:

[0367] The server uses contextual analysis tools to understand the context and cultural background from the analyzed information. Contextual understanding may involve retrieving relevant information from external databases. This allows the server to grasp the cultural meanings and usage contexts of words and sentences. The input for this process is the grammatical and semantic information obtained in the previous stage, while the output is contextual and cultural background information.

[0368] Step 5:

[0369] The server generates translations using a translation generation method that leverages a generative AI model. In this process, a prompt is formed that instructs the AI model to take both translation and emotion into account. An example of such a prompt is: "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'" The generated translation will reflect nuances that match the user's expectations. The input at this stage is contextual and cultural background information, and the output is the generated translation.

[0370] Step 6:

[0371] The server analyzes the user's emotions through emotion recognition mechanisms. Based on the analysis results, it adjusts the translated text. For example, if the user expresses surprise or dissatisfaction, the server incorporates those nuances into the translated text. In this process, the input is the user's emotional information, and the output is the adjusted translated text.

[0372] Step 7:

[0373] The server transmits the generated translation and supplementary information to the terminal via an information output device. The terminal makes the translation available for the user to review by displaying the translation results. Finally, the user can provide feedback on this translation. The input at this processing stage is the adjusted translation, and the output is the information displayed to the user.
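Step 4 mentions that contextual understanding may involve retrieving relevant information from external databases. One way this lookup could work is sketched below with an in-memory SQLite database. The table name, column names, and the single row are illustrative assumptions; the row borrows the "Gokurosama" usage note from Example 1.

```python
import sqlite3

# In-memory database standing in for the "external databases" of Step 4.
# Table name, columns, and the single row are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cultural_notes (phrase TEXT PRIMARY KEY, note TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO cultural_notes VALUES (?, ?)",
    ("gokurosama", "Generally used by superiors to subordinates."),
)


def lookup_notes(tokens: list[str]) -> dict[str, str]:
    """Return the cultural note for each token that has one."""
    notes = {}
    for token in tokens:
        row = conn.execute(
            "SELECT note FROM cultural_notes WHERE phrase = ?", (token.lower(),)
        ).fetchone()
        if row is not None:
            notes[token] = row[0]
    return notes


notes = lookup_notes(["Gokurosama", "desu"])
print(notes)
```

The tokens passed in would come from the morphological analysis of Step 3; here they are hard-coded for brevity.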

[0374] (Application Example 2)

[0375] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0376] In modern society, there is a growing demand for natural communication between robots and humans within the home. However, conventional technologies have not adequately achieved the ability to provide appropriate translations and explanations based on context and emotions, making it difficult to realize dialogues that accurately reflect the user's intentions. Solving these challenges is essential.

[0377] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0378] In this invention, the server includes data receiving means for receiving language data input by the user, data analysis means for analyzing the received language data, and translation generation means for generating translations and supplementary explanations based on the analysis results and making adjustments that take emotions into consideration. This makes it possible for a home robot to provide natural dialogue that is in tune with the context and the user's emotions.

[0379] A "data receiving means" is a mechanism for receiving language data transmitted by a user from a terminal.

[0380] "Data analysis means" refers to a function that performs a process to analyze the structure and meaning of received linguistic data and to understand the context.

[0381] A "contextual analysis tool" is a mechanism that uses analyzed data to understand the cultural background and specific meanings of language, thereby improving the accuracy of translation.

[0382] The "translation generation method" is a function that generates appropriate translations and supplementary explanations using contextual analysis results, and makes adjustments that take into account the user's emotions.

[0383] A "data presentation method" is a mechanism for providing the generated translation and explanation to the user in audio or visual form.

[0384] A "feedback collection mechanism" is a system that collects user feedback and uses it to improve the system's translation accuracy and user satisfaction.

[0385] "Information protection measures" are mechanisms designed to prevent unauthorized access to or leakage of data when securely transferring received data to an external information processing device.

[0386] The system for implementing this invention is primarily designed for use with household robots. The core of the invention lies in its ability to receive user-inputted natural language data in real time and analyze it appropriately. The system includes the following key functions:

[0387] The home robot receives user speech and text input via a data receiving device. The hardware used is a processor unit equipped with a high-performance speech recognition sensor, specifically utilizing a general-purpose speech technology platform for speech recognition. The received data is securely transferred to a server. During this process, the data is encrypted using information protection measures and designed to prevent unauthorized external access.

[0388] The server analyzes the received data using data analysis tools. A natural language processing engine is used for analysis to understand the format and context of the linguistic data. Next, contextual analysis tools are used to generate appropriate translations and supplementary explanations, taking into account the user's cultural background and emotions. A sentiment analysis module is also incorporated to include emotions in the translation generation process. The generated results are provided to the user via voice or display.

[0389] As a concrete example, if a user asks the robot, "Which fruit is the sweetest?", the server analyzes the input and generates and presents a friendly, emotion-conscious answer such as, "Bananas are generally considered very sweet, but it depends on how ripe the fruit is." Through this process, feedback collection mechanisms ensure that user responses and additional questions are continuously used to improve the system.

[0390] An example of a prompt using a generative AI model is as follows: "Analyze the user's question and, based on contextual understanding, generate a natural translation and a sentiment-sensitive explanation. Question: 'Which fruit is the sweetest?'"

[0391] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0392] Step 1:

[0393] The terminal receives user input as speech or text. If the input is speech, it is converted into text using speech recognition technology. The resulting text then becomes the input for subsequent processes.

[0394] Step 2:

[0395] The terminal encrypts the received text data and sends it to the server. The encryption process protects the data from unauthorized access. The server receives this encrypted data and decrypts it to obtain analyzable text.

[0396] Step 3:

[0397] The server analyzes the structure of text data using data analysis tools. This process includes grammatical analysis and word semantic analysis, laying the foundation for contextual understanding. The result is structured language data.

[0398] Step 4:

[0399] The server understands the context and cultural background based on the analyzed data, using contextual analysis tools. Cultural elements and emotional information related to the data are then added. This contributes to improving the accuracy of translation and explanation.

[0400] Step 5:

[0401] The server uses translation generation tools to generate context-aware translations and explanations. Leveraging a generative AI model, it obtains output that appropriately reflects emotions. This results in a natural translation that includes supplementary information and nuances.

[0402] Step 6:

[0403] The server sends the generated translation results to the terminal. The terminal presents this to the user either verbally or by displaying it on the screen. The output information is based on the user's emotions and intentions.

[0404] Step 7:

[0405] Users evaluate the provided translations and explanations and provide feedback. This feedback is sent to the server via the device and used by the feedback collection system to improve the entire system.
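The branch in Step 1 of this flow, where speech is converted to text while typed text passes through, can be sketched as follows. Representing speech as raw bytes, the `recognize` callback, and the fixed-transcript placeholder recognizer are assumptions; the description names no particular speech recognition interface for this example.

```python
from typing import Callable, Union

# Assumption: speech arrives as raw audio bytes and typed input as str.
UserInput = Union[str, bytes]


def to_text(user_input: UserInput, recognize: Callable[[bytes], str]) -> str:
    """Step 1: pass text through unchanged; run speech through a recognizer."""
    if isinstance(user_input, bytes):
        return recognize(user_input).strip()
    return user_input.strip()


def fake_recognizer(audio: bytes) -> str:
    """Placeholder for a real speech recognition call; returns a fixed transcript."""
    return "Which fruit is the sweetest?"


typed = to_text("  Which fruit is the sweetest?  ", fake_recognizer)
spoken = to_text(b"\x00\x01fake-audio", fake_recognizer)
```

Passing the recognizer in as a parameter keeps the later steps independent of whichever speech recognition service is actually used.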

[0406] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0407] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
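The input/output contract of the data generation model 58, a prompt plus inference data in and an inference result out, might be expressed as the interface below. All class and field names are hypothetical, the output is simplified to text only, and `EchoModel` is a stub that merely reports which inputs it received rather than performing any inference.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class InferenceInput:
    prompt: str                    # instructions for the model
    text: Optional[str] = None     # text data representing text
    audio: Optional[bytes] = None  # audio data representing speech
    image: Optional[bytes] = None  # image data representing images


class DataGenerationModel(Protocol):
    """Anything with an infer() method of this shape satisfies the interface."""

    def infer(self, request: InferenceInput) -> str: ...


class EchoModel:
    """Stub model: reports the prompt and which kinds of inference data arrived."""

    def infer(self, request: InferenceInput) -> str:
        kinds = [
            name for name in ("text", "audio", "image")
            if getattr(request, name) is not None
        ]
        return f"[{request.prompt}] inputs: {', '.join(kinds) or 'none'}"


def run(model: DataGenerationModel, request: InferenceInput) -> str:
    """Call any model that follows the interface."""
    return model.infer(request)


output = run(EchoModel(), InferenceInput(prompt="Summarize", text="hello"))
print(output)
```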

[0408] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0409] [Third Embodiment]

[0410] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0411] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0412] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0413] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0414] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0415] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0416] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0417] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0418] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0419] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0420] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0421] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0422] One embodiment of the present invention begins with the user entering, on a terminal, a Japanese sentence or phrase to be translated. The terminal converts the input natural language data into a digital format and securely transmits this data to a server using encryption. The server analyzes the received data using data analysis means and extracts basic information for understanding the context.

[0423] Subsequently, the server uses context analysis to gain a deep understanding of the natural language data, taking into account the meaning of words and their cultural context. Based on this analysis, the server's translation generation system generates an appropriate translation and its supplementary explanation. The generated translation and explanation are then transmitted from the server to the terminal via a data output system and presented to the user.

[0424] As a concrete example, consider a case where a user inputs the phrase "Gokurosama" (Thank you for your hard work). This phrase is generally used by superiors to subordinates, but a direct translation would not fully convey its nuance. The server uses context analysis to recognize this difference and simultaneously provides the user with the translation and explanation: "Thank you for your effort, typically said to someone of lesser status." In this way, the present invention goes beyond mere translation of words and phrases, and can present information that takes into account the underlying meaning and cultural context.
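
The pairing of a translation with its supplementary explanation in the "Gokurosama" example can be illustrated with a small data structure. This is a minimal sketch, not the disclosed implementation: `TranslationResult`, `USAGE_NOTES`, `translate_with_note`, and `render` are hypothetical names, and the lookup table merely stands in for the output of context analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranslationResult:
    source: str
    translation: str
    explanation: Optional[str]  # supplementary cultural note, if any


# Hypothetical usage notes standing in for the output of context analysis.
USAGE_NOTES = {
    "Gokurosama": (
        "Thank you for your effort",
        "typically said to someone of lesser status",
    ),
}


def translate_with_note(phrase: str) -> Optional[TranslationResult]:
    """Return the translation and its usage note, or None if the phrase is unknown."""
    entry = USAGE_NOTES.get(phrase)
    if entry is None:
        return None
    translation, note = entry
    return TranslationResult(phrase, translation, note)


def render(result: TranslationResult) -> str:
    """Format translation and explanation as one line for presentation."""
    if result.explanation:
        return f"{result.translation}, {result.explanation}."
    return f"{result.translation}."
```

Keeping the explanation as a separate field lets the terminal decide whether to show it inline, as in the example sentence, or as a separate note.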

[0425] Furthermore, users can send feedback on the provided translations and explanations to the server via their device. The server utilizes feedback collection methods to accumulate this information and use it to improve the model, thereby enhancing the accuracy of translations and the quality of understanding. In this way, a system is realized that can overcome language barriers between cultures and support smooth communication.

[0426] The following describes the processing flow.

[0427] Step 1:

[0428] The user inputs Japanese sentences or phrases that need translation or understanding into the terminal. The terminal confirms this input and prepares the system to begin analysis.

[0429] Step 2:

[0430] The terminal converts the received natural language data into packet format and encrypts it for secure transmission to the server. The converted data is then sent to the server via the network.

[0431] Step 3:

[0432] The server receives data from the terminal. After verifying the accuracy of the received data, it performs preprocessing of the natural language data using data analysis tools. Here, the sentence structure is analyzed, and unnecessary spaces and grammatical errors are corrected as needed.

[0433] Step 4:

[0434] The context analysis mechanism within the server uses pre-processed data to gain a deep understanding of the text's context. This includes grasping the meaning of words within the text and their underlying cultural background. In this process, the server consults an internal database and leverages past similar cases to perform semantic analysis.

[0435] Step 5:

[0436] The server generates translations using translation generation tools based on the results of context analysis. The translations include supplementary explanations that are easy for foreign users to understand, going beyond simple literal translations.

[0437] Step 6:

[0438] The generated translation and its explanation are sent from the server to the terminal. The terminal decodes this data upon receipt and displays it in a user-readable format.

[0439] Step 7:

[0440] Regarding the translation results, users can send feedback to the server via their device based on their level of understanding and satisfaction. The feedback collection system stores this information and uses it to improve the system and increase accuracy in the future.
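
Steps 3 and 4 above (removing unnecessary spaces during preprocessing, then consulting an internal database for past similar cases) could look roughly like the following. This is a sketch under assumptions: the `PAST_CASES` table and its notes are illustrative data, the function names are hypothetical, and fuzzy string matching from Python's standard `difflib` module is substituted for whatever similarity search the actual system uses.

```python
import difflib
import re
import unicodedata
from typing import List, Tuple

# Hypothetical "internal database" of previously handled phrases and notes.
PAST_CASES = {
    "ご苦労様": "Generally used by superiors toward subordinates.",
    "お疲れ様": "Common greeting acknowledging someone's work.",
}


def preprocess(text: str) -> str:
    """Normalize character widths (NFKC) and collapse unnecessary whitespace."""
    normalized = unicodedata.normalize("NFKC", text)  # full-width space -> ASCII space
    return re.sub(r"\s+", " ", normalized).strip()


def find_similar_cases(
    text: str, limit: int = 3, cutoff: float = 0.6
) -> List[Tuple[str, str]]:
    """Return (phrase, note) pairs for stored phrases similar to the input."""
    matches = difflib.get_close_matches(text, list(PAST_CASES), n=limit, cutoff=cutoff)
    return [(phrase, PAST_CASES[phrase]) for phrase in matches]
```

For instance, `find_similar_cases(preprocess("ご苦労様です"))` retrieves the stored entry for "ご苦労様", whose note can then feed the semantic analysis.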

[0441] (Example 1)

[0442] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0443] In today's world, with the increasing frequency of intercultural communication, natural language translation is required to go beyond mere word-for-word conversion and encompass a deeper understanding that includes context and cultural background. Furthermore, there is a need for mechanisms to effectively utilize user feedback to improve translation accuracy, and for means of securely exchanging data while protecting personal information.

[0444] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0445] In this invention, the server includes an input receiving means that receives natural language information entered by a user and converts it into a digital format, an encryption means that encrypts the received natural language information to maintain confidentiality, and a context analysis means that understands the context from the analyzed information and grasps the cultural background and specific meanings. This allows the user to securely transmit natural language information to the server and receive an appropriate translation and supplementary explanation that takes cultural background into account.

[0446] An "input receiving means" is a mechanism for receiving natural language information entered by a user into a terminal.

[0447] "Encryption means" refers to technology that encrypts received natural language information in order to maintain confidentiality.

[0448] "Communication means" refers to protocols and technologies for securely transmitting encrypted natural language information to a server.

[0449] "Data analysis means" refers to techniques or tools for analyzing received natural language information to understand the structure and basic meaning of a text.

[0450] "Context analysis means" refers to technology that performs processing to understand the background, context, and cultural meaning of analyzed information.

[0451] "Translation generation means" refers to technology that generates translations based on the analyzed results and adds cultural supplementary explanations using generative AI.

[0452] "Data output means" refers to means for displaying the generated translation and supplementary explanations to the user.

[0453] "Feedback collection means" refers to technology used to collect user feedback on translations and explanations and to utilize it to improve the system.

[0454] A "communication protocol" is a set of rules and procedures used to securely transmit encrypted information to a server.

[0455] This system accurately translates natural language information entered by users and provides supplementary explanations that take into account the cultural background and context. Users input Japanese sentences or phrases they wish to translate using a terminal. The terminal converts this input information into a digital format and encrypts it using encryption methods, thereby maintaining the confidentiality of the information.

[0456] The terminal sends encrypted information to the server using a secure communication protocol. For example, the security of the information is ensured by using the HTTPS protocol. The receiving server analyzes the input information using data analysis tools to understand its grammar and basic meaning.
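
As an illustration of sending already-encrypted data over HTTPS, the request could be prepared with Python's standard `urllib.request` as below. The endpoint URL, the JSON field name `payload`, and the base64 encoding are all assumptions made for this sketch; the disclosure specifies only that a secure protocol such as HTTPS is used.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; the disclosure does not name a URL or message format.
SERVER_URL = "https://translation.example/api/translate"


def build_request(ciphertext: bytes, url: str = SERVER_URL) -> urllib.request.Request:
    """Wrap already-encrypted bytes in a JSON body and prepare an HTTPS POST."""
    if not url.startswith("https://"):
        raise ValueError("refusing to send over a non-HTTPS URL")
    body = json.dumps({"payload": base64.b64encode(ciphertext).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(b"\x00\x01encrypted-bytes")
# Sending is left out of the sketch:
#   urllib.request.urlopen(request, timeout=10)
```

Base64 is used here only because raw ciphertext bytes cannot be placed directly in a JSON string.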

[0457] Next, the server uses context analysis tools to gain a deep understanding of the information's context. Specifically, it utilizes generative AI models (such as natural language processing models) to analyze the cultural background and specific meanings of the input phrases. Based on this analysis, the translation generation tool generates the optimal translation and adds supplementary explanations as needed.

[0458] Subsequently, the generated translation and supplementary explanations are sent to the terminal by the data output means and presented to the user. The user can review this and provide feedback as needed. The feedback is sent to the server through the feedback collection means and used to improve the model.

[0459] For example, if a user enters the phrase "Gokurosama," the system will not simply provide a direct translation of the words, but will also consider that the expression is generally used by superiors to subordinates, and will generate a translation and explanation such as "Thank you for your effort, typically said to someone of lesser status."

[0460] This allows users to gain valuable information that goes beyond simple translation, leading to a deeper understanding of its meaning. This system can be used to facilitate smoother intercultural communication.

[0461] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0462] Step 1:

[0463] The user inputs natural language sentences or phrases they want to translate into the device. The input information is saved to the device in text format. The device then converts this text data into a digital format and encrypts it using encryption methods. This ensures that the data is protected from unauthorized external access. The input is natural language text data, and the output is encrypted digital data.

[0464] Step 2:

[0465] The terminal sends encrypted digital data to the server using a secure communication protocol (e.g., HTTPS). This protocol ensures the confidentiality and integrity of the data. The input is encrypted digital data, and the output is a secure data transfer to the server.

[0466] Step 3:

[0467] The server decrypts the received encrypted data, returning it to its original digital format. It then uses data analysis tools to analyze the text structure. This analysis identifies grammatical elements and tokens within the text. The input is encrypted digital data, and the output is the analyzed text data.

[0468] Step 4:

[0469] The server uses contextual analysis tools to gain a deep understanding of the context of the analyzed text data. Specifically, it leverages generative AI models to extract the cultural background and specific meanings of the input text. The information obtained through this process is used as the context necessary for translation generation. The input is the analyzed text data, and the output is contextual information.

[0470] Step 5:

[0471] The server uses a translation generation mechanism to generate the optimal translation and its supplementary explanation based on the acquired contextual information. A generative AI model participates in this process to create natural and culturally appropriate translations. The input is contextual information, and the output is the translation result and its supplementary explanation.

[0472] Step 6:

[0473] The server sends the generated translation and supplementary explanations to the terminal via a data output mechanism. The terminal receives this data and renders it in an appropriate format for visual presentation to the user. The input is the translation result and its supplementary explanations, and the output is the visual display on the terminal.

[0474] Step 7:

[0475] If a user has feedback on the presented translation results and explanations, they send that feedback to the server via their device. The server receives this information using feedback collection tools and uses it to improve the translation model and analysis methods. This improves the overall quality of the system. The input is user feedback, and the output is the accumulation of feedback data to the server.
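
Because each step above names its input and output, the server-side Steps 3 to 5 can be pictured as a chain of typed stages. The following is a minimal sketch assuming hypothetical type and function names; the context and generation stages are stubs injected by the caller, standing in for the generative AI model, and whitespace tokenization stands in for real text analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AnalyzedText:  # output of Step 3
    tokens: List[str]


@dataclass(frozen=True)
class ContextInfo:  # output of Step 4
    analyzed: AnalyzedText
    cultural_note: str


@dataclass(frozen=True)
class Translation:  # output of Step 5
    text: str
    explanation: str


def analyze(text: str) -> AnalyzedText:
    """Step 3 stand-in: whitespace tokenization instead of real parsing."""
    return AnalyzedText(text.split())


def run_server_steps(
    text: str,
    extract_context: Callable[[AnalyzedText], ContextInfo],
    generate: Callable[[ContextInfo], Translation],
) -> Translation:
    """Chain Steps 3 to 5; the model-backed stages are supplied by the caller."""
    return generate(extract_context(analyze(text)))


# Hypothetical stubs in place of generative AI calls.
def stub_context(analyzed: AnalyzedText) -> ContextInfo:
    return ContextInfo(analyzed, "typically said to someone of lesser status")


def stub_generate(context: ContextInfo) -> Translation:
    return Translation("Thank you for your effort", context.cultural_note)


result = run_server_steps("Gokurosama", stub_context, stub_generate)
```

Passing the stages in as callables keeps the flow testable without a model and makes the input/output boundary of each step explicit.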

[0476] (Application Example 1)

[0477] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0478] Overcoming language barriers that people from different linguistic and cultural backgrounds face in natural communication is crucial. This challenge cannot be solved without translations that consider context, cultural background, and linguistic nuances, rather than simply providing literal translations. However, current technology suffers from a lack of naturalness and accuracy, particularly in real-time, two-way oral communication.

[0479] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0480] In this invention, the server includes input receiving means for receiving natural language information in the form of voice or text entered by a user, data analysis means for analyzing the format of the received natural language information, and context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning. This enables natural and highly accurate communication between people with different languages and cultures.

[0481] "Input receiving means" refers to a device or function that has the role of receiving natural language information, such as voice or text, entered by a user.

[0482] "Data analysis means" refers to a device or function for analyzing the format of received natural language information and extracting necessary information.

[0483] "Context analysis means" refers to a device or function used to understand context based on analyzed information and to grasp cultural background and unique meanings.

[0484] "Translation generation means" refers to a device or function for generating a translation and its supplementary explanation based on the analyzed results.

[0485] "Data output means" refers to a device or function that is responsible for presenting the generated translation and supplementary explanations to the user.

[0486] "Audio output means" refers to a device or function for outputting translations and supplementary explanations as audio.

[0487] "Means for converting speech to text" refers to a device or function for converting speech information into text.

[0488] "Feedback collection means" refers to a device or function that receives feedback from users and uses it to improve the performance of the system.

[0489] "Encryption means" refers to a device or function that encrypts information in order to securely transfer received natural language information to a server.

[0490] To implement this invention, first, the user inputs natural language information to be translated in voice or text format into a terminal. The terminal is equipped with an input receiving means to receive the input information and processes this information digitally. When using voice input, a voice-to-text conversion means such as the Google Cloud Speech-to-Text API is used to convert the voice information into text data.

[0491] The converted text information is encrypted using AES encryption with Python's cryptography library and securely transmitted to the server. The server analyzes the received information using data analysis tools such as the BERT model from the Transformers library to understand the context, and generates a contextually appropriate translation and supplementary explanation using a translation generation tool such as the DeepL API. The generated information is sent to the terminal by a data output tool and presented to the user, while the generated translation and supplementary explanation are simultaneously output as audio using a speech output tool that utilizes the Google Text-to-Speech API.
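
The AES encryption step with Python's `cryptography` library might be sketched as follows. The description does not state a cipher mode, nonce handling, or key exchange, so AES-GCM with a random 12-byte nonce prepended to the ciphertext is an assumption made here, as are the function names; how the key reaches the server is outside the sketch.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # bytes; the size commonly recommended for AES-GCM


def encrypt_text(key: bytes, text: str) -> bytes:
    """Encrypt UTF-8 text and return nonce || ciphertext (auth tag included)."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, text.encode("utf-8"), None)


def decrypt_text(key: bytes, blob: bytes) -> str:
    """Split off the nonce, then authenticate and decrypt the remainder."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


# In practice the key would have to be shared securely with the server.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_text(key, "ご苦労様")
```

An authenticated mode such as GCM also detects tampering: decryption raises an exception if the ciphertext was modified in transit.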

[0492] A concrete example would be a user who wants to converse with a local while traveling. The user could type "Thank you for your help" and receive a translation that accurately expresses its nuance. The system would generate a translation and explanation such as "Thank you for the care and support you have provided," supporting the conversation in a natural flow. An example of a prompt to the generative AI model in this case would be text like, "Please enter the Japanese sentence or phrase you would like to translate. Then, please provide an appropriate translation and explanation, taking its cultural context into consideration."

[0493] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0494] Step 1:

[0495] The device receives voice or text input from the user. Specifically, the user enters the phrase they want to translate, by voice or as text, on their smartphone or tablet. In the case of voice input, the Google Cloud Speech-to-Text API is used to convert the voice data into text. At this point, the input is voice or text information, and the output is digital text data.

[0496] Step 2:

[0497] The terminal encrypts the received text information using AES encryption. Specifically, it uses a cryptography library to encrypt the text data and ensure its security. This encrypted text data is then sent to the server.

[0498] Step 3:

[0499] The server receives encrypted data sent from the terminal and decrypts it. Using the decrypted text data as input, it performs data analysis using models such as the BERT model from the Transformers library. Through data analysis, it extracts basic information for context and semantic understanding, and outputs contextual information as the analysis result.

[0500] Step 4:

[0501] The server performs contextual analysis based on the extracted contextual information. This analysis utilizes contextual analysis tools to understand cultural backgrounds and language-specific nuances from the extracted information. At this stage, the input is contextual information, and the output is detailed contextual information derived from further analysis.

[0502] Step 5:

[0503] The server generates translations using translation generation methods such as the DeepL API, based on detailed contextual information, and also creates supplementary explanations. These translations include contextually appropriate expressions. The input is detailed contextual information, and the output is the translated result and its supplementary explanations.

[0504] Step 6:

[0505] The server sends the generated translation and supplementary explanations to the terminal, which displays the translation results on the screen as a data output. Simultaneously, it uses the Google Text-to-Speech API to convert the translated content into speech and play it back. The input is the translation results and supplementary explanations, and the output is the screen display and audio output.

[0506] Step 7:

[0507] The user inputs feedback on the provided translation into the terminal. The terminal sends the feedback information to the server, which stores it using a feedback collection mechanism and uses it to improve context analysis and translation generation. The input in this step is user feedback, and the output is feedback data used for improvement.
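
The feedback collection in Step 7 could be modeled as a store that accumulates ratings per phrase and flags poorly rated translations for improvement. This is a sketch only: the 1 to 5 rating scale, the in-memory storage, and the names `Feedback` and `FeedbackStore` are assumptions, since the description does not specify the form that feedback takes.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import DefaultDict, List, Optional


@dataclass(frozen=True)
class Feedback:
    phrase: str   # the source phrase that was translated
    rating: int   # hypothetical scale: 1 (poor) to 5 (good)
    comment: str = ""


class FeedbackStore:
    """In-memory stand-in for the feedback collection means."""

    def __init__(self) -> None:
        self._by_phrase: DefaultDict[str, List[Feedback]] = defaultdict(list)

    def add(self, feedback: Feedback) -> None:
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._by_phrase[feedback.phrase].append(feedback)

    def average_rating(self, phrase: str) -> Optional[float]:
        items = self._by_phrase.get(phrase)
        return mean(f.rating for f in items) if items else None

    def needs_review(self, threshold: float = 3.0) -> List[str]:
        """Phrases whose average rating falls below the threshold."""
        return sorted(
            phrase
            for phrase, items in self._by_phrase.items()
            if mean(f.rating for f in items) < threshold
        )


store = FeedbackStore()
store.add(Feedback("Gokurosama", 5))
store.add(Feedback("Gokurosama", 4, "explanation was helpful"))
store.add(Feedback("Why is it so cold?", 2, "tone felt off"))
```

The list returned by `needs_review()` is one possible way to select which translations feed back into improving context analysis and translation generation.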

[0508] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0509] In an embodiment of the present invention, first, the user inputs Japanese text or phrases they wish to translate or understand using a terminal. This input natural language data is received by the terminal and securely transmitted to a server using encryption. The server analyzes the received data and obtains basic information to facilitate contextual understanding.

[0510] The analyzed data is then processed by a context analysis tool on the server to understand its context and cultural background. Subsequently, a translation generation tool generates appropriate translations and supplementary explanations. This process incorporates mechanisms to present the information requested by the user as accurately and naturally as possible.

[0511] Furthermore, the server uses an emotion engine to recognize the user's emotions from the analyzed data. Based on this emotion recognition, the translation and supplementary explanations are adjusted. For example, if the emotion engine detects that the user is surprised or dissatisfied with a particular phrase, it will present a translation result that takes into account an appropriate response to that emotion.

[0512] For example, if a user enters the phrase "Why is it so cold?", the emotion engine will detect feelings of surprise or dissatisfaction. As a result, the generated translation will include an explanation that reflects the emotion, such as "It's surprisingly cold today, isn't it?".

[0513] Finally, the translation and explanation generated by the server are sent to the terminal and displayed to the user. Users can provide feedback on the information provided, and this feedback is collected by a feedback collection mechanism and used to improve the overall system, including the emotion engine. Continuous system optimization based on feedback makes it possible to provide a more accurate and personalized translation service.

[0514] The following describes the processing flow.

[0515] Step 1:

[0516] The user inputs Japanese text that needs to be translated or understood into the device. The device receives this natural language data and prepares it for processing.

[0517] Step 2:

[0518] The terminal encrypts the entered data to securely transmit it to the server, then packets it and sends it to the server via the network.

[0519] Step 3:

[0520] The server receives data from the terminal for analysis. It checks the format of the received data, parses the sentence structure as a preprocessing step, and performs sorting and necessary transformations.

[0521] Step 4:

[0522] The context analysis mechanism within the server uses the analyzed data to understand the context. At this stage, cultural backgrounds and specific nuances are also grasped, which influence subsequent processing.

[0523] Step 5:

[0524] The emotion engine recognizes the user's emotions from pre-processed data. For example, emotions such as joy, anger, sorrow, and pleasure can be inferred from the tone of the text and specific keywords.

[0525] Step 6:

[0526] The server uses a translation generation mechanism to generate translations and supplementary explanations that take into account the user's emotional state. Based on the recognized emotion, the most appropriate expression is selected.

[0527] Step 7:

[0528] After formatting and encrypting the generated translation and supplementary explanations, the server sends this data back to the terminal.

[0529] Step 8:

[0530] The terminal receives and decodes the translation results sent from the server and displays them to the user. The user reviews the translation and enters feedback if necessary.

[0531] Step 9:

[0532] User feedback is sent to the server via the device and collected by feedback collection tools. This feedback is used to improve the emotion engine and translation accuracy.
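
Step 5 of this flow states that emotions can be inferred from specific keywords. A deliberately simple keyword-based sketch is shown below; the cue lists are hypothetical and given in English for readability, the function name is an assumption, and an actual emotion engine would presumably rely on the emotion identification model 59 rather than fixed word lists.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue words; a deployed emotion engine would use a trained model.
EMOTION_CUES: Dict[str, Tuple[str, ...]] = {
    "surprise": ("why", "so", "really"),
    "dissatisfaction": ("why", "too", "cold", "again"),
    "joy": ("great", "thank", "happy", "glad"),
}


def detect_emotions(text: str, min_score: int = 1) -> List[str]:
    """Return emotions with at least min_score matching cue words, strongest first."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {
        emotion: sum(cue in words for cue in cues)
        for emotion, cues in EMOTION_CUES.items()
    }
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [emotion for emotion, score in ranked if score >= min_score]
```

With these cue lists, `detect_emotions("Why is it so cold?")` yields both dissatisfaction and surprise, matching the example in which the emotion engine detects surprise or dissatisfaction.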

[0533] (Example 2)

[0534] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0535] Conventional machine translation systems often fail to adequately consider context and cultural background, and cannot reflect the user's emotions, making it difficult to provide the information the user truly needs. There is a need to overcome this lack of accuracy and unnaturalness, and to provide a translation service that satisfies users.

[0536] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0537] In this invention, the server includes an input receiving means for receiving natural language information entered by the user, an information analysis means for analyzing the form of the received natural language information, a context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning, a translation generation means for generating translations and supplementary information based on the analyzed results, and an emotion recognition means for analyzing the user's emotions using generative AI technology and adjusting the translated content based on those emotions. This makes it possible to provide a natural and accurate translation service that takes into account the user's context and emotions.

[0538] "Natural language information" refers to data expressed in human language that users input into a computer system.

[0539] "Input receiving means" refers to a function or device for receiving natural language information sent by a user.

[0540] "Information analysis means" refers to functions or technologies for examining and analyzing the form and structure of received natural language information.

[0541] "Contextual analysis means" refers to functions and methods for understanding the context, background, and cultural elements of analyzed information.

[0542] "Translation generation means" refers to a function or technology that generates translations and supplementary information in a natural way based on the results of contextual analysis.

[0543] "Emotion recognition means" refers to functions and technologies that use generative AI technology to analyze a user's emotions and adjust information and services accordingly.

[0544] "Encryption means" refers to technologies and devices that encrypt data in order to securely protect information received or transmitted.

[0545] "Information output means" refers to methods or devices for providing the generated translation or supplementary explanation to the user.

[0546] "Feedback collection means" refers to functions or technologies that collect user reactions and opinions and use them to improve the system.

[0547] First, the user uses their device to input natural language information in Japanese that they want to translate or understand. This input information is received by the device and securely transmitted to the server using encryption. Common encryption protocols are used for this encryption.

[0548] The server first analyzes the received information using information analysis tools. Natural language processing techniques are used for this analysis. Specifically, morphological and syntactic analysis are performed to understand the form and structure of the information. At this stage, contextual analysis tools consider the cultural background and emotions of the information to understand its context and grasp its meaning.
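
To make the morphological analysis step concrete, the following toy segmenter splits Japanese text into words by greedy longest match against a small lexicon. It is a rough illustration only: the `LEXICON` contents and the function name are hypothetical, and longest-match lookup is a simplification substituted for a real morphological analyzer, which would also assign parts of speech and handle ambiguity.

```python
from typing import List, Set

# Hypothetical mini lexicon; real analyzers ship large dictionaries.
LEXICON: Set[str] = {"今日", "は", "とても", "寒い", "です", "ね"}


def segment(text: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max(map(len, lexicon), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens
```

The resulting word list is the kind of input on which syntactic analysis would then operate.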

[0549] Next, the server uses translation generation tools to generate a translation based on the analyzed context. This process utilizes a generative AI model. The generated translation includes supplementary information for the user. At this stage, the user's emotions are analyzed by emotion recognition tools, and the translation is adjusted accordingly. Specifically, appropriate nuances of emotion are incorporated into the translation.

[0550] For example, if a user types "Why is it so cold?" into their device, the system instructs the generative AI model with a prompt such as "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'". Based on this instruction, the system generates an emotionally sensitive translation such as "It's surprisingly cold today, isn't it?".
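
The prompt in this example can be produced from a template. In the sketch below, the template text is taken from the example above, while the function name and the optional "Detected emotions" suffix are assumptions added for illustration; they show one way recognized emotions might be passed along to the model.

```python
from typing import Sequence

PROMPT_TEMPLATE = (
    "Translate the following sentence and consider any emotional cues: '{sentence}'"
)


def build_prompt(sentence: str, emotions: Sequence[str] = ()) -> str:
    """Fill the template; optionally append emotions detected upstream."""
    prompt = PROMPT_TEMPLATE.format(sentence=sentence.strip())
    if emotions:
        # Hypothetical extension: pass recognized emotions along as a hint.
        prompt += " Detected emotions: " + ", ".join(emotions) + "."
    return prompt
```

Calling `build_prompt("Why is it so cold?")` reproduces the prompt quoted in the example; a sentence that itself contains single quotes would need escaping, which this sketch does not handle.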

[0551] Finally, the server sends the generated translation and supplementary explanations to the terminal via an information output device and displays them to the user. The user can provide feedback based on the presented information, and this feedback is used to improve the system through a feedback collection device. In this way, a more accurate and personalized translation service can be provided.

[0552] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0553] Step 1:

[0554] The user uses a terminal to input Japanese natural language information that they wish to translate or understand. This input information is received by the terminal's input receiving mechanism. The received data is prepared for processing as string data. At this stage, the input is the user's natural language sentence, and the output is that sentence data.

[0555] Step 2:

[0556] The terminal encrypts the received natural language information using encryption methods and securely transmits it to the server. Encryption protects the information from being intercepted by third parties. This encrypted data is received by the server and ready for analysis. At this stage, the input is the user's encrypted text data, and the output is the encrypted data transferred to the server.

[0557] Step 3:

[0558] The server analyzes the received data using information analysis tools. First, it decrypts the data. Next, it performs morphological analysis to break the text down into individual words, and syntactic analysis to understand the structure of the entire sentence. This analysis extracts grammatical information and semantic information of words, which becomes the basis for the next processing. At this stage, the input is the decrypted text data, and the output is the analyzed grammatical information and semantic information of words.

[0559] Step 4:

[0560] The server uses contextual analysis tools to understand the context and cultural background from the analyzed information. Contextual understanding may involve retrieving relevant information from external databases. This allows the server to grasp the cultural meanings and usage contexts of words and sentences. The input for this process is the grammatical and semantic information obtained in the previous stage, while the output is contextual and cultural background information.

[0561] Step 5:

[0562] The server generates translations using a translation generation method that leverages a generative AI model. In this process, a prompt sentence is formed, prompting the AI model to consider translation and sentiment. An example prompt sentence is used: "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'" The generated translation will reflect nuances that match the user's expectations. The input at this stage is contextual and cultural background information, and the output is the generated translation.

[0563] Step 6:

[0564] The server analyzes the user's emotions through emotion recognition mechanisms. Based on the analysis results, it adjusts the translated text. For example, if the user expresses surprise or dissatisfaction, the server incorporates those nuances into the translated text. In this process, the input is the user's emotional information, and the output is the adjusted translated text.

[0565] Step 7:

[0566] The server transmits the generated translation and supplementary information to the terminal via an information output device. The terminal makes the translation available for the user to review by displaying the translation results. Finally, the user can provide feedback on this translation. The input at this processing stage is the adjusted translation, and the output is the information displayed to the user.
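The server-side stages of the flow above (Steps 3 to 6) can be pictured as a chain of functions that pass a shared state along. The following is a rough sketch only: every stage body is a placeholder (a whitespace split stands in for morphological analysis, and a bracketed string stands in for the generative AI output), and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class PipelineState:
    text: str                                  # Step 1: the user's sentence as string data
    emotion: str = "neutral"                   # Step 6 input: recognized emotion
    analysis: Dict[str, object] = field(default_factory=dict)  # Step 3 output
    context: Dict[str, object] = field(default_factory=dict)   # Step 4 output
    translation: str = ""                      # Step 5 and Step 6 output


Stage = Callable[[PipelineState], PipelineState]


def analyze(state: PipelineState) -> PipelineState:
    # Placeholder for morphological and syntactic analysis.
    state.analysis["tokens"] = state.text.split()
    return state


def understand_context(state: PipelineState) -> PipelineState:
    # Placeholder for contextual and cultural analysis.
    state.context["is_question"] = state.text.rstrip().endswith("?")
    return state


def generate_translation(state: PipelineState) -> PipelineState:
    # Placeholder for the generative AI call.
    state.translation = f"[translation of: {state.text}]"
    return state


def adjust_for_emotion(state: PipelineState) -> PipelineState:
    # Placeholder for emotion-based adjustment of the translated text.
    if state.emotion != "neutral":
        state.translation += f" (tone: {state.emotion})"
    return state


STAGES: Sequence[Stage] = (analyze, understand_context, generate_translation, adjust_for_emotion)


def run_pipeline(state: PipelineState, stages: Sequence[Stage] = STAGES) -> PipelineState:
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline(PipelineState(text="Why is it so cold?", emotion="surprise"))
print(result.translation)
```

Keeping each step as a separate function mirrors the input and output pairs stated for each step, so one stage can be replaced without touching the others.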

[0567] (Application Example 2)

[0568] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0569] In modern society, there is a growing demand for natural communication between robots and humans within the home. However, conventional technologies have not adequately achieved the ability to provide appropriate translations and explanations based on context and emotions, making it difficult to realize dialogues that accurately reflect the user's intentions. Solving these challenges is essential.

[0570] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0571] In this invention, the server includes data receiving means for receiving language data input by the user, data analysis means for analyzing the received language data, and translation generation means for generating translations and supplementary explanations based on the analysis results and making adjustments that take emotions into consideration. This makes it possible for a home robot to provide natural dialogue that is in tune with the context and the user's emotions.

[0572] A "data receiving means" is a mechanism for receiving language data transmitted by a user from a terminal.

[0573] "Data analysis means" refers to a function that performs a process to analyze the structure and meaning of received linguistic data and to understand the context.

[0574] A "contextual analysis tool" is a mechanism that uses analyzed data to understand the cultural background and specific meanings of language, thereby improving the accuracy of translation.

[0575] The "translation generation method" is a function that generates appropriate translations and supplementary explanations using contextual analysis results, and makes adjustments that take into account the user's emotions.

[0576] A "data presentation method" is a mechanism for providing the generated translation and explanation to the user in audio or visual form.

[0577] A "feedback collection mechanism" is a system that collects user feedback and uses it to improve the system's translation accuracy and user satisfaction.

[0578] "Information protection measures" are mechanisms designed to prevent unauthorized access to or leakage of data when securely transferring received data to an external information processing device.

[0579] The system for implementing this invention is primarily designed for use with household robots. The core of the invention lies in its ability to receive user-inputted natural language data in real time and analyze it appropriately. The system includes the following key functions:

[0580] The home robot receives user speech and text input via a data receiving device. The hardware is a processor unit equipped with a high-performance speech recognition sensor, and speech recognition uses a general-purpose speech technology platform. The received data is securely transferred to a server. During this transfer, the data is encrypted by the information protection measures, which are designed to prevent unauthorized external access.

[0581] The server analyzes the received data using data analysis tools. A natural language processing engine is used for analysis to understand the format and context of the linguistic data. Next, contextual analysis tools are used to generate appropriate translations and supplementary explanations, taking into account the user's cultural background and emotions. A sentiment analysis module is also incorporated to include emotions in the translation generation process. The generated results are provided to the user via voice or display.

[0582] As a concrete example, if a user asks the robot, "Which fruit is the sweetest?", the server analyzes the input and generates and presents a friendly, emotion-conscious answer such as, "Bananas are generally considered very sweet, but it depends on how ripe the fruit is." Through this process, feedback collection mechanisms ensure that user responses and additional questions are continuously used to improve the system.

[0583] An example of a prompt using a generative AI model is as follows: "Analyze the user's question and, based on contextual understanding, generate a natural translation and a sentiment-sensitive explanation. Question: 'Which fruit is the sweetest?'"

[0584] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0585] Step 1:

[0586] The terminal receives user input as speech or text. If the input is speech, it is converted into text using speech recognition technology. The resulting text then becomes the input for subsequent processes.

[0587] Step 2:

[0588] The terminal encrypts the received text data and sends it to the server. The encryption process protects the data from unauthorized access. The server receives this encrypted data and decrypts it to obtain analyzable text.

[0589] Step 3:

[0590] The server analyzes the structure of text data using data analysis tools. This process includes grammatical analysis and word semantic analysis, laying the foundation for contextual understanding. The result is structured language data.

[0591] Step 4:

[0592] The server understands the context and cultural background based on the analyzed data, using contextual analysis tools. Cultural elements and emotional information related to the data are then added. This contributes to improving the accuracy of translation and explanation.

[0593] Step 5:

[0594] The server uses translation generation tools to generate context-aware translations and explanations. Leveraging a generative AI model, it obtains output that appropriately reflects emotions. This results in a natural translation that includes supplementary information and nuances.

[0595] Step 6:

[0596] The server sends the generated translation results to the terminal. The terminal presents this to the user either verbally or by displaying it on the screen. The output information is based on the user's emotions and intentions.

[0597] Step 7:

[0598] Users evaluate the provided translations and explanations and provide feedback. This feedback is sent to the server via the device and used by the feedback collection system to improve the entire system.
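The feedback step can be illustrated with a small in-memory collector. This is a sketch under stated assumptions: the text does not specify what feedback looks like, so the 1 to 5 rating, the `Feedback` fields, and the `FeedbackCollector` class are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class Feedback:
    source_text: str
    translation: str
    rating: int          # hypothetical satisfaction score from 1 (poor) to 5 (good)
    comment: str = ""


class FeedbackCollector:
    """Accumulates feedback records so they can later inform system improvements."""

    def __init__(self) -> None:
        self._records: List[Feedback] = []

    def submit(self, feedback: Feedback) -> None:
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._records.append(feedback)

    def average_rating(self) -> Optional[float]:
        # None signals that no feedback has been collected yet.
        if not self._records:
            return None
        return mean(f.rating for f in self._records)

    def low_rated(self, threshold: int = 2) -> List[Feedback]:
        # Records at or below the threshold are candidates for review.
        return [f for f in self._records if f.rating <= threshold]


collector = FeedbackCollector()
collector.submit(Feedback("Which fruit is the sweetest?", "(translation A)", rating=5))
collector.submit(Feedback("Which fruit is the sweetest?", "(translation B)", rating=2, comment="too literal"))
print(collector.average_rating(), len(collector.low_rated()))
```

How the accumulated records feed back into the model is left open here, as it is in the text.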

[0599] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0600] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0601] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0602] [Fourth Embodiment]

[0603] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0604] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0605] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0606] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0607] The microphone 238 receives instructions from the user 20 by picking up the user's voice. It captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0608] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0609] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0610] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0611] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0612] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0613] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0614] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0615] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0616] One embodiment of the present invention begins with the user inputting a Japanese sentence or phrase to be translated into a terminal. The terminal converts the input natural language data into a digital format and securely transmits this data to a server using encryption. The server analyzes the received data using data analysis means and extracts basic information for understanding the context.

[0617] Subsequently, the server uses context analysis to gain a deep understanding of the natural language data, taking into account the meaning of words and their cultural context. Based on this analysis, the server's translation generation system generates an appropriate translation and its supplementary explanation. The generated translation and explanation are then transmitted from the server to the terminal via a data output system and presented to the user.

[0618] As a concrete example, consider a case where a user inputs the phrase "Gokurosama" (Thank you for your hard work). This phrase is generally used by superiors to subordinates, but a direct translation would not fully convey its nuance. The server uses context analysis to recognize this difference and simultaneously provides the user with the translation and explanation: "Thank you for your effort, typically said to someone of lesser status." In this way, the present invention goes beyond mere translation of words and phrases, and can present information that takes into account the underlying meaning and cultural context.
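One way to carry a translation together with its supplementary explanation is a small record type. The sketch below is illustrative only: the class name, its fields, and the split of the example output into a translation part and an explanation part are assumptions, since the text presents them as a single string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    source: str
    translation: str
    explanation: str = ""   # supplementary cultural note; may be empty

    def render(self) -> str:
        """Format the result for display on the terminal."""
        lines = [f"{self.source} -> {self.translation}"]
        if self.explanation:
            lines.append(f"Note: {self.explanation}")
        return "\n".join(lines)


result = TranslationResult(
    source="Gokurosama",
    translation="Thank you for your effort",
    explanation="typically said to someone of lesser status",
)
print(result.render())
```

Keeping the explanation in its own field lets the terminal decide whether to show it inline, as a note, or only on request.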

[0619] Furthermore, users can send feedback on the provided translations and explanations to the server via their device. The server utilizes feedback collection methods to accumulate this information and use it to improve the model, thereby enhancing the accuracy of translations and the quality of understanding. In this way, a system is realized that can overcome language barriers between cultures and support smooth communication.

[0620] The following describes the processing flow.

[0621] Step 1:

[0622] The user inputs Japanese sentences or phrases that need translation or understanding into the terminal. The terminal confirms this input and prepares the system to begin analysis.

[0623] Step 2:

[0624] The terminal converts the received natural language data into packet format and encrypts it for secure transmission to the server. The converted data is then sent to the server via the network.
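The conversion into packet format is not specified further. As one possible reading, the sketch below splits a payload into fixed-size chunks, each prefixed with a small header. The header layout (sequence number, packet count, payload length) and the function names are assumptions, and the payload is treated as opaque bytes, so it could equally be ciphertext produced by the encryption step.

```python
import struct
from typing import Dict, List

# Hypothetical header: sequence number, total packets, payload length (network byte order).
HEADER = struct.Struct("!HHH")


def to_packets(payload: bytes, max_payload: int = 512) -> List[bytes]:
    if not 0 < max_payload <= 0xFFFF:
        raise ValueError("max_payload must be between 1 and 65535")
    chunks = [payload[i:i + max_payload] for i in range(0, len(payload), max_payload)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("payload needs too many packets")
    total = len(chunks)
    return [HEADER.pack(seq, total, len(chunk)) + chunk for seq, chunk in enumerate(chunks)]


def from_packets(packets: List[bytes]) -> bytes:
    parts: Dict[int, bytes] = {}
    total = None
    for packet in packets:
        seq, count, length = HEADER.unpack_from(packet)
        body = packet[HEADER.size:HEADER.size + length]
        if len(body) != length:
            raise ValueError("truncated packet")
        parts[seq] = body
        total = count
    if total is None or set(parts) != set(range(total)):
        raise ValueError("missing packets")
    # Reassemble in sequence order, regardless of arrival order.
    return b"".join(parts[i] for i in range(total))


data = "ご苦労様".encode("utf-8") * 100
packets = to_packets(data)
assert from_packets(list(reversed(packets))) == data
```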

[0625] Step 3:

[0626] The server receives data from the terminal. After verifying the accuracy of the received data, it performs preprocessing of the natural language data using data analysis tools. Here, the sentence structure is analyzed, and unnecessary spaces and grammatical errors are corrected as needed.
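The whitespace cleanup mentioned in this step can be sketched with the standard library. This example covers only the removal of unnecessary spaces; grammatical correction is outside its scope. The use of Unicode NFKC normalization is an assumption, chosen because it maps full-width spaces, which are common in Japanese input, to ordinary spaces. Note that NFKC also folds other compatibility characters, such as full-width Latin letters.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Normalize compatibility characters and collapse runs of whitespace."""
    normalized = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", normalized).strip()


print(preprocess("  Why   is it\u3000so cold? "))
```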

[0627] Step 4:

[0628] The context analysis mechanism within the server uses pre-processed data to gain a deep understanding of the text's context. This includes grasping the meaning of words within the text and their underlying cultural background. In this process, the server consults an internal database and leverages past similar cases to perform semantic analysis.

[0629] Step 5:

[0630] The server generates translations using translation generation tools based on the results of context analysis. The translations include supplementary explanations that are easy for foreign users to understand, going beyond simple literal translations.

[0631] Step 6:

[0632] The generated translation and its explanation are sent from the server to the terminal. The terminal decodes this data upon receipt and displays it in a user-readable format.

[0633] Step 7:

[0634] Regarding the translation results, users can send feedback to the server via their device based on their level of understanding and satisfaction. The feedback collection system stores this information and uses it to improve the system and increase accuracy in the future.

[0635] (Example 1)

[0636] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0637] In today's world, with the increasing frequency of intercultural communication, natural language translation is required to go beyond mere word-for-word conversion and encompass a deeper understanding that includes context and cultural background. Furthermore, there is a need for mechanisms to effectively utilize user feedback to improve translation accuracy, and for means of securely exchanging data while protecting personal information.

[0638] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0639] In this invention, the server includes an input receiving means that receives natural language information entered by a user and converts it into a digital format, an encryption means that encrypts the received natural language information to maintain confidentiality, and a context analysis means that understands the context from the analyzed information and grasps the cultural background and specific meanings. This allows the user to securely transmit natural language information to the server and receive an appropriate translation and supplementary explanation that takes cultural background into account.

[0640] An "input receiving means" is a mechanism for receiving natural language information entered by a user into a terminal.

[0641] "Encryption methods" refer to technologies that encrypt received natural language information in order to maintain confidentiality.

[0642] "Communication methods" refer to protocols and technologies for securely transmitting encrypted natural language information to a server.

[0643] "Data analysis means" refers to techniques or tools for analyzing received natural language information to understand the structure and basic meaning of a text.

[0644] "Contextual analysis methods" are technologies that perform processing to understand the background, context, and cultural meaning of analyzed information.

[0645] A "translation generation method" is a technology that generates translations based on the analyzed results and adds cultural supplementary explanations using generative AI.

[0646] "Data output means" refers to means for displaying the generated translation and supplementary explanations to the user.

[0647] "Feedback collection methods" refer to technologies used to collect user feedback on translations and explanations and utilize it to improve the system.

[0648] A "communication protocol" is a set of rules and procedures used to securely transmit encrypted information to a server.

[0649] This system accurately translates natural language information entered by users and provides supplementary explanations that take into account the cultural background and context. Users input Japanese sentences or phrases they wish to translate using a terminal. The terminal converts this input information into a digital format and encrypts it using encryption methods, thereby maintaining the confidentiality of the information.

[0650] The terminal sends encrypted information to the server using a secure communication protocol. For example, the security of the information is ensured by using the HTTPS protocol. The receiving server analyzes the input information using data analysis tools to understand its grammar and basic meaning.
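The HTTPS transmission can be sketched as the construction of a POST request that is rejected unless the URL uses the `https` scheme. The endpoint URL, the JSON field name `text`, and the function name are hypothetical; the request is built but not sent, because no real server is described.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def build_translation_request(url: str, text: str) -> urllib.request.Request:
    # Refuse plain HTTP so the payload is only ever sent over TLS.
    if urlsplit(url).scheme != "https":
        raise ValueError("refusing to send over a non-HTTPS URL")
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_translation_request(
    "https://translation.example/api/translate",  # hypothetical endpoint
    "ご苦労様",
)
# urllib.request.urlopen(request) would send it; omitted because the endpoint is hypothetical.
print(request.get_method(), request.full_url)
```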

[0651] Next, the server uses context analysis tools to gain a deep understanding of the information's context. Specifically, it utilizes generative AI models (such as natural language processing models) to analyze the cultural background and specific meanings of the input phrases. Based on this analysis, the translation generation tool generates the optimal translation and adds supplementary explanations as needed.

[0652] Subsequently, the generated translation and supplementary explanations are sent to the terminal using a data output device and presented to the user. The user can review this and provide feedback as needed. The feedback is sent to the server through a feedback collection device and used to improve the model.

[0653] For example, if a user enters the phrase "Gokurosama," the system will not simply provide a direct translation of the words, but will also consider that the expression is generally used by superiors to subordinates, and will generate a translation and explanation such as "Thank you for your effort, typically said to someone of lesser status."

[0654] This allows users to gain valuable information that goes beyond simple translation, leading to a deeper understanding of its meaning. This system can be used to facilitate smoother intercultural communication.

[0655] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0656] Step 1:

[0657] The user inputs natural language sentences or phrases they want to translate into the device. The input information is saved to the device in text format. The device then converts this text data into a digital format and encrypts it using encryption methods. This ensures that the data is protected from unauthorized external access. The input is natural language text data, and the output is encrypted digital data.

[0658] Step 2:

[0659] The terminal sends encrypted digital data to the server using a secure communication protocol (e.g., HTTPS). This protocol ensures the confidentiality and integrity of the data. The input is encrypted digital data, and the output is a secure data transfer to the server.

[0660] Step 3:

[0661] The server decrypts the received encrypted data, returning it to its original digital format. It then uses data analysis tools to analyze the text structure. This analysis identifies grammatical elements and tokens within the text. The input is encrypted digital data, and the output is the analyzed text data.

[0662] Step 4:

[0663] The server uses contextual analysis tools to gain a deep understanding of the context of the analyzed text data. Specifically, it leverages generative AI models to extract the cultural background and specific meanings of the input text. The information obtained through this process is used as the context necessary for translation generation. The input is the analyzed text data, and the output is contextual information.

[0664] Step 5:

[0665] The server uses a translation generation mechanism to generate the optimal translation and its supplementary explanation based on the acquired contextual information. A generative AI model participates in this process to create natural and culturally appropriate translations. The input is contextual information, and the output is the translation result and its supplementary explanation.

[0666] Step 6:

[0667] The server sends the generated translation and supplementary explanations to the terminal via a data output mechanism. The terminal receives this data and renders it in an appropriate format for visual presentation to the user. The input is the translation result and its supplementary explanations, and the output is the visual display on the terminal.

[0668] Step 7:

[0669] If a user has feedback on the presented translation results and explanations, they send that feedback to the server via their device. The server receives this information using feedback collection tools and uses it to improve the translation model and analysis methods. This improves the overall quality of the system. The input is user feedback, and the output is the accumulation of feedback data to the server.

[0670] (Application Example 1)

[0671] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0672] Overcoming language barriers that people from different linguistic and cultural backgrounds face in natural communication is crucial. This challenge cannot be solved without translations that consider context, cultural background, and linguistic nuances, rather than simply providing literal translations. However, current technology suffers from a lack of naturalness and accuracy, particularly in real-time, two-way oral communication.

[0673] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0674] In this invention, the server includes input receiving means for receiving natural language information in the form of voice or text entered by a user, data analysis means for analyzing the format of the received natural language information, and context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning. This enables natural and highly accurate communication between people with different languages and cultures.

[0675] "Input receiving means" refers to a device or function that has the role of receiving natural language information, such as voice or text, entered by a user.

[0676] "Data analysis means" refers to a device or function for analyzing the format of received natural language information and extracting necessary information.

[0677] A "context analysis tool" is a device or function used to understand context based on analyzed information and to grasp cultural background and unique meanings.

[0678] "Translation generation means" refers to a device or function for generating a translation and its supplementary explanation based on the analyzed results.

[0679] "Data output means" refers to a device or function that is responsible for presenting the generated translation and supplementary explanations to the user.

[0680] "Audio output means" refers to a device or function for outputting translations and supplementary explanations as audio.

[0681] "Means for converting speech to text" refers to a device or function for converting speech information into text.

[0682] "Feedback collection means" refers to a device or function that receives feedback from users and uses it to improve the performance of the system.

[0683] "Encryption means" refers to a device or function that encrypts information in order to securely transfer received natural language information to a server.

[0684] To implement this invention, first, the user inputs natural language information to be translated in voice or text format into a terminal. The terminal is equipped with an input receiving means to receive the input information and processes this information digitally. When using voice input, a voice-to-text conversion means such as the Google Cloud Speech-to-Text API is used to convert the voice information into text data.
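The input receiving step can be sketched as a function that accepts either text or audio and always returns text. The speech recognizer is passed in as a callable so that any speech-to-text service could be plugged in; `receive_input` and `fake_recognizer` are hypothetical names, and the stand-in recognizer returns a fixed string instead of calling a real service.

```python
from typing import Callable, Union

Recognizer = Callable[[bytes], str]


def receive_input(user_input: Union[str, bytes], recognize: Recognizer) -> str:
    """Return text for either typed text (str) or recorded audio (bytes)."""
    if isinstance(user_input, bytes):
        text = recognize(user_input)   # voice input: convert speech to text
    else:
        text = user_input              # text input: use as-is
    text = text.strip()
    if not text:
        raise ValueError("empty input")
    return text


def fake_recognizer(audio: bytes) -> str:
    # Stand-in for a real speech-to-text call; always returns the same sentence.
    return "Thank you for your help"


print(receive_input("  Thank you for your help ", fake_recognizer))
print(receive_input(b"\x00\x01", fake_recognizer))
```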

[0685] The converted text information is encrypted using AES encryption with Python's cryptography library and securely transmitted to the server. The server analyzes the received information using data analysis tools such as the BERT model from the Transformers library to understand the context, and generates a contextually appropriate translation and supplementary explanation using a translation generation tool such as the DeepL API. The generated information is sent to the terminal by a data output tool and presented to the user, while the generated translation and supplementary explanation are simultaneously output as audio using a speech output tool that utilizes the Google Text-to-Speech API.
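The encryption step can be sketched with the `AESGCM` class from Python's cryptography library. The text names AES but not a mode of operation, so the choice of AES-GCM, the 256-bit key, and the layout of the output (a 12-byte nonce followed by the ciphertext) are assumptions, as are the function names. Key exchange between terminal and server is not covered. The import is guarded so that the sketch still loads where the library is not installed.

```python
import os

try:
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
except ImportError:  # the third-party 'cryptography' package is not installed
    AESGCM = None

NONCE_SIZE = 12  # 96-bit nonce, the size commonly used with GCM


def encrypt_text(key: bytes, text: str) -> bytes:
    if AESGCM is None:
        raise RuntimeError("the 'cryptography' package is required")
    nonce = os.urandom(NONCE_SIZE)  # must be unique for every message under one key
    ciphertext = AESGCM(key).encrypt(nonce, text.encode("utf-8"), None)
    return nonce + ciphertext       # assumed layout: nonce first, then ciphertext and tag


def decrypt_text(key: bytes, blob: bytes) -> str:
    if AESGCM is None:
        raise RuntimeError("the 'cryptography' package is required")
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    # decrypt() raises InvalidTag if the data was modified in transit.
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


if AESGCM is not None:
    key = AESGCM.generate_key(bit_length=256)
    blob = encrypt_text(key, "Thank you for your help")
    assert decrypt_text(key, blob) == "Thank you for your help"
```

GCM is an authenticated mode, so tampering with the transmitted bytes is detected at decryption time rather than producing garbled text.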

[0686] A concrete example would be a user who wants to converse with a local resident while traveling. The user could type "Thank you for your help" and receive a translation that accurately expresses its nuance. The system would generate a translation and explanation such as "Thank you for the care and support you have provided," supporting the conversation in a natural flow. An example of a prompt to the generative AI model in this case would be text like, "Please enter the Japanese sentence or phrase you would like to translate. Then, please provide an appropriate translation and explanation, taking its cultural context into consideration."

[0687] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0688] Step 1:

[0689] The terminal receives voice or text input from the user. Specifically, the user enters the phrase they want to translate, by voice or as text, on their smartphone or tablet. In the case of voice input, the Google Cloud Speech-to-Text API is used to convert the voice data into text. At this point, the input is voice or text information, and the output is digital text data.

[0690] Step 2:

[0691] The terminal encrypts the received text information using AES encryption. Specifically, it uses a cryptography library to encrypt the text data and ensure its security. This encrypted text data is then sent to the server.
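
As a rough illustration of the encryption in Step 2 and the matching decryption in Step 3, the sketch below encrypts text on the terminal side and decrypts it on the server side. It assumes the third-party `cryptography` package and AES in GCM mode with a 256-bit key. The text above names AES and a cryptography library but not the mode, key size, or key distribution, so those choices and the helper names `encrypt_text` and `decrypt_text` are assumptions made for illustration.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # bytes; the nonce length commonly used with AES-GCM


def encrypt_text(key: bytes, text: str) -> bytes:
    """Terminal side: encrypt UTF-8 text and prepend the random nonce."""
    nonce = os.urandom(NONCE_SIZE)
    ciphertext = AESGCM(key).encrypt(nonce, text.encode("utf-8"), None)
    return nonce + ciphertext


def decrypt_text(key: bytes, payload: bytes) -> str:
    """Server side: split off the nonce, then decrypt and authenticate."""
    nonce, ciphertext = payload[:NONCE_SIZE], payload[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


if __name__ == "__main__":
    # Assumption: terminal and server already share this key.
    shared_key = AESGCM.generate_key(bit_length=256)
    payload = encrypt_text(shared_key, "Thank you for your help")
    print(decrypt_text(shared_key, payload))
```

A fresh nonce is generated for every message because reusing a nonce with the same key undermines GCM. How the terminal and server come to share the key is outside the scope of this sketch.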

[0692] Step 3:

[0693] The server receives encrypted data sent from the terminal and decrypts it. Using the decrypted text data as input, it performs data analysis using models such as the BERT model from the Transformers library. Through data analysis, it extracts basic information for context and semantic understanding, and outputs contextual information as the analysis result.

[0694] Step 4:

[0695] The server performs contextual analysis based on the extracted contextual information. This analysis utilizes contextual analysis tools to understand cultural backgrounds and language-specific nuances from the extracted information. At this stage, the input is contextual information, and the output is detailed contextual information derived from further analysis.

[0696] Step 5:

[0697] The server generates translations using translation generation methods such as the DeepL API, based on detailed contextual information, and also creates supplementary explanations. These translations include contextually appropriate expressions. The input is detailed contextual information, and the output is the translated result and its supplementary explanations.

[0698] Step 6:

[0699] The server sends the generated translation and supplementary explanations to the terminal, which displays the translation results on the screen as a data output. Simultaneously, it uses the Google Text-to-Speech API to convert the translated content into speech and play it back. The input is the translation results and supplementary explanations, and the output is the screen display and audio output.

[0700] Step 7:

[0701] The user inputs feedback on the provided translation into the terminal. The terminal sends the feedback information to the server, which stores it using a feedback collection mechanism and uses it to improve context analysis and translation generation. The input in this step is user feedback, and the output is feedback data used for improvement.
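
The feedback collection in Step 7 could be sketched as a small in-memory store. The 1-to-5 rating scale, the field names, and the `FeedbackStore` class are all hypothetical; the text above only states that feedback is stored and used for improvement, not how it is represented.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Feedback:
    source_text: str
    translation: str
    rating: int          # hypothetical scale: 1 (poor) to 5 (good)
    comment: str = ""


@dataclass
class FeedbackStore:
    records: List[Feedback] = field(default_factory=list)

    def add(self, feedback: Feedback) -> None:
        """Store one feedback record after validating its rating."""
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.records.append(feedback)

    def average_rating(self) -> Optional[float]:
        """Mean rating over all records, or None if nothing was collected."""
        if not self.records:
            return None
        return mean(record.rating for record in self.records)

    def low_rated(self, threshold: int = 2) -> List[Feedback]:
        """Records at or below the threshold: candidates for review."""
        return [r for r in self.records if r.rating <= threshold]
```

Low-rated records are the natural starting point for revising context analysis or translation generation, which is why the store exposes them directly.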

[0702] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0703] In an embodiment of the present invention, first, the user inputs Japanese text or phrases they wish to translate or understand using a terminal. This input natural language data is received by the terminal and securely transmitted to a server using encryption. The server analyzes the received data and obtains basic information to facilitate contextual understanding.

[0704] The analyzed data is then analyzed by a context analysis tool on the server to understand its context and cultural background. Subsequently, a translation generation tool generates appropriate translations and supplementary explanations. This process incorporates mechanisms to present the information requested by the user as accurately and naturally as possible.

[0705] Furthermore, the server uses an emotion engine to recognize the user's emotions from the analyzed data. Based on this emotion recognition, the translation and supplementary explanations are adjusted. For example, if the emotion engine detects that the user is surprised or dissatisfied with a particular phrase, it will present a translation result that takes into account an appropriate response to that emotion.

[0706] For example, if a user enters the phrase "Why is it so cold?", the emotion engine will detect feelings of surprise or dissatisfaction. As a result, the generated translation will include an explanation that reflects the emotion, such as "It's surprisingly cold today, isn't it?".

[0707] Finally, the translation and explanation generated by the server are sent to the terminal and displayed to the user. Users can provide feedback on the information provided, and this feedback is collected by a feedback collection mechanism and used to improve the overall system, including the emotion engine. Continuous system optimization based on feedback makes it possible to provide a more accurate and personalized translation service.

[0708] The following describes the processing flow.

[0709] Step 1:

[0710] The user inputs Japanese text that needs to be translated or understood into the device. The device receives this natural language data and prepares it for processing.

[0711] Step 2:

[0712] The terminal encrypts the entered data so that it can be transmitted securely, then packetizes it and sends it to the server via the network.

[0713] Step 3:

[0714] The server receives data from the terminal for analysis. It checks the format of the received data, parses the sentence structure as a preprocessing step, and performs sorting and necessary transformations.

[0715] Step 4:

[0716] The context analysis mechanism within the server uses the analyzed data to understand the context. At this stage, cultural backgrounds and specific nuances are also grasped, which influence subsequent processing.

[0717] Step 5:

[0718] The emotion engine recognizes the user's emotions from the preprocessed data. For example, emotions such as joy, anger, sadness, and pleasure can be inferred from the tone of the text and specific keywords.
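
A minimal sketch of keyword-based emotion inference, as one possible reading of Step 5, is shown below. The keyword lists, the English-only matching, and the function name `estimate_emotion` are hypothetical. An actual emotion engine would presumably also use tone and a trained model rather than a fixed word list.

```python
import re
from typing import Dict, Set

# Hypothetical keyword lists, for illustration only.
EMOTION_KEYWORDS: Dict[str, Set[str]] = {
    "joy": {"glad", "great", "wonderful"},
    "anger": {"annoying", "furious", "unacceptable"},
    "sadness": {"sad", "lonely", "miss"},
    "surprise": {"why", "really", "unbelievable"},
}


def estimate_emotion(text: str) -> str:
    """Return the emotion with the most keyword hits, or "neutral" if none."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = {
        emotion: sum(1 for word in words if word in keywords)
        for emotion, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)  # ties resolve to the first-listed emotion
    return best if hits[best] > 0 else "neutral"


print(estimate_emotion("Why is it so cold?"))  # prints "surprise"
```

With these lists, the sample sentence matches only the keyword "why", so the sketch reports surprise. Text with no matching keyword is reported as neutral.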

[0719] Step 6:

[0720] The server uses a translation generation mechanism to generate translations and supplementary explanations that take into account the user's emotional state. Based on the recognized emotion, the most appropriate expression is selected.

[0721] Step 7:

[0722] After formatting and encrypting the generated translation and supplementary explanations, the server sends this data back to the terminal.

[0723] Step 8:

[0724] The terminal receives and decrypts the translation results sent from the server and displays them to the user. The user reviews the translation and enters feedback if necessary.

[0725] Step 9:

[0726] User feedback is sent to the server via the terminal and collected by feedback collection tools. This feedback is used to improve the emotion engine and translation accuracy.

[0727] (Example 2)

[0728] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0729] Conventional machine translation systems often fail to adequately consider context and cultural background, and cannot reflect the user's emotions, making it difficult to provide the information the user truly needs. There is a need to overcome this lack of accuracy and unnaturalness, and to provide a translation service that satisfies users.

[0730] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0731] In this invention, the server includes an input receiving means for receiving natural language information entered by the user, an information analysis means for analyzing the form of the received natural language information, a context analysis means for understanding the context based on the analyzed information and grasping the cultural background and unique meaning, a translation generation means for generating translations and supplementary information based on the analyzed results, and an emotion recognition means for analyzing the user's emotions using generative AI technology and adjusting the translated content based on those emotions. This makes it possible to provide a natural and accurate translation service that takes into account the user's context and emotions.

[0732] "Natural language information" refers to data expressed in human language that users input into a computer system.

[0733] "Input receiving means" refers to a function or device for receiving natural language information sent by a user.

[0734] "Information analysis means" refers to functions or technologies for examining and analyzing the form and structure of received natural language information.

[0735] "Contextual analysis means" refers to functions and methods for understanding the context, background, and cultural elements of analyzed information.

[0736] "Translation generation means" refers to a function or technology that generates translations and supplementary information in a natural way based on the results of contextual analysis.

[0737] "Emotion recognition means" refers to functions and technologies that use generative AI technology to analyze a user's emotions and adjust information and services accordingly.

[0738] "Encryption methods" refer to technologies and devices that encrypt data in order to securely protect information received or transmitted.

[0739] "Information output means" refers to methods or devices for providing the generated translation or supplementary explanation to the user.

[0740] "Feedback collection methods" refer to functions or technologies that collect user reactions and opinions and use them to improve the system.

[0741] First, the user uses their device to input natural language information in Japanese that they want to translate or understand. This input information is received by the device and securely transmitted to the server using encryption. Common encryption protocols are used for this encryption.

[0742] The server first analyzes the received information using information analysis tools. Natural language processing techniques are used for this analysis. Specifically, morphological and syntactic analysis are performed to understand the form and structure of the information. At this stage, contextual analysis tools consider the cultural background and emotions of the information to understand its context and grasp its meaning.

[0743] Next, the server uses translation generation tools to generate a translation based on the analyzed context. This process utilizes a generative AI model. The generated translation includes supplementary information for the user. At this stage, the user's emotions are analyzed by emotion recognition tools, and the translation is adjusted accordingly. Specifically, appropriate nuances of emotion are incorporated into the translation.

[0744] For example, if a user types "Why is it so cold?" into their terminal, the system instructs the generative AI model with the prompt "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'". Based on this instruction, the model generates an emotionally sensitive translation such as "It's surprisingly cold today, isn't it?".

[0745] Finally, the server sends the generated translation and supplementary explanations to the terminal via an information output device and displays them to the user. The user can provide feedback based on the presented information, and this feedback is used to improve the system through a feedback collection device. In this way, a more accurate and personalized translation service can be provided.

[0746] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0747] Step 1:

[0748] The user uses a terminal to input Japanese natural language information that they wish to translate or understand. This input information is received by the terminal's input receiving mechanism. The received data is prepared for processing as string data. At this stage, the input is the user's natural language sentence, and the output is that sentence data.

[0749] Step 2:

[0750] The terminal encrypts the received natural language information using encryption methods and securely transmits it to the server. Encryption protects the information from being intercepted by third parties. This encrypted data is received by the server and ready for analysis. At this stage, the input is the user's encrypted text data, and the output is the encrypted data transferred to the server.

[0751] Step 3:

[0752] The server analyzes the received data using information analysis tools. First, it decrypts the data by removing the encryption. Next, it performs morphological analysis to break it down into individual words. It uses syntactic analysis to understand the structure of the entire sentence. This analysis extracts grammatical information and semantic information of words, which becomes the basis for the next processing. At this stage, the input is the decrypted text data, and the output is the analyzed grammatical information and semantic information of words.

[0753] Step 4:

[0754] The server uses contextual analysis tools to understand the context and cultural background from the analyzed information. Contextual understanding may involve retrieving relevant information from external databases. This allows the server to grasp the cultural meanings and usage contexts of words and sentences. The input for this process is the grammatical and semantic information obtained in the previous stage, while the output is contextual and cultural background information.

[0755] Step 5:

[0756] The server generates translations using a translation generation method that leverages a generative AI model. In this process, a prompt sentence is formed that instructs the AI model to translate the text while considering emotion. An example prompt sentence is: "Translate the following sentence and consider any emotional cues: 'Why is it so cold?'" The generated translation reflects nuances that match the user's expectations. The input at this stage is contextual and cultural background information, and the output is the generated translation.
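
Forming the prompt in Step 5 might look like the sketch below. The function name `build_translation_prompt` is hypothetical. Appending context notes as a bulleted list is also an assumption, since the text states that contextual and cultural background information is the input but not how it is passed to the model.

```python
from typing import Optional, Sequence


def build_translation_prompt(
    sentence: str, context_notes: Optional[Sequence[str]] = None
) -> str:
    """Build the example prompt, optionally followed by context notes."""
    prompt = (
        "Translate the following sentence and consider any emotional cues: "
        f"'{sentence}'"
    )
    if context_notes:
        notes = "\n".join(f"- {note}" for note in context_notes)
        prompt += f"\nContext and cultural background:\n{notes}"
    return prompt


print(build_translation_prompt("Why is it so cold?"))
```

Called with only the sentence, the function reproduces the example prompt from the text word for word.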

[0757] Step 6:

[0758] The server analyzes the user's emotions through emotion recognition mechanisms. Based on the analysis results, it adjusts the translated text. For example, if the user expresses surprise or dissatisfaction, the server incorporates those nuances into the translated text. In this process, the input is the user's emotional information, and the output is the adjusted translated text.

[0759] Step 7:

[0760] The server transmits the generated translation and supplementary information to the terminal via an information output device. The terminal makes the translation available for the user to review by displaying the translation results. Finally, the user can provide feedback on this translation. The input at this processing stage is the adjusted translation, and the output is the information displayed to the user.
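
The input and output of Steps 3 to 6 can be pictured as a chain of small functions. In the sketch below every stage is a placeholder: whitespace splitting stands in for morphological analysis, a glossary lookup stands in for context analysis, and the generative model is an injected callable. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Analysis:  # output of Step 3
    text: str
    tokens: List[str]


@dataclass
class Context:  # output of Step 4
    analysis: Analysis
    notes: List[str]


def analyze(text: str) -> Analysis:
    # Placeholder for morphological and syntactic analysis: split on whitespace.
    return Analysis(text=text, tokens=text.split())


def add_context(analysis: Analysis, glossary: Dict[str, str]) -> Context:
    # Placeholder for context analysis: look each token up in a small glossary.
    notes = []
    for token in analysis.tokens:
        key = token.strip("?.!,").lower()
        if key in glossary:
            notes.append(glossary[key])
    return Context(analysis=analysis, notes=notes)


def generate_translation(context: Context, generate: Callable[[str], str]) -> str:
    # Step 5: form a prompt and pass it to the injected generative model.
    prompt = (
        "Translate the following sentence and consider any emotional cues: "
        f"'{context.analysis.text}'"
    )
    if context.notes:
        prompt += "\nNotes: " + "; ".join(context.notes)
    return generate(prompt)


def adjust_for_emotion(translation: str, emotion: str) -> str:
    # Step 6 placeholder: annotate the translation instead of rewriting it.
    if emotion == "neutral":
        return translation
    return f"{translation} [tone: {emotion}]"


def run_pipeline(
    text: str,
    glossary: Dict[str, str],
    generate: Callable[[str], str],
    emotion: str = "neutral",
) -> str:
    context = add_context(analyze(text), glossary)
    return adjust_for_emotion(generate_translation(context, generate), emotion)
```

Injecting `generate` keeps the sketch independent of any particular model API and lets the flow be exercised with a stub.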

[0761] (Application Example 2)

[0762] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0763] In modern society, there is a growing demand for natural communication between robots and humans within the home. However, conventional technologies have not adequately achieved the ability to provide appropriate translations and explanations based on context and emotions, making it difficult to realize dialogues that accurately reflect the user's intentions. Solving these challenges is essential.

[0764] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0765] In this invention, the server includes data receiving means for receiving language data input by the user, data analysis means for analyzing the received language data, and translation generation means for generating translations and supplementary explanations based on the analysis results and making adjustments that take emotions into consideration. This makes it possible for a home robot to provide natural dialogue that is in tune with the context and the user's emotions.

[0766] A "data receiving means" is a mechanism for receiving language data transmitted by a user from a terminal.

[0767] "Data analysis means" refers to a function that performs a process to analyze the structure and meaning of received linguistic data and to understand the context.

[0768] A "contextual analysis tool" is a mechanism that uses analyzed data to understand the cultural background and specific meanings of language, thereby improving the accuracy of translation.

[0769] The "translation generation method" is a function that generates appropriate translations and supplementary explanations using contextual analysis results, and makes adjustments that take into account the user's emotions.

[0770] A "data presentation method" is a mechanism for providing the generated translation and explanation to the user in audio or visual form.

[0771] A "feedback collection mechanism" is a system that collects user feedback and uses it to improve the system's translation accuracy and user satisfaction.

[0772] "Information protection measures" are mechanisms designed to prevent unauthorized access to or leakage of data when securely transferring received data to an external information processing device.

[0773] The system for implementing this invention is primarily designed for use with household robots. The core of the invention lies in its ability to receive user-inputted natural language data in real time and analyze it appropriately. The system includes the following key functions:

[0774] The home robot receives user speech and text input via a data receiving device. The hardware used is a processor unit equipped with a high-performance speech recognition sensor, specifically utilizing a general-purpose speech technology platform for speech recognition. The received data is securely transferred to a server. During this process, the data is encrypted using information protection measures and designed to prevent unauthorized external access.

[0775] The server analyzes the received data using data analysis tools. A natural language processing engine is used for analysis to understand the format and context of the linguistic data. Next, contextual analysis tools are used to generate appropriate translations and supplementary explanations, taking into account the user's cultural background and emotions. A sentiment analysis module is also incorporated to include emotions in the translation generation process. The generated results are provided to the user via voice or display.

[0776] As a concrete example, if a user asks the robot, "Which fruit is the sweetest?", the server analyzes the input and generates and presents a friendly, emotion-conscious answer such as, "Bananas are generally considered very sweet, but it depends on how ripe the fruit is." Through this process, feedback collection mechanisms ensure that user responses and additional questions are continuously used to improve the system.

[0777] An example of a prompt using a generative AI model is as follows: "Analyze the user's question and, based on contextual understanding, generate a natural translation and a sentiment-sensitive explanation. Question: 'Which fruit is the sweetest?'"

[0778] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0779] Step 1:

[0780] The terminal receives user input as speech or text. If the input is speech, it is converted into text using speech recognition technology. The resulting text then becomes the input for subsequent processes.

[0781] Step 2:

[0782] The terminal encrypts the received text data and sends it to the server. The encryption process protects the data from unauthorized access. The server receives this encrypted data and decrypts it to obtain analyzable text.

[0783] Step 3:

[0784] The server analyzes the structure of text data using data analysis tools. This process includes grammatical analysis and word semantic analysis, laying the foundation for contextual understanding. The result is structured language data.

[0785] Step 4:

[0786] The server understands the context and cultural background based on the analyzed data, using contextual analysis tools. Cultural elements and emotional information related to the data are then added. This contributes to improving the accuracy of translation and explanation.

[0787] Step 5:

[0788] The server uses translation generation tools to generate context-aware translations and explanations. Leveraging a generative AI model, it obtains output that appropriately reflects emotions. This results in a natural translation that includes supplementary information and nuances.

[0789] Step 6:

[0790] The server sends the generated translation results to the terminal. The terminal presents this to the user either verbally or by displaying it on the screen. The output information is based on the user's emotions and intentions.

[0791] Step 7:

[0792] Users evaluate the provided translations and explanations and provide feedback. This feedback is sent to the server via the device and used by the feedback collection system to improve the entire system.

[0793] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0794] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0795] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0796] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is one such mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0797] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0798] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0799] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in action).

[0800] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
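
The idea that discomfort grows as a balance deviates from its ideal, and pleasure grows as it approaches the ideal, can be illustrated with a simple score. The linear mapping, the per-quantity tolerances, and the names `comfort_score`, `battery`, and `tilt` are assumptions for illustration; the text does not specify how the balances are combined.

```python
from typing import Dict


def comfort_score(
    readings: Dict[str, float],
    ideals: Dict[str, float],
    tolerances: Dict[str, float],
) -> float:
    """Score in [-1, 1]: +1 when every reading equals its ideal value,
    -1 when every reading deviates by at least its tolerance."""
    scores = []
    for name, ideal in ideals.items():
        # Deviation as a fraction of the tolerance, capped at 1.
        deviation = min(abs(readings[name] - ideal) / tolerances[name], 1.0)
        scores.append(1.0 - 2.0 * deviation)
    return sum(scores) / len(scores)


ideals = {"battery": 1.0, "tilt": 0.0}       # hypothetical ideal values
tolerances = {"battery": 1.0, "tilt": 10.0}  # hypothetical tolerances
print(comfort_score({"battery": 0.5, "tilt": 5.0}, ideals, tolerances))  # prints 0.0
```

A positive score corresponds to the pleasure side of the emotion map and a negative score to the displeasure side. Where an emotion falls between the response and situation sides is not modeled here.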

[0801] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0802] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
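
Once the neural network has produced an emotion value for each emotion, one plausible way to determine the user's emotion is to take the emotion with the highest value and note which others lie close to it. The sketch below assumes exactly that. The argmax rule, the `tolerance` parameter, and the sample values are hypothetical, and the neural network itself is not modeled.

```python
from typing import Dict, List, Tuple


def determine_emotion(
    emotion_values: Dict[str, float], tolerance: float = 0.1
) -> Tuple[str, List[str]]:
    """Return the highest-valued emotion and the emotions whose values
    fall within `tolerance` below it."""
    top = max(emotion_values, key=emotion_values.get)
    top_value = emotion_values[top]
    nearby = [
        name
        for name, value in emotion_values.items()
        if name != top and top_value - value <= tolerance
    ]
    return top, nearby


values = {"reassured": 0.8, "calm": 0.75, "confident": 0.75, "anxious": 0.1}
print(determine_emotion(values))  # prints ('reassured', ['calm', 'confident'])
```

The sample values mirror the situation described for Figure 10, where "reassured," "calm," and "confident" have similar emotion values.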

[0803] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0804] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0805] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0806] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0807] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0808] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0809] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0810] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0811] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0812] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge that needs no special explanation to enable implementation of those aspects have been omitted from the descriptions and illustrations presented above.

[0813] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0814] The following is further disclosed regarding the embodiments described above.

[0815] (Claim 1)

[0816] A system comprising: an input receiving means for receiving natural language data entered by a user;

[0817] a data analysis means for analyzing the format of the received natural language data;

[0818] a context analysis means for understanding the context based on the analyzed data and grasping cultural backgrounds and unique meanings;

[0819] a translation generation means for generating a translation and supplementary explanations based on the analysis results; and

[0820] a data output means for presenting the translation and supplementary explanations.

[0821] (Claim 2)

[0822] The system according to claim 1, further comprising a feedback collection means for receiving user feedback and improving the performance of the context analysis means and the translation generation means.

[0823] (Claim 3)

[0824] The system according to claim 1, further comprising encryption means for securely transferring natural language data received by the input receiving means to a server.
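
The chain of means recited in claim 1 above can be read as a simple processing pipeline. The following is a minimal sketch under stated assumptions, not the disclosed implementation: every name (`Analysis`, `IDIOM_NOTES`, `run_pipeline`, and the individual functions) is hypothetical, the idiom table stands in for whatever cultural knowledge a real system would use, and the `translate` callable stands in for the translation model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Analysis:
    text: str
    fmt: str  # "text" or "voice" (hypothetical labels)
    notes: list[str] = field(default_factory=list)


# Hypothetical stand-in for cultural knowledge; a real system would use a model.
IDIOM_NOTES = {
    "break a leg": "An English idiom wishing someone good luck before a performance.",
}


def receive_input(raw: str | bytes) -> str | bytes:
    """Input receiving means: accept what the user entered, unchanged."""
    return raw


def analyze_format(raw: str | bytes) -> Analysis:
    """Data analysis means: decide whether the input is text or audio bytes."""
    if isinstance(raw, bytes):
        # Speech recognition is out of scope for this sketch.
        return Analysis(text="", fmt="voice")
    return Analysis(text=raw.strip(), fmt="text")


def analyze_context(analysis: Analysis) -> Analysis:
    """Context analysis means: attach notes on culture-specific expressions."""
    lowered = analysis.text.lower()
    for idiom, note in IDIOM_NOTES.items():
        if idiom in lowered:
            analysis.notes.append(note)
    return analysis


def generate_translation(analysis: Analysis, translate) -> dict:
    """Translation generation means: translation plus supplementary explanation."""
    return {
        "translation": translate(analysis.text),
        "explanation": list(analysis.notes),
    }


def present(result: dict) -> str:
    """Data output means: format the result for display."""
    lines = [result["translation"]]
    lines += [f"Note: {note}" for note in result["explanation"]]
    return "\n".join(lines)


def run_pipeline(raw: str | bytes, translate) -> str:
    """Run the five stages in the order the claim lists them."""
    analysis = analyze_context(analyze_format(receive_input(raw)))
    return present(generate_translation(analysis, translate))
```

Passing the translator in as a callable keeps the sketch independent of any particular model; for instance, `run_pipeline("Break a leg!", my_translate)` would return the translated line followed by one explanatory note.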

[0825] "Example 1"

[0826] (Claim 1)

[0827] A system comprising: an input receiving means for receiving natural language information entered by a user;

[0828] an encryption means for converting the received natural language information into a digital format and encrypting it to maintain confidentiality;

[0829] a communication means for securely transmitting the encrypted natural language information to a server;

[0830] a data analysis means for analyzing the received natural language information;

[0831] a context analysis means for understanding the context from the analyzed information and grasping cultural backgrounds and unique meanings;

[0832] a translation generation means for generating a translation and a supplementary explanation thereof using a generative AI model based on the analysis results; and

[0833] a data output means for displaying the generated translation and supplementary explanation on a terminal.

[0834] (Claim 2)

[0835] The system according to claim 1, further comprising a feedback collection means for receiving evaluations from users and improving the performance of the context analysis means and the translation generation means.

[0836] (Claim 3)

[0837] The system according to claim 1, wherein a secure communication protocol is used when transmitting the encrypted natural language information to the server in order to ensure the security of communications.
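
The source does not name the digital format or the secure communication protocol. As one possible reading, the sketch below assumes UTF-8 as the digital format and TLS as the protocol; the function names `to_digital` and `make_secure_context` are hypothetical. It uses only Python's standard `ssl` module and opens no network connection.

```python
import ssl


def to_digital(text: str) -> bytes:
    """Convert natural language text into bytes (UTF-8 assumed) for transmission."""
    return text.encode("utf-8")


def make_secure_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (an assumption, not from the source).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client would then wrap its socket with `context.wrap_socket(sock, server_hostname=host)` before sending the bytes, so the payload is encrypted in transit and the server's identity is checked.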

[0838] "Application Example 1"

[0839] (Claim 1)

[0840] A system comprising: an input receiving means for receiving natural language information, either voice or text, entered by a user;

[0841] a data analysis means for analyzing the format of the received natural language information;

[0842] a context analysis means for understanding the context based on the analyzed information and grasping cultural backgrounds and unique meanings;

[0843] a translation generation means for generating a translation and supplementary explanations based on the analysis results;

[0844] a data output means for presenting the translation and supplementary explanations;

[0845] an audio output means for outputting the translation and supplementary explanations as audio; and

[0846] a means for converting the audio information into text.

[0847] (Claim 2)

[0848] The system according to claim 1, further comprising a feedback collection means for receiving user feedback and improving the performance of the context analysis means and the translation generation means.

[0849] (Claim 3)

[0850] The system according to claim 1, further comprising encryption means for securely transferring natural language information received by the input receiving means to a server.
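
The source does not say how the feedback collection means of claim 2 improves performance. One simple possibility, sketched below purely as an assumption, is to store a 1 to 5 rating per translation and flag poorly rated translations for later review or retraining. The class `FeedbackCollector`, the rating scale, and the threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


class FeedbackCollector:
    """Feedback collection means: stores user ratings per translation."""

    def __init__(self) -> None:
        self._ratings = defaultdict(list)

    def submit(self, translation_id: str, rating: int) -> None:
        """Record one rating; the 1 to 5 scale is an assumption of this sketch."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._ratings[translation_id].append(rating)

    def needs_review(self, threshold: float = 3.0) -> list:
        """Return ids whose mean rating is below the threshold, sorted."""
        return sorted(
            tid for tid, scores in self._ratings.items() if mean(scores) < threshold
        )
```

The ids returned by `needs_review` could feed whatever tuning process the context analysis and translation generation means use; that step is left open here because the source does not describe it.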

[0851] "Example 2 of combining an emotion engine"

[0852] (Claim 1)

[0853] A system comprising: an input receiving means for receiving natural language information entered by a user;

[0854] an information analysis means for analyzing the form of the received natural language information;

[0855] a contextual analysis means for understanding the context based on the analyzed information and grasping cultural backgrounds and unique meanings;

[0856] a translation generation means for generating a translation and supplementary explanations based on the analysis results;

[0857] an emotion recognition means for analyzing the user's emotions using generative AI technology and adjusting the translated content based on those emotions; and

[0858] an information output means for presenting the translation and supplementary explanations.

[0859] (Claim 2)

[0860] The system according to claim 1, further comprising a feedback collection means for receiving user feedback and improving the performance of the contextual analysis means and the translation generation means.

[0861] (Claim 3)

[0862] The system according to claim 1, further comprising encryption means for securely transferring natural language information received by the input receiving means to a computing device.
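
The emotion recognition means above is described as using generative AI technology. The sketch below deliberately swaps in a plain keyword lookup so that it stays self-contained; it only illustrates the data flow from a recognized emotion to an adjusted output. The keyword lists, the emotion labels, and the tone hints are hypothetical.

```python
# Hypothetical keyword lists; the source attributes this step to generative AI.
EMOTION_KEYWORDS = {
    "anger": {"furious", "angry", "unacceptable"},
    "joy": {"delighted", "happy", "thrilled"},
}

# Hypothetical tone hints attached to the translation for each emotion.
TONE_HINTS = {"anger": "soften", "joy": "keep warm", "neutral": "plain"}


def recognize_emotion(text: str) -> str:
    """Return the first emotion whose keywords appear in the text, else 'neutral'."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return emotion
    return "neutral"


def adjust_translation(translation: str, emotion: str) -> dict:
    """Attach a tone hint rather than rewriting the translation itself."""
    return {
        "translation": translation,
        "emotion": emotion,
        "tone": TONE_HINTS[emotion],
    }
```

In a fuller system the tone hint would presumably be passed to the translation generation means so that the wording itself changes; here it is only attached as metadata.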

[0863] "Application example 2 when combining with an emotional engine"

[0864] (Claim 1)

[0865] A system comprising: a data receiving means for receiving language data entered by a user;

[0866] a data analysis means for analyzing the format of the received language data;

[0867] a contextual analysis means for understanding the context based on the analyzed data and grasping cultural backgrounds and unique meanings;

[0868] a translation generation means for generating a translation and supplementary explanations based on the analysis results and making adjustments that take emotions into consideration; and

[0869] a data presentation means for presenting the translation and supplementary explanations and for providing audio or visual output.

[0870] (Claim 2)

[0871] The system according to claim 1, further comprising a feedback collection means for receiving user feedback and improving the performance of the contextual analysis means and the translation generation means.

[0872] (Claim 3)

[0873] The system according to claim 1, further comprising information protection means for securely transferring language data received by the data receiving means to an information processing device.

[Explanation of symbols]

[0874]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A system comprising:
an input receiving means for receiving natural language information, either voice or text, entered by a user;
a data analysis means for analyzing the format of the received natural language information;
a context analysis means for understanding the context based on the analyzed information and grasping cultural backgrounds and unique meanings;
a translation generation means for generating a translation and supplementary explanations based on the analysis results;
a data output means for presenting the translation and supplementary explanations;
an audio output means for outputting the translation and supplementary explanations as audio; and
a means for converting the audio information into text.

2. The system according to claim 1, further comprising a feedback collection means for receiving user feedback and improving the performance of the context analysis means and the translation generation means.

3. The system according to claim 1, further comprising encryption means for securely transferring natural language information received by the input receiving means to a server.