system

The system addresses language barriers and emergencies through real-time voice translation, personalized travel planning, and anomaly detection, ensuring effective communication and flexible responses during travel.

JP2026100624A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing travel technologies lack responsiveness and flexibility in addressing language barriers, information shortages, and sudden emergencies, making it difficult for travelers to communicate effectively and adapt to unexpected situations.

Method used

A system equipped with a translation device for real-time voice-to-text conversion, personalized travel planning based on historical data, and anomaly detection for emergency response, utilizing a server and terminal devices to facilitate seamless communication and quick adjustments.

Benefits of technology

Enables travelers to communicate across language barriers, create tailored travel plans, and respond effectively to emergencies, enhancing the overall travel experience by providing personalized and flexible support.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026100624000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system including: means for converting voice input to text in real time; means comprising a translation device for translating the text into multiple languages; means for displaying the translated text on a user interface or outputting it as audio; means for analyzing travelers' historical data to generate personalized travel plans; and means for detecting travel-related emergencies early using an anomaly detection algorithm and proposing alternative solutions.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] The present invention aims to solve the lack of responsiveness and flexibility in dealing with language barriers, information shortages during travel, and sudden emergencies. Its purpose is to enable travelers to communicate with confidence in a foreign country, make personalized travel plans, and respond appropriately to unexpected situations.

Means for Solving the Problems

[0005] The present invention first provides a means equipped with a translation device that converts voice input into text in real time and translates that text into multiple languages. Furthermore, it includes means for automatically generating personalized travel plans by analyzing the traveler's history data. In addition, it solves these problems by providing means for early detection of travel-related emergencies using an anomaly detection algorithm and for quickly providing the user with alternative solutions.

[0006] "Voice input" is an input method that processes the voice spoken by the user as digital data.

[0007] "Converting to text" refers to the process of converting audio data into text data.

[0008] A "translation device" refers to a hardware or software system that has the function of converting one source language into another language.

[0009] A "user interface" is a means for a system and a user to interact with each other, and includes input and output devices.

[0010] "Historical data" refers to data about a user's past behavior and preferences, and is the collection of information that the system analyzes.

[0011] A "personalized travel plan" refers to a plan that provides customized travel itineraries and suggestions based on the user's historical data and current preferences.

[0012] An "anomaly detection algorithm" is a computational method for detecting data or events that deviate from normal patterns.

[0013] An "emergency situation" refers to an unexpected and immediate situation that may occur during travel that requires immediate attention from the user.

[0014] An "alternative plan" refers to other options or action plans presented in case the original plan cannot be implemented.

Brief Description of the Drawings

[0015]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Modes for Carrying Out the Invention

[0016] Next, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0019] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is memory in which information is temporarily stored and which the processor uses as work memory.

[0020] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0022] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
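The exchange between the control unit 46A on the terminal and the specific processing unit 290 on the server can be pictured as a small serialized message. The sketch below is only an illustration: the class name `UserInputMessage`, its fields, and the choice of JSON are assumptions, since the specification does not define a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UserInputMessage:
    """Hypothetical envelope for data indicating the user input."""
    source: str    # "touch_panel" or "microphone" (reception device 38)
    function: str  # function chosen by the user, e.g. "translation"
    payload: str   # typed text or text recognized from speech


def encode(message: UserInputMessage) -> bytes:
    """Serialize the message on the terminal before it is sent over the network."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(raw: bytes) -> UserInputMessage:
    """Rebuild the message on the server from the received bytes."""
    return UserInputMessage(**json.loads(raw.decode("utf-8")))


sent = UserInputMessage("microphone", "translation", "Where is the station?")
received = decode(encode(sent))
print(received == sent)
```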

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] This invention provides a system that enables travelers to communicate across language barriers, create individually optimized travel plans, and respond quickly to emergencies. Specific embodiments are described below.

[0037] First, the user launches an application installed on their smartphone. The device displays the home screen, and the user can choose from functions such as language translation, travel planning, or emergency response.

[0038] For example, if the multilingual translation function is selected, the user provides voice input. The device captures this voice data and converts it into text data using speech recognition technology. This text data is then sent to a server and translated into the user's chosen language using a multilingual translation device. The translated text is returned to the device and displayed on the screen or output as audio to the user. For example, when a Japanese user orders at a German restaurant, they can speak into the app and have it translated into English or German.
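The translation flow in this paragraph (speech recognition on the device, translation on the server, result returned for display or audio output) can be sketched as follows. This is a minimal illustration under stated assumptions: `translate_utterance`, `fake_recognize`, `fake_translate`, and `PHRASEBOOK` are hypothetical stand-ins, and a real system would call an actual speech recognition engine and translation service instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    source_text: str
    translated_text: str
    target_language: str


def translate_utterance(
    audio: bytes,
    target_language: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
) -> TranslationResult:
    """Run recognition (device side) and then translation (server side)."""
    text = recognize(audio)                        # speech -> text
    translated = translate(text, target_language)  # text -> chosen language
    return TranslationResult(text, translated, target_language)


# Stand-in engines so the sketch runs without any external service.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASEBOOK = {("water, please", "de"): "Wasser, bitte"}


def fake_translate(text: str, target_language: str) -> str:
    # Falls back to the untranslated text when the phrase is unknown.
    return PHRASEBOOK.get((text.lower(), target_language), text)


result = translate_utterance(b"Water, please", "de", fake_recognize, fake_translate)
print(result.translated_text)
```

Passing the two engines in as callables keeps the sketch independent of any particular vendor API.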

[0039] Next, in the personalized travel planning feature, the user enters their travel destination and activities of interest into the device. The device sends this information to a server, where a machine learning algorithm analyzes the user's past history and preferences. As a result, the server generates an optimal travel plan and sends it back to the device, suggesting it to the user. For example, if the user is interested in art in France, local museums and art festivals will be suggested.
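As a rough illustration of how past history and stated interests could drive suggestions, the sketch below ranks candidate activities by tag overlap. It is a simple tag-weighting stand-in, not the machine learning algorithm the text refers to; the function `rank_activities`, the tag vocabulary, and the weight of 2 for stated interests are all assumptions.

```python
from collections import Counter


def rank_activities(history_tags, interests, candidates, top_n=3):
    """Rank candidate activities by how well their tags match the user."""
    weights = Counter(history_tags)  # one point per past occurrence of a tag
    for tag in interests:
        weights[tag] += 2            # assumption: stated interests weigh more

    def score(activity):
        return sum(weights[tag] for tag in activity["tags"])

    ranked = sorted(candidates, key=score, reverse=True)
    # Drop activities with no overlap at all, then keep the best few.
    return [a["name"] for a in ranked if score(a) > 0][:top_n]


candidates = [
    {"name": "Local art museum", "tags": ["art", "history"]},
    {"name": "Art festival", "tags": ["art", "music"]},
    {"name": "Food market", "tags": ["food"]},
    {"name": "Football match", "tags": ["sports"]},
]
plan = rank_activities(["art", "food", "art"], ["art"], candidates)
print(plan)
```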

[0040] Finally, in the emergency response function, the server continuously monitors flight status and weather, and promptly notifies the user if an anomaly is detected. The terminal receives this notification and provides the user with detailed alternatives, including alternative routes and accommodations. For example, if a flight is delayed, the terminal can immediately suggest train or bus arrangements, allowing the user to make a choice for quick and effective travel.
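The flight-delay example can be sketched as a rule that offers ground transport once a delay becomes significant. The 90-minute threshold, the `FlightStatus` fields, and the option format are assumptions made for illustration; the specification does not state how alternatives are chosen or ordered.

```python
from dataclasses import dataclass


@dataclass
class FlightStatus:
    flight: str
    delay_minutes: int
    cancelled: bool = False


DELAY_THRESHOLD_MINUTES = 90  # assumption: delays below this are not escalated


def propose_alternatives(status: FlightStatus, options: list[dict]) -> list[dict]:
    """Return alternative transport options, fastest first, if the flight is disrupted."""
    disrupted = status.cancelled or status.delay_minutes >= DELAY_THRESHOLD_MINUTES
    if not disrupted:
        return []
    return sorted(options, key=lambda option: option["duration_minutes"])


options = [
    {"mode": "bus", "duration_minutes": 300},
    {"mode": "train", "duration_minutes": 180},
]
alternatives = propose_alternatives(FlightStatus("XX123", delay_minutes=150), options)
print([option["mode"] for option in alternatives])
```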

[0041] As described above, this embodiment provides comprehensive support to enable travelers to enjoy their trips with peace of mind.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] The user launches the application on their smartphone. The home screen appears, displaying options for language translation, travel planning, and emergency response.

[0045] Step 2:

[0046] The user selects the language translation function, presses the microphone button, and speaks the content they want translated.

[0047] Step 3:

[0048] The device records the audio data, activates the speech recognition engine to convert the audio into text, and prepares to send this converted text to the server.

[0049] Step 4:

[0050] The server inputs the received text into a multilingual translation API and translates it into the specified target language in real time.

[0051] Step 5:

[0052] The server sends the translated text back to the terminal. The receiving terminal displays the result on its user interface and provides audio output as needed.

[0053] Step 6:

[0054] The user selects the travel planning function, enters their destination and activities of interest, and specifies detailed conditions.

[0055] Step 7:

[0056] The terminal sends the entered travel information to the server. The server refers to the accumulated historical data and the user's profile, and analyzes the data using machine learning algorithms.

[0057] Step 8:

[0058] Based on the analysis results, the server automatically generates an optimal travel plan tailored to the user and sends the result to the terminal.

[0059] Step 9:

[0060] The terminal displays the generated travel plan to the user and provides an interface for viewing the plan details.

[0061] Step 10:

[0062] The server monitors external information sources and collects real-time data on flights and weather.

[0063] Step 11:

[0064] The server applies the anomaly detection algorithm to the collected data and, if an anomaly is detected, generates an emergency notification.

[0065] Step 12:

[0066] The server sends the notification, together with alternative solutions, to the device. The device then notifies the user of this information and presents the appropriate course of action.

[0067] Step 13:

[0068] When a problem arises, users evaluate alternatives and decide on an action based on the available options.
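The thirteen steps above fall into three flows that share Step 1 as their entry point. One way to picture the home-screen selection is a lookup from the chosen function to the steps it triggers. The enum and mapping below are an illustrative assumption, not a structure described in the specification.

```python
from enum import Enum


class Function(Enum):
    TRANSLATION = "translation"
    TRAVEL_PLANNING = "travel_planning"
    EMERGENCY_RESPONSE = "emergency_response"


# Step numbers of the processing flow covered by each function.
STEPS = {
    Function.TRANSLATION: range(2, 6),           # Steps 2 to 5
    Function.TRAVEL_PLANNING: range(6, 10),      # Steps 6 to 9
    Function.EMERGENCY_RESPONSE: range(10, 14),  # Steps 10 to 13
}


def steps_for(selection: str) -> list[int]:
    """Map the home-screen selection to the steps that follow it."""
    return list(STEPS[Function(selection)])


print(steps_for("travel_planning"))
```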

[0069] (Example 1)

[0070] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0071] When travelers visit countries where a different natural language is spoken, they may have difficulty communicating smoothly, planning their trips, and responding flexibly to emergencies. Conventional technologies have lacked the means to address these challenges comprehensively and efficiently. In particular, there is a need to generate individually optimized travel plans and to respond rapidly to emergencies.

[0072] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0073] In this invention, the server includes a device that converts voice information into text information, a conversion device that translates text information into multiple natural languages, and a processing device that analyzes the traveler's history information and generates a personalized travel plan. This enables smooth communication across language barriers, the provision of optimal travel plans tailored to the user's needs, and a quick and flexible response to emergencies.

[0074] A "device that converts audio information into text information" is a device that analyzes information input via voice and converts its content into digital text information.

[0075] A "text information translation device for multiple natural languages" is a device that has the function of automatically converting input text information into different natural languages selected by the user.

[0076] A "device that displays translated text information on a user interface or presents it audibly" is a device that has the function of visually displaying the translated result or outputting it audibly.

[0077] A "processing device that analyzes travelers' historical information and generates personalized travel plans" is a device that analyzes travelers' past behavioral data and preferences and creates individual travel plans accordingly.

[0078] An "algorithm for detecting anomalies" is a computational processing method that analyzes data acquired in real time to detect travel-related problems and malfunctions at an early stage.

[0079] A "processing device that presents alternative plans" is a device that has the function of planning and presenting alternative actions or means to the user in response to detected anomalies.

[0080] A "processing device that generates the optimal response from input information using a generation algorithm" is a device equipped with an algorithm that analyzes information received from a user and automatically creates an appropriate and effective response.

[0081] This invention provides a system that enables travelers to communicate smoothly across language barriers, create personalized travel plans, and respond quickly in emergencies.

[0082] The system is primarily composed of three elements: servers, terminals, and users.

[0083] The terminal is a mobile information terminal such as a smartphone or tablet, and the user interacts with the system through this device. The terminal is equipped with speech recognition software that converts speech information into text information; specifically, it is possible to convert speech to text using a general speech recognition API, for example. The translated text information is then displayed on the screen or output as speech using speech synthesis technology.

[0084] The server handles complex data processing and includes a translation device for translating text information into multiple natural languages. Cloud translation services are used here to expedite the translation process. Furthermore, the server includes a generative AI model to analyze the user's past history and generate personalized travel plans. This ensures that the user receives an optimal travel plan tailored to their preferences. In addition, the server incorporates algorithms to detect travel-related anomalies, enabling early detection and the suggestion of countermeasures.

[0085] When a user speaks into the device, it generates a prompt, such as "I want to plan an art tour in Spain. What are some recommended museums and events?", and sends it to the server. The server uses this information to create an optimal travel plan and returns it to the device. As a result, the user can plan their trip and make necessary reservations based on that information.
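Prompt generation on the device could look roughly like the sketch below, which combines the recognized utterance with stored history before sending it to the server. The function name `build_prompt`, the line layout, and the closing instruction are assumptions; the specification gives only the example request itself.

```python
def build_prompt(utterance: str, history: list[str]) -> str:
    """Combine the recognized request with the user's history into one prompt."""
    lines = [f"Traveler request: {utterance.strip()}"]
    if history:
        # History entries are free-text notes in this sketch.
        lines.append("Past trips and preferences: " + "; ".join(history))
    lines.append("Suggest a personalized travel plan as a short list.")
    return "\n".join(lines)


prompt = build_prompt(
    "I want to plan an art tour in Spain. What are some recommended museums and events?",
    ["visited art museums in France", "prefers walking tours"],
)
print(prompt)
```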

[0086] These systems not only solve communication problems in countries with diverse cultural backgrounds, but also provide support tailored to the individual needs of each traveler.

[0087] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0088] Step 1:

[0089] The user launches the application using a device such as a smartphone. The device displays the app's home screen and prompts the user to select a menu item such as language translation, travel planning, or emergency response. The device then displays the screen corresponding to the user's selection.

[0090] Step 2:

[0091] The user selects the language translation function and performs voice input. The device receives the voice data as input and converts it into text information using its internal speech recognition software. The converted text is sent from the device to the server. The output of this step is the text data converted from the voice data.

[0092] Step 3:

[0093] The server sends the received text information to a multilingual translation service, where it is translated into the specified language. The translation service is accessed through a cloud-based API. The translated text information is then output and returned from the server to the terminal.

[0094] Step 4:

[0095] The device receives the translated text information and presents it to the user. Presentation methods include displaying the text on the screen or playing it back as audio using speech synthesis technology. The user confirms it and continues communication. The translation result is then output to the user.

[0096] Step 5:

[0097] The user selects the travel planning function and enters the places they want to visit and the activities they are interested in into the device. The device sends this information to the server. The entered information includes the user's travel destination and hobbies.

[0098] Step 6:

[0099] The server generates appropriate travel plans using a generative AI model based on the input information. This involves analyzing the user's past history and current preferences to output a personalized plan.

[0100] Step 7:

[0101] The server sends the generated travel plan to the terminal and proposes it to the user. The terminal displays this plan as a list on the screen, allowing the user to select and decide. This helps the user actually put their travel plan into action.

[0102] Step 8:

[0103] If the emergency response function is selected, the server monitors travel-related information, such as weather and transportation delays, in real time and immediately notifies the user if an anomaly is detected. The notification includes suggestions for alternative routes and accommodations. The server detects these anomalies using an anomaly detection algorithm.
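The specification does not name a particular anomaly detection algorithm. As one common possibility, the sketch below flags a new delay reading whose z-score against recent readings exceeds a threshold. The z-score technique, the threshold of 3.0, and the sample data are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` sample standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a constant series is a deviation
    return abs(latest - mu) / sigma > threshold


past_delays = [5, 8, 4, 6, 7, 5, 9, 6]  # minutes, hypothetical readings
print(is_anomalous(past_delays, 120))
print(is_anomalous(past_delays, 8))
```

A production system would likely need a method that handles seasonality and multiple signals (weather, traffic, flight status), but the same detect-then-notify shape applies.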

[0104] (Application Example 1)

[0105] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0106] When travelers visit different cultural regions, language barriers and a lack of transportation information can make smooth travel and effective communication difficult. Furthermore, it can be challenging to respond appropriately in emergencies, often causing anxiety for travelers. Therefore, improving the quality of travel is essential.

[0107] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0108] In this invention, the server includes means for converting voice input to text in real time, means equipped with a translation device for translating into multiple languages, and means for displaying or outputting the translated information on a user interface. This enables smooth communication in different cultural contexts and facilitates travel within cities visited by tourists.

[0109] "A means of converting voice input to text in real time" refers to a technology that instantly converts voice data into text information, making it available for subsequent processing.

[0110] "Means equipped with a translation device" refers to a technical configuration that has the function of converting text from one language to another language.

[0111] "Means of displaying or outputting audio on a user interface" refers to technical methods for providing information to users visually or aurally.

[0112] "Means for analyzing travelers' historical information and generating personalized travel plans" refers to technology for creating optimal travel plans for each traveler based on past behavioral data.

[0113] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to analytical techniques for quickly providing countermeasures in response to unexpected situations.

[0114] "Means of acquiring public transport information within a city and translating and presenting it in the user's native language" refers to a method of collecting data on public transport in a city, converting it into a language understandable to the user, and providing it to the user.

[0115] The system for realizing this invention consists of a smartphone, a server, and network communication. Users can obtain various information through voice input using a dedicated application on their smartphone. This application uses speech recognition technology to convert voice data into text data in real time. Typically, this process uses speech recognition software on the smartphone (e.g., Google® Speech-to-Text API).

[0116] The converted text data is sent to a server via the internet. The server is equipped with a translation device (e.g., Google Translate API) to translate the text into various languages, converts the text to the user's desired language, and sends it to the smartphone. This information is either displayed visually on the smartphone's user interface or provided as audio through a voice output device.

[0117] Furthermore, this system analyzes data, including the user's travel history, on a server. Using machine learning algorithms, it generates personalized travel plans and presents the most suitable options. It can also collect public transport data within cities in real time, translate it into the user's native language, and present it to them. This allows users to navigate smoothly within their destination cities.

[0118] Furthermore, the server uses anomaly detection algorithms to promptly detect various emergencies related to travelers and suggest alternative solutions as needed. This requires the acquisition and analysis of real-time data such as traffic and weather information, enabling travelers to respond flexibly.

[0119] For example, when a user uses public transport in a city they are visiting for the first time, scanning the QR code (registered trademark) at a bus stop with their smartphone will display information on available transportation options and scheduled times in their native language. Furthermore, if their travel schedule changes, the system will suggest alternative tourist destinations and accommodations.
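The QR code scenario can be sketched as a lookup followed by localized formatting. Everything concrete here is hypothetical: the payload format `stop:<id>`, the `TIMETABLE` data, and the message templates are invented for illustration, and a real system would fetch live schedules and use the translation device rather than fixed templates.

```python
# Hypothetical timetable keyed by stop ID: (line name, scheduled departure).
TIMETABLE = {
    "STOP-001": [("Bus 12", "09:15"), ("Bus 30", "09:40")],
}

# Hypothetical per-language message templates.
TEMPLATES = {
    "en": "{line} departs at {time}",
    "de": "{line} fährt um {time} ab",
}


def describe_stop(qr_payload: str, language: str) -> list[str]:
    """Turn a decoded QR payload into departure messages in the user's language."""
    stop_id = qr_payload.removeprefix("stop:")
    template = TEMPLATES.get(language, TEMPLATES["en"])  # fall back to English
    departures = TIMETABLE.get(stop_id, [])
    return [template.format(line=line, time=time) for line, time in departures]


print(describe_stop("stop:STOP-001", "de"))
```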

[0120] An example of a prompt message is, "Please suggest activities that the user would like to do while sightseeing in Tokyo. The user is interested in art."

[0121] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0122] Step 1:

[0123] The user launches the smartphone app and begins voice input. The app receives voice data as input and converts it into text data using speech recognition software on the smartphone. This data conversion generates text extracted from the voice.

[0124] Step 2:

[0125] The terminal sends the converted text data to the server via the internet. The server receives the text data as input and translates it into the specified language using its translation device. As part of the data processing, a translation API is used to generate the translated text data.

[0126] Step 3:

[0127] The server sends the translated text back to the terminal. The terminal receives this translation and either displays it on the user interface or plays it back as audio through a speech output device. This allows the user to obtain information in their native language.

[0128] Step 4:

[0129] The user provides past travel history information, which is sent from the device to the server. The server, receiving the history data as input, uses a machine learning algorithm to generate a personalized travel plan and sends the result to the device. This allows the user to be offered a travel plan that is suitable for them.

[0130] Step 5:

[0131] The device scans a QR code containing public transport information obtained within the city. It receives the QR code information as input and sends that data to a server. The server retrieves real-time traffic information, translates it into the specified language, and then sends it back to the device. This allows the user to obtain information that will help them travel smoothly in their destination.

[0132] Step 6:

[0133] The server uses an anomaly detection algorithm to detect travel-related emergencies early. It receives current travel status and weather information as input and detects anomalies through data calculations. As a result, it generates necessary alternatives and sends them to the terminal. This allows users to continue their travels smoothly even in emergencies.

[0134] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0135] This invention provides a system incorporating an emotion engine to optimize the user experience during travel. In addition to real-time speech-to-text conversion and multilingual translation, the system recognizes the user's emotions and provides personalized suggestions based on them. Furthermore, in the event of an emergency, it provides support adapted to the user's emotional state.

[0136] First, the user launches the app on their device. The application includes translation, travel planning, and emergency response functions, which the user can freely choose from.

[0137] For example, in a language translation function with an integrated emotion engine, when a user communicates a message via voice input, the device converts this voice into text and sends it to the server. The server translates the text and also performs emotion analysis, adjusting the translation result to reflect the user's emotions. For instance, if the user is feeling anxious, the translation result might be rephrased to be more polite and reassuring.

[0138] Next, in the travel planning function, the device sends data to the server based on the user's input. The server uses machine learning algorithms to analyze the user's preferences and history. Furthermore, an emotion engine takes the user's emotional state into consideration, and an optimal travel plan tailored to those emotions is generated and sent back. For example, if the emotion of wanting to relax is detected, suggestions will focus on activities that will help the user refresh.

[0139] In the emergency response function, the server monitors real-time data and notifies the terminal if an anomaly is detected. At this time, the emotion engine evaluates the user's stress level and suggests countermeasures appropriate to the user's state. For example, if the user is extremely stressed, it will help them clearly explain the situation, along with providing instructions on how to quickly contact the support center.

[0140] In this way, by utilizing the emotion engine, users can enjoy a more comfortable and safer travel experience.

[0141] The following describes the processing flow.

[0142] Step 1:

[0143] The user launches an application with an integrated emotion engine on their smartphone. Options for translation, travel planning, and emergency response are displayed on the home screen.

[0144] Step 2:

[0145] The user selects the translation function and sets the language they want to translate into and their preference for sentiment analysis.

[0146] Step 3:

[0147] The user taps the microphone button and inputs the text they want to translate by voice.

[0148] Step 4:

[0149] The device records the audio as data and converts it into text using a speech recognition engine.

[0150] Step 5:

[0151] The terminal sends the converted text data to the server.

[0152] Step 6:

[0153] The server uses a translation device to translate text into multiple specified languages, while simultaneously analyzing the user's emotions through an emotion engine.

[0154] Step 7:

[0155] Based on the analysis results and translated text, the server adjusts the translation results according to the user's sentiment and corrects them to more appropriate expressions.

[0156] Step 8:

[0157] The server sends the edited text to the terminal.

[0158] Step 9:

[0159] The terminal displays the received translation results on the user interface or outputs them as audio to the user.

[0160] Step 10:

[0161] The user selects the travel planning function and enters their destination and interests into the device.

[0162] Step 11:

[0163] The terminal sends user information and input data to the server and requests analysis in conjunction with the emotion engine.

[0164] Step 12:

[0165] Based on the received data, the server uses machine learning to generate travel plans that are based on history and emotional state.

[0166] Step 13:

[0167] The server sends the generated travel plan to the terminal and adjusts the display order and plan based on the user's emotions.

[0168] Step 14:

[0169] In the event of an emergency, the server analyzes real-time data to detect anomalies.

[0170] Step 15:

[0171] The server considers the user's emotions, generates the optimal response through an emotion engine, and sends it to the terminal.

[0172] Step 16:

[0173] The device notifies the user of the generated information and provides emotionally responsive support and suggestions for the next course of action.
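The emotion-dependent adjustment in Steps 6 and 7 can be illustrated with a minimal sketch. The keyword rule below is only a stand-in for the emotion engine, whose internals are not described; the word list, the polite openers, and the function names are hypothetical.

```python
# Hypothetical cue words that the stand-in estimator treats as signs of anxiety
ANXIETY_WORDS = {"worried", "afraid", "lost", "scared", "nervous"}

# Hypothetical polite openers, keyed by target language code
POLITE_OPENERS = {
    "en": "Excuse me, could you please help me? ",
    "de": "Entschuldigung, könnten Sie mir bitte helfen? ",
}

def estimate_emotion(source_text: str) -> str:
    """Label the utterance 'anxious' if it contains any cue word."""
    words = {w.strip(".,!?").lower() for w in source_text.split()}
    return "anxious" if words & ANXIETY_WORDS else "neutral"

def adjust_translation(translated: str, target_lang: str, emotion: str) -> str:
    """Prepend a polite opener when the user is estimated to be anxious."""
    if emotion == "anxious":
        # Languages without a configured opener are returned unchanged
        return POLITE_OPENERS.get(target_lang, "") + translated
    return translated

emotion = estimate_emotion("I am lost and worried!")
adjusted = adjust_translation("Wo ist der Bahnhof?", "de", emotion)
```

Keeping estimation and adjustment as separate functions mirrors the flow above, in which analysis (Step 6) precedes correction of the translation (Step 7).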

[0174] (Example 2)

[0175] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0176] There is a need to eliminate the inconveniences travelers face due to language barriers and sudden schedule changes, and to provide a more comfortable and personalized travel experience. Furthermore, when responding to emergencies, flexible responses that take into account the user's psychological state are crucial. However, current technology has not been able to comprehensively address these challenges.

[0177] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0178] In this invention, the server includes means for converting voice signals into digital text information, means for providing a device for converting the digital text information into various languages, and means for analyzing the user's psychological state and making emotion-based adjustments to the generated plan and converted information. This provides users with a sense of security that transcends language barriers, and enables a personalized travel experience and flexible emergency response that takes into account the user's psychological state.

[0179] An "audio signal" is a signal obtained by converting sound into an electrical signal, making it usable for digital processing and communication.

[0180] "Digital text information" refers to information such as audio that has been converted into text, and is in a format that can be processed and displayed by a computer.

[0181] A "conversion device" is a device or software used to convert data in one format to data in another format.

[0182] A "display device" is a device used to visually display information in digital format, and includes screens and monitors.

[0183] "Users" refer to individuals who utilize this system and wish to have a comfortable travel experience.

[0184] "History information" refers to recorded data about a user's past actions and choices, and is used to provide personalized services.

[0185] An "anomaly detection algorithm" is a mathematical or computational method for detecting phenomena that deviate from the normal state.

[0186] "Psychological state" refers to the user's emotions and mental condition, and is a factor that influences the services provided by the system.

[0187] "Flexible response" refers to adaptive actions and measures that can be changed according to the user's condition and circumstances.

[0188] This invention is a system that reduces language barriers for travelers and provides an individually optimized travel experience. The system operates through an application installed on the user's device during travel. Specific embodiments of the system are described below.

[0189] First, the user launches the application on a mobile device or tablet. This application includes key functions such as voice input, language conversion, travel planning, and emergency response.

[0190] The device utilizes speech recognition software (such as the Google Speech-to-Text API) to convert audio signals into digital text. This allows for real-time conversion of user speech into text. The converted text is then sent from the device to the server.

[0191] The server utilizes a language conversion device (e.g., DeepL API) to convert textual information into a defined language. It also employs sentiment analysis software (e.g., Google Cloud Natural Language API) to recognize and analyze user emotions. Based on the results of this sentiment analysis, the server adjusts the converted language information to match the user's emotional state.

[0192] In the travel planning function, one of the key features, the server analyzes data using a machine learning algorithm that has learned the user's preferences based on past travel history, and generates a personalized travel plan. This plan takes into account the user's emotional state; for example, if the user requests relaxation, it will suggest activities such as relaxation facilities.

[0193] In emergency situations, the server monitors various types of information in real time. When an abnormal situation is detected, it quickly sends a notification to the terminal, assesses the user's mental state, and then suggests appropriate countermeasures.

[0194] A concrete example of a prompt to the generative AI model in this case is: "Explain the best translated response for when a user feels anxious while traveling, and how that response improves the user's feelings."

[0195] This enables a system that combines language translation, personalized travel planning, and consideration of psychological state to provide a more comfortable and fulfilling travel experience. In this embodiment, the system overcomes language barriers in real time, reduces traveler stress, and enables safer travel.

[0196] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0197] Step 1:

[0198] The user launches the application on the device. The device provides interfaces for voice input, language conversion, travel planning, and emergency response. The user selects the necessary function. Based on this input, the application proceeds to the next processing step.

[0199] Step 2:

[0200] If the user selects voice input, the device uses its microphone to collect the user's voice. This voice signal is treated as input and converted into text information using speech recognition software. Specifically, the Google Speech-to-Text API is used to convert the voice signal into text data. This converted text data is then generated as output and sent to the server.

[0201] Step 3:

[0202] The server takes text data received from the terminal as input and converts it to the specified language using the DeepL API. The converted text is then used with the Google Cloud Natural Language API to perform sentiment analysis. Based on the analysis results, the text is adjusted to reflect emotions, and the final translation is generated as output.

[0203] Step 4:

[0204] The terminal receives the final translation result sent from the server. This translation result is either displayed to the user visually or output via speech synthesis. The user can then use this translation to communicate.

[0205] Step 5:

[0206] If a user selects the travel planning function, the device collects travel-related preferences and history from the user as input. This data is sent to a server for analysis using machine learning algorithms. Based on the analysis results, a personalized travel plan tailored to the user's preferences and emotions is generated and provided as output.

[0207] Step 6:

[0208] The server monitors for emergencies during travel using real-time data transmitted from the terminal. It applies an anomaly detection algorithm to this data and, taking each detected anomaly as input, generates appropriate response suggestions as output. These suggestions take into account the user's current psychological state.

[0209] Step 7:

[0210] The terminal receives emergency notifications and appropriate response suggestions from the server and displays them to the user. This allows the user to quickly implement the response. Throughout this entire process, the system can respond to user prompts through a generative AI model.
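Steps 2 to 4 chain three external services. The sketch below shows one way to orchestrate them while keeping each service replaceable: the recognizer, translator, and sentiment analyzer are passed in as callables, so no particular vendor API is assumed. The score range of -1 to 1, the threshold of -0.25, and all names are assumptions for illustration.

```python
from typing import Callable, NamedTuple

class TranslationResult(NamedTuple):
    recognized: str    # text produced by speech recognition (Step 2)
    translated: str    # raw translation (Step 3)
    sentiment: float   # assumed to lie in [-1, 1]; negative means distress
    final: str         # text delivered to the terminal (Step 4)

def translate_with_sentiment(
    audio: bytes,
    target_lang: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    sentiment_of: Callable[[str], float],
    soften: Callable[[str], str],
    negative_below: float = -0.25,
) -> TranslationResult:
    """Run recognition, translation, and sentiment-based adjustment in order."""
    recognized = recognize(audio)
    translated = translate(recognized, target_lang)
    # Per Step 3, sentiment analysis is performed on the converted text
    sentiment = sentiment_of(translated)
    final = soften(translated) if sentiment < negative_below else translated
    return TranslationResult(recognized, translated, sentiment, final)

# Stub services standing in for the real APIs
result = translate_with_sentiment(
    b"\x00\x01",
    "en",
    recognize=lambda audio: "tasukete kudasai",
    translate=lambda text, lang: "Help me",
    sentiment_of=lambda text: -0.6,
    soften=lambda text: "Please, " + text,
)
```

Passing the services as parameters also makes the flow testable without network access.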

[0211] (Application Example 2)

[0212] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0213] Current travel support systems often fail to provide a sense of security or satisfaction because they deliver information without considering the user's emotional state. Furthermore, language barriers and emergency situations can lead to inappropriate information being provided, potentially causing stress. There is a need to address these issues and provide a more personalized travel experience.

[0214] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0215] In this invention, the server includes means for converting voice input into text in real time, means for recognizing emotional states and adjusting information based on those emotions, and means for providing emotionally tailored information in public transportation and tourist spots. This makes it possible to provide detailed information that responds to the user's emotions.

[0216] "A means of converting voice input to text in real time" refers to a technology that has the function of instantly converting the voice spoken by the user into text data.

[0217] "Means equipped with a translation device for translating into multiple languages" refers to a technology that has the ability to instantly translate text data into different specified languages.

[0218] "Means of displaying or outputting audio on a user interface" refers to technologies that have the function of conveying translated text to the user via a screen or speaker.

[0219] "Methods for analyzing travelers' historical information and generating personalized travel plans" refers to technologies that analyze data on travelers' past behavior and preferences and create customized travel plans based on that data.

[0220] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to a technology that quickly identifies potential risks and anomalies during travel and proposes appropriate countermeasures.

[0221] "Means of recognizing emotional states and adjusting information based on those emotions" refers to technologies that analyze the user's emotions and optimize the content and expression of information based on the results.

[0222] "Means of providing information tailored to emotions in public transportation and tourist attractions" refers to technologies that provide information related to public transportation and tourist attractions in a way that is appropriate to the user's emotions.

[0223] This invention is a system that personalizes the user's travel experience, making it safer and more comfortable, through an application running on a smart device. The main elements for realizing this system and their operation are described below.

[0224] First, the user launches an application installed on their smart device. This application has a function to convert speech to text in real time, and uses the Google Cloud Speech-to-Text API to enable fast and accurate conversion. The converted text data is then translated into multiple languages by the Google Cloud Translation API. At the same time, IBM Watson® Tone Analyzer is used to analyze the user's emotional state from the speech content. The results of this emotion analysis are used to adjust the translation results and optimize the information provided.

[0225] A serverless architecture utilizing AWS® Lambda is employed to generate appropriate travel plans and information based on users' past travel data and current emotional states. This enables the personalization of travel plans and allows for personalized suggestions using user history information recorded in Amazon DynamoDB. Information on public transportation and tourist attractions is provided to users based on real-time data, with particular emphasis on emotional customization.
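The serverless arrangement can be sketched as a Lambda-style handler that reads the user's history and combines it with the current emotional state. This is a sketch under stated assumptions: the event fields (`user_id`, `emotion`), the item attribute `visited_tags`, and the response shape are hypothetical. The table is injected rather than created with an AWS SDK call, and the in-memory class only imitates the `get_item(Key=...)` call shape of a DynamoDB table client.

```python
import json

def make_handler(table):
    """Build a handler(event, context) bound to a history table.

    `table` is any object exposing get_item(Key=...) that returns a dict
    containing "Item" when the record exists.
    """
    def handler(event, context):
        user_id = event["user_id"]
        emotion = event.get("emotion", "neutral")
        item = table.get_item(Key={"user_id": user_id}).get("Item", {})
        visited = item.get("visited_tags", [])
        # Put relaxing options first when the user is anxious
        tags = ["relaxation"] + visited if emotion == "anxious" else visited
        body = {"user_id": user_id, "suggested_tags": tags}
        return {"statusCode": 200, "body": json.dumps(body)}
    return handler

class InMemoryTable:
    """Test double that imitates the get_item call shape."""
    def __init__(self, items):
        self._items = items

    def get_item(self, Key):
        item = self._items.get(Key["user_id"])
        return {"Item": item} if item is not None else {}

handler = make_handler(InMemoryTable({"u1": {"visited_tags": ["art"]}}))
response = handler({"user_id": "u1", "emotion": "anxious"}, None)
```

In a deployed function the injected object would be replaced by a real table client; the handler body would not need to change.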

[0226] For example, if a user feels anxious while riding the subway, the app recognizes that emotion and immediately displays a follow-up message such as, "A recommended tourist spot where you can relax is XX. Enjoy yourself." An example of a prompt to the generative AI model in this case would be, "Based on the user's current emotional state, please provide reassuring information."

[0227] In this way, this system combines speech recognition, translation, and sentiment analysis to provide comprehensive support for users' travel experiences.

[0228] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0229] Step 1:

[0230] When a user launches the application on their smart device, it is ready to begin voice input. Once the user's voice data is input, the device calls the Google Cloud Speech-to-Text API to convert the voice data into text. The converted text is then output to the device.

[0231] Step 2:

[0232] The text data obtained from the device is sent to the server. The server uses the Google Cloud Translation API to translate the text data into the specified languages. The translated results are sent back from the server to the device, and the device displays the translated text to the user.

[0233] Step 3:

[0234] Simultaneously, the server uses IBM Watson Tone Analyzer to analyze the user's emotional state from the text data. The results of the emotional analysis are stored on the server and become input data for the next information delivery. Based on the analysis results, the translated information is adjusted to be more user-friendly.

[0235] Step 4:

[0236] The server retrieves user history information from Amazon DynamoDB and performs analysis using AWS Lambda. This generates personalized travel plans based on the retrieved emotional states and past history data. Machine learning algorithms are used in this planning process, and the generated plans take into account the user's browsing history and emotional states.

[0237] Step 5:

[0238] The terminal receives a personalized travel plan sent back from the server and provides the user with information on relaxing tourist spots and other relevant details. A specific suggestion might be displayed, such as, "We recommend [○○] as a relaxing tourist destination. Enjoy your visit!" The generative AI model utilizes prompts to provide such personalized suggestions.

[0239] Step 6:

[0240] When a user requests new information or their emotional state changes, the application restarts the cycle and takes appropriate action based on the updated data. Each step is seamlessly repeated to enable real-time responses.

[0241] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0242] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
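The pairing of an instruction with inference data can be pictured with a small data structure. The sketch below is illustrative only: the class name, the field names, and the line-based prompt layout are assumptions, and no call to any actual generative AI service is made. The instruction text is the example prompt given in Application Example 2.

```python
from dataclasses import dataclass, field

@dataclass
class ModelInput:
    instruction: str                                     # what the model is asked to do
    inference_data: dict = field(default_factory=dict)   # data the model infers from

    def to_prompt(self) -> str:
        """Serialize the instruction and the inference data into one prompt string."""
        lines = [f"Instruction: {self.instruction}"]
        # Sort the keys so the prompt is deterministic
        for kind, value in sorted(self.inference_data.items()):
            lines.append(f"{kind}: {value}")
        return "\n".join(lines)

def build_reassurance_input(user_text: str, emotion: str) -> ModelInput:
    """Combine the user's utterance with the estimated emotion."""
    return ModelInput(
        instruction=("Based on the user's current emotional state, "
                     "please provide reassuring information."),
        inference_data={"text": user_text, "emotion": emotion},
    )

prompt = build_reassurance_input("Which train goes downtown?", "anxious").to_prompt()
```

Audio or image data would be attached in the same way, with the prompt carrying a reference to the binary payload rather than its contents.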

[0243] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0244] [Second Embodiment]

[0245] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0246] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0247] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0248] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0249] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0250] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0251] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0252] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0253] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0254] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0255] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0256] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0257] This invention provides a system that enables travelers to communicate across language barriers, create individually optimized travel plans, and respond quickly to emergencies. Specific embodiments are described below.

[0258] First, the user launches an application installed on their smartphone. The device displays the home screen, and the user can choose from functions such as language translation, travel planning, or emergency response.

[0259] For example, if the multilingual translation function is selected, the user provides voice input. The device captures this voice data and converts it into text data using speech recognition technology. This text data is then sent to a server and translated into the user's chosen language using a multilingual translation device. The translated text is returned to the device and displayed on the screen or output as audio to the user. For example, when a Japanese user orders at a German restaurant, they can speak into the app and have it translated into English or German.

[0260] Next, in the personalized travel planning feature, the user enters their travel destination and activities of interest into the device. The device sends this information to a server, where a machine learning algorithm analyzes the user's past history and preferences. As a result, the server generates an optimal travel plan and sends it back to the device, suggesting it to the user. For example, if the user is interested in art in France, local museums and art festivals will be suggested.

[0261] Finally, in the emergency response function, the server continuously monitors flight status and weather, and promptly notifies the user if an anomaly is detected. The terminal receives this notification and provides the user with detailed alternatives, including alternative routes and accommodations. For example, if a flight is delayed, the terminal can immediately suggest train or bus arrangements, allowing the user to make a choice for quick and effective travel.

[0262] As described above, this embodiment provides comprehensive support to enable travelers to enjoy their trips with peace of mind.

[0263] The following describes the processing flow.

[0264] Step 1:

[0265] The user launches the application on their smartphone. The home screen appears, displaying options for language translation, travel planning, and emergency response.

[0266] Step 2:

[0267] The user selects the language translation function and presses the microphone button to input the text they want to translate by voice.

[0268] Step 3:

[0269] The device records the audio data, activates the speech recognition engine to convert the audio into text, and prepares to send this converted text to the server.

[0270] Step 4:

[0271] The server inputs the received text into a multilingual translation API and translates it into the specified target language in real time.

[0272] Step 5:

[0273] The server sends the translated text back to the terminal. The receiving terminal displays the result on its user interface and provides audio output as needed.

[0274] Step 6:

[0275] The user selects the travel planning function, enters their destination and activities of interest, and specifies detailed conditions.

[0276] Step 7:

[0277] The terminal sends the entered travel information to the server. The server refers to the accumulated historical data and the user's profile, and analyzes the data using machine learning algorithms.

[0278] Step 8:

[0279] Based on the analysis results, the server automatically generates an optimal travel plan tailored to the user and sends the result to the terminal.

[0280] Step 9:

[0281] The terminal displays the generated travel plan to the user and provides an interface for the user to check the details of the plan.

[0282] Step 10:

[0283] The server monitors external information sources and collects real-time data on flights and weather.

[0284] Step 11:

[0285] The server applies an anomaly detection algorithm to the collected data and, when an anomaly is detected, generates an emergency notification.

[0286] Step 12:

[0287] The server sends the notification, together with alternatives, to the terminal. The terminal notifies the user of this information and presents countermeasures.

[0288] Step 13:

[0289] The user evaluates the alternatives when a problem occurs and decides on an action based on the options.
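Steps 10 to 12 can be illustrated by a sketch that turns flight and weather observations into a notification carrying alternatives. A fixed delay threshold is used here as a simple stand-in for the anomaly detection algorithm; the 60-minute limit, the field names, and the listed alternatives are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FlightStatus:
    flight_no: str
    delay_min: int
    cancelled: bool = False

def build_notification(status, severe_weather, delay_limit_min=60):
    """Return None when nothing is wrong, else a notification with alternatives."""
    disrupted = status.cancelled or status.delay_min >= delay_limit_min

    reasons = []
    if status.cancelled:
        reasons.append("flight cancelled")
    elif disrupted:
        reasons.append(f"flight delayed {status.delay_min} min")
    if severe_weather:
        reasons.append("severe weather")
    if not reasons:
        return None

    alternatives = []
    if disrupted:
        alternatives += ["train", "bus"]                  # alternative routes
    if severe_weather:
        alternatives.append("overnight accommodation")    # wait out the weather
    return {"flight": status.flight_no,
            "reasons": reasons,
            "alternatives": alternatives}

# Hypothetical observation: a two-hour delay during a storm
notice = build_notification(FlightStatus("XX123", 120), severe_weather=True)
```

The terminal would present `alternatives` as the options the user evaluates in Step 13.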

[0290] (Example 1)

[0291] Next, Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0292] When travelers visit countries that use different natural languages, they may find it difficult to communicate smoothly and to adapt flexibly to changes in travel plans and to emergencies. Conventional technology has lacked means to solve these problems comprehensively and efficiently. In particular, there is a need for individually optimized travel plans and for a rapid response to emergencies.

[0293] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0294] In this invention, the server includes a device that converts voice information into text information, a conversion device that translates text information into multiple natural languages, and a processing device that analyzes the traveler's history information and generates a personalized travel plan. This enables smooth communication across language barriers, the provision of optimal travel plans tailored to the user's needs, and a quick and flexible response to emergencies.

[0295] A "device that converts audio information into text information" is a device that analyzes information input via voice and converts its content into digital text information.

[0296] A "text information translation device for multiple natural languages" is a device that has the function of automatically converting input text information into different natural languages selected by the user.

[0297] A "device that displays translated text information on a user interface or presents it audibly" is a device that has the function of visually displaying the translated result or outputting it audibly.

[0298] A "processing device that analyzes travelers' historical information and generates personalized travel plans" is a device that analyzes travelers' past behavioral data and preferences and creates individual travel plans accordingly.

[0299] An "algorithm for detecting anomalies" is a computational processing method that analyzes data acquired in real time to detect travel-related problems and malfunctions at an early stage.

[0300] A "processing device that presents alternative plans" is a device that has the function of planning and presenting alternative actions or means to the user in response to detected anomalies.

[0301] A "processing device that generates the optimal response from input information using a generation algorithm" is a device equipped with an algorithm that analyzes information received from a user and automatically creates an appropriate and effective response.

[0302] This invention provides a system that enables travelers to communicate smoothly across language barriers, create personalized travel plans, and respond quickly in emergencies.

[0303] The system is primarily composed of three elements: servers, terminals, and users.

[0304] The terminal is a mobile information terminal such as a smartphone or tablet, and the user interacts with the system through this device. The terminal is equipped with speech recognition software that converts speech information into text information; specifically, it is possible to convert speech to text using a general speech recognition API, for example. The translated text information is then displayed on the screen or output as speech using speech synthesis technology.

[0305] The server handles complex data processing and includes a translation device for translating text information into multiple natural languages. Cloud translation services are used here to expedite the translation process. Furthermore, the server includes a generative AI model to analyze the user's past history and generate personalized travel plans. This ensures that the user receives an optimal travel plan tailored to their preferences. In addition, the server incorporates algorithms to detect travel-related anomalies, enabling early detection and the suggestion of countermeasures.

[0306] When a user speaks into the device, it generates a prompt, such as "I want to plan an art tour in Spain. What are some recommended museums and events?", and sends it to the server. The server uses this information to create an optimal travel plan and returns it to the device. As a result, the user can plan their trip and make necessary reservations based on that information.
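
As a minimal sketch of how a terminal might assemble such a prompt from recognized speech, the following Python function fills a fixed template. The function name `build_prompt`, its parameters, and the template approach are assumptions made for illustration; the specification gives only the example prompt itself and does not define how it is derived.

```python
def build_prompt(destination: str, theme: str, topics: list[str]) -> str:
    """Fill a fixed template with slots extracted from the user's speech.

    All names here are hypothetical; the specification does not define
    how the terminal derives the prompt.
    """
    if not theme or not topics:
        raise ValueError("theme and topics must be non-empty")
    # Pick the indefinite article so the sentence reads naturally.
    article = "an" if theme[0].lower() in "aeiou" else "a"
    wanted = " and ".join(topics)
    return (
        f"I want to plan {article} {theme} tour in {destination}. "
        f"What are some recommended {wanted}?"
    )


prompt = build_prompt("Spain", "art", ["museums", "events"])
print(prompt)
```

With these inputs the template reproduces the example prompt quoted above; a production system could equally let the generative AI model extract the slots itself.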

[0307] Such systems not only comprehensively solve communication problems in countries with different cultural backgrounds, but also provide support according to the individual needs of each traveler.

[0308] The flow of the specific process in Example 1 will be described using FIG. 11.

[0309] Step 1:

[0310] The user launches the application using a terminal such as a smartphone. The terminal displays the home screen of the application and allows the user to select a menu for language translation, travel planning, or emergency response. Here, a menu screen corresponding to the user's operation is output.

[0311] Step 2:

[0312] The user selects the language translation function and performs voice input. The terminal receives the voice data as input and converts it into text information using internal voice recognition software. The converted text is transmitted from the terminal to the server. As a result, text data converted from the voice data is output.

[0313] Step 3:

[0314] The server transmits the received text information to a multilingual translation service and translates it into the specified language. The translation service used employs a cloud-based API. As a result, the translated text information is output and returned from the server to the terminal.

[0315] Step 4:

[0316] The terminal receives the translated text information and presents it to the user. The presentation methods include displaying the text on the display and playing it as voice using voice synthesis technology. The user checks it and continues the communication. The translation result is output to the user.

[0317] Step 5:

[0318] The user selects the travel planning function and enters the places they want to visit and the activities they are interested in into the device. The device sends this information to the server. The entered information includes the user's travel destination and hobbies.

[0319] Step 6:

[0320] The server generates appropriate travel plans using a generative AI model based on the input information. This involves analyzing the user's past history and current preferences to output a personalized plan.

[0321] Step 7:

[0322] The server sends the generated travel plan to the terminal and proposes it to the user. The terminal displays this plan as a list on the screen, allowing the user to select and decide. This helps the user actually put their travel plan into action.

[0323] Step 8:

[0324] If the emergency response function is selected, the server monitors travel-related information, such as weather and transportation delays, in real time and immediately notifies the user when an anomaly is detected. The notification includes suggestions for alternative routes and accommodations. The server detects such anomalies using an anomaly detection algorithm.
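
Steps 2 to 4 of the flow above can be sketched as three small functions, one per step. This is an illustrative outline only: the recognizer and translator are passed in as callables, and `fake_recognize`, `fake_translate`, and the `PHRASES` table are hypothetical offline stubs standing in for the speech recognition software and the cloud-based translation API.

```python
from typing import Callable

# Hypothetical stand-ins for the speech recognizer and the cloud
# translation service; a real system would call external services here.
Recognizer = Callable[[bytes], str]
Translator = Callable[[str, str], str]


def terminal_to_text(audio: bytes, recognize: Recognizer) -> str:
    """Step 2: the terminal converts voice data into text."""
    return recognize(audio)


def server_translate(text: str, target_lang: str, translate: Translator) -> str:
    """Step 3: the server forwards the text to a translation service."""
    return translate(text, target_lang)


def present(translated: str) -> dict:
    """Step 4: the terminal shows the text and queues it for speech synthesis."""
    return {"display": translated, "speak": translated}


# Toy stubs so the sketch runs offline.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASES = {("Where is the station?", "es"): "¿Dónde está la estación?"}


def fake_translate(text: str, target_lang: str) -> str:
    # Fall back to the source text when no translation is known.
    return PHRASES.get((text, target_lang), text)


text = terminal_to_text(b"Where is the station?", fake_recognize)
result = present(server_translate(text, "es", fake_translate))
print(result["display"])
```

Injecting the recognizer and translator keeps the flow independent of any particular vendor API, which matches the specification's statement that a general speech recognition API and a cloud translation service may be used.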

[0325] (Application Example 1)

[0326] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0327] When travelers visit different cultural regions, language barriers and a lack of transportation information can make smooth travel and effective communication difficult. Furthermore, it can be challenging to respond appropriately in emergencies, often causing anxiety for travelers. Therefore, improving the quality of travel is essential.

[0328] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0329] In this invention, the server includes means for converting voice input to text in real time, means equipped with a translation device for translating into multiple languages, and means for displaying or outputting the translated information on a user interface. This enables smooth communication in different cultural contexts and facilitates travel within cities visited by tourists.

[0330] "A means of converting voice input to text in real time" refers to a technology that instantly converts voice data into text information, making it available for subsequent processing.

[0331] "Means equipped with a translation device" refers to a technical configuration that has the function of converting text from one language to another language.

[0332] "Means of displaying or outputting audio on a user interface" refers to technical methods for providing information to users visually or aurally.

[0333] "Means for analyzing travelers' historical information and generating personalized travel plans" refers to technology for creating optimal travel plans for each traveler based on past behavioral data.

[0334] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to analytical techniques for quickly providing countermeasures in response to unexpected situations.

[0335] "Means of acquiring public transport information within a city and translating and presenting it in the user's native language" refers to a method of collecting data on public transport in a city, converting it into a language understandable to the user, and providing it to the user.

[0336] The system for realizing this invention consists of a smartphone, a server, and network communication. Users can obtain various information through voice input using a dedicated application on their smartphone. This application uses speech recognition technology to convert voice data into text data in real time. Typically, this process uses speech recognition software on the smartphone (e.g., Google Speech-to-Text API).

[0337] The converted text data is sent to a server via the internet. The server is equipped with a translation device (e.g., Google Translate API) to translate the text into various languages, converts the text to the user's desired language, and sends it to the smartphone. This information is either displayed visually on the smartphone's user interface or provided as audio through a voice output device.

[0338] Furthermore, this system analyzes data, including the user's travel history, on a server. Using machine learning algorithms, it generates personalized travel plans and presents the most suitable options. It can also collect public transport data within cities in real time, translate it into the user's native language, and present it to them. This allows users to navigate smoothly within their destination cities.

[0339] Furthermore, the server uses anomaly detection algorithms to promptly detect various emergencies related to travelers and suggest alternative solutions as needed. This requires the acquisition and analysis of real-time data such as traffic and weather information, enabling travelers to respond flexibly.
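
The specification does not name a particular anomaly detection algorithm. As one possible illustration, the sketch below flags a new observation (for example, a transport delay in minutes) when it lies more than a chosen number of standard deviations from recent history. The z-score rule, the function name `is_anomalous`, and the default threshold of 3.0 are all assumptions, not the method of the invention.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float,
                 z_threshold: float = 3.0) -> bool:
    """Return True when `latest` deviates strongly from `history`.

    Uses a simple z-score test; the threshold is an assumed default.
    """
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a constant series
    return abs(latest - mu) / sigma > z_threshold


recent_delays = [2, 3, 4, 3, 2, 4, 3]  # minutes, hypothetical feed
print(is_anomalous(recent_delays, 45))
print(is_anomalous(recent_delays, 4))
```

A detector of this kind would run on each new traffic or weather reading, and a positive result would trigger the generation of alternative proposals.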

[0340] For example, when a user uses public transport in a city they are visiting for the first time, scanning the QR code at a bus stop with their smartphone will display information on available transportation options and scheduled times in their native language. Furthermore, if their travel schedule changes, the system will suggest alternative tourist destinations and accommodations.
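
The QR code scenario can be sketched as a lookup followed by localization of the status labels. Everything concrete below is hypothetical: the payload format `stop:1234`, the `TIMETABLE` and `STATUS_LABELS` tables, and the function `departures_for` are stand-ins for a real transit data feed and the server's translation device.

```python
from dataclasses import dataclass


@dataclass
class Departure:
    route: str
    time: str        # scheduled time, "HH:MM"
    status_key: str  # language-neutral status code


# Hypothetical data; a real server would query a live transit feed.
TIMETABLE = {
    "stop:1234": [
        Departure("Bus 12", "09:15", "on_time"),
        Departure("Bus 7", "09:20", "delayed"),
    ]
}
STATUS_LABELS = {
    "en": {"on_time": "On time", "delayed": "Delayed"},
    "ja": {"on_time": "定刻", "delayed": "遅延"},
}


def departures_for(qr_payload: str, lang: str) -> list[str]:
    """Return display lines for the scanned stop in the user's language."""
    labels = STATUS_LABELS.get(lang, STATUS_LABELS["en"])  # fall back to English
    return [
        f"{d.route} {d.time} {labels[d.status_key]}"
        for d in TIMETABLE.get(qr_payload, [])
    ]


for line in departures_for("stop:1234", "ja"):
    print(line)
```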

[0341] An example of a prompt message is, "Please suggest activities that the user would like to do while sightseeing in Tokyo. The user is interested in art."

[0342] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0343] Step 1:

[0344] The user launches the smartphone app and begins voice input. The app receives voice data as input and converts it into text data using speech recognition software on the smartphone. This data conversion generates text extracted from the voice.

[0345] Step 2:

[0346] The terminal sends the converted text data to the server via the internet. The server receives the text data as input and translates it into the specified language using its translation device. As part of the data processing, a translation API is used to generate organized text data.

[0347] Step 3:

[0348] The server sends the translated text back to the terminal. The terminal receives this translation and either displays it on the user interface or plays it back as audio through a speech output device. This allows the user to obtain information in their native language.

[0349] Step 4:

[0350] The user provides past travel history information, which is sent from the device to the server. The server, receiving the history data as input, uses a machine learning algorithm to generate a personalized travel plan and sends the result to the device. This allows the user to be offered a travel plan that is suitable for them.

[0351] Step 5:

[0352] The device scans a QR code containing public transport information obtained within the city. It receives the QR code information as input and sends that data to a server. The server retrieves real-time traffic information, translates it into the specified language, and then sends it back to the device. This allows the user to obtain information that will help them travel smoothly in their destination.

[0353] Step 6:

[0354] The server uses an anomaly detection algorithm to detect travel-related emergencies early. It receives current travel status and weather information as input and detects anomalies through data calculations. As a result, it generates necessary alternatives and sends them to the terminal. This allows users to continue their travels smoothly even in emergencies.

[0355] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0356] This invention provides a system incorporating an emotion engine to optimize the user experience during travel. In addition to real-time speech-to-text conversion and multilingual translation, the system recognizes the user's emotions and provides personalized suggestions based on them. Furthermore, in the event of an emergency, it provides support adapted to the user's emotional state.

[0357] First, the user launches the app on their device. The application includes translation, travel planning, and emergency response functions, which the user can freely choose from.

[0358] For example, in a language translation function with an integrated emotion engine, when a user communicates a message via voice input, the device converts this voice into text and sends it to the server. The server translates the text and also performs emotion analysis, adjusting the translation result to reflect the user's emotions. For instance, if the user is feeling anxious, the translation result might be rephrased to be more polite and reassuring.
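
To illustrate the idea of adjusting a translation to the user's emotion, the sketch below wraps the translated sentence in politer phrasing when anxiety is detected. This rule-based wrapper is an assumed simplification: the specification does not say how the rephrasing is produced, and the emotion label `"anxious"`, the `POLITE_WRAPPERS` table, and `adjust_for_emotion` are hypothetical names.

```python
# Hypothetical prefix/suffix pairs keyed by detected emotion.
POLITE_WRAPPERS = {
    "anxious": ("Excuse me. ", " Thank you very much for your help."),
}


def adjust_for_emotion(translated: str, emotion: str) -> str:
    """Return the translation, softened when a wrapper exists for the emotion."""
    prefix, suffix = POLITE_WRAPPERS.get(emotion, ("", ""))
    return f"{prefix}{translated}{suffix}"


print(adjust_for_emotion("Where is the nearest hospital?", "anxious"))
print(adjust_for_emotion("Where is the nearest hospital?", "calm"))
```

In practice the rephrasing would more likely be delegated to the translation or generative model with the emotion passed as context; the wrapper only shows where the adjustment sits in the flow.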

[0359] Next, in the travel planning function, the device sends data to the server based on the user's input. The server uses machine learning algorithms to analyze the user's preferences and history. Furthermore, an emotion engine takes the user's emotional state into consideration, and an optimal travel plan tailored to those emotions is generated and sent back. For example, if the emotion of wanting to relax is detected, suggestions will focus on activities that will help the user refresh.

[0360] In the emergency response function, the server monitors real-time data and notifies the terminal if an anomaly is detected. At this time, the emotion engine evaluates the user's stress level and suggests countermeasures appropriate to the user's state. For example, if the user is extremely stressed, it will help them clearly explain the situation, along with providing instructions on how to quickly contact the support center.

[0361] In this way, by utilizing the emotion engine, users can enjoy a more comfortable and safer travel experience.

[0362] The following describes the processing flow.

[0363] Step 1:

[0364] The user launches an application with an integrated emotion engine on their smartphone. Options for translation, travel planning, and emergency response are displayed on the home screen.

[0365] Step 2:

[0366] The user selects the translation function and sets the language they want to translate into and their preference for sentiment analysis.

[0367] Step 3:

[0368] The user taps the microphone button and inputs the text they want to translate by voice.

[0369] Step 4:

[0370] The device records the audio as data and converts it into text using a speech recognition engine.

[0371] Step 5:

[0372] The terminal sends the converted text data to the server.

[0373] Step 6:

[0374] The server uses a translation device to translate text into multiple specified languages, while simultaneously analyzing the user's emotions through an emotion engine.

[0375] Step 7:

[0376] Based on the analysis results and translated text, the server adjusts the translation results according to the user's sentiment and corrects them to more appropriate expressions.

[0377] Step 8:

[0378] The server sends the edited text to the terminal.

[0379] Step 9:

[0380] The terminal displays the received translation results on the user interface or outputs them as audio to the user.

[0381] Step 10:

[0382] The user selects the travel planning function and enters their destination and interests into the device.

[0383] Step 11:

[0384] The terminal sends user information and input data to the server and requests analysis in conjunction with the emotion engine.

[0385] Step 12:

[0386] Based on the received data, the server uses machine learning to generate travel plans that are based on history and emotional state.

[0387] Step 13:

[0388] The server sends the generated travel plan to the terminal and adjusts the display order and plan based on the user's emotions.

[0389] Step 14:

[0390] In the event of an emergency, the server analyzes real-time data to detect anomalies.

[0391] Step 15:

[0392] The server considers the user's emotions, generates the optimal response through an emotion engine, and sends it to the terminal.

[0393] Step 16:

[0394] The device notifies the user of the generated information and provides emotionally responsive support and suggestions for the next course of action.
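
Steps 12 and 13 above, in which plans are generated from history and then reordered by emotion, can be illustrated with a simple scoring rule. The `Plan` fields, the `EMOTION_TAGS` mapping, the emotion labels, and the additive `boost` are assumptions chosen for this sketch; the specification only states that machine learning and the emotion engine are used.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    tags: set[str]
    history_score: float  # fit to past travel history, 0..1 (assumed precomputed)


# Hypothetical mapping from an estimated emotion to preferred activity tags.
EMOTION_TAGS = {
    "wants_to_relax": {"spa", "park", "cafe"},
    "excited": {"festival", "hiking"},
}


def order_plans(plans: list[Plan], emotion: str, boost: float = 0.5) -> list[Plan]:
    """Sort plans by history fit, boosting those that match the emotion."""
    preferred = EMOTION_TAGS.get(emotion, set())

    def score(plan: Plan) -> float:
        return plan.history_score + (boost if plan.tags & preferred else 0.0)

    return sorted(plans, key=score, reverse=True)


plans = [
    Plan("Museum marathon", {"museum"}, 0.8),
    Plan("Hot spring day", {"spa"}, 0.6),
]
print([p.name for p in order_plans(plans, "wants_to_relax")])
```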

[0395] (Example 2)

[0396] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0397] There is a need to eliminate the inconveniences travelers face due to language barriers and sudden schedule changes, and to provide a more comfortable and personalized travel experience. Furthermore, when responding to emergencies, flexible responses that take into account the user's psychological state are crucial. However, current technology has not been able to comprehensively address these challenges.

[0398] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0399] In this invention, the server includes means for converting voice signals into digital text information, means for providing a device for converting the digital text information into various languages, and means for analyzing the user's psychological state and making emotion-based adjustments to the generated plan and converted information. This provides users with a sense of security that transcends language barriers, and enables a personalized travel experience and flexible emergency response that takes into account the user's psychological state.

[0400] An "audio signal" is a signal obtained by converting sound into an electrical signal, making it usable for digital processing and communication.

[0401] "Digital text information" refers to information such as audio that has been converted into text, and is in a format that can be processed and displayed by a computer.

[0402] A "conversion device" is a device or software used to convert data in one format to data in another format.

[0403] A "display device" is a device used to visually display information in digital format, and includes screens and monitors.

[0404] "Users" refer to individuals who utilize this system and wish to have a comfortable travel experience.

[0405] "History information" refers to recorded data about a user's past actions and choices, and is used to provide personalized services.

[0406] An "anomaly detection algorithm" is a mathematical or computational method for detecting phenomena that deviate from the normal state.

[0407] "Psychological state" refers to the user's emotions and mental condition, and is a factor that influences the services provided by the system.

[0408] "Flexible response" refers to adaptive actions and measures that can be changed according to the user's condition and circumstances.

[0409] This invention is a system that reduces language barriers for travelers and provides an individually optimized travel experience. The system operates through an application installed on the user's device during travel. Specific embodiments of the system are described below.

[0410] First, the user launches the application on a mobile device or tablet. This application includes key functions such as voice input, language conversion, travel planning, and emergency response.

[0411] The device utilizes speech recognition software (such as the Google Speech-to-Text API) to convert audio signals into digital text. This allows for real-time conversion of user speech into text. The converted text is then sent from the device to the server.

[0412] The server utilizes a language conversion device (e.g., DeepL API) to convert textual information into a defined language. It also employs sentiment analysis software (e.g., Google Cloud Natural Language API) to recognize and analyze user emotions. Based on the results of this sentiment analysis, the server adjusts the converted language information to match the user's emotional state.

[0413] In the travel planning function, one of the key features, the server analyzes data using a machine learning algorithm that has learned the user's preferences based on past travel history, and generates a personalized travel plan. This plan takes into account the user's emotional state; for example, if the user requests relaxation, it will suggest activities such as relaxation facilities.

[0414] In emergency situations, the server monitors various types of information in real time. When an abnormal situation is detected, it quickly sends a notification to the terminal, assesses the user's mental state, and then suggests appropriate countermeasures.
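
As a rough sketch of how the notification might vary with the user's mental state, the function below adds support-center guidance and a ready-made explanation phrase when the estimated stress is high. The 0 to 1 stress scale, the `HIGH_STRESS` threshold of 0.7, and the message wording are assumptions made for illustration.

```python
HIGH_STRESS = 0.7  # assumed threshold on a 0..1 stress estimate


def emergency_response(anomaly: str, alternative: str, stress: float) -> list[str]:
    """Build notification lines for the terminal from a detected anomaly."""
    if not 0.0 <= stress <= 1.0:
        raise ValueError("stress must be between 0 and 1")
    lines = [f"Alert: {anomaly}."]
    if stress >= HIGH_STRESS:
        # Highly stressed users get contact guidance and a phrase to show staff.
        lines.append("Support center: tap to call now.")
        lines.append(f"Phrase to show staff: 'My plans were disrupted: {anomaly}.'")
    lines.append(f"Suggested alternative: {alternative}.")
    return lines


for line in emergency_response("flight cancelled", "train at 14:05", 0.9):
    print(line)
```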

[0415] A concrete example of this prompt is: "Explain the best translated response for when a user feels anxious while traveling, and how that response improves the user's feelings."

[0416] This enables a system that combines language translation, personalized travel planning, and consideration of psychological state to provide a more comfortable and fulfilling travel experience. In this embodiment, the system overcomes language barriers in real time, reduces traveler stress, and enables safer travel.

[0417] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0418] Step 1:

[0419] The user launches the application on the device. The device provides interfaces for voice input, language conversion, travel planning, and emergency response. The user selects the necessary function. Based on this input, the application proceeds to the next processing step.

[0420] Step 2:

[0421] If the user selects voice input, the device uses its microphone to collect the user's voice. This voice signal is treated as input and converted into text information using speech recognition software. Specifically, the Google Speech-to-Text API is used to convert the voice signal into text data. This converted text data is then generated as output and sent to the server.

[0422] Step 3:

[0423] The server takes text data received from the terminal as input and converts it to the specified language using the DeepL API. The converted text is then used with the Google Cloud Natural Language API to perform sentiment analysis. Based on the analysis results, the text is adjusted to reflect emotions, and the final translation is generated as output.

[0424] Step 4:

[0425] The terminal receives the final translation result sent from the server. This translation result is either displayed to the user visually or output via speech synthesis. The user can then use this translation to communicate.

[0426] Step 5:

[0427] If a user selects the travel planning function, the device collects travel-related preferences and history from the user as input. This data is sent to a server for analysis using machine learning algorithms. Based on the analysis results, a personalized travel plan tailored to the user's preferences and emotions is generated and provided as output.

[0428] Step 6:

[0429] The server monitors for emergencies during travel using real-time data transmitted from the terminal. It utilizes an anomaly detection algorithm, receiving anomalies as input and generating appropriate response suggestions as output. These suggestions take into account the user's current psychological state.

[0430] Step 7:

[0431] The terminal receives emergency notifications and appropriate response suggestions from the server and displays them to the user. This allows the user to quickly implement the response. Throughout this entire process, the system can respond to user prompts through a generative AI model.
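
The sentiment-based adjustment in Step 3 needs some rule for turning an analysis result into a tone for the final translation. The Google Cloud Natural Language API reports document sentiment as a score between -1.0 and 1.0 together with a non-negative magnitude; the sketch below maps such a pair to a tone label. The thresholds, the tone labels, and the function names are assumptions, and no API call is made here.

```python
def tone_for(score: float, magnitude: float) -> str:
    """Map a sentiment (score, magnitude) pair to a tone label.

    Thresholds are illustrative assumptions, not values from the specification.
    """
    if score <= -0.25 and magnitude >= 0.5:
        return "reassuring"  # clearly negative: soften and reassure
    if score >= 0.25:
        return "friendly"
    return "neutral"


def finalize(translated: str, score: float, magnitude: float) -> dict:
    """Attach the chosen tone to the translated text for later rephrasing."""
    return {"text": translated, "tone": tone_for(score, magnitude)}


print(finalize("I lost my passport.", -0.6, 1.2))
```

The returned tone could then be passed to whatever component performs the actual rephrasing of the translated text.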

[0432] (Application Example 2)

[0433] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0434] Current travel support systems often fail to provide a sense of security or satisfaction because they deliver information without considering the user's emotional state. Furthermore, language barriers and emergency situations can lead to inappropriate information being provided, potentially causing stress. There is a need to address these issues and provide a more personalized travel experience.

[0435] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0436] In this invention, the server includes means for converting voice input into text in real time, means for recognizing emotional states and adjusting information based on those emotions, and means for providing emotionally tailored information in public transportation and tourist spots. This makes it possible to provide detailed information that responds to the user's emotions.

[0437] "A means of converting voice input to text in real time" refers to a technology that has the function of instantly converting the voice spoken by the user into text data.

[0438] "Means equipped with a translation device for translating into multiple languages" refers to a technology that has the ability to instantly translate text data into different specified languages.

[0439] "Means of displaying or outputting audio on a user interface" refers to technologies that have the function of conveying translated text to the user via a screen or speaker.

[0440] "Methods for analyzing travelers' historical information and generating personalized travel plans" refers to technologies that analyze data on travelers' past behavior and preferences and create customized travel plans based on that data.

[0441] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to a technology that quickly identifies potential risks and anomalies during travel and proposes appropriate countermeasures.

[0442] "Means of recognizing emotional states and adjusting information based on those emotions" refers to technologies that analyze the user's emotions and optimize the content and expression of information based on the results.

[0443] "Means of providing information tailored to emotions in public transportation and tourist attractions" refers to technologies that provide information related to public transportation and tourist attractions in a way that is appropriate to the user's emotions.

[0444] This invention is a system that personalizes the user's travel experience, making it safer and more comfortable, through an application running on a smart device. The main elements for realizing this system and their operation are described below.

[0445] First, the user launches an application installed on their smart device. This application has a function to convert speech to text in real time, and uses the Google Cloud Speech-to-Text API to enable fast and accurate conversion. The converted text data is then translated into multiple languages by the Google Cloud Translation API. At the same time, IBM Watson Tone Analyzer is used to analyze the user's emotional state from the speech content. The results of this emotion analysis are used to adjust the translation results and optimize the information provided.

[0446] A serverless architecture using AWS Lambda is employed to generate appropriate travel plans and information based on users' past travel data and current emotional states. This enables the personalization of travel plans and allows for personalized suggestions using user history information recorded in Amazon DynamoDB. Information on public transportation and tourist attractions is provided to users based on real-time data, and is particularly customized based on emotional responses.

[0447] For example, if a user feels anxious while riding the subway, the app recognizes that emotion and immediately displays a follow-up message such as, "A recommended tourist spot where you can relax is XX. Enjoy yourself." An example of a prompt to the generative AI model in this case would be, "Based on the user's current emotional state, please provide reassuring information."
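
A serverless function of the kind described could look roughly like the following Lambda-style handler, which combines stored history with the detected emotion to build the prompt quoted above. This is a sketch under stated assumptions: the `HISTORY` dictionary is an in-memory stand-in for the Amazon DynamoDB table, the event fields `user_id` and `emotion` are hypothetical, and the `statusCode`/`body` response shape is one common convention rather than something the specification prescribes.

```python
import json

# In-memory stand-in for the Amazon DynamoDB table of user history.
HISTORY = {"user-1": ["art museum", "hot spring"]}


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point: event in, JSON-serializable response out."""
    user_id = event["user_id"]
    emotion = event.get("emotion", "neutral")
    history = HISTORY.get(user_id, [])
    visits = ", ".join(history) or "none"
    # The first two sentences reproduce the example prompt in the text;
    # the appended context fields are an assumption.
    prompt = (
        "Based on the user's current emotional state, "
        "please provide reassuring information. "
        f"Emotion: {emotion}. Past visits: {visits}."
    )
    return {"statusCode": 200, "body": json.dumps({"prompt": prompt})}


response = handler({"user_id": "user-1", "emotion": "anxious"})
print(json.loads(response["body"])["prompt"])
```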

[0448] In this way, this system combines speech recognition, translation, and sentiment analysis to provide comprehensive support for users' travel experiences.

[0449] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0450] Step 1:

[0451] When a user launches the application on their smart device, it is ready to begin voice input. Once the user's voice data is input, the device calls the Google Cloud Speech-to-Text API to convert the voice data into text. The converted text is then output to the device.

[0452] Step 2:

[0453] The text data obtained from the device is sent to the server. The server uses the Google Cloud Translation API to translate the text data into the specified languages. The translated results are sent back from the server to the device, and the device displays the translated text to the user.

[0454] Step 3:

[0455] Simultaneously, the server uses IBM Watson Tone Analyzer to analyze the user's emotional state from the text data. The results of the emotional analysis are stored on the server and become input data for the next information delivery. Based on the analysis results, the translated information is adjusted to be more user-friendly.

[0456] Step 4:

[0457] The server retrieves user history information from Amazon DynamoDB and performs analysis using AWS Lambda. This generates personalized travel plans based on the retrieved emotional states and past history data. Machine learning algorithms are used in this planning process, and the generated plans take into account the user's browsing history and emotional states.

[0458] Step 5:

[0459] The terminal receives a personalized travel plan sent back from the server and provides the user with information on relaxing tourist spots and other relevant details. A specific suggestion might be displayed, such as, "We recommend [○○] as a relaxing tourist destination. Enjoy your visit!" The generative AI model utilizes prompts to provide such personalized suggestions.

[0460] Step 6:

[0461] When a user requests new information or their emotional state changes, the application restarts the cycle and takes appropriate action based on the updated data. Each step is seamlessly repeated to enable real-time responses.
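
The restart condition in Step 6 can be expressed as a small generator that reports when the cycle should run again. The observation format, a pair of an emotion label and a flag for a new user request, and the function name `refresh_points` are assumptions introduced for this sketch.

```python
from typing import Iterable, Iterator


def refresh_points(observations: Iterable[tuple[str, bool]]) -> Iterator[int]:
    """Yield the indexes at which the processing cycle should restart.

    Each observation is (emotion_label, user_requested_new_info).
    """
    previous = None
    for index, (emotion, requested) in enumerate(observations):
        emotion_changed = previous is not None and emotion != previous
        if requested or emotion_changed:
            yield index
        previous = emotion


stream = [("calm", False), ("calm", False), ("anxious", False), ("anxious", True)]
print(list(refresh_points(stream)))
```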

[0462] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0463] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0464] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0465] [Third Embodiment]

[0466] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0467] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0468] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0469] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0470] The microphone 238 receives instructions from the user 20 in the form of voice. It captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0471] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0472] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0473] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0474] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0475] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0476] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0477] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0478] This invention provides a system that enables travelers to communicate across language barriers, create individually optimized travel plans, and respond quickly to emergencies. Specific embodiments are described below.

[0479] First, the user launches an application installed on their smartphone. The device displays the home screen, and the user can choose from functions such as language translation, travel planning, or emergency response.

[0480] For example, if the multilingual translation function is selected, the user provides voice input. The device captures this voice data and converts it into text data using speech recognition technology. This text data is then sent to a server and translated into the user's chosen language using a multilingual translation device. The translated text is returned to the device and displayed on the screen or output as audio to the user. For example, when a Japanese user orders at a German restaurant, they can speak into the app and have it translated into English or German.
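
The translation flow in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: `recognize_speech`, `translate_text`, and the small `PHRASES` table are hypothetical stand-ins for a real speech recognition engine and the server-side translation device, chosen only so that the sketch runs without any external service.

```python
from dataclasses import dataclass

# Hypothetical phrase table standing in for the server-side translation device.
PHRASES = {
    ("ja", "de"): {"水をください": "Wasser, bitte"},
    ("ja", "en"): {"水をください": "Water, please"},
}


@dataclass
class TranslationResult:
    source_text: str
    target_lang: str
    translated_text: str


def recognize_speech(audio: bytes) -> str:
    """Stand-in for speech recognition: the 'audio' here is just UTF-8 bytes."""
    return audio.decode("utf-8")


def translate_text(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in for the translation device; unknown phrases are returned unchanged."""
    return PHRASES.get((source_lang, target_lang), {}).get(text, text)


def handle_voice_input(audio: bytes, source_lang: str, target_lang: str) -> TranslationResult:
    text = recognize_speech(audio)                               # terminal: voice -> text
    translated = translate_text(text, source_lang, target_lang)  # server: text -> target language
    return TranslationResult(text, target_lang, translated)      # returned for display or audio output
```

In a real deployment the two stand-in functions would be replaced by calls to whichever speech recognition and translation services are actually used.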

[0481] Next, in the personalized travel planning feature, the user enters their travel destination and activities of interest into the device. The device sends this information to a server, where a machine learning algorithm analyzes the user's past history and preferences. As a result, the server generates an optimal travel plan and sends it back to the device, suggesting it to the user. For example, if the user is interested in art in France, local museums and art festivals will be suggested.
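
As a rough illustration of how stated interests and past history might be combined into suggestions, the sketch below ranks candidate activities by tag overlap. The text refers to a machine learning algorithm without specifying it; this simple scoring rule is a substitute chosen for readability, and `CATALOG`, the tag names, and the weights are all hypothetical.

```python
from collections import Counter

# Hypothetical catalog of candidate activities, each tagged by theme.
CATALOG = {
    "France": [
        ("Local art museum", {"art", "history"}),
        ("Art festival", {"art", "events"}),
        ("Cooking class", {"food"}),
    ],
}


def suggest_plan(destination, interests, history_tags, top_n=2):
    """Rank activities: each stated interest adds 2, each past-history tag adds its frequency."""
    history = Counter(history_tags)
    scored = []
    for name, tags in CATALOG.get(destination, []):
        score = sum(2 for t in tags if t in interests) + sum(history[t] for t in tags)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]
```

For a user interested in art whose history contains art and history visits, the museum and the festival are suggested and the unrelated activity is dropped.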

[0482] Finally, in the emergency response function, the server continuously monitors flight status and weather, and promptly notifies the user if an anomaly is detected. The terminal receives this notification and provides the user with detailed alternatives, including alternative routes and accommodations. For example, if a flight is delayed, the terminal can immediately suggest train or bus arrangements, allowing the user to make a choice for quick and effective travel.
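
The flight-delay case described here could be sketched as a simple threshold rule. The 60-minute cutoff, the `Notification` structure, and the format of the alternatives are assumptions made for illustration; the text does not specify how the anomaly is defined.

```python
from dataclasses import dataclass, field

DELAY_THRESHOLD_MIN = 60  # assumed cutoff for treating a delay as an anomaly


@dataclass
class Notification:
    message: str
    alternatives: list = field(default_factory=list)


def check_flight(flight_no, delay_minutes, alternatives_by_mode):
    """Return a Notification if the delay reaches the threshold, otherwise None."""
    if delay_minutes < DELAY_THRESHOLD_MIN:
        return None
    # List alternatives in a stable (alphabetical by mode) order.
    options = [f"{mode}: {desc}" for mode, desc in sorted(alternatives_by_mode.items())]
    return Notification(f"Flight {flight_no} delayed by {delay_minutes} min", options)
```

The terminal would then present `alternatives` to the user, who chooses among them.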

[0483] As described above, this embodiment provides comprehensive support to enable travelers to enjoy their trips with peace of mind.

[0484] The following describes the processing flow.

[0485] Step 1:

[0486] The user launches the application on their smartphone. The home screen appears, displaying options for language translation, travel planning, and emergency response.

[0487] Step 2:

[0488] The user selects the language translation function and presses the microphone button to input the text they want to translate by voice.

[0489] Step 3:

[0490] The device records the audio data, activates the speech recognition engine to convert the audio into text, and prepares to send this converted text to the server.

[0491] Step 4:

[0492] The server inputs the received text into a multilingual translation API and translates it into the specified target language in real time.

[0493] Step 5:

[0494] The server sends the translated text back to the terminal. The receiving terminal displays the result on its user interface and provides audio output as needed.

[0495] Step 6:

[0496] The user selects the travel planning function, enters their destination and activities of interest, and specifies detailed conditions.

[0497] Step 7:

[0498] The terminal sends the entered travel information to the server. The server refers to the accumulated historical data and the user's profile, and analyzes the data using machine learning algorithms.

[0499] Step 8:

[0500] Based on the analysis results, the server automatically generates an optimal travel plan tailored to the user and sends the result to the terminal.

[0501] Step 9:

[0502] The terminal displays the generated travel plan to the user and provides an interface for viewing the plan details.

[0503] Step 10:

[0504] The server monitors external information sources and collects real-time data on flights and weather.

[0505] Step 11:

[0506] The server applies the anomaly detection algorithm to the collected data and, if an anomaly is detected, generates an emergency notification.

[0507] Step 12:

[0508] The server sends the notification, together with alternative solutions, to the device. The device then notifies the user of this information and presents the appropriate course of action.

[0509] Step 13:

[0510] When a problem arises, the user evaluates the presented alternatives and decides on a course of action.

[0511] (Example 1)

[0512] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0513] When travelers visit countries with different natural languages, they may encounter difficulties in smooth communication and in planning their trips and responding flexibly to emergencies. Traditional technologies have lacked the means to comprehensively and efficiently address these challenges. In particular, there is a need for the generation of individually optimized travel plans and rapid responses to emergencies.

[0514] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0515] In this invention, the server includes a device that converts voice information into text information, a conversion device that translates text information into multiple natural languages, and a processing device that analyzes the traveler's history information and generates a personalized travel plan. This enables smooth communication across language barriers, the provision of optimal travel plans tailored to the user's needs, and a quick and flexible response to emergencies.

[0516] A "device that converts audio information into text information" is a device that analyzes information input via voice and converts its content into digital text information.

[0517] A "text information translation device for multiple natural languages" is a device that has the function of automatically converting input text information into different natural languages selected by the user.

[0518] A "device that displays translated text information on a user interface or presents it audibly" is a device that has the function of visually displaying the translated result or outputting it audibly.

[0519] A "processing device that analyzes travelers' historical information and generates personalized travel plans" is a device that analyzes travelers' past behavioral data and preferences and creates individual travel plans accordingly.

[0520] An "algorithm for detecting anomalies" is a computational processing method that analyzes data acquired in real time to detect travel-related problems and malfunctions at an early stage.

[0521] A "processing device that presents alternative plans" is a device that has the function of planning and presenting alternative actions or means to the user in response to detected anomalies.

[0522] A "processing device that generates the optimal response from input information using a generation algorithm" is a device equipped with an algorithm that analyzes information received from a user and automatically creates an appropriate and effective response.

[0523] This invention provides a system that enables travelers to communicate smoothly across language barriers, create personalized travel plans, and respond quickly in emergencies.

[0524] The system is primarily composed of three elements: servers, terminals, and users.

[0525] The terminal is a mobile information terminal such as a smartphone or tablet, and the user interacts with the system through this device. The terminal is equipped with speech recognition software that converts speech information into text information; for example, a general-purpose speech recognition API can be used for this conversion. After translation on the server, the translated text information is displayed on the terminal's screen or output as speech using speech synthesis technology.

[0526] The server handles complex data processing and includes a translation device for translating text information into multiple natural languages. Cloud translation services are used here to expedite the translation process. Furthermore, the server includes a generative AI model to analyze the user's past history and generate personalized travel plans. This ensures that the user receives an optimal travel plan tailored to their preferences. In addition, the server incorporates algorithms to detect travel-related anomalies, enabling early detection and the suggestion of countermeasures.

[0527] When a user speaks into the device, it generates a prompt, such as "I want to plan an art tour in Spain. What are some recommended museums and events?", and sends it to the server. The server uses this information to create an optimal travel plan and returns it to the device. As a result, the user can plan their trip and make necessary reservations based on that information.
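
A minimal sketch of how the terminal might assemble such a prompt and package it for the server is shown below. The JSON body format, the `history` field, and the function names are hypothetical; the text only states that a prompt is generated and sent.

```python
import json


def article(word: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough rule, sufficient for this sketch)."""
    return "an" if word and word[0].lower() in "aeiou" else "a"


def build_plan_request(destination: str, theme: str, history: list) -> str:
    """Return the JSON body the terminal would send to the server (assumed format)."""
    prompt = (
        f"I want to plan {article(theme)} {theme} tour in {destination}. "
        "What are some recommended museums and events?"
    )
    # ensure_ascii=False keeps non-ASCII place names readable in the payload.
    return json.dumps({"prompt": prompt, "history": history}, ensure_ascii=False)
```

Calling `build_plan_request("Spain", "art", [])` reproduces the example prompt quoted in the paragraph above.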

[0528] These systems not only solve communication problems in countries with diverse cultural backgrounds, but also provide support tailored to the individual needs of each traveler.

[0529] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0530] Step 1:

[0531] The user launches the application using a device such as a smartphone. The device displays the app's home screen and prompts the user to select a menu item such as language translation, travel planning, or emergency response. The device then displays the screen corresponding to the user's selection.

[0532] Step 2:

[0533] The user selects the language translation function and performs voice input. The device receives the voice data as input and converts it into text information using its internal speech recognition software. The output of this step is the text data converted from the voice data, which the device sends to the server.

[0534] Step 3:

[0535] The server sends the received text information to a multilingual translation service, where it is translated into the specified language. The translation service is accessed through a cloud-based API. The translated text information is output by the service and returned from the server to the terminal.

[0536] Step 4:

[0537] The device receives the translated text information and presents it to the user, either by displaying the text on the screen or by playing it back as audio using speech synthesis technology. The user checks the translation result and continues the conversation.

[0538] Step 5:

[0539] The user selects the travel planning function and enters the places they want to visit and the activities they are interested in into the device. The device sends this information to the server. The entered information includes the user's travel destination and hobbies.

[0540] Step 6:

[0541] The server generates appropriate travel plans using a generative AI model based on the input information. This involves analyzing the user's past history and current preferences to output a personalized plan.

[0542] Step 7:

[0543] The server sends the generated travel plan to the terminal and proposes it to the user. The terminal displays this plan as a list on the screen, allowing the user to select and decide. This helps the user actually put their travel plan into action.

[0544] Step 8:

[0545] If the emergency response function is selected, the server monitors travel-related information, including weather and transportation delays, in real time. The server uses an anomaly detection algorithm to identify anomalies in this information and immediately notifies the user when one is detected. The notification includes suggestions for alternative routes and accommodations.
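
The menu selection in Step 1 and the routing to the functions used in the later steps can be sketched as a dispatch table. The handler names and their string return values are placeholders invented for this illustration; real handlers would perform the processing described in Steps 2 to 8.

```python
from typing import Callable, Dict


def handle_translation(payload: str) -> str:
    return f"translate:{payload}"  # placeholder for Steps 2-4


def handle_travel_planning(payload: str) -> str:
    return f"plan:{payload}"  # placeholder for Steps 5-7


def handle_emergency(payload: str) -> str:
    return f"monitor:{payload}"  # placeholder for Step 8


# Menu item -> handler, mirroring the three options on the home screen.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "language translation": handle_translation,
    "travel planning": handle_travel_planning,
    "emergency response": handle_emergency,
}


def dispatch(menu_item: str, payload: str) -> str:
    """Route the user's menu selection to the matching handler."""
    handler = HANDLERS.get(menu_item)
    if handler is None:
        raise ValueError(f"unknown menu item: {menu_item}")
    return handler(payload)
```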

[0546] (Application Example 1)

[0547] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0548] When travelers visit different cultural regions, language barriers and a lack of transportation information can make smooth travel and effective communication difficult. Furthermore, it can be challenging to respond appropriately in emergencies, often causing anxiety for travelers. Therefore, improving the quality of travel is essential.

[0549] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0550] In this invention, the server includes means for converting voice input to text in real time, means equipped with a translation device for translating into multiple languages, and means for displaying or outputting the translated information on a user interface. This enables smooth communication in different cultural contexts and facilitates travel within cities visited by tourists.

[0551] "A means of converting voice input to text in real time" refers to a technology that instantly converts voice data into text information, making it available for subsequent processing.

[0552] "Means equipped with a translation device" refers to a technical configuration that has the function of converting text from one language to another language.

[0553] "Means of displaying or outputting audio on a user interface" refers to technical methods for providing information to users visually or aurally.

[0554] "Means for analyzing travelers' historical information and generating personalized travel plans" refers to technology for creating optimal travel plans for each traveler based on past behavioral data.

[0555] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to analytical techniques for quickly providing countermeasures in response to unexpected situations.

[0556] "Means of acquiring public transport information within a city and translating and presenting it in the user's native language" refers to a method of collecting data on public transport in a city, converting it into a language understandable to the user, and providing it to the user.

[0557] The system for realizing this invention consists of a smartphone, a server, and network communication. Users can obtain various information through voice input using a dedicated application on their smartphone. This application uses speech recognition technology to convert voice data into text data in real time. Typically, this process uses speech recognition software on the smartphone (e.g., Google Speech-to-Text API).

[0558] The converted text data is sent to a server via the internet. The server is equipped with a translation device (e.g., Google Translate API) to translate the text into various languages, converts the text to the user's desired language, and sends it to the smartphone. This information is either displayed visually on the smartphone's user interface or provided as audio through a voice output device.

[0559] Furthermore, this system analyzes data, including the user's travel history, on a server. Using machine learning algorithms, it generates personalized travel plans and presents the most suitable options. It can also collect public transport data within cities in real time, translate it into the user's native language, and present it to them. This allows users to navigate smoothly within their destination cities.

[0560] Furthermore, the server uses anomaly detection algorithms to promptly detect various emergencies related to travelers and suggest alternative solutions as needed. This requires the acquisition and analysis of real-time data such as traffic and weather information, enabling travelers to respond flexibly.

[0561] For example, when a user uses public transport in a city they are visiting for the first time, scanning the QR code at a bus stop with their smartphone will display information on available transportation options and scheduled times in their native language. Furthermore, if their travel schedule changes, the system will suggest alternative tourist destinations and accommodations.
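
The bus-stop example might look like the following sketch. It assumes, purely for illustration, that the QR code encodes a stop identifier and that the server holds a timetable and per-language message templates; `TIMETABLE`, `TEMPLATES`, and the identifier format are hypothetical.

```python
# Hypothetical server-side data: departures per stop and message templates per language.
TIMETABLE = {"stop-001": [("Bus 12", "09:15"), ("Bus 7", "09:05")]}
TEMPLATES = {
    "en": "{line} departs at {time}",
    "ja": "{line} は {time} に出発します",
}


def transit_info(qr_payload: str, lang: str) -> list:
    """Look up departures for the scanned stop and render them in the user's language."""
    stop_id = qr_payload.strip()
    # Zero-padded HH:MM strings sort correctly as plain text.
    departures = sorted(TIMETABLE.get(stop_id, []), key=lambda d: d[1])
    template = TEMPLATES.get(lang, TEMPLATES["en"])  # fall back to English
    return [template.format(line=line, time=time) for line, time in departures]
```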

[0562] An example of a prompt message is, "Please suggest activities that the user would like to do while sightseeing in Tokyo. The user is interested in art."

[0563] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0564] Step 1:

[0565] The user launches the smartphone app and begins voice input. The app receives voice data as input and converts it into text data using speech recognition software on the smartphone. This data conversion generates text extracted from the voice.

[0566] Step 2:

[0567] The terminal sends the converted text data to the server via the internet. The server receives the text data as input and translates it into the specified language using its translation device. As part of this processing, a translation API is used to generate the translated text data.

[0568] Step 3:

[0569] The server sends the translated text back to the terminal. The terminal receives this translation and either displays it on the user interface or plays it back as audio through a speech output device. This allows the user to obtain information in their native language.

[0570] Step 4:

[0571] The user provides past travel history information, which is sent from the device to the server. The server, receiving the history data as input, uses a machine learning algorithm to generate a personalized travel plan and sends the result to the device. This allows the user to be offered a travel plan that is suitable for them.

[0572] Step 5:

[0573] The device scans a QR code, found within the city, that contains public transport information. It receives the QR code information as input and sends that data to a server. The server retrieves real-time traffic information, translates it into the specified language, and then sends it back to the device. This allows the user to obtain information that will help them travel smoothly at their destination.

[0574] Step 6:

[0575] The server uses an anomaly detection algorithm to detect travel-related emergencies early. It receives current travel status and weather information as input and detects anomalies through data calculations. As a result, it generates necessary alternatives and sends them to the terminal. This allows users to continue their travels smoothly even in emergencies.
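
The text does not name a specific anomaly detection algorithm. One common, simple choice is a z-score test against recent observations, sketched below as an assumption; the threshold of 3.0 and the use of delay minutes as the monitored quantity are illustrative.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` if it lies more than `z_threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # any deviation from a constant history is unusual
    return abs(current - mu) / sigma > z_threshold
```

For recent delays of 5, 7, 6, 8, and 4 minutes, a new delay of 45 minutes is flagged while a delay of 7 minutes is not.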

[0576] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0577] This invention provides a system incorporating an emotion engine to optimize the user experience during travel. In addition to real-time speech-to-text conversion and multilingual translation, the system recognizes the user's emotions and provides personalized suggestions based on them. Furthermore, in the event of an emergency, it provides support adapted to the user's emotional state.

[0578] First, the user launches the app on their device. The application includes translation, travel planning, and emergency response functions, which the user can freely choose from.

[0579] For example, in a language translation function with an integrated emotion engine, when a user communicates a message via voice input, the device converts this voice into text and sends it to the server. The server translates the text and also performs emotion analysis, adjusting the translation result to reflect the user's emotions. For instance, if the user is feeling anxious, the translation result might be rephrased to be more polite and reassuring.
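
A toy version of this emotion-aware adjustment is sketched below. The keyword lexicon is a crude stand-in for the emotion identification model 59, and the polite prefix is a hypothetical example of "rephrasing"; neither is taken from the disclosure.

```python
# Hypothetical lexicon standing in for a trained emotion model.
ANXIETY_WORDS = {"lost", "worried", "afraid", "help"}

# Hypothetical polite lead-ins per target language.
POLITE_PREFIX = {"en": "Excuse me, could you please help me? "}


def estimate_emotion(text: str) -> str:
    """Return 'anxious' if any lexicon word appears in the text, else 'neutral'."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "anxious" if words & ANXIETY_WORDS else "neutral"


def adjust_translation(translated: str, emotion: str, target_lang: str) -> str:
    """Prepend a polite lead-in when the user appears anxious; otherwise leave the text unchanged."""
    if emotion == "anxious":
        return POLITE_PREFIX.get(target_lang, "") + translated
    return translated
```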

[0580] Next, in the travel planning function, the device sends data to the server based on the user's input. The server uses machine learning algorithms to analyze the user's preferences and history. Furthermore, an emotion engine takes the user's emotional state into consideration, and an optimal travel plan tailored to those emotions is generated and sent back. For example, if the emotion of wanting to relax is detected, suggestions will focus on activities that will help the user refresh.

[0581] In the emergency response function, the server monitors real-time data and notifies the terminal if an anomaly is detected. At this time, the emotion engine evaluates the user's stress level and suggests countermeasures appropriate to the user's state. For example, if the user is extremely stressed, it will help them clearly explain the situation, along with providing instructions on how to quickly contact the support center.
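
One way to picture the stress-dependent response is shown below. The 0-to-1 stress scale, the 0.7 cutoff, and the response fields are assumptions for illustration only: a highly stressed user receives a single clear next step plus support-center contact guidance, while a calmer user sees the full list of options.

```python
HIGH_STRESS = 0.7  # assumed cutoff on a 0.0-1.0 stress scale


def emergency_response(event: str, stress_level: float, alternatives: list) -> dict:
    """Shape the emergency notification according to the estimated stress level."""
    if not 0.0 <= stress_level <= 1.0:
        raise ValueError("stress_level must be between 0.0 and 1.0")
    if stress_level >= HIGH_STRESS:
        # Keep it simple: one recommended step and a pointer to the support center.
        return {"summary": event, "steps": alternatives[:1], "support_contact": True}
    return {"summary": event, "steps": list(alternatives), "support_contact": False}
```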

[0582] In this way, by utilizing the emotion engine, users can enjoy a more comfortable and safer travel experience.

[0583] The following describes the processing flow.

[0584] Step 1:

[0585] The user launches an application with an integrated emotion engine on their smartphone. Options for translation, travel planning, and emergency response are displayed on the home screen.

[0586] Step 2:

[0587] The user selects the translation function and sets the language they want to translate into and their preference for sentiment analysis.

[0588] Step 3:

[0589] The user taps the microphone button and inputs the text they want to translate by voice.

[0590] Step 4:

[0591] The device records the audio as data and converts it into text using a speech recognition engine.

[0592] Step 5:

[0593] The terminal sends the converted text data to the server.

[0594] Step 6:

[0595] The server uses a translation device to translate text into multiple specified languages, while simultaneously analyzing the user's emotions through an emotion engine.

[0596] Step 7:

[0597] Based on the analysis results and translated text, the server adjusts the translation results according to the user's sentiment and corrects them to more appropriate expressions.

[0598] Step 8:

[0599] The server sends the edited text to the terminal.

[0600] Step 9:

[0601] The terminal displays the received translation results on the user interface or outputs them as audio to the user.

[0602] Step 10:

[0603] The user selects the travel planning function and enters their destination and interests into the device.

[0604] Step 11:

[0605] The terminal sends user information and input data to the server and requests analysis in conjunction with the emotion engine.

[0606] Step 12:

[0607] Based on the received data, the server uses machine learning to generate travel plans that are based on history and emotional state.

[0608] Step 13:

[0609] The server adjusts the generated travel plan and its display order based on the user's emotions and sends the adjusted plan to the terminal.

[0610] Step 14:

[0611] In the event of an emergency, the server analyzes real-time data to detect anomalies.

[0612] Step 15:

[0613] The server considers the user's emotions, generates the optimal response through an emotion engine, and sends it to the terminal.

[0614] Step 16:

[0615] The device notifies the user of the generated information and provides emotionally responsive support and suggestions for the next course of action.

[0616] (Example 2)

[0617] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0618] There is a need to eliminate the inconveniences travelers face due to language barriers and sudden schedule changes, and to provide a more comfortable and personalized travel experience. Furthermore, when responding to emergencies, flexible responses that take into account the user's psychological state are crucial. However, current technology has not been able to comprehensively address these challenges.

[0619] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0620] In this invention, the server includes means for converting voice signals into digital text information, means for providing a device for converting the digital text information into various languages, and means for analyzing the user's psychological state and making emotion-based adjustments to the generated plan and converted information. This provides users with a sense of security that transcends language barriers, and enables a personalized travel experience and flexible emergency response that takes into account the user's psychological state.

[0621] An "audio signal" is a signal obtained by converting sound into an electrical signal, making it usable for digital processing and communication.

[0622] "Digital text information" refers to information such as audio that has been converted into text, and is in a format that can be processed and displayed by a computer.

[0623] A "conversion device" is a device or software used to convert data in one format to data in another format.

[0624] A "display device" is a device used to visually display information in digital format, and includes screens and monitors.

[0625] "Users" refer to individuals who utilize this system and wish to have a comfortable travel experience.

[0626] "History information" refers to recorded data about a user's past actions and choices, and is used to provide personalized services.

[0627] An "anomaly detection algorithm" is a mathematical or computational method for detecting phenomena that deviate from the normal state.

[0628] "Psychological state" refers to the user's emotions and mental condition, and is a factor that influences the services provided by the system.

[0629] "Flexible response" refers to adaptive actions and measures that can be changed according to the user's condition and circumstances.

[0630] This invention is a system that reduces language barriers for travelers and provides an individually optimized travel experience. The system operates through an application installed on the user's device during travel. Specific embodiments of the system are described below.

[0631] First, the user launches the application on a mobile device or tablet. This application includes key functions such as voice input, language conversion, travel planning, and emergency response.

[0632] The device utilizes speech recognition software (such as the Google Speech-to-Text API) to convert audio signals into digital text. This allows for real-time conversion of user speech into text. The converted text is then sent from the device to the server.

[0633] The server utilizes a language conversion device (e.g., DeepL API) to convert textual information into a defined language. It also employs sentiment analysis software (e.g., Google Cloud Natural Language API) to recognize and analyze user emotions. Based on the results of this sentiment analysis, the server adjusts the converted language information to match the user's emotional state.

[0634] In the travel planning function, one of the key features, the server analyzes data using a machine learning algorithm that has learned the user's preferences based on past travel history, and generates a personalized travel plan. This plan takes into account the user's emotional state; for example, if the user wants to relax, it will suggest activities such as visits to relaxation facilities.

[0635] In emergency situations, the server monitors various types of information in real time. When an abnormal situation is detected, it quickly sends a notification to the terminal, assesses the user's mental state, and then suggests appropriate countermeasures.

[0636] A concrete example of this prompt is: "Explain the best translated response for when a user feels anxious while traveling, and how that response improves the user's feelings."

[0637] This enables a system that combines language translation, personalized travel planning, and consideration of psychological state to provide a more comfortable and fulfilling travel experience. In this embodiment, the system overcomes language barriers in real time, reduces traveler stress, and enables safer travel.

[0638] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0639] Step 1:

[0640] The user launches the application on the device. The device provides interfaces for voice input, language conversion, travel planning, and emergency response. The user selects the necessary function. Based on this input, the application proceeds to the next processing step.

[0641] Step 2:

[0642] If the user selects voice input, the device uses its microphone to collect the user's voice. This voice signal is treated as input and converted into text information using speech recognition software. Specifically, the Google Speech-to-Text API is used to convert the voice signal into text data. This converted text data is then generated as output and sent to the server.

[0643] Step 3:

[0644] The server takes text data received from the terminal as input and converts it to the specified language using the DeepL API. The converted text is then used with the Google Cloud Natural Language API to perform sentiment analysis. Based on the analysis results, the text is adjusted to reflect emotions, and the final translation is generated as output.

[0645] Step 4:

[0646] The terminal receives the final translation result sent from the server. This translation result is either displayed to the user visually or output via speech synthesis. The user can then use this translation to communicate.

[0647] Step 5:

[0648] If a user selects the travel planning function, the device collects travel-related preferences and history from the user as input. This data is sent to a server for analysis using machine learning algorithms. Based on the analysis results, a personalized travel plan tailored to the user's preferences and emotions is generated and provided as output.

[0649] Step 6:

[0650] The server monitors for emergencies during travel using real-time data transmitted from the terminal. It utilizes an anomaly detection algorithm, receiving anomalies as input and generating appropriate response suggestions as output. These suggestions take into account the user's current psychological state.

[0651] Step 7:

[0652] The terminal receives the emergency notification and the suggested responses from the server and displays them to the user. This allows the user to act on the suggested response quickly. Throughout this entire process, the system can respond to user prompts through a generative AI model.

[0653] (Application Example 2)

[0654] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0655] Current travel support systems often fail to provide a sense of security or satisfaction because they deliver information without considering the user's emotional state. Furthermore, language barriers and emergency situations can lead to inappropriate information being provided, potentially causing stress. There is a need to address these issues and provide a more personalized travel experience.

[0656] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0657] In this invention, the server includes means for converting voice input into text in real time, means for recognizing emotional states and adjusting information based on those emotions, and means for providing emotionally tailored information in public transportation and tourist spots. This makes it possible to provide detailed information that responds to the user's emotions.

[0658] "A means of converting voice input to text in real time" refers to a technology that has the function of instantly converting the voice spoken by the user into text data.

[0659] "Means equipped with a translation device for translating into multiple languages" refers to a technology that has the ability to instantly translate text data into different specified languages.

[0660] "Means of displaying or outputting audio on a user interface" refers to technologies that have the function of conveying translated text to the user via a screen or speaker.

[0661] "Methods for analyzing travelers' historical information and generating personalized travel plans" refers to technologies that analyze data on travelers' past behavior and preferences and create customized travel plans based on that data.

[0662] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to a technology that quickly identifies potential risks and anomalies during travel and proposes appropriate countermeasures.

[0663] "Means of recognizing emotional states and adjusting information based on those emotions" refers to technologies that analyze the user's emotions and optimize the content and expression of information based on the results.

[0664] "Means of providing information tailored to emotions in public transportation and tourist attractions" refers to technologies that provide information related to public transportation and tourist attractions in a way that is appropriate to the user's emotions.

[0665] This invention is a system that personalizes the user's travel experience, making it safer and more comfortable, through an application running on a smart device. The main elements for realizing this system and their operation are described below.

[0666] First, the user launches an application installed on their smart device. This application has a function to convert speech to text in real time, and uses the Google Cloud Speech-to-Text API to enable fast and accurate conversion. The converted text data is then translated into multiple languages by the Google Cloud Translation API. At the same time, IBM Watson Tone Analyzer is used to analyze the user's emotional state from the speech content. The results of this emotion analysis are used to adjust the translation results and optimize the information provided.

[0667] A serverless architecture using AWS Lambda is employed to generate appropriate travel plans and information based on users' past travel data and current emotional states. This enables the personalization of travel plans and allows for personalized suggestions using user history information recorded in Amazon DynamoDB. Information on public transportation and tourist attractions is provided to users based on real-time data, and is particularly customized based on emotional responses.

[0668] For example, if a user feels anxious while riding the subway, the app recognizes that emotion and immediately displays a follow-up message such as, "A recommended tourist spot where you can relax is XX. Enjoy yourself." An example of a prompt to the generative AI model in this case would be, "Based on the user's current emotional state, please provide reassuring information."
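One way such a prompt could be assembled from the recognized emotion is sketched below. Only the base instruction is taken from the text; the `UserContext` fields and the line layout of the prompt are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    emotion: str   # e.g. "anxious", as estimated by the emotion analysis
    location: str  # e.g. "subway"
    language: str  # language in which the reply should be written


# Instruction text quoted from the example prompt in the description.
BASE_INSTRUCTION = (
    "Based on the user's current emotional state, "
    "please provide reassuring information."
)


def build_prompt(ctx: UserContext) -> str:
    """Combine the fixed instruction with the user's current context."""
    lines = [
        BASE_INSTRUCTION,
        f"Emotional state: {ctx.emotion}",
        f"Current location: {ctx.location}",
        f"Reply language: {ctx.language}",
    ]
    return "\n".join(lines)
```

Keeping the instruction fixed and appending the context as labeled lines makes it easy to see which parts of the prompt vary from one request to the next.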

[0669] In this way, this system combines speech recognition, translation, and sentiment analysis to provide comprehensive support for users' travel experiences.

[0670] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0671] Step 1:

[0672] When a user launches the application on their smart device, it is ready to begin voice input. Once the user's voice data is input, the device calls the Google Cloud Speech-to-Text API to convert the voice data into text. The converted text is then output to the device.

[0673] Step 2:

[0674] The text data obtained from the device is sent to the server. The server uses the Google Cloud Translation API to translate the text data into the specified languages. The translated results are sent back from the server to the device, and the device displays the translated text to the user.

[0675] Step 3:

[0676] Simultaneously, the server uses IBM Watson Tone Analyzer to analyze the user's emotional state from the text data. The results of the emotional analysis are stored on the server and become input data for the next information delivery. Based on the analysis results, the translated information is adjusted to be more user-friendly.

[0677] Step 4:

[0678] The server retrieves user history information from Amazon DynamoDB and performs analysis using AWS Lambda. This generates personalized travel plans based on the retrieved emotional states and past history data. Machine learning algorithms are used in this planning process, and the generated plans take into account the user's browsing history and emotional states.

[0679] Step 5:

[0680] The terminal receives a personalized travel plan sent back from the server and provides the user with information on relaxing tourist spots and other relevant details. A specific suggestion might be displayed, such as, "We recommend [○○] as a relaxing tourist destination. Enjoy your visit!" The generative AI model utilizes prompts to provide such personalized suggestions.

[0681] Step 6:

[0682] When a user requests new information or their emotional state changes, the application restarts the cycle and takes appropriate action based on the updated data. Each step is seamlessly repeated to enable real-time responses.
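The restart condition in Step 6 can be expressed as a small piece of session state. The sketch below is illustrative only: the `Session` class and its fields are hypothetical, and the actual work of Steps 1 to 5 is represented by a counter rather than real processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    last_emotion: Optional[str] = None
    runs: int = 0  # number of times the Step 1-5 cycle has been started

    def should_rerun(self, emotion: str, new_request: bool) -> bool:
        """The cycle restarts on a new request or a change in emotional state."""
        return new_request or emotion != self.last_emotion

    def step(self, emotion: str, new_request: bool = False) -> bool:
        """Record the latest observation; return True if the cycle was restarted."""
        if self.should_rerun(emotion, new_request):
            self.last_emotion = emotion
            self.runs += 1  # placeholder for re-running Steps 1 to 5
            return True
        return False
```

For instance, two consecutive "calm" observations trigger the cycle only once, while a later "anxious" observation triggers it again.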

[0683] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0684] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
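The input/output contract described for the data generation model 58 (a prompt plus inference data in, audio or text out) can be written down as a small interface. The sketch below is an assumption for illustration: the type names, the `infer` method, and the `EchoModel` stub are hypothetical and do not correspond to the API of any particular generative AI service.

```python
from dataclasses import dataclass
from typing import Literal, Protocol, Union


@dataclass
class InferenceData:
    kind: Literal["audio", "text", "image"]
    payload: Union[bytes, str]


@dataclass
class InferenceResult:
    kind: Literal["audio", "text"]
    payload: Union[bytes, str]


class DataGenerationModel(Protocol):
    """Anything that maps (prompt, inference data) to an inference result."""

    def infer(self, prompt: str, data: InferenceData) -> InferenceResult: ...


class EchoModel:
    """Toy model used only to exercise the interface; it performs no real inference."""

    def infer(self, prompt: str, data: InferenceData) -> InferenceResult:
        if data.kind != "text":
            raise ValueError("EchoModel only handles text in this sketch")
        return InferenceResult("text", f"{prompt}: {data.payload}")
```

A real model client would implement the same `infer` method, so the specific processing could be written against the interface rather than a particular vendor.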

[0685] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0686] [Fourth Embodiment]

[0687] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0688] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0689] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0690] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0691] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0692] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0693] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0694] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0695] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0696] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0697] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0698] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0699] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0700] This invention provides a system that enables travelers to communicate across language barriers, create individually optimized travel plans, and respond quickly to emergencies. Specific embodiments are described below.

[0701] First, the user launches an application installed on their smartphone. The device displays the home screen, and the user can choose from functions such as language translation, travel planning, or emergency response.

[0702] For example, if the multilingual translation function is selected, the user provides voice input. The device captures this voice data and converts it into text data using speech recognition technology. This text data is then sent to a server and translated into the user's chosen language using a multilingual translation device. The translated text is returned to the device and displayed on the screen or output as audio to the user. For example, when a Japanese user orders at a German restaurant, they can speak into the app and have it translated into English or German.

[0703] Next, in the personalized travel planning feature, the user enters their travel destination and activities of interest into the device. The device sends this information to a server, where a machine learning algorithm analyzes the user's past history and preferences. As a result, the server generates an optimal travel plan and sends it back to the device, suggesting it to the user. For example, if the user is interested in art in France, local museums and art festivals will be suggested.

[0704] Finally, in the emergency response function, the server continuously monitors flight status and weather, and promptly notifies the user if an anomaly is detected. The terminal receives this notification and provides the user with detailed alternatives, including alternative routes and accommodations. For example, if a flight is delayed, the terminal can immediately suggest train or bus arrangements, allowing the user to make a choice for quick and effective travel.
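The flight-delay example above can be sketched as a threshold check followed by a ranking of alternatives. This is a simplified illustration under stated assumptions: the 60-minute threshold, the data classes, and the ranking by total time to arrival are hypothetical and are not specified in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlightStatus:
    flight_no: str
    delay_minutes: int
    cancelled: bool = False


@dataclass
class Alternative:
    mode: str                 # e.g. "train" or "bus"
    departs_in_minutes: int
    duration_minutes: int


DELAY_THRESHOLD_MIN = 60  # assumed threshold for treating a delay as an anomaly


def detect_and_suggest(
    status: FlightStatus, options: List[Alternative]
) -> Optional[dict]:
    """Return a notification dict if the flight is disrupted, otherwise None."""
    if not status.cancelled and status.delay_minutes < DELAY_THRESHOLD_MIN:
        return None
    # Rank alternatives by total time until arrival (wait plus travel time).
    ranked = sorted(options, key=lambda o: o.departs_in_minutes + o.duration_minutes)
    reason = "cancelled" if status.cancelled else f"delayed {status.delay_minutes} min"
    return {
        "message": f"Flight {status.flight_no} {reason}",
        "alternatives": ranked,
    }
```

Under these assumptions, a 90-minute delay produces a notification whose first alternative is the option with the shortest combined waiting and travel time.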

[0705] As described above, this embodiment provides comprehensive support to enable travelers to enjoy their trips with peace of mind.

[0706] The following describes the processing flow.

[0707] Step 1:

[0708] The user launches the application on their smartphone. The home screen appears, displaying options for language translation, travel planning, and emergency response.

[0709] Step 2:

[0710] The user selects the language translation function and presses the microphone button to input the text they want to translate by voice.

[0711] Step 3:

[0712] The device records the audio data, activates the speech recognition engine to convert the audio into text, and prepares to send this converted text to the server.

[0713] Step 4:

[0714] The server inputs the received text into a multilingual translation API and translates it into the specified target language in real time.

[0715] Step 5:

[0716] The server sends the translated text back to the terminal. The receiving terminal displays the result on its user interface and provides audio output as needed.

[0717] Step 6:

[0718] The user selects the travel planning function, enters their destination and activities of interest, and specifies detailed conditions.

[0719] Step 7:

[0720] The terminal sends the entered travel information to the server. The server refers to the accumulated historical data and the user's profile, and analyzes the data using machine learning algorithms.

[0721] Step 8:

[0722] Based on the analysis results, the server automatically generates an optimal travel plan tailored to the user and sends the result to the terminal.

[0723] Step 9:

[0724] The terminal displays the generated travel plan to the user and provides an interface for viewing the plan details.

[0725] Step 10:

[0726] The server monitors external information sources and collects real-time data on flights and weather.

[0727] Step 11:

[0728] The server applies the anomaly detection algorithm to the collected data and, if an anomaly is detected, generates an emergency notification.

[0729] Step 12:

[0730] The server sends the notification, together with alternative solutions, to the device. The device then notifies the user of this information and presents the appropriate course of action.

[0731] Step 13:

[0732] When a problem arises, users evaluate alternatives and decide on an action based on the available options.
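The terminal-to-server exchange in the steps above could be carried as JSON messages routed by function name. The sketch below is only one possible shape: the message fields (`function`, `payload`, `ok`, `result`) and the placeholder handlers are hypothetical, and the handlers return canned values instead of calling a real translation or planning service.

```python
import json


def handle_translate(payload: dict) -> dict:
    # Placeholder: a real server would call its translation device here.
    return {"translated_text": f"[{payload['target_lang']}] {payload['text']}"}


def handle_plan(payload: dict) -> dict:
    # Placeholder: a real server would analyze history and preferences here.
    destination = payload["destination"]
    return {"plan": [f"{interest} in {destination}" for interest in payload["interests"]]}


def handle_emergency(payload: dict) -> dict:
    # Placeholder: a real server would attach alternatives to the alert.
    return {"alert": payload.get("event", "none")}


HANDLERS = {
    "translate": handle_translate,
    "plan": handle_plan,
    "emergency": handle_emergency,
}


def handle_request(raw: str) -> str:
    """Decode a JSON request from the terminal and route it by function name."""
    request = json.loads(raw)
    handler = HANDLERS.get(request.get("function"))
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown function"})
    return json.dumps({"ok": True, "result": handler(request.get("payload", {}))})
```

A dispatch table of this kind mirrors the three menu options on the home screen, so a further function can be added by registering another handler.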

[0733] (Example 1)

[0734] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0735] When travelers visit countries with different natural languages, they may encounter difficulties in smooth communication and in planning their trips and responding flexibly to emergencies. Traditional technologies have lacked the means to comprehensively and efficiently address these challenges. In particular, there is a need for the generation of individually optimized travel plans and rapid responses to emergencies.

[0736] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0737] In this invention, the server includes a device that converts voice information into text information, a conversion device that translates text information into multiple natural languages, and a processing device that analyzes the traveler's history information and generates a personalized travel plan. This enables smooth communication across language barriers, the provision of optimal travel plans tailored to the user's needs, and a quick and flexible response to emergencies.

[0738] A "device that converts audio information into text information" is a device that analyzes information input via voice and converts its content into digital text information.

[0739] A "text information translation device for multiple natural languages" is a device that has the function of automatically converting input text information into different natural languages selected by the user.

[0740] A "device that displays translated text information on a user interface or presents it audibly" is a device that has the function of visually displaying the translated result or outputting it audibly.

[0741] A "processing device that analyzes travelers' historical information and generates personalized travel plans" is a device that analyzes travelers' past behavioral data and preferences and creates individual travel plans accordingly.

[0742] An "algorithm for detecting anomalies" is a computational processing method that analyzes data acquired in real time to detect travel-related problems and malfunctions at an early stage.

[0743] A "processing device that presents alternative plans" is a device that has the function of planning and presenting alternative actions or means to the user in response to detected anomalies.

[0744] A "processing device that generates the optimal response from input information using a generation algorithm" is a device equipped with an algorithm that analyzes information received from a user and automatically creates an appropriate and effective response.

[0745] This invention provides a system that enables travelers to communicate smoothly across language barriers, create personalized travel plans, and respond quickly in emergencies.

[0746] The system is primarily composed of three elements: servers, terminals, and users.

[0747] The terminal is a mobile information terminal such as a smartphone or tablet, and the user interacts with the system through this device. The terminal is equipped with speech recognition software that converts speech information into text information; specifically, it is possible to convert speech to text using a general speech recognition API, for example. The translated text information is then displayed on the screen or output as speech using speech synthesis technology.

[0748] The server handles complex data processing and includes a translation device for translating text information into multiple natural languages. Cloud translation services are used here to expedite the translation process. Furthermore, the server includes a generative AI model to analyze the user's past history and generate personalized travel plans. This ensures that the user receives an optimal travel plan tailored to their preferences. In addition, the server incorporates algorithms to detect travel-related anomalies, enabling early detection and the suggestion of countermeasures.

[0749] When a user speaks into the device, it generates a prompt, such as "I want to plan an art tour in Spain. What are some recommended museums and events?", and sends it to the server. The server uses this information to create an optimal travel plan and returns it to the device. As a result, the user can plan their trip and make necessary reservations based on that information.

[0750] This system not only solves communication problems in countries with diverse cultural backgrounds, but also provides support tailored to the individual needs of each traveler.

[0751] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0752] Step 1:

[0753] The user launches the application using a device such as a smartphone. The device displays the app's home screen and prompts the user to select a menu item such as language translation, travel planning, or emergency response. The screen displayed next corresponds to the user's selection.

[0754] Step 2:

[0755] The user selects the language translation function and performs voice input. The device receives the voice data as input and converts it into text information using its internal speech recognition software. The resulting text data is the output of this step and is sent from the device to the server.

[0756] Step 3:

[0757] The server sends the received text information to a multilingual translation service, where it is translated into the specified language. The translation service used utilizes a cloud-based API. The translated text information is then output and returned from the server to the terminal.

[0758] Step 4:

[0759] The device receives the translated text information and presents it to the user. Presentation methods include displaying the text on the screen or playing it back as audio using speech synthesis technology. The user confirms it and continues communication. The translation result is then output to the user.

[0760] Step 5:

[0761] The user selects the travel planning function and enters the places they want to visit and the activities they are interested in into the device. The device sends this information to the server. The entered information includes the user's travel destination and hobbies.

[0762] Step 6:

[0763] The server generates appropriate travel plans using a generative AI model based on the input information. This involves analyzing the user's past history and current preferences to output a personalized plan.

[0764] Step 7:

[0765] The server sends the generated travel plan to the terminal and proposes it to the user. The terminal displays this plan as a list on the screen, allowing the user to select and decide. This helps the user actually put their travel plan into action.

[0766] Step 8:

[0767] If the emergency response function is selected, the server monitors travel-related information, such as weather and transportation delays, in real time. The server applies an anomaly detection algorithm to this information and immediately notifies the user if an anomaly is detected. The notification includes suggestions for alternative routes and accommodations.
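The description does not specify which anomaly detection algorithm is used. As one possible illustration, the sketch below flags a reading (for example, a transport delay in minutes) that deviates strongly from a rolling window of recent readings, using a z-score. The window size, the threshold, and the class name are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag values that deviate strongly from a window of recent normal values."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0, min_samples: int = 5):
        self.values = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def update(self, value: float) -> bool:
        """Return True if value is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.z_threshold
            else:
                # A perfectly flat baseline: any different value stands out.
                is_anomaly = value != mu
        # Only normal readings extend the baseline, so that a prolonged
        # disruption is not absorbed into what counts as "normal".
        if not is_anomaly:
            self.values.append(value)
        return is_anomaly
```

Fed a stream of delays of a few minutes, the detector stays quiet; a sudden 45-minute delay is flagged, after which the server could generate the notification and alternatives described above.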

[0768] (Application Example 1)

[0769] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0770] When travelers visit different cultural regions, language barriers and a lack of transportation information can make smooth travel and effective communication difficult. Furthermore, it can be challenging to respond appropriately in emergencies, often causing anxiety for travelers. Therefore, improving the quality of travel is essential.

[0771] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0772] In this invention, the server includes means for converting voice input to text in real time, means equipped with a translation device for translating into multiple languages, and means for displaying or outputting the translated information on a user interface. This enables smooth communication in different cultural contexts and facilitates travel within cities visited by tourists.

[0773] "A means of converting voice input to text in real time" refers to a technology that instantly converts voice data into text information, making it available for subsequent processing.

[0774] "Means equipped with a translation device" refers to a technical configuration that has the function of converting text from one language to another language.

[0775] "Means of displaying or outputting audio on a user interface" refers to technical methods for providing information to users visually or aurally.

[0776] "Means for analyzing travelers' historical information and generating personalized travel plans" refers to technology for creating optimal travel plans for each traveler based on past behavioral data.

[0777] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to analytical techniques for quickly providing countermeasures in response to unexpected situations.

[0778] "Means of acquiring public transport information within a city and translating and presenting it in the user's native language" refers to a method of collecting data on public transport in a city, converting it into a language understandable to the user, and providing it to the user.

[0779] The system for realizing this invention consists of a smartphone, a server, and network communication. Users can obtain various information through voice input using a dedicated application on their smartphone. This application uses speech recognition technology to convert voice data into text data in real time. Typically, this process uses speech recognition software on the smartphone (e.g., Google Speech-to-Text API).

[0780] The converted text data is sent to a server via the internet. The server is equipped with a translation device (e.g., Google Translate API) to translate the text into various languages, converts the text to the user's desired language, and sends it to the smartphone. This information is either displayed visually on the smartphone's user interface or provided as audio through a voice output device.

[0781] Furthermore, this system analyzes data, including the user's travel history, on a server. Using machine learning algorithms, it generates personalized travel plans and presents the most suitable options. It can also collect public transport data within cities in real time, translate it into the user's native language, and present it to them. This allows users to navigate smoothly within their destination cities.

[0782] Furthermore, the server uses anomaly detection algorithms to promptly detect various emergencies related to travelers and suggest alternative solutions as needed. This requires the acquisition and analysis of real-time data such as traffic and weather information, enabling travelers to respond flexibly.

[0783] For example, when a user uses public transport in a city they are visiting for the first time, scanning the QR code at a bus stop with their smartphone will display information on available transportation options and scheduled times in their native language. Furthermore, if their travel schedule changes, the system will suggest alternative tourist destinations and accommodations.
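The QR code example can be sketched as three small steps: extract a stop identifier from the scanned text, look up departures, and render them in the user's language. Everything specific here is assumed for illustration: the QR payload format (a URL with an `id` query parameter), the `transit.example` host, the in-memory timetable, and the phrase table that stands in for the server-side translation device.

```python
from typing import List
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory timetable keyed by stop ID.
TIMETABLE = {
    "B12": [("Bus 5", "10:15"), ("Bus 9", "10:40")],
}

# Tiny phrase table standing in for the translation device on the server.
LABELS = {
    "en": "{line} departs at {time}",
    "ja": "{line} は {time} に出発します",
}


def stop_id_from_qr(qr_text: str) -> str:
    """Extract the stop ID from a QR payload of the assumed form ...?id=<stop>."""
    query = parse_qs(urlparse(qr_text).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("QR payload has no stop id")
    return ids[0]


def departures_for_qr(qr_text: str, lang: str) -> List[str]:
    """Return departure lines for the scanned stop, rendered in the given language."""
    template = LABELS.get(lang, LABELS["en"])  # fall back to English
    stop = stop_id_from_qr(qr_text)
    return [template.format(line=line, time=time) for line, time in TIMETABLE.get(stop, [])]
```

In a deployed system the timetable lookup would be replaced by a real-time data source and the phrase table by the translation device, but the three-step structure would stay the same.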

[0784] An example of a prompt message is, "Please suggest activities that the user would like to do while sightseeing in Tokyo. The user is interested in art."

[0785] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0786] Step 1:

[0787] The user launches the smartphone app and begins voice input. The app receives voice data as input and converts it into text data using speech recognition software on the smartphone. This data conversion generates text extracted from the voice.

[0788] Step 2:

[0789] The terminal sends the converted text data to the server via the internet. The server receives the text data as input and translates it into the specified language using its translation device. As part of the data processing, a translation API is used to generate organized text data.

[0790] Step 3:

[0791] The server sends the translated text back to the terminal. The terminal receives this translation and either displays it on the user interface or plays it back as audio through a speech output device. This allows the user to obtain information in their native language.

[0792] Step 4:

[0793] The user provides past travel history information, which is sent from the device to the server. The server, receiving the history data as input, uses a machine learning algorithm to generate a personalized travel plan and sends the result to the device. This allows the user to be offered a travel plan that is suitable for them.

[0794] Step 5:

[0795] The device scans a QR code containing public transport information obtained within the city. It receives the QR code information as input and sends that data to a server. The server retrieves real-time traffic information, translates it into the specified language, and then sends it back to the device. This allows the user to obtain information that will help them travel smoothly in their destination.

[0796] Step 6:

[0797] The server uses an anomaly detection algorithm to detect travel-related emergencies early. It receives current travel status and weather information as input and detects anomalies through data calculations. As a result, it generates necessary alternatives and sends them to the terminal. This allows users to continue their travels smoothly even in emergencies.
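The disclosure does not name a particular anomaly detection algorithm for Step 6. As one possible illustration, the sketch below flags a transport delay that lies far from recent observations using a simple z-score and, when it does, returns a list of alternatives. The function names, the threshold of 3.0, and the sample data are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Return True when `current` is more than `threshold` sample standard
    deviations away from the mean of `history` (a z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # any change from a constant history is unusual
    return abs(current - mu) / sigma > threshold


def propose_alternatives(delay_history, current_delay, alternatives):
    """Return the alternatives only when the current delay looks anomalous."""
    return list(alternatives) if is_anomalous(delay_history, current_delay) else []


# Hypothetical delays in minutes observed on one bus route.
delays = [2.0, 3.0, 1.0, 2.5, 3.5, 2.0]
print(propose_alternatives(delays, 45.0, ["Take the subway", "Visit a nearby museum first"]))
```

A z-score is only one of many possible detectors. A production system would likely combine several signals, such as weather and traffic, as the paragraph above describes.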

[0798] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0799] This invention provides a system incorporating an emotion engine to optimize the user experience during travel. In addition to real-time speech-to-text conversion and multilingual translation, the system recognizes the user's emotions and provides personalized suggestions based on them. Furthermore, in the event of an emergency, it provides support adapted to the user's emotional state.

[0800] First, the user launches the app on their device. The application includes translation, travel planning, and emergency response functions, which the user can freely choose from.

[0801] For example, in a language translation function with an integrated emotion engine, when a user communicates a message via voice input, the device converts this voice into text and sends it to the server. The server translates the text and also performs emotion analysis, adjusting the translation result to reflect the user's emotions. For instance, if the user is feeling anxious, the translation result might be rephrased to be more polite and reassuring.

[0802] Next, in the travel planning function, the device sends data to the server based on the user's input. The server uses machine learning algorithms to analyze the user's preferences and history. Furthermore, an emotion engine takes the user's emotional state into consideration, and an optimal travel plan tailored to those emotions is generated and sent back. For example, if the emotion of wanting to relax is detected, suggestions will focus on activities that will help the user refresh.

[0803] In the emergency response function, the server monitors real-time data and notifies the terminal if an anomaly is detected. At this time, the emotion engine evaluates the user's stress level and suggests countermeasures appropriate to the user's state. For example, if the user is extremely stressed, it will help them clearly explain the situation, along with providing instructions on how to quickly contact the support center.

[0804] In this way, by utilizing the emotion engine, users can enjoy a more comfortable and safer travel experience.
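The emotion-aware translation adjustment described above can be pictured with a small rule-based sketch. This is not the emotion engine of the disclosure: the keyword list, the function names, and the polite prefix are hypothetical stand-ins chosen only to show how an estimated emotion might alter a translated sentence.

```python
# Hypothetical cue words; a real emotion engine would use a trained model.
ANXIETY_CUES = ("worried", "afraid", "lost", "scared")


def estimate_emotion(text: str) -> str:
    """Crude stand-in for emotion analysis based on substring matching."""
    lowered = text.lower()
    return "anxious" if any(cue in lowered for cue in ANXIETY_CUES) else "neutral"


def adjust_translation(translated: str, emotion: str) -> str:
    """Make the translated sentence more polite when the user seems anxious."""
    if emotion == "anxious":
        return "Excuse me, sorry to trouble you. " + translated
    return translated


source = "I am lost and worried. Where is the station?"
emotion = estimate_emotion(source)
print(adjust_translation("Where is the station?", emotion))
```

Keeping emotion estimation and adjustment as separate functions means either part could be swapped for a model-based implementation without changing the other.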

[0805] The following describes the processing flow.

[0806] Step 1:

[0807] The user launches an application with an integrated emotion engine on their smartphone. Options for translation, travel planning, and emergency response are displayed on the home screen.

[0808] Step 2:

[0809] The user selects the translation function and sets the language they want to translate into and their preference for sentiment analysis.

[0810] Step 3:

[0811] The user taps the microphone button and inputs the text they want to translate by voice.

[0812] Step 4:

[0813] The device records the audio as data and converts it into text using a speech recognition engine.

[0814] Step 5:

[0815] The terminal sends the converted text data to the server.

[0816] Step 6:

[0817] The server uses a translation device to translate text into multiple specified languages, while simultaneously analyzing the user's emotions through an emotion engine.

[0818] Step 7:

[0819] Based on the analysis results and translated text, the server adjusts the translation results according to the user's sentiment and corrects them to more appropriate expressions.

[0820] Step 8:

[0821] The server sends the edited text to the terminal.

[0822] Step 9:

[0823] The terminal displays the received translation results on the user interface or outputs them as audio to the user.

[0824] Step 10:

[0825] The user selects the travel planning function and enters their destination and interests into the device.

[0826] Step 11:

[0827] The terminal sends user information and input data to the server and requests analysis in conjunction with the emotion engine.

[0828] Step 12:

[0829] Based on the received data, the server uses machine learning to generate travel plans that are based on history and emotional state.

[0830] Step 13:

[0831] The server sends the generated travel plan to the terminal and adjusts the display order and plan based on the user's emotions.

[0832] Step 14:

[0833] In the event of an emergency, the server analyzes real-time data to detect anomalies.

[0834] Step 15:

[0835] The server considers the user's emotions, generates the optimal response through an emotion engine, and sends it to the terminal.

[0836] Step 16:

[0837] The device notifies the user of the generated information and provides emotionally responsive support and suggestions for the next course of action.
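Steps 12 and 13 mention adjusting the display order of generated plans according to the user's emotions. The sketch below shows one simple way this could be done, by sorting plans on how many of their tags match the detected emotion. The `Plan` class, the tag vocabulary, and the emotion labels are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    tags: frozenset[str]


# Hypothetical mapping from a detected emotion to preferred activity tags.
EMOTION_TO_TAGS = {
    "wants_to_relax": {"spa", "park", "cafe"},
    "excited": {"festival", "hiking"},
}


def order_plans(plans: list[Plan], emotion: str) -> list[Plan]:
    """Put plans whose tags best match the emotion first.

    Python's sort is stable, so plans with equal scores keep their
    original relative order.
    """
    preferred = EMOTION_TO_TAGS.get(emotion, set())
    return sorted(plans, key=lambda p: len(p.tags & preferred), reverse=True)


plans = [
    Plan("Night market", frozenset({"food"})),
    Plan("Hot spring", frozenset({"spa"})),
    Plan("Garden walk", frozenset({"park", "cafe"})),
]
for plan in order_plans(plans, "wants_to_relax"):
    print(plan.name)
```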

[0838] (Example 2)

[0839] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0840] There is a need to eliminate the inconveniences travelers face due to language barriers and sudden schedule changes, and to provide a more comfortable and personalized travel experience. Furthermore, when responding to emergencies, flexible responses that take into account the user's psychological state are crucial. However, current technology has not been able to comprehensively address these challenges.

[0841] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0842] In this invention, the server includes means for converting voice signals into digital text information, means for providing a device for converting the digital text information into various languages, and means for analyzing the user's psychological state and making emotion-based adjustments to the generated plan and converted information. This provides users with a sense of security that transcends language barriers, and enables a personalized travel experience and flexible emergency response that takes into account the user's psychological state.

[0843] An "audio signal" is a signal obtained by converting sound into an electrical signal, making it usable for digital processing and communication.

[0844] "Digital text information" refers to information such as audio that has been converted into text, and is in a format that can be processed and displayed by a computer.

[0845] A "conversion device" is a device or software used to convert data in one format to data in another format.

[0846] A "display device" is a device used to visually display information in digital format, and includes screens and monitors.

[0847] "Users" refer to individuals who utilize this system and wish to have a comfortable travel experience.

[0848] "History information" refers to recorded data about a user's past actions and choices, and is used to provide personalized services.

[0849] An "anomaly detection algorithm" is a mathematical or computational method for detecting phenomena that deviate from the normal state.

[0850] "Psychological state" refers to the user's emotions and mental condition, and is a factor that influences the services provided by the system.

[0851] "Flexible response" refers to adaptive actions and measures that can be changed according to the user's condition and circumstances.

[0852] This invention is a system that reduces language barriers for travelers and provides an individually optimized travel experience. The system operates through an application installed on the user's device during travel. Specific embodiments of the system are described below.

[0853] First, the user launches the application on a mobile device or tablet. This application includes key functions such as voice input, language conversion, travel planning, and emergency response.

[0854] The device utilizes speech recognition software (such as the Google Speech-to-Text API) to convert audio signals into digital text. This allows for real-time conversion of user speech into text. The converted text is then sent from the device to the server.

[0855] The server utilizes a language conversion device (e.g., DeepL API) to convert textual information into a defined language. It also employs sentiment analysis software (e.g., Google Cloud Natural Language API) to recognize and analyze user emotions. Based on the results of this sentiment analysis, the server adjusts the converted language information to match the user's emotional state.

[0856] In the travel planning function, one of the key features, the server analyzes data using a machine learning algorithm that has learned the user's preferences based on past travel history, and generates a personalized travel plan. This plan takes into account the user's emotional state; for example, if the user requests relaxation, it will suggest activities such as relaxation facilities.

[0857] In emergency situations, the server monitors various types of information in real time. When an abnormal situation is detected, it quickly sends a notification to the terminal, assesses the user's mental state, and then suggests appropriate countermeasures.

[0858] A concrete example of this prompt is: "Explain the best translated response for when a user feels anxious while traveling, and how that response improves the user's feelings."

[0859] This enables a system that combines language translation, personalized travel planning, and consideration of psychological state to provide a more comfortable and fulfilling travel experience. In this embodiment, the system overcomes language barriers in real time, reduces traveler stress, and enables safer travel.

[0860] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0861] Step 1:

[0862] The user launches the application on the device. The device provides interfaces for voice input, language conversion, travel planning, and emergency response. The user selects the necessary function. Based on this input, the application proceeds to the next processing step.

[0863] Step 2:

[0864] If the user selects voice input, the device uses its microphone to collect the user's voice. This voice signal is treated as input and converted into text information using speech recognition software. Specifically, the Google Speech-to-Text API is used to convert the voice signal into text data. This converted text data is then generated as output and sent to the server.

[0865] Step 3:

[0866] The server takes text data received from the terminal as input and converts it to the specified language using the DeepL API. The converted text is then used with the Google Cloud Natural Language API to perform sentiment analysis. Based on the analysis results, the text is adjusted to reflect emotions, and the final translation is generated as output.

[0867] Step 4:

[0868] The terminal receives the final translation result sent from the server. This translation result is either displayed to the user visually or output via speech synthesis. The user can then use this translation to communicate.

[0869] Step 5:

[0870] If a user selects the travel planning function, the device collects travel-related preferences and history from the user as input. This data is sent to a server for analysis using machine learning algorithms. Based on the analysis results, a personalized travel plan tailored to the user's preferences and emotions is generated and provided as output.

[0871] Step 6:

[0872] The server monitors for emergencies during travel using real-time data transmitted from the terminal. It applies an anomaly detection algorithm to this data and, when an anomaly is detected, generates appropriate response suggestions as output. These suggestions take into account the user's current psychological state.

[0873] Step 7:

[0874] The terminal receives emergency notifications and appropriate response suggestions from the server and displays them to the user. This allows the user to quickly implement the response. Throughout this entire process, the system can respond to user prompts through a generative AI model.
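The flow above starts with the user selecting one of several functions, after which processing branches. A minimal sketch of such server-side branching is shown below. The handler names, the payload fields, and the placeholder outputs are hypothetical; real handlers would call translation, planning, and anomaly detection components.

```python
def handle_translation(payload: dict) -> dict:
    # Placeholder: tags the text with the target language instead of translating.
    return {"type": "translation", "text": f"[{payload['target_lang']}] {payload['text']}"}


def handle_planning(payload: dict) -> dict:
    # Placeholder: echoes the preferences in sorted order as a "plan".
    return {"type": "plan", "items": sorted(payload["preferences"])}


def handle_emergency(payload: dict) -> dict:
    return {"type": "alert", "message": f"Anomaly reported: {payload['event']}"}


HANDLERS = {
    "translation": handle_translation,
    "planning": handle_planning,
    "emergency": handle_emergency,
}


def dispatch(function_name: str, payload: dict) -> dict:
    """Route a request from the terminal to the handler for the selected function."""
    handler = HANDLERS.get(function_name)
    if handler is None:
        return {"type": "error", "message": f"Unknown function: {function_name}"}
    return handler(payload)


print(dispatch("translation", {"target_lang": "ja", "text": "Where is the station?"}))
```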

[0875] (Application Example 2)

[0876] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0877] Current travel support systems often fail to provide a sense of security or satisfaction because they deliver information without considering the user's emotional state. Furthermore, language barriers and emergency situations can lead to inappropriate information being provided, potentially causing stress. There is a need to address these issues and provide a more personalized travel experience.

[0878] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0879] In this invention, the server includes means for converting voice input into text in real time, means for recognizing emotional states and adjusting information based on those emotions, and means for providing emotionally tailored information in public transportation and tourist spots. This makes it possible to provide detailed information that responds to the user's emotions.

[0880] "A means of converting voice input to text in real time" refers to a technology that has the function of instantly converting the voice spoken by the user into text data.

[0881] "Means equipped with a translation device for translating into multiple languages" refers to a technology that has the ability to instantly translate text data into different specified languages.

[0882] "Means of displaying or outputting audio on a user interface" refers to technologies that have the function of conveying translated text to the user via a screen or speaker.

[0883] "Methods for analyzing travelers' historical information and generating personalized travel plans" refers to technologies that analyze data on travelers' past behavior and preferences and create customized travel plans based on that data.

[0884] "A means of detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions" refers to a technology that quickly identifies potential risks and anomalies during travel and proposes appropriate countermeasures.

[0885] "Means of recognizing emotional states and adjusting information based on those emotions" refers to technologies that analyze the user's emotions and optimize the content and expression of information based on the results.

[0886] "Means of providing information tailored to emotions in public transportation and tourist attractions" refers to technologies that provide information related to public transportation and tourist attractions in a way that is appropriate to the user's emotions.

[0887] This invention is a system that personalizes the user's travel experience, making it safer and more comfortable, through an application running on a smart device. The main elements for realizing this system and their operation are described below.

[0888] First, the user launches an application installed on their smart device. This application has a function to convert speech to text in real time, and uses the Google Cloud Speech-to-Text API to enable fast and accurate conversion. The converted text data is then translated into multiple languages by the Google Cloud Translation API. At the same time, IBM Watson Tone Analyzer is used to analyze the user's emotional state from the speech content. The results of this emotion analysis are used to adjust the translation results and optimize the information provided.

[0889] A serverless architecture using AWS Lambda is employed to generate appropriate travel plans and information based on users' past travel data and current emotional states. This enables the personalization of travel plans and allows for personalized suggestions using user history information recorded in Amazon DynamoDB. Information on public transportation and tourist attractions is provided to users based on real-time data, and is particularly customized based on emotional responses.
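To make the serverless arrangement concrete, the sketch below shows a function shaped like an AWS Lambda Python handler, which receives an `event` and a `context`. To keep the example self-contained, an in-memory dictionary stands in for the user history table in Amazon DynamoDB. The table contents, the event fields, and the filtering rule are assumptions, not details from the disclosure.

```python
import json

# In-memory stand-in for the user-history table (assumption: a real
# deployment would read this from Amazon DynamoDB instead).
HISTORY_TABLE = {
    "user-1": ["Kyoto temples", "Hakone hot spring", "Nara park"],
}

# Hypothetical keywords marking calming destinations.
CALMING_KEYWORDS = ("hot spring", "park")


def lambda_handler(event, context):
    """Return suggestions based on stored history and the reported emotion."""
    user_id = event["user_id"]
    emotion = event.get("emotion", "neutral")
    history = HISTORY_TABLE.get(user_id, [])
    if emotion == "anxious":
        # Favour calming places the user has enjoyed before.
        suggestions = [h for h in history if any(k in h for k in CALMING_KEYWORDS)]
    else:
        suggestions = list(history)
    return {"statusCode": 200, "body": json.dumps({"suggestions": suggestions})}


print(lambda_handler({"user_id": "user-1", "emotion": "anxious"}, None))
```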

[0890] For example, if a user feels anxious while riding the subway, the app recognizes that emotion and immediately displays a follow-up message such as, "A recommended tourist spot where you can relax is XX. Enjoy yourself." An example of a prompt to the generative AI model in this case would be, "Based on the user's current emotional state, please provide reassuring information."

[0891] In this way, this system combines speech recognition, translation, and sentiment analysis to provide comprehensive support for users' travel experiences.

[0892] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0893] Step 1:

[0894] When a user launches the application on their smart device, it is ready to begin voice input. Once the user's voice data is input, the device calls the Google Cloud Speech-to-Text API to convert the voice data into text. The converted text is then output to the device.

[0895] Step 2:

[0896] The text data obtained from the device is sent to the server. The server uses the Google Cloud Translation API to translate the text data into the specified languages. The translated results are sent back from the server to the device, and the device displays the translated text to the user.

[0897] Step 3:

[0898] Simultaneously, the server uses IBM Watson Tone Analyzer to analyze the user's emotional state from the text data. The results of the emotional analysis are stored on the server and become input data for the next information delivery. Based on the analysis results, the translated information is adjusted to be more user-friendly.

[0899] Step 4:

[0900] The server retrieves user history information from Amazon DynamoDB and performs analysis using AWS Lambda. This generates personalized travel plans based on the retrieved emotional states and past history data. Machine learning algorithms are used in this planning process, and the generated plans take into account the user's browsing history and emotional states.

[0901] Step 5:

[0902] The terminal receives a personalized travel plan sent back from the server and provides the user with information on relaxing tourist spots and other relevant details. A specific suggestion might be displayed, such as, "We recommend [○○] as a relaxing tourist destination. Enjoy your visit!" The generative AI model utilizes prompts to provide such personalized suggestions.

[0903] Step 6:

[0904] When a user requests new information or their emotional state changes, the application restarts the cycle and takes appropriate action based on the updated data. Each step is seamlessly repeated to enable real-time responses.
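Step 6 says the cycle restarts when the user requests new information or when their emotional state changes. One way to express that trigger condition is sketched below. The class name `SessionState` and its method are hypothetical.

```python
class SessionState:
    """Tracks the last observed emotion to decide when to rerun the cycle."""

    def __init__(self) -> None:
        self.last_emotion = None

    def should_refresh(self, emotion: str, new_request: bool = False) -> bool:
        """Return True when the user asked for something new or the
        detected emotion differs from the previous observation."""
        changed = emotion != self.last_emotion
        self.last_emotion = emotion
        return new_request or changed


session = SessionState()
print(session.should_refresh("calm"))      # first observation counts as a change
print(session.should_refresh("calm"))      # unchanged emotion, no new request
print(session.should_refresh("anxious"))   # emotion changed
```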

[0905] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0906] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0907] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0908] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0909] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0910] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0911] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.
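The layout of the emotion map (left versus right, upper versus lower, inner versus outer) can be illustrated with polar coordinates. The sketch below is an interpretation for illustration only. The angle convention (0 degrees at the 3 o'clock position, counterclockwise), the radius normalised to the range 0 to 1, and the 0.5 boundary between inner and outer layers are all assumptions not stated in the disclosure.

```python
import math


def describe_position(radius: float, angle_deg: float) -> dict:
    """Classify a point on a circular emotion map.

    Right half: situation side; left half: reaction side; on the vertical
    axis: both. Upper half: pleasure; lower half: displeasure.
    """
    x = radius * math.cos(math.radians(angle_deg))
    y = radius * math.sin(math.radians(angle_deg))
    eps = 1e-9  # tolerance for floating-point values that should be zero
    side = "situation" if x > eps else "reaction" if x < -eps else "both"
    valence = "pleasure" if y > eps else "displeasure" if y < -eps else "neutral"
    layer = "inner" if radius < 0.5 else "outer"
    return {"side": side, "valence": valence, "layer": layer}


print(describe_position(0.9, 0))    # 3 o'clock, near the edge
print(describe_position(0.3, 90))   # top of the map, near the centre
print(describe_position(0.8, 225))  # lower left
```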

[0912] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0913] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0914] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
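As an illustration of how emotion values might be used, the sketch below selects the emotion with the highest value and computes a simple penalty that grows when neighbouring emotions on the map receive dissimilar values. The disclosure does not state how the similarity of nearby emotions is enforced during training. The squared-difference penalty, the neighbour list, and the sample values are assumptions for this example.

```python
# Hypothetical pairs of emotions that sit close together on the map.
NEIGHBORS = [("reassured", "calm"), ("calm", "confident")]


def dominant_emotion(values: dict[str, float]) -> str:
    """Return the emotion whose value is largest."""
    return max(values, key=values.get)


def proximity_penalty(values: dict[str, float], neighbors=NEIGHBORS) -> float:
    """Sum of squared differences between values of neighbouring emotions.

    A smaller result means nearby emotions have more similar values.
    """
    return sum((values[a] - values[b]) ** 2 for a, b in neighbors)


# Hypothetical output of the emotion identification model for one input.
values = {"reassured": 0.8, "calm": 0.7, "confident": 0.75, "anxious": 0.1}
print(dominant_emotion(values))
print(proximity_penalty(values))
```

A term of this kind could, under these assumptions, be added to a training loss so that the network learns similar values for adjacent emotions.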

[0915] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0916] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0917] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0918] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0919] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0920] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0921] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0922] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0923] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0924] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0925] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0926] The following is further disclosed regarding the embodiments described above.

[0927] (Claim 1)

[0928] Means for converting voice input to text in real time,

[0929] Means comprising a translation device that translates the text into multiple languages,

[0930] Means for displaying the translated text on a user interface or outputting it as audio,

[0931] Means for analyzing travelers' historical data to generate personalized travel plans, and

[0932] Means for detecting travel-related emergencies early using an anomaly detection algorithm and proposing alternative solutions.

[0933] A system comprising the above means.

[0934] (Claim 2)

[0935] The system according to claim 1, comprising means for adjusting the output of the translated text based on the user's preferences.

[0936] (Claim 3)

[0937] The system according to claim 1, comprising means for enabling flexible responses to emergencies during travel using real-time data.
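
The anomaly-detection means of claim 1 and the real-time response of claim 3 can be pictured with a minimal sketch. The disclosure does not name a specific algorithm, so the z-score test below is only an assumed stand-in, and the names `is_anomalous`, `propose_alternatives`, the threshold of 3.0, and the delay figures are all hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def is_anomalous(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        # Not enough data to estimate spread; assume nothing is wrong.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values are identical, so any different value is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def propose_alternatives(history: Sequence[float], latest: float,
                         alternatives: Sequence[str]) -> List[str]:
    """Return the candidate alternatives only when the latest observation is anomalous."""
    if is_anomalous(history, latest):
        return list(alternatives)
    return []


# Hypothetical data: recent train delays in minutes, then a sudden long delay.
recent_delays = [2, 3, 4, 3, 2, 4, 3]
options = propose_alternatives(recent_delays, 45, ["Take bus 12", "Rebook on the 14:30 train"])
```

With these assumed figures, a 45-minute delay is flagged and both alternatives are returned, while a 4-minute delay is treated as normal and yields an empty list.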

[0938] "Example 1"

[0939] (Claim 1)

[0940] A device that converts audio information into text information,

[0941] A conversion device that translates the text information into multiple natural languages,

[0942] A device that displays the translated text information on a user interface or provides it as audio,

[0943] A processing device that analyzes a traveler's history information and generates an individualized travel plan,

[0944] A processing unit that uses an algorithm for detecting anomaly information to detect travel-related emergencies early and presents alternative plans, and

[0945] A processing unit that uses a generation algorithm to generate an optimal response from input information.

[0946] A system comprising the above devices and units.

[0947] (Claim 2)

[0948] The system according to claim 1, comprising a processing device that adjusts the presentation of translated text information based on user preferences.

[0949] (Claim 3)

[0950] The system according to claim 1, comprising a processing device that enables flexible responses to emergencies during travel using dynamic information.
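
The chain of devices in this example (audio to text, then text to multiple languages) can be sketched as interchangeable components. This is only an illustration under stated assumptions: `TranslationPipeline`, `fake_speech_to_text`, `fake_translate`, and the small phrasebook are hypothetical stand-ins, not the speech recognizer or translation device of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TranslationPipeline:
    # Converts raw audio into text (stand-in for the audio-to-text device).
    speech_to_text: Callable[[bytes], str]
    # Translates text into one target language (stand-in for the conversion device).
    translate: Callable[[str, str], str]

    def run(self, audio: bytes, targets: List[str]) -> Dict[str, str]:
        """Convert the audio once, then translate the text into each target language."""
        text = self.speech_to_text(audio)
        return {lang: self.translate(text, lang) for lang in targets}


def fake_speech_to_text(audio: bytes) -> str:
    # Assumption for the sketch: the "audio" is simply UTF-8 encoded text.
    return audio.decode("utf-8")


# Hypothetical phrasebook used instead of a real translation engine.
PHRASEBOOK = {
    ("where is the station?", "de"): "Wo ist der Bahnhof?",
    ("where is the station?", "nl"): "Waar is het station?",
}


def fake_translate(text: str, lang: str) -> str:
    # Fall back to the untranslated text when the phrase is unknown.
    return PHRASEBOOK.get((text.lower(), lang), text)


pipeline = TranslationPipeline(fake_speech_to_text, fake_translate)
result = pipeline.run(b"Where is the station?", ["de", "nl"])
```

Passing the components in as callables keeps the sketch independent of any particular recognizer or translation service; a real system would substitute its own implementations.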

[0951] "Application Example 1"

[0952] (Claim 1)

[0953] Means for converting voice input to text in real time,

[0954] Means comprising a translation device that translates the text into multiple languages,

[0955] Means for displaying the translated text on a user interface or outputting it as audio,

[0956] Means for analyzing travelers' historical information and generating personalized travel plans,

[0957] Means for detecting travel-related emergencies early using an anomaly detection algorithm and proposing alternative solutions, and

[0958] Means for obtaining public transport information within a city, translating it into the user's native language, and presenting it to the user.

[0959] A system comprising the above means.

[0960] (Claim 2)

[0961] The system according to claim 1, further comprising means for adjusting the output of translated information based on the user's preferences.

[0962] (Claim 3)

[0963] The system according to claim 1, comprising means for enabling flexible responses to emergencies during travel using real-time information.
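
The public transport means of this application example can be illustrated with a short sketch that presents one departure in the user's language. The `Departure` record, the `TEMPLATES` table, and `present_departure` are hypothetical; fixed per-language templates are assumed here in place of the translation device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Departure:
    line: str
    destination: str
    minutes: int  # minutes until departure


# Hypothetical per-language templates standing in for the translation device.
TEMPLATES = {
    "en": "{line} to {destination} departs in {minutes} min",
    "es": "{line} hacia {destination} sale en {minutes} min",
    "fr": "{line} vers {destination} part dans {minutes} min",
}


def present_departure(dep: Departure, language: str, fallback: str = "en") -> str:
    """Render a departure in the requested language, or in the fallback language if unsupported."""
    template = TEMPLATES.get(language, TEMPLATES[fallback])
    return template.format(line=dep.line, destination=dep.destination, minutes=dep.minutes)


message = present_departure(Departure("Line 2", "Central", 5), "es")
```

Falling back to a default language when the user's language is unsupported is a design choice of the sketch, not something the disclosure specifies.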

[0964] "Example 2 when combined with an emotion engine"

[0965] (Claim 1)

[0966] Means for converting audio signals into digital text information,

[0967] Means equipped with a device that converts the digital text information into various languages,

[0968] Means for displaying the converted text information on a display device or outputting it as audio,

[0969] Means for analyzing user history information and generating personalized travel plans,

[0970] Means for detecting travel-related emergencies early using anomaly detection algorithms and proposing alternative solutions, and

[0971] Means for analyzing the user's psychological state and making emotion-based adjustments to the generated plans and the converted information.

[0972] A system comprising the above means.

[0973] (Claim 2)

[0974] The system according to claim 1, further comprising means for adjusting the output of the converted text information based on the user's preferences and emotions.

[0975] (Claim 3)

[0976] The system according to claim 1, comprising means for enabling flexible responses to emergencies during travel using real-time information and providing support that takes into account the user's psychological state.
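
The emotion-based adjustment described in this example can be sketched as a filter over a generated plan plus a change in message wording. The disclosure does not specify how emotions are represented, so the emotion labels, the effort scale, the `MAX_EFFORT` table, and the functions `adjust_plan` and `adjust_message` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Activity:
    name: str
    effort: int  # hypothetical scale: 1 (relaxed) to 5 (strenuous)


# Hypothetical mapping from a detected emotion label to the highest tolerated effort.
MAX_EFFORT = {"anxious": 2, "tired": 2, "neutral": 4, "excited": 5}


def adjust_plan(plan: List[Activity], emotion: str) -> List[Activity]:
    """Keep only the activities whose effort suits the detected emotion."""
    limit = MAX_EFFORT.get(emotion, 4)  # unknown labels are treated like "neutral"
    return [activity for activity in plan if activity.effort <= limit]


def adjust_message(message: str, emotion: str) -> str:
    """Prefix a reassuring phrase when the user appears anxious or tired."""
    if emotion in ("anxious", "tired"):
        return "No need to hurry. " + message
    return message


plan = [Activity("Museum", 2), Activity("Mountain hike", 5), Activity("City walk", 3)]
relaxed_plan = adjust_plan(plan, "tired")
```

In this sketch a tired user keeps only the museum visit, while an excited user keeps the full plan.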

[0977] "Application Example 2 when combined with an emotion engine"

[0978] (Claim 1)

[0979] Means for converting voice input to text in real time,

[0980] Means comprising a translation device that translates the text into multiple languages,

[0981] Means for displaying the translated text on a user interface or outputting it as audio,

[0982] Means for analyzing travelers' historical information and generating personalized travel plans,

[0983] Means for detecting travel-related emergencies early using an anomaly detection algorithm and proposing alternative solutions,

[0984] Means for recognizing emotional states and adjusting information based on those emotions, and

[0985] Means for providing information tailored to the user's emotions on public transportation and at tourist attractions.

[0986] A system comprising the above means.

[0987] (Claim 2)

[0988] The system according to claim 1, comprising means for adjusting the output of translated text based on user preferences and enabling the provision of information according to emotional state.

[0989] (Claim 3)

[0990] The system according to claim 1, which includes means for flexibly responding to emergencies during travel using real-time information and providing a sense of security tailored to the user's emotional state.

[Explanation of Symbols]

[0991]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: means for converting voice input to text in real time; means comprising a translation device that translates the text into multiple languages; means for displaying the translated text on a user interface or outputting it as audio; means for analyzing travelers' historical data to generate personalized travel plans; and means for detecting travel-related emergencies early using an anomaly detection algorithm and proposing alternative solutions.

2. The system according to claim 1, further comprising means for adjusting the output of the translated text based on the user's preferences.

3. The system according to claim 1, comprising means for enabling flexible responses to emergencies during travel using real-time data.