system
The assistant program addresses inefficiencies in condominium management by using natural language processing and translation to enhance communication and data management, improving resident satisfaction and asset value.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In conventional condominium management, it is difficult to operate the condominium continuously and efficiently because the residents lack knowledge about management. In addition, as the number of multinational residents is increasing, multilingual communication is required, but it takes a great deal of labor to do this manually. Furthermore, a lot of time and labor are spent on creating the minutes of the board meetings and data management, and it is difficult to inherit know-how. Due to these problems, the satisfaction of the residents may decrease, and the asset value of the entire condominium may also be affected.
Means for Solving the Problems
[0005] This invention solves these problems by providing an assistant program dedicated to condominium management and building a support system for residents and board members. It uses natural language processing to analyze user inquiries and generate appropriate responses, enabling immediate responses to residents' questions and opinions. Furthermore, a translation function translates responses into the user's native language, facilitating smooth communication among residents of different nationalities. By generating and saving meeting minutes using speech recognition and data organization capabilities, it makes board operations more efficient, facilitates data accumulation, and makes it easier to pass on know-how. This improves resident satisfaction and contributes to increasing the overall asset value of the condominium.
[0006] The "Assistant Program" is software designed to support condominium management by providing information and communication assistance to residents and board members.
[0007] "Natural language processing means" refers to technology that analyzes input text in natural language form, understands its content, and generates an appropriate response.
[0008] A "translation tool" is a technology that has the function of converting text written in one language into another language, enabling communication between multiple languages.
[0009] An "output means" is an interface for providing processed information or responses to the user visually or audibly.
[0010] A "learning tool" is a technology that analyzes a user's past interaction data and executes a process to improve the generation of responses in the future.
[0011] A "database" is a collection of information that stores information about management organizations and records of residents' activities, making it possible to search and use this information as needed.
[0012] "Speech recognition" is a technology that analyzes a user's speech and converts its content into text.
[0013] The "data organization function" is a function that classifies and organizes acquired information to enable efficient searching and use.
Brief Description of the Drawings
[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when combined with an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.
Mode for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
[0019] In the following embodiments, storage denoted by a reference numeral refers to one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] The present invention's apartment management assistant system consists of a server, terminals, and users. This system uses an assistant program specifically designed for apartment management to provide information and support to residents and board members, thereby streamlining apartment management and facilitating communication.
[0036] The server manages a database containing apartment-specific rules and know-how, and is equipped with natural language processing technology to analyze each user's inquiry. It processes incoming inquiries and generates appropriate responses. It then provides a translation function to translate those responses into the user's native language. The translated responses are delivered to the user via their terminal, allowing the user to make decisions based on that information.
[0037] The terminal provides a user interface and sends user inquiries to the server via voice or text input. This allows users to easily utilize the assistant system. The terminal also receives information from the server and presents it to the user visually or audibly. This functionality enables smooth communication among residents of various nationalities.
[0038] Users can check daily management information, view board meeting minutes, and ask questions about changes to condominium rules. For example, if a user asks a question to the terminal such as, "Please tell me about the condominium's garbage disposal rules," the server immediately analyzes the content, searches the database for relevant information, translates it as needed, and provides it to the user. In this way, this system automates and streamlines many of the tasks necessary for condominium management, and acts as an aid to make residents' lives more comfortable.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] Users enter their inquiries via voice or text through their device. For example, they might ask, "What is the date of the next board meeting?"
[0042] Step 2:
[0043] If the device receives voice input, it uses speech recognition to convert the audio into text data. Then, it sends the user's inquiry to the server.
[0044] Step 3:
[0045] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry. In this case, it understands that the user is asking about the board meeting schedule.
[0046] Step 4:
[0047] The server searches the apartment building database to retrieve the date of the next board meeting. During this process, it references the user's account information to identify the apartment building in which the user resides.
[0048] Step 5:
[0049] The server translates the search results into the user's native language. For example, for a user whose native language is Japanese, the server will generate a response in Japanese.
[0050] Step 6:
[0051] The server sends the translated response to the terminal.
[0052] Step 7:
[0053] The device displays the information it receives on the screen and, if necessary, uses speech synthesis technology to notify the user via voice. This allows the user to receive necessary information instantly through the device.
[0054] Step 8:
[0055] The server stores user inquiries and responses in a database and uses them as training data to improve the accuracy of responses to future inquiries.
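The eight steps above can be sketched as a single server-side function. This is a minimal illustration under stated assumptions, not the disclosed implementation: the keyword-based intent rule, the in-memory `DATABASE` and `PHRASES` tables, the sample date, and every class and function name are hypothetical stand-ins for the natural language processing engine, the condominium database, and the translation function.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the condominium database (Step 4).
DATABASE = {
    "building-a": {"board_meeting_date": "2025-01-15"},  # sample value only
}

# Hypothetical phrase table standing in for the translation function (Step 5).
PHRASES = {
    "en": "The next board meeting is on {date}.",
    "ja": "次回の理事会は{date}です。",
}


@dataclass
class User:
    building_id: str       # taken from the user's account information
    native_language: str   # language code used to pick the response language


@dataclass
class Server:
    # Step 8: stored inquiries and responses, kept as future training data.
    log: list = field(default_factory=list)

    def identify_intent(self, text: str) -> str:
        # Step 3: a toy keyword rule in place of a real NLP engine.
        if "board meeting" in text.lower():
            return "board_meeting_date"
        return "unknown"

    def handle(self, user: User, text: str) -> str:
        intent = self.identify_intent(text)
        if intent == "unknown":
            response = "Sorry, I could not understand the inquiry."
        else:
            # Step 4: the account's building selects which records to search.
            date = DATABASE[user.building_id][intent]
            # Step 5: produce the response in the user's native language.
            template = PHRASES.get(user.native_language, PHRASES["en"])
            response = template.format(date=date)
        # Step 8: record the exchange.
        self.log.append({"inquiry": text, "intent": intent, "response": response})
        return response
```

Steps 1, 2, 6, and 7 (input, speech recognition, transmission, and presentation) happen on the terminal and are left out of this sketch.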
[0056] (Example 1)
[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0058] In condominium management, it is essential that residents and board members can quickly and efficiently obtain necessary information and communicate smoothly among residents of various nationalities. However, language barriers and the lack of timeliness of information are obstacles to achieving these goals. Furthermore, streamlining management operations while promoting resident involvement is also a crucial challenge.
[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0060] In this invention, the server includes language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and output means for providing the responses via a user interface. This enables rapid information provision in apartment management and smooth communication between multiple languages.
[0061] "Language processing means" refers to technology that has the function of analyzing natural language queries input by users and generating appropriate responses.
[0062] "Translation means" refers to technology that converts the generated response into the user's native language, and is a means of providing information that transcends language differences.
[0063] "Output means" refers to technology for presenting translated responses visually or audibly through a user interface.
[0064] A "learning method" is a method that uses machine learning techniques to accumulate data on users' past interactions and improve responses to future inquiries.
[0065] "Information storage means" refers to a means of storing information of a management organization in a database and providing users with access to that information when needed.
[0066] "Generation means" refers to a technology that uses a generation AI model to acquire and utilize necessary information based on prompt text.
[0067] An "input conversion means" is a technology that accurately converts voice input from a user into text data.
[0068] A "speech generation means" is a device that has the function of outputting text data as speech using speech synthesis technology.
[0069] The apartment building management assistant system consists of a server, terminals, and users. The server has language processing capabilities that incorporate natural language processing technology to analyze user inquiries. This system utilizes advanced databases and machine learning technologies to provide fast and accurate responses to various inquiries.
[0070] Specifically, the server utilizes cloud services, and common cloud-based APIs are used for natural language processing. For example, natural language processing APIs can be used for language processing, and online translation services can be used for translation. The terminal also provides a user interface and uses speech recognition technology to convert the user's voice input into text. Speech recognition software can be used for this conversion. The terminal also utilizes speech synthesis technology to provide information received from the server visually or audibly.
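Because the paragraph above leaves the choice of cloud services open, one way to keep them interchangeable is to code against small interfaces. The following sketch assumes two hypothetical interfaces, `SpeechRecognizer` and `Translator`, with toy implementations; none of these names come from the disclosure, and no real cloud API is called.

```python
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical interface for any speech-to-text service."""

    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    """Hypothetical interface for any translation service."""

    def translate(self, text: str, target_language: str) -> str: ...


class EchoRecognizer:
    # Toy recognizer: treats the audio bytes as UTF-8 text.
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class TaggingTranslator:
    # Toy translator: prefixes the language code instead of translating.
    def translate(self, text: str, target_language: str) -> str:
        return f"[{target_language}] {text}"


def process_voice_inquiry(
    audio: bytes,
    recognizer: SpeechRecognizer,
    translator: Translator,
    target_language: str,
) -> str:
    # Convert speech to text, build a placeholder answer, then translate it.
    text = recognizer.transcribe(audio)
    answer = f"Received inquiry: {text}"
    return translator.translate(answer, target_language)
```

A wrapper around an actual speech recognition or translation service could replace either toy class, provided it offers the same method.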
[0071] This system allows users to inquire about a variety of management-related information. For example, by entering a prompt such as "Please tell me the date and time of the next board meeting," users can instantly obtain the necessary information. By using a generative AI model, this system helps users obtain information clearly and specifically, thereby streamlining the management of condominium information.
[0072] For example, if a user tells their terminal, "Please tell me the latest rules regarding pet ownership," the server instantly searches for relevant data and provides the necessary information in the user's native language. In this way, smooth information sharing and communication are possible even in a diverse international living environment. This system contributes to increased efficiency in management operations and improved quality of life for both apartment building managers and residents.
[0073] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0074] Step 1:
[0075] The user enters their question through their device. Voice input is converted into text data using speech recognition technology. This process prepares the user's voice and input text for transmission to the server.
[0076] Step 2:
[0077] The terminal receives user inquiry data via voice or text and sends it to the server through a secure protocol. The input is the user's inquiry text, and the output is the data sent to the server.
[0078] Step 3:
[0079] The server receives user queries and analyzes the intent of those queries using natural language processing techniques. This analysis specifically includes extracting important keywords and meanings. The input is the user's text data, and the output is the analyzed data.
[0080] Step 4:
[0081] The server searches the database for relevant information based on the analysis results. This search utilizes a generative AI model to enhance the database query with prompt statements. The input is the analyzed keywords, and the output is the relevant data.
[0082] Step 5:
[0083] The server constructs a response to the user's inquiry based on the search results. The constructed response is automatically translated into the user's native language. The input is information from the database, and the output is the translated response data.
[0084] Step 6:
[0085] The server sends the final response to the terminal. The terminal presents the received translated response to the user. This presentation is done via screen display or voice output using speech synthesis technology. The input is the response data from the server, and the output is the information provided to the user.
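Steps 3 and 4 above (keyword extraction followed by a prompt-assisted database search) can be illustrated roughly as follows. This is a sketch under assumptions: the stop-word list, the `RECORDS` table, the prompt wording, and the overlap-based ranking are all hypothetical, and the generative AI model itself is not called; the prompt is only constructed.

```python
import re

# Hypothetical stop words removed during keyword extraction (Step 3).
STOPWORDS = {
    "please", "tell", "me", "the", "a", "an", "of",
    "about", "what", "is", "regarding", "latest",
}

# Hypothetical records standing in for the condominium database (Step 4).
RECORDS = [
    {"topic": "pet ownership rules", "text": "Placeholder rule text about pets."},
    {"topic": "garbage disposal rules", "text": "Placeholder rule text about garbage."},
]


def extract_keywords(inquiry: str) -> list[str]:
    # Lowercase, split into alphabetic words, and drop stop words.
    words = re.findall(r"[a-z]+", inquiry.lower())
    return [w for w in words if w not in STOPWORDS]


def build_prompt(keywords: list[str]) -> str:
    # Prompt text that could be passed to a generative model (not called here).
    return "Find condominium management records related to: " + ", ".join(keywords)


def search(keywords: list[str]) -> list[dict]:
    # Rank records by how many topic words overlap with the keywords.
    def score(record: dict) -> int:
        return len(set(record["topic"].split()) & set(keywords))

    ranked = sorted(RECORDS, key=score, reverse=True)
    return [r for r in ranked if score(r) > 0]
```

For the inquiry "Please tell me the latest rules regarding pet ownership", the pet record ranks first because it shares three words with the extracted keywords, while the garbage record shares only "rules".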
[0086] (Application Example 1)
[0087] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0088] In urban life, with the increasing number of multinational residents, there is a need for efficient apartment management and smooth communication. However, current technology does not adequately connect these aspects, and there are challenges in establishing sufficient means of providing information that accommodates diverse languages and cultures. In particular, real-time and multilingual support are currently insufficient when it comes to understanding the usage status of public facilities and providing event information.
[0089] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0090] In this invention, the server includes natural language processing means, translation means, and output means for an assistant program specifically designed for apartment building management. This makes it possible to analyze user inquiries, generate appropriate responses, and provide those responses translated into the user's native language. Furthermore, by adding information acquisition means for providing real-time information on the usage status of public facilities and information provision means for providing event information to users, it becomes possible to facilitate communication among residents and enable efficient use of urban functions.
[0091] An "assistant program" is software designed to respond to user inquiries and provide information fairly and efficiently.
[0092] "Natural language processing means" refers to technologies for understanding and analyzing inquiries from users in the form of text or speech.
[0093] A "translation tool" is a function that converts the analyzed response into a language that is easy for the user to understand.
[0094] "Output means" refers to a device or interface for providing the generated response to the user visually or audibly.
[0095] A "learning method" is an algorithm that analyzes past user interaction data and uses that information to improve the accuracy of future responses.
[0096] A "database" is an information management system that stores information related to apartment management and user inquiries, and makes it accessible as needed.
[0097] "Information acquisition means" refers to technologies for collecting and managing information on the usage status of public facilities and other related information in real time.
[0098] An "information provision method" is a system that provides users with information on city events and other related information based on collected data.
[0099] "Speech recognition" is a technology that analyzes audio from meetings or users and converts it into text.
[0100] The "data organization function" is a function that structures acquired data and uses it for storage and analysis.
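As a rough illustration of the data organization function defined above, the sketch below classifies free-text records into categories and builds an index for later lookup. The category names, keyword lists, and function names are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical categories and trigger keywords.
CATEGORY_KEYWORDS = {
    "facilities": ["elevator", "parking", "gym"],
    "events": ["festival", "event", "meeting"],
}


def classify(text: str) -> str:
    # Return the first category whose keyword appears in the text.
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def organize(records: list[str]) -> dict[str, list[str]]:
    # Group records by category so they can be searched efficiently later.
    index: dict[str, list[str]] = defaultdict(list)
    for record in records:
        index[classify(record)].append(record)
    return dict(index)
```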
[0101] In one form of carrying out the invention, this system mainly consists of a server, a terminal, and a user. The server is equipped with natural language processing technology and analyzes user inquiries. Specifically, it converts inquiries that the user sends by voice into text using the Google® Cloud Speech-to-Text API. Then, it uses AWS® Lambda to generate an appropriate response based on the analyzed content.
[0102] The server retrieves necessary information from the apartment management database and determines what to provide to the user. This information is translated into the required language using the Microsoft® Translator Text API for multilingual support. The final response is sent to the user's device and output as either audio or text.
[0103] Users interact with this system via devices such as smartphones and tablets. These devices provide the user interface, accept voice and text input, and send queries to the server. The resulting information is presented to the user visually or audibly.
[0104] For example, if a user enters the question "Tell me about next week's events" into their device, the server will analyze the request and send a prompt message like the following to the AI model to retrieve relevant event information.
[0105] Example prompt: "Please provide information on community events in this area over the next week."
[0106] This embodiment allows users, regardless of their nationality, to easily access and utilize resources and event information in the city where they reside.
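The example prompt above could be produced from a simple template. In this sketch the function name and its two parameters are hypothetical; only the default wording is taken from the example prompt.

```python
def build_event_prompt(period: str = "the next week", scope: str = "this area") -> str:
    # Fill the example prompt's wording with a time period and a geographic scope.
    return f"Please provide information on community events in {scope} over {period}."
```

Calling `build_event_prompt()` with no arguments reproduces the example prompt; passing other values adapts it to a different period or area.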
[0107] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0108] Step 1:
[0109] Users input questions via voice or text into a device such as a smartphone. Voice input is converted into text data using the device's speech recognition software. When the input is voice, the Google Cloud Speech-to-Text API is used to ensure accurate text conversion.
[0110] Step 2:
[0111] The terminal sends the converted text to the server. A secure communication protocol over the internet is used for this data transmission. The server analyzes the received text data and uses natural language processing techniques to understand what the query is.
[0112] Step 3:
[0113] The server uses AWS Lambda to parse text data and retrieve the necessary information from the database. For example, it might search a condominium management database to retrieve public event information. At this stage, the response content is determined according to the user's request.
[0114] Step 4:
[0115] The server generates a response based on the acquired information and, if necessary, translates it using the Microsoft Translator Text API. The target language is obtained from the user's configuration information.
[0116] Step 5:
[0117] The server sends the final response to the terminal. The terminal presents the received response to the user visually or audibly. This output is done through text display or speech output using speech synthesis software.
[0118] Step 6:
[0119] Based on the information obtained, the user decides on their next action. The user interaction data obtained during this process is used to improve future response accuracy through the system's continuous learning mechanisms.
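Steps 3 and 4 above might look like the following when written as a Python function in the usual AWS Lambda handler form (`lambda_handler(event, context)`). This is only a sketch: the event layout (a JSON string under `"body"`), the `EVENTS_DB` and `USER_SETTINGS` tables, and the `translate` placeholder are assumptions, and no translation service is actually called.

```python
import json

# Hypothetical stand-ins for the condominium database and user configuration.
EVENTS_DB = {"public_events": ["Sample community clean-up day"]}
USER_SETTINGS = {"user-1": {"language": "ja"}}


def translate(text: str, target_language: str) -> str:
    # Placeholder: a real deployment would call a translation service here.
    return f"[{target_language}] {text}"


def lambda_handler(event, context):
    # Assumed event shape: {"body": "<JSON string with user_id and text>"}.
    body = json.loads(event["body"])
    user_id, text = body["user_id"], body["text"]

    # Step 3: a toy keyword check in place of real text parsing.
    if "event" in text.lower():
        answer = "; ".join(EVENTS_DB["public_events"])
    else:
        answer = "No matching information was found."

    # Step 4: the target language comes from the user's configuration.
    language = USER_SETTINGS.get(user_id, {}).get("language", "en")
    if language != "en":
        answer = translate(answer, language)

    return {
        "statusCode": 200,
        "body": json.dumps({"response": answer, "language": language}, ensure_ascii=False),
    }
```

Users without a stored language setting fall back to English in this sketch, which is a design choice of the example rather than something the description specifies.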
[0120] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0121] This invention provides a more personalized user experience by combining an emotion engine with an assistant system specifically designed for apartment building management. This system functions through the coordinated interaction of servers, terminals, and users.
[0122] The server has a natural language processing engine that analyzes user inquiries, analyzing the received content and generating appropriate responses. Furthermore, it uses an emotion engine to identify emotions when analyzing user input. This emotion information is stored in a database and used to improve future responses. The server also translates the generated responses into the user's native language and adjusts the content appropriately to provide personalized feedback.
[0123] The terminal functions as a user interface, receiving input from the user. For example, if a user makes an inquiry accompanied by emotion, such as "I'm tired today, so I'd like to finish this quickly," the terminal sends that voice or text to the server. The terminal also receives a response from the server, displays it on the screen, and uses speech synthesis to convey the response to the user verbally. Simultaneously, it is possible to adjust the tone of the interface and the information presented according to the user's emotional state.
[0124] By making emotionally charged inquiries, users can receive more personalized responses from the system. For example, if a user inquiring about the schedule of a board meeting expresses an emotion such as "I'm worried," the server will recognize this emotion and provide additional information, such as advice on the meeting's content and preparation. In this way, the system, equipped with an emotion engine, provides user-centric support and facilitates reliable condominium management.
[0125] The following describes the processing flow.
[0126] Step 1:
[0127] Users enter their inquiries via voice or text through their device. For example, they might ask, "I want to know the date of the next board meeting, but I'm worried."
[0128] Step 2:
[0129] If the device receives voice input, it uses speech recognition to convert the voice data into text. Then, it sends the user's inquiry to the server.
[0130] Step 3:
[0131] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry and the user's emotional state. In this case, it understands that the user is asking about the board meeting schedule and that they are feeling "worried."
[0132] Step 4:
[0133] The server uses an emotion engine to identify emotion information, which is then stored in a database and used to generate responses, as described below. Emotion data is continuously accumulated to accommodate future inquiries.
[0134] Step 5:
[0135] The server searches the apartment building's database to retrieve the date of the next board meeting. During this process, it considers the user's emotional state and generates a response that includes a summary of the meeting content and advice on preparation.
[0136] Step 6:
[0137] The server-generated response is translated into the user's native language, preparing a more personalized result.
[0138] Step 7:
[0139] The server sends the translated response to the terminal.
[0140] Step 8:
[0141] The device displays the information it receives on the screen and, if necessary, communicates it to the user via voice using speech synthesis technology. Simultaneously, it adds supplementary information tailored to the user's emotions and reassuring messages.
[0142] Step 9:
[0143] The server records the data from this entire interaction in the system and uses it as training data to improve response accuracy in future interactions.
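As an illustration only, Steps 3 to 5 above can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the keyword tables stand in for the natural language processing engine and the emotion engine, and the names `INTENT_KEYWORDS`, `EMOTION_KEYWORDS`, `BUILDING_DB`, `analyze`, and `respond`, as well as the meeting date, are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword tables standing in for the natural language processing
# engine and the emotion engine; a real system would use trained models.
INTENT_KEYWORDS = {"board_meeting_date": ("board meeting",)}
EMOTION_KEYWORDS = {"worried": ("worried", "anxious", "uneasy")}

# Hypothetical stand-in for the apartment building's database (placeholder values).
BUILDING_DB = {
    "board_meeting_date": {
        "answer": "The next board meeting is on 2025-01-15.",
        "advice": "The agenda and the previous minutes can help you prepare.",
    }
}


@dataclass
class Analysis:
    intent: Optional[str]
    emotion: Optional[str]


def _match(text: str, table: dict) -> Optional[str]:
    """Return the first label whose keywords appear in the text, if any."""
    lowered = text.lower()
    for label, keywords in table.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def analyze(text: str) -> Analysis:
    """Step 3: identify the intent of the inquiry and the emotional state."""
    return Analysis(_match(text, INTENT_KEYWORDS), _match(text, EMOTION_KEYWORDS))


def respond(text: str, emotion_log: List[Optional[str]]) -> str:
    """Steps 4 and 5: store the emotion, look up the answer, adapt the reply."""
    analysis = analyze(text)
    emotion_log.append(analysis.emotion)  # Step 4: accumulate emotion data
    record = BUILDING_DB.get(analysis.intent)
    if record is None:
        return "Sorry, that information was not found."
    reply = record["answer"]
    if analysis.emotion == "worried":
        reply += " " + record["advice"]  # Step 5: add preparation advice
    return reply
```

With this sketch, an inquiry that contains "I'm worried" receives the schedule plus the preparation advice, while the same inquiry without an emotional expression receives the schedule only.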
[0144] (Example 2)
[0145] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0146] In condominium management, there is a need to respond quickly and appropriately to a wide range of inquiries from residents. However, conventional systems have difficulty considering the feelings of residents, and language barriers exist when dealing with multinational residents, resulting in challenges to the efficiency and accuracy of communication. Furthermore, there is a lack of mechanisms for accurately recording and organizing important information such as meetings. Therefore, a new system is needed to support user-friendly and reliable condominium management.
[0147] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0148] In this invention, the server includes information processing means for analyzing user inquiries and generating appropriate responses, information conversion means for translating the responses into the user's native language, and adaptive learning means for accumulating and analyzing the user's past interaction data and emotional data to improve future responses. This enables personalized responses that take into account the user's emotions, and further facilitates smooth communication among multinational users and accurate recording and organization of information.
[0149] "Information processing means" refers to a device or program that has the function of analyzing input data and generating a response according to a specific purpose.
[0150] "Information conversion means" refers to technologies or devices for converting information output in one language into another language.
[0151] "Adaptive learning means" refers to a device or program that has the ability to learn by analyzing past data and interactions to improve the quality of subsequent responses.
[0152] "Information storage means" refers to a device or program that securely stores organizational information and user information for management purposes and retrieves that data as needed.
[0153] "Emotion analysis means" refers to a technology or device that identifies a user's emotional state based on input data and reflects that state in the response.
[0154] "Output assistance means" refers to technologies or devices that provide assistance functions for effectively conveying information to users through sight or hearing.
[0155] The "acoustic recognition and information organization function" is a function that converts audio data into text or other formats, and efficiently organizes and stores the information.
[0156] This invention is an assistant system specifically designed for apartment building management, with the entire process realized through the cooperation of a server, terminals, and users. The following describes each component and its role in detail.
[0157] Server
[0158] The server has information processing capabilities to analyze inquiries received from users. Specifically, it performs the analysis using natural language processing (NLP), for which it uses machine learning libraries available for the Python language such as TensorFlow (registered trademark) and PyTorch. Furthermore, as the information conversion means, it calls an open translation API, backed by a general-purpose translation service, to translate the generated response into the user's native language.
[0159] Furthermore, the server accumulates past user interaction and sentiment data through adaptive learning mechanisms and learns to improve future responses. This process is combined with storage technologies such as database management systems (relational or document-based).
[0160] Terminal
[0161] The terminal acts as a user interface. It receives voice or text input from the user and sends it to the server. Terminals are typically implemented on platforms using Node.js or React Native. Furthermore, as the output assistance means, the terminal presents the response information sent from the server to the user as voice or text. A speech synthesis engine is used for the voice output.
[0162] User
[0163] Users make emotionally charged inquiries through their devices. A concrete example of a prompt might be, "When is the next board meeting? I'm a little worried." This allows users to expect more appropriate and personalized responses from the system.
[0164] Thus, the system of the present invention utilizes bidirectional communication between the server and the terminal to provide users with intelligent and flexible support for condominium management. As a result, users can confidently entrust their condominium management to the system.
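As an illustration of how the server might accumulate interaction and emotion data in a relational store, the following is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the function names `open_store`, `record_interaction`, and `emotion_history` are assumptions made for this example and do not appear in the disclosure.

```python
import sqlite3
from typing import Dict


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; a deployed server would use a persistent DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions ("
        "id INTEGER PRIMARY KEY, user_id TEXT, inquiry TEXT, "
        "response TEXT, emotion TEXT)"
    )
    return conn


def record_interaction(
    conn: sqlite3.Connection, user_id: str, inquiry: str, response: str, emotion: str
) -> None:
    """Accumulate one inquiry, its response, and the identified emotion."""
    conn.execute(
        "INSERT INTO interactions (user_id, inquiry, response, emotion) "
        "VALUES (?, ?, ?, ?)",
        (user_id, inquiry, response, emotion),
    )
    conn.commit()


def emotion_history(conn: sqlite3.Connection, user_id: str) -> Dict[str, int]:
    """Count how often each emotion was recorded for one user."""
    rows = conn.execute(
        "SELECT emotion, COUNT(*) FROM interactions "
        "WHERE user_id = ? GROUP BY emotion",
        (user_id,),
    ).fetchall()
    return dict(rows)
```

A history such as the one returned by `emotion_history` is one possible input to the adaptive learning means; how that data is actually used for learning is not specified here.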
[0165] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0166] Step 1:
[0167] The terminal receives input from the user. The user makes inquiries using a simple interface, either via text or voice. The terminal captures this input as digital data through its microphone or keyboard and then sends the input data directly to the server. The input is the user's inquiry, and the output is the inquiry data sent to the server.
[0168] Step 2:
[0169] The server analyzes the received query data using the information processing means. Natural language processing techniques are employed, in particular models built with machine learning frameworks such as TensorFlow and PyTorch, to understand the meaning and context of the text. The input is the query data received from the terminal, and the output is the subject and intent of the query identified through analysis.
[0170] Step 3:
[0171] The server generates an appropriate response based on the analysis results. During this generation process, sentiment analysis is performed, incorporating content that takes the user's emotions into account. An information conversion mechanism is also used to translate the generated response into the user's native language. The input consists of the analysis results and sentiment information, while the output is the translated response data.
[0172] Step 4:
[0173] The server sends the generated response to the terminal. This process uses a real-time communication protocol for rapid data transmission. The input is the response data generated by the server, and the output is the data sent to the terminal.
[0174] Step 5:
[0175] The terminal outputs the response received from the server to the user interface. Specifically, it either displays the response as text on the screen or conveys the response to the user as voice using a speech synthesis engine. The input is the response data received from the server, and the output is the content of the response presented to the user.
[0176] Step 6:
[0177] Based on the response received, the user decides on their next action. They can either continue operating the system or enter feedback via the terminal. User feedback helps in the continuous improvement of the system. Inputs are responses from the server and new user inquiries, while outputs are feedback and subsequent inquiries via the terminal.
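The data exchanged between the terminal and the server in Steps 1 and 4 could, for example, be carried as JSON messages. The sketch below is an assumption for illustration: the field names, the `kind` discriminator, and the functions `encode` and `decode` are hypothetical, and the real-time transport itself is outside the scope of the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union


@dataclass
class InquiryMessage:
    """Step 1 output: inquiry data sent from the terminal to the server."""
    user_id: str
    text: str
    language: str
    kind: str = "inquiry"


@dataclass
class ResponseMessage:
    """Step 4 output: translated response data sent from the server."""
    user_id: str
    text: str
    language: str
    emotion: str
    kind: str = "response"


Message = Union[InquiryMessage, ResponseMessage]


def encode(message: Message) -> str:
    """Serialize a message to JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(message), ensure_ascii=False)


def decode(payload: str) -> Message:
    """Rebuild the message object from its JSON form."""
    data = json.loads(payload)
    kind = data.get("kind")
    if kind == "inquiry":
        return InquiryMessage(**data)
    if kind == "response":
        return ResponseMessage(**data)
    raise ValueError(f"unknown message kind: {kind!r}")
```

Keeping the emotion label in the response message lets the terminal choose how to present the response in Step 5 without re-analyzing the text.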
[0178] (Application Example 2)
[0179] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0180] In modern living environments, apartment management is required to provide prompt and emotionally sensitive responses to complex inquiries and requests. However, conventional systems struggle to provide personalized responses that are sensitive to the user's feelings, and thus fail to alleviate the stress and dissatisfaction that arises. There is a need for an interface that can solve this problem and improve user satisfaction.
[0181] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0182] In this invention, the server includes natural language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and emotion analysis means for analyzing the user's emotions using emotion recognition technology and providing personalized responses. This makes it possible to provide personalized and considerate responses that are in line with the user's emotions.
[0183] "Natural language processing means" refers to technologies that analyze human language queries, convert them into a format that a computer can understand, and generate appropriate responses.
[0184] "Translation means" refers to technology for appropriately converting the generated response into the user's native language.
[0185] "Output means" refers to technologies for providing responses generated through a user interface as audio or text.
[0186] "Learning methods" refer to data analysis techniques that accumulate and analyze users' past interaction data in order to improve future responses.
[0187] "Emotion analysis means" refers to technology that recognizes emotions from input user data and provides personalized responses according to the user's emotional state.
[0188] A "human-supportive interface" is a user interface that connects the user and the system, providing emotionally sensitive dialogue in response to residents' inquiries.
[0189] A "database" is an information recording system that stores information related to an administrative organization and supports user engagement.
[0190] The system for implementing this invention mainly consists of three elements: a server, a terminal, and a user.
[0191] The server analyzes inquiries sent by users using natural language processing (NLP) technology. The server is equipped with an NLP engine that converts input data into a computer-processable format. Specifically, it uses tools such as the Google Cloud Natural Language API to analyze human language and understand its meaning. Furthermore, the server utilizes sentiment analysis technology to analyze the user's emotions and uses that emotional information to generate personalized responses. This process employs a sentiment analysis engine to identify emotions and customize responses based on user input and past interaction data.
[0192] Furthermore, the generated response is translated into the user's native language using a translation tool. For this purpose, the server utilizes translation engines such as the Google Translate API. This translation makes it possible to support users from multiple nationalities.
[0193] The terminal functions as a user interface. For example, a smartphone or robot receives inquiries from the user by voice input and sends that data to the server. The response returned from the server is then communicated to the user in voice format using the terminal's speech synthesis software (e.g., Amazon Polly).
[0194] Users can receive more personalized responses through the system. For example, in response to an inquiry such as, "Recently, the parking lot lighting seems dim," the server can provide an emotionally sensitive response such as, "Please rest assured, we will review the situation and arrange for environmental improvements." In this way, it is possible to create a comfortable living environment for users. An example of a prompt might be, "Generate an appropriate response when a user is feeling uneasy about their living environment." Based on this prompt, the AI model generates an effective response.
[0195] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0196] Step 1:
[0197] The user inputs their inquiry by voice through their smartphone or the robot's microphone. This voice data becomes the input. The device uses its built-in speech recognition software to convert this voice data into text data. This text data is output and sent to the next processing step.
[0198] Step 2:
[0199] The terminal sends text data to the server. The server runs a natural language processing engine based on the received text data. Here, the input text is analyzed to understand the intent of the query. The result of this process is analyzed intent data.
[0200] Step 3:
[0201] The server runs an emotion analysis engine using the analyzed intent data. The emotion analysis engine identifies the emotions contained in the user's statements. For example, emotion labels such as "anxiety," "reassurance," and "interest" are output. This data is used when generating responses.
[0202] Step 4:
[0203] The server uses a generative AI model to create prompts based on the analysis results, including emotion labels. These prompts form the basis for generating specific responses. For example, a prompt such as "Generate an appropriate response when the user is feeling anxious" might be generated. The generative AI model then creates the response based on these prompts.
[0204] Step 5:
[0205] The generated response is fed into the server's translation engine and translated into the user's native language to accommodate multinational users. This translated response is then sent to the terminal as the final output.
[0206] Step 6:
[0207] The device receives the translated response and outputs it as speech using its speech synthesis function. The response is then played back to the user through the speaker of their smartphone or robot. This allows the user to receive a natural-sounding response.
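The prompt creation in Step 4 can be illustrated with the following minimal sketch. The emotion labels come from Step 3 above; the mapping `EMOTION_WORDING` and the functions `build_prompt` and `generate_response` are hypothetical, and the generative AI model is represented only by a callable supplied by the caller.

```python
from typing import Callable

# Hypothetical mapping from emotion labels (Step 3) to wording used in the prompt.
EMOTION_WORDING = {
    "anxiety": "anxious",
    "reassurance": "reassured",
    "interest": "interested",
}


def build_prompt(emotion_label: str, inquiry: str) -> str:
    """Step 4: turn the emotion label and the inquiry into a prompt."""
    feeling = EMOTION_WORDING.get(emotion_label)
    if feeling is None:
        # Unknown label: fall back to an instruction without an emotion.
        instruction = "Generate an appropriate response to the user."
    else:
        instruction = (
            f"Generate an appropriate response when the user is feeling {feeling}."
        )
    return f"{instruction}\nInquiry: {inquiry}"


def generate_response(
    emotion_label: str, inquiry: str, model: Callable[[str], str]
) -> str:
    """Pass the prompt to a generative model supplied by the caller."""
    return model(build_prompt(emotion_label, inquiry))
```

Passing the model in as a callable keeps the sketch independent of any particular generative AI service.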
[0208] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0209] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference result in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0210] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0211] [Second Embodiment]
[0212] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0213] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0214] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0215] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0216] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0217] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0218] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0219] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0220] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0221] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0222] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0223] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0224] The present invention's apartment management assistant system consists of a server, terminals, and users. This system uses an assistant program specifically designed for apartment management to provide information and support to residents and board members, thereby streamlining apartment management and facilitating communication.
[0225] The server manages a database containing apartment-specific rules and know-how, and is equipped with natural language processing technology to analyze each user's inquiry. It processes incoming inquiries and generates appropriate responses. It then provides a translation function to translate those responses into the user's native language. The translated responses are delivered to the user via their terminal, allowing the user to make decisions based on that information.
[0226] The terminal provides a user interface and sends user inquiries to the server via voice or text input. This allows users to easily utilize the assistant system. The terminal also receives information from the server and presents it to the user visually or audibly. This functionality enables smooth communication among residents of various nationalities.
[0227] Users can check daily management information, view board meeting minutes, and ask questions about changes to condominium rules. For example, if a user asks a question to the terminal such as, "Please tell me about the condominium's garbage disposal rules," the server immediately analyzes the content, searches the database for relevant information, translates it as needed, and provides it to the user. In this way, this system automates and streamlines many of the tasks necessary for condominium management, and acts as an aid to make residents' lives more comfortable.
[0228] The following describes the processing flow.
[0229] Step 1:
[0230] Users enter their inquiries via voice or text through their device. For example, they might ask, "What is the date of the next board meeting?"
[0231] Step 2:
[0232] If the device receives voice input, it uses speech recognition to convert the audio into text data. Then, it sends the user's inquiry to the server.
[0233] Step 3:
[0234] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry. In this case, it understands that the user is asking about the board meeting schedule.
[0235] Step 4:
[0236] The server searches the apartment building database to retrieve the date of the next board meeting. During this process, it references the user's account information to identify the apartment building in which the user resides.
[0237] Step 5:
[0238] The server translates the search results into the user's native language. For example, for a user whose native language is Japanese, the server will generate a response in Japanese.
[0239] Step 6:
[0240] The server sends the translated response to the terminal.
[0241] Step 7:
[0242] The device displays the information it receives on the screen and, if necessary, uses speech synthesis technology to notify the user via voice. This allows the user to receive necessary information instantly through the device.
[0243] Step 8:
[0244] The server stores user inquiries and responses in a database and uses them as training data to improve the accuracy of responses to future inquiries.
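Steps 4 and 5 above, in which the user's account information identifies the building and the result is returned in the user's native language, can be sketched as follows. All data here is placeholder data: `ACCOUNTS`, `BOARD_MEETINGS`, and `TEMPLATES` are hypothetical, and the fixed templates merely stand in for a translation service.

```python
from typing import Optional

# Hypothetical account records: each user is linked to a building and a language.
ACCOUNTS = {
    "user-1": {"building_id": "bldg-a", "language": "ja"},
    "user-2": {"building_id": "bldg-b", "language": "en"},
}

# Hypothetical per-building schedule data (placeholder dates).
BOARD_MEETINGS = {"bldg-a": "2025-02-01", "bldg-b": "2025-02-08"}

# Fixed templates stand in for the translation step of Step 5.
TEMPLATES = {
    "en": "The next board meeting is on {date}.",
    "ja": "次回の理事会は{date}に開催されます。",
}


def next_board_meeting_reply(user_id: str) -> Optional[str]:
    """Look up the user's building, then answer in the user's language."""
    account = ACCOUNTS.get(user_id)
    if account is None:
        return None  # unknown user: no building can be identified
    meeting_date = BOARD_MEETINGS.get(account["building_id"])
    if meeting_date is None:
        return None  # no meeting registered for this building
    template = TEMPLATES.get(account["language"], TEMPLATES["en"])
    return template.format(date=meeting_date)
```

For example, `next_board_meeting_reply("user-1")` answers in Japanese for the building linked to that account, while an unknown user yields no reply.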
[0245] (Example 1)
[0246] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0247] In condominium management, it is essential that residents and board members can quickly and efficiently obtain necessary information and communicate smoothly among residents of various nationalities. However, language barriers and the lack of timeliness of information are obstacles to achieving these goals. Furthermore, streamlining management operations while promoting resident involvement is also a crucial challenge.
[0248] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0249] In this invention, the server includes language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and output means for providing the responses via a user interface. This enables rapid information provision in apartment management and smooth communication between multiple languages.
[0250] "Language processing means" refers to technology that has the function of analyzing natural language queries input by users and generating appropriate responses.
[0251] "Translation means" refers to technology that converts the generated response into the user's native language, and is a means of providing information that transcends language differences.
[0252] "Output means" refers to technology for presenting translated responses visually or audibly through a user interface.
[0253] A "learning method" is a method that uses machine learning techniques to accumulate data on users' past interactions and improve responses to future inquiries.
[0254] "Information storage means" refers to a means of storing information of a management organization in a database and providing users with access to that information when needed.
[0255] "Generation means" refers to a technology that uses a generative AI model to acquire and utilize necessary information based on prompt text.
[0256] An "input conversion means" is a technology that accurately converts voice input from a user into text data.
[0257] A "speech generation means" is a device that has the function of outputting text data as speech using speech synthesis technology.
[0258] The apartment building management assistant system consists of a server, terminals, and users. The server has language processing capabilities that incorporate natural language processing technology to analyze user inquiries. This system utilizes advanced databases and machine learning technologies to provide fast and accurate responses to various inquiries.
[0259] Specifically, the server utilizes cloud services, and common cloud-based APIs are used for natural language processing. For example, natural language processing APIs can be used for language processing, and online translation services can be used for translation. The terminal also provides a user interface and uses speech recognition technology to convert the user's voice input into text. Speech recognition software can be used for this conversion. The terminal also utilizes speech synthesis technology to provide information received from the server visually or audibly.
[0260] This system allows users to inquire about a variety of management-related information. For example, by entering a prompt such as "Please tell me the date and time of the next board meeting," users can instantly obtain the necessary information. By using a generative AI model, this system helps users obtain information clearly and specifically, thereby streamlining the management of condominium information.
[0261] For example, if a user tells their terminal, "Please tell me the latest rules regarding pet ownership," the server instantly searches for relevant data and provides the necessary information in the user's native language. In this way, smooth information sharing and communication are possible even in a diverse international living environment. This system contributes to increased efficiency in management operations and improved quality of life for both apartment building managers and residents.
[0262] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0263] Step 1:
[0264] The user enters their question through their device. Voice input is converted into text data using speech recognition technology. This process prepares the user's voice and input text for transmission to the server.
[0265] Step 2:
[0266] The terminal receives user inquiry data via voice or text and sends it to the server through a secure protocol. The input is the user's inquiry text, and the output is the data sent to the server.
[0267] Step 3:
[0268] The server receives user queries and analyzes the intent of those queries using natural language processing techniques. This analysis specifically includes extracting important keywords and meanings. The input is the user's text data, and the output is the analyzed data.
[0269] Step 4:
[0270] The server searches the database for relevant information based on the analysis results. This search utilizes a generative AI model to enhance the database query with prompt statements. The input is the analyzed keywords, and the output is the relevant data.
[0271] Step 5:
[0272] The server constructs a response to the user's inquiry based on the search results. The constructed response is automatically translated into the user's native language. The input is information from the database, and the output is the translated response data.
[0273] Step 6:
[0274] The server sends the final response to the terminal. The terminal presents the received translated response to the user. This presentation is done via screen display or voice output using speech synthesis technology. The input is the response data from the server, and the output is the information provided to the user.
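The keyword extraction of Step 3 and the database search of Step 4 can be illustrated with a simple keyword-overlap lookup. This is a deliberately simplified stand-in, assuming a small in-memory document list: the disclosure describes a generative AI model enhancing the database query, which is not reproduced here, and `STOPWORDS`, `DOCUMENTS`, `extract_keywords`, and `search` are hypothetical names.

```python
import re
from typing import Optional, Set

# Hypothetical stopword list; words that carry no search meaning are dropped.
STOPWORDS = {
    "please", "tell", "me", "the", "about", "of", "a", "an", "and",
    "is", "what", "regarding", "latest",
}

# Hypothetical stand-in for the condominium management database.
DOCUMENTS = [
    {"title": "Pet ownership rules", "tags": {"pet", "pets", "ownership", "rules"}},
    {"title": "Garbage disposal rules", "tags": {"garbage", "disposal", "rules"}},
    {"title": "Board meeting schedule", "tags": {"board", "meeting", "date", "time", "next"}},
]


def extract_keywords(text: str) -> Set[str]:
    """Step 3: keep the lowercase words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {word for word in words if word not in STOPWORDS}


def search(keywords: Set[str]) -> Optional[dict]:
    """Step 4: return the document sharing the most keywords, if any match."""
    best = max(DOCUMENTS, key=lambda doc: len(doc["tags"] & keywords))
    return best if best["tags"] & keywords else None
```

With the inquiry "Please tell me the latest rules regarding pet ownership," the extracted keywords are `rules`, `pet`, and `ownership`, so the pet ownership document is selected.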
[0275] (Application Example 1)
[0276] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0277] In urban life, with the increasing number of multinational residents, there is a need for efficient apartment management and smooth communication. However, current technology does not adequately connect these aspects, and there are challenges in establishing sufficient means of providing information that accommodates diverse languages and cultures. In particular, real-time and multilingual support are currently insufficient when it comes to understanding the usage status of public facilities and providing event information.
[0278] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0279] In this invention, the server includes natural language processing means for an assistant program specialized in condominium management, translation means, and output means. Thereby, it becomes possible to analyze the input inquiry from the user, generate an appropriate response, and translate the response into the native language for provision. Also, by adding information acquisition means for providing the usage status of public facilities in real time and information provision means for providing event information to the user, smooth communication between residents and efficient use of urban functions become possible.
[0280] The "assistant program" is software for responding to user inquiries and providing fair and efficient information.
[0281] The "natural language processing means" is a technology for understanding and analyzing inquiries from users in the form of text or voice.
[0282] The "translation means" is a function for converting the analyzed response into a language that is easy for the user to understand.
[0283] The "output means" is a device or interface for visually or aurally providing the generated response to the user.
[0284] The "learning means" is an algorithm for analyzing past user interaction data and improving the accuracy of the next response based on that information.
[0285] The "database" is an information management system for storing information related to condominium management and user inquiries and making it accessible as needed.
[0286] The "information acquisition means" is a technology for collecting and managing the usage status of public facilities and other related information in real time.
[0287] The "information providing means" is a mechanism that provides users with urban event information and other related information based on the collected data.
[0288] "Speech recognition" is a technology that analyzes speech from meetings or users and converts it into text.
[0289] The "data arrangement function" is a function that structures the acquired data for use in storage and analysis.
[0290] As a form for implementing the invention, this system is mainly composed of a server, a terminal, and a user. The server is equipped with natural language processing technology and analyzes the input user inquiries. Specifically, inquiries sent by the user in voice or text are converted into text using the Google Cloud Speech-to-Text API. Then, using AWS Lambda, an appropriate response is generated based on the analyzed content.
[0291] The server acquires necessary information from the condominium management database and determines the content to be provided to the user. This information is translated into the required language using the Microsoft Translator Text API for multilingual support. The final response is sent to the user's terminal and output as voice or text.
[0292] The user interacts with this system via a terminal such as a smartphone or tablet. The terminal provides a user interface, accepts voice input and text input, and has the role of sending inquiries to the server. The information provided as a result is presented to the user visually or aurally.
[0293] As a specific example, when the user inputs a question to the terminal such as "Tell me about the events next week", the server analyzes the request and sends the following prompt sentence to the AI model that generates relevant event information.
[0294] Example prompt: "Please provide information on community events in this area over the next week."
[0295] This embodiment allows users, regardless of their nationality, to easily access and utilize resources and event information in the city where they reside.
[0296] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0297] Step 1:
[0298] Users input questions via voice or text into a device such as a smartphone. Voice input is converted into text data using the device's speech recognition software. In that case, the Google Cloud Speech-to-Text API is used to ensure accurate text conversion.
[0299] Step 2:
[0300] The terminal sends the converted text to the server. A secure communication protocol over the internet is used for this data transmission. The server analyzes the received text data and uses natural language processing techniques to understand what the query is.
[0301] Step 3:
[0302] The server uses AWS Lambda to parse text data and retrieve the necessary information from the database. For example, it might search a condominium management database to retrieve public event information. At this stage, the response content is determined according to the user's request.
[0303] Step 4:
[0304] The server generates a response based on the acquired information and, if necessary, translates it using the Microsoft Translator Text API. The target language for translation is obtained from the user's setting information.
[0305] Step 5:
[0306] The server sends the final response to the terminal. The terminal presents the received response to the user visually or audibly. This output is performed through text display or voice output by voice synthesis software.
[0307] Step 6:
[0308] Based on the obtained information, the user determines the next action. The user interaction data obtained in this process is utilized for improving future response accuracy through the system's continuous learning means.
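As a non-limiting illustration, Steps 2 to 5 above might be sketched as follows. The in-memory `EVENT_DB` and `PHRASEBOOK` are hypothetical stand-ins for the condominium management database and the translation service, and the keyword-based `analyze` function is a simplification; none of these reflects the actual interface of AWS Lambda or the Microsoft Translator Text API.

```python
# Hypothetical in-memory stand-ins for the condominium management database
# and the translation service; neither reflects a real API.
EVENT_DB = {"events": "Fire drill on Saturday at 10:00 in the lobby."}
PHRASEBOOK = {
    ("Fire drill on Saturday at 10:00 in the lobby.", "ja"):
        "土曜日10時にロビーで消防訓練があります。",
}


def analyze(text):
    """Steps 2-3: very rough intent detection by keyword (assumption)."""
    return "events" if "event" in text.lower() else "unknown"


def translate(text, target_lang):
    """Step 4: look up a canned translation; fall back to the original."""
    return PHRASEBOOK.get((text, target_lang), text)


def handle_inquiry(text, user_settings):
    """Steps 2-5: analyze, retrieve, translate if needed, and return."""
    intent = analyze(text)
    answer = EVENT_DB.get(intent, "Sorry, no information was found.")
    lang = user_settings.get("language", "en")  # target language from settings
    if lang != "en":
        answer = translate(answer, lang)
    return {"intent": intent, "language": lang, "text": answer}
```

For example, `handle_inquiry("Tell me about the events next week", {"language": "ja"})` would return the sample event text in Japanese, while an inquiry with no matching intent would return the fallback message.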
[0309] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.
[0310] The present invention provides a more personalized user experience by combining an emotion engine with an assistant system specialized for condominium management. This system functions by coordinating a server, a terminal, and a user.
[0311] The server has a natural language processing engine that analyzes inquiries from the user, analyzes the received content, and generates an appropriate response. Furthermore, when analyzing the user's input using the emotion engine, the emotion is identified. This emotion information is stored in a database and used for improving future responses. In addition, the server translates the generated response into the user's native language and provides personalized feedback by appropriately adjusting the content.
[0312] The terminal functions as a user interface, receiving input from the user. For example, if a user makes an inquiry accompanied by emotion, such as "I'm tired today, so I'd like to finish this quickly," the terminal sends that voice or text to the server. The terminal also receives a response from the server, displays it on the screen, and uses speech synthesis to convey the response to the user verbally. Simultaneously, it is possible to adjust the tone of the interface and the information presented according to the user's emotional state.
[0313] By making emotionally charged inquiries, users can receive more personalized responses from the system. For example, if a user inquiring about the schedule of a board meeting expresses an emotion such as "I'm worried," the server will recognize this emotion and provide additional information, such as advice on the meeting's content and preparation. In this way, the system, equipped with an emotion engine, provides user-centric support and facilitates reliable condominium management.
[0314] The following describes the processing flow.
[0315] Step 1:
[0316] Users enter their inquiries via voice or text through their device. For example, they might ask, "I want to know the date of the next board meeting, but I'm worried."
[0317] Step 2:
[0318] If the device receives voice input, it uses speech recognition to convert the voice data into text. Then, it sends the user's inquiry to the server.
[0319] Step 3:
[0320] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry and the user's emotional state. In this case, it understands that the user is asking about the board meeting schedule and that they are feeling "worried."
[0321] Step 4:
[0322] The server uses an emotion engine to identify emotion information, which is then stored in a database and used to generate responses, as described below. Emotion data is continuously accumulated to accommodate future inquiries.
[0323] Step 5:
[0324] The server searches the apartment building's database to retrieve the date of the next board meeting. During this process, it considers the user's emotional state and generates a response that includes a summary of the meeting content and advice on preparation.
[0325] Step 6:
[0326] The server-generated response is translated into the user's native language, preparing a more personalized result.
[0327] Step 7:
[0328] The server sends the translated response to the terminal.
[0329] Step 8:
[0330] The device displays the information it receives on the screen and, if necessary, communicates it to the user via voice using speech synthesis technology. Simultaneously, it adds supplementary information tailored to the user's emotions and reassuring messages.
[0331] Step 9:
[0332] The server records the data from this entire interaction in the system and uses it as training data to improve response accuracy in future interactions.
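As a non-limiting illustration, Steps 3 to 5 of the above flow might be sketched as follows. The keyword table `EMOTION_KEYWORDS`, the sample meeting date, and the reassurance wording are assumptions made only for this sketch; the emotion identification model 59 itself is not reproduced here.

```python
EMOTION_KEYWORDS = {"worried": "anxiety", "tired": "fatigue"}  # assumption
MEETING_DB = {"next_board_meeting": "2025-01-15"}              # sample data
emotion_log = []  # Step 4: accumulated emotion records


def identify_emotion(text):
    """Return the first matching emotion label, or 'neutral'."""
    lowered = text.lower()
    for keyword, label in EMOTION_KEYWORDS.items():
        if keyword in lowered:
            return label
    return "neutral"


def respond(user_id, text):
    """Steps 3-5: identify emotion, store it, and build the response."""
    emotion = identify_emotion(text)
    emotion_log.append({"user": user_id, "emotion": emotion})
    reply = f"The next board meeting is on {MEETING_DB['next_board_meeting']}."
    if emotion == "anxiety":
        # Add reassurance and preparation advice for a worried user.
        reply += " Please rest assured; an agenda summary will be shared in advance."
    return reply
```

In this sketch the same factual answer is returned in every case, and only the supplementary message changes with the identified emotion.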
[0333] (Example 2)
[0334] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0335] In condominium management, there is a need to respond quickly and appropriately to a wide range of inquiries from residents. However, conventional systems have difficulty considering the feelings of residents, and language barriers exist when dealing with multinational residents, resulting in challenges to the efficiency and accuracy of communication. Furthermore, there is a lack of mechanisms for accurately recording and organizing important information such as meetings. Therefore, a new system is needed to support user-friendly and reliable condominium management.
[0336] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0337] In this invention, the server includes information processing means for analyzing user inquiries and generating appropriate responses, information conversion means for translating the responses into the user's native language, and adaptive learning means for accumulating and analyzing the user's past interaction data and emotional data to improve future responses. This enables personalized responses that take into account the user's emotions, and further facilitates smooth communication among multinational users and accurate recording and organization of information.
[0338] "Information processing means" refers to a device or program that has the function of analyzing input data and generating a response according to a specific purpose.
[0339] "Information conversion means" refers to technologies or devices for converting information output in one language into another language.
[0340] "Adaptive learning means" refers to a device or program that has the ability to learn by analyzing past data and interactions to improve the quality of responses in subsequent instances.
[0341] An "information storage device" is a device or program that securely stores organizational information and user information for management purposes and retrieves that data as needed.
[0342] "Emotion analysis means" refers to a technology or device that identifies a user's emotional state based on input data and reflects that state in the response.
[0343] "Output assistance means" refers to technologies or devices that provide assistance functions for effectively conveying information to users through sight or hearing.
[0344] The "acoustic recognition and information organization function" is a function that converts audio data into text or other formats, and efficiently organizes and stores the information.
[0345] This invention is an assistant system specifically designed for apartment building management, with the entire process realized through the cooperation of a server, terminals, and users. The following describes each component and its role in detail.
[0346] Server
[0347] The server has information processing capabilities to analyze inquiries received from users. Specifically, it performs analysis using natural language processing (NLP). Here, it utilizes machine learning libraries such as TensorFlow and PyTorch, which can be used from the Python language. Furthermore, as a means of information conversion, it uses an open translation API to translate the generated response into the user's native language. This API uses a service that provides general translation technology.
[0348] Furthermore, the server accumulates past user interaction and sentiment data through adaptive learning mechanisms and learns to improve future responses. This process is combined with storage technologies such as database management systems (relational or document-based).
[0349] Terminal
[0350] The terminal acts as a user interface. It receives voice or text input from the user and sends it to the server. Terminals are typically implemented on platforms using Node.js or React Native. Furthermore, as an output assistance method, it presents response information sent from the server to the user in voice or text. In this case, a speech synthesis engine is used to perform the voice output.
[0351] User
[0352] Users make emotionally charged inquiries through their devices. A concrete example of a prompt might be, "When is the next board meeting? I'm a little worried." This allows users to expect more appropriate and personalized responses from the system.
[0353] Thus, the system of the present invention utilizes bidirectional communication between the server and the terminal to provide users with intelligent and flexible support for condominium management. As a result, users can confidently entrust their condominium management to the system.
[0354] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0355] Step 1:
[0356] The terminal receives input from the user. The user makes inquiries using a simple interface, either via text or voice. This input is sent to the terminal as digital data through the microphone or keyboard. The terminal then sends the input data directly to the server. The input is the user's inquiry, and the output is the inquiry data sent to the server.
[0357] Step 2:
[0358] The server analyzes the received query data using the information processing means. Natural language processing techniques are employed, particularly generative AI models built with machine learning libraries such as TensorFlow and PyTorch, to understand the meaning and context of the text. The input is the query data received from the terminal, and the output is the subject and intent of the query identified through analysis.
[0359] Step 3:
[0360] The server generates an appropriate response based on the analysis results. During this generation process, sentiment analysis is performed, incorporating content that takes the user's emotions into account. An information conversion mechanism is also used to translate the generated response into the user's native language. The input consists of the analysis results and sentiment information, while the output is the translated response data.
[0361] Step 4:
[0362] The server sends the generated response to the terminal. This process uses a real-time communication protocol for rapid data transmission. The input is the response data generated by the server, and the output is the data sent to the terminal.
[0363] Step 5:
[0364] The terminal outputs the response received from the server to the user interface. Specifically, it either displays the response as text on the screen or conveys the response to the user as voice using a speech synthesis engine. The input is the response data received from the server, and the output is the content of the response presented to the user.
[0365] Step 6:
[0366] Based on the response received, the user decides on their next action. They can either continue operating the system or enter feedback via the terminal. User feedback helps in the continuous improvement of the system. Inputs are responses from the server and new user inquiries, while outputs are feedback and subsequent inquiries via the terminal.
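As a non-limiting illustration, the accumulation of interaction data and feedback by the adaptive learning means might be sketched as follows. The `InteractionStore` class, its record fields, and the review threshold are hypothetical and serve only to show one way in which feedback could be used to find responses that need improvement.

```python
from collections import defaultdict


class InteractionStore:
    """Accumulates inquiries, emotions, and feedback (hypothetical schema)."""

    def __init__(self):
        self.records = []

    def add(self, intent, emotion, helpful):
        """Record one interaction and whether the user found it helpful."""
        self.records.append(
            {"intent": intent, "emotion": emotion, "helpful": helpful}
        )

    def helpful_rate(self):
        """Fraction of responses marked helpful, grouped by intent."""
        totals = defaultdict(int)
        positives = defaultdict(int)
        for rec in self.records:
            totals[rec["intent"]] += 1
            positives[rec["intent"]] += 1 if rec["helpful"] else 0
        return {intent: positives[intent] / totals[intent] for intent in totals}

    def needs_review(self, threshold=0.5):
        """Intents whose helpful rate falls below the threshold."""
        return sorted(i for i, r in self.helpful_rate().items() if r < threshold)
```

Intents returned by `needs_review` could then be prioritized when response content is revised or when a model is retrained; the actual learning procedure is not limited to this approach.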
[0367] (Application Example 2)
[0368] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0369] In modern living environments, apartment management is required to provide prompt and emotionally sensitive responses to complex inquiries and requests. However, conventional systems struggle to provide personalized responses that are sensitive to the user's feelings, and thus fail to alleviate the stress and dissatisfaction that arises. There is a need for an interface that can solve this problem and improve user satisfaction.
[0370] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0371] In this invention, the server includes natural language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and emotion analysis means for analyzing the user's emotions using emotion recognition technology and providing personalized responses. This makes it possible to provide personalized and considerate responses that are in line with the user's emotions.
[0372] "Natural language processing means" refers to technologies that analyze human language queries, convert them into a format that a computer can understand, and generate appropriate responses.
[0373] "Translation means" refers to technology for appropriately converting the generated response into the user's native language.
[0374] "Output means" refers to technologies for providing responses generated through a user interface as audio or text.
[0375] "Learning methods" refer to data analysis techniques that accumulate and analyze users' past interaction data in order to improve future responses.
[0376] "Emotion analysis means" refers to technology that recognizes emotions from input user data and provides personalized responses according to the user's emotional state.
[0377] A "human-supportive interface" is a user interface that connects the user and the system, providing emotionally sensitive dialogue in response to residents' inquiries.
[0378] A "database" is an information recording system that stores information related to an administrative organization and supports user engagement.
[0379] The system for implementing this invention mainly consists of three elements: a server, a terminal, and a user.
[0380] The server analyzes inquiries sent by users using natural language processing (NLP) technology. The server is equipped with an NLP engine that converts input data into a computer-processable format. Specifically, it uses tools such as the Google Cloud Natural Language API to analyze human language and understand its meaning. Furthermore, the server utilizes sentiment analysis technology to analyze the user's emotions and uses that emotional information to generate personalized responses. This process employs a sentiment analysis engine to identify emotions and customize responses based on user input and past interaction data.
[0381] Furthermore, the generated response is translated into the user's native language using a translation tool. For this purpose, the server utilizes translation engines such as the Google Translate API. This translation makes it possible to support users from multiple nationalities.
[0382] The terminal functions as a user interface. For example, a smartphone or robot uses voice input to receive inquiries from the user and sends that data to the server. The response returned from the server is then communicated to the user in voice format using the terminal's speech synthesis software (e.g., Amazon Polly).
[0383] Users can receive more personalized responses through the system. For example, in response to an inquiry such as, "Recently, the parking lot lighting seems dim," the server can provide an emotionally sensitive response such as, "Please rest assured, we will review the situation and arrange for environmental improvements." In this way, it is possible to create a comfortable living environment for users. An example of a prompt might be, "Generate an appropriate response when a user is feeling uneasy about their living environment." Based on this prompt, the AI model generates an effective response.
[0384] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0385] Step 1:
[0386] The user inputs their inquiry by voice through their smartphone or the robot's microphone. This voice data becomes the input. The device uses its built-in speech recognition software to convert this voice data into text data. This text data is output and sent to the next processing step.
[0387] Step 2:
[0388] The terminal sends text data to the server. The server runs a natural language processing engine based on the received text data. Here, the input text is analyzed to understand the intent of the query. The result of this process is analyzed intent data.
[0389] Step 3:
[0390] The server runs an emotion analysis engine using the analyzed intent data. The emotion analysis engine identifies the emotions contained in the user's statements. For example, emotion labels such as "anxiety," "reassurance," and "interest" are output. This data is used when generating responses.
[0391] Step 4:
[0392] The server uses a generative AI model to create prompts based on the analysis results, including emotion labels. These prompts form the basis for generating specific responses. For example, a prompt such as "Generate an appropriate response when the user is feeling anxious" might be generated. The generative AI model then creates the response based on these prompts.
[0393] Step 5:
[0394] The generated response is fed into the server's translation engine and translated into the user's native language to accommodate multinational users. This translated response is then sent to the terminal as the final output.
[0395] Step 6:
[0396] The device receives the translated response and outputs it as speech using its speech synthesis function. The response is then played back to the user through the speaker of their smartphone or robot. This allows the user to receive a natural-sounding response.
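As a non-limiting illustration, the prompt creation of Step 4 and the translation of Step 5 might be sketched as follows. The template wording follows the example prompt given above, while the `EMOTION_WORDING` table and the `generate` and `translate` callables are hypothetical placeholders for the generative AI model and the translation engine.

```python
PROMPT_TEMPLATE = (
    "Generate an appropriate response when the user is feeling {emotion}. "
    "User inquiry: {inquiry}"
)
# Hypothetical mapping from emotion labels to wording used in the prompt.
EMOTION_WORDING = {"anxiety": "anxious", "reassurance": "reassured",
                   "interest": "interested"}


def build_prompt(inquiry, emotion_label):
    """Step 4: combine the inquiry and the emotion label into one prompt."""
    wording = EMOTION_WORDING.get(emotion_label, "neutral")
    return PROMPT_TEMPLATE.format(emotion=wording, inquiry=inquiry)


def run_pipeline(inquiry, emotion_label, generate, translate):
    """Steps 4-5: prompt the model, then translate its response.

    `generate` and `translate` are caller-supplied callables standing in for
    the generative AI model and the translation engine.
    """
    prompt = build_prompt(inquiry, emotion_label)
    return translate(generate(prompt))
```

Passing the model and the translation engine as callables keeps the sketch independent of any particular service, so that either one could be replaced without changing the prompt construction.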
[0397] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0398] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0399] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0400] [Third Embodiment]
[0401] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0402] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0403] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0404] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0405] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0406] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0407] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0408] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0409] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0410] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0411] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0412] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0413] The present invention's apartment management assistant system consists of a server, terminals, and users. This system uses an assistant program specifically designed for apartment management to provide information and support to residents and board members, thereby streamlining apartment management and facilitating communication.
[0414] The server manages a database containing apartment-specific rules and know-how, and is equipped with natural language processing technology to analyze each user's inquiry. It processes incoming inquiries and generates appropriate responses. It then provides a translation function to translate those responses into the user's native language. The translated responses are delivered to the user via their terminal, allowing the user to make decisions based on that information.
[0415] The terminal provides a user interface and sends user inquiries to the server via voice or text input. This allows users to easily utilize the assistant system. The terminal also receives information from the server and presents it to the user visually or audibly. This functionality enables smooth communication among residents of various nationalities.
[0416] Users can check daily management information, view board meeting minutes, and ask questions about changes to condominium rules. For example, if a user asks a question to the terminal such as, "Please tell me about the condominium's garbage disposal rules," the server immediately analyzes the content, searches the database for relevant information, translates it as needed, and provides it to the user. In this way, this system automates and streamlines many of the tasks necessary for condominium management, and acts as an aid to make residents' lives more comfortable.
[0417] The following describes the processing flow.
[0418] Step 1:
[0419] Users enter their inquiries via voice or text through their device. For example, they might ask, "What is the date of the next board meeting?"
[0420] Step 2:
[0421] If the device receives voice input, it uses speech recognition to convert the audio into text data. Then, it sends the user's inquiry to the server.
[0422] Step 3:
[0423] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry. In this case, it understands that the user is asking about the board meeting schedule.
[0424] Step 4:
[0425] The server searches the apartment building database to retrieve the date of the next board meeting. During this process, it references the user's account information to identify the apartment building in which the user resides.
[0426] Step 5:
[0427] The server translates the search results into the user's native language. For example, for a user whose native language is Japanese, the server will generate a response in Japanese.
[0428] Step 6:
[0429] The server sends the translated response to the terminal.
[0430] Step 7:
[0431] The device displays the information it receives on the screen and, if necessary, uses speech synthesis technology to notify the user via voice. This allows the user to receive necessary information instantly through the device.
[0432] Step 8:
[0433] The server stores user inquiries and responses in a database and uses them as training data to improve the accuracy of responses to future inquiries.
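As a non-limiting illustration, the database search of Step 4 and the storage of Step 8 might be sketched as follows, with translation omitted for brevity. The `ACCOUNTS` and `MEETINGS` tables and their contents are hypothetical sample data, not values taken from this disclosure.

```python
ACCOUNTS = {"user-1": {"building": "bldg-A", "language": "ja"}}  # sample data
MEETINGS = {"bldg-A": "2025-02-01", "bldg-B": "2025-02-08"}      # sample data
history = []  # Step 8: stored inquiries and responses


def next_meeting_for(user_id):
    """Step 4: resolve the user's building, then look up its meeting date."""
    account = ACCOUNTS.get(user_id)
    if account is None:
        return None
    return MEETINGS.get(account["building"])


def answer_meeting_inquiry(user_id, inquiry):
    """Steps 4-8 without translation: look up, answer, and record."""
    date = next_meeting_for(user_id)
    if date is None:
        response = "No board meeting information was found."
    else:
        response = f"The next board meeting is on {date}."
    history.append({"user": user_id, "inquiry": inquiry, "response": response})
    return response
```

Resolving the building from the account first ensures that a resident only receives the schedule of the building in which the resident lives, and the `history` list corresponds to the stored data that may later be used for training.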
[0434] (Example 1)
[0435] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0436] In condominium management, it is essential that residents and board members can quickly and efficiently obtain necessary information and communicate smoothly among residents of various nationalities. However, language barriers and the lack of timeliness of information are obstacles to achieving these goals. Furthermore, streamlining management operations while promoting resident involvement is also a crucial challenge.
[0437] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0438] In this invention, the server includes language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and output means for providing the responses via a user interface. This enables rapid information provision in apartment management and smooth communication between multiple languages.
[0439] "Language processing means" refers to technology that has the function of analyzing natural language queries input by users and generating appropriate responses.
[0440] "Translation means" refers to technology that converts the generated response into the user's native language, and is a means of providing information that transcends language differences.
[0441] "Output means" refers to technology for presenting translated responses visually or audibly through a user interface.
[0442] A "learning method" is a method that uses machine learning techniques to accumulate data on users' past interactions and improve responses to future inquiries.
[0443] "Information storage means" refers to a means of storing information of a management organization in a database and providing users with access to that information when needed.
[0444] "Generation means" refers to a technology that uses a generative AI model to acquire and utilize necessary information based on prompt text.
[0445] An "input conversion means" is a technology that accurately converts voice input from a user into text data.
[0446] A "speech generation means" is a device that has the function of outputting text data as speech using speech synthesis technology.
[0447] The apartment building management assistant system consists of a server, terminals, and users. The server has language processing capabilities that incorporate natural language processing technology to analyze user inquiries. This system utilizes advanced databases and machine learning technologies to provide fast and accurate responses to various inquiries.
[0448] Specifically, the server utilizes cloud services, and common cloud-based APIs are used for natural language processing. For example, natural language processing APIs can be used for language processing, and online translation services can be used for translation. The terminal also provides a user interface and uses speech recognition technology to convert the user's voice input into text. Speech recognition software can be used for this conversion. The terminal also utilizes speech synthesis technology to provide information received from the server visually or audibly.
[0449] This system allows users to inquire about a variety of management-related information. For example, by entering a prompt such as "Please tell me the date and time of the next board meeting," users can instantly obtain the necessary information. By using a generative AI model, this system helps users obtain information clearly and specifically, thereby streamlining the management of condominium information.
[0450] For example, if a user tells their terminal, "Please tell me the latest rules regarding pet ownership," the server instantly searches for relevant data and provides the necessary information in the user's native language. In this way, smooth information sharing and communication are possible even in a diverse international living environment. This system contributes to increased efficiency in management operations and improved quality of life for both apartment building managers and residents.
[0451] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0452] Step 1:
[0453] The user enters their question through their device. Voice input is converted into text data using speech recognition technology. This process prepares the user's voice and input text for transmission to the server.
[0454] Step 2:
[0455] The terminal receives user inquiry data via voice or text and sends it to the server through a secure protocol. The input is the user's inquiry text, and the output is the data sent to the server.
[0456] Step 3:
[0457] The server receives user queries and analyzes the intent of those queries using natural language processing techniques. This analysis specifically includes extracting important keywords and meanings. The input is the user's text data, and the output is the analyzed data.
[0458] Step 4:
[0459] The server searches the database for relevant information based on the analysis results. This search utilizes a generative AI model to enhance the database query with prompt statements. The input is the analyzed keywords, and the output is the relevant data.
[0460] Step 5:
[0461] The server constructs a response to the user's inquiry based on the search results. The constructed response is automatically translated into the user's native language. The input is information from the database, and the output is the translated response data.
[0462] Step 6:
[0463] The server sends the final response to the terminal. The terminal presents the received translated response to the user. This presentation is done via screen display or voice output using speech synthesis technology. The input is the response data from the server, and the output is the information provided to the user.
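The server-side part of the six steps above (Steps 3 to 5) can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions, not the implementation of the present disclosure: the names `MANAGEMENT_DB`, `analyze`, `search`, `translate`, and `handle_inquiry` are hypothetical, the database records are placeholder text, keyword matching is a simple substring test, and the translation step is a stub standing in for an external translation service.

```python
# Hypothetical stand-in for the management database; the records are placeholders.
MANAGEMENT_DB = {
    "board meeting": "PLACEHOLDER: date and time of the next board meeting",
    "pet": "PLACEHOLDER: latest rules regarding pet ownership",
}


def analyze(query: str) -> list[str]:
    """Step 3: extract the known topic keywords that appear in the query."""
    lowered = query.lower()
    return [topic for topic in MANAGEMENT_DB if topic in lowered]


def search(keywords: list[str]) -> list[str]:
    """Step 4: look up the records that match the analyzed keywords."""
    return [MANAGEMENT_DB[keyword] for keyword in keywords]


def translate(text: str, target_lang: str) -> str:
    """Step 5 (stub): a real system would call a translation service here."""
    return f"[{target_lang}] {text}"


def handle_inquiry(query: str, user_lang: str) -> str:
    """Steps 3 to 5: analyze the text, search, then build and translate the response."""
    records = search(analyze(query))
    answer = " ".join(records) if records else "No matching information was found."
    return translate(answer, user_lang)


if __name__ == "__main__":
    print(handle_inquiry("Please tell me the latest rules regarding pet ownership", "ja"))
```

In this sketch the terminal would perform Steps 1, 2, and 6 (speech recognition, transmission, and presentation), so only text reaches `handle_inquiry`.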
[0464] (Application Example 1)
[0465] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0466] In urban life, with the increasing number of multinational residents, there is a need for efficient apartment management and smooth communication. However, current technology does not adequately connect these aspects, and there are challenges in establishing sufficient means of providing information that accommodates diverse languages and cultures. In particular, real-time and multilingual support are currently insufficient when it comes to understanding the usage status of public facilities and providing event information.
[0467] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0468] In this invention, the server includes natural language processing means, translation means, and output means for an assistant program specifically designed for apartment building management. This makes it possible to analyze user inquiries, generate appropriate responses, and provide those responses translated into the user's native language. Furthermore, by adding information acquisition means for providing real-time information on the usage status of public facilities and information provision means for providing event information to users, it becomes possible to facilitate communication among residents and enable efficient use of urban functions.
[0469] An "assistant program" is software designed to respond to user inquiries and provide information fairly and efficiently.
[0470] "Natural language processing means" refers to technologies for understanding and analyzing inquiries from users in the form of text or speech.
[0471] "Translation means" refers to a function that converts the generated response into a language that is easy for the user to understand.
[0472] "Output means" refers to a device or interface for providing the generated response to the user visually or audibly.
[0473] A "learning method" is an algorithm that analyzes past user interaction data and uses that information to improve the accuracy of future responses.
[0474] A "database" is an information management system that stores information related to apartment management and user inquiries, and makes it accessible as needed.
[0475] "Information acquisition means" refers to technologies for collecting and managing information on the usage status of public facilities and other related information in real time.
[0476] "Information provision means" refers to a system that provides users with information on city events and other related information based on collected data.
[0477] "Speech recognition" is a technology that analyzes audio from meetings or users and converts it into text.
[0478] The "data organization function" is a function that structures acquired data and uses it for storage and analysis.
[0479] In its embodiment, this system primarily consists of a server, a terminal, and a user. The server incorporates natural language processing technology to analyze user inquiries. Specifically, inquiries sent by voice are converted into text using the Google Cloud Speech-to-Text API, while inquiries sent as text are used as-is. Then, AWS Lambda is used to generate an appropriate response based on the analyzed content.
[0480] The server retrieves necessary information from the apartment management database and determines what to provide to the user. This information is translated into the required language using the Microsoft Translator Text API for multilingual support. The final response is sent to the user's device and output as either audio or text.
[0481] Users interact with this system via devices such as smartphones and tablets. These devices provide the user interface, accept voice and text input, and send queries to the server. The resulting information is presented to the user visually or audibly.
[0482] For example, if a user enters the question "Tell me about next week's events" into their device, the server will analyze the request and send a prompt message like the following to the AI model to retrieve relevant event information.
[0483] Example prompt: "Please provide information on community events in this area over the next week."
[0484] This embodiment allows users, regardless of their nationality, to easily access and utilize resources and event information in the city where they reside.
[0485] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0486] Step 1:
[0487] Users input questions via voice or text into a device such as a smartphone. Voice input is converted into text data using the device's speech recognition software. For voice input, the Google Cloud Speech-to-Text API is used to ensure accurate text conversion.
[0488] Step 2:
[0489] The terminal sends the converted text to the server. A secure communication protocol over the internet is used for this data transmission. The server analyzes the received text data and uses natural language processing techniques to understand what the query is.
[0490] Step 3:
[0491] The server uses AWS Lambda to parse text data and retrieve the necessary information from the database. For example, it might search a condominium management database to retrieve public event information. At this stage, the response content is determined according to the user's request.
[0492] Step 4:
[0493] The server generates a response based on the acquired information and, if necessary, performs multilingual translation using the Microsoft Translator Text API. The languages to be translated are obtained from the user's configuration information.
[0494] Step 5:
[0495] The server sends the final response to the terminal. The terminal presents the received response to the user visually or audibly. This output is done through text display or speech output using speech synthesis software.
[0496] Step 6:
[0497] Based on the information obtained, the user decides on their next action. The user interaction data obtained during this process is used to improve future response accuracy through the system's continuous learning mechanisms.
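Step 4 above states that translation is performed only if necessary and that the target language is obtained from the user's configuration information. A minimal sketch of that decision follows. The names `UserConfig`, `Translator`, `localize`, and `fake_translator` are hypothetical, and the translator is a stub rather than a call to any real translation API.

```python
from dataclasses import dataclass
from typing import Callable

# Signature of a hypothetical translation backend: (text, source, target) -> text.
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class UserConfig:
    user_id: str
    language: str  # language code taken from the user's configuration, e.g. "en"


def localize(response: str, source_lang: str, config: UserConfig,
             translator: Translator) -> str:
    """Translate only when the user's configured language differs from the source."""
    if config.language == source_lang:
        return response
    return translator(response, source_lang, config.language)


def fake_translator(text: str, source: str, target: str) -> str:
    """Stub used for illustration; it only tags the text with the language pair."""
    return f"({source}->{target}) {text}"
```

Passing the translator as a parameter keeps the decision logic independent of whichever translation service is actually used.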
[0498] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0499] This invention provides a more personalized user experience by combining an emotion engine with an assistant system specifically designed for apartment building management. This system functions through the coordinated interaction of servers, terminals, and users.
[0500] The server has a natural language processing engine that analyzes user inquiries, analyzing the received content and generating appropriate responses. Furthermore, it uses an emotion engine to identify emotions when analyzing user input. This emotion information is stored in a database and used to improve future responses. The server also translates the generated responses into the user's native language and adjusts the content appropriately to provide personalized feedback.
[0501] The terminal functions as a user interface, receiving input from the user. For example, if a user makes an inquiry accompanied by emotion, such as "I'm tired today, so I'd like to finish this quickly," the terminal sends that voice or text to the server. The terminal also receives a response from the server, displays it on the screen, and uses speech synthesis to convey the response to the user verbally. Simultaneously, it is possible to adjust the tone of the interface and the information presented according to the user's emotional state.
[0502] By making emotionally charged inquiries, users can receive more personalized responses from the system. For example, if a user inquiring about the schedule of a board meeting expresses an emotion such as "I'm worried," the server will recognize this emotion and provide additional information, such as advice on the meeting's content and preparation. In this way, the system, equipped with an emotion engine, provides user-centric support and facilitates reliable condominium management.
[0503] The following describes the processing flow.
[0504] Step 1:
[0505] Users enter their inquiries via voice or text through their device. For example, they might ask, "I want to know the date of the next board meeting, but I'm worried."
[0506] Step 2:
[0507] If the device receives voice input, it uses speech recognition to convert the voice data into text. Then, it sends the user's inquiry to the server.
[0508] Step 3:
[0509] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry and the user's emotional state. In this case, it understands that the user is asking about the board meeting schedule and that they are feeling "worried."
[0510] Step 4:
[0511] The server uses an emotion engine to identify emotion information, which is then stored in a database and used to generate responses, as described below. Emotion data is continuously accumulated to accommodate future inquiries.
[0512] Step 5:
[0513] The server searches the apartment building's database to retrieve the date of the next board meeting. During this process, it considers the user's emotional state and generates a response that includes a summary of the meeting content and advice on preparation.
[0514] Step 6:
[0515] The server-generated response is translated into the user's native language, preparing a more personalized result.
[0516] Step 7:
[0517] The server sends the translated response to the terminal.
[0518] Step 8:
[0519] The device displays the information it receives on the screen and, if necessary, communicates it to the user via voice using speech synthesis technology. Simultaneously, it adds supplementary information tailored to the user's emotions and reassuring messages.
[0520] Step 9:
[0521] The server records the data from this entire interaction in the system and uses it as training data to improve response accuracy in future interactions.
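The emotion-related steps of the flow above (estimating the emotion, accumulating it, and adding supplementary information to the response) can be illustrated as follows. This is a sketch under assumptions: a real emotion engine would use a trained model such as the emotion identification model 59, whereas this example uses hypothetical cue words, and the names `EMOTION_CUES`, `SUPPLEMENTS`, `estimate_emotion`, and `respond` as well as the supplementary messages are invented for illustration.

```python
# Hypothetical cue words per emotion label; a real emotion engine would use a model.
EMOTION_CUES = {
    "worried": ("worried", "anxious", "uneasy"),
    "tired": ("tired", "exhausted"),
}

# Hypothetical supplementary messages added for each emotion (Steps 5 and 8).
SUPPLEMENTS = {
    "worried": "A summary of the agenda and preparation tips are attached.",
    "tired": "Here is the short version.",
}


def estimate_emotion(text: str) -> str:
    """Return the first emotion label whose cue word appears in the text."""
    lowered = text.lower()
    for label, cues in EMOTION_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "neutral"


def respond(inquiry: str, base_answer: str, emotion_log: list[str]) -> str:
    """Attach an emotion-specific supplement to the base answer and log the emotion."""
    emotion = estimate_emotion(inquiry)
    emotion_log.append(emotion)  # Step 4: accumulate emotion data for later use
    supplement = SUPPLEMENTS.get(emotion)
    return f"{base_answer} {supplement}" if supplement else base_answer
```

Here `base_answer` stands for the result of the database search in Step 5, and `emotion_log` stands for the stored emotion data.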
[0522] (Example 2)
[0523] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0524] In condominium management, there is a need to respond quickly and appropriately to a wide range of inquiries from residents. However, conventional systems have difficulty considering the feelings of residents, and language barriers exist when dealing with multinational residents, resulting in challenges to the efficiency and accuracy of communication. Furthermore, there is a lack of mechanisms for accurately recording and organizing important information such as meetings. Therefore, a new system is needed to support user-friendly and reliable condominium management.
[0525] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0526] In this invention, the server includes information processing means for analyzing user inquiries and generating appropriate responses, information conversion means for translating the responses into the user's native language, and adaptive learning means for accumulating and analyzing the user's past interaction data and emotional data to improve future responses. This enables personalized responses that take into account the user's emotions, and further facilitates smooth communication among multinational users and accurate recording and organization of information.
[0527] "Information processing means" refers to a device or program that has the function of analyzing input data and generating a response according to a specific purpose.
[0528] "Information conversion means" refers to technologies or devices for converting information output in one language into another language.
[0529] "Adaptive learning means" refers to a device or program that learns by analyzing past data and interactions in order to improve the quality of subsequent responses.
[0530] An "information storage device" is a device or program that securely stores organizational information and user information for management purposes and retrieves that data as needed.
[0531] "Emotion analysis means" refers to a technology or device that identifies a user's emotional state based on input data and reflects that state in the response.
[0532] "Output assistance means" refers to technologies or devices that provide assistance functions for effectively conveying information to users through sight or hearing.
[0533] The "acoustic recognition and information organization function" is a function that converts audio data into text or other formats, and efficiently organizes and stores the information.
[0534] This invention is an assistant system specifically designed for apartment building management, with the entire process realized through the cooperation of a server, terminals, and users. The following describes each component and its role in detail.
[0535] Server
[0536] The server has information processing capabilities to analyze inquiries received from users. Specifically, it performs analysis using natural language processing (NLP), utilizing machine learning libraries such as TensorFlow and PyTorch, which are available for the Python language. Furthermore, as the information conversion means, it uses an open translation API to translate the generated response into the user's native language. A service that provides general translation technology is used for this API.
[0537] Furthermore, the server accumulates past user interaction and sentiment data through adaptive learning mechanisms and learns to improve future responses. This process is combined with storage technologies such as database management systems (relational or document-based).
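As one way to picture the accumulation of interaction and emotion data in a relational database, the following sketch uses the SQLite module from the Python standard library. The table layout and the function names `open_store`, `record`, and `emotion_counts` are assumptions made for illustration; the disclosure does not specify a schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store for interaction records; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY, user_id TEXT, query TEXT, "
        "emotion TEXT, feedback INTEGER)"
    )
    return conn


def record(conn: sqlite3.Connection, user_id: str, query: str,
           emotion: str, feedback=None) -> None:
    """Store one interaction together with the emotion estimated for it."""
    conn.execute(
        "INSERT INTO interactions (user_id, query, emotion, feedback) "
        "VALUES (?, ?, ?, ?)",
        (user_id, query, emotion, feedback),
    )
    conn.commit()


def emotion_counts(conn: sqlite3.Connection, user_id: str) -> dict:
    """Summarize how often each emotion was observed for a user."""
    rows = conn.execute(
        "SELECT emotion, COUNT(*) FROM interactions "
        "WHERE user_id = ? GROUP BY emotion",
        (user_id,),
    )
    return dict(rows.fetchall())
```

A summary such as the one returned by `emotion_counts` is one example of accumulated data that an adaptive learning means could draw on when adjusting future responses.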
[0538] Terminal
[0539] The terminal acts as a user interface. It receives voice or text input from the user and sends it to the server. Terminals are typically implemented on platforms using Node.js or React Native. Furthermore, as an output assistance method, it presents response information sent from the server to the user in voice or text. In this case, a speech synthesis engine is used to perform the voice output.
[0540] User
[0541] Users make emotionally charged inquiries through their devices. A concrete example of a prompt might be, "When is the next board meeting? I'm a little worried." This allows users to expect more appropriate and personalized responses from the system.
[0542] Thus, the system of the present invention utilizes bidirectional communication between the server and the terminal to provide users with intelligent and flexible support for condominium management. As a result, users can confidently entrust their condominium management to the system.
[0543] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0544] Step 1:
[0545] The terminal receives input from the user. The user makes inquiries using a simple interface, either via text or voice. This input is sent to the terminal as digital data through the microphone or keyboard. The terminal then sends the input data directly to the server. The input is the user's inquiry, and the output is the inquiry data sent to the server.
[0546] Step 2:
[0547] The server analyzes the received query data using the information processing means. Natural language processing techniques are employed, particularly generative AI models built with frameworks such as TensorFlow and PyTorch, to understand the meaning and context of the text. The input is the query data received from the terminal, and the output is the subject and intent of the query identified through analysis.
[0548] Step 3:
[0549] The server generates an appropriate response based on the analysis results. During this generation process, sentiment analysis is performed, incorporating content that takes the user's emotions into account. An information conversion mechanism is also used to translate the generated response into the user's native language. The input consists of the analysis results and sentiment information, while the output is the translated response data.
[0550] Step 4:
[0551] The server sends the generated response to the terminal. This process uses a real-time communication protocol for rapid data transmission. The input is the response data generated by the server, and the output is the data sent to the terminal.
[0552] Step 5:
[0553] The terminal outputs the response received from the server to the user interface. Specifically, it either displays the response as text on the screen or conveys the response to the user as voice using a speech synthesis engine. The input is the response data received from the server, and the output is the content of the response presented to the user.
[0554] Step 6:
[0555] Based on the response received, the user decides on their next action. They can either continue operating the system or enter feedback via the terminal. User feedback helps in the continuous improvement of the system. Inputs are responses from the server and new user inquiries, while outputs are feedback and subsequent inquiries via the terminal.
[0556] (Application Example 2)
[0557] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0558] In modern living environments, apartment management is required to provide prompt and emotionally sensitive responses to complex inquiries and requests. However, conventional systems struggle to provide personalized responses that are sensitive to the user's feelings, and thus fail to alleviate the stress and dissatisfaction that arises. There is a need for an interface that can solve this problem and improve user satisfaction.
[0559] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0560] In this invention, the server includes natural language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and emotion analysis means for analyzing the user's emotions using emotion recognition technology and providing personalized responses. This makes it possible to provide personalized and considerate responses that are in line with the user's emotions.
[0561] "Natural language processing means" refers to technologies that analyze human language queries, convert them into a format that a computer can understand, and generate appropriate responses.
[0562] "Translation means" refers to technology for appropriately converting the generated response into the user's native language.
[0563] "Output means" refers to technologies for providing responses generated through a user interface as audio or text.
[0564] "Learning methods" refer to data analysis techniques that accumulate and analyze users' past interaction data in order to improve future responses.
[0565] "Emotion analysis means" refers to technology that recognizes emotions from input user data and provides personalized responses according to the user's emotional state.
[0566] A "human-supportive interface" is a user interface that connects the user and the system, providing emotionally sensitive dialogue in response to residents' inquiries.
[0567] A "database" is an information recording system that stores information related to an administrative organization and supports user engagement.
[0568] The system for implementing this invention mainly consists of three elements: a server, a terminal, and a user.
[0569] The server analyzes inquiries sent by users using natural language processing (NLP) technology. The server is equipped with an NLP engine that converts input data into a computer-processable format. Specifically, it uses tools such as the Google Cloud Natural Language API to analyze human language and understand its meaning. Furthermore, the server utilizes sentiment analysis technology to analyze the user's emotions and uses that emotional information to generate personalized responses. This process employs a sentiment analysis engine to identify emotions and customize responses based on user input and past interaction data.
[0570] Furthermore, the generated response is translated into the user's native language using a translation tool. For this purpose, the server utilizes translation engines such as the Google Translate API. This translation makes it possible to support users from multiple nationalities.
[0571] The terminal functions as a user interface. For example, a smartphone or robot uses voice input to receive inquiries from the user and sends that data to the server. The response returned from the server is then communicated to the user in voice format using the terminal's speech synthesis software (e.g., Amazon Polly).
[0572] Users can receive more personalized responses through the system. For example, in response to an inquiry such as, "Recently, the parking lot lighting seems dim," the server can provide an emotionally sensitive response such as, "Please rest assured, we will review the situation and arrange for environmental improvements." In this way, it is possible to create a comfortable living environment for users. An example of a prompt might be, "Generate an appropriate response when a user is feeling uneasy about their living environment." Based on this prompt, the AI model generates an effective response.
[0573] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0574] Step 1:
[0575] The user inputs their inquiry by voice through their smartphone or the robot's microphone. This voice data becomes the input. The device uses its built-in speech recognition software to convert this voice data into text data. This text data is output and sent to the next processing step.
[0576] Step 2:
[0577] The terminal sends text data to the server. The server runs a natural language processing engine based on the received text data. Here, the input text is analyzed to understand the intent of the query. The result of this process is analyzed intent data.
[0578] Step 3:
[0579] The server runs an emotion analysis engine using the analyzed intent data. The emotion analysis engine identifies the emotions contained in the user's statements. For example, emotion labels such as "anxiety," "reassurance," and "interest" are output. This data is used when generating responses.
[0580] Step 4:
[0581] The server uses a generative AI model to create prompts based on the analysis results, including emotion labels. These prompts form the basis for generating specific responses. For example, a prompt such as "Generate an appropriate response when the user is feeling anxious" might be generated. The generative AI model then creates the response based on these prompts.
[0582] Step 5:
[0583] The generated response is fed into the server's translation engine and translated into the user's native language to accommodate multinational users. This translated response is then sent to the terminal as the final output.
[0584] Step 6:
[0585] The device receives the translated response and outputs it as speech using its speech synthesis function. The response is then played back to the user through the speaker of their smartphone or robot. This allows the user to receive a natural-sounding response.
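Step 4 above, in which a prompt is created from the analysis results including the emotion label, can be sketched as a simple template. The mapping `FEELING_PHRASES` and the function `build_prompt` are hypothetical names, and the prompt wording follows the example given in Step 4. How the prompt is then passed to a generative AI model is outside this sketch.

```python
# Hypothetical mapping from the emotion labels of Step 3 to wording for the prompt.
FEELING_PHRASES = {
    "anxiety": "anxious",
    "reassurance": "reassured",
    "interest": "interested",
}


def build_prompt(inquiry: str, emotion_label: str) -> str:
    """Compose a prompt for the generative AI model from the inquiry and emotion label."""
    feeling = FEELING_PHRASES.get(emotion_label)
    if feeling is None:
        # Unknown label: fall back to an instruction without emotional context.
        instruction = "Generate an appropriate response to the user."
    else:
        instruction = (
            f"Generate an appropriate response when the user is feeling {feeling}."
        )
    return f'{instruction}\nUser inquiry: "{inquiry}"'


if __name__ == "__main__":
    print(build_prompt("Recently, the parking lot lighting seems dim.", "anxiety"))
```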
[0586] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0587] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search, URL: https://openai.com/blog/chatgpt) and Gemini (Internet search, URL: https://gemini.google.com/?hl=ja). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
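The relationship between a prompt, the inference data, and the inference result described above can be pictured with a small data structure. This is an illustrative sketch only: `InferenceData`, `ModelRequest`, `run_inference`, and `echo_model` are hypothetical names, and `echo_model` is a stand-in that does not perform any actual inference.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal, Union


@dataclass
class InferenceData:
    kind: Literal["audio", "text", "image"]  # type of data given to the model
    payload: Union[bytes, str]               # raw audio/image bytes or text


@dataclass
class ModelRequest:
    prompt: str                               # instruction for the model
    data: list[InferenceData] = field(default_factory=list)


def run_inference(model: Callable[[ModelRequest], str],
                  request: ModelRequest) -> str:
    """Pass the request to a model after checking that it carries an instruction."""
    if not request.prompt.strip():
        raise ValueError("the prompt must contain an instruction")
    return model(request)


def echo_model(request: ModelRequest) -> str:
    """Stand-in model: reports what it was given instead of inferring anything."""
    kinds = ", ".join(item.kind for item in request.data)
    return f"instruction={request.prompt!r}; inputs=[{kinds}]"
```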
[0588] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the headset-type terminal 314.
[0589] [Fourth Embodiment]
[0590] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0591] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0592] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0593] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0594] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0595] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0596] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0597] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0598] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0599] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0600] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0601] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0602] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0603] The present invention's apartment management assistant system consists of a server, terminals, and users. This system uses an assistant program specifically designed for apartment management to provide information and support to residents and board members, thereby streamlining apartment management and facilitating communication.
[0604] The server manages a database containing apartment-specific rules and know-how, and is equipped with natural language processing technology to analyze each user's inquiry. It processes incoming inquiries and generates appropriate responses. It then provides a translation function to translate those responses into the user's native language. The translated responses are delivered to the user via their terminal, allowing the user to make decisions based on that information.
[0605] The terminal provides a user interface and sends user inquiries to the server via voice or text input. This allows users to easily utilize the assistant system. The terminal also receives information from the server and presents it to the user visually or audibly. This functionality enables smooth communication among residents of various nationalities.
[0606] Users can check daily management information, view board meeting minutes, and ask questions about changes to condominium rules. For example, if a user asks a question to the terminal such as, "Please tell me about the condominium's garbage disposal rules," the server immediately analyzes the content, searches the database for relevant information, translates it as needed, and provides it to the user. In this way, this system automates and streamlines many of the tasks necessary for condominium management, and acts as an aid to make residents' lives more comfortable.
[0607] The following describes the processing flow.
[0608] Step 1:
[0609] Users enter their inquiries via voice or text through their device. For example, they might ask, "What is the date of the next board meeting?"
[0610] Step 2:
[0611] If the device receives voice input, it uses speech recognition to convert the audio into text data. Then, it sends the user's inquiry to the server.
[0612] Step 3:
[0613] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry. In this case, it understands that the user is asking about the board meeting schedule.
[0614] Step 4:
[0615] The server searches the apartment building database to retrieve the date of the next board meeting. During this process, it references the user's account information to identify the apartment building in which the user resides.
[0616] Step 5:
[0617] The server translates the search results into the user's native language. For example, for a user whose native language is Japanese, the server will generate a response in Japanese.
[0618] Step 6:
[0619] The server sends the translated response to the terminal.
[0620] Step 7:
[0621] The device displays the information it receives on the screen and, if necessary, uses speech synthesis technology to notify the user via voice. This allows the user to receive necessary information instantly through the device.
[0622] Step 8:
[0623] The server stores user inquiries and responses in a database and uses them as training data to improve the accuracy of responses to future inquiries.
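Steps 3 to 8 above can be sketched as follows. This is a minimal illustration only, not the implementation of the present disclosure: the in-memory tables, the keyword-based intent rule, the translation templates, and all names (`BOARD_MEETINGS`, `USERS`, `identify_intent`, `handle_inquiry`, `InteractionLog`) are assumptions introduced for the example. An actual system would use a natural language processing engine, a real database, and a translation service.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical in-memory stand-ins for the condominium database (assumption).
BOARD_MEETINGS = {"bldg-001": date(2025, 1, 18)}
USERS = {"u1": {"building": "bldg-001", "language": "ja"}}

# Toy per-language templates standing in for a translation service (assumption).
TEMPLATES = {
    "en": "The next board meeting is on {d}.",
    "ja": "次回の理事会は{d}です。",
}


@dataclass
class InteractionLog:
    """Step 8: keeps inquiry/response pairs for later use as training data."""
    records: list = field(default_factory=list)

    def store(self, user_id, inquiry, response):
        self.records.append(
            {"user": user_id, "inquiry": inquiry, "response": response}
        )


def identify_intent(text):
    """Step 3: very rough keyword rule in place of an NLP engine."""
    lowered = text.lower()
    if "board meeting" in lowered and ("date" in lowered or "when" in lowered):
        return "board_meeting_date"
    return "unknown"


def handle_inquiry(user_id, text, log):
    user = USERS[user_id]
    intent = identify_intent(text)
    if intent == "board_meeting_date":
        # Step 4: look up the meeting for the building the user resides in.
        meeting = BOARD_MEETINGS[user["building"]]
        # Step 5: produce the response in the user's native language.
        template = TEMPLATES.get(user["language"], TEMPLATES["en"])
        response = template.format(d=meeting.isoformat())
    else:
        response = "Sorry, I could not understand the inquiry."
    # Step 8: store the exchange.
    log.store(user_id, text, response)
    # Step 6: the return value is what would be sent to the terminal.
    return response
```

For a user whose registered language is Japanese, calling `handle_inquiry("u1", "What is the date of the next board meeting?", InteractionLog())` returns the Japanese template filled with the stored date.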
[0624] (Example 1)
[0625] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0626] In condominium management, it is essential that residents and board members can quickly and efficiently obtain necessary information and communicate smoothly among residents of various nationalities. However, language barriers and the lack of timeliness of information are obstacles to achieving these goals. Furthermore, streamlining management operations while promoting resident involvement is also a crucial challenge.
[0627] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0628] In this invention, the server includes language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and output means for providing the responses via a user interface. This enables rapid information provision in apartment management and smooth communication between multiple languages.
[0629] "Language processing means" refers to technology that has the function of analyzing natural language queries input by users and generating appropriate responses.
[0630] "Translation means" refers to technology that converts the generated response into the user's native language, and is a means of providing information that transcends language differences.
[0631] "Output means" refers to technology for presenting translated responses visually or audibly through a user interface.
[0632] A "learning method" is a method that uses machine learning techniques to accumulate data on users' past interactions and improve responses to future inquiries.
[0633] "Information storage means" refers to a means of storing information of a management organization in a database and providing users with access to that information when needed.
[0634] "Generation means" refers to a technology that uses a generation AI model to acquire and utilize necessary information based on prompt text.
[0635] An "input conversion means" is a technology that accurately converts voice input from a user into text data.
[0636] A "speech generation means" is a device that has the function of outputting text data as speech using speech synthesis technology.
[0637] The apartment building management assistant system consists of a server, terminals, and users. The server has language processing capabilities that incorporate natural language processing technology to analyze user inquiries. This system utilizes advanced databases and machine learning technologies to provide fast and accurate responses to various inquiries.
[0638] Specifically, the server utilizes cloud services, and common cloud-based APIs are used for natural language processing. For example, natural language processing APIs can be used for language processing, and online translation services can be used for translation. The terminal also provides a user interface and uses speech recognition technology to convert the user's voice input into text. Speech recognition software can be used for this conversion. The terminal also utilizes speech synthesis technology to provide information received from the server visually or audibly.
[0639] This system allows users to inquire about a variety of management-related information. For example, by entering a prompt such as "Please tell me the date and time of the next board meeting," users can instantly obtain the necessary information. By using a generative AI model, this system helps users obtain information clearly and specifically, thereby streamlining the management of condominium management information.
[0640] For example, if a user tells their terminal, "Please tell me the latest rules regarding pet ownership," the server instantly searches for relevant data and provides the necessary information in the user's native language. In this way, smooth information sharing and communication are possible even in a diverse international living environment. This system contributes to increased efficiency in management operations and improved quality of life for both apartment building managers and residents.
[0641] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0642] Step 1:
[0643] The user enters their question through their device. Voice input is converted into text data using speech recognition technology. This process prepares the user's voice and input text for transmission to the server.
[0644] Step 2:
[0645] The terminal receives user inquiry data via voice or text and sends it to the server through a secure protocol. The input is the user's inquiry text, and the output is the data sent to the server.
[0646] Step 3:
[0647] The server receives user queries and analyzes the intent of those queries using natural language processing techniques. This analysis specifically includes extracting important keywords and meanings. The input is the user's text data, and the output is the analyzed data.
[0648] Step 4:
[0649] The server searches the database for relevant information based on the analysis results. This search utilizes a generative AI model to enhance the database query with prompt statements. The input is the analyzed keywords, and the output is the relevant data.
[0650] Step 5:
[0651] The server constructs a response to the user's inquiry based on the search results. The constructed response is automatically translated into the user's native language. The input is information from the database, and the output is the translated response data.
[0652] Step 6:
[0653] The server sends the final response to the terminal. The terminal presents the received translated response to the user. This presentation is done via screen display or voice output using speech synthesis technology. The input is the response data from the server, and the output is the information provided to the user.
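Steps 3 and 4 of this flow (keyword extraction, database search, and a prompt statement for the generative AI model) might look like the sketch below. The stop-word list, the sample records, the word-overlap score, and the function names are all hypothetical. The disclosure does not specify how keywords are extracted or how records are ranked, so simple word overlap is used here purely as a stand-in.

```python
import re

# Hypothetical stop words removed before matching (assumption).
STOPWORDS = {
    "please", "tell", "me", "the", "a", "an", "of", "about",
    "is", "what", "and", "to", "regarding",
}

# Hypothetical database records (assumption).
RECORDS = [
    {"id": "rule-pets",
     "text": "Latest rules regarding pet ownership: one small pet per unit."},
    {"id": "rule-garbage",
     "text": "Garbage disposal rules: burnable waste is collected on Monday and Thursday."},
    {"id": "board-next",
     "text": "The next board meeting date and time will be posted on the notice board."},
]


def _words(text):
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(inquiry):
    """Step 3: keep the content words of the inquiry."""
    return {w for w in _words(inquiry) if w not in STOPWORDS}


def search(keywords, records=RECORDS):
    """Step 4: rank records by how many keywords they share with the inquiry."""
    def score(record):
        return len(keywords & set(_words(record["text"])))

    ranked = sorted(records, key=score, reverse=True)
    return [r for r in ranked if score(r) > 0]


def build_prompt(inquiry, hits):
    """Step 4: prompt statement that grounds the model in the retrieved records."""
    context = "\n".join(f"- {h['text']}" for h in hits)
    return (
        "Answer the resident's inquiry using only the records below.\n"
        f"Records:\n{context}\n"
        f"Inquiry: {inquiry}"
    )
```

With the inquiry "Please tell me the latest rules regarding pet ownership", the pet-ownership record ranks first because it shares the most keywords, and the board-meeting record is excluded because it shares none.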
[0654] (Application Example 1)
[0655] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0656] In urban life, with the increasing number of multinational residents, there is a need for efficient apartment management and smooth communication. However, current technology does not adequately connect these aspects, and there are challenges in establishing sufficient means of providing information that accommodates diverse languages and cultures. In particular, real-time and multilingual support are currently insufficient when it comes to understanding the usage status of public facilities and providing event information.
[0657] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0658] In this invention, the server includes natural language processing means, translation means, and output means for an assistant program specifically designed for apartment building management. This makes it possible to analyze user inquiries, generate appropriate responses, and provide those responses translated into the user's native language. Furthermore, by adding information acquisition means for providing real-time information on the usage status of public facilities and information provision means for providing event information to users, it becomes possible to facilitate communication among residents and enable efficient use of urban functions.
[0659] An "assistant program" is software designed to respond to user inquiries and provide information fairly and efficiently.
[0660] "Natural language processing means" refers to technologies for understanding and analyzing inquiries from users in the form of text or speech.
[0661] "Translation means" refers to a function that converts the response into a language that is easy for the user to understand.
[0662] "Output means" refers to a device or interface for providing the generated response to the user visually or audibly.
[0663] A "learning method" is an algorithm that analyzes past user interaction data and uses that information to improve the accuracy of future responses.
[0664] A "database" is an information management system that stores information related to apartment management and user inquiries, and makes it accessible as needed.
[0665] "Information acquisition means" refers to technologies for collecting and managing information on the usage status of public facilities and other related information in real time.
[0666] "Information provision means" refers to a system that provides users with information on city events and other related information based on collected data.
[0667] "Speech recognition" is a technology that analyzes audio from meetings or users and converts it into text.
[0668] The "data organization function" is a function that structures acquired data and uses it for storage and analysis.
[0669] In this embodiment, the system primarily consists of a server, a terminal, and a user. The server incorporates natural language processing technology to analyze user inquiries. Specifically, inquiries received by voice are converted into text using the Google Cloud Speech-to-Text API, while inquiries received as text need no conversion. The server then uses AWS Lambda to generate an appropriate response based on the analyzed content.
[0670] The server retrieves necessary information from the apartment management database and determines what to provide to the user. This information is translated into the required language using the Microsoft Translator Text API for multilingual support. The final response is sent to the user's device and output as either audio or text.
[0671] Users interact with this system via devices such as smartphones and tablets. These devices provide the user interface, accept voice and text input, and send queries to the server. The resulting information is presented to the user visually or audibly.
[0672] For example, if a user enters the question "Tell me about next week's events" into their device, the server will analyze the request and send a prompt message like the following to the AI model to retrieve relevant event information.
[0673] Example prompt: "Please provide information on community events in this area over the next week."
[0674] This embodiment allows users, regardless of their nationality, to easily access and utilize resources and event information in the city where they reside.
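One way the server could turn a request such as "Tell me about next week's events" into the example prompt is sketched below. The interpretation of "the next week" as the seven days following the current date, and the names `next_week_range` and `build_event_prompt`, are assumptions made for illustration. The disclosure gives only the prompt wording.

```python
from datetime import date, timedelta


def next_week_range(today):
    """Return (start, end) covering the seven days after `today` (assumed meaning)."""
    start = today + timedelta(days=1)
    end = today + timedelta(days=7)
    return start, end


def build_event_prompt(area, today):
    """Build a prompt for the AI model with an explicit date range."""
    start, end = next_week_range(today)
    return (
        f"Please provide information on community events in {area} "
        f"from {start.isoformat()} to {end.isoformat()}."
    )
```

Stating the dates explicitly in the prompt avoids leaving the meaning of "next week" to the model.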
[0675] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0676] Step 1:
[0677] Users input questions via voice or text into a device such as a smartphone. Voice input is converted into text data using the device's speech recognition software. In that case, the Google Cloud Speech-to-Text API is used to ensure accurate text conversion.
[0678] Step 2:
[0679] The terminal sends the converted text to the server. A secure communication protocol over the internet is used for this data transmission. The server analyzes the received text data and uses natural language processing techniques to understand what the query is.
[0680] Step 3:
[0681] The server uses AWS Lambda to parse text data and retrieve the necessary information from the database. For example, it might search a condominium management database to retrieve public event information. At this stage, the response content is determined according to the user's request.
[0682] Step 4:
[0683] The server generates a response based on the acquired information and, if necessary, translates it using the Microsoft Translator Text API. The target language is obtained from the user's configuration information.
[0684] Step 5:
[0685] The server sends the final response to the terminal. The terminal presents the received response to the user visually or audibly. This output is done through text display or speech output using speech synthesis software.
[0686] Step 6:
[0687] Based on the information obtained, the user decides on their next action. The user interaction data obtained during this process is used to improve future response accuracy through the system's continuous learning mechanisms.
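The "if necessary" translation in Step 4 can be illustrated as follows. This is a sketch under stated assumptions: the `language` key in the user's configuration information, the `localize` function, and the `fake_translate` stand-in are hypothetical. In the embodiment, the injected `translate` callable would wrap the translation service rather than the placeholder shown here.

```python
from typing import Callable, Dict


def localize(
    response: str,
    source_lang: str,
    user_config: Dict[str, str],
    translate: Callable[[str, str, str], str],
) -> str:
    """Translate `response` only when the user's language differs from the source."""
    # Fall back to the source language when no language is configured.
    target = user_config.get("language", source_lang)
    if target == source_lang:
        return response
    return translate(response, source_lang, target)


def fake_translate(text: str, source: str, target: str) -> str:
    """Placeholder translator used only for this example (assumption)."""
    return f"[{source}->{target}] {text}"
```

Passing the translator as a parameter keeps the decision logic testable without network access or credentials.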
[0688] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0689] This invention provides a more personalized user experience by combining an emotion engine with an assistant system specifically designed for apartment building management. This system functions through the coordinated interaction of servers, terminals, and users.
[0690] The server has a natural language processing engine that analyzes user inquiries, analyzing the received content and generating appropriate responses. Furthermore, it uses an emotion engine to identify emotions when analyzing user input. This emotion information is stored in a database and used to improve future responses. The server also translates the generated responses into the user's native language and adjusts the content appropriately to provide personalized feedback.
[0691] The terminal functions as a user interface, receiving input from the user. For example, if a user makes an inquiry accompanied by emotion, such as "I'm tired today, so I'd like to finish this quickly," the terminal sends that voice or text to the server. The terminal also receives a response from the server, displays it on the screen, and uses speech synthesis to convey the response to the user verbally. Simultaneously, it is possible to adjust the tone of the interface and the information presented according to the user's emotional state.
[0692] By making emotionally charged inquiries, users can receive more personalized responses from the system. For example, if a user inquiring about the schedule of a board meeting expresses an emotion such as "I'm worried," the server will recognize this emotion and provide additional information, such as advice on the meeting's content and preparation. In this way, the system, equipped with an emotion engine, provides user-centric support and facilitates reliable condominium management.
[0693] The following describes the processing flow.
[0694] Step 1:
[0695] Users enter their inquiries via voice or text through their device. For example, they might ask, "I want to know the date of the next board meeting, but I'm worried."
[0696] Step 2:
[0697] If the device receives voice input, it uses speech recognition to convert the voice data into text. Then, it sends the user's inquiry to the server.
[0698] Step 3:
[0699] The server analyzes the received text data using a natural language processing engine to identify the intent of the inquiry and the user's emotional state. In this case, it understands that the user is asking about the board meeting schedule and that they are feeling "worried."
[0700] Step 4:
[0701] The server uses an emotion engine to identify emotion information, which is then stored in a database and used to generate responses, as described below. Emotion data is continuously accumulated to accommodate future inquiries.
[0702] Step 5:
[0703] The server searches the apartment building's database to retrieve the date of the next board meeting. During this process, it considers the user's emotional state and generates a response that includes a summary of the meeting content and advice on preparation.
[0704] Step 6:
[0705] The server-generated response is translated into the user's native language, preparing a more personalized result.
[0706] Step 7:
[0707] The server sends the translated response to the terminal.
[0708] Step 8:
[0709] The device displays the information it receives on the screen and, if necessary, communicates it to the user via voice using speech synthesis technology. Simultaneously, it adds supplementary information tailored to the user's emotions and reassuring messages.
[0710] Step 9:
[0711] The server records the data from this entire interaction in the system and uses it as training data to improve response accuracy in future interactions.
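The emotion-aware part of this flow (Steps 3 to 5 and the supplementary message of Step 8) can be sketched as below. The cue words, the emotion labels, the supplementary sentences, and the function names are assumptions for illustration. The emotion identification model 59 is replaced here by simple substring matching, which is far cruder than a trained model.

```python
# Hypothetical cue words per emotion label (assumption).
EMOTION_CUES = {
    "worried": ("worried", "anxious", "uneasy"),
    "tired": ("tired", "exhausted"),
}

# Hypothetical supplementary text added for each emotion (assumption).
SUPPLEMENTS = {
    "worried": "Here is a short summary of the agenda and how to prepare, "
               "so you can attend with confidence.",
    "tired": "Here is the short version.",
}

# Step 4: emotion information accumulated for future inquiries.
emotion_history = []


def estimate_emotion(text):
    """Step 3: return the first emotion whose cue word appears in the text."""
    lowered = text.lower()
    for label, cues in EMOTION_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "neutral"


def compose_response(base_answer, emotion):
    """Step 5: append emotion-specific supplementary information, if any."""
    extra = SUPPLEMENTS.get(emotion)
    return f"{base_answer} {extra}" if extra else base_answer


def respond(user_id, inquiry, base_answer):
    emotion = estimate_emotion(inquiry)
    emotion_history.append((user_id, emotion))
    return compose_response(base_answer, emotion)
```

For the inquiry "I want to know the date of the next board meeting, but I'm worried.", `estimate_emotion` yields the label "worried", and the response gains the preparation advice. A neutral inquiry receives the base answer unchanged.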
[0712] (Example 2)
[0713] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0714] In condominium management, there is a need to respond quickly and appropriately to a wide range of inquiries from residents. However, conventional systems have difficulty considering the feelings of residents, and language barriers exist when dealing with multinational residents, resulting in challenges to the efficiency and accuracy of communication. Furthermore, there is a lack of mechanisms for accurately recording and organizing important information such as meetings. Therefore, a new system is needed to support user-friendly and reliable condominium management.
[0715] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0716] In this invention, the server includes information processing means for analyzing user inquiries and generating appropriate responses, information conversion means for translating the responses into the user's native language, and adaptive learning means for accumulating and analyzing the user's past interaction data and emotional data to improve future responses. This enables personalized responses that take into account the user's emotions, and further facilitates smooth communication among multinational users and accurate recording and organization of information.
[0717] "Information processing means" refers to a device or program that has the function of analyzing input data and generating a response according to a specific purpose.
[0718] "Information conversion means" refers to technologies or devices for converting information output in one language into another language.
[0719] "Adaptive learning means" refers to a device or program that has the ability to learn by analyzing past data and interactions to improve the quality of responses in subsequent instances.
[0720] An "information storage device" is a device or program that securely stores organizational information and user information for management purposes and retrieves that data as needed.
[0721] "Emotion analysis means" refers to a technology or device that identifies a user's emotional state based on input data and reflects that state in the response.
[0722] "Output assistance means" refers to technologies or devices that provide assistance functions for effectively conveying information to users through sight or hearing.
[0723] The "acoustic recognition and information organization function" is a function that converts audio data into text or other formats, and efficiently organizes and stores the information.
[0724] This invention is an assistant system specifically designed for apartment building management, with the entire process realized through the cooperation of a server, terminals, and users. The following describes each component and its role in detail.
[0725] Server
[0726] The server has information processing capabilities to analyze inquiries received from users. Specifically, it performs analysis using natural language processing (NLP). Here, it utilizes machine learning libraries such as TensorFlow and PyTorch, which can be used from the Python language. Furthermore, as a means of information conversion, it uses an open translation API to translate the generated response into the user's native language. This API is backed by a service that provides general-purpose translation technology.
[0727] Furthermore, the server accumulates past user interaction and sentiment data through adaptive learning mechanisms and learns to improve future responses. This process is combined with storage technologies such as database management systems (relational or document-based).
[0728] Terminal
[0729] The terminal acts as a user interface. It receives voice or text input from the user and sends it to the server. Terminals are typically implemented on platforms using Node.js or React Native. Furthermore, as an output assistance method, it presents response information sent from the server to the user in voice or text. In this case, a speech synthesis engine is used to perform the voice output.
[0730] User
[0731] Users make emotionally charged inquiries through their devices. A concrete example of a prompt might be, "When is the next board meeting? I'm a little worried." This allows users to expect more appropriate and personalized responses from the system.
[0732] Thus, the system of the present invention utilizes bidirectional communication between the server and the terminal to provide users with intelligent and flexible support for condominium management. As a result, users can confidently entrust their condominium management to the system.
[0733] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0734] Step 1:
[0735] The terminal receives input from the user. The user makes inquiries using a simple interface, either via text or voice. This input is sent to the terminal as digital data through the microphone or keyboard. The terminal then sends the input data directly to the server. The input is the user's inquiry, and the output is the inquiry data sent to the server.
[0736] Step 2:
[0737] The server analyzes the received query data using the information processing means. Natural language processing techniques are employed, in particular generative AI models built with machine learning frameworks such as TensorFlow and PyTorch, to understand the meaning and context of the text. The input is the query data received from the terminal, and the output is the subject and intent of the query identified through analysis.
[0738] Step 3:
[0739] The server generates an appropriate response based on the analysis results. During this generation process, sentiment analysis is performed, incorporating content that takes the user's emotions into account. An information conversion mechanism is also used to translate the generated response into the user's native language. The input consists of the analysis results and sentiment information, while the output is the translated response data.
[0740] Step 4:
[0741] The server sends the generated response to the terminal. This process uses a real-time communication protocol for rapid data transmission. The input is the response data generated by the server, and the output is the data sent to the terminal.
[0742] Step 5:
[0743] The terminal outputs the response received from the server to the user interface. Specifically, it either displays the response as text on the screen or conveys the response to the user as voice using a speech synthesis engine. The input is the response data received from the server, and the output is the content of the response presented to the user.
[0744] Step 6:
[0745] Based on the response received, the user decides on their next action. They can either continue operating the system or enter feedback via the terminal. User feedback helps in the continuous improvement of the system. Inputs are responses from the server and new user inquiries, while outputs are feedback and subsequent inquiries via the terminal.
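The use of user feedback in Step 6 for continuous improvement could be organized along the following lines. The 1-to-5 rating scale, the review threshold, and the `FeedbackStore` class are hypothetical. The disclosure states only that feedback is collected and used, not how it is scored or aggregated.

```python
from collections import defaultdict


class FeedbackStore:
    """Collects per-intent ratings entered via the terminal (assumed 1-5 scale)."""

    def __init__(self):
        self._ratings = defaultdict(list)

    def record(self, intent, rating):
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._ratings[intent].append(rating)

    def average(self, intent):
        """Mean rating for an intent, or None when no feedback exists yet."""
        ratings = self._ratings.get(intent)
        return sum(ratings) / len(ratings) if ratings else None

    def needs_review(self, threshold=3.0):
        """Intents whose mean rating falls below the threshold, sorted by name."""
        return sorted(
            intent
            for intent, ratings in self._ratings.items()
            if sum(ratings) / len(ratings) < threshold
        )
```

Intents returned by `needs_review` are candidates for adding database entries or retraining data, which is one possible reading of the adaptive learning described above.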
[0746] (Application Example 2)
[0747] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0748] In modern living environments, apartment management is required to provide prompt and emotionally sensitive responses to complex inquiries and requests. However, conventional systems struggle to provide personalized responses that are sensitive to the user's feelings, and thus fail to alleviate the stress and dissatisfaction that arises. There is a need for an interface that can solve this problem and improve user satisfaction.
[0749] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0750] In this invention, the server includes natural language processing means for analyzing user inquiries and generating appropriate responses, translation means for translating the responses into the user's native language, and emotion analysis means for analyzing the user's emotions using emotion recognition technology and providing personalized responses. This makes it possible to provide personalized and considerate responses that are in line with the user's emotions.
[0751] "Natural language processing means" refers to technologies that analyze human language queries, convert them into a format that a computer can understand, and generate appropriate responses.
[0752] "Translation means" refers to technology for appropriately converting the generated response into the user's native language.
[0753] "Output means" refers to technologies for providing responses generated through a user interface as audio or text.
[0754] "Learning methods" refer to data analysis techniques that accumulate and analyze users' past interaction data in order to improve future responses.
[0755] "Emotion analysis means" refers to technology that recognizes emotions from input user data and provides personalized responses according to the user's emotional state.
[0756] A "human-supportive interface" is a user interface that connects the user and the system, providing emotionally sensitive dialogue in response to residents' inquiries.
[0757] A "database" is an information recording system that stores information related to an administrative organization and supports user engagement.
[0758] The system for implementing this invention mainly consists of three elements: a server, a terminal, and a user.
[0759] The server analyzes inquiries sent by users using natural language processing (NLP) technology. The server is equipped with an NLP engine that converts input data into a computer-processable format. Specifically, it uses tools such as the Google Cloud Natural Language API to analyze human language and understand its meaning. Furthermore, the server utilizes sentiment analysis technology to analyze the user's emotions and uses that emotional information to generate personalized responses. This process employs a sentiment analysis engine to identify emotions and customize responses based on user input and past interaction data.
[0760] Furthermore, the generated response is translated into the user's native language using a translation tool. For this purpose, the server utilizes translation engines such as the Google Translate API. This translation makes it possible to support users from multiple nationalities.
[0761] The terminal functions as the user interface. For example, a smartphone or robot receives an inquiry from the user by voice input and sends that data to the server. The response returned from the server is then conveyed to the user as speech using the terminal's speech synthesis software (e.g., Amazon Polly).
[0762] Users can receive more personalized responses through the system. For example, in response to an inquiry such as, "Recently, the parking lot lighting seems dim," the server can provide an emotionally sensitive response such as, "Please rest assured, we will review the situation and arrange for environmental improvements." In this way, it is possible to create a comfortable living environment for users. An example of a prompt might be, "Generate an appropriate response when a user is feeling uneasy about their living environment." Based on this prompt, the AI model generates an effective response.
[0763] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0764] Step 1:
[0765] The user inputs an inquiry by voice through the microphone of a smartphone or robot. This voice data is the input. The terminal uses its built-in speech recognition software to convert the voice data into text data. The text data is the output and is passed to the next processing step.
[0766] Step 2:
[0767] The terminal sends text data to the server. The server runs a natural language processing engine based on the received text data. Here, the input text is analyzed to understand the intent of the query. The result of this process is analyzed intent data.
[0768] Step 3:
[0769] The server runs an emotion analysis engine using the analyzed intent data. The emotion analysis engine identifies the emotions contained in the user's statements. For example, emotion labels such as "anxiety," "reassurance," and "interest" are output. This data is used when generating responses.
[0770] Step 4:
[0771] The server uses a generative AI model to create prompts based on the analysis results, including emotion labels. These prompts form the basis for generating specific responses. For example, a prompt such as "Generate an appropriate response when the user is feeling anxious" might be generated. The generative AI model then creates the response based on these prompts.
[0772] Step 5:
[0773] The generated response is fed into the server's translation engine and translated into the user's native language to accommodate multinational users. This translated response is then sent to the terminal as the final output.
[0774] Step 6:
[0775] The terminal receives the translated response and outputs it as speech using its speech synthesis function. The response is then played back to the user through the speaker of the smartphone or robot. This allows the user to receive a natural-sounding response.
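The six steps above can be sketched as a single pipeline. The following is a minimal Python sketch under stated assumptions, not the implementation of this disclosure: the class name `InquiryPipeline`, the prompt template in `build_prompt`, and all `stub_*` functions are hypothetical stand-ins for the speech recognition, natural language processing, emotion analysis, generative AI, translation, and speech synthesis engines, which in practice would call external services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InquiryPipeline:
    """Chains Steps 1 to 6; each engine is injected so it can be swapped."""
    recognize_speech: Callable[[bytes], str]     # Step 1 (terminal)
    analyze_intent: Callable[[str], str]         # Step 2 (server)
    analyze_emotion: Callable[[str, str], str]   # Step 3 (server)
    generate: Callable[[str], str]               # Step 4 (server)
    translate: Callable[[str, str], str]         # Step 5 (server)
    synthesize_speech: Callable[[str], bytes]    # Step 6 (terminal)

    def build_prompt(self, text: str, intent: str, emotion: str) -> str:
        # Hypothetical template that embeds the emotion label in the prompt.
        return (
            f"Generate an appropriate response when the user is feeling {emotion}. "
            f"Intent: {intent}. Inquiry: {text}"
        )

    def run(self, audio: bytes, user_language: str) -> bytes:
        text = self.recognize_speech(audio)
        intent = self.analyze_intent(text)
        emotion = self.analyze_emotion(text, intent)
        prompt = self.build_prompt(text, intent, emotion)
        response = self.generate(prompt)
        translated = self.translate(response, user_language)
        return self.synthesize_speech(translated)


# Stub engines for illustration only (assumptions, not real services).
def stub_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def stub_intent(text: str) -> str:
    return "facility_issue" if "lighting" in text.lower() else "general_inquiry"


def stub_emotion(text: str, intent: str) -> str:
    return "anxiety" if "dim" in text.lower() else "interest"


def stub_generate(prompt: str) -> str:
    if "anxiety" in prompt:
        return "Please rest assured, we will review the situation."
    return "Thank you for your inquiry."


def stub_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"  # tags the text instead of translating it


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # stands in for synthesized audio


pipeline = InquiryPipeline(
    stub_recognize, stub_intent, stub_emotion,
    stub_generate, stub_translate, stub_synthesize,
)
reply = pipeline.run(b"Recently, the parking lot lighting seems dim", "ja")
print(reply.decode("utf-8"))
```

Injecting each engine as a callable keeps the step order explicit while leaving the choice of concrete service for each step open.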
[0776] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0777] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, is input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0778] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0779] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0780] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0781] These emotions are distributed around the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0782] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).
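The layout of the emotion map 400 described above can be illustrated with a simple polar-coordinate data structure. This is a minimal sketch under stated assumptions: the class `EmotionPoint`, the ring numbers, and the clock positions assigned to each emotion are hypothetical and are not taken from Figure 9. Only the conventions come from the description (right side for situation, left side for reaction, upper side for pleasure, lower side for displeasure, outer rings for actions).

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EmotionPoint:
    name: str
    ring: int      # 1 = innermost (most primitive); larger = closer to action
    clock: float   # clock position: 12 = top, 3 = right, 6 = bottom, 9 = left

    def xy(self) -> Tuple[float, float]:
        # Convert the clock position to an angle measured from the x-axis.
        theta = math.radians(90.0 - 30.0 * self.clock)
        return self.ring * math.cos(theta), self.ring * math.sin(theta)


def describe(point: EmotionPoint, eps: float = 1e-9) -> Dict[str, str]:
    """Classify a point by side (left/right) and valence (upper/lower)."""
    x, y = point.xy()
    side = "situation" if x > eps else "reaction" if x < -eps else "both"
    valence = "pleasure" if y > eps else "displeasure" if y < -eps else "neutral"
    return {"side": side, "valence": valence}


def distance(a: EmotionPoint, b: EmotionPoint) -> float:
    """Euclidean distance on the map; nearby emotions tend to co-occur."""
    return math.dist(a.xy(), b.xy())


# Hypothetical placements, chosen only to illustrate the conventions.
security = EmotionPoint("security", ring=2, clock=2.5)
anxiety = EmotionPoint("anxiety", ring=2, clock=3.5)
desire = EmotionPoint("desire", ring=3, clock=10.0)

for p in (security, anxiety, desire):
    print(p.name, describe(p))
```

With these placements, "security" and "anxiety" sit on either side of the 3 o'clock position and are closer to each other than either is to "desire" on the reaction side.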
[0783] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
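The idea that deviation from an ideal balance produces discomfort and proximity to it produces pleasure can be expressed numerically. The following is a minimal sketch only: the function `balance_valence`, the linear scoring formula, the tolerance values, and the sensor names `tilt_deg` and `battery_pct` are all assumptions introduced for illustration, and the disclosure does not specify how the balances are scored.

```python
from typing import Dict


def balance_valence(readings: Dict[str, float],
                    ideals: Dict[str, float],
                    tolerances: Dict[str, float]) -> float:
    """Return a score in [-1, 1]: +1 at the ideal, -1 at or beyond tolerance."""
    scores = []
    for key, ideal in ideals.items():
        # Deviation from the ideal, normalized by the allowed tolerance.
        deviation = abs(readings[key] - ideal) / tolerances[key]
        scores.append(1.0 - 2.0 * min(deviation, 1.0))
    return sum(scores) / len(scores)


def label(valence: float) -> str:
    return "pleasure" if valence >= 0.0 else "discomfort"


# Hypothetical robot balances: posture (tilt) and battery level.
ideals = {"tilt_deg": 0.0, "battery_pct": 100.0}
tolerances = {"tilt_deg": 30.0, "battery_pct": 80.0}

upright_and_charged = {"tilt_deg": 0.0, "battery_pct": 100.0}
tilted_and_drained = {"tilt_deg": 45.0, "battery_pct": 10.0}

print(label(balance_valence(upright_and_charged, ideals, tolerances)))
print(label(balance_valence(tilted_and_drained, ideals, tolerances)))
```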
[0784] The emotion map defines two emotions that promote learning. One is the negative emotion located around the middle of "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion located around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."
[0785] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
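The determination step described above can be sketched as follows. The pre-trained neural network itself is not shown: the dictionary `emotion_values` is a hypothetical stand-in for its output, and the numbers in it are illustrative. The function `neighbor_penalty` is one assumed way to express the constraint that emotions located close together have similar values, written as a regularization term. The disclosure does not specify the training objective.

```python
from typing import Dict, Iterable, Tuple


def determine_emotion(emotion_values: Dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value."""
    return max(emotion_values, key=emotion_values.get)


def neighbor_penalty(emotion_values: Dict[str, float],
                     neighbor_pairs: Iterable[Tuple[str, str]]) -> float:
    """Sum of squared differences between emotions adjacent on the map.

    A small penalty means neighboring emotions have similar values.
    """
    return sum(
        (emotion_values[a] - emotion_values[b]) ** 2 for a, b in neighbor_pairs
    )


# Hypothetical network output for one user input.
emotion_values = {"reassured": 0.80, "calm": 0.75, "confident": 0.70, "anxious": 0.10}

# Hypothetical adjacency between emotions that lie close together on the map.
neighbor_pairs = [("reassured", "calm"), ("calm", "confident")]

print(determine_emotion(emotion_values))
print(neighbor_penalty(emotion_values, neighbor_pairs))
```

Adding such a penalty to a training loss would push neighboring emotions toward similar values, which matches the behavior described for Figure 10.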
[0786] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0787] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0788] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0789] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0790] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0791] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), and ASICs (Application Specific Integrated Circuits), which are processors having circuit configurations specifically designed to perform the specific processing. Each of these processors has built-in or connected memory and performs the specific processing by using that memory.
[0792] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0793] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0794] More specifically, electrical circuits that combine circuit elements such as semiconductor devices can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged, as long as this does not deviate from the main purpose.
[0795] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0796] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0797] The following is further disclosed regarding the embodiments described above.
[0798] (Claim 1)
[0799] A system for providing an assistant program specialized in condominium management, the system comprising:
[0800] natural language processing means for analyzing user inquiries and generating appropriate responses;
[0801] translation means for translating the response into the user's native language;
[0802] output means for providing the response via a user interface;
[0803] learning means for improving future responses by accumulating and analyzing past user interaction data; and
[0804] a database for storing information about the management organization and promoting user engagement.
[0805] (Claim 2)
[0806] The system according to claim 1,
[0807] which performs real-time translation into multiple languages
[0808] in order to facilitate communication among users of multiple nationalities.
[0809] (Claim 3)
[0810] The system according to claim 1,
[0811] further comprising speech recognition and data organization functions
[0812] for automatically transcribing meeting audio into text
[0813] and for generating and saving meeting minutes.
[0814] "Example 1"
[0815] (Claim 1)
[0816] A language processing means that analyzes user inquiries entered and generates appropriate responses,
[0817] A translation means for translating the response into the user's native language,
[0818] An output means that provides the response via a user interface,
[0819] A means of learning to improve future responses by accumulating and analyzing past user interaction data,
[0820] Information storage means for storing information of management organizations and promoting user engagement,
[0821] A generation means that uses a generative AI model to acquire information based on a prompt,
[0822] An input conversion means that recognizes voice input from a user and converts it into text data,
[0823] A speech generation means that outputs text data as speech using speech synthesis technology,
[0824] ...
[0825] A system comprising the above means.
[0826] (Claim 2)
[0827] The system according to claim 1, comprising means for real-time translation into multiple languages in order to facilitate communication among multinational users.
[0828] (Claim 3)
[0829] The system according to claim 1, comprising speech recognition and information organization means for automatically converting meeting audio into text and generating and saving meeting minutes.
[0830] "Application Example 1"
[0831] (Claim 1)
[0832] A system for providing an assistant program specialized in condominium management, the system comprising:
[0833] natural language processing means for analyzing user inquiries and generating appropriate responses;
[0834] translation means for translating the response into the user's native language;
[0835] output means for providing the response via a user interface;
[0836] learning means for improving future responses by accumulating and analyzing past user interaction data;
[0837] a database for storing information about the management organization and promoting user engagement;
[0838] information acquisition means for providing real-time information on the usage status of public facilities; and
[0839] means for providing information to users regarding events.
[0840] (Claim 2)
[0841] The system according to claim 1,
[0842] which performs real-time translation into multiple languages
[0843] in order to facilitate communication among users of multiple nationalities.
[0844] (Claim 3)
[0845] The system according to claim 1,
[0846] further comprising speech recognition and data organization functions
[0847] for automatically transcribing meeting audio into text
[0848] and for generating and saving meeting minutes.
[0849] "Example 2 Combined with an Emotion Engine"
[0850] (Claim 1)
[0851] Information processing means for analyzing user inquiries and generating appropriate responses,
[0852] Information conversion means for translating the response into the user's native language,
[0853] An output means that provides the response via a user interface,
[0854] An adaptive learning means that accumulates and analyzes users' past interaction data and emotional data to improve future responses,
[0855] An information storage device that stores information about the management organization and promotes user engagement,
[0856] A means of emotion analysis that identifies emotions from user input and reflects them in the response,
[0857] An output support means that provides the real-time converted response acoustically or visually using information synthesis means,
[0858] A system comprising the above means.
[0859] (Claim 2)
[0860] The system according to claim 1, which provides real-time translation into multiple languages to facilitate communication among multinational users.
[0861] (Claim 3)
[0862] The system according to claim 1, which automatically records and visualizes meeting audio and includes acoustic recognition and information organization functions for information management.
[0863] "Application Example 2 Combined with an Emotion Engine"
[0864] (Claim 1)
[0865] A system comprising: natural language processing means for analyzing user inquiries and generating appropriate responses;
[0866] translation means for translating the response into the user's native language;
[0867] output means for providing the response via a user interface;
[0868] learning means for improving future responses by accumulating and analyzing past user interaction data;
[0869] emotion analysis means for analyzing the user's emotions using emotion recognition technology and providing a personalized response;
[0870] a human-supportive interface that enables emotionally sensitive dialogue in response to inquiries from residents about their living environment; and
[0871] a database for storing information about the management organization and promoting user engagement.
[0872] (Claim 2)
[0873] The system according to claim 1,
[0874] which performs real-time translation into multiple languages
[0875] in order to facilitate communication among users of multiple nationalities.
[0876] (Claim 3)
[0877] The system according to claim 1,
[0878] further comprising speech recognition and data organization functions
[0879] for automatically transcribing meeting audio into text
[0880] and for generating and saving meeting minutes.
[Explanation of Symbols]
[0881]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system for providing an assistant program specialized in condominium management, the system comprising: natural language processing means for analyzing user inquiries and generating appropriate responses; translation means for translating the response into the user's native language; output means for providing the response via a user interface; learning means for accumulating and analyzing users' past interaction data to improve future responses; a database for storing information about the management organization and promoting user engagement; information acquisition means for providing real-time information on the usage status of public facilities; and means for providing information to users regarding events.
2. The system according to claim 1, which performs real-time translation into multiple languages in order to facilitate communication among users of multiple nationalities.
3. The system according to claim 1, further comprising speech recognition and data organization functions for automatically transcribing meeting audio into text and for generating and saving meeting minutes.