The AI-powered community management system addresses the challenges of creating meeting minutes, multilingual support, and elderly monitoring, enhancing community operations and resident services through automated audio-to-text conversion, translation, and anomaly detection.

JP2026096564A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

The operation of local communities is becoming difficult due to the aging and diversification of local communities and the busyness of residents, with a need to improve the efficiency of creating meeting minutes, establish multilingual support, and strengthen the function of watching over the elderly, which conventional methods have not sufficiently addressed.

Method used

A community association management support system using AI technology that automatically generates meeting minutes by converting meeting audio to text, provides multilingual support through automatic translation, and enhances community monitoring by detecting anomalies in elderly activity.

Benefits of technology

The system efficiently creates meeting minutes, facilitates smooth communication across languages, and ensures the safety and well-being of elderly residents, thereby improving the operational efficiency and quality of community services.

Abstract

A system is provided. [Solution] The system includes: means for collecting audio data during a meeting in real time and converting the audio data into text data using speech recognition technology; means for summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and means for distributing the generated meeting minutes to participants by email or other communication means.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance that responds to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Operating local communities is becoming difficult because these communities are aging and diversifying and because residents are increasingly busy as more households have two working adults. In particular, there is a need to improve the efficiency of creating meeting minutes, establish a multilingual support window, and strengthen the function of watching over the elderly. Conventional methods have not sufficiently solved these problems, so a more efficient and flexible operation support system is required.

Means for Solving the Problems

[0005] This invention provides a community association management support system using AI technology. This system automatically generates meeting minutes by collecting meeting audio data in real time and converting it into text data using speech recognition technology. It also improves services for residents with diverse cultural backgrounds by automatically translating inquiries from residents in multiple languages and providing appropriate answers. Furthermore, it enhances community monitoring functions by monitoring the activities of the elderly with sensors and sending alerts to contacts when an anomaly is detected.

[0006] "Audio data" refers to information captured in digital format from the content of conversations during a meeting.

[0007] "Speech recognition technology" is a technology that analyzes speech data and converts it into linguistic text data.

[0008] "Text data" refers to digital information consisting of characters, converted from audio or other formats.

[0009] "Natural language processing technology" refers to computational techniques for analyzing text data and performing summarization and semantic analysis.

[0010] "Meeting minutes" are official records that summarize the content of a meeting and allow for later reference.

[0011] "Multilingual support" refers to functions and technologies that address the needs of residents who speak different languages.

[0012] "Automatic translation technology" is a technology that converts input text from one language to another.

[0013] A "database" is a collection of data that systematically stores a specific type of information and makes it accessible as needed.

[0014] A "sensor" is a device that detects physical or chemical changes and outputs them as data.

[0015] A "machine learning algorithm" is a computational procedure for learning patterns based on data and performing predictions and analyses.

[0016] "Anomaly detection" is a process of identifying patterns and values different from normal in data.

[0017] "Notification" is an act or system of informing recipients of information about specific events or situations.

Brief Description of Drawings

[0018]

[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] It shows an emotion map to which multiple emotions are mapped.

[Figure 10] It shows an emotion map to which multiple emotions are mapped.

[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when combined with an emotion engine.

[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.

Mode for Carrying Out the Invention

[0019] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0020] First, the terminology used in the following description will be explained.

[0021] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0022] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is memory in which information is temporarily stored and which the processor uses as work memory.

[0023] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0024] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0025] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."

[0026] [First Embodiment]

[0027] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0028] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0029] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0030] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0031] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0032] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0033] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0034] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0035] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0036] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0037] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0038] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0039] The present invention's community association management support system utilizes AI technology to efficiently support various community association activities. Its main functions include creating meeting minutes, responding to multilingual inquiries from residents, and providing monitoring services for the elderly.

[0040] Meeting minutes creation method

[0041] During a meeting, the terminal collects audio data in real time using microphones installed in the meeting room. This audio data is sent to a server and converted into text data using speech recognition technology. The converted text is summarized using natural language processing technology and automatically generated as meeting minutes. The completed meeting minutes are sent from the server to the participants' email addresses.

[0042] Implementation Examples of Multilingual Support

[0043] When residents contact their local community association, they use a dedicated application to input their questions in text format. This data is sent to a server and translated into Japanese using automatic translation technology. The server then refers to its database to generate an appropriate response, translates that response back into the original language, and provides it to the user. This enables smooth communication without language barriers.

[0044] Implementation of an elderly monitoring service

[0045] To support the lives of elderly people living alone, sensors placed as terminals periodically monitor activity data. The terminals send the collected data to a server, where machine learning algorithms are used to detect anomalies. If an anomaly is detected, the server immediately sends a notification to the designated contacts, helping family members or caregivers respond quickly.

[0046] These functions effectively solve many of the challenges associated with managing neighborhood associations, providing residents with a safe and convenient living environment. Specific examples of its use include generating meeting minutes for monthly neighborhood association meetings, responding to inquiries from foreign residents regarding garbage disposal rules, and providing a regular check-in service for elderly people living alone. Each function operates smoothly and efficiently, playing a role in improving the quality of neighborhood association operations and resident services.

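[0047] The following describes the processing flow for each of the three functions in turn: meeting minutes creation, multilingual support, and elderly monitoring.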

[0048] Step 1:

[0049] The terminal collects audio data in real time through microphones installed in the conference room and transmits that data to the server.

[0050] Step 2:

[0051] The server passes the received audio data to the speech recognition engine, which converts the audio into text data. During this process, speaker identification and timestamps are also added.

[0052] Step 3:

[0053] The server analyzes the generated text data using natural language processing algorithms to extract and summarize the key points of the meeting.

[0054] Step 4:

[0055] The server automatically generates meeting minutes based on the created summary and creates a file in PDF format or other suitable format.

[0056] Step 5:

[0057] The server sends the generated meeting minutes to the designated email address and distributes them to the relevant parties.
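The minutes-creation steps above (Steps 2 to 4) can be sketched as follows. This is a minimal illustration, not the method disclosed in the specification: the `Segment` record, the `summarize` and `render_minutes` functions, and the word-frequency scoring are all assumptions introduced here. The frequency-based extractive summary is a simple stand-in for the natural language processing step, and the speech recognition engine itself is not modeled; the sketch starts from already-recognized utterances carrying a speaker label and a timestamp.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance with speaker and timestamp (hypothetical shape)."""
    speaker: str
    start_sec: float
    text: str


def _words(text: str) -> list[str]:
    # Lowercase alphabetic tokens; digits and punctuation are ignored.
    return re.findall(r"[a-z']+", text.lower())


def summarize(segments: list[Segment], max_items: int = 2) -> list[Segment]:
    """Naive extractive summary: keep the utterances whose words occur most often."""
    # Count only words longer than three letters to skip most function words.
    freq = Counter(w for s in segments for w in _words(s.text) if len(w) > 3)
    ranked = sorted(
        segments,
        key=lambda s: sum(freq[w] for w in _words(s.text)),
        reverse=True,
    )
    # Restore chronological order for the selected utterances.
    return sorted(ranked[:max_items], key=lambda s: s.start_sec)


def render_minutes(title: str, segments: list[Segment]) -> str:
    """Format the selected utterances as plain-text minutes."""
    lines = [title, ""]
    for s in segments:
        minutes, seconds = divmod(int(s.start_sec), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {s.speaker}: {s.text}")
    return "\n".join(lines)
```

Keeping the speaker and timestamp on each segment lets the minutes attribute every retained statement, which matches the speaker identification and timestamps mentioned in Step 2. Export to PDF (Step 4) would need a separate library and is left out.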

[0058] Step 1:

[0059] Users use a dedicated application to enter and submit inquiries in multiple languages.

[0060] Step 2:

[0061] The server passes the received inquiry data to an automatic translation engine, which then translates it into Japanese.

[0062] Step 3:

[0063] Based on the translated text, the server searches the FAQ database for relevant information and prepares an appropriate response.

[0064] Step 4:

[0065] The server re-translates the prepared response into the language of the requester and sends the response back to the user.
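The multilingual inquiry flow above (translate into Japanese, look up an answer, translate back) can be sketched as follows. The `Translator` signature, the keyword-matched `FAQ` dictionary with its entries, and `demo_translate` are hypothetical and exist only for illustration; a real deployment would replace `demo_translate` with a call to an actual translation engine and the dictionary with the FAQ database.

```python
from typing import Callable

# Assumed signature: translate(text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]

# Toy FAQ keyed by a Japanese keyword. The entries are invented for this sketch.
FAQ = {
    "ごみ": "燃えるごみは月曜日と木曜日に出してください。",
    "会費": "会費については役員にお問い合わせください。",
}
FALLBACK = "担当者から後ほどご連絡します。"


def find_answer(question_ja: str) -> str:
    """Return the first FAQ answer whose keyword appears in the question."""
    for keyword, answer in FAQ.items():
        if keyword in question_ja:
            return answer
    return FALLBACK


def handle_inquiry(text: str, user_lang: str, translate: Translator) -> str:
    question_ja = translate(text, user_lang, "ja")  # Step 2: translate into Japanese
    answer_ja = find_answer(question_ja)            # Step 3: search the FAQ
    return translate(answer_ja, "ja", user_lang)    # Step 4: translate back


def demo_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a translation engine: a fixed lookup table for the demo only."""
    table = {
        ("en", "ja", "When is garbage collected?"): "ごみの収集日はいつですか",
        ("ja", "en", FAQ["ごみ"]): "Please put out burnable garbage on Mondays and Thursdays.",
    }
    return table[(source, target, text)]
```

Passing the translator in as a parameter keeps the flow independent of any particular translation service, so the same `handle_inquiry` function can be exercised with a stub in tests and with a real engine in production.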

[0066] Step 1:

[0067] The terminal monitors the activity levels of elderly people through sensors and transmits the collected data to the server.

[0068] Step 2:

[0069] The server uses machine learning algorithms to analyze the transmitted data and monitor the activity patterns of older adults.

[0070] Step 3:

[0071] If the server detects unusual activity or patterns, it sets a flag to proceed to the next step.

[0072] Step 4:

[0073] If a flag is set, the server will send a notification to pre-registered emergency contacts to inform them of the situation.
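The flag-then-notify logic of the monitoring steps above can be sketched as follows. The specification says only that machine learning algorithms are used; the z-score test here is a simple statistical stand-in chosen for illustration, and the function names, the threshold of 3.0, and the `send` callback are assumptions.

```python
from statistics import mean, stdev
from typing import Callable


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's value if it is more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # too little data to estimate a normal range
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


def check_and_notify(
    history: list[float],
    today: float,
    contacts: list[str],
    send: Callable[[str, str], None],
) -> bool:
    """Set the flag (Step 3) and notify registered contacts only when it is set (Step 4)."""
    flag = is_anomalous(history, today)
    if flag:
        for contact in contacts:
            send(contact, f"Unusual activity level detected: {today}")
    return flag
```

For example, with a history of daily motion counts around 40, a day with a count of 3 is flagged and each contact receives one message, while a day with a count of 41 produces no notification.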

[0074] (Example 1)

[0075] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0076] In modern society, there is a need to improve the efficiency of meetings related to the operation of local governments, overcome language barriers in multicultural societies, and ensure safe and secure living for the elderly. Traditional methods for addressing these issues are largely reliant on manual labor, requiring significant time and effort, making it difficult to respond efficiently and quickly.

[0077] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0078] In this invention, the server includes means for collecting audio data in real time using a voice acquisition device and converting the audio data into text data using speech analysis technology; means for summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and means for using communication means to provide the generated meeting minutes to meeting participants. This enables efficient meeting progress and rapid sharing of records.

[0079] A "voice acquisition device" is a device used to collect audio at meetings, events, and other similar occasions, and to transmit that data to a processing device.

[0080] "Audio data" refers to digital data that records the waveform of sounds such as speech, and is a format suitable for information processing.

[0081] "Speech analysis technology" is a technology that analyzes sound data and converts its content into text data.

[0082] "Text data" refers to text-formatted data generated from audio information using speech analysis technology.

[0083] "Natural language processing technology" is a technology that analyzes text data and summarizes or converts it into a form that humans can understand.

[0084] "Meeting minutes" are documents that record the content of meetings and other gatherings so that they can be reviewed later.

[0085] "Communication means" refers to means used to transmit or receive information, including email and other digital communication methods.

[0086] "Multilingualism" is a term that refers to the ability to express or process multiple different languages.

[0087] "Automatic translation technology" is a technology used to automatically translate text written in one language into another language.

[0088] An "information storage medium" is a medium used to store data and allow it to be retrieved as needed.

[0089] An "observation device" is a device used to monitor the physical environment and collect data.

[0090] An "automated learning algorithm" is an algorithm that improves its performance by learning from data and updating the model.

[0091] "Notification" refers to the act of sending information about a specific event to recipients who have been registered in advance.

[0092] The system according to the present invention provides an efficient and safe environment for the operation of community associations and similar organizations by integrating a voice acquisition device, multilingual support function, and elderly monitoring function.

[0093] Voice acquisition device

[0094] The terminal uses high-sensitivity microphones placed in the conference room to collect audio in real time and transmits the data to the server. The server uses speech analysis technology, specifically speech recognition software, to convert the audio data into text data. Next, the server utilizes natural language processing technology to summarize the converted text data and generate meeting minutes. These minutes are immediately distributed to all participants via email or other means. For example, immediately after the meeting ends, the summarized minutes are sent to all participants, enabling rapid information sharing.

[0095] Multilingual support feature

[0096] Users enter inquiries in multiple languages as text using a dedicated application on their smartphones or other devices. The entered data is sent to a server and translated into Japanese using automatic translation technology. The server accesses the information storage medium, generates an appropriate response, and then translates the response back into the original language before providing it to the user. As a concrete example, foreign residents can ask questions to the neighborhood association in their native language and receive immediate responses. An example of a prompt would be, "Please propose a program that automatically generates minutes of neighborhood association meetings."

[0097] Elderly monitoring function

[0098] The terminal periodically monitors the elderly person's daily activities through an observation device (sensor) installed in their home and transmits the data to the server. The server uses an automated learning algorithm to analyze the activity data and sends a notification to pre-registered contacts if an anomaly is detected. This function gives elderly people living alone peace of mind; specifically, if the elderly person's expected activities cannot be confirmed, a notification is sent immediately, allowing for timely action.

[0099] By combining these technologies, a system can be realized that contributes to the smooth operation of neighborhood associations and the improvement of the quality of services provided to residents.

[0100] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0101] Processing steps of the voice acquisition device

[0102] Step 1:

[0103] The terminal collects audio in real time using microphones installed in the conference room. The input is audio data, which is converted to a digital format and pre-processed to remove noise. The data is then adjusted to a format that can be sent directly to the server.

[0104] Step 2:

[0105] The server converts received digital audio data into text data using speech analysis technology. Specifically, it uses acoustic and language models to perform feature extraction and decoding, generating text data as output.

[0106] Step 3:

[0107] The server summarizes text data using natural language processing techniques. The input is the previously converted character data, and a generative AI model is used to extract relevant and important parts and create a summary. A concise meeting minutes document is generated as output.

[0108] Step 4:

[0109] The server provides the completed meeting minutes to the meeting participants via communication means. Specifically, it uses email to automatically send the minutes to the participants' addresses. This process also includes setting the email format and performing error checking.
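Step 4 above mentions setting the email format and performing error checking. A minimal sketch using Python's standard `email` package is shown below. The function names, the attachment filename `minutes.pdf`, and the rough address check are assumptions for illustration; the specification does not describe what its error checking covers.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def valid_address(address: str) -> bool:
    """Rough syntactic check only; delivery can still fail for a well-formed address."""
    _, addr = parseaddr(address)
    local, sep, domain = addr.partition("@")
    return bool(local and sep and "." in domain)


def build_minutes_email(
    sender: str,
    recipients: list[str],
    subject: str,
    body: str,
    pdf_bytes: bytes,
) -> tuple[EmailMessage, list[str]]:
    """Build the message and return it together with the rejected addresses."""
    accepted = [r for r in recipients if valid_address(r)]
    rejected = [r for r in recipients if not valid_address(r)]
    if not accepted:
        raise ValueError("no valid recipient addresses")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(accepted)
    msg["Subject"] = subject
    msg.set_content(body)
    # Attach the generated minutes as a PDF file.
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename="minutes.pdf"
    )
    return msg, rejected
```

The returned message could then be handed to `smtplib.SMTP(...).send_message(msg)`; the mail server host and credentials depend on the deployment and are not given in the source. Returning the rejected addresses lets the server log or report them instead of failing silently.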

[0110] Multilingual support processing steps

[0111] Step 1:

[0112] The user opens a dedicated app on their smartphone and enters the question. The input is in text data format, and this data is sent to the server via the app.

[0113] Step 2:

[0114] The server translates the received multilingual questions into Japanese using automatic translation technology. Specifically, it uses a translation engine to analyze the input text and outputs the corresponding Japanese words and phrases.

[0115] Step 3:

[0116] The server retrieves appropriate information from the information storage medium and generates an answer based on the translated question. This includes a process of referencing data using a generative AI model and extracting and formatting an answer that is appropriate for the question.

[0117] Step 4:

[0118] The server re-translates the generated response back into the original language and provides it to the user. Using a prompt, the server translates the response back into the original language through the re-translation engine and sends it to the app as text data.
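Steps 3 and 4 above refer to a generative AI model and to a prompt, without giving the prompt text. The sketch below shows one way such prompts could be assembled. The wording of both prompts and the function names are assumptions, and no model is called here; the functions only build the strings that would be sent.

```python
def build_answer_prompt(question_ja: str, passages: list[str]) -> str:
    """Prompt asking a generative model to answer from retrieved reference passages."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the resident's question using only the reference passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Reference passages:\n{numbered}\n\n"
        f"Question:\n{question_ja}"
    )


def build_retranslation_prompt(answer_ja: str, target_lang: str) -> str:
    """Prompt asking a generative model to translate a Japanese answer."""
    return (
        f"Translate the following Japanese text into {target_lang}. "
        "Return only the translation, with no commentary.\n\n"
        f"Text:\n{answer_ja}"
    )
```

Restricting the answer prompt to the supplied passages is a design choice intended to keep responses tied to what the information storage medium actually contains.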

[0119] Processing steps for the elderly monitoring function

[0120] Step 1:

[0121] The terminal periodically monitors activity information using sensors placed in the homes of elderly individuals. Inputs include motion data and environmental data, which are then formatted into a standard format and sent to the server.

[0122] Step 2:

[0123] The server analyzes the received data using an automated learning algorithm. It runs an anomaly monitoring model on the input dataset to detect changes in activity patterns. The output is the set of deviations from the normal range.

[0124] Step 3:

[0125] If an anomaly is detected, the server will send a notification to pre-registered contacts. This process involves sending SMS or emails via various communication methods, and additional adjustments may be made as needed.
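Step 1 of the monitoring flow above says that motion data and environmental data are put into a standard format before being sent to the server. The specification does not define that format, so the sketch below assumes a JSON payload; the `ActivityReading` fields and the validation rules are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class ActivityReading:
    """One periodic sensor report (hypothetical fields)."""
    resident_id: str
    recorded_at: str    # ISO 8601 timestamp with UTC offset
    motion_events: int  # motion data
    room_temp_c: float  # environmental data


def encode(reading: ActivityReading) -> str:
    """Terminal side: serialize a reading to a JSON string."""
    return json.dumps(asdict(reading), sort_keys=True)


def decode(payload: str) -> ActivityReading:
    """Server side: parse and validate a received payload."""
    reading = ActivityReading(**json.loads(payload))
    if reading.motion_events < 0:
        raise ValueError("motion_events must be non-negative")
    datetime.fromisoformat(reading.recorded_at)  # raises ValueError if malformed
    return reading
```

Validating on receipt means malformed readings are rejected before they reach the anomaly model, where they could otherwise be mistaken for genuine deviations.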

[0126] (Application Example 1)

[0127] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0128] In managing local communities, diverse needs exist, including taking meeting minutes, providing multilingual support among residents, and monitoring the elderly. However, a lack of efficient technical support increases the burden on these tasks, leading to challenges such as insufficient communication among residents and concerns about the safety of the elderly. Furthermore, traditional systems struggle with real-time information sharing, highlighting the need for improved convenience through the use of smartphones.

[0129] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0130] This invention includes a server that collects audio information during a meeting in real time and converts that audio information into text information using speech recognition technology; a server that summarizes the converted text information using natural language processing technology and automatically generates meeting minutes; a server that uses email or other communication methods to distribute the generated meeting minutes to members; and a server that uses a program installed on a smartphone to obtain and view the meeting minutes of the neighborhood association. This enables more efficient management of the neighborhood association and faster provision of services to residents.

[0131] "Audio information" refers to data recorded in digital format from sounds that occur in settings such as meetings.

[0132] "Speech recognition technology" is a technology that uses computers to analyze human speech and convert it into text information.

[0133] "Text information" refers to text data converted from speech using speech recognition technology.

[0134] "Natural language processing technology" is a technology that allows computers to understand and process human language, and to perform tasks such as summarizing and translating it.

[0135] "Methods for automatically generating meeting minutes" refer to technologies that organize meeting content as text and generate it in a format that can be distributed to participants.

[0136] "Members" refer to people who participate in meetings or communities and are involved in discussions and decisions.

[0137] "Email" is a method of exchanging textual information over the internet.

[0138] "Communication methods" is a general term for the technologies and means used to send and receive information.

[0139] A "smartphone" is a multi-functional mobile phone device that can connect to the internet.

[0140] "A method for obtaining and viewing neighborhood association meeting minutes using a program" refers to a method of viewing the generated meeting minutes using an application on a smartphone.

[0141] The system for realizing this invention will be built as a software solution to support the operation of community associations. The server will acquire audio information from microphones set up during meetings and convert it into text information using speech recognition technology, for example through a speech recognition service such as the Google® Cloud Speech-to-Text API. The converted text information will be summarized using natural language processing technology; the Google Cloud Natural Language API and similar services will be used in this process. The generated meeting minutes can be viewed through an application on mobile devices, including smartphones. This application will help streamline the acquisition and distribution of meeting minutes.

[0142] Furthermore, multilingual inquiries from residents are received by the server and translated into the specified language using automatic translation technology. The Google Cloud Translation API is used for translation, and combined with information extraction from the database, it generates appropriate responses. The generated responses are then translated back into the original language and provided to the user. This enables smooth communication with residents who speak foreign languages.

[0143] For monitoring the elderly, activity data collected through sensor devices is sent to a server, where machine learning algorithms are used to detect anomalies. When an anomaly is detected, a notification is sent to the smartphones of pre-registered contacts, enabling a quick response. Specific examples of using the system include residents who cannot attend monthly neighborhood meetings viewing meeting minutes via a smartphone app to deepen their understanding, and new foreign residents using the app to inquire about garbage disposal procedures in their native language.

[0144] As an example of a prompt, you could ask, "Please display the minutes of the recent neighborhood association meeting regarding the local crime prevention plan that was discussed."

[0145] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0146] Step 1:

[0147] The server acquires audio information received from terminals during a meeting. The input is digital audio information collected from the microphone, and this information is converted into text using the Google Cloud Speech-to-Text API. The output is text data. A speech recognition service is activated, analyzing the audio and converting it to text in real time.

[0148] Step 2:

[0149] The server analyzes the acquired text information and summarizes it using natural language processing techniques. The input is the text information generated in step 1, which is then summarized using the Google Cloud Natural Language API. The output is the summarized text. The natural language processing algorithm operates at this stage, extracting important information and generating a summary.

[0150] Step 3:

[0151] The server prepares the summarized meeting minutes for distribution to smartphones. The input is the summarized text generated in step 2, and the output is the meeting minutes organized into a distribution format. These minutes are stored in a database for distribution to members via email or app.

[0152] Step 4:

[0153] Users retrieve and view meeting minutes from an app using their smartphones. The input is the meeting minutes data distributed in step 3, and the output is the meeting minutes information displayed on the user's screen. The application retrieves the necessary information from the database and provides an interface to display it on the screen.

[0154] Step 5:

[0155] The server receives multilingual inquiries submitted by residents. The input is user-generated text inquiry data, which is then translated into the specified language using the Google Cloud Translation API. The output is the translated text data. The server automatically identifies the language and performs the appropriate translation.

[0156] Step 6:

[0157] The server extracts relevant information from the database based on the translated query and generates a response. The input is the translated query text obtained in step 5, and the output is the response text. In the information extraction process, information appropriate to the query is immediately searched for and the response is constructed.

[0158] Step 7:

[0159] The server re-translates the generated responses back into the residents' original language. The input is the response text generated in step 6, and the output is the re-translated output text. The Google Cloud Translation API is used again to perform an accurate translation back into the original language.

[0160] Step 8:

[0161] The device monitors the elderly person's activity data based on notifications from the server and sends a notification if an anomaly is detected. The input is activity data from the sensor, and the output is an anomaly detection notification. The sensor records daily activities, and a machine learning algorithm analyzes the patterns, so if an anomaly is detected, the server immediately notifies the family or caregiver.
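The specification does not name the machine learning algorithm used in Step 8. As a simple statistical stand-in, the sketch below flags a daily activity value that lies far from the resident's own recent average (a z-score test). The function name, the threshold of three standard deviations, and the use of daily step counts are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Return True when today's value deviates from the historical mean
    by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # perfectly constant history: any change is unusual
    return abs(today - mu) / sigma > threshold
```

For example, with a week of step counts around 4,000, a day with 300 steps would be flagged, while a day with 4,100 steps would not.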

[0162] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0163] The community association management support system of the present invention utilizes AI technology to efficiently support various community association events. In particular, by combining it with an emotion engine, the present invention can recognize emotions during meetings and interactions with residents, and reflect them in meeting minutes and services. This makes it possible to make communication more effective and improve the quality of management.

[0164] Implementation methods for meeting minute creation and sentiment recognition

[0165] During a meeting, the terminal uses microphones installed in the meeting room to collect audio data in real time. This audio data is sent to a server, where it is converted into text data and sentiment data using speech recognition technology and an emotion engine. The converted text is summarized using natural language processing technology, and when meeting minutes are automatically generated, participant sentiment information is also taken into account. For example, if the system recognizes a situation during the meeting where participants have conflicting opinions, the meeting minutes will record not only the content of those opinions but also that the discussion was heated.

[0166] Embodiments of emotion recognition in multilingual support

[0167] When residents make an inquiry, they use a dedicated application to input their question in text format. This data is sent to a server, where it is translated into Japanese using automatic translation technology, and then an emotion engine recognizes the emotional tone of the content. Based on the translated text and the recognized emotional tone, the server prepares a more appropriate response, which is then re-translated and provided to the user. This allows for a more considerate response if, for example, the inquiry contains emotions such as dissatisfaction or urgency.

[0168] Implementation of emotion recognition in elderly monitoring services

[0169] The system uses wearable devices worn by elderly individuals to monitor their voice and behavior. This data is sent to a server, where an emotion engine analyzes the voice to determine the individual's emotional state. The server detects any deviations from normal conversation or behavior and, if an emotionally unstable state persists, sends family members or caregivers an emergency notification that reflects that emotional state. For example, if an elderly person repeatedly expresses irritability or uses irritable language, the system supports mental health care in addition to responding to ordinary physical abnormalities.
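One way to express "an emotionally unstable state persists" is to require several consecutive unstable readings before alerting. The sketch below assumes the emotion engine emits one label per observation; the label names and the run length of three are hypothetical choices, not values from the specification.

```python
# Hypothetical labels treated as "unstable"; the emotion engine's real labels may differ.
UNSTABLE_LABELS = frozenset({"irritated", "anxious"})


def persistent_unstable(labels: list[str], min_run: int = 3) -> bool:
    """Return True if `min_run` or more consecutive labels are unstable."""
    run = 0
    for label in labels:
        # Extend the current run on an unstable label, otherwise reset it.
        run = run + 1 if label in UNSTABLE_LABELS else 0
        if run >= min_run:
            return True
    return False
```

Requiring a run rather than a single reading avoids alerting family members over one momentary expression of irritation.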

[0170] This system not only improves the efficiency of community association operations but also enables flexible responses that are attentive to people's latent needs and emotions, thereby contributing to improved human interaction and welfare within the community.

[0171] The following describes the processing flow.

[0172] Step 1:

[0173] The terminal collects audio data through the microphone used during the meeting and transmits that audio data to the server.

[0174] Step 2:

[0175] The server converts the received audio data into text data using speech recognition technology, and simultaneously uses an emotion engine to recognize the participant's emotions from the audio.

[0176] Step 3:

[0177] The server analyzes the converted text data using a natural language processing algorithm and generates a summary. During this process, recognized sentiment information is added to the summary.

[0178] Step 4:

[0179] The server formats meeting minutes with sentiment information and distributes them to participants via email or other means of communication.

[0180] Step 1:

[0181] Users use a dedicated application to enter their inquiries in text format and send them to the server.

[0182] Step 2:

[0183] The server translates the inquiry into the specified language using an automatic translation engine, and simultaneously recognizes the emotional tone of the inquiry using an emotion engine.

[0184] Step 3:

[0185] The server generates the most suitable response from its database based on the translated text and emotional tone.

[0186] Step 4:

[0187] The server re-translates the generated response back into the original language and returns it to the user, reflecting the emotional tone.

[0188] Step 1:

[0189] The device monitors activity and voice data using sensors installed in the living environment of elderly people and transmits the data to a server.

[0190] Step 2:

[0191] The server analyzes the monitored data, uses machine learning algorithms to detect behavioral anomalies, and evaluates emotional states using an emotion engine.

[0192] Step 3:

[0193] If the server detects any abnormalities or unstable emotional states, it will send an alert to the designated contacts to notify them of the information.

[0194] (Example 2)

[0195] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0196] In managing neighborhood associations, there is a need to handle tasks such as creating meeting minutes, responding to residents' inquiries in multiple languages, and monitoring the elderly efficiently and with consideration for the emotions of the people involved. However, with conventional technologies, processes such as transcribing voice data into text, responding to inquiries, and detecting anomalies based on the emotional state of the elderly are performed individually, and there is no system for integrated and efficient operation. This is a problem because it complicates the management of neighborhood associations and communities, and increases the burden on personnel.

[0197] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0198] In this invention, the server includes means for collecting acoustic information in real time and converting that acoustic information into text information using conversion technology; means for summarizing the converted text information using processing technology and automatically generating a document; means for using communication technology to distribute the generated document to participants; and means for extracting emotional information from the acoustic information and integrating the extracted emotional information into the document. This enables efficient decision-making and flexible responses that take emotions into account in the management of neighborhood associations and communities.

[0199] "Acoustic information" refers to all audio and sound data obtained from the environment, and is fundamental information for processing sound as a signal.

[0200] "Conversion technology" refers to technologies for converting audio or sound data into text data or other data formats, and includes technologies such as speech recognition.

[0201] "Textual information" refers to data that represents audio or other data as text, providing information in written form.

[0202] "Processing technology" refers to techniques for analyzing, summarizing, or reconstructing text data and other information, and includes techniques for natural language processing.

[0203] A “document” is a set of textual information generated for a specific purpose, including meeting minutes and other records.

[0204] "Communication technology" refers to the means and techniques for delivering generated information and data to others, and includes technologies such as email and other digital communications.

[0205] "Emotional information" refers to data that indicates a psychological or emotional state, analyzed from audio or text, and highlights a person's emotional tone.

[0206] A "measuring device" refers to a device or apparatus used to monitor a subject's activity, health status, or emotional state.

[0207] This invention is a system for streamlining community association operations and providing information that takes into account the feelings of participants and residents. Specific embodiments are described below.

[0208] Processing of acoustic information

[0209] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. The collected acoustic information is immediately transmitted to the server. This process uses highly sensitive acoustic sensors and a secure data transmission protocol.

[0210] The server converts received acoustic information into text information using conversion technology. For this process, it utilizes a general API service that provides speech recognition technology, for example, as its speech recognition engine. The converted text information is further summarized using processing technology. The natural language processing technologies used here include OpenAI® generative AI models and other natural language processing frameworks.

[0211] The server then extracts emotional information from the acoustic data and uses an emotion analysis engine to analyze the emotional tones of the participants. The emotional information is integrated into the generated document, and meeting minutes that take into account the emotional aspects of the meeting are automatically produced.

[0212] Multilingual support

[0213] Users submit inquiries in text format through a dedicated application on their device. The user's input is sent to the server and translated into the specified language using automatic translation technology. The translation technology utilizes a multilingual online translation service.

[0214] The server analyzes the emotional tone of incoming inquiries and adjusts the response accordingly. The response is generated using a generative AI model, re-translated into the original language, and then provided to the user. This enables emotionally sensitive responses even to inquiries from residents.
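The tone-dependent adjustment described above can be illustrated with a keyword-based classifier standing in for the emotion analysis engine. The word lists, tone labels, and opening phrases below are hypothetical; the specification itself uses a generative AI model to compose the response.

```python
import re

# Hypothetical keyword lists; a real emotion engine would not rely on fixed words.
URGENT_WORDS = {"urgent", "immediately", "emergency"}
NEGATIVE_WORDS = {"angry", "noisy", "unacceptable", "complaint"}

# Hypothetical opening phrases prepended according to the detected tone.
OPENERS = {
    "urgent": "We understand this is urgent and are responding right away. ",
    "dissatisfied": "We are sorry for the inconvenience. ",
    "neutral": "",
}


def classify_tone(text: str) -> str:
    """Return 'urgent', 'dissatisfied', or 'neutral' based on keywords."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & URGENT_WORDS:
        return "urgent"
    if words & NEGATIVE_WORDS:
        return "dissatisfied"
    return "neutral"


def compose_response(answer: str, tone: str) -> str:
    """Prefix the factual answer with an opener suited to the tone."""
    return OPENERS.get(tone, "") + answer
```

Urgency is checked before dissatisfaction so that an inquiry containing both is treated as urgent.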

[0215] Specific example

[0216] For example, in the processing of acoustic information, if someone strongly states "I cannot agree with this proposal" during a meeting, that emotion will be recorded in the minutes as "Mr. / Ms. XX expressed strong opposition to the proposal."
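The example above can be sketched as a mapping from an emotion label to the wording used in the minutes. The `Utterance` type, the label names, and the phrase table are assumptions made for illustration; only the "expressed strong opposition to the proposal" wording comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str  # hypothetical label produced by the emotion analysis engine


# Hypothetical label-to-phrase table.
PHRASES = {
    "strong_opposition": "expressed strong opposition to the proposal",
    "agreement": "expressed agreement with the proposal",
}


def minutes_line(u: Utterance) -> str:
    """Render one utterance as a line of the minutes, including its emotional tone."""
    phrase = PHRASES.get(u.emotion)
    if phrase is None:
        # No recognized emotion: record the statement neutrally.
        return f'{u.speaker} stated: "{u.text}"'
    return f'{u.speaker} {phrase}: "{u.text}"'
```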

[0217] Example of a prompt

[0218] "Please transcribe today's meeting discussions, analyze the emotions involved, and then summarize them."

[0219] "Please analyze multilingual inquiries and automatically generate appropriate responses that take emotional tone into consideration."

[0220] In this way, the system supports diverse forms of communication and enables high-quality management of community associations.

[0221] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0222] Step 1:

[0223] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. This acoustic information is acquired as analog data and transmitted to the server as a digital signal. Specifically, the audio input device converts sound waves into electrical signals.

[0224] Step 2:

[0225] The server converts received digital audio information into text information using conversion technology. This process uses a speech recognition engine to analyze the audio signal and generate corresponding text data. As a result, the conversation content is output in text format. Specifically, this involves collaborative processing by an acoustic model and a language model.

[0226] Step 3:

[0227] The server further summarizes the converted text information using processing techniques. It employs natural language processing techniques to extract key information and generate a summary. The input is text data, and the output is a concise summary. Specific operations include tokenization and grammatical analysis.

[0228] Step 4:

[0229] The server extracts emotional information from text data. Using an emotion analysis engine, it analyzes the emotional tone of each statement and assigns an emotional label to the text. In this step, text data is input, and labels indicating the emotional state are output. Specifically, the tone and emphasis patterns of the speech are analyzed.

[0230] Step 5:

[0231] The server integrates emotional information into documents and generates meeting minutes. Using a generative AI model, it constructs meeting minutes that include emotional tone. The input consists of a summary and emotional information, and the output is meeting minutes that take emotions into account. Specifically, the document structure and content are adjusted.

[0232] Step 6:

[0233] Users submit inquiries through a dedicated application, and the text is sent to the server. The input is multilingual text, which is passed to the server as inquiry data. Specifically, this involves digitizing the captured user input.

[0234] Step 7:

[0235] The server translates received queries into the specified language using translation technology. Multilingual support technology is used to perform translations between different languages. The input is text data in the original language from the user, and the output is text data in the specified language. The use of a translation engine is a concrete example.

[0236] Step 8:

[0237] The server analyzes the sentiment tone of the translated query. Sentiment analysis techniques are used to evaluate the emotional state of the query. The input is the translated text, and the output is data indicating the sentiment tone. Specifically, this includes categorizing the sentiment of the text.

[0238] Step 9:

[0239] The server generates an appropriate response based on emotional information, re-translates it, and returns it to the user. A generative AI model is used to construct emotionally sensitive responses. The output is the response in the original language. Specifically, the process involves response generation and adjustment of the translation results.

[0240] (Application Example 2)

[0241] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0242] In elderly care settings, it is difficult to grasp the emotional changes of elderly individuals in real time, resulting in the inability to provide appropriate care approaches immediately. Furthermore, in handling inquiries in multiple languages, it is necessary to provide prompt and appropriate responses that take emotions into consideration.

[0243] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0244] In this invention, the server includes means for collecting audio data in real time and converting the audio data into text data using acoustic processing technology, means for summarizing the converted text data using natural language information processing technology and automatically generating a recorded document, and means for providing real-time visual feedback on the emotional state of the elderly person using a wearable display device. This enables appropriate care responses in accordance with the emotional state of the elderly person.

[0245] "Audio data" refers to sound information that records the content of conversations and speech.

[0246] "Real-time data collection" means acquiring data simultaneously with its occurrence or with a short delay.

[0247] "Acoustic processing technology" is a technology for analyzing sound and converting it into text data.

[0248] "Text data" refers to data in which information is written as characters.

[0249] "Natural language processing technology" refers to the technology that allows computers to understand, interpret, and generate language used by humans.

[0250] "Summarizing" is the process of extracting only the essential parts of a large amount of information and putting them together concisely.

[0251] A "record document" is a document in which specific information is stored in text format.

[0252] A "wearable display device" is a device that is worn on the body to display information.

[0253] "Emotional state" refers to a psychological or emotional situation or change.

[0254] The system that realizes this invention enables real-time voice and emotion analysis in elderly care and multilingual inquiry handling. The system includes various devices and servers and functions as follows:

[0255] On the device side, a wearable display device is used. This device collects the voice of elderly people in real time and sends the data to a server. The server converts this voice data into text data using acoustic processing technology. For example, everyday conversations of elderly people can be acquired directly as data via smart glasses or similar devices.

[0256] Next, the server summarizes the converted text data using natural language processing technology and automatically generates a record document. For example, if an elderly person says, "I'm a little tired today," the content is quickly converted into text and visually fed back to the care staff.

[0257] Furthermore, a wearable display device visually shows the analyzed emotional state of the elderly person. This allows care staff to intuitively understand the psychological state of the elderly person and provide more appropriate care.

[0258] As a concrete example, a prompt for a generative AI model could be something like, "Design a system that analyzes the speech of elderly people, understands their emotional tone, and displays it in real time on a smart device. The voice data will be processed in the cloud, and the results will be fed back to a wearable display device."

[0259] This system will enable flexible and prompt responses in care settings that are tailored to the emotional state of elderly individuals, leading to improved communication between caregivers and the elderly individuals in their care.

[0260] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0261] Step 1:

[0262] A device (wearable display device) collects the speech of elderly individuals using a microphone and acquires it as audio data. The input is the speech of the elderly individuals, and the output is the collected audio data. The device collects data in real time and immediately transfers it to the server.

[0263] Step 2:

[0264] The server converts received audio data into text data using acoustic processing technology. Speech recognition software is used for this conversion. The input is audio data, and the output is text data. The server performs this data conversion process in real time.

[0265] Step 3:

[0266] The server summarizes the converted text data using natural language processing technology and automatically generates a recorded document. The input is text data, and the output is a summarized recorded document. A data analysis algorithm considers linguistic characteristics and extracts important information.

[0267] Step 4:

[0268] The server uses an emotion analysis engine to analyze the emotional state of elderly individuals from text data. The input is text data, and the output is the result of the emotional state analysis. Through emotion analysis, the server detects feelings from the tone and content of speech.

[0269] Step 5:

[0270] The wearable display device feeds back analyzed emotional states, visually displaying the emotional state of elderly individuals. The input is the analysis result of the emotional state, and the output is the displayed emotional information. The device provides a visual and intuitive interface, enabling care staff to respond quickly to the situation.
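As a rough sketch of the feedback in Step 5, the following turns an emotion analysis result into a small payload that a display could render. It assumes the analysis result is a dictionary of per-label scores; the labels, colors, and the confidence cutoff are hypothetical.

```python
# Hypothetical mapping from emotion label to indicator color.
COLORS = {"calm": "green", "tired": "yellow", "distressed": "red"}


def display_payload(scores: dict[str, float], min_confidence: float = 0.5) -> dict[str, str]:
    """Pick the highest-scoring emotion and choose a color for the display."""
    if not scores:
        return {"label": "unknown", "color": "gray"}
    label, score = max(scores.items(), key=lambda kv: kv[1])
    if score < min_confidence:
        # Too uncertain to show a specific emotion to care staff.
        label = "unknown"
    return {"label": label, "color": COLORS.get(label, "gray")}
```

Showing "unknown" below the cutoff is a design choice in this sketch, intended to keep care staff from acting on a low-confidence estimate.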

[0271] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0272] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0273] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0274] [Second Embodiment]

[0275] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0276] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0277] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0278] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0279] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0280] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0281] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0282] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0283] The specific processing program 56 is an example of the "program" according to the technology of the present disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by operating as the specific processing unit 290 according to the specific processing program 56 executed by the processor 28 on the RAM 30.

[0284] The storage 32 stores a data generation model 58 and an emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0285] In the smart glasses 214, reception / output processing is performed by the processor 46. The storage 50 stores a reception / output program 60. The processor 46 reads the reception / output program 60 from the storage 50 and executes the read reception / output program 60 on the RAM 48. The reception / output processing is realized by operating as the control unit 46A according to the reception / output program 60 executed by the processor 46 on the RAM 48.

[0286] Next, the specific processing by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0287] The neighborhood association operation support system of the present invention uses AI technology to efficiently support various activities of the neighborhood association. The main functions include creating meeting minutes, handling multilingual inquiries from residents, and providing a monitoring service for the elderly.

[0288] Embodiment of creating meeting minutes

[0289] During a meeting, the terminal collects audio data in real time using microphones installed in the meeting room. This audio data is sent to a server and converted into text data using speech recognition technology. The converted text is summarized using natural language processing technology and automatically generated as meeting minutes. The completed meeting minutes are sent from the server to the participants' email addresses.
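The final delivery step described above can be sketched with Python's standard `email` package. The function below only builds the message; sending it (for example through an SMTP server) is left out. The addresses, subject, and default file name are placeholders, and attaching the minutes as a PDF is an assumption of this sketch.

```python
from email.message import EmailMessage


def build_minutes_email(sender: str, recipients: list[str], subject: str,
                        body: str, pdf_bytes: bytes,
                        filename: str = "minutes.pdf") -> EmailMessage:
    """Build an email carrying the meeting minutes as a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)  # plain-text body
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=filename)
    return msg
```

The returned message could then be handed to `smtplib.SMTP.send_message`, with the server address and credentials supplied by the deployment.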

[0290] Implementation Examples of Multilingual Support

[0291] When residents contact their local community association, they use a dedicated application to input their questions in text format. This data is sent to a server and translated into Japanese using automatic translation technology. The server then refers to its database to generate an appropriate response, translates that response back into the original language, and provides it to the user. This enables smooth communication without language barriers.

[0292] Implementation of an elderly monitoring service

[0293] To support the lives of elderly people living alone, sensors placed as terminals periodically monitor activity data. The terminals send the collected data to a server, where machine learning algorithms are used to detect anomalies. If an anomaly is detected, the server immediately sends a notification to the designated contacts, helping family members or caregivers respond quickly.

[0294] These functions effectively solve many of the challenges associated with managing neighborhood associations, providing residents with a safe and convenient living environment. Specific examples of its use include generating meeting minutes for monthly neighborhood association meetings, responding to inquiries from foreign residents regarding garbage disposal rules, and providing a regular check-in service for elderly people living alone. Each function operates smoothly and efficiently, playing a role in improving the quality of neighborhood association operations and resident services.

[0295] The following describes the processing flow.

[0296] Step 1:

[0297] The terminal collects voice data in real time through the microphone installed in the meeting room and sends the data to the server.

[0298] Step 2:

[0299] The server passes the received voice data to the voice recognition engine to convert the voice into text data. At this time, the speaker is also distinguished and a time stamp is added.

[0300] Step 3:

[0301] The server analyzes the generated text data by means of a natural language processing algorithm to extract and summarize the key points of the meeting.

[0302] Step 4:

[0303] The server automatically generates minutes of the meeting based on the created summary and generates a file in PDF format or the like.

[0304] Step 5:

[0305] The server sends the generated minutes of the meeting to the designated email address and distributes them to the relevant parties.
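The minutes-generation flow in Steps 2 to 4 above can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the `Segment` record, the keyword list, and the function names are assumptions introduced here, and keyword matching stands in for the natural language processing algorithm so that the sketch runs without any external service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical output shape of Step 2)."""
    start_sec: float  # time stamp added during recognition
    speaker: str      # label produced by speaker distinction
    text: str


def summarize(segments, keywords=("decide", "agree", "budget", "deadline")):
    """Step 3 stand-in: keep utterances that mention one of the keywords.

    A real system would use an NLP or generative model here; the keyword
    list is an assumption made only for this sketch.
    """
    return [s for s in segments if any(k in s.text.lower() for k in keywords)]


def format_minutes(title, key_points):
    """Step 4: render the extracted key points as plain-text minutes."""
    lines = [f"Minutes: {title}", ""]
    for s in key_points:
        minutes, seconds = divmod(int(s.start_sec), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {s.speaker}: {s.text}")
    return "\n".join(lines)


# Made-up sample input for illustration.
segments = [
    Segment(5.0, "Chair", "Let us begin the monthly meeting."),
    Segment(72.5, "Tanaka", "I agree with the cleanup schedule."),
    Segment(130.0, "Sato", "The budget for the festival is 50,000 yen."),
]
minutes_text = format_minutes("Monthly meeting", summarize(segments))
print(minutes_text)
```

The resulting text could then be converted to PDF or another file format and handed to the distribution step.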

[0306] Step 1:

[0307] The user uses a dedicated application to input and send the content of the inquiry in multiple languages.

[0308] Step 2:

[0309] The server passes the received inquiry data to the automatic translation engine and translates it into Japanese.

[0310] Step 3:

[0311] The server searches the FAQ database for information and prepares the answer based on the translated text to generate an appropriate response.

[0312] Step 4:

[0313] The server re-translates the prepared response into the language of the requester and sends the response back to the user.
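As an illustrative sketch of Steps 2 to 4 above, the translate, look up, and re-translate round trip might look like the following. The lookup tables stand in for the automatic translation engine and the FAQ database; all entries, the fallback answer, and the function names are assumptions made for this example.

```python
# Toy lookup tables standing in for the automatic translation engine.
# All entries are made-up sample data for illustration.
TO_JAPANESE = {
    "When is garbage collected?": "ごみの収集日はいつですか",
}
FALLBACK_JA = "担当者に確認します"
FROM_JAPANESE = {
    ("en", "月曜日と木曜日です"): "Mondays and Thursdays.",
    ("en", FALLBACK_JA): "We will check with the person in charge.",
}
# Hypothetical FAQ database: keyword in the Japanese question -> answer.
FAQ = {"ごみ": "月曜日と木曜日です"}


def search_faq(question_ja):
    """Step 3: return the first FAQ answer whose keyword appears in the question."""
    for keyword, answer in FAQ.items():
        if keyword in question_ja:
            return answer
    return FALLBACK_JA


def handle_inquiry(text, lang):
    """Steps 2 to 4: translate to Japanese, look up an answer, translate back."""
    question_ja = TO_JAPANESE.get(text, "")
    answer_ja = search_faq(question_ja)
    return FROM_JAPANESE[(lang, answer_ja)]


reply = handle_inquiry("When is garbage collected?", "en")
print(reply)
```

Keeping the FAQ in Japanese means only one copy of the answers has to be maintained, with translation applied at the boundaries.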

[0314] Step 1:

[0315] The device monitors the activity levels of elderly people through sensors and transmits the collected data to a server.

[0316] Step 2:

[0317] The server uses machine learning algorithms to analyze the transmitted data and monitor the activity patterns of older adults.

[0318] Step 3:

[0319] If the server detects unusual activity or patterns, it sets a flag to proceed to the next step.

[0320] Step 4:

[0321] If a flag is set, the server will send a notification to pre-registered emergency contacts to inform them of the situation.
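The flag-then-notify flow of Steps 3 and 4 above can be sketched as follows. A fixed inactivity threshold is used here as a simple stand-in for the machine learning analysis; the 12-hour threshold, the message text, and the `send` callback are assumptions introduced for illustration.

```python
from datetime import datetime, timedelta


def detect_inactivity(event_times, now, max_gap=timedelta(hours=12)):
    """Step 3: return True (flag set) if no activity was seen within max_gap."""
    if not event_times:
        return True
    return now - max(event_times) > max_gap


def notify_contacts(flag, contacts, send):
    """Step 4: call send(contact, message) for each contact when the flag is set.

    Returns the number of notifications sent.
    """
    if not flag:
        return 0
    for contact in contacts:
        send(contact, "No recent activity detected. Please check in.")
    return len(contacts)


# Made-up sample data; `sent` records what would be transmitted.
sent = []
now = datetime(2024, 12, 3, 20, 0)
events = [datetime(2024, 12, 3, 6, 30), datetime(2024, 12, 3, 7, 15)]
flag = detect_inactivity(events, now)
count = notify_contacts(flag, ["family@example.com"],
                        lambda contact, message: sent.append((contact, message)))
```

Separating the flag from the notification keeps the detection logic testable without sending any real message.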

[0322] (Example 1)

[0323] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0324] In modern society, there is a need to improve the efficiency of meetings related to the operation of local governments, overcome language barriers in multicultural societies, and ensure safe and secure living for the elderly. Traditional methods for addressing these issues are largely reliant on manual labor, requiring significant time and effort, making it difficult to respond efficiently and quickly.

[0325] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0326] In this invention, the server includes means for collecting audio data in real time using a voice acquisition device and converting the audio data into text data using speech analysis technology; means for summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and means for using communication methods to provide the generated meeting minutes to meeting participants. This enables efficient meeting progress and rapid sharing of records.

[0327] A "voice acquisition device" is a device used to collect audio at meetings, events, and other similar occasions, and to transmit that data to a processing device.

[0328] "Audio data" refers to digital data that records the waveform of sounds such as speech, and is a format suitable for information processing.

[0329] "Speech analysis technology" is a technology that analyzes sound data and converts its content into text data.

[0330] "Text data" refers to text-formatted data generated from audio information using speech analysis technology.

[0331] "Natural language processing technology" is a technology that analyzes text data and summarizes or converts it into a form that humans can understand.

[0332] "Meeting minutes" are documents that record the content of meetings and other gatherings so that they can be reviewed later.

[0333] "Communication methods" refer to means used to transmit or receive information, including email and other digital communication methods.

[0334] "Multilingualism" is a term that refers to the ability to express or process multiple different languages.

[0335] "Automatic translation technology" is a technology used to automatically translate text written in one language into another language.

[0336] An "information storage medium" is a medium used to store data and allow it to be retrieved as needed.

[0337] An "observation device" is a device used to monitor the physical environment and collect data.

[0338] An "automated learning algorithm" is an algorithm that improves its performance by learning from data and updating the model.

[0339] "Notification" refers to the act of sending information about a specific event to recipients who have been registered in advance.

[0340] The system according to the present invention provides an efficient and safe environment for the operation of community associations and similar organizations by integrating a voice acquisition device, multilingual support function, and elderly monitoring function.

[0341] Voice acquisition device

[0342] The terminal uses high-sensitivity microphones placed in the conference room to collect audio in real time and transmits the data to the server. The server uses speech analysis technology, specifically speech recognition software, to convert the audio data into text data. Next, the server utilizes natural language processing technology to summarize the converted text data and generate meeting minutes. These minutes are immediately distributed to all participants via email or other means. For example, immediately after the meeting ends, the summarized minutes are sent to all participants, enabling rapid information sharing.

[0343] Multilingual support feature

[0344] Users enter inquiries in multiple languages as text using a dedicated application on their smartphones or other devices. The entered data is sent to a server and translated into Japanese using automatic translation technology. The server accesses the information storage medium, generates an appropriate response, and then translates the response back into the original language before providing it to the user. As a concrete example, foreign residents can ask questions to the neighborhood association in their native language and receive immediate responses. An example of a prompt would be, "Please propose a program that automatically generates minutes of neighborhood association meetings."

[0345] Elderly monitoring function

[0346] The device periodically monitors the elderly person's daily activities through an observation device (sensor) installed in their home and transmits the data to a server. The server uses an automated learning algorithm to analyze the activity data and sends a notification to pre-registered contacts if an anomaly is detected. This function supports the peace of mind of elderly people living alone; specifically, if the elderly person's scheduled activities cannot be confirmed, a notification is sent immediately, allowing for timely action.

[0347] By combining these technologies, a system can be realized that contributes to the smooth operation of neighborhood associations and the improvement of the quality of services provided to residents.

[0348] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0349] Processing steps of the voice acquisition device

[0350] Step 1:

[0351] The terminal collects audio in real time using microphones installed in the conference room. The input is audio data, which is converted to a digital format and pre-processed to remove noise. The data is then adjusted to a format that can be sent directly to the server.

[0352] Step 2:

[0353] The server converts received digital audio data into text data using speech analysis technology. Specifically, it uses acoustic and language models to perform feature extraction and decoding, generating text data as output.

[0354] Step 3:

[0355] The server summarizes the text data using natural language processing technology. The input is the text data converted in Step 2, and a generative AI model is used to extract the relevant and important parts and create a summary. A concise meeting minutes document is generated as output.

[0356] Step 4:

[0357] The server provides the completed meeting minutes to the meeting participants via communication means. Specifically, it uses email to automatically send the minutes to the participants' addresses. This process also includes setting the email format and performing error checking.
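Step 4 above mentions setting the email format and performing error checking. A minimal sketch of that step, using Python's standard `email` package, is shown below. The address check is a deliberately loose sanity test and not full address validation, and the function name and sample addresses are assumptions for illustration.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def build_minutes_email(sender, recipients, subject, body):
    """Build the minutes email and separate out obviously invalid recipients.

    Returns (message, rejected_addresses).
    """
    valid, rejected = [], []
    for addr in recipients:
        _, parsed = parseaddr(addr)
        # Minimal sanity check only: requires "@" and a dot in the domain part.
        if "@" in parsed and "." in parsed.split("@")[-1]:
            valid.append(parsed)
        else:
            rejected.append(addr)
    if not valid:
        raise ValueError("no valid recipient addresses")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(valid)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg, rejected


msg, rejected = build_minutes_email(
    "minutes@example.com",
    ["a@example.com", "not-an-address", "b@example.org"],
    "Minutes: monthly meeting",
    "The cleanup schedule was approved.",
)
```

The returned message could be passed to `smtplib.SMTP.send_message` for actual delivery; the rejected list gives the server something to log or report back.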

[0358] Multilingual support processing steps

[0359] Step 1:

[0360] The user opens a dedicated app on their smartphone and enters the question. The input is in text data format, and this data is sent to the server via the app.

[0361] Step 2:

[0362] The server translates the received multilingual questions into Japanese using automatic translation technology. Specifically, it uses a translation engine to analyze the input text and outputs the corresponding Japanese words and phrases.

[0363] Step 3:

[0364] The server retrieves appropriate information from the information storage medium and generates an answer based on the translated question. This includes a process of referencing data using a generative AI model and extracting and formatting an answer that is appropriate for the question.

[0365] Step 4:

[0366] The server re-translates the generated response back into the original language and provides it to the user. Using a prompt, the server passes the response through the re-translation engine to translate it back into the original language and sends it back to the app as text data.

[0367] Processing steps for the elderly monitoring function

[0368] Step 1:

[0369] The terminal periodically monitors activity information using sensors placed in the homes of elderly individuals. Inputs include motion data and environmental data, which are then formatted into a standard format and sent to the server.

[0370] Step 2:

[0371] The server analyzes the received data using an automated learning algorithm. It runs an anomaly monitoring model on the input dataset to detect pattern changes. Deviations from the normal range are obtained as output.

[0372] Step 3:

[0373] If an anomaly is detected, the server will send a notification to pre-registered contacts. This process involves sending SMS or emails via various communication methods, and additional adjustments may be made as needed.
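Step 2 above describes obtaining deviations from the normal range. One simple way to illustrate the idea is a z-score test against a baseline of past readings, sketched below. This is a statistical stand-in and not the automated learning algorithm of the specification; the threshold of 3.0 and the sample activity counts are assumptions.

```python
from statistics import mean, stdev


def find_deviations(baseline, recent, z_threshold=3.0):
    """Return (index, value) pairs in `recent` that fall outside the normal range.

    The normal range is defined as within z_threshold standard deviations
    of the baseline mean. `baseline` needs at least two readings.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Baseline is constant: any different value counts as a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [(i, v) for i, v in enumerate(recent)
            if abs(v - mu) / sigma > z_threshold]


# Made-up daily motion-sensor event counts.
baseline = [42, 38, 45, 40, 41, 39, 44]
recent = [40, 43, 5]
deviations = find_deviations(baseline, recent)
print(deviations)
```

Any non-empty result would trigger the notification described in Step 3.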

[0374] (Application Example 1)

[0375] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0376] In managing local communities, diverse needs exist, including taking meeting minutes, providing multilingual support among residents, and monitoring the elderly. However, a lack of efficient technical support increases the burden on these tasks, leading to challenges such as insufficient communication among residents and concerns about the safety of the elderly. Furthermore, traditional systems struggle with real-time information sharing, highlighting the need for improved convenience through the use of smartphones.

[0377] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0378] This invention includes a server that collects audio information during a meeting in real time and converts that audio information into text information using speech recognition technology; a server that summarizes the converted text information using natural language processing technology and automatically generates meeting minutes; a server that uses email or other communication methods to distribute the generated meeting minutes to members; and a server that uses a program installed on a smartphone to obtain and view the meeting minutes of the neighborhood association. This enables more efficient management of the neighborhood association and faster provision of services to residents.

[0379] "Audio information" refers to data recorded in digital format from sounds that occur in settings such as meetings.

[0380] "Speech recognition technology" is a technology that uses computers to analyze human speech and convert it into text information.

[0381] "Textual information" refers to text data converted from speech using speech recognition technology.

[0382] "Natural language processing technology" is a technology that allows computers to understand and process human language, and to perform tasks such as summarizing and translating it.

[0383] "Methods for automatically generating meeting minutes" refer to technologies that organize meeting content as text and generate it in a format that can be distributed to participants.

[0384] "Members" refer to people who participate in meetings or communities and are involved in discussions and decisions.

[0385] "Email" is a method of exchanging textual information over the internet.

[0386] "Communication methods" is a general term for the technologies and means used to send and receive information.

[0387] A "smartphone" is a multi-functional mobile phone device that can connect to the internet.

[0388] "A method for obtaining and viewing neighborhood association meeting minutes using a program" refers to a method of viewing the generated meeting minutes using an application on a smartphone.

[0389] The system for realizing this invention will be built as a software solution to support the operation of community associations. The server will acquire audio information from microphones set up during meetings and convert it into text using speech recognition technology. This will utilize speech recognition services such as the Google Cloud Speech-to-Text API. The converted text information will be summarized using natural language processing technology. This process will utilize services such as the Google Cloud Natural Language API. The generated meeting minutes will be viewable on mobile devices, including smartphones, through an application. This application will help streamline the acquisition and distribution of meeting minutes.

[0390] Furthermore, multilingual inquiries from residents are received by the server and translated into the specified language using automatic translation technology. The Google Cloud Translation API is used for translation, and combined with information extraction from the database, it generates appropriate responses. The generated responses are then translated back into the original language and provided to the user. This enables smooth communication with residents who speak foreign languages.

[0391] For monitoring the elderly, activity data collected through sensor devices is sent to a server, where machine learning algorithms are used to detect anomalies. When an anomaly is detected, a notification is sent to the smartphones of pre-registered contacts, enabling a quick response. Specific examples of use of the system include residents who cannot attend monthly neighborhood meetings viewing meeting minutes via a smartphone app to deepen their understanding, and new foreign residents using the app to inquire about garbage disposal procedures in their native language.

[0392] As an example of a prompt, you could ask, "Please display the minutes of the recent neighborhood association meeting regarding the local crime prevention plan that was discussed."

[0393] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0394] Step 1:

[0395] The server acquires audio information received from terminals during a meeting. The input is digital audio information collected from the microphone, and this information is converted into text using the Google Cloud Speech-to-Text API. The output is text data. A speech recognition service is activated, analyzing the audio and converting it to text in real time.

[0396] Step 2:

[0397] The server analyzes the acquired text information and summarizes it using natural language processing techniques. The input is the text information generated in step 1, which is then summarized using the Google Cloud Natural Language API. The output is the summarized text. The natural language processing algorithm operates at this stage, extracting important information and generating a summary.

[0398] Step 3:

[0399] The server prepares the summarized meeting minutes for distribution to smartphones. The input is the summarized text generated in step 2, and the output is the meeting minutes organized into a distribution format. These minutes are stored in a database for distribution to members via email or app.

[0400] Step 4:

[0401] Users retrieve and view meeting minutes from an app using their smartphones. The input is the meeting minutes data distributed in step 3, and the output is the meeting minutes information displayed on the user's screen. The application retrieves the necessary information from the database and provides an interface to display it on the screen.

[0402] Step 5:

[0403] The server receives multilingual inquiries submitted by residents. The input is user-generated text inquiry data, which is then translated into the specified language using the Google Cloud Translation API. The output is the translated text data. The server automatically identifies the language and performs the appropriate translation.

[0404] Step 6:

[0405] The server extracts relevant information from the database based on the translated query and generates a response. The input is the translated query text obtained in step 5, and the output is the response text. In the information extraction process, information appropriate to the query is immediately searched for and the response is constructed.

[0406] Step 7:

[0407] The server re-translates the generated responses back into the residents' original language. The input is the response text generated in step 6, and the output is the re-translated output text. The Google Cloud Translation API is used again to perform an accurate translation back into the original language.

[0408] Step 8:

[0409] The device monitors the elderly person's activity data and sends it to the server, and the server sends a notification if an anomaly is detected. The input is activity data from the sensor, and the output is an anomaly detection notification. The sensor records daily activities and a machine learning algorithm analyzes the patterns; if an anomaly is detected, the server immediately notifies the family or caregiver.
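Steps 3 and 4 above, in which minutes are stored in a database and later retrieved by the smartphone application, can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and keyword search are assumptions introduced for this example; the specification does not state a schema.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical minutes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE minutes ("
        " id INTEGER PRIMARY KEY,"
        " meeting_date TEXT NOT NULL,"  # ISO date, so text order is date order
        " title TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def save_minutes(conn, meeting_date, title, body):
    """Step 3: store minutes prepared for distribution; returns the row id."""
    cur = conn.execute(
        "INSERT INTO minutes (meeting_date, title, body) VALUES (?, ?, ?)",
        (meeting_date, title, body),
    )
    conn.commit()
    return cur.lastrowid


def latest_minutes(conn, keyword):
    """Step 4: return the most recent minutes mentioning the keyword, or None."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT meeting_date, title, body FROM minutes"
        " WHERE title LIKE ? OR body LIKE ?"
        " ORDER BY meeting_date DESC LIMIT 1",
        (pattern, pattern),
    ).fetchone()


conn = open_store()
save_minutes(conn, "2024-10-05", "October meeting", "Crime prevention plan proposed.")
save_minutes(conn, "2024-11-02", "November meeting", "Crime prevention plan approved.")
save_minutes(conn, "2024-12-07", "December meeting", "Year-end cleanup scheduled.")
row = latest_minutes(conn, "crime prevention")
```

A request such as the example prompt about the local crime prevention plan would map to a keyword lookup of this kind.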

[0410] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0411] The community association management support system of the present invention utilizes AI technology to efficiently support various community association events. In particular, by combining it with an emotion engine, the present invention can recognize emotions during meetings and interactions with residents, and reflect them in meeting minutes and services. This makes it possible to make communication more effective and improve the quality of management.

[0412] Implementation methods for meeting minute creation and sentiment recognition

[0413] During a meeting, the terminal uses microphones installed in the meeting room to collect audio data in real time. This audio data is sent to a server, where it is converted into text data and sentiment data using speech recognition technology and an emotion engine. The converted text is summarized using natural language processing technology, and when meeting minutes are automatically generated, participant sentiment information is also taken into account. For example, if the system recognizes a situation during the meeting where participants have conflicting opinions, the meeting minutes will record not only the content of those opinions but also that the discussion was heated.

[0414] Embodiments of emotion recognition in multilingual support

[0415] When residents make an inquiry, they use a dedicated application to input their question in text format. This data is sent to a server, where it is translated into Japanese using automatic translation technology, and then an emotion engine recognizes the emotional tone of the content. Based on the translated text and the recognized emotional tone, the server prepares a more appropriate response, which is then re-translated and provided to the user. This allows for a more considerate response if, for example, the inquiry contains emotions such as dissatisfaction or urgency.

[0416] Implementation of emotion recognition in elderly monitoring services

[0417] The system uses wearable devices worn by elderly individuals to monitor their voice and behavior. This data is sent to a server, where an emotion engine analyzes the voice to determine the individual's emotional state. The server detects any deviations from normal conversation or behavior and, if an emotionally unstable state persists, sends an emergency notification that reflects this state to family members or caregivers. For example, if an elderly person repeatedly shows irritability or uses irritable language, the system supports mental health care in addition to the usual detection of physical abnormalities.

[0418] This system not only improves the efficiency of community association operations but also enables flexible responses that are attentive to people's latent needs and emotions, thereby contributing to improved human interaction and welfare within the community.

[0419] The following describes the processing flow.

[0420] Step 1:

[0421] The terminal collects audio data through the microphone used during the meeting and transmits that audio data to the server.

[0422] Step 2:

[0423] The server converts the received audio data into text data using speech recognition technology, and simultaneously uses an emotion engine to recognize the participant's emotions from the audio.

[0424] Step 3:

[0425] The server analyzes the converted text data using a natural language processing algorithm and generates a summary. During this process, recognized sentiment information is added to the summary.

[0426] Step 4:

[0427] The server formats meeting minutes with sentiment information and distributes them to participants via email or other means of communication.
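Steps 3 and 4 above, in which recognized emotion information is added to the minutes, can be sketched as follows. The emotion labels are assumed to come from an emotion engine that is not shown; the label names, the rule for marking a discussion as heated, and the sample utterances are assumptions made for illustration.

```python
# Labels treated as strong emotions in this sketch (an assumption).
STRONG_EMOTIONS = frozenset({"anger", "frustration"})


def annotate_minutes(utterances):
    """Render (speaker, text, emotion) tuples as minutes lines.

    Non-neutral emotions are appended in parentheses, and a note is added
    when at least two different speakers show a strong emotion.
    """
    lines = []
    for speaker, text, emotion in utterances:
        note = "" if emotion == "neutral" else f" ({emotion})"
        lines.append(f"{speaker}: {text}{note}")
    heated_speakers = {s for s, _, e in utterances if e in STRONG_EMOTIONS}
    if len(heated_speakers) >= 2:
        lines.append("Note: the discussion on this item was heated.")
    return "\n".join(lines)


# Made-up sample data.
utterances = [
    ("Tanaka", "I cannot accept this plan.", "anger"),
    ("Sato", "The plan has already been delayed twice.", "frustration"),
    ("Chair", "Let us vote next month.", "neutral"),
]
annotated = annotate_minutes(utterances)
print(annotated)
```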

[0428] Step 1:

[0429] Users use a dedicated application to enter their inquiries in text format and send them to the server.

[0430] Step 2:

[0431] The server translates the inquiry into the specified language using an automatic translation engine, and simultaneously recognizes the emotional tone of the inquiry using an emotion engine.

[0432] Step 3:

[0433] The server generates the most suitable response from its database based on the translated text and emotional tone.

[0434] Step 4:

[0435] The server re-translates the generated response back into the original language and returns it to the user, reflecting the emotional tone.
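As a rough sketch of Steps 2 to 4 above, the following shows how a recognized emotional tone might adjust the wording of a response. Keyword matching stands in for the emotion engine; the keyword lists, tone names, and prefixes are all assumptions made for this example, and translation is omitted.

```python
# Hypothetical keyword lists; a real emotion engine would replace this.
TONE_KEYWORDS = {
    "urgent": ("urgent", "immediately", "emergency"),
    "dissatisfied": ("again", "still not", "unacceptable"),
}
TONE_PREFIX = {
    "urgent": "We are treating this as a priority. ",
    "dissatisfied": "We apologize for the inconvenience. ",
    "neutral": "",
}


def detect_tone(text):
    """Return the first tone whose keywords appear in the text, else 'neutral'."""
    lowered = text.lower()
    for tone, words in TONE_KEYWORDS.items():
        if any(w in lowered for w in words):
            return tone
    return "neutral"


def compose_response(inquiry, answer):
    """Prefix the database answer with wording suited to the inquiry's tone."""
    return TONE_PREFIX[detect_tone(inquiry)] + answer


response = compose_response(
    "The garbage was not collected again.",
    "Collection is rescheduled for Friday.",
)
print(response)
```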

[0436] Step 1:

[0437] The device monitors activity and voice data using sensors installed in the living environment of elderly people and transmits the data to a server.

[0438] Step 2:

[0439] The server analyzes the monitored data, uses machine learning algorithms to detect behavioral anomalies, and evaluates emotional states using an emotion engine.

[0440] Step 3:

[0441] If the server detects any abnormalities or unstable emotional states, it will send an alert to the designated contacts to notify them of the information.
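The alert condition in Step 3 above, where an unstable emotional state must persist before contacts are notified, can be illustrated with a consecutive-observation check. The label names and the requirement of three consecutive observations are assumptions for this sketch; the emotion engine that produces the labels is not shown.

```python
def should_alert(emotion_labels,
                 unstable=frozenset({"irritable", "distressed"}),
                 min_consecutive=3):
    """Return True if an unstable emotional state persists.

    Persistence is defined here as at least `min_consecutive` unstable
    labels in a row within the time-ordered sequence of observations.
    """
    run = 0
    for label in emotion_labels:
        run = run + 1 if label in unstable else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up sample observation sequences.
persistent = should_alert(["calm", "irritable", "irritable", "distressed"])
interrupted = should_alert(["irritable", "calm", "irritable", "irritable"])
```

Requiring a run of observations avoids notifying contacts for a single momentary reading.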

[0442] (Example 2)

[0443] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0444] In managing neighborhood associations, there is a need to handle tasks such as creating meeting minutes, responding to residents' inquiries in multiple languages, and monitoring the elderly efficiently and with attention to emotions. However, with conventional technologies, processes such as transcribing voice data into text, responding to inquiries, and detecting anomalies based on the emotional state of the elderly are performed individually, and there is no system for integrated and efficient operation. This is a problem because it complicates the management of neighborhood associations and communities, and increases the burden on personnel.

[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0446] In this invention, the server includes means for collecting acoustic information in real time and converting that acoustic information into text information using conversion technology; means for summarizing the converted text information using processing technology and automatically generating a document; means for using communication technology to distribute the generated document to participants; and means for extracting emotional information from the acoustic information and integrating the extracted emotional information into the document. This enables efficient decision-making and flexible responses that take emotions into account in the management of neighborhood associations and communities.

[0447] "Acoustic information" refers to all audio and sound data obtained from the environment, and is fundamental information for processing sound as a signal.

[0448] "Conversion technology" refers to technologies for converting audio or sound data into text data or other data formats, and includes technologies such as speech recognition.

[0449] "Textual information" refers to data that represents audio or other data as text, providing information in written form.

[0450] "Processing technology" refers to techniques for analyzing, summarizing, or reconstructing text data and other information, and includes techniques for natural language processing.

[0451] A "document" is a set of textual information generated for a specific purpose, including meeting minutes and other records.

[0452] "Communication technology" refers to the means and techniques for delivering generated information and data to others, and includes technologies such as email and other digital communications.

[0453] "Emotional information" refers to data that indicates a psychological or emotional state, analyzed from audio or text, and highlights a person's emotional tone.

[0454] A "measuring device" refers to a device or apparatus used to monitor a subject's activity, health status, or emotional state.

[0455] This invention is a system for streamlining community association operations and providing information that takes into account the feelings of participants and residents. Specific embodiments are described below.

[0456] Processing of acoustic information

[0457] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. The collected acoustic information is immediately transmitted to the server. This process uses highly sensitive acoustic sensors and a secure data transmission protocol.

[0458] The server converts received acoustic information into text information using conversion technology. For this process, it utilizes a general API service that provides speech recognition technology, for example, as the speech recognition engine. The converted text information is further summarized using processing technology. The natural language processing technologies used here include OpenAI's generative AI models and other natural language processing frameworks.

[0459] The server then extracts emotional information from the acoustic data and uses an emotion analysis engine to analyze the emotional tones of the participants. The emotional information is integrated into the generated document, and meeting minutes that take into account the emotional aspects of the meeting are automatically produced.

[0460] Multilingual support

[0461] Users submit inquiries in text format through a dedicated application on their device. The user's input is sent to the server and translated into the specified language using automatic translation technology. The translation technology utilizes a multilingual online translation service.

[0462] The server analyzes the emotional tone of incoming inquiries and adjusts the response accordingly. The response is generated using a generative AI model, re-translated into the original language, and then provided to the user. This enables emotionally sensitive responses even to inquiries from residents.

[0463] Specific example

[0464] For example, in the processing of acoustic information, if someone strongly states "I cannot agree with this proposal" during a meeting, that emotion will be recorded in the minutes as "Mr. / Ms. XX expressed strong opposition to the proposal."

[0465] Example of a prompt

[0466] "Please transcribe today's meeting discussions, analyze the emotions involved, and then summarize them."

[0467] "Please analyze multilingual inquiries and automatically generate appropriate responses that take emotional tone into consideration."

[0468] In this way, the system supports diverse forms of communication and enables high-quality management of community associations.

[0469] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0470] Step 1:

[0471] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. This acoustic information is acquired as analog data and transmitted to the server as a digital signal. Specifically, the audio input device converts sound waves into electrical signals.

[0472] Step 2:

[0473] The server converts received digital audio information into text information using conversion technology. This process uses a speech recognition engine to analyze the audio signal and generate corresponding text data. As a result, the conversation content is output in text format. Specifically, this involves collaborative processing by an acoustic model and a language model.

[0474] Step 3:

[0475] The server further summarizes the converted text information using processing techniques. It employs natural language processing techniques to extract key information and generate a summary. The input is text data, and the output is a concise summary. Specific operations include tokenization and grammatical analysis.

[0476] Step 4:

[0477] The server extracts emotional information from text data. Using an emotion analysis engine, it analyzes the emotional tone of each statement and assigns an emotional label to the text. In this step, text data is input, and labels indicating the emotional state are output. Specifically, the tone and emphasis patterns of the speech are analyzed.

[0478] Step 5:

[0479] The server integrates emotional information into documents and generates meeting minutes. Using a generative AI model, it constructs meeting minutes that include emotional tone. The input consists of a summary and emotional information, and the output is meeting minutes that take emotions into account. Specifically, the document structure and content are adjusted.

[0480] Step 6:

[0481] Users submit inquiries through a dedicated application, and the text is sent to the server. The input is multilingual text, which is passed to the server as inquiry data. Specifically, this involves digitizing the captured user input.

[0482] Step 7:

[0483] The server translates received queries into the specified language using translation technology. Multilingual support technology is used to perform translations between different languages. The input is text data in the original language from the user, and the output is text data in the specified language. The use of a translation engine is a concrete example.

[0484] Step 8:

[0485] The server analyzes the sentiment tone of the translated query. Sentiment analysis techniques are used to evaluate the emotional state of the query. The input is the translated text, and the output is data indicating the sentiment tone. Specifically, this includes categorizing the sentiment of the text.

[0486] Step 9:

[0487] The server generates an appropriate response based on emotional information, re-translates it, and returns it to the user. A generative AI model is used to construct emotionally sensitive responses. The output is the response in the original language. Specifically, the process involves response generation and adjustment of the translation results.
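
The emotion-related steps above (Steps 4 and 5 for minutes, Steps 7 to 9 for inquiries) can be sketched as follows. This is only an illustration under stated assumptions: the keyword lexicon `EMOTION_KEYWORDS`, the function names, the apology wording, and the `translate` and `answer_for` callables are all hypothetical stand-ins for the emotion analysis engine, translation engine, and generative AI model that the text describes.

```python
from typing import Callable

# Hypothetical keyword lexicon standing in for an emotion analysis engine.
EMOTION_KEYWORDS = {
    "frustrated": ("again", "still not", "upset"),
    "positive": ("agree", "thank", "glad"),
}


def label_emotion(text: str) -> str:
    """Return the first label whose keyword occurs in the text, else 'neutral'."""
    lowered = text.lower()
    for label, keywords in EMOTION_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "neutral"


def build_minutes(summary: str, statements: list[tuple[str, str]]) -> str:
    """Steps 4 and 5: attach an emotion label to each (speaker, text) statement."""
    lines = ["Summary: " + summary]
    for speaker, text in statements:
        lines.append(f"- {speaker} [{label_emotion(text)}]: {text}")
    return "\n".join(lines)


def respond_to_inquiry(
    inquiry: str,
    user_language: str,
    server_language: str,
    translate: Callable[[str, str], str],  # (text, target language) -> text
    answer_for: Callable[[str], str],      # translated inquiry -> answer
) -> str:
    """Steps 7 to 9: translate, assess tone, answer, and translate back."""
    translated = translate(inquiry, server_language)
    answer = answer_for(translated)
    if label_emotion(translated) == "frustrated":
        # Assumed policy: acknowledge the tone before giving the answer.
        answer = "We apologize for the inconvenience. " + answer
    return translate(answer, user_language)
```

Substring matching is crude (for example, "against" would match "again"), so a real deployment would replace `label_emotion` with a trained model while keeping the same interface.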

[0488] (Application Example 2)

[0489] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0490] In elderly care settings, it is difficult to grasp the emotional changes of elderly individuals in real time, resulting in the inability to provide appropriate care approaches immediately. Furthermore, in handling inquiries in multiple languages, it is necessary to provide prompt and appropriate responses that take emotions into consideration.

[0491] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0492] In this invention, the server includes means for collecting audio data in real time and converting the audio data into text data using acoustic processing technology, means for summarizing the converted text data using natural language processing technology and automatically generating a recorded document, and means for providing real-time visual feedback on the emotional state of the elderly person using a wearable display device. This enables appropriate care responses in accordance with the emotional state of the elderly person.

[0493] "Audio data" refers to sound information that records the content of conversations and speech.

[0494] "Real-time data collection" means acquiring data simultaneously with its occurrence or with a short delay.

[0495] "Acoustic processing technology" is a technology for analyzing sound and converting it into text data.

[0496] "Text data" refers to data in which information is written as characters.

[0497] "Natural language processing technology" refers to the technology that allows computers to understand, interpret, and generate language used by humans.

[0498] "Summarizing" is the process of extracting only the essential parts of a large amount of information and putting them together concisely.

[0499] A "recorded document" is a document in which specific information is stored in text format.

[0500] A "wearable display device" is a device that is worn on the body to display information.

[0501] "Emotional state" refers to a psychological or emotional situation or change.

[0502] The system that realizes this invention enables real-time voice and emotion analysis in elderly care and multilingual inquiry handling. The system includes various devices and servers and functions as follows:

[0503] On the device side, a wearable display device is used. This device collects the voice of elderly people in real time and sends the data to a server. The server converts this voice data into text data using acoustic processing technology. For example, everyday conversations of elderly people can be acquired directly as data via smart glasses or similar devices.

[0504] Next, the server summarizes the converted text data using natural language processing technology and automatically generates a recorded document. For example, if an elderly person says, "I'm a little tired today," the content is quickly converted into text and visually fed back to the care staff.

[0505] Furthermore, a wearable display device visually shows the analyzed emotional state of the elderly person. This allows care staff to intuitively understand the psychological state of the elderly person and provide more appropriate care.

[0506] As a concrete example, a prompt for a generative AI model could be something like, "Design a system that analyzes the speech of elderly people, understands their emotional tone, and displays it in real time on a smart device. The voice data will be processed in the cloud, and the results will be fed back to a wearable display device."

[0507] This system will enable flexible and prompt responses in care settings that are tailored to the emotional state of elderly individuals, leading to improved communication between caregivers and the elderly individuals in their care.

[0508] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0509] Step 1:

[0510] A device (wearable display device) collects the speech of elderly individuals using a microphone and acquires it as audio data. The input is the speech of the elderly individuals, and the output is the collected audio data. The device collects data in real time and immediately transfers it to the server.

[0511] Step 2:

[0512] The server converts received audio data into text data using acoustic processing technology. Speech recognition software is used for this conversion. The input is audio data, and the output is text data. The server performs this data conversion process in real time.

[0513] Step 3:

[0514] The server summarizes the converted text data using natural language processing technology and automatically generates a recorded document. The input is text data, and the output is a summarized recorded document. A data analysis algorithm considers linguistic characteristics and extracts important information.

[0515] Step 4:

[0516] The server uses an emotion analysis engine to analyze the emotional state of elderly individuals from text data. The input is text data, and the output is the result of the emotional state analysis. Through emotion analysis, the server detects feelings from the tone and content of speech.

[0517] Step 5:

[0518] The wearable display device feeds back analyzed emotional states, visually displaying the emotional state of elderly individuals. The input is the analysis result of the emotional state, and the output is the displayed emotional information. The device provides a visual and intuitive interface, enabling care staff to respond quickly to the situation.
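
Step 5 above, turning an emotional-state analysis result into something the wearable display can show, might look like the sketch below. The emotion names, colors, captions, the `EmotionResult` and `DisplayPayload` types, and the confidence threshold are all assumptions made for illustration; the source does not specify how the feedback is rendered.

```python
from dataclasses import dataclass

# Hypothetical presentation table; the source does not specify colors or captions.
DISPLAY_STYLES = {
    "fatigue": ("#FFA500", "Tired"),
    "distress": ("#FF0000", "Needs attention"),
    "calm": ("#00AA00", "Calm"),
}
DEFAULT_STYLE = ("#808080", "Unknown")


@dataclass(frozen=True)
class EmotionResult:
    emotion: str
    confidence: float  # 0.0 to 1.0, as reported by the analysis engine


@dataclass(frozen=True)
class DisplayPayload:
    color: str
    caption: str


def to_display_payload(result: EmotionResult, min_confidence: float = 0.6) -> DisplayPayload:
    """Map an analysis result to a color and caption for the wearable display."""
    if result.confidence < min_confidence:
        # Low-confidence results are shown as unknown rather than guessed.
        color, label = DEFAULT_STYLE
    else:
        color, label = DISPLAY_STYLES.get(result.emotion, DEFAULT_STYLE)
    return DisplayPayload(color=color, caption=f"{label} ({result.confidence:.0%})")
```

For the utterance "I'm a little tired today," an engine that returned `EmotionResult("fatigue", 0.82)` would produce an orange "Tired (82%)" caption for the care staff.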

[0519] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0520] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0521] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0522] [Third Embodiment]

[0523] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0524] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0525] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0526] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0527] The microphone 238 receives instructions and other input from the user 20 by capturing the user's voice. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0528] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0529] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0530] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0531] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0532] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0533] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0534] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0535] The present invention's community association management support system utilizes AI technology to efficiently support various community association activities. Its main functions include creating meeting minutes, responding to multilingual inquiries from residents, and providing monitoring services for the elderly.

[0536] Meeting minutes creation method

[0537] During a meeting, the terminal collects audio data in real time using microphones installed in the meeting room. This audio data is sent to a server and converted into text data using speech recognition technology. The converted text is summarized using natural language processing technology, and meeting minutes are automatically generated from the summary. The completed meeting minutes are sent from the server to the participants' email addresses.

[0538] Implementation Examples of Multilingual Support

[0539] When residents contact their local community association, they use a dedicated application to input their questions in text format. This data is sent to a server and translated into Japanese using automatic translation technology. The server then refers to its database to generate an appropriate response, translates that response back into the original language, and provides it to the user. This enables smooth communication without language barriers.

[0540] Implementation of an elderly monitoring service

[0541] To support the lives of elderly people living alone, sensors placed as terminals periodically monitor activity data. The terminals send the collected data to a server, where machine learning algorithms are used to detect anomalies. If an anomaly is detected, the server immediately sends a notification to the designated contacts, helping family members or caregivers respond quickly.

[0542] These functions effectively solve many of the challenges associated with managing neighborhood associations, providing residents with a safe and convenient living environment. Specific examples of its use include generating meeting minutes for monthly neighborhood association meetings, responding to inquiries from foreign residents regarding garbage disposal rules, and providing a regular check-in service for elderly people living alone. Each function operates smoothly and efficiently, playing a role in improving the quality of neighborhood association operations and resident services.

[0543] The following describes the processing flow.

[0544] Step 1:

[0545] The terminal collects audio data in real time through microphones installed in the conference room and transmits that data to the server.

[0546] Step 2:

[0547] The server passes the received audio data to the speech recognition engine, which converts the audio into text data. During this process, speaker identification and timestamps are also added.

[0548] Step 3:

[0549] The server analyzes the generated text data using natural language processing algorithms to extract and summarize the key points of the meeting.

[0550] Step 4:

[0551] The server automatically generates meeting minutes based on the created summary and creates a file in PDF format or other suitable format.

[0552] Step 5:

[0553] The server sends the generated meeting minutes to the designated email address and distributes them to the relevant parties.
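
Steps 2 to 4 of the minutes flow above can be sketched as follows. This assumes the speech recognition engine has already produced segments carrying a speaker name and a start time; the `Segment` type, the function names, and the word-frequency scoring are illustrative assumptions, and the scoring is a naive extractive stand-in for the natural language processing algorithms the text refers to.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: int
    speaker: str
    text: str


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def format_timestamp(seconds: int) -> str:
    """Render seconds from the start of the meeting as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def key_points(segments: list[Segment], limit: int = 2) -> list[Segment]:
    """Keep the segments that use the transcript's most frequent words."""
    # Words of three letters or fewer are ignored as a rough stop-word filter.
    frequency = Counter(w for s in segments for w in _words(s.text) if len(w) > 3)
    ranked = sorted(
        segments,
        key=lambda s: sum(frequency[w] for w in _words(s.text)),
        reverse=True,
    )
    # Restore chronological order for the minutes.
    return sorted(ranked[:limit], key=lambda s: s.start_seconds)


def render_minutes(title: str, segments: list[Segment]) -> str:
    """Step 4: produce plain-text minutes from the selected key points."""
    lines = [title, "Key points:"]
    for segment in key_points(segments):
        stamp = format_timestamp(segment.start_seconds)
        lines.append(f"[{stamp}] {segment.speaker}: {segment.text}")
    return "\n".join(lines)
```

The scoring favors longer statements and only handles space-delimited Latin text; Japanese transcripts would need a morphological tokenizer, and converting the result to PDF is left out of the sketch.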

[0554] Step 1:

[0555] Users use a dedicated application to enter and submit inquiries in multiple languages.

[0556] Step 2:

[0557] The server passes the received inquiry data to an automatic translation engine, which then translates it into Japanese.

[0558] Step 3:

[0559] Based on the translated text, the server searches the FAQ database for relevant information and prepares an appropriate response.

[0560] Step 4:

[0561] The server re-translates the prepared response into the language of the requester and sends the response back to the user.
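
The FAQ lookup in Step 3 of the inquiry flow above could be approximated as follows. The FAQ entries, the fallback message, the similarity measure (Jaccard overlap of word sets), and the threshold are assumptions for illustration only. The example uses English text for readability, whereas the described system searches after translating into Japanese.

```python
import re

# Hypothetical FAQ entries; a deployment would load these from the FAQ database.
FAQ = {
    "When is burnable garbage collected?":
        "Burnable garbage is collected on Mondays and Thursdays.",
    "How do I join the neighborhood association?":
        "Submit the membership form at the community center.",
}
FALLBACK = "Your inquiry has been forwarded to the association office."


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def find_answer(question: str, min_overlap: float = 0.3) -> str:
    """Return the answer whose FAQ question best overlaps the inquiry."""
    query = tokenize(question)
    best_answer, best_score = FALLBACK, 0.0
    for faq_question, answer in FAQ.items():
        candidate = tokenize(faq_question)
        union = query | candidate
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(query & candidate) / len(union) if union else 0.0
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= min_overlap else FALLBACK
```

Returning a fallback below the threshold keeps the server from sending an unrelated answer when nothing in the database matches.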

[0562] Step 1:

[0563] The terminal monitors the activity levels of elderly people through sensors and transmits the collected data to the server.

[0564] Step 2:

[0565] The server uses machine learning algorithms to analyze the transmitted data and monitor the activity patterns of older adults.

[0566] Step 3:

[0567] If the server detects unusual activity or patterns, it sets a flag to proceed to the next step.

[0568] Step 4:

[0569] If a flag is set, the server will send a notification to pre-registered emergency contacts to inform them of the situation.
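
The flag-and-notify logic in Steps 2 to 4 of the monitoring flow above can be illustrated with a simple statistical check. The z-score test below is a swapped-in stand-in for the unspecified machine learning algorithm, and the function names, threshold, and message wording are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's activity count if it is far from the historical mean."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    spread = stdev(history)
    if spread == 0:
        return today != history[0]
    # z-score: distance from the mean in units of standard deviation.
    return abs(today - mean(history)) / spread > threshold


def notifications(flagged: bool, contacts: list[str], resident: str) -> list[str]:
    """Step 4: build one message per pre-registered contact when the flag is set."""
    if not flagged:
        return []
    return [f"To {contact}: unusual activity detected for {resident}." for contact in contacts]
```

With a week of daily motion counts around 40, a day with only 5 detections would be flagged, while a day with 41 would not.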

[0570] (Example 1)

[0571] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0572] In modern society, there is a need to improve the efficiency of meetings related to the operation of local community associations, overcome language barriers in multicultural societies, and ensure safe and secure living for the elderly. Traditional methods for addressing these issues rely largely on manual labor and require significant time and effort, making it difficult to respond efficiently and quickly.

[0573] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0574] In this invention, the server includes means for collecting audio data in real time using a voice acquisition device and converting the audio data into text data using speech analysis technology; means for summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and means for using communication means to provide the generated meeting minutes to meeting participants. This enables efficient meeting progress and rapid sharing of records.

[0575] A "voice acquisition device" is a device used to collect audio at meetings, events, and other similar occasions, and to transmit that data to a processing device.

[0576] "Audio data" refers to digital data that records the waveform of sounds such as speech, and is a format suitable for information processing.

[0577] "Speech analysis technology" is a technology that analyzes audio data and converts its content into text data.

[0578] "Text data" refers to text-formatted data generated from audio information using speech analysis technology.

[0579] "Natural language processing technology" is a technology that analyzes text data and summarizes or converts it into a form that humans can understand.

[0580] "Meeting minutes" are documents that record the content of meetings and other gatherings so that they can be reviewed later.

[0581] "Communication methods" refer to means used to transmit or receive information, including email and other digital communication methods.

[0582] "Multilingualism" is a term that refers to the ability to express or process multiple different languages.

[0583] "Automatic translation technology" is a technology used to automatically translate text written in one language into another language.

[0584] An "information storage medium" is a medium used to store data and allow it to be retrieved as needed.

[0585] An "observation device" is a device used to monitor the physical environment and collect data.

[0586] An "automated learning algorithm" is an algorithm that improves its performance by learning from data and updating the model.

[0587] "Notification" refers to the act of sending information about a specific event to recipients who have been registered in advance.

[0588] The system according to the present invention provides an efficient and safe environment for the operation of community associations and similar organizations by integrating a voice acquisition device, multilingual support function, and elderly monitoring function.

[0589] Voice acquisition device

[0590] The terminal uses high-sensitivity microphones placed in the conference room to collect audio in real time and transmits the data to the server. The server uses speech analysis technology, specifically speech recognition software, to convert the audio data into text data. Next, the server utilizes natural language processing technology to summarize the converted text data and generate meeting minutes. These minutes are immediately distributed to all participants via email or other means. For example, immediately after the meeting ends, the summarized minutes are sent to all participants, enabling rapid information sharing.

[0591] Multilingual support feature

[0592] Users enter inquiries in multiple languages as text using a dedicated application on their smartphones or other devices. The entered data is sent to a server and translated into Japanese using automatic translation technology. The server accesses the information storage medium, generates an appropriate response, and then translates the response back into the original language before providing it to the user. As a concrete example, foreign residents can ask questions to the neighborhood association in their native language and receive immediate responses. An example of a prompt would be, "Please propose a program that automatically generates minutes of neighborhood association meetings."

[0593] Elderly monitoring function

[0594] The terminal periodically monitors the elderly person's daily activities through an observation device (sensor) installed in their home and transmits the data to a server. The server uses an automated learning algorithm to analyze the activity data and sends a notification to pre-registered contacts if an anomaly is detected. This function supports the peace of mind of elderly people living alone; specifically, if the elderly person's scheduled activities cannot be confirmed, a notification is sent immediately, allowing for timely action.

[0595] By combining these technologies, a system can be realized that contributes to the smooth operation of neighborhood associations and the improvement of the quality of services provided to residents.

[0596] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0597] Processing steps of the voice acquisition device

[0598] Step 1:

[0599] The terminal collects audio in real time using microphones installed in the conference room. The input is audio data, which is converted to a digital format and pre-processed to remove noise. The data is then adjusted to a format that can be sent directly to the server.

[0600] Step 2:

[0601] The server converts received digital audio data into text data using speech analysis technology. Specifically, it uses acoustic and language models to perform feature extraction and decoding, generating text data as output.

[0602] Step 3:

[0603] The server summarizes text data using natural language processing techniques. The input is the previously converted text data, and a generative AI model is used to extract relevant and important parts and create a summary. A concise meeting minutes document is generated as output.

[0604] Step 4:

[0605] The server provides the completed meeting minutes to the meeting participants via communication means. Specifically, it uses email to automatically send the minutes to the participants' addresses. This process also includes setting the email format and performing error checking.
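
The email formatting and error checking mentioned in Step 4 above might be sketched like this, using Python's standard `email.message.EmailMessage`. The address pattern, function names, and subject line are assumptions; the pattern is deliberately simple and does not implement full address validation.

```python
import re
from email.message import EmailMessage

# Deliberately simple pattern for illustration, not full RFC 5322 validation.
ADDRESS_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_recipients(addresses: list[str]) -> tuple[list[str], list[str]]:
    """Separate well-formed addresses from malformed ones."""
    valid = [a for a in addresses if ADDRESS_PATTERN.match(a)]
    invalid = [a for a in addresses if not ADDRESS_PATTERN.match(a)]
    return valid, invalid


def build_minutes_email(
    sender: str, recipients: list[str], meeting_date: str, minutes: str
) -> EmailMessage:
    """Create the message; raise ValueError if no valid recipient remains."""
    valid, _invalid = split_recipients(recipients)
    if not valid:
        raise ValueError("no valid recipient addresses")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(valid)
    message["Subject"] = f"Meeting minutes ({meeting_date})"
    message.set_content(minutes)
    return message
```

The returned message can be handed to `smtplib.SMTP.send_message` for delivery; the malformed addresses returned by `split_recipients` could be logged so that the participant list can be corrected.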

[0606] Multilingual support processing steps

[0607] Step 1:

[0608] The user opens a dedicated app on their smartphone and enters the question. The input is in text data format, and this data is sent to the server via the app.

[0609] Step 2:

[0610] The server translates the received multilingual questions into Japanese using automatic translation technology. Specifically, it uses a translation engine to analyze the input text and outputs the corresponding Japanese words and phrases.

[0611] Step 3:

[0612] The server retrieves appropriate information from the information storage medium and generates an answer based on the translated question. This includes a process of referencing data using a generative AI model and extracting and formatting an answer that is appropriate for the question.

[0613] Step 4:

[0614] The server re-translates the generated response back into the original language and provides it to the user. Using a prompt, the server translates the response back into the original language through the re-translation engine and sends it to the app as text data.

[0615] Processing steps for the elderly monitoring function

[0616] Step 1:

[0617] The terminal periodically monitors activity information using sensors placed in the homes of elderly individuals. Inputs include motion data and environmental data, which are then formatted into a standard format and sent to the server.

[0618] Step 2:

[0619] The server analyzes the received data using an automated learning algorithm. It runs an anomaly monitoring model on the input dataset to detect pattern changes. The output is the set of deviations from the normal range.

[0620] Step 3:

[0621] If an anomaly is detected, the server will send a notification to pre-registered contacts. This process involves sending SMS or emails via various communication methods, and additional adjustments may be made as needed.
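
One concrete reading of "scheduled activities cannot be confirmed" is a long stretch with no sensor events. The sketch below checks for such a gap within an observation window. It is a rule-based stand-in for the anomaly monitoring model, and the function names, the eight-hour limit, and the message text are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def longest_gap(
    events: list[datetime], window_start: datetime, window_end: datetime
) -> timedelta:
    """Longest stretch without a sensor event inside the observation window."""
    inside = sorted(e for e in events if window_start <= e <= window_end)
    points = [window_start] + inside + [window_end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def check_inactivity(
    events: list[datetime],
    window_start: datetime,
    window_end: datetime,
    max_gap: timedelta = timedelta(hours=8),
) -> Optional[str]:
    """Return a notification text if the longest gap exceeds max_gap, else None."""
    gap = longest_gap(events, window_start, window_end)
    if gap > max_gap:
        hours = gap.total_seconds() / 3600
        return f"No activity detected for {hours:.1f} hours."
    return None
```

The returned text would then be passed to whichever channel (SMS or email) the registered contact has chosen.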

[0622] (Application Example 1)

[0623] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0624] In managing local communities, diverse needs exist, including taking meeting minutes, providing multilingual support among residents, and monitoring the elderly. However, a lack of efficient technical support increases the burden on these tasks, leading to challenges such as insufficient communication among residents and concerns about the safety of the elderly. Furthermore, traditional systems struggle with real-time information sharing, highlighting the need for improved convenience through the use of smartphones.

[0625] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0626] In this invention, the server includes: means for collecting audio information during a meeting in real time and converting that audio information into text information using speech recognition technology; means for summarizing the converted text information using natural language processing technology and automatically generating meeting minutes; means for distributing the generated meeting minutes to members by email or other communication methods; and means for obtaining and viewing the meeting minutes of the neighborhood association using a program installed on a smartphone. This enables more efficient management of the neighborhood association and faster provision of services to residents.

[0627] "Audio information" refers to data recorded in digital format from sounds that occur in settings such as meetings.

[0628] "Speech recognition technology" is a technology that uses computers to analyze human speech and convert it into text information.

[0629] "Text information" refers to text data converted from speech using speech recognition technology.

[0630] "Natural language processing technology" is a technology that allows computers to understand and process human language, and to perform tasks such as summarizing and translating it.

[0631] "Methods for automatically generating meeting minutes" refer to technologies that organize meeting content as text and generate it in a format that can be distributed to participants.

[0632] "Members" refer to people who participate in meetings or communities and are involved in discussions and decisions.

[0633] "Email" is a method of exchanging textual information over the internet.

[0634] "Communication methods" is a general term for the technologies and means used to send and receive information.

[0635] A "smartphone" is a multi-functional mobile phone device that can connect to the internet.

[0636] "A method for obtaining and viewing neighborhood association meeting minutes using a program" refers to a method of viewing the generated meeting minutes using an application on a smartphone.

[0637] The system for realizing this invention is built as a software solution to support the operation of community associations. The server acquires audio information from microphones set up during meetings and converts it into text using speech recognition technology. This step can use a speech recognition service such as the Google Cloud Speech-to-Text API. The converted text information is summarized using natural language processing technology, for which services such as the Google Cloud Natural Language API can be used. The generated meeting minutes can be viewed on mobile devices, including smartphones, through an application. This application helps streamline the acquisition and distribution of meeting minutes.

[0638] Furthermore, multilingual inquiries from residents are received by the server and translated into the specified language using automatic translation technology. The Google Cloud Translation API is used for translation, and combined with information extraction from the database, it generates appropriate responses. The generated responses are then translated back into the original language and provided to the user. This enables smooth communication with residents who speak foreign languages.

[0639] For monitoring the elderly, activity data collected through sensor devices is sent to a server, where machine learning algorithms are used to detect anomalies. When an anomaly is detected, a notification is sent to the smartphones of pre-registered contacts, enabling a quick response. Specific examples of use include residents who cannot attend monthly neighborhood meetings viewing meeting minutes via a smartphone app to deepen their understanding, and new foreign residents using the app to inquire about garbage disposal procedures in their native language.

[0640] As an example of a prompt, you could ask, "Please display the minutes of the recent neighborhood association meeting regarding the local crime prevention plan that was discussed."
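
A request like the prompt above ultimately needs a lookup of stored minutes by topic. The following sketch uses an in-memory SQLite database. The table layout, function names, and keyword search are assumptions, since the source only says that minutes are stored in a database and retrieved by the app.

```python
import sqlite3
from typing import Optional


def create_store() -> sqlite3.Connection:
    """Open an in-memory database with a hypothetical `minutes` table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE minutes ("
        "id INTEGER PRIMARY KEY, meeting_date TEXT, title TEXT, body TEXT)"
    )
    return connection


def save_minutes(
    connection: sqlite3.Connection, meeting_date: str, title: str, body: str
) -> None:
    """Store one set of minutes; meeting_date is an ISO date such as 2025-01-10."""
    connection.execute(
        "INSERT INTO minutes (meeting_date, title, body) VALUES (?, ?, ?)",
        (meeting_date, title, body),
    )
    connection.commit()


def latest_minutes(connection: sqlite3.Connection, keyword: str) -> Optional[tuple]:
    """Return (meeting_date, title, body) of the newest minutes mentioning keyword."""
    pattern = f"%{keyword}%"
    return connection.execute(
        "SELECT meeting_date, title, body FROM minutes "
        "WHERE title LIKE ? OR body LIKE ? "
        "ORDER BY meeting_date DESC LIMIT 1",
        (pattern, pattern),
    ).fetchone()
```

ISO-formatted dates sort correctly as text, which is why `ORDER BY meeting_date DESC` returns the most recent matching meeting. In the described system, a generative AI model would first turn the resident's prompt into the keyword.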

[0641] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0642] Step 1:

[0643] The server acquires audio information received from terminals during a meeting. The input is digital audio information collected from the microphone, and this information is converted into text using the Google Cloud Speech-to-Text API. The output is text data. A speech recognition service is activated, analyzing the audio and converting it to text in real time.

[0644] Step 2:

[0645] The server analyzes the acquired text information and summarizes it using natural language processing techniques. The input is the text information generated in step 1, which is then summarized using the Google Cloud Natural Language API. The output is the summarized text. The natural language processing algorithm operates at this stage, extracting important information and generating a summary.

[0646] Step 3:

[0647] The server prepares the summarized meeting minutes for distribution to smartphones. The input is the summarized text generated in step 2, and the output is the meeting minutes organized into a distribution format. These minutes are stored in a database for distribution to members via email or app.

[0648] Step 4:

[0649] Users retrieve and view meeting minutes from an app using their smartphones. The input is the meeting minutes data distributed in step 3, and the output is the meeting minutes information displayed on the user's screen. The application retrieves the necessary information from the database and provides an interface to display it on the screen.

[0650] Step 5:

[0651] The server receives multilingual inquiries submitted by residents. The input is user-generated text inquiry data, which is then translated into the specified language using the Google Cloud Translation API. The output is the translated text data. The server automatically identifies the language and performs the appropriate translation.

[0652] Step 6:

[0653] The server extracts relevant information from the database based on the translated query and generates a response. The input is the translated query text obtained in step 5, and the output is the response text. In the information extraction process, information appropriate to the query is immediately searched for and the response is constructed.

[0654] Step 7:

[0655] The server re-translates the generated responses back into the residents' original language. The input is the response text generated in step 6, and the output is the re-translated output text. The Google Cloud Translation API is used again to perform an accurate translation back into the original language.

[0656] Step 8:

[0657] The device monitors the elderly person's activity data, and the server sends a notification if an anomaly is detected. The input is activity data from the sensor, and the output is an anomaly detection notification. The sensor records daily activities and a machine learning algorithm analyzes the patterns; if an anomaly is detected, the server immediately notifies the family or caregiver.
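Steps 5 to 7 above (translate the inquiry, build a response, translate it back) can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the function `answer_inquiry`, the `Translator` signature, and the toy phrase table are all hypothetical, the user's language is passed in explicitly rather than detected automatically, and no real translation API is called.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def answer_inquiry(text: str, user_lang: str, translate: Translator,
                   lookup: Callable[[str], str], server_lang: str = "ja") -> str:
    """Translate the inquiry, build a response, and translate it back."""
    query = translate(text, user_lang, server_lang)      # step 5
    response = lookup(query)                             # step 6
    return translate(response, server_lang, user_lang)   # step 7


# Toy stand-ins (illustrative data only) so the sketch runs offline.
_PHRASES = {
    ("en", "ja", "When is burnable garbage collected?"):
        "可燃ごみの収集日はいつですか？",
    ("ja", "en", "可燃ごみは火曜日と金曜日です。"):
        "Burnable garbage is collected on Tuesdays and Fridays.",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return _PHRASES[(source, target, text)]


def toy_lookup(query: str) -> str:
    # Stand-in for the database search in step 6.
    return "可燃ごみは火曜日と金曜日です。" if "可燃ごみ" in query else "担当者に確認します。"
```

Passing the translator and the lookup as parameters keeps the round trip independent of any particular translation service or database.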

[0658] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0659] The community association management support system of the present invention utilizes AI technology to efficiently support various community association events. In particular, by combining it with an emotion engine, the present invention can recognize emotions during meetings and interactions with residents, and reflect them in meeting minutes and services. This makes it possible to make communication more effective and improve the quality of management.

[0660] Implementation methods for meeting minute creation and sentiment recognition

[0661] During a meeting, the terminal uses microphones installed in the meeting room to collect audio data in real time. This audio data is sent to a server, where it is converted into text data and sentiment data using speech recognition technology and an emotion engine. The converted text is summarized using natural language processing technology, and when meeting minutes are automatically generated, participant sentiment information is also taken into account. For example, if the system recognizes a situation during the meeting where participants have conflicting opinions, the meeting minutes will record not only the content of those opinions but also that the discussion was heated.

[0662] Embodiments of emotion recognition in multilingual support

[0663] When residents make an inquiry, they use a dedicated application to input their question in text format. This data is sent to a server, where it is translated into Japanese using automatic translation technology, and then an emotion engine recognizes the emotional tone of the content. Based on the translated text and the recognized emotional tone, the server prepares a more appropriate response, which is then re-translated and provided to the user. This allows for a more considerate response if, for example, the inquiry contains emotions such as dissatisfaction or urgency.

[0664] Implementation of emotion recognition in elderly monitoring services

[0665] The system uses wearable devices worn by elderly individuals to monitor their voice and behavior. This data is sent to a server, where an emotion engine analyzes the voice to determine the individual's emotional state. The server detects any deviations from normal conversation or behavior and, if an emotionally unstable state persists, sends an emergency notification that reflects this state to family members or caregivers. For example, if an elderly person repeatedly expresses irritability or uses irritable language, the system facilitates mental health care in addition to addressing ordinary physical abnormalities.

[0666] This system not only improves the efficiency of community association operations but also enables flexible responses that are attentive to people's latent needs and emotions, thereby contributing to improved human interaction and welfare within the community.

[0667] The following describes the processing flows for meeting minutes creation, multilingual support, and elderly monitoring, in that order.

[0668] Step 1:

[0669] The terminal collects audio data through the microphone used during the meeting and transmits that audio data to the server.

[0670] Step 2:

[0671] The server converts the received audio data into text data using speech recognition technology, and simultaneously uses an emotion engine to recognize the participant's emotions from the audio.

[0672] Step 3:

[0673] The server analyzes the converted text data using a natural language processing algorithm and generates a summary. During this process, recognized sentiment information is added to the summary.

[0674] Step 4:

[0675] The server formats meeting minutes with sentiment information and distributes them to participants via email or other means of communication.

[0676] Step 1:

[0677] Users use a dedicated application to enter their inquiries in text format and send them to the server.

[0678] Step 2:

[0679] The server translates the inquiry into the specified language using an automatic translation engine, and simultaneously recognizes the emotional tone of the inquiry using an emotion engine.

[0680] Step 3:

[0681] The server generates the most suitable response from its database based on the translated text and emotional tone.

[0682] Step 4:

[0683] The server re-translates the generated response back into the original language and returns it to the user, reflecting the emotional tone.

[0684] Step 1:

[0685] The device monitors activity and voice data using sensors installed in the living environment of elderly people and transmits the data to a server.

[0686] Step 2:

[0687] The server analyzes the monitored data, uses machine learning algorithms to detect behavioral anomalies, and evaluates emotional states using an emotion engine.

[0688] Step 3:

[0689] If the server detects any abnormalities or unstable emotional states, it will send an alert to the designated contacts to notify them of the information.
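The alert condition in the monitoring flow above (a behavioral anomaly, or an unstable emotional state that persists) can be sketched as follows. This is a minimal sketch under stated assumptions: the label set `UNSTABLE`, the run length `min_run`, and both function names are hypothetical, and the emotion labels are assumed to arrive as an ordered sequence from the emotion engine.

```python
from typing import Iterable

# Hypothetical labels that the emotion engine might emit for unstable states.
UNSTABLE = {"irritated", "anxious", "sad"}


def persistent_unstable(labels: Iterable[str], min_run: int = 3) -> bool:
    """Return True if at least min_run consecutive labels are unstable."""
    run = 0
    for label in labels:
        run = run + 1 if label in UNSTABLE else 0
        if run >= min_run:
            return True
    return False


def should_alert(behavior_anomaly: bool, labels: Iterable[str],
                 min_run: int = 3) -> bool:
    """Alert on a behavioral anomaly or a persistent unstable emotional state."""
    return behavior_anomaly or persistent_unstable(labels, min_run)
```

Requiring a run of consecutive unstable readings is one simple way to express "persists"; a single irritated reading does not trigger an alert on its own.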

[0690] (Example 2)

[0691] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0692] In managing neighborhood associations, there is a need to handle tasks such as creating meeting minutes, responding to residents' inquiries in multiple languages, and monitoring the elderly efficiently and with attention to emotions. However, with conventional technologies, processes such as transcribing voice data into text, responding to inquiries, and detecting anomalies based on the emotional state of the elderly are performed individually, and there is no system for operating them in an integrated and efficient manner. This is a problem because it complicates the management of neighborhood associations and communities, and increases the burden on personnel.

[0693] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0694] In this invention, the server includes means for collecting acoustic information in real time and converting that acoustic information into text information using conversion technology; means for summarizing the converted text information using processing technology and automatically generating a document; means for using communication technology to distribute the generated document to participants; and means for extracting emotional information from the acoustic information and integrating the extracted emotional information into the document. This enables efficient decision-making and flexible responses that take emotions into account in the management of neighborhood associations and communities.

[0695] "Acoustic information" refers to all audio and sound data obtained from the environment, and is fundamental information for processing sound as a signal.

[0696] "Conversion technology" refers to technologies for converting audio or sound data into text data or other data formats, and includes technologies such as speech recognition.

[0697] "Textual information" refers to data that represents audio or other data as text, providing information in written form.

[0698] "Processing technology" refers to techniques for analyzing, summarizing, or reconstructing text data and other information, and includes techniques for natural language processing.

[0699] A "document" is a set of textual information generated for a specific purpose, including meeting minutes and other records.

[0700] "Communication technology" refers to the means and techniques for delivering generated information and data to others, and includes technologies such as email and other digital communications.

[0701] "Emotional information" refers to data that indicates a psychological or emotional state, analyzed from audio or text, and highlights a person's emotional tone.

[0702] A "measuring device" refers to a device or apparatus used to monitor a subject's activity, health status, or emotional state.

[0703] This invention is a system for streamlining community association operations and providing information that takes into account the feelings of participants and residents. Specific embodiments are described below.

[0704] Processing of acoustic information

[0705] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. The collected acoustic information is immediately transmitted to the server. This process uses highly sensitive acoustic sensors and a secure data transmission protocol.

[0706] The server converts received acoustic information into text information using conversion technology. For this process, it uses, for example, a general-purpose API service that provides speech recognition as the speech recognition engine. The converted text information is further summarized using processing technology. The natural language processing technologies used here include OpenAI's generative AI models and other natural language processing frameworks.

[0707] The server then extracts emotional information from the acoustic data and uses an emotion analysis engine to analyze the emotional tones of the participants. The emotional information is integrated into the generated document, and meeting minutes that take into account the emotional aspects of the meeting are automatically produced.

[0708] Multilingual support

[0709] Users submit inquiries in text format through a dedicated application on their device. The user's input is sent to the server and translated into the specified language using automatic translation technology. The translation technology utilizes a multilingual online translation service.

[0710] The server analyzes the emotional tone of incoming inquiries and adjusts the response accordingly. The response is generated using a generative AI model, re-translated into the original language, and then provided to the user. This enables emotionally sensitive responses even to inquiries from residents.

[0711] Specific example

[0712] For example, in the processing of acoustic information, if someone strongly states "I cannot agree with this proposal" during a meeting, that emotion will be recorded in the minutes as "Mr. / Ms. XX expressed strong opposition to the proposal."
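The example above, in which a strongly worded statement becomes a minutes entry, can be sketched as follows. This is an illustration only: the `Utterance` fields, the emotion labels, the `intensity` score, and the 0.7 threshold are hypothetical stand-ins for whatever the emotion analysis engine actually outputs.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str      # hypothetical labels: "opposition", "agreement", "neutral"
    intensity: float  # hypothetical engine score between 0.0 and 1.0


def minutes_line(u: Utterance, strong: float = 0.7) -> str:
    """Render one utterance as a minutes entry that reflects its emotion."""
    if u.emotion == "neutral":
        return f'{u.speaker} stated: "{u.text}"'
    phrase = {"opposition": "opposition to",
              "agreement": "agreement with"}.get(u.emotion)
    if phrase is None:
        # Unknown label: keep the statement and show the label as-is.
        return f'{u.speaker} stated ({u.emotion}): "{u.text}"'
    degree = "strong " if u.intensity >= strong else ""
    return f"{u.speaker} expressed {degree}{phrase} the proposal."
```

For instance, `minutes_line(Utterance("Ms. XX", "I cannot agree with this proposal", "opposition", 0.9))` yields an entry of the form given in the example above.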

[0713] Example of a prompt

[0714] "Please transcribe today's meeting discussions, analyze the emotions involved, and then summarize them."

[0715] "Please analyze multilingual inquiries and automatically generate appropriate responses that take emotional tone into consideration."

[0716] In this way, the system supports diverse forms of communication and enables high-quality management of community associations.

[0717] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0718] Step 1:

[0719] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. This acoustic information is acquired as analog data and transmitted to the server as a digital signal. Specifically, the audio input device converts sound waves into electrical signals.

[0720] Step 2:

[0721] The server converts received digital audio information into text information using conversion technology. This process uses a speech recognition engine to analyze the audio signal and generate corresponding text data. As a result, the conversation content is output in text format. Specifically, this involves collaborative processing by an acoustic model and a language model.

[0722] Step 3:

[0723] The server further summarizes the converted text information using processing techniques. It employs natural language processing techniques to extract key information and generate a summary. The input is text data, and the output is a concise summary. Specific operations include tokenization and grammatical analysis.

[0724] Step 4:

[0725] The server extracts emotional information from text data. Using an emotion analysis engine, it analyzes the emotional tone of each statement and assigns an emotional label to the text. In this step, text data is input, and labels indicating the emotional state are output. Specifically, the tone and emphasis patterns of the speech are analyzed.

[0726] Step 5:

[0727] The server integrates emotional information into documents and generates meeting minutes. Using a generative AI model, it constructs meeting minutes that include emotional tone. The input consists of a summary and emotional information, and the output is meeting minutes that take emotions into account. Specifically, the document structure and content are adjusted.

[0728] Step 6:

[0729] Users submit inquiries through a dedicated application, and the text is sent to the server. The input is multilingual text, which is passed to the server as inquiry data. Specifically, this involves digitizing the captured user input.

[0730] Step 7:

[0731] The server translates received queries into the specified language using translation technology. Multilingual support technology is used to perform translations between different languages. The input is text data in the original language from the user, and the output is text data in the specified language. The use of a translation engine is a concrete example.

[0732] Step 8:

[0733] The server analyzes the sentiment tone of the translated query. Sentiment analysis techniques are used to evaluate the emotional state of the query. The input is the translated text, and the output is data indicating the sentiment tone. Specifically, this includes categorizing the sentiment of the text.

[0734] Step 9:

[0735] The server generates an appropriate response based on emotional information, re-translates it, and returns it to the user. A generative AI model is used to construct emotionally sensitive responses. The output is the response in the original language. Specifically, the process involves response generation and adjustment of the translation results.
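One way to picture how the sentiment result of step 8 could feed the response generation of step 9 is a prompt builder that adds tone-specific guidance before the text is handed to a generative AI model. This is a hedged sketch: the tone labels, the guidance strings, and the function `build_response_prompt` are hypothetical, and the call to the model and the re-translation are deliberately left out.

```python
# Hypothetical tone labels and guidance; the specification names
# dissatisfaction and urgency as examples of emotions in an inquiry.
TONE_GUIDANCE = {
    "dissatisfied": "Acknowledge the inconvenience and apologize before answering.",
    "urgent": "Give the most important action first and keep the reply short.",
    "neutral": "Answer politely and concisely.",
}


def build_response_prompt(translated_query: str, tone: str) -> str:
    """Build a prompt that asks for a response suited to the detected tone."""
    guidance = TONE_GUIDANCE.get(tone, TONE_GUIDANCE["neutral"])
    return (
        "You are answering a resident's inquiry for a community association.\n"
        f"Detected emotional tone: {tone}\n"
        f"Guidance: {guidance}\n"
        f"Inquiry: {translated_query}"
    )
```

Unknown tone labels fall back to the neutral guidance, so an unexpected classifier output still produces a usable prompt.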

[0736] (Application Example 2)

[0737] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0738] In elderly care settings, it is difficult to grasp the emotional changes of elderly individuals in real time, so appropriate care cannot be provided immediately. Furthermore, in handling inquiries in multiple languages, it is necessary to provide prompt and appropriate responses that take emotions into consideration.

[0739] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0740] In this invention, the server includes means for collecting audio data in real time and converting the audio data into text data using acoustic processing technology, means for summarizing the converted text data using natural language information processing technology and automatically generating a recorded document, and means for providing real-time visual feedback on the emotional state of the elderly person using a wearable display device. This enables appropriate care responses in accordance with the emotional state of the elderly person.

[0741] "Audio data" refers to sound information that records the content of conversations and speech.

[0742] "Real-time data collection" means acquiring data simultaneously with its occurrence or with a short delay.

[0743] "Acoustic processing technology" is a technology for analyzing sound and converting it into textual information.

[0744] "Text data" refers to data in which information is written as characters.

[0745] "Natural language processing technology" refers to the technology that allows computers to understand, interpret, and generate language used by humans.

[0746] "Summarizing" is the process of extracting only the essential parts of a large amount of information and putting them together concisely.

[0747] A "record document" is a document in which specific information is stored in text format.

[0748] A "wearable display device" is a device that is worn on the body to display information.

[0749] "Emotional state" refers to a psychological or emotional situation or change.

[0750] The system that realizes this invention enables real-time voice and emotion analysis in elderly care and multilingual inquiry handling. The system includes various devices and servers and functions as follows:

[0751] On the device side, a wearable display device is used. This device collects the voice of elderly people in real time and sends the data to a server. The server converts this voice data into text data using acoustic processing technology. For example, everyday conversations of elderly people can be acquired directly as data via smart glasses or similar devices.

[0752] Next, the server summarizes the converted text data using natural language processing technology and automatically generates a record document. For example, if an elderly person says, "I'm a little tired today," the content is quickly converted into text and visually fed back to the care staff.

[0753] Furthermore, a wearable display device visually shows the analyzed emotional state of the elderly person. This allows care staff to intuitively understand the psychological state of the elderly person and provide more appropriate care.

[0754] As a concrete example, a prompt for a generative AI model could be something like, "Design a system that analyzes the speech of elderly people, understands their emotional tone, and displays it in real time on a smart device. The voice data will be processed in the cloud, and the results will be fed back to a wearable display device."

[0755] This system will enable flexible and prompt responses in care settings that are tailored to the emotional state of elderly individuals, leading to improved communication between caregivers and the people in their care.

[0756] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0757] Step 1:

[0758] A device (wearable display device) collects the speech of elderly individuals using a microphone and acquires it as audio data. The input is the speech of the elderly individuals, and the output is the collected audio data. The device collects data in real time and immediately transfers it to the server.

[0759] Step 2:

[0760] The server converts received audio data into text data using acoustic processing technology. Speech recognition software is used for this conversion. The input is audio data, and the output is text data. The server performs this data conversion process in real time.

[0761] Step 3:

[0762] The server summarizes the converted text data using natural language processing technology and automatically generates a recorded document. The input is text data, and the output is a summarized recorded document. A data analysis algorithm considers linguistic characteristics and extracts important information.

[0763] Step 4:

[0764] The server uses an emotion analysis engine to analyze the emotional state of elderly individuals from text data. The input is text data, and the output is the result of the emotional state analysis. Through emotion analysis, the server detects feelings from the tone and content of speech.

[0765] Step 5:

[0766] The wearable display device feeds back analyzed emotional states, visually displaying the emotional state of elderly individuals. The input is the analysis result of the emotional state, and the output is the displayed emotional information. The device provides a visual and intuitive interface, enabling care staff to respond quickly to the situation.
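Step 5 above, in which an analyzed emotional state becomes something care staff can read at a glance, can be sketched as a lookup from state to display cue. The state names, colors, and messages below are hypothetical; the specification does not define a palette or a label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayCue:
    label: str
    color: str    # hex color from a hypothetical palette
    message: str  # short hint shown to care staff


# Hypothetical mapping from analyzed emotional state to a visual cue.
_CUES = {
    "calm": DisplayCue("calm", "#4CAF50", "No action needed"),
    "tired": DisplayCue("tired", "#FFC107", "Suggest a rest"),
    "distressed": DisplayCue("distressed", "#F44336", "Check on the person now"),
}


def cue_for(state: str) -> DisplayCue:
    """Return the cue for a state, or a gray fallback for unknown states."""
    return _CUES.get(state, DisplayCue(state, "#9E9E9E", "Unrecognized state"))
```

With this mapping, an utterance such as "I'm a little tired today" that the analysis labels as `tired` would be shown as the amber cue with a suggestion to rest.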

[0767] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0768] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0769] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0770] [Fourth Embodiment]

[0771] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0772] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0773] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0774] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0775] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0776] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0777] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0778] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0779] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0780] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0781] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0782] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0783] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0784] The present invention's community association management support system utilizes AI technology to efficiently support various community association activities. Its main functions include creating meeting minutes, responding to multilingual inquiries from residents, and providing monitoring services for the elderly.

[0785] Meeting minutes creation method

[0786] During a meeting, the terminal collects audio data in real time using microphones installed in the meeting room. This audio data is sent to a server and converted into text data using speech recognition technology. The converted text is summarized using natural language processing technology and automatically generated as meeting minutes. The completed meeting minutes are sent from the server to the participants' email addresses.

[0787] Implementation Examples of Multilingual Support

[0788] When residents contact their local community association, they use a dedicated application to input their questions in text format. This data is sent to a server and translated into Japanese using automatic translation technology. The server then refers to its database to generate an appropriate response, translates that response back into the original language, and provides it to the user. This enables smooth communication without language barriers.

[0789] Implementation of an elderly monitoring service

[0790] To support the lives of elderly people living alone, sensors placed as terminals periodically monitor activity data. The terminals send the collected data to a server, where machine learning algorithms are used to detect anomalies. If an anomaly is detected, the server immediately sends a notification to the designated contacts, helping family members or caregivers respond quickly.

[0791] These functions effectively solve many of the challenges associated with managing neighborhood associations, providing residents with a safe and convenient living environment. Specific examples of its use include generating meeting minutes for monthly neighborhood association meetings, responding to inquiries from foreign residents regarding garbage disposal rules, and providing a regular check-in service for elderly people living alone. Each function operates smoothly and efficiently, playing a role in improving the quality of neighborhood association operations and resident services.

[0792] The following describes the processing flows for meeting minutes creation, multilingual support, and elderly monitoring, in that order.

[0793] Step 1:

[0794] The terminal collects audio data in real time through microphones installed in the conference room and transmits that data to the server.

[0795] Step 2:

[0796] The server passes the received audio data to the speech recognition engine, which converts the audio into text data. During this process, speaker identification and timestamps are also added.

[0797] Step 3:

[0798] The server analyzes the generated text data using natural language processing algorithms to extract and summarize the key points of the meeting.

[0799] Step 4:

[0800] The server automatically generates meeting minutes based on the created summary and creates a file in PDF format or other suitable format.

[0801] Step 5:

[0802] The server sends the generated meeting minutes to the designated email address and distributes them to the relevant parties.
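The later steps of this flow (speaker-labelled, timestamped text turned into minutes and prepared for email) can be sketched with the Python standard library. This is a simplified sketch: the `Segment` structure and both function names are hypothetical, the minutes are plain text rather than PDF, summarization is omitted, and the message is only built, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List


@dataclass
class Segment:
    start_sec: int  # offset from the start of the meeting
    speaker: str    # label from speaker identification
    text: str       # recognized text


def format_minutes(title: str, segments: List[Segment]) -> str:
    """Render segments as '[MM:SS] speaker: text' lines under a title."""
    lines = [title, ""]
    for s in segments:
        minutes, seconds = divmod(s.start_sec, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {s.speaker}: {s.text}")
    return "\n".join(lines)


def build_email(sender: str, recipients: List[str],
                title: str, body: str) -> EmailMessage:
    """Build (but do not send) an email carrying the minutes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = title
    msg.set_content(body)
    return msg
```

The resulting `EmailMessage` could then be handed to an SMTP client of the operator's choice; that delivery step depends on the mail server and is outside this sketch.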

[0803] Step 1:

[0804] Users use a dedicated application to enter and submit inquiries in multiple languages.

[0805] Step 2:

[0806] The server passes the received inquiry data to an automatic translation engine, which then translates it into Japanese.

[0807] Step 3:

[0808] The server searches the FAQ database using the translated text and prepares an appropriate response.

[0809] Step 4:

[0810] The server re-translates the prepared response into the language of the requester and sends the response back to the user.
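The FAQ search in step 3 can be illustrated with a simple word-overlap match. This is a stand-in technique chosen for brevity, not the search method of the specification: the function names and the sample FAQ entries are hypothetical, and the example uses English text because Japanese text would first need a morphological tokenizer.

```python
import re
from typing import Dict, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_faq(query: str, faq: Dict[str, str],
             min_overlap: int = 1) -> Optional[str]:
    """Return the answer whose question shares the most words with the query."""
    q = tokens(query)
    best_answer, best_score = None, 0
    for question, answer in faq.items():
        score = len(q & tokens(question))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= min_overlap else None


# Illustrative entries only; not taken from any real association's rules.
SAMPLE_FAQ = {
    "When is burnable garbage collected": "Tuesdays and Fridays.",
    "How do I join the association": "Contact the office.",
}
```

When no question shares a word with the query the function returns `None`, which a caller could treat as a signal to hand the inquiry to a human.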

[0811] Step 1:

[0812] The device monitors the activity levels of elderly people through sensors and transmits the collected data to a server.

[0813] Step 2:

[0814] The server uses machine learning algorithms to analyze the transmitted data and monitor the activity patterns of older adults.

[0815] Step 3:

[0816] If the server detects unusual activity or patterns, it sets a flag to proceed to the next step.

[0817] Step 4:

[0818] If a flag is set, the server will send a notification to pre-registered emergency contacts to inform them of the situation.
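
The flag-then-notify control flow of Steps 3 and 4 can be illustrated as below. The 12-hour inactivity limit, the `ResidentStatus` record, and the `send` callback are assumptions made for the sketch; the specification does not fix a detection rule or a delivery channel at this point.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResidentStatus:
    name: str
    contacts: List[str]        # pre-registered emergency contacts
    anomaly_flag: bool = False


def update_flag(status: ResidentStatus, hours_since_last_motion: float,
                limit_hours: float = 12.0) -> bool:
    """Step 3: set the flag when no activity has been seen for too long (assumed rule)."""
    status.anomaly_flag = hours_since_last_motion > limit_hours
    return status.anomaly_flag


def notify_if_flagged(status: ResidentStatus,
                      send: Callable[[str, str], None]) -> int:
    """Step 4: notify every registered contact; returns the number of messages sent."""
    if not status.anomaly_flag:
        return 0
    for contact in status.contacts:
        send(contact, f"No activity confirmed for {status.name}. Please check in.")
    return len(status.contacts)
```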

[0819] (Example 1)

[0820] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0821] In modern society, there is a need to improve the efficiency of meetings related to the operation of local governments, overcome language barriers in multicultural societies, and ensure safe and secure living for the elderly. Traditional methods for addressing these issues are largely reliant on manual labor, requiring significant time and effort, making it difficult to respond efficiently and quickly.

[0822] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0823] In this invention, the server includes means for collecting audio data in real time using a voice acquisition device and converting the audio data into text data using speech analysis technology; means for summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and means for providing the generated meeting minutes to meeting participants using communication methods. This enables efficient meeting progress and rapid sharing of records.

[0824] A "voice acquisition device" is a device used to collect audio at meetings, events, and other similar occasions, and to transmit that data to a processing device.

[0825] "Audio data" refers to digital data that records the waveform of sounds such as speech, and is a format suitable for information processing.

[0826] "Speech analysis technology" is a technology that analyzes audio data and converts its content into text data.

[0827] "Text data" refers to text-formatted data generated from audio information using speech analysis technology.

[0828] "Natural language processing technology" is a technology that analyzes text data and summarizes or converts it into a form that humans can understand.

[0829] "Meeting minutes" are documents that record the content of meetings and other gatherings so that they can be reviewed later.

[0830] "Communication methods" refer to means used to transmit or receive information, including email and other digital communication methods.

[0831] "Multilingualism" is a term that refers to the ability to express or process multiple different languages.

[0832] "Automatic translation technology" is a technology used to automatically translate text written in one language into another language.

[0833] An "information storage medium" is a medium used to store data and allow it to be retrieved as needed.

[0834] An "observation device" is a device used to monitor the physical environment and collect data.

[0835] An "automated learning algorithm" is an algorithm that improves its performance by learning from data and updating the model.

[0836] "Notification" refers to the act of sending information about a specific event to recipients who have been registered in advance.

[0837] The system according to the present invention provides an efficient and safe environment for the operation of community associations and similar organizations by integrating a voice acquisition device, multilingual support function, and elderly monitoring function.

[0838] Voice acquisition device

[0839] The terminal uses high-sensitivity microphones placed in the conference room to collect audio in real time and transmits the data to the server. The server uses speech analysis technology, specifically speech recognition software, to convert the audio data into text data. Next, the server utilizes natural language processing technology to summarize the converted text data and generate meeting minutes. These minutes are immediately distributed to all participants via email or other means. For example, immediately after the meeting ends, the summarized minutes are sent to all participants, enabling rapid information sharing.

[0840] Multilingual support feature

[0841] Users enter inquiries in multiple languages as text using a dedicated application on their smartphones or other devices. The entered data is sent to a server and translated into Japanese using automatic translation technology. The server accesses the information storage medium, generates an appropriate response, and then translates the response back into the original language before providing it to the user. As a concrete example, foreign residents can ask questions to the neighborhood association in their native language and receive immediate responses. An example of a prompt would be, "Please propose a program that automatically generates minutes of neighborhood association meetings."

[0842] Elderly monitoring function

[0843] The device periodically monitors the elderly person's daily activities through an observation device (sensor) installed in their home and transmits the data to a server. The server uses an automated learning algorithm to analyze the activity data and sends a notification to pre-registered contacts if an anomaly is detected. This function supports the peace of mind of elderly people living alone; specifically, if the elderly person's scheduled activities cannot be confirmed, a notification is sent immediately, allowing for timely action.

[0844] By combining these technologies, a system can be realized that contributes to the smooth operation of neighborhood associations and the improvement of the quality of services provided to residents.

[0845] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0846] Processing steps of the voice acquisition device

[0847] Step 1:

[0848] The terminal collects audio in real time using microphones installed in the conference room. The input is audio data, which is converted to a digital format and pre-processed to remove noise. The data is then adjusted to a format that can be sent directly to the server.

[0849] Step 2:

[0850] The server converts received digital audio data into text data using speech analysis technology. Specifically, it uses acoustic and language models to perform feature extraction and decoding, generating text data as output.

[0851] Step 3:

[0852] The server summarizes the text data using natural language processing technology. The input is the text data converted in Step 2, and a generative AI model is used to extract the relevant and important parts and create a summary. A concise meeting minutes document is generated as output.

[0853] Step 4:

[0854] The server provides the completed meeting minutes to the meeting participants via communication means. Specifically, it uses email to automatically send the minutes to the participants' addresses. This process also includes setting the email format and performing error checking.
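
The email formatting and error checking mentioned in Step 4 could be handled along the following lines with Python's standard `email` package. The function name `build_minutes_email`, the attachment file name, and the deliberately crude address check are illustrative assumptions; actual delivery (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from typing import List, Optional, Tuple


def build_minutes_email(sender: str, recipients: List[str], subject: str,
                        body: str, pdf_bytes: Optional[bytes] = None
                        ) -> Tuple[EmailMessage, List[str]]:
    """Build the message and report recipients that failed a basic syntax check."""
    valid, rejected = [], []
    for recipient in recipients:
        addr = parseaddr(recipient)[1]
        domain = addr.rsplit("@", 1)[-1] if "@" in addr else ""
        if "." in domain:
            valid.append(addr)
        else:
            rejected.append(recipient)   # error check: malformed address
    if not valid:
        raise ValueError("no valid recipient address")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(valid)
    msg["Subject"] = subject
    msg.set_content(body)                # plain-text body
    if pdf_bytes is not None:            # optional PDF copy of the minutes
        msg.add_attachment(pdf_bytes, maintype="application",
                           subtype="pdf", filename="minutes.pdf")
    return msg, rejected
```

Returning the rejected addresses rather than silently dropping them lets the server log or report delivery problems to the organizer.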

[0855] Multilingual support processing steps

[0856] Step 1:

[0857] The user opens a dedicated app on their smartphone and enters the question. The input is in text data format, and this data is sent to the server via the app.

[0858] Step 2:

[0859] The server translates the received multilingual questions into Japanese using automatic translation technology. Specifically, it uses a translation engine to analyze the input text and outputs the corresponding Japanese words and phrases.

[0860] Step 3:

[0861] The server retrieves appropriate information from the information storage medium and generates an answer based on the translated question. This includes a process of referencing data using a generative AI model and extracting and formatting an answer that is appropriate for the question.

[0862] Step 4:

[0863] The server re-translates the generated response back into the original language and provides it to the user. Using a prompt, the server has the re-translation engine translate the response back into the original language and sends it to the app as text data.

[0864] Processing steps for the elderly monitoring function

[0865] Step 1:

[0866] The terminal periodically monitors activity information using sensors placed in the homes of elderly individuals. Inputs include motion data and environmental data, which are then formatted into a standard format and sent to the server.

[0867] Step 2:

[0868] The server analyzes the received data using an automated learning algorithm. It runs an anomaly detection model on the input dataset to detect changes in activity patterns. Deviations from the normal range are obtained as output.

[0869] Step 3:

[0870] If an anomaly is detected, the server will send a notification to pre-registered contacts. This process involves sending SMS or emails via various communication methods, and additional adjustments may be made as needed.
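
As a simple stand-in for the anomaly detection in Step 2, the sketch below scores how far today's activity lies from a resident's own recent baseline using a z-score. This is a plain statistical rule chosen for illustration, not the automated learning algorithm of the specification; the function names, the daily motion-event count as input, and the threshold of 3.0 are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def deviation_score(history: Sequence[float], today: float) -> float:
    """Distance of today's value from the baseline, in standard deviations.

    `history` must contain at least two past observations.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant baseline: any change counts as an extreme deviation.
        return 0.0 if today == mu else float("inf")
    return abs(today - mu) / sigma


def is_anomalous(history: Sequence[float], today: float,
                 threshold: float = 3.0) -> bool:
    """True when today's value deviates from the normal range (assumed threshold)."""
    return deviation_score(history, today) > threshold
```

For example, with a week of daily motion-event counts around 40, a day with only 5 events would be flagged, while a day with 41 would not.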

[0871] (Application Example 1)

[0872] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0873] In managing local communities, diverse needs exist, including taking meeting minutes, providing multilingual support among residents, and monitoring the elderly. However, a lack of efficient technical support increases the burden on these tasks, leading to challenges such as insufficient communication among residents and concerns about the safety of the elderly. Furthermore, traditional systems struggle with real-time information sharing, highlighting the need for improved convenience through the use of smartphones.

[0874] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0875] This invention includes a server that collects audio information during a meeting in real time and converts that audio information into text information using speech recognition technology; a server that summarizes the converted text information using natural language processing technology and automatically generates meeting minutes; a server that uses email or other communication methods to distribute the generated meeting minutes to members; and a server that uses a program installed on a smartphone to obtain and view the meeting minutes of the neighborhood association. This enables more efficient management of the neighborhood association and faster provision of services to residents.

[0876] "Audio information" refers to data recorded in digital format from sounds that occur in settings such as meetings.

[0877] "Speech recognition technology" is a technology that uses computers to analyze human speech and convert it into text information.

[0878] "Text information" refers to text data converted from speech using speech recognition technology.

[0879] "Natural language processing technology" is a technology that allows computers to understand and process human language, and to perform tasks such as summarizing and translating it.

[0880] "Methods for automatically generating meeting minutes" refer to technologies that organize meeting content as text and generate it in a format that can be distributed to participants.

[0881] "Members" refer to people who participate in meetings or communities and are involved in discussions and decisions.

[0882] "Email" is a method of exchanging textual information over the internet.

[0883] "Communication methods" is a general term for the technologies and means used to send and receive information.

[0884] A "smartphone" is a multi-functional mobile phone device that can connect to the internet.

[0885] "A method for obtaining and viewing neighborhood association meeting minutes using a program" refers to a method of viewing the generated meeting minutes using an application on a smartphone.

[0886] The system for realizing this invention will be built as a software solution to support the operation of community associations. The server will acquire audio information from microphones set up during meetings and convert it into text using speech recognition technology. This step will use a speech recognition service such as the Google Cloud Speech-to-Text API. The converted text information will be summarized using natural language processing technology. This process will utilize services such as the Google Cloud Natural Language API. The generated meeting minutes will be viewable on mobile devices, including smartphones, through an application. This application will help streamline the acquisition and distribution of meeting minutes.

[0887] Furthermore, multilingual inquiries from residents are received by the server and translated into the specified language using automatic translation technology. The Google Cloud Translation API is used for translation, and combined with information extraction from the database, it generates appropriate responses. The generated responses are then translated back into the original language and provided to the user. This enables smooth communication with residents who speak foreign languages.

[0888] For monitoring the elderly, activity data collected through sensor devices is sent to a server, where machine learning algorithms are used to detect anomalies. When an anomaly is detected, a notification is sent to the smartphones of pre-registered contacts, enabling a quick response. Specific examples of use across the system include residents who cannot attend monthly neighborhood meetings viewing meeting minutes via a smartphone app to deepen their understanding, and new foreign residents using the app to inquire about garbage disposal procedures in their native language.

[0889] As an example of a prompt, you could ask, "Please display the minutes of the recent neighborhood association meeting at which the local crime prevention plan was discussed."

[0890] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0891] Step 1:

[0892] The server acquires audio information received from terminals during a meeting. The input is digital audio information collected from the microphone, and this information is converted into text using the Google Cloud Speech-to-Text API. The output is text data. A speech recognition service is activated, analyzing the audio and converting it to text in real time.

[0893] Step 2:

[0894] The server analyzes the acquired text information and summarizes it using natural language processing techniques. The input is the text information generated in step 1, which is then summarized using the Google Cloud Natural Language API. The output is the summarized text. The natural language processing algorithm operates at this stage, extracting important information and generating a summary.

[0895] Step 3:

[0896] The server prepares the summarized meeting minutes for distribution to smartphones. The input is the summarized text generated in step 2, and the output is the meeting minutes organized into a distribution format. These minutes are stored in a database for distribution to members via email or app.

[0897] Step 4:

[0898] Users retrieve and view meeting minutes from an app using their smartphones. The input is the meeting minutes data distributed in step 3, and the output is the meeting minutes information displayed on the user's screen. The application retrieves the necessary information from the database and provides an interface to display it on the screen.

[0899] Step 5:

[0900] The server receives multilingual inquiries submitted by residents. The input is user-generated text inquiry data, which is then translated into the specified language using the Google Cloud Translation API. The output is the translated text data. The server automatically identifies the language and performs the appropriate translation.

[0901] Step 6:

[0902] The server extracts relevant information from the database based on the translated query and generates a response. The input is the translated query text obtained in step 5, and the output is the response text. In the information extraction process, information appropriate to the query is immediately searched for and the response is constructed.

[0903] Step 7:

[0904] The server re-translates the generated responses back into the residents' original language. The input is the response text generated in step 6, and the output is the re-translated output text. The Google Cloud Translation API is used again to perform an accurate translation back into the original language.

[0905] Step 8:

[0906] The device monitors the elderly person's activity data based on notifications from the server and sends a notification if an anomaly is detected. The input is activity data from the sensor, and the output is an anomaly detection notification. The sensor records daily activities, and a machine learning algorithm analyzes the patterns, so if an anomaly is detected, the server immediately notifies the family or caregiver.
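
Steps 3 and 4 above store the minutes in a database and let the app retrieve them. A minimal sketch using Python's built-in `sqlite3` module is shown below; the table layout and the function names `open_store`, `save_minutes`, and `search_minutes` are assumptions for illustration, since the specification does not name a database product or schema.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the minutes store; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS minutes ("
        "id INTEGER PRIMARY KEY, meeting_date TEXT NOT NULL, "
        "title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def save_minutes(conn: sqlite3.Connection, meeting_date: str,
                 title: str, body: str) -> int:
    """Step 3: store minutes prepared for distribution; returns the new row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO minutes (meeting_date, title, body) VALUES (?, ?, ?)",
            (meeting_date, title, body),
        )
    return cur.lastrowid


def search_minutes(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str]]:
    """Step 4: return (date, title) of minutes mentioning the keyword, newest first."""
    pattern = f"%{keyword}%"  # note: % and _ in the keyword act as LIKE wildcards
    rows = conn.execute(
        "SELECT meeting_date, title FROM minutes "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY meeting_date DESC",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A request such as the crime prevention prompt quoted earlier would then reduce to a keyword search like `search_minutes(conn, "crime prevention")`.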

[0907] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0908] The community association management support system of the present invention utilizes AI technology to efficiently support various community association events. In particular, by combining it with an emotion engine, the present invention can recognize emotions during meetings and interactions with residents, and reflect them in meeting minutes and services. This makes it possible to make communication more effective and improve the quality of management.

[0909] Implementation methods for meeting minute creation and sentiment recognition

[0910] During a meeting, the terminal uses microphones installed in the meeting room to collect audio data in real time. This audio data is sent to a server, where it is converted into text data and sentiment data using speech recognition technology and an emotion engine. The converted text is summarized using natural language processing technology, and when meeting minutes are automatically generated, participant sentiment information is also taken into account. For example, if the system recognizes a situation during the meeting where participants have conflicting opinions, the meeting minutes will record not only the content of those opinions but also that the discussion was heated.

[0911] Embodiments of emotion recognition in multilingual support

[0912] When residents make an inquiry, they use a dedicated application to input their question in text format. This data is sent to a server, where it is translated into Japanese using automatic translation technology, and then an emotion engine recognizes the emotional tone of the content. Based on the translated text and the recognized emotional tone, the server prepares a more appropriate response, which is then re-translated and provided to the user. This allows for a more considerate response if, for example, the inquiry contains emotions such as dissatisfaction or urgency.

[0913] Implementation of emotion recognition in elderly monitoring services

[0914] The system uses wearable devices worn by elderly individuals to monitor their voice and behavior. This data is sent to a server, where an emotion engine analyzes the voice to determine the individual's emotional state. The server detects deviations from normal conversation or behavior and, if an emotionally unstable state persists, sends family members or caregivers an emergency notification that reflects that emotional state. For example, if an elderly person repeatedly shows irritability or uses irritable language, the system facilitates mental health care in addition to the usual handling of physical abnormalities.

[0915] This system not only improves the efficiency of community association operations but also enables flexible responses that are attentive to people's latent needs and emotions, thereby contributing to improved human interaction and welfare within the community.

[0916] The following describes the processing flow.

[0917] Step 1:

[0918] The terminal collects audio data through the microphone used during the meeting and transmits that audio data to the server.

[0919] Step 2:

[0920] The server converts the received audio data into text data using speech recognition technology, and simultaneously uses an emotion engine to recognize the participant's emotions from the audio.

[0921] Step 3:

[0922] The server analyzes the converted text data using a natural language processing algorithm and generates a summary. During this process, recognized sentiment information is added to the summary.

[0923] Step 4:

[0924] The server formats meeting minutes with sentiment information and distributes them to participants via email or other means of communication.
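
One way Step 3 might attach recognized sentiment to a summary is sketched below. It assumes the emotion engine yields one label per utterance for an agenda item; the label names, the 30% threshold, and the wording of the note are hypothetical choices, not taken from the specification.

```python
from collections import Counter
from typing import Sequence


def annotate_summary(summary: str, emotion_labels: Sequence[str],
                     heated_labels: Sequence[str] = ("anger", "frustration"),
                     min_share: float = 0.3) -> str:
    """Append a note when a large share of utterances carried heated emotions."""
    if not emotion_labels:
        return summary
    counts = Counter(emotion_labels)
    heated = sum(counts[label] for label in heated_labels)
    if heated / len(emotion_labels) >= min_share:
        return summary + " (Note: the discussion on this item was heated.)"
    return summary
```

Working from the share of heated utterances rather than a single outburst avoids labeling an entire agenda item on the basis of one remark.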

[0925] Step 1:

[0926] Users use a dedicated application to enter their inquiries in text format and send them to the server.

[0927] Step 2:

[0928] The server translates the inquiry into the specified language using an automatic translation engine, and simultaneously recognizes the emotional tone of the inquiry using an emotion engine.

[0929] Step 3:

[0930] The server generates the most suitable response from its database based on the translated text and emotional tone.

[0931] Step 4:

[0932] The server re-translates the generated response back into the original language and returns it to the user, reflecting the emotional tone.
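
The idea of adjusting a reply to the emotional tone of an inquiry (Steps 2 to 4) can be illustrated with a keyword-based stand-in for the emotion engine. The word lists, tone names, and opening phrases below are invented for the example and operate on English text only; a real emotion engine would be considerably more sophisticated.

```python
URGENT_WORDS = ("urgent", "immediately", "emergency", "as soon as possible")
UPSET_WORDS = ("unacceptable", "angry", "complaint", "disappointed")

OPENERS = {
    "urgent": "We understand this is urgent.",
    "dissatisfied": "We apologize for the inconvenience.",
    "neutral": "",
}


def detect_tone(text: str) -> str:
    """Very rough tone detection by keyword match (assumed stand-in)."""
    lowered = text.lower()
    if any(word in lowered for word in URGENT_WORDS):
        return "urgent"
    if any(word in lowered for word in UPSET_WORDS):
        return "dissatisfied"
    return "neutral"


def compose_reply(answer: str, tone: str) -> str:
    """Prefix the factual answer with an opener suited to the detected tone."""
    return f"{OPENERS[tone]} {answer}".strip()
```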

[0933] Step 1:

[0934] The device monitors activity and voice data using sensors installed in the living environment of elderly people and transmits the data to a server.

[0935] Step 2:

[0936] The server analyzes the monitored data, uses machine learning algorithms to detect behavioral anomalies, and evaluates emotional states using an emotion engine.

[0937] Step 3:

[0938] If the server detects any abnormalities or unstable emotional states, it will send an alert to the designated contacts to notify them of the information.
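
The alert condition in Step 3 combines two signals: a behavioral anomaly, or an unstable emotional state that persists. A minimal sketch of such a rule follows; the label names and the requirement of three consecutive unstable observations are assumptions chosen for illustration.

```python
from typing import Sequence


def should_alert(behavior_anomaly: bool, emotion_history: Sequence[str],
                 unstable_labels: Sequence[str] = ("irritated", "distressed"),
                 persist: int = 3) -> bool:
    """Alert on a behavioral anomaly, or when the last `persist` emotion
    observations were all classified as unstable."""
    if behavior_anomaly:
        return True
    recent = list(emotion_history)[-persist:]
    return len(recent) == persist and all(e in unstable_labels for e in recent)
```

Requiring persistence keeps a single irritable remark from triggering an emergency notification.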

[0939] (Example 2)

[0940] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0941] In managing neighborhood associations, there is a need to handle tasks such as creating meeting minutes, responding to residents' inquiries in multiple languages, and monitoring the elderly efficiently and with consideration for emotions. However, with conventional technologies, processes such as transcribing voice data into text, responding to inquiries, and detecting anomalies based on the emotional state of the elderly are performed individually, and there is a lack of a system for integrated and efficient operation. This is a problem because it complicates the management of neighborhood associations and communities, and increases the burden on personnel.

[0942] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0943] In this invention, the server includes means for collecting acoustic information in real time and converting that acoustic information into text information using conversion technology; means for summarizing the converted text information using processing technology and automatically generating a document; means for using communication technology to distribute the generated document to participants; and means for extracting emotional information from the acoustic information and integrating the extracted emotional information into the document. This enables efficient decision-making and flexible responses that take emotions into account in the management of neighborhood associations and communities.

[0944] "Acoustic information" refers to all audio and sound data obtained from the environment, and is fundamental information for processing sound as a signal.

[0945] "Conversion technology" refers to technologies for converting audio or sound data into text data or other data formats, and includes technologies such as speech recognition.

[0946] "Text information" refers to data that represents audio or other data as text, providing information in written form.

[0947] "Processing technology" refers to techniques for analyzing, summarizing, or reconstructing text data and other information, and includes techniques for natural language processing.

[0948] A "document" is a set of textual information generated for a specific purpose, including meeting minutes and other records.

[0949] "Communication technology" refers to the means and techniques for delivering generated information and data to others, and includes technologies such as email and other digital communications.

[0950] "Emotional information" refers to data that indicates a psychological or emotional state, analyzed from audio or text, and highlights a person's emotional tone.

[0951] A "measuring device" refers to a device or apparatus used to monitor a subject's activity, health status, or emotional state.

[0952] This invention is a system for streamlining community association operations and providing information that takes into account the feelings of participants and residents. Specific embodiments are described below.

[0953] Processing of acoustic information

[0954] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. The collected acoustic information is immediately transmitted to the server. This process uses highly sensitive acoustic sensors and a secure data transmission protocol.

[0955] The server converts the received acoustic information into text information using conversion technology. As the speech recognition engine for this process, it uses, for example, a general-purpose API service that provides speech recognition. The converted text information is then summarized using processing technology. The natural language processing technologies used here include OpenAI's generative AI models and other natural language processing frameworks.

[0956] The server then extracts emotional information from the acoustic data and uses an emotion analysis engine to analyze the emotional tones of the participants. The emotional information is integrated into the generated document, and meeting minutes that take into account the emotional aspects of the meeting are automatically produced.

[0957] Multilingual support

[0958] Users submit inquiries in text format through a dedicated application on their device. The user's input is sent to the server and translated into the specified language using automatic translation technology. The translation technology utilizes a multilingual online translation service.

[0959] The server analyzes the emotional tone of incoming inquiries and adjusts the response accordingly. The response is generated using a generative AI model, re-translated into the original language, and then provided to the user. This enables emotionally sensitive responses even to inquiries from residents.

[0960] Specific example

[0961] For example, in the processing of acoustic information, if someone strongly states "I cannot agree with this proposal" during a meeting, that emotion will be recorded in the minutes as "Mr. / Ms. XX expressed strong opposition to the proposal."
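
The mapping from a recognized stance and intensity to a line in the minutes, as in the example above, could be expressed with a small phrase table. The table entries and the function name `minute_line` are hypothetical; only the "strong opposition" wording comes from the example in the text.

```python
PHRASES = {
    ("opposition", "strong"): "expressed strong opposition to",
    ("opposition", "mild"): "expressed reservations about",
    ("support", "strong"): "strongly supported",
    ("support", "mild"): "supported",
}


def minute_line(speaker: str, stance: str, intensity: str, topic: str) -> str:
    """Render one emotion-aware line of the minutes from analysis labels."""
    phrase = PHRASES.get((stance, intensity), "commented on")  # neutral fallback
    return f"{speaker} {phrase} {topic}."
```

For instance, `minute_line("Mr. / Ms. XX", "opposition", "strong", "the proposal")` yields the sentence quoted in the example.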

[0962] Example of a prompt

[0963] "Please transcribe today's meeting discussions, analyze the emotions involved, and then summarize them."

[0964] "Please analyze multilingual inquiries and automatically generate appropriate responses that take emotional tone into consideration."

[0965] In this way, the system supports diverse forms of communication and enables high-quality management of community associations.

[0966] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0967] Step 1:

[0968] The terminal uses microphones installed in the conference room to collect acoustic information from participants in real time. This acoustic information is acquired as analog data and transmitted to the server as a digital signal. Specifically, the audio input device converts sound waves into electrical signals.

[0969] Step 2:

[0970] The server converts received digital audio information into text information using conversion technology. This process uses a speech recognition engine to analyze the audio signal and generate corresponding text data. As a result, the conversation content is output in text format. Specifically, this involves collaborative processing by an acoustic model and a language model.

[0971] Step 3:

[0972] The server further summarizes the converted text information using processing techniques. It employs natural language processing techniques to extract key information and generate a summary. The input is text data, and the output is a concise summary. Specific operations include tokenization and grammatical analysis.

[0973] Step 4:

[0974] The server extracts emotional information from text data. Using an emotion analysis engine, it analyzes the emotional tone of each statement and assigns an emotional label to the text. In this step, text data is input, and labels indicating the emotional state are output. Specifically, the tone and emphasis patterns of the speech are analyzed.

[0975] Step 5:

[0976] The server integrates emotional information into documents and generates meeting minutes. Using a generative AI model, it constructs meeting minutes that include emotional tone. The input consists of a summary and emotional information, and the output is meeting minutes that take emotions into account. Specifically, the document structure and content are adjusted.

[0977] Step 6:

[0978] Users submit inquiries through a dedicated application, and the text is sent to the server. The input is multilingual text, which is passed to the server as inquiry data. Specifically, this involves digitizing the captured user input.

[0979] Step 7:

[0980] The server translates received queries into the specified language using translation technology. Multilingual support technology is used to perform translations between different languages. The input is text data in the original language from the user, and the output is text data in the specified language. The use of a translation engine is a concrete example.

[0981] Step 8:

[0982] The server analyzes the sentiment tone of the translated query. Sentiment analysis techniques are used to evaluate the emotional state of the query. The input is the translated text, and the output is data indicating the sentiment tone. Specifically, this includes categorizing the sentiment of the text.

[0983] Step 9:

[0984] The server generates an appropriate response based on emotional information, re-translates it, and returns it to the user. A generative AI model is used to construct emotionally sensitive responses. The output is the response in the original language. Specifically, the process involves response generation and adjustment of the translation results.
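
Step 3 mentions tokenization as one of the concrete operations behind summarization. The sketch below shows a classic frequency-based extractive summarizer built on a simple tokenizer. It is a deliberately simple substitute for the generative AI model named in the text, it handles English only (Japanese text would need a morphological analyzer), and the stopword list and function names are assumptions.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "an", "was", "were", "is", "are", "of", "to", "and", "in", "on"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenization; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose content words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(t for t in tokenize(text) if t not in STOPWORDS)

    def score(index: int) -> int:
        return sum(freq[t] for t in tokenize(sentences[index]) if t not in STOPWORDS)

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Because scores are summed rather than averaged, longer sentences are favored; normalizing by sentence length is a common refinement.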

[0985] (Application Example 2)

[0986] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0987] In elderly care settings, it is difficult to grasp the emotional changes of elderly individuals in real time, resulting in the inability to provide appropriate care approaches immediately. Furthermore, in handling inquiries in multiple languages, it is necessary to provide prompt and appropriate responses that take emotions into consideration.

[0988] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0989] In this invention, the server includes means for collecting audio data in real time and converting the audio data into text data using acoustic processing technology, means for summarizing the converted text data using natural language processing technology and automatically generating a record document, and means for providing real-time visual feedback on the emotional state of the elderly person using a wearable display device. This enables appropriate care responses in accordance with the emotional state of the elderly person.

[0990] "Audio data" refers to sound information that records the content of conversations and speech.

[0991] "Real-time data collection" means acquiring data simultaneously with its occurrence or with a short delay.

[0992] "Acoustic processing technology" is a technology for analyzing sound and converting it into textual information.

[0993] "Text data" refers to data in which information is written as characters.

[0994] "Natural language processing technology" refers to the technology that allows computers to understand, interpret, and generate language used by humans.

[0995] "Summarizing" is the process of extracting only the essential parts of a large amount of information and putting them together concisely.

[0996] A "record document" is a document in which specific information is stored in text format.

[0997] A "wearable display device" is a device that is worn on the body to display information.

[0998] "Emotional state" refers to a psychological or emotional situation or change.

[0999] The system that realizes this invention enables real-time voice and emotion analysis in elderly care and multilingual inquiry handling. The system includes various devices and servers and functions as follows:

[1000] On the device side, a wearable display device is used. This device collects the voice of elderly people in real time and sends the data to a server. The server converts this voice data into text data using acoustic processing technology. For example, everyday conversations of elderly people can be acquired directly as data via smart glasses or similar devices.

[1001] Next, the server summarizes the converted text data using natural language processing technology and automatically generates a record document. For example, if an elderly person says, "I'm a little tired today," the content is quickly converted into text and visually fed back to the care staff.

[1002] Furthermore, a wearable display device visually shows the analyzed emotional state of the elderly person. This allows care staff to intuitively understand the psychological state of the elderly person and provide more appropriate care.

[1003] As a concrete example, a prompt for a generative AI model could be something like, "Design a system that analyzes the speech of elderly people, understands their emotional tone, and displays it in real time on a smart device. The voice data will be processed in the cloud, and the results will be fed back to a wearable display device."

[1004] This system will enable flexible and prompt responses in care settings that are tailored to the emotional state of elderly individuals, leading to improved communication between caregivers and the elderly individuals they care for.

[1005] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[1006] Step 1:

[1007] A device (wearable display device) collects the speech of elderly individuals using a microphone and acquires it as audio data. The input is the speech of the elderly individuals, and the output is the collected audio data. The device collects data in real time and immediately transfers it to the server.

[1008] Step 2:

[1009] The server converts received audio data into text data using acoustic processing technology. Speech recognition software is used for this conversion. The input is audio data, and the output is text data. The server performs this data conversion process in real time.

[1010] Step 3:

[1011] The server summarizes the converted text data using natural language processing technology and automatically generates a record document. The input is text data, and the output is a summarized record document. A data analysis algorithm considers linguistic characteristics and extracts important information.

[1012] Step 4:

[1013] The server uses an emotion analysis engine to analyze the emotional state of elderly individuals from text data. The input is text data, and the output is the result of the emotional state analysis. Through emotion analysis, the server detects feelings from the tone and content of speech.

[1014] Step 5:

[1015] The wearable display device feeds back analyzed emotional states, visually displaying the emotional state of elderly individuals. The input is the analysis result of the emotional state, and the output is the displayed emotional information. The device provides a visual and intuitive interface, enabling care staff to respond quickly to the situation.
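
The five steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of this disclosure: `CareFeedback`, `summarize`, `analyze_emotion`, `process_utterance`, and `fake_recognize` are hypothetical names, the keyword table is invented, and speech recognition, summarization, and emotion analysis are replaced by simple placeholders.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative keyword table; a real system would use an emotion analysis engine.
EMOTION_KEYWORDS = {"tired": "fatigue", "happy": "pleasure", "lonely": "anxiety"}


@dataclass
class CareFeedback:
    transcript: str  # Step 2 output
    record: str      # Step 3 output (record document)
    emotion: str     # Step 4 output, shown on the wearable display in Step 5


def summarize(text: str, max_sentences: int = 1) -> str:
    """Step 3 placeholder: keep the leading sentences instead of a real summary."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return ""
    return ". ".join(sentences[:max_sentences]) + "."


def analyze_emotion(text: str) -> str:
    """Step 4 placeholder: the first matching keyword decides the emotion."""
    lowered = text.lower()
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in lowered:
            return emotion
    return "calm"


def process_utterance(audio: bytes, recognize: Callable[[bytes], str]) -> CareFeedback:
    transcript = recognize(audio)          # Step 2: audio data to text data
    record = summarize(transcript)         # Step 3: record document
    emotion = analyze_emotion(transcript)  # Step 4: emotional state
    return CareFeedback(transcript, record, emotion)  # Step 5: data to display


def fake_recognize(audio: bytes) -> str:
    """Placeholder recognizer: treats the bytes as UTF-8 text (Step 1 input)."""
    return audio.decode("utf-8")


feedback = process_utterance(
    b"I'm a little tired today. The lunch was nice.", fake_recognize
)
```

In this sketch the device would send each captured chunk to `process_utterance` and render the returned `emotion` field; how the wearable display actually presents it is left open.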

[1016] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[1017] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[1018] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[1019] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[1020] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[1021] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[1022] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).

[1023] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
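
One way to picture the balance-based idea is a score that is highest at the ideal value and falls as the balance deviates from it. The following Python sketch is an assumption made for illustration and is not taken from this disclosure: the function names, the linear scoring rule, the `tolerance` parameter, and the sample battery and posture values are all hypothetical.

```python
from typing import Dict, Tuple


def comfort_score(value: float, ideal: float, tolerance: float) -> float:
    """Return a score in [-1, 1]: 1 at the ideal (pleasure), lower with deviation.

    The score reaches 0 when the deviation equals `tolerance` and is
    clamped at -1 (discomfort) for deviations of 2 * tolerance or more.
    """
    deviation = abs(value - ideal) / tolerance
    return max(-1.0, 1.0 - deviation)


def overall_comfort(balances: Dict[str, Tuple[float, float, float]]) -> float:
    """Average the scores of several balances given as (value, ideal, tolerance)."""
    scores = [comfort_score(v, i, t) for v, i, t in balances.values()]
    return sum(scores) / len(scores)


# Hypothetical robot state: battery level in percent and posture tilt in degrees.
robot_state = {
    "battery": (60.0, 80.0, 20.0),  # 20 points below the ideal -> score 0.0
    "posture": (0.0, 0.0, 10.0),    # exactly at the ideal -> score 1.0
}
score = overall_comfort(robot_state)
```

A simple average treats every balance as equally important; a weighted combination would be an equally plausible choice.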

[1024] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[1025] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
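
The idea that emotions located close together on the map receive similar values can be illustrated with distance-based target values. The following Python sketch is only one possible reading and is not the training procedure of this disclosure: the coordinates in `POSITIONS`, the Gaussian weighting, the `sigma` parameter, and the function names are hypothetical.

```python
import math
from typing import Dict

# Hypothetical 2-D positions on the emotion map
# (x: reaction side to situation side, y: displeasure to pleasure).
POSITIONS = {
    "reassured": (0.6, 0.5),
    "calm": (0.7, 0.4),
    "confident": (0.5, 0.6),
    "anxious": (0.6, -0.5),
}


def soft_targets(label: str, sigma: float = 0.3) -> Dict[str, float]:
    """Build target emotion values that decay with map distance from `label`."""
    lx, ly = POSITIONS[label]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }


def dominant_emotion(values: Dict[str, float]) -> str:
    """Pick the emotion with the largest emotion value."""
    return max(values, key=values.get)


targets = soft_targets("reassured")
```

With these assumed coordinates, "calm" and "confident" receive values close to that of "reassured", while "anxious", which lies far away on the map, receives a value near zero.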

[1026] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[1027] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[1028] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[1029] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[1030] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[1031] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[1032] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[1033] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[1034] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[1035] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[1036] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[1037] The following is further disclosed regarding the embodiments described above.

[1038] (Claim 1)

[1039] A means of collecting audio data during a meeting in real time and converting that audio data into text data using speech recognition technology,

[1040] A means of summarizing the converted text data using natural language processing technology and automatically generating meeting minutes,

[1041] The means of distributing the generated meeting minutes to participants by email or other means of communication,

[1042] A system that includes this.

[1043] (Claim 2)

[1044] A means of receiving multilingual inquiries from residents and translating them into a specified language using automatic translation technology,

[1045] A means of extracting appropriate information from a database in response to an incoming inquiry and generating an answer,

[1046] A means of re-translating the generated response into the original language and providing it to the resident who made the inquiry,

[1047] The system according to claim 1, further comprising the above means.

[1048] (Claim 3)

[1049] A means of using sensor devices to monitor the activity of elderly people,

[1050] A means of applying machine learning algorithms to analyze monitored data and detect anomalies,

[1051] A means of sending notifications to pre-registered contacts regarding detected anomalies,

[1052] The system according to claim 1, further comprising the above means.

[1053] "Example 1"

[1054] (Claim 1)

[1055] A means for collecting sound data in real time using a speech acquisition device and converting that sound data into text data using speech analysis technology,

[1056] A means of automatically generating meeting minutes by summarizing the converted text data using natural language processing technology,

[1057] A means of using communication to provide the generated meeting minutes to meeting participants,

[1058] A system that includes this.

[1059] (Claim 2)

[1060] A means of receiving questions in multiple languages and translating them into a specific language using automatic translation technology,

[1061] A means for retrieving appropriate information from an information storage medium based on a received question and generating an answer,

[1062] A means of retranslating the generated answer back into the original language and providing it to the questioner,

[1063] The system according to claim 1, further comprising the above means.

[1064] (Claim 3)

[1065] A means of using observation devices to monitor the living conditions of the elderly,

[1066] A means of using an automated learning algorithm to analyze monitored information and detect anomalies,

[1067] A means of sending notifications to pre-registered contacts regarding detected anomalies,

[1068] The system according to claim 1, further comprising the above means.

[1069] "Application Example 1"

[1070] (Claim 1)

[1071] A means for collecting audio information during a meeting in real time and converting that audio information into text information using speech recognition technology,

[1072] A means for summarizing converted text information using natural language processing technology and automatically generating meeting minutes,

[1073] Means of using email or other communication methods to distribute the generated meeting minutes to members,

[1074] A means for obtaining and viewing the minutes of neighborhood association meetings using a program installed on a smartphone,

[1075] A system that includes this.

[1076] (Claim 2)

[1077] A means of receiving multilingual inquiries from residents and translating them into a specified language using automatic translation technology,

[1078] A means of extracting appropriate information from a database in response to an incoming inquiry and generating an answer,

[1079] A means of re-translating the generated response into the original language and providing it to the resident who made the inquiry,

[1080] The system according to claim 1, which includes means for making multilingual inquiries and receiving responses in real time using a smartphone.

[1081] (Claim 3)

[1082] A means of using a sensor device to measure the activity of elderly people,

[1083] A means of applying machine learning algorithms to analyze measured data and detect anomalies,

[1084] A means of sending notifications to pre-registered contacts regarding detected anomalies,

[1085] The system according to claim 1, which includes means for receiving anomaly notifications in real time on a smartphone and prompting a quick response.

[1086] "Example 2 of combining an emotion engine"

[1087] (Claim 1)

[1088] A means for collecting acoustic information in real time and converting that acoustic information into text information using conversion technology,

[1089] A means for automatically generating a document by summarizing the converted text information using processing technology,

[1090] Means of using communication technology to distribute the generated documents to participants,

[1091] A means for extracting emotional information from acoustic information and integrating that extracted emotional information into a document,

[1092] A system that includes this.

[1093] (Claim 2)

[1094] A means for receiving multilingual inquiries and converting them into a specified language using translation technology,

[1095] A means to analyze the emotional tone of received inquiries and adjust responses more appropriately,

[1096] A means of converting the translated response back into the original language and providing it to the source of the inquiry,

[1097] The system according to claim 1, further comprising the above means.

[1098] (Claim 3)

[1099] Means of using measuring devices to monitor individual activities,

[1100] A means of applying technology to detect emotional states by analyzing monitoring data,

[1101] A means of notifying registered contacts when an unusual emotional state or behavior is detected,

[1102] The system according to claim 1, further comprising the above means.

[1103] "Application example 2 when combining with an emotional engine"

[1104] (Claim 1)

[1105] A means for collecting audio data in real time and converting that audio data into text data using acoustic processing technology,

[1106] A means for summarizing the converted text data using natural language processing technology and automatically generating a record document,

[1107] A means of using electronic communication to distribute the generated record documents to users,

[1108] A means of providing real-time visual feedback on the emotional state of elderly people using a wearable display device,

[1109] A system that includes this.

[1110] (Claim 2)

[1111] A means of receiving inquiry information and translating it into a specified language using machine translation technology,

[1112] A means for extracting appropriate information from a source in response to a received inquiry and generating a response,

[1113] A means of retranslating the generated response back into the source language and providing it to the requester,

[1114] The system according to claim 1, further comprising the above means.

[1115] (Claim 3)

[1116] A means of using a detection device to monitor user activity,

[1117] A means for applying a data analysis algorithm to analyze monitored data and detect anomalies,

[1118] A means of sending notifications to pre-registered contacts regarding detected anomalies,

[1119] The system according to claim 1, further comprising the above means.

[Explanation of Symbols]

[1120]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a means of collecting audio data during a meeting in real time and converting that audio data into text data using speech recognition technology; a means of summarizing the converted text data using natural language processing technology and automatically generating meeting minutes; and a means of distributing the generated meeting minutes to participants by email or other means of communication.

2. The system according to claim 1, further comprising: a means of receiving multilingual inquiries from residents and translating them into a specified language using automatic translation technology; a means of extracting appropriate information from a database in response to an incoming inquiry and generating an answer; and a means of re-translating the generated response into the original language and providing it to the resident who made the inquiry.

3. The system according to claim 1, further comprising: a means of using sensor devices to monitor the activity of elderly people; a means of applying machine learning algorithms to analyze monitored data and detect anomalies; and a means of sending notifications to pre-registered contacts regarding detected anomalies.