system

The system addresses inefficiencies in local organizations by integrating data storage, conversion, translation, and monitoring to streamline operations, improve communication, and ensure elderly care, enhancing operational efficiency and community sustainability.

JP2026101309APending Publication Date: 2026-06-22SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-10
Publication Date
2026-06-22

AI Technical Summary

Technical Problem

Local organizations face inefficiencies in managing operation-related information, language translation needs, and monitoring the elderly, exacerbated by aging workforce shortages and language diversification, leading to a lack of effective systems for streamlined operations.

Method used

A system comprising storage, conversion, translation, analysis, and monitoring means to manage operational information, translate text across languages, coordinate schedules, and monitor elderly activities, using servers, terminals, and AI for efficient data processing and communication.

Benefits of technology

The system enhances operational efficiency, reduces operator burden, and promotes community sustainability by providing centralized data management, real-time translation, and proactive elderly care, ensuring smooth communication and safety.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026101309000001_ABST
    Figure 2026101309000001_ABST
Patent Text Reader

Abstract

Provide a system. 【Solution means】 Data management means for storing business information of local organizations, Data conversion means for converting audio materials into text materials, Language conversion means for translating the text materials into multiple languages, Information analysis means for adjusting the schedule of local activities, Dynamic monitoring means for monitoring the behavior of elderly individuals, Information presentation means for presenting the information generated from each of the above means to the user, Data acquisition means for collecting audio materials in real time, Information sharing means for dynamically displaying the information and sharing information among residents, A system including the above.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Currently, many local organizations manage operation-related information in an analog manner, imposing a huge burden on human hands. In addition, in areas where the aging process is advancing, there is a shortage of officers and operators, and the decrease in new members is also a major problem. These problems are caused by the complexity of various operations such as meeting minutes creation, the need for translation due to language diversification, regional event schedule adjustment, and the monitoring of the elderly. Although there is a desire to provide a system that can efficiently solve these problems, such a system is not currently sufficiently provided.

Means for Solving the Problems

[0005] To solve this problem, the present invention provides a system comprising a storage means for storing operational information of local organizations, a conversion means for converting audio data into text data, a translation means for translating text data into multiple languages, an analysis means for adjusting the schedule of local events, a monitoring means for monitoring the activities of the elderly, and a display means for providing information generated from each means to users. This system will streamline the operation of local organizations, reduce the burden on operators, and promote the sustainable development of communities.

[0006] A "memory device" is a device or function for storing and managing data and information.

[0007] "Conversion means" refers to a device or method that has the function of converting audio data into text data.

[0008] "Translation means" refers to a device or method for converting text data into a different language.

[0009] "Analysis means" refers to a device or method for analyzing input data and deriving the optimal solution.

[0010] "Monitoring means" refers to a device or mechanism for continuously observing the situation or trends of a specific target and detecting any abnormalities.

[0011] "Display means" refers to a device or function for providing information to a user visually. [Brief explanation of the drawing]

[0012] [Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment. [Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4]It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] It shows an emotion map to which a plurality of emotions are mapped. [Figure 10] It shows an emotion map to which a plurality of emotions are mapped. [Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined. [Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0013] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0014] First, the language used in the following description will be explained.

[0015] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0016] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0017] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes.

[0018] In the following embodiments, the numbered communication I / F (Interface) is an interface that includes a communication processor, an antenna, and the like. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0019] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0020] [First Embodiment]

[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0033] This invention is a system designed to support the operation of local organizations, aiming to improve operational efficiency and revitalize communities by integrating and providing multiple functions. The system comprises multiple means, each playing a specific role.

[0034] First, the server comprehensively manages information related to the operation of the local organization. It has a database as a storage method, which stores member information, meeting records, schedules of local events, and information on monitoring elderly residents. User terminals provide an interface to access this database, allowing users to retrieve necessary information and input new information.

[0035] Audio data from meetings is collected using terminals and transmitted to a server in real time. The server uses a conversion device to convert the audio data into text data and records it. The generated meeting minutes can then be reviewed and edited by the user.

[0036] Furthermore, users can request translations of meeting minutes and other related documents into other languages. The server uses translation tools and generative AI to perform multilingual translations. The translation results are then displayed to the user in a usable format.

[0037] For coordinating local events, users input their preferred dates on a terminal, and the server uses analysis tools to adjust participants' schedules and determine the optimal schedule. This facilitates smooth communication among participants and enables efficient scheduling.

[0038] Furthermore, as part of the elderly monitoring service, the server monitors the elderly person's activity status using monitoring devices. Here, GPS information and activity data are utilized, and if an abnormality is detected, an alert is sent to the user's terminal.

[0039] As a concrete example, consider an annual general meeting of a local community organization. In this case, the user starts the meeting, and the device collects the audio. The server converts it to text and saves it as meeting minutes. If there are members speaking different languages, the server performs the necessary translations. After the meeting, the user reviews the translated minutes and shares them with other participants from their device. This collaboration enables all stakeholders to make quick decisions based on accurate information.

[0040] Thus, the system provided by the present invention efficiently and effectively contributes to the diverse operational needs of local organizations, supporting the revitalization and sustainable development of local communities.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The user launches the application on their device and presses the "Start Meeting" button. The device begins collecting audio data and sends it to the server in real time.

[0044] Step 2:

[0045] The server converts the received audio data into text data using a speech recognition API. This text data is then stored in a database as meeting minutes.

[0046] Step 3:

[0047] The user operates the terminal to review the generated meeting minutes. If necessary, the user can modify the contents of the meeting minutes.

[0048] Step 4:

[0049] When a user requests multilingual translation, they press the translate button on their device and specify the required languages. The device then sends this request to the server.

[0050] Step 5:

[0051] The server uses a generative AI to translate the meeting minutes into the specified language. The translated text is stored in a database, and a notification is sent to the user when the translation is complete.

[0052] Step 6:

[0053] Users can view translated meeting minutes on their devices. They can also use their devices to download translated meeting minutes as needed, or share them with other members.

[0054] Step 7:

[0055] When a user is coordinating the schedule for a local event, they enter potential dates on their device and send them to the server. The server then aggregates the participants' suggested dates and analyzes to determine the optimal schedule.

[0056] Step 8:

[0057] The server analyzes the optimal schedule and records it in a database, then notifies all participants of the decided schedule. The terminal then displays the schedule in a format that is easy for the user to understand.

[0058] Step 9:

[0059] In the elderly monitoring service, the server periodically monitors the activity data of the elderly person. If an anomaly is detected, the server immediately sends an alert to the registered contact. The user receives the alert on their device and can take the necessary action.

[0060] (Example 1)

[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0062] The operation of local community organizations requires managing diverse information, facilitating communication among members who speak different languages, effectively coordinating event schedules, and ensuring the safe monitoring of the elderly. However, managing these tasks individually using conventional methods is inefficient and consumes a great deal of effort and time. This invention aims to efficiently meet these multiple needs with a single system and support the operation of local communities.

[0063] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0064] In this invention, the server includes a storage means for storing data related to the operation of a local organization, a conversion means for converting audio data into text information, and a translation means for translating into multiple languages. This enables centralized management of various types of information of the local organization, promotion of real-time communication, and support for understanding among participants through multilingual support.

[0065] A "memory device" is a device used to centrally store all data related to the operation of a local organization, for later reference or updating.

[0066] "Conversion method" refers to the process or technology used to convert audio data collected at meetings and other events into analyzable text information.

[0067] "Translation methods" refer to technologies that use generative AI models to convert text information into multiple languages, enabling mutual understanding between languages.

[0068] "Analysis means" refers to a device or program that has the function of calculating the optimal event schedule based on input date information or existing schedule data.

[0069] "Monitoring means" refers to devices or systems for observing the activity status of elderly people in real time and detecting abnormalities necessary to ensure their safety.

[0070] "Display means" refers to devices or methods that present information to users visually or audibly and enable interaction.

[0071] "Editing methods" refer to the processes and functions used to allow users to easily modify, supplement, and save generated meeting minutes and translations.

[0072] A "coordination tool" is a system that takes into account various conditions among participants and efficiently adjusts the schedule of local events.

[0073] An "alarm system" is a mechanism for issuing a warning to the user when an abnormality is detected by the monitoring system.

[0074] Modes for carrying out the invention

[0075] This invention aims to build a system to support the operation of local organizations and to achieve efficient and effective operation. This system integrates multiple digital means to manage, analyze, and present the information necessary for operation.

[0076] The server stores data related to the operation of local organizations in an SQL database. This database contains member information, meeting minutes, schedules for local events, and activity data for senior citizens. The server uses the Google® Cloud Speech-to-Text API to convert audio data sent from devices during meetings into text data. This text data is recorded as meeting minutes and can be reviewed and modified by users later.

[0077] Furthermore, the server can use OpenAI's (registered trademark) generative AI model to perform multilingual translation of text data. This translation function facilitates smoother communication among members who speak different languages.

[0078] The user terminal functions as an access interface to the server, allowing users to easily input and retrieve data. Users enter the desired event dates into the terminal, and the system proposes the optimal schedule. This enables efficient event management that avoids scheduling conflicts among participants.

[0079] In elderly monitoring services, a server monitors activity levels through various sensors and GPS devices, and sends an alert to the terminal if an anomaly is detected. Users can take quick action based on these alerts, ensuring the safety of the elderly.

[0080] A concrete example is the annual general meeting of a local community organization. At the start of the meeting, the user collects audio via their device, and the server instantly transcribes the audio into text. Furthermore, if there are members speaking different languages, the server performs translation and provides real-time support during the meeting. This collaboration allows all participants to make decisions based on accurate information.

[0081] Example of a prompt:

[0082] "This system should convert audio data collected during community group meetings into text and generate meeting minutes that are easy for participants to understand. It should also translate the minutes if there are members with multiple language skills. The results should be displayed for easy review."

[0083] In this way, this system integrates a variety of functions necessary for the operation of local organizations, contributing to community revitalization and smooth management.

[0084] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0085] Step 1:

[0086] The server uses an SQL database to store operational information for local organizations. Users can input member information and event schedules into the database via their terminals. This input data is stored directly in digital format and can be searched and updated as needed. The stored data serves as foundational information used in subsequent processing steps.

[0087] Step 2:

[0088] The user starts a meeting and collects audio data in real time on their device. The collected audio data is sent from the device to the server. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input to this conversion process is an audio file, and the output is transcribed text. The generated text is recorded in a database as meeting minutes.

[0089] Step 3:

[0090] Users can view meeting minutes in the database and request translations as needed. The server utilizes OpenAI's generative AI model to translate text data into multiple languages. The text received as input is translated into the selected language, and the result is provided to the user. This prompt text is passed to the generative AI to improve translation accuracy.

[0091] Step 4:

[0092] Users enter their desired dates for local events from their devices. This input is analyzed by the server along with existing event information. An optimal schedule is then formulated using a coordination mechanism. When determining the schedule, adjustments are made based on the entered desired dates and existing event schedule data to check for any conflicts. The determined schedule is then notified to the user.

[0093] Step 5:

[0094] The server monitors the activity levels of elderly individuals and collects data in real time. Specifically, it analyzes input from GPS devices and various sensors, and issues an alarm if an anomaly is detected. This immediately sends an alert to the user's terminal, ensuring the safety of the elderly. In this process, the monitoring system identifies deviations from normal behavior patterns based on various activity data input.

[0095] (Application Example 1)

[0096] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0097] In the operation of local community organizations, challenges include inefficient information sharing, communication difficulties due to language barriers, and ensuring the safety of the elderly. Furthermore, there is a need to collect and share information in real time and promote cooperation among residents.

[0098] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0099] In this invention, the server includes data acquisition means for collecting audio materials in real time, information analysis means for coordinating the schedule of community activities, and dynamic monitoring means for monitoring the behavior of elderly individuals. This enables real-time information collection and sharing, facilitating efficient information exchange among community residents and ensuring the safety of the elderly.

[0100] A "community group" is a collection of residents or organizations that share a common purpose and engage in activities within a specific region.

[0101] "Business information" refers to all information, including records and data related to the operation of local organizations.

[0102] "Data management means" refers to a device or method for systematically storing, organizing, and retrieving information.

[0103] "Audio materials" refer to audio data collected at meetings, events, and other similar occasions.

[0104] "Textual data" refers to data obtained by converting audio data into text.

[0105] "Data conversion means" refers to a device or method for converting audio material into text material.

[0106] "Language conversion means" refers to a device or method for translating written material into a different language.

[0107] "Information analysis means" refers to a device or method that analyzes collected data and generates information to support decision-making.

[0108] An "elderly individual" refers to an individual belonging to the age group considered to be elderly.

[0109] "Dynamic monitoring means" refers to a device or method for monitoring an individual's actions and location and detecting abnormalities.

[0110] "Information presentation means" refers to a device or method for displaying analyzed information in an easily understandable manner to the user.

[0111] "Data acquisition means" refers to a device or method for collecting information such as audio or text.

[0112] "Information sharing means" refers to a device or method for providing collected and analyzed information to multiple users.

[0113] The system implementing this invention is designed to support the operation of local organizations. Specifically, a server collects audio materials using real-time data acquisition means. The audio materials are converted into text materials by data conversion means. This text material is translated into multiple languages ​​using language conversion means. The translated information is presented to the user by information presentation means.

[0114] Furthermore, the server is equipped with information analysis tools to analyze and adjust the schedule of community activities. In addition, for the safety management of the elderly, a dynamic monitoring system monitors the actions of individual elderly people and issues an alarm if an abnormality is detected. This enables both the safety of the elderly and the efficiency of community activities.

[0115] User terminals are configured to receive information provided by the server anytime, anywhere. The terminals utilize information sharing tools to share translated text materials and activity schedules with all residents of the community. This sharing function allows members of community organizations to access information smoothly, without being restricted by specific languages ​​or locations.

[0116] As a concrete example, in a certain area, there are regular cleaning activities, and audio guides and schedules are provided through an app. Residents can access the system through this app and easily check activity information. Furthermore, it is possible to utilize the information in the form of prompts generated by a generative AI model, such as, "An audio guide is needed for the local cleaning activity. Please use the app to gather participants, record the activity, translate and share it." In this way, this invention can dramatically improve the operational efficiency of community organizations.

[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0118] Step 1:

[0119] The server collects audio data in real time. Users record audio from meetings and events using their devices and send it to the server. Based on this input, the data acquisition system records the audio data in digital format. This data is temporarily stored for use in subsequent processing.

[0120] Step 2:

[0121] The server converts audio data into text data. Using data conversion means, it applies a speech recognition algorithm to convert the audio data into text data. This conversion makes the audio information available in a format that users can review and edit.

[0122] Step 3:

[0123] The server translates textual materials into multiple languages. The resulting text data is then translated into the specified target language using language conversion tools. A generative AI model assists in this process, streamlining the translation. The translation results make the information available to users in different languages.

[0124] Step 4:

[0125] The terminal displays and shares translated information with the user. The translation results are displayed on the user's screen using an information presentation tool. Furthermore, the translated data is shared with other members of the local organization using an information sharing tool. This process removes language barriers and facilitates the smooth flow of information.

[0126] Step 5:

[0127] The server adjusts the schedule for local events. It analyzes the dates entered by users and uses information analysis tools to determine the optimal dates for participation. Participant schedule information is used as input, and the adjusted schedule is output as the analysis result.

[0128] Step 6:

[0129] The server monitors the movements of elderly individuals and detects any anomalies. A motion monitoring system receives GPS data as input and monitors their range of movement. It is configured to issue an alarm when an anomaly is detected. This operation ensures the safety of elderly individuals.

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0131] This invention is a system for streamlining the operation of local organizations and improving the user experience, and by incorporating an emotion engine, it enables the provision of services that respond to the emotions of users. This system comprises a storage means, a conversion means, a translation means, an analysis means, a monitoring means, a display means, and an emotion engine.

[0132] First, the server comprehensively manages information for the operation of local organizations. This information is stored in a database using storage devices and includes membership information, meeting minutes, schedules, and monitoring information for the elderly. User terminals provide an interface for accessing this database, allowing users to view information and enter new information as needed.

[0133] During meetings and events, users collect audio data via their devices, and the server converts the audio data into text using a conversion device. If translation is required, the server's translation device translates the text data into multiple languages ​​and provides it to the user. Furthermore, the server uses an analysis device to coordinate the schedule of local events, proposes the optimal date, and reflects it in a display device that allows the user to operate intuitively.

[0134] Furthermore, by using an emotion engine, the system can recognize emotions from the user's voice and text. For example, if tension or stress is detected during a meeting, the server generates a notification and sends a message to the user via a display device encouraging relaxation. In addition, in the elderly monitoring function, if an unusual emotional state is detected, an alert is quickly issued and relevant parties are notified.

[0135] As a concrete example, consider a local board meeting. Users collect opinions from the meeting using their devices and send them to a server. The server transcribes the audio into text and then translates it into the required language. An emotion engine analyzes the atmosphere of the meeting, and if it determines that participants are experiencing stress, it informs the resource manager and provides information to support better communication.

[0136] Thus, the system provided by the present invention significantly improves the operational efficiency of local organizations, enhances the user experience through emotion recognition, and helps maintain harmony throughout the community.

[0137] The following describes the processing flow.

[0138] Step 1:

[0139] The user launches the application on their device and presses the start meeting button. The device begins collecting audio data and sends it to the server in real time.

[0140] Step 2:

[0141] The server uses a conversion mechanism to convert the received audio data into text data. This text data is then stored in a database as meeting minutes.

[0142] Step 3:

[0143] The server uses an emotion engine to analyze the user's emotions from the transmitted voice and text data. Based on the analysis, if specific emotions such as tension or stress are detected, it generates support messages in real time.

[0144] Step 4:

[0145] When a user requests a translation, they select the desired language on their device and send the request to the server. The server uses a translation tool to translate the text data into the specified language and saves the result to the database.

[0146] Step 5:

[0147] The server uses analytical tools to coordinate the schedule of local events. Users input multiple candidate dates from their terminals, and the server calculates the optimal date based on these inputs and records it in the database.

[0148] Step 6:

[0149] Through the terminal, users can view meeting minutes and translation results provided by the server. Furthermore, an interface incorporating sentiment analysis results allows users to receive feedback tailored to their own stress levels.

[0150] Step 7:

[0151] In monitoring elderly individuals, the server uses monitoring tools to continuously analyze the user's behavior and emotional state. If abnormal emotions are detected, an alert is promptly sent to the relevant administrator's terminal, prompting them to take appropriate action.

[0152] (Example 2)

[0153] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0154] Managing diverse information and providing multilingual support are essential for community organizations. Furthermore, it's crucial to accurately understand participants' emotional states and facilitate smooth communication. However, comprehensive systems for efficiently achieving these goals are limited, making operational efficiency and participant satisfaction challenging.

[0155] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0156] In this invention, the server includes a storage means for storing operational information of a regional organization, a conversion means for converting audio information into text information, and a language conversion means for translating the text information into other languages. This enables the management of diverse information.

[0157] A "regional organization" is an organization or group that exists geographically or functionally within a specific area.

[0158] "Operational information" refers to all data and records related to the activities and management of an organization.

[0159] "Memory device" refers to a device or technology for storing information in digital format.

[0160] "Audio information" refers to recorded data that includes language and sounds.

[0161] "Textual information" refers to data expressed in text format.

[0162] "Conversion means" refers to a device or technology for converting information in one format into another format.

[0163] "Language conversion means" refers to a device or technology for translating information written in one language into another language.

[0164] "Analytical means" refers to devices or techniques for analyzing data and extracting specific conclusions or information.

[0165] "Surveillance means" refers to devices or technologies used to continuously observe individuals or situations.

[0166] "Emotion identification means" refers to a device or technology for detecting and identifying emotions from information.

[0167] "Notification generation means" refers to a device or technology for generating notifications based on specific information or conditions.

[0168] "Display means" refers to a device or technology for visually presenting information.

[0169] This system aims to streamline the operation of local organizations and improve communication by understanding participants' emotions. The system has a structure that combines multiple technological means.

[0170] The server plays a central role in information management and data processing, and operates on specific hardware. In particular, it uses high-capacity hard disk drives or solid-state drives as storage media for data. To convert audio information into text, the server employs speech recognition software. For multilingual support, a machine translation system is installed on the server to perform translation processing in real time. For emotion identification, a sentiment analysis engine utilizing a generative AI model is incorporated, allowing it to determine emotions from audio and text. This engine detects the user's stress level and level of comfort, generates notifications as needed, and communicates them to the user via the terminal.

[0171] User terminals can be smartphones, tablets, or personal computers, which provide the user interface. The terminal communicates with the server via an internet connection, sending and receiving data in real time. Furthermore, the display on the terminal uses HTML, CSS, and JavaScript (registered trademark) to create an intuitive interface and enhance usability.

[0172] As a concrete example, consider a scenario in a local board meeting where a user transmits the meeting audio to a server using their device. This audio is transcribed by the server and then translated into multiple languages. An emotion recognition engine analyzes the stress detected during the meeting, and the server provides feedback to the user through a notification generation mechanism. This feedback includes content designed to promote relaxation.

[0173] An example of a prompt message might be, "Detect the stress levels of all participants during the meeting and suggest appropriate actions as needed." This would not only improve the operational efficiency of local organizations but also support harmony within the community as a whole.

[0174] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0175] Step 1:

[0176] Users collect audio data from meetings and events using devices such as smartphones and tablets. They record the audio clearly using the device's microphone and send the data to a server. The input is audio data, and the output is data transmission to the server. This process digitizes the content of meetings.

[0177] Step 2:

[0178] The server converts the received audio data into text data using a conversion mechanism. This process utilizes speech recognition software to convert the audio data into text format. The input is audio data, and the output is text data. As a result, the audio is saved as text and used for subsequent processing.

[0179] Step 3:

[0180] If necessary, the server translates the converted text data into multiple languages ​​using translation tools. A machine translation system is used to convert the text data into the specified languages. The input is text data, and the output is text data in multiple languages. This allows information to be shared with users who speak different languages.

[0181] Step 4:

[0182] The server identifies emotions from text data using emotion recognition methods based on a generative AI model. The input is translated or untranslated text data, and the output is the emotion analysis result. The server analyzes emotions from the context of the text and detects tension and stress.

[0183] Step 5:

[0184] The server generates relaxation-promoting notifications based on identified emotions and presents them to the user using a notification generation mechanism. The input is the emotion analysis result, and the output is the notification message. This allows the user to receive feedback on their own and other participants' emotional states.

[0185] Step 6:

[0186] The user terminal delivers notifications from the server to the user using a display mechanism. The notification content is visually presented on the display screen. The input is the notification message, and the output is the user's confirmation of the display. Through this operation, the user receives specific suggestions and warnings.

[0187] (Application Example 2)

[0188] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0189] In the environments where elderly people live, there is a need to monitor not only their physical condition but also their emotional state in real time, and to take swift and appropriate action when abnormalities are detected. Furthermore, there is a challenge in the lack of information that allows caregivers to efficiently provide care to the elderly and to take approaches tailored to each individual's situation.

[0190] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0191] In this invention, the server includes a storage means for storing operational information of local organizations, a conversion means for converting audio data into text data, a translation means for translating into multiple languages, an analysis means for adjusting the schedule of local events, a monitoring means for monitoring the movements of elderly people, an emotion analysis means for analyzing the emotions of users and notifying relevant parties if an abnormality is detected, and a notification generation means for generating appropriate notifications in the care environment and prompting relevant parties. This makes it possible to continuously monitor the physical and emotional condition of elderly people in care facilities and respond quickly.

[0192] A "memory device" refers to a device or system that stores operational information and data of a local organization and saves it in a format that can be used as needed.

[0193] A "conversion means" is a device or system that performs the process of converting data acquired as audio into text information.

[0194] "Translation means" refers to a device or system that has the function of converting the translated text into another language.

[0195] "Analysis means" refers to a device or system that examines and analyzes information to create an optimal schedule or plan.

[0196] "Monitoring measures" refer to devices or systems that observe the behavior and circumstances of elderly people and detect any abnormalities.

[0197] "Display means" refers to a device or system for visually presenting obtained information to the user.

[0198] An "emotional analysis tool" is a device or system that analyzes a user's emotions from their voice and behavior and responds accordingly.

[0199] A "notification generation means" is a device or system that has the function of constructing messages for sending necessary information or warnings to users or caregivers.

[0200] The system of this invention consists of a server and a user terminal. The server has a storage means for centrally managing the operational information of local organizations, which includes membership information, meeting minutes, schedules, and information on the status of elderly members. The user terminal provides the user with an interface for accessing this information, enabling them to check and input new information.

[0201] The server includes a conversion mechanism that uses speech recognition APIs (e.g., Google Cloud Speech-to-Text) to convert audio data into text data. If the text data needs to be translated into multiple languages, this is done using translation tools such as the Google Cloud Translation API. For analysis, it uses algorithms to optimize scheduling of meetings and local events, proposing the most suitable schedule.

[0202] In an example implemented in a nursing care facility, the server analyzes the voices and behaviors of elderly residents using an emotion analysis library (e.g., IBM Watson® Natural Language Understanding) to understand their emotional state in real time. If an anomaly is detected through the monitoring system, an alert is quickly sent to the relevant parties.

[0203] Furthermore, the notification generation system constructs messages to inform users and caregivers of emotional states or abnormalities, and provides these messages to caregivers via smartphone or tablet displays. Here, a generation AI model uses prompts to create appropriate messages. For example, if a user shows signs of anxiety, a message such as, "The user appears to be emotionally unstable. Please speak to them gently to reassure them," might be displayed to the caregiver.

[0204] An example of a prompt message for the generating AI model would be: "Analyze the user's voice data and determine their emotional state. If stress or anxiety is detected, generate an appropriate message to notify the caregiver."

[0205] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0206] Step 1:

[0207] The user collects voice recordings of elderly individuals using a device. The voice data is then sent to a server. The input is the user's voice data, and the output is the voice data that has been transferred to the server.

[0208] Step 2:

[0209] The server receives audio data and converts it to text data using a speech recognition API. The input is the received audio data, and the output is the transcribed audio data. The audio is analyzed using a speech recognition API (e.g., Google Cloud Speech-to-Text) and the corresponding text is generated.

[0210] Step 3:

[0211] The server uses an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to analyze the user's emotional state from text data. The input is transcribed audio data, and the output is the analysis result indicating the user's emotional state. The text is analyzed to determine emotions such as positive or negative.

[0212] Step 4:

[0213] The server, as needed, uses the analysis results to create an appropriate notification message using a generative AI model. The input is the analysis result of the emotional state, and the output is the message to be provided to the caregiver. Prompt sentences are input to the generative AI model to construct an appropriate message based on the emotional state.

[0214] Step 5:

[0215] The server uses a notification generation mechanism to send a notification message to the caregiver's terminal. The input is the generated notification message, and the output is the message displayed on the caregiver's terminal. The message is sent to the terminal via the network to inform the caregiver of the situation.

[0216] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0217] Data generation model 58 is a so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0218] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0219] [Second Embodiment]

[0220] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0221] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0222] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0223] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0224] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0225] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0226] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0227] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0228] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0229] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0230] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0231] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0232] This invention is a system designed to support the operation of local organizations, aiming to improve operational efficiency and revitalize communities by integrating and providing multiple functions. The system comprises multiple means, each playing a specific role.

[0233] First, the server comprehensively manages information related to the operation of the local organization. It has a database as a storage method, which stores member information, meeting records, schedules of local events, and information on monitoring elderly residents. User terminals provide an interface to access this database, allowing users to retrieve necessary information and input new information.

[0234] Audio data from meetings is collected using terminals and transmitted to a server in real time. The server uses a conversion device to convert the audio data into text data and records it. The generated meeting minutes can then be reviewed and edited by the user.

[0235] Furthermore, users can request translations of meeting minutes and other related documents into other languages. The server uses translation tools and generative AI to perform multilingual translations. The translation results are then displayed to the user in a usable format.

[0236] For coordinating local events, users input their preferred dates on a terminal, and the server uses analysis tools to adjust participants' schedules and determine the optimal schedule. This facilitates smooth communication among participants and enables efficient scheduling.

[0237] Furthermore, as part of the elderly monitoring service, the server monitors the elderly person's activity status using monitoring devices. Here, GPS information and activity data are utilized, and if an abnormality is detected, an alert is sent to the user's terminal.

[0238] As a concrete example, consider an annual general meeting of a local community organization. In this case, the user starts the meeting, and the device collects the audio. The server converts it to text and saves it as meeting minutes. If there are members speaking different languages, the server performs the necessary translations. After the meeting, the user reviews the translated minutes and shares them with other participants from their device. This collaboration enables all stakeholders to make quick decisions based on accurate information.

[0239] Thus, the system provided by the present invention efficiently and effectively contributes to the diverse operational needs of local organizations, supporting the revitalization and sustainable development of local communities.

[0240] The following describes the processing flow.

[0241] Step 1:

[0242] The user launches the application on their device and presses the "Start Meeting" button. The device begins collecting audio data and sends it to the server in real time.

[0243] Step 2:

[0244] The server converts the received audio data into text data using a speech recognition API. This text data is then stored in a database as meeting minutes.

[0245] Step 3:

[0246] The user operates the terminal to review the generated meeting minutes. If necessary, the user can modify the contents of the meeting minutes.

[0247] Step 4:

[0248] When a user requests multilingual translation, they press the translate button on their device and specify the required languages. The device then sends this request to the server.

[0249] Step 5:

[0250] The server uses AI-generated text to translate the meeting minutes into the specified language. The translated text is stored in a database, and a notification is sent to the user when the translation is complete.

[0251] Step 6:

[0252] Users can view translated meeting minutes on their devices. They can also use their devices to download translated meeting minutes as needed, or share them with other members.

[0253] Step 7:

[0254] When a user is coordinating the schedule for a local event, they enter potential dates on their device and send them to the server. The server then aggregates the participants' suggested dates and analyzes to determine the optimal schedule.

[0255] Step 8:

[0256] The server analyzes the optimal schedule and records it in a database, then notifies all participants of the decided schedule. The terminal then displays the schedule in a format that is easy for the user to understand.

[0257] Step 9:

[0258] In the elderly monitoring service, the server periodically monitors the activity data of the elderly person. If an anomaly is detected, the server immediately sends an alert to the registered contact. The user receives the alert on their device and can take the necessary action.

[0259] (Example 1)

[0260] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0261] The operation of local community organizations requires managing diverse information, facilitating communication among members who speak different languages, effectively coordinating event schedules, and ensuring the safe monitoring of the elderly. However, managing these tasks individually using conventional methods is inefficient and consumes a great deal of effort and time. This invention aims to efficiently meet these multiple needs with a single system and support the operation of local communities.

[0262] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0263] In this invention, the server includes a storage means for storing data related to the operation of a local organization, a conversion means for converting audio data into text information, and a translation means for translating into multiple languages. This enables centralized management of various types of information of the local organization, promotion of real-time communication, and support for understanding among participants through multilingual support.

[0264] A "memory device" is a device used to centrally store all data related to the operation of a local organization, for later reference or updating.

[0265] "Conversion method" refers to the process or technology used to convert audio data collected at meetings and other events into analyzable text information.

[0266] "Translation methods" refer to technologies that use generative AI models to convert text information into multiple languages, enabling mutual understanding between languages.

[0267] "Analysis means" refers to a device or program that has the function of calculating the optimal event schedule based on input date information or existing schedule data.

[0268] "Monitoring means" refers to devices or systems for observing the activity status of elderly people in real time and detecting abnormalities necessary to ensure their safety.

[0269] "Display means" refers to devices or methods that present information to users visually or audibly and enable interaction.

[0270] "Editing methods" refer to the processes and functions used to allow users to easily modify, supplement, and save generated meeting minutes and translations.

[0271] A "coordination tool" is a system that takes into account various conditions among participants and efficiently adjusts the schedule of local events.

[0272] An "alarm system" is a mechanism for issuing a warning to the user when an abnormality is detected by the monitoring system.

[0273] Modes for carrying out the invention

[0274] This invention aims to build a system to support the operation of local organizations and to achieve efficient and effective operation. This system integrates multiple digital means to manage, analyze, and present the information necessary for operation.

[0275] The server stores data related to the operation of the local organization in an SQL database. This database contains member information, meeting minutes, schedules for local events, and activity data for senior citizens. The server uses the Google Cloud Speech-to-Text API to convert audio data sent from devices during meetings into text data. This text data is recorded as meeting minutes and can be reviewed and modified by users later.

[0276] Furthermore, the server can use OpenAI's generative AI model to perform multilingual translation of text data. This translation function facilitates smoother communication among members who speak different languages.

[0277] The user terminal functions as an access interface to the server, allowing users to easily input and retrieve data. Users enter the desired event dates into the terminal, and the system proposes the optimal schedule. This enables efficient event management that avoids scheduling conflicts among participants.

[0278] In elderly monitoring services, a server monitors activity levels through various sensors and GPS devices, and sends an alert to the terminal if an anomaly is detected. Users can take quick action based on these alerts, ensuring the safety of the elderly.

[0279] As a specific example, the annual general meeting of a local organization can be cited. At the start of the meeting, the user collects audio using a terminal, and the server immediately converts the audio into text. Also, when there are members who speak different languages, the server performs translation and responds in real time during the meeting. Through this cooperation, all participants can make decisions based on accurate information.

[0280] Example of a prompt sentence:

[0281] "In this system, please convert the audio data collected during the meeting of the local organization into text and generate minutes of the meeting so that participants can easily understand. Also, if there are members who speak multiple languages, please perform translation. Please display the results so that they can be confirmed."

[0282] In this way, this system integrates various functions necessary for the operation of local organizations and contributes to community activation and smooth operation.

[0283] The flow of the specific process in Example 1 will be described using FIG. 11.

[0284] Step 1:

[0285] The server uses an SQL database to store the operation information of the local organization. The user can input member information and event schedules into the database through the terminal. This input data is directly saved in digital format and can be searched and updated as needed. The stored data becomes the basic information to be used in subsequent processing steps.

[0286] Step 2:

[0287] The user starts a meeting and collects audio data in real time on their device. The collected audio data is sent from the device to the server. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input to this conversion process is an audio file, and the output is transcribed text. The generated text is recorded in a database as meeting minutes.

[0288] Step 3:

[0289] Users can view meeting minutes in the database and request translations as needed. The server utilizes OpenAI's generative AI model to translate text data into multiple languages. The text received as input is translated into the selected language, and the result is provided to the user. This prompt text is passed to the generative AI to improve translation accuracy.

[0290] Step 4:

[0291] Users enter their desired dates for local events from their devices. This input is analyzed by the server along with existing event information. An optimal schedule is then formulated using a coordination mechanism. When determining the schedule, adjustments are made based on the entered desired dates and existing event schedule data to check for any conflicts. The determined schedule is then notified to the user.

[0292] Step 5:

[0293] The server monitors the activity levels of elderly individuals and collects data in real time. Specifically, it analyzes input from GPS devices and various sensors, and issues an alarm if an anomaly is detected. This immediately sends an alert to the user's terminal, ensuring the safety of the elderly. In this process, the monitoring system identifies deviations from normal behavior patterns based on various activity data input.

[0294] (Application Example 1)

[0295] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0296] In the operation of local community organizations, challenges include inefficient information sharing, communication difficulties due to language barriers, and ensuring the safety of the elderly. Furthermore, there is a need to collect and share information in real time and promote cooperation among residents.

[0297] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0298] In this invention, the server includes data acquisition means for collecting audio materials in real time, information analysis means for coordinating the schedule of community activities, and dynamic monitoring means for monitoring the behavior of elderly individuals. This enables real-time information collection and sharing, facilitating efficient information exchange among community residents and ensuring the safety of the elderly.

[0299] A "community group" is a collection of residents or organizations that share a common purpose and engage in activities within a specific region.

[0300] "Business information" refers to all information, including records and data related to the operation of local organizations.

[0301] "Data management means" refers to a device or method for systematically storing, organizing, and retrieving information.

[0302] "Audio materials" refer to audio data collected at meetings, events, and other similar occasions.

[0303] "Textual data" refers to data obtained by converting audio data into text.

[0304] "Data conversion means" refers to a device or method for converting audio material into text material.

[0305] The "language conversion means" is a device or method for translating written materials into different languages.

[0306] The "information analysis means" is a device or method for analyzing the collected data and generating information to support decision-making.

[0307] The "elderly individual" refers to an individual belonging to the age group regarded as the elderly.

[0308] The "dynamic monitoring means" is a device or method for monitoring an individual's actions and position and detecting abnormalities.

[0309] The "information presentation means" is a device or method for presenting the analyzed information to the user in an easy-to-understand manner.

[0310] The "data acquisition means" is a device or method for collecting information such as voice and text.

[0311] The "information sharing means" is a device or method for providing the collected and analyzed information to multiple users.

[0312] The system for implementing this invention is designed to support the operation of local organizations. Specifically, the server collects voice materials and uses real-time data acquisition means. The voice materials are converted into written materials by the data conversion means. These written materials are translated into multiple languages using the language conversion means. The translated information is presented to the user by the information presentation means.

[0313] Also, the server has information analysis means for analyzing and adjusting the schedule of local activities. Furthermore, for the safety management of the elderly, the dynamic monitoring means monitors the actions of elderly individuals and has the role of issuing an alarm when an abnormality is detected. This enables the safety of the elderly and the efficiency of local activities.

[0314] User terminals are configured to receive information provided by the server anytime, anywhere. The terminals utilize information sharing tools to share translated text materials and activity schedules with all residents of the community. This sharing function allows members of community organizations to access information smoothly, without being restricted by specific languages ​​or locations.

[0315] As a concrete example, in a certain area, there are regular cleaning activities, and audio guides and schedules are provided through an app. Residents can access the system through this app and easily check activity information. Furthermore, it is possible to utilize the information in the form of prompts generated by a generative AI model, such as, "An audio guide is needed for the local cleaning activity. Please use the app to gather participants, record the activity, translate and share it." In this way, this invention can dramatically improve the operational efficiency of community organizations.

[0316] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0317] Step 1:

[0318] The server collects audio data in real time. Users record audio from meetings and events using their devices and send it to the server. Based on this input, the data acquisition system records the audio data in digital format. This data is temporarily stored for use in subsequent processing.

[0319] Step 2:

[0320] The server converts audio data into text data. Using data conversion means, it applies a speech recognition algorithm to convert the audio data into text data. This conversion makes the audio information available in a format that users can review and edit.

[0321] Step 3:

[0322] The server translates textual materials into multiple languages. The resulting text data is then translated into the specified target language using language conversion tools. A generative AI model assists in this process, streamlining the translation. The translation results make the information available to users in different languages.

[0323] Step 4:

[0324] The terminal displays and shares translated information with the user. The translation results are displayed on the user's screen using an information presentation tool. Furthermore, the translated data is shared with other members of the local organization using an information sharing tool. This process removes language barriers and facilitates the smooth flow of information.

[0325] Step 5:

[0326] The server adjusts the schedule for local events. It analyzes the dates entered by users and uses information analysis tools to determine the optimal dates for participation. Participant schedule information is used as input, and the adjusted schedule is output as the analysis result.

[0327] Step 6:

[0328] The server monitors the movements of elderly individuals and detects any anomalies. A motion monitoring system receives GPS data as input and monitors their range of movement. It is configured to issue an alarm when an anomaly is detected. This operation ensures the safety of elderly individuals.

[0329] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0330] This invention is a system for streamlining the operation of local organizations and improving the user experience, and by incorporating an emotion engine, it enables the provision of services that respond to the emotions of users. This system comprises a storage means, a conversion means, a translation means, an analysis means, a monitoring means, a display means, and an emotion engine.

[0331] First, the server comprehensively manages information for the operation of local organizations. This information is stored in a database using storage devices and includes membership information, meeting minutes, schedules, and monitoring information for the elderly. User terminals provide an interface for accessing this database, allowing users to view information and enter new information as needed.

[0332] During meetings and events, users collect audio data via their devices, and the server converts the audio data into text using a conversion device. If translation is required, the server's translation device translates the text data into multiple languages ​​and provides it to the user. Furthermore, the server uses an analysis device to coordinate the schedule of local events, proposes the optimal date, and reflects it in a display device that allows the user to operate intuitively.

[0333] Furthermore, by using an emotion engine, the system can recognize emotions from the user's voice and text. For example, if tension or stress is detected during a meeting, the server generates a notification and sends a message to the user via a display device encouraging relaxation. In addition, in the elderly monitoring function, if an unusual emotional state is detected, an alert is quickly issued and relevant parties are notified.

[0334] As a concrete example, consider a local board meeting. Users collect opinions from the meeting using their devices and send them to a server. The server transcribes the audio into text and then translates it into the required language. An emotion engine analyzes the atmosphere of the meeting, and if it determines that participants are experiencing stress, it informs the resource manager and provides information to support better communication.

[0335] Thus, the system provided by the present invention significantly improves the operational efficiency of local organizations, enhances the user experience through emotion recognition, and helps maintain harmony throughout the community.

[0336] The following describes the processing flow.

[0337] Step 1:

[0338] The user launches the application on their device and presses the start meeting button. The device begins collecting audio data and sends it to the server in real time.

[0339] Step 2:

[0340] The server uses a conversion mechanism to convert the received audio data into text data. This text data is then stored in a database as meeting minutes.

[0341] Step 3:

[0342] The server uses an emotion engine to analyze the user's emotions from the transmitted voice and text data. Based on the analysis, if specific emotions such as tension or stress are detected, it generates support messages in real time.

[0343] Step 4:

[0344] When a user requests a translation, they select the desired language on their device and send the request to the server. The server uses a translation tool to translate the text data into the specified language and saves the result to the database.

[0345] Step 5:

[0346] The server uses analytical tools to coordinate the schedule of local events. Users input multiple candidate dates from their terminals, and the server calculates the optimal date based on these inputs and records it in the database.

[0347] Step 6:

[0348] Through the terminal, users can view meeting minutes and translation results provided by the server. Furthermore, an interface incorporating sentiment analysis results allows users to receive feedback tailored to their own stress levels.

[0349] Step 7:

[0350] In monitoring elderly individuals, the server uses monitoring tools to continuously analyze the user's behavior and emotional state. If abnormal emotions are detected, an alert is promptly sent to the relevant administrator's terminal, prompting them to take appropriate action.

[0351] (Example 2)

[0352] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0353] Managing diverse information and providing multilingual support are essential for community organizations. Furthermore, it's crucial to accurately understand participants' emotional states and facilitate smooth communication. However, comprehensive systems for efficiently achieving these goals are limited, making operational efficiency and participant satisfaction challenging.

[0354] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0355] In this invention, the server includes a storage means for storing operational information of a regional organization, a conversion means for converting audio information into text information, and a language conversion means for translating the text information into other languages. This enables the management of diverse information.

[0356] A "regional organization" is an organization or group that exists geographically or functionally within a specific area.

[0357] "Operational information" refers to all data and records related to the activities and management of an organization.

[0358] "Memory device" refers to a device or technology for storing information in digital format.

[0359] "Audio information" refers to recorded data that includes language and sounds.

[0360] "Textual information" refers to data expressed in text format.

[0361] "Conversion means" refers to a device or technology for converting information in one format into another format.

[0362] "Language conversion means" refers to a device or technology for translating information written in one language into another language.

[0363] "Analytical means" refers to devices or techniques for analyzing data and extracting specific conclusions or information.

[0364] "Surveillance means" refers to devices or technologies used to continuously observe individuals or situations.

[0365] "Emotion identification means" refers to a device or technology for detecting and identifying emotions from information.

[0366] "Notification generation means" refers to a device or technology for generating notifications based on specific information or conditions.

[0367] "Display means" refers to a device or technology for visually presenting information.

[0368] This system aims to streamline the operation of local organizations and improve communication by understanding participants' emotions. The system has a structure that combines multiple technological means.

[0369] The server plays a central role in information management and data processing, and operates on specific hardware. In particular, it uses high-capacity hard disk drives or solid-state drives as storage media for data. To convert audio information into text, the server employs speech recognition software. For multilingual support, a machine translation system is installed on the server to perform translation processing in real time. For emotion identification, a sentiment analysis engine utilizing a generative AI model is incorporated, allowing it to determine emotions from audio and text. This engine detects the user's stress level and level of comfort, generates notifications as needed, and communicates them to the user via the terminal.

[0370] User terminals can be smartphones, tablets, or personal computers, which provide the user interface. The terminals communicate with the server via an internet connection, sending and receiving data in real time. Furthermore, the display methods on the terminals utilize HTML, CSS, and JavaScript to create an intuitive interface and enhance usability.

[0371] As a concrete example, consider a scenario in a local board meeting where a user transmits the meeting audio to a server using their device. This audio is transcribed by the server and then translated into multiple languages. An emotion recognition engine analyzes the stress detected during the meeting, and the server provides feedback to the user through a notification generation mechanism. This feedback includes content designed to promote relaxation.

[0372] An example of a prompt message might be, "Detect the stress levels of all participants during the meeting and suggest appropriate actions as needed." This would not only improve the operational efficiency of local organizations but also support harmony within the community as a whole.

[0373] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0374] Step 1:

[0375] Users collect audio data from meetings and events using devices such as smartphones and tablets. They record the audio clearly using the device's microphone and send the data to a server. The input is audio data, and the output is data transmission to the server. This process digitizes the content of meetings.

[0376] Step 2:

[0377] The server converts the received audio data into text data using a conversion mechanism. This process utilizes speech recognition software to convert the audio data into text format. The input is audio data, and the output is text data. As a result, the audio is saved as text and used for subsequent processing.

[0378] Step 3:

[0379] If necessary, the server translates the converted text data into multiple languages ​​using translation tools. A machine translation system is used to convert the text data into the specified languages. The input is text data, and the output is text data in multiple languages. This allows information to be shared with users who speak different languages.

[0380] Step 4:

[0381] The server identifies emotions from text data using emotion recognition methods based on a generative AI model. The input is translated or untranslated text data, and the output is the emotion analysis result. The server analyzes emotions from the context of the text and detects tension and stress.

[0382] Step 5:

[0383] The server generates relaxation-promoting notifications based on identified emotions and presents them to the user using a notification generation mechanism. The input is the emotion analysis result, and the output is the notification message. This allows the user to receive feedback on their own and other participants' emotional states.

[0384] Step 6:

[0385] The user terminal delivers notifications from the server to the user using a display mechanism. The notification content is visually presented on the display screen. The input is the notification message, and the output is the user's confirmation of the display. Through this operation, the user receives specific suggestions and warnings.

[0386] (Application Example 2)

[0387] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0388] In the environments where elderly people live, there is a need to monitor not only their physical condition but also their emotional state in real time, and to take swift and appropriate action when abnormalities are detected. Furthermore, there is a challenge in the lack of information that allows caregivers to efficiently provide care to the elderly and to take approaches tailored to each individual's situation.

[0389] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0390] In this invention, the server includes a storage means for storing operational information of local organizations, a conversion means for converting audio data into text data, a translation means for translating into multiple languages, an analysis means for adjusting the schedule of local events, a monitoring means for monitoring the movements of elderly people, an emotion analysis means for analyzing the emotions of users and notifying relevant parties if an abnormality is detected, and a notification generation means for generating appropriate notifications in the care environment and prompting relevant parties. This makes it possible to continuously monitor the physical and emotional condition of elderly people in care facilities and respond quickly.

[0391] A "memory device" refers to a device or system that stores operational information and data of a local organization and saves it in a format that can be used as needed.

[0392] A "conversion means" is a device or system that performs the process of converting data acquired as audio into text information.

[0393] "Translation means" refers to a device or system that has the function of converting the translated text into another language.

[0394] "Analysis means" refers to a device or system that examines and analyzes information to create an optimal schedule or plan.

[0395] "Monitoring measures" refer to devices or systems that observe the behavior and circumstances of elderly people and detect any abnormalities.

[0396] "Display means" refers to a device or system for visually presenting obtained information to the user.

[0397] An "emotional analysis tool" is a device or system that analyzes a user's emotions from their voice and behavior and responds accordingly.

[0398] A "notification generation means" is a device or system that has the function of constructing messages for sending necessary information or warnings to users or caregivers.

[0399] The system of this invention consists of a server and a user terminal. The server has a storage means for centrally managing the operational information of local organizations, which includes membership information, meeting minutes, schedules, and information on the status of elderly members. The user terminal provides the user with an interface for accessing this information, enabling them to check and input new information.

[0400] The server includes a conversion mechanism that uses speech recognition APIs (e.g., Google Cloud Speech-to-Text) to convert audio data into text data. If the text data needs to be translated into multiple languages, this is done using translation tools such as the Google Cloud Translation API. For analysis, it uses algorithms to optimize scheduling of meetings and local events, proposing the most suitable schedule.

[0401] In an example implemented in a nursing care facility, the server analyzes the voice and behavior of elderly residents using an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to understand their emotional state in real time. If an anomaly is detected through the monitoring system, an alert is quickly sent to the relevant parties.

[0402] Furthermore, the notification generation system constructs messages to inform users and caregivers of emotional states or abnormalities, and provides these messages to caregivers via smartphone or tablet displays. Here, a generation AI model uses prompts to create appropriate messages. For example, if a user shows signs of anxiety, a message such as, "The user appears to be emotionally unstable. Please speak to them gently to reassure them," might be displayed to the caregiver.

[0403] An example of a prompt message for the generating AI model would be: "Analyze the user's voice data and determine their emotional state. If stress or anxiety is detected, generate an appropriate message to notify the caregiver."

[0404] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0405] Step 1:

[0406] The user collects voice recordings of elderly individuals using a device. The voice data is then sent to a server. The input is the user's voice data, and the output is the voice data that has been transferred to the server.

[0407] Step 2:

[0408] The server receives audio data and converts it to text data using a speech recognition API. The input is the received audio data, and the output is the transcribed audio data. The audio is analyzed using a speech recognition API (e.g., Google Cloud Speech-to-Text) and the corresponding text is generated.

[0409] Step 3:

[0410] The server uses an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to analyze the user's emotional state from text data. The input is transcribed audio data, and the output is the analysis result indicating the user's emotional state. The text is analyzed to determine emotions such as positive or negative.

[0411] Step 4:

[0412] The server, as needed, uses the analysis results to create an appropriate notification message using a generative AI model. The input is the analysis result of the emotional state, and the output is the message to be provided to the caregiver. Prompt sentences are input to the generative AI model to construct an appropriate message based on the emotional state.

[0413] Step 5:

[0414] The server uses a notification generation mechanism to send a notification message to the caregiver's terminal. The input is the generated notification message, and the output is the message displayed on the caregiver's terminal. The message is sent to the terminal via the network to inform the caregiver of the situation.

[0415] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0416] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0417] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0418] [Third Embodiment]

[0419] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0420] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0421] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0422] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0423] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0424] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0425] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0426] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0427] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0428] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0429] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0430] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0431] This invention is a system designed to support the operation of local organizations, aiming to improve operational efficiency and revitalize communities by integrating and providing multiple functions. The system comprises multiple means, each playing a specific role.

[0432] First, the server comprehensively manages information related to the operation of the local organization. It has a database as a storage method, which stores member information, meeting records, schedules of local events, and information on monitoring elderly residents. User terminals provide an interface to access this database, allowing users to retrieve necessary information and input new information.

[0433] Audio data from meetings is collected using terminals and transmitted to a server in real time. The server uses a conversion device to convert the audio data into text data and records it. The generated meeting minutes can then be reviewed and edited by the user.

[0434] Furthermore, users can request translations of meeting minutes and other related documents into other languages. The server uses translation tools and generative AI to perform multilingual translations. The translation results are then displayed to the user in a usable format.

[0435] For coordinating local events, users input their preferred dates on a terminal, and the server uses analysis tools to adjust participants' schedules and determine the optimal schedule. This facilitates smooth communication among participants and enables efficient scheduling.

[0436] Furthermore, as part of the elderly monitoring service, the server monitors the elderly person's activity status using monitoring devices. Here, GPS information and activity data are utilized, and if an abnormality is detected, an alert is sent to the user's terminal.

[0437] As a concrete example, consider an annual general meeting of a local community organization. In this case, the user starts the meeting, and the device collects the audio. The server converts it to text and saves it as meeting minutes. If there are members speaking different languages, the server performs the necessary translations. After the meeting, the user reviews the translated minutes and shares them with other participants from their device. This collaboration enables all stakeholders to make quick decisions based on accurate information.

[0438] Thus, the system provided by the present invention efficiently and effectively contributes to the diverse operational needs of local organizations, supporting the revitalization and sustainable development of local communities.

[0439] The following describes the processing flow.

[0440] Step 1:

[0441] The user launches the application on their device and presses the "Start Meeting" button. The device begins collecting audio data and sends it to the server in real time.

[0442] Step 2:

[0443] The server converts the received audio data into text data using a speech recognition API. This text data is then stored in a database as meeting minutes.

[0444] Step 3:

[0445] The user operates the terminal to review the generated meeting minutes. If necessary, the user can modify the contents of the meeting minutes.

[0446] Step 4:

[0447] When a user requests multilingual translation, they press the translate button on their device and specify the required languages. The device then sends this request to the server.

[0448] Step 5:

[0449] The server uses AI-generated text to translate the meeting minutes into the specified language. The translated text is stored in a database, and a notification is sent to the user when the translation is complete.

[0450] Step 6:

[0451] Users can view translated meeting minutes on their devices. They can also use their devices to download translated meeting minutes as needed, or share them with other members.

[0452] Step 7:

[0453] When a user is coordinating the schedule for a local event, they enter potential dates on their device and send them to the server. The server then aggregates the participants' suggested dates and analyzes to determine the optimal schedule.

[0454] Step 8:

[0455] The server analyzes the optimal schedule and records it in a database, then notifies all participants of the decided schedule. The terminal then displays the schedule in a format that is easy for the user to understand.

[0456] Step 9:

[0457] In the elderly monitoring service, the server periodically monitors the activity data of the elderly person. If an anomaly is detected, the server immediately sends an alert to the registered contact. The user receives the alert on their device and can take the necessary action.

[0458] (Example 1)

[0459] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0460] The operation of local community organizations requires managing diverse information, facilitating communication among members who speak different languages, effectively coordinating event schedules, and ensuring the safe monitoring of the elderly. However, managing these tasks individually using conventional methods is inefficient and consumes a great deal of effort and time. This invention aims to efficiently meet these multiple needs with a single system and support the operation of local communities.

[0461] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0462] In this invention, the server includes a storage means for storing data related to the operation of a local organization, a conversion means for converting audio data into text information, and a translation means for translating into multiple languages. This enables centralized management of various types of information of the local organization, promotion of real-time communication, and support for understanding among participants through multilingual support.

[0463] A "memory device" is a device used to centrally store all data related to the operation of a local organization, for later reference or updating.

[0464] "Conversion method" refers to the process or technology used to convert audio data collected at meetings and other events into analyzable text information.

[0465] "Translation methods" refer to technologies that use generative AI models to convert text information into multiple languages, enabling mutual understanding between languages.

[0466] "Analysis means" refers to a device or program that has the function of calculating the optimal event schedule based on input date information or existing schedule data.

[0467] "Monitoring means" refers to devices or systems for observing the activity status of elderly people in real time and detecting abnormalities necessary to ensure their safety.

[0468] "Display means" refers to devices or methods that present information to users visually or audibly and enable interaction.

[0469] "Editing methods" refer to the processes and functions used to allow users to easily modify, supplement, and save generated meeting minutes and translations.

[0470] A "coordination tool" is a system that takes into account various conditions among participants and efficiently adjusts the schedule of local events.

[0471] An "alarm system" is a mechanism for issuing a warning to the user when an abnormality is detected by the monitoring system.

[0472] Modes for carrying out the invention

[0473] This invention aims to build a system to support the operation of local organizations and to achieve efficient and effective operation. This system integrates multiple digital means to manage, analyze, and present the information necessary for operation.

[0474] The server stores data related to the operation of the local organization in an SQL database. This database contains member information, meeting minutes, schedules for local events, and activity data for senior citizens. The server uses the Google Cloud Speech-to-Text API to convert audio data sent from devices during meetings into text data. This text data is recorded as meeting minutes and can be reviewed and modified by users later.

[0475] Furthermore, the server can use OpenAI's generative AI model to perform multilingual translation of text data. This translation function facilitates smoother communication among members who speak different languages.

[0476] The user terminal functions as an access interface to the server, allowing users to easily input and retrieve data. Users enter the desired event dates into the terminal, and the system proposes the optimal schedule. This enables efficient event management that avoids scheduling conflicts among participants.

[0477] In elderly monitoring services, a server monitors activity levels through various sensors and GPS devices, and sends an alert to the terminal if an anomaly is detected. Users can take quick action based on these alerts, ensuring the safety of the elderly.

[0478] A concrete example is the annual general meeting of a local community organization. At the start of the meeting, the user collects audio via their device, and the server instantly transcribes the audio into text. Furthermore, if there are members speaking different languages, the server performs translation and provides real-time support during the meeting. This collaboration allows all participants to make decisions based on accurate information.

[0479] Example of a prompt:

[0480] "This system should convert audio data collected during community group meetings into text and generate meeting minutes that are easy for participants to understand. It should also translate the minutes if there are members with multiple languages. The results should be displayed for easy review."

[0481] In this way, this system integrates a variety of functions necessary for the operation of local organizations, contributing to community revitalization and smooth management.

[0482] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0483] Step 1:

[0484] The server uses an SQL database to store operational information for local organizations. Users can input member information and event schedules into the database via their terminals. This input data is stored directly in digital format and can be searched and updated as needed. The stored data serves as foundational information used in subsequent processing steps.

[0485] Step 2:

[0486] The user starts a meeting and collects audio data in real time on their device. The collected audio data is sent from the device to the server. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input to this conversion process is an audio file, and the output is transcribed text. The generated text is recorded in a database as meeting minutes.

[0487] Step 3:

[0488] Users can view meeting minutes in the database and request translations as needed. The server utilizes OpenAI's generative AI model to translate text data into multiple languages. The text received as input is translated into the selected language, and the result is provided to the user. This prompt text is passed to the generative AI to improve translation accuracy.

[0489] Step 4:

[0490] Users enter their desired dates for local events from their devices. This input is analyzed by the server along with existing event information. An optimal schedule is then formulated using a coordination mechanism. When determining the schedule, adjustments are made based on the entered desired dates and existing event schedule data to check for any conflicts. The determined schedule is then notified to the user.

[0491] Step 5:

[0492] The server monitors the activity levels of elderly individuals and collects data in real time. Specifically, it analyzes input from GPS devices and various sensors, and issues an alarm if an anomaly is detected. This immediately sends an alert to the user's terminal, ensuring the safety of the elderly. In this process, the monitoring system identifies deviations from normal behavior patterns based on various activity data input.

[0493] (Application Example 1)

[0494] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0495] In the operation of local community organizations, challenges include inefficient information sharing, communication difficulties due to language barriers, and ensuring the safety of the elderly. Furthermore, there is a need to collect and share information in real time and promote cooperation among residents.

[0496] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0497] In this invention, the server includes data acquisition means for collecting audio materials in real time, information analysis means for coordinating the schedule of community activities, and dynamic monitoring means for monitoring the behavior of elderly individuals. This enables real-time information collection and sharing, facilitating efficient information exchange among community residents and ensuring the safety of the elderly.

[0498] A "community group" is a collection of residents or organizations that share a common purpose and engage in activities within a specific region.

[0499] "Business information" refers to all information, including records and data related to the operation of local organizations.

[0500] "Data management means" refers to a device or method for systematically storing, organizing, and retrieving information.

[0501] "Audio materials" refer to audio data collected at meetings, events, and other similar occasions.

[0502] "Textual data" refers to data obtained by converting audio data into text.

[0503] "Data conversion means" refers to a device or method for converting audio material into text material.

[0504] "Language conversion means" refers to a device or method for translating written material into a different language.

[0505] "Information analysis means" refers to a device or method that analyzes collected data and generates information to support decision-making.

[0506] An "elderly individual" refers to an individual belonging to the age group considered to be elderly.

[0507] "Dynamic monitoring means" refers to a device or method for monitoring an individual's actions and location and detecting abnormalities.

[0508] "Information presentation means" refers to a device or method for displaying analyzed information in an easily understandable manner to the user.

[0509] "Data acquisition means" refers to a device or method for collecting information such as audio or text.

[0510] "Information sharing means" refers to a device or method for providing collected and analyzed information to multiple users.

[0511] The system implementing this invention is designed to support the operation of local organizations. Specifically, a server collects audio materials using real-time data acquisition means. The audio materials are converted into text materials by data conversion means. This text material is translated into multiple languages ​​using language conversion means. The translated information is presented to the user by information presentation means.

[0512] Furthermore, the server is equipped with information analysis tools to analyze and adjust the schedule of community activities. In addition, for the safety management of the elderly, a dynamic monitoring system monitors the actions of individual elderly people and issues an alarm if an abnormality is detected. This enables both the safety of the elderly and the efficiency of community activities.

[0513] User terminals are configured to receive information provided by the server anytime, anywhere. The terminals utilize information sharing tools to share translated text materials and activity schedules with all residents of the community. This sharing function allows members of community organizations to access information smoothly, without being restricted by specific languages ​​or locations.

[0514] As a concrete example, in a certain area, there are regular cleaning activities, and audio guides and schedules are provided through an app. Residents can access the system through this app and easily check activity information. Furthermore, it is possible to utilize the information in the form of prompts generated by a generative AI model, such as, "An audio guide is needed for the local cleaning activity. Please use the app to gather participants, record the activity, translate and share it." In this way, this invention can dramatically improve the operational efficiency of community organizations.

[0515] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0516] Step 1:

[0517] The server collects audio data in real time. Users record audio from meetings and events using their devices and send it to the server. Based on this input, the data acquisition system records the audio data in digital format. This data is temporarily stored for use in subsequent processing.

[0518] Step 2:

[0519] The server converts audio data into text data. Using data conversion means, it applies a speech recognition algorithm to convert the audio data into text data. This conversion makes the audio information available in a format that users can review and edit.

[0520] Step 3:

[0521] The server translates textual materials into multiple languages. The resulting text data is then translated into the specified target language using language conversion tools. A generative AI model assists in this process, streamlining the translation. The translation results make the information available to users in different languages.

[0522] Step 4:

[0523] The terminal displays and shares translated information with the user. The translation results are displayed on the user's screen using an information presentation tool. Furthermore, the translated data is shared with other members of the local organization using an information sharing tool. This process removes language barriers and facilitates the smooth flow of information.

[0524] Step 5:

[0525] The server adjusts the schedule for local events. It analyzes the dates entered by users and uses information analysis tools to determine the optimal dates for participation. Participant schedule information is used as input, and the adjusted schedule is output as the analysis result.

[0526] Step 6:

[0527] The server monitors the movements of elderly individuals and detects any anomalies. A motion monitoring system receives GPS data as input and monitors their range of movement. It is configured to issue an alarm when an anomaly is detected. This operation ensures the safety of elderly individuals.

[0528] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0529] This invention is a system for streamlining the operation of local organizations and improving the user experience, and by incorporating an emotion engine, it enables the provision of services that respond to the emotions of users. This system comprises a storage means, a conversion means, a translation means, an analysis means, a monitoring means, a display means, and an emotion engine.

[0530] First, the server comprehensively manages information for the operation of local organizations. This information is stored in a database using storage devices and includes membership information, meeting minutes, schedules, and monitoring information for the elderly. User terminals provide an interface for accessing this database, allowing users to view information and enter new information as needed.

[0531] During meetings and events, users collect audio data via their devices, and the server converts the audio data into text using a conversion device. If translation is required, the server's translation device translates the text data into multiple languages ​​and provides it to the user. Furthermore, the server uses an analysis device to coordinate the schedule of local events, proposes the optimal date, and reflects it in a display device that allows the user to operate intuitively.

[0532] Furthermore, by using an emotion engine, the system can recognize emotions from the user's voice and text. For example, if tension or stress is detected during a meeting, the server generates a notification and sends a message to the user via a display device encouraging relaxation. In addition, in the elderly monitoring function, if an unusual emotional state is detected, an alert is quickly issued and relevant parties are notified.

[0533] As a concrete example, consider a local board meeting. Users collect opinions from the meeting using their devices and send them to a server. The server transcribes the audio into text and then translates it into the required language. An emotion engine analyzes the atmosphere of the meeting, and if it determines that participants are experiencing stress, it informs the resource manager and provides information to support better communication.

[0534] Thus, the system provided by the present invention significantly improves the operational efficiency of local organizations, enhances the user experience through emotion recognition, and helps maintain harmony throughout the community.

[0535] The following describes the processing flow.

[0536] Step 1:

[0537] The user launches the application on their device and presses the start meeting button. The device begins collecting audio data and sends it to the server in real time.

[0538] Step 2:

[0539] The server uses a conversion mechanism to convert the received audio data into text data. This text data is then stored in a database as meeting minutes.

[0540] Step 3:

[0541] The server uses an emotion engine to analyze the user's emotions from the transmitted voice and text data. Based on the analysis, if specific emotions such as tension or stress are detected, it generates support messages in real time.

[0542] Step 4:

[0543] When a user requests a translation, they select the desired language on their device and send the request to the server. The server uses a translation tool to translate the text data into the specified language and saves the result to the database.

[0544] Step 5:

[0545] The server uses analytical tools to coordinate the schedule of local events. Users input multiple candidate dates from their terminals, and the server calculates the optimal date based on these inputs and records it in the database.

[0546] Step 6:

[0547] Through the terminal, users can view meeting minutes and translation results provided by the server. Furthermore, an interface incorporating sentiment analysis results allows users to receive feedback tailored to their own stress levels.

[0548] Step 7:

[0549] In monitoring elderly individuals, the server uses monitoring tools to continuously analyze the user's behavior and emotional state. If abnormal emotions are detected, an alert is promptly sent to the relevant administrator's terminal, prompting them to take appropriate action.

[0550] (Example 2)

[0551] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0552] Managing diverse information and providing multilingual support are essential for community organizations. Furthermore, it's crucial to accurately understand participants' emotional states and facilitate smooth communication. However, comprehensive systems for efficiently achieving these goals are limited, making operational efficiency and participant satisfaction challenging.

[0553] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0554] In this invention, the server includes a storage means for storing operational information of a regional organization, a conversion means for converting audio information into text information, and a language conversion means for translating the text information into other languages. This enables the management of diverse information.

[0555] A "regional organization" is an organization or group that exists geographically or functionally within a specific area.

[0556] "Operational information" refers to all data and records related to the activities and management of an organization.

[0557] "Memory device" refers to a device or technology for storing information in digital format.

[0558] "Audio information" refers to recorded data that includes language and sounds.

[0559] "Textual information" refers to data expressed in text format.

[0560] "Conversion means" refers to a device or technology for converting information in one format into another format.

[0561] "Language conversion means" refers to a device or technology for translating information written in one language into another language.

[0562] "Analytical means" refers to devices or techniques for analyzing data and extracting specific conclusions or information.

[0563] "Surveillance means" refers to devices or technologies used to continuously observe individuals or situations.

[0564] "Emotion identification means" refers to a device or technology for detecting and identifying emotions from information.

[0565] "Notification generation means" refers to a device or technology for generating notifications based on specific information or conditions.

[0566] "Display means" refers to a device or technology for visually presenting information.

[0567] This system aims to streamline the operation of local organizations and improve communication by understanding participants' emotions. The system has a structure that combines multiple technological means.

[0568] The server plays a central role in information management and data processing, and operates on specific hardware. In particular, it uses high-capacity hard disk drives or solid-state drives as storage media for data. To convert audio information into text, the server employs speech recognition software. For multilingual support, a machine translation system is installed on the server to perform translation processing in real time. For emotion identification, a sentiment analysis engine utilizing a generative AI model is incorporated, allowing it to determine emotions from audio and text. This engine detects the user's stress level and level of comfort, generates notifications as needed, and communicates them to the user via the terminal.

[0569] User terminals can be smartphones, tablets, or personal computers, which provide the user interface. The terminals communicate with the server via an internet connection, sending and receiving data in real time. Furthermore, the display methods on the terminals utilize HTML, CSS, and JavaScript to create an intuitive interface and enhance usability.

[0570] As a concrete example, consider a scenario in a local board meeting where a user transmits the meeting audio to a server using their device. This audio is transcribed by the server and then translated into multiple languages. An emotion recognition engine analyzes the stress detected during the meeting, and the server provides feedback to the user through a notification generation mechanism. This feedback includes content designed to promote relaxation.

[0571] An example of a prompt message might be, "Detect the stress levels of all participants during the meeting and suggest appropriate actions as needed." This would not only improve the operational efficiency of local organizations but also support harmony within the community as a whole.

[0572] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0573] Step 1:

[0574] Users collect audio data from meetings and events using devices such as smartphones and tablets. They record the audio clearly using the device's microphone and send the data to a server. The input is audio data, and the output is data transmission to the server. This process digitizes the content of meetings.

[0575] Step 2:

[0576] The server converts the received audio data into text data using a conversion mechanism. This process utilizes speech recognition software to convert the audio data into text format. The input is audio data, and the output is text data. As a result, the audio is saved as text and used for subsequent processing.

[0577] Step 3:

[0578] If necessary, the server translates the converted text data into multiple languages ​​using translation tools. A machine translation system is used to convert the text data into the specified languages. The input is text data, and the output is text data in multiple languages. This allows information to be shared with users who speak different languages.

[0579] Step 4:

[0580] The server identifies emotions from text data using emotion recognition methods based on a generative AI model. The input is translated or untranslated text data, and the output is the emotion analysis result. The server analyzes emotions from the context of the text and detects tension and stress.

[0581] Step 5:

[0582] The server generates relaxation-promoting notifications based on identified emotions and presents them to the user using a notification generation mechanism. The input is the emotion analysis result, and the output is the notification message. This allows the user to receive feedback on their own and other participants' emotional states.

[0583] Step 6:

[0584] The user terminal delivers notifications from the server to the user using a display mechanism. The notification content is visually presented on the display screen. The input is the notification message, and the output is the user's confirmation of the display. Through this operation, the user receives specific suggestions and warnings.

[0585] (Application Example 2)

[0586] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0587] In the environments where elderly people live, there is a need to monitor not only their physical condition but also their emotional state in real time, and to take swift and appropriate action when abnormalities are detected. Furthermore, there is a challenge in the lack of information that allows caregivers to efficiently provide care to the elderly and to take approaches tailored to each individual's situation.

[0588] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0589] In this invention, the server includes a storage means for storing operational information of local organizations, a conversion means for converting audio data into text data, a translation means for translating into multiple languages, an analysis means for adjusting the schedule of local events, a monitoring means for monitoring the movements of elderly people, an emotion analysis means for analyzing the emotions of users and notifying relevant parties if an abnormality is detected, and a notification generation means for generating appropriate notifications in the care environment and prompting relevant parties. This makes it possible to continuously monitor the physical and emotional condition of elderly people in care facilities and respond quickly.

[0590] A "memory device" refers to a device or system that stores operational information and data of a local organization and saves it in a format that can be used as needed.

[0591] A "conversion means" is a device or system that performs the process of converting data acquired as audio into text information.

[0592] "Translation means" refers to a device or system that has the function of converting the translated text into another language.

[0593] "Analysis means" refers to a device or system that examines and analyzes information to create an optimal schedule or plan.

[0594] "Monitoring measures" refer to devices or systems that observe the behavior and circumstances of elderly people and detect any abnormalities.

[0595] "Display means" refers to a device or system for visually presenting obtained information to the user.

[0596] An "emotional analysis tool" is a device or system that analyzes a user's emotions from their voice and behavior and responds accordingly.

[0597] A "notification generation means" is a device or system that has the function of constructing messages for sending necessary information or warnings to users or caregivers.

[0598] The system of this invention consists of a server and a user terminal. The server has a storage means for centrally managing the operational information of local organizations, which includes membership information, meeting minutes, schedules, and information on the status of elderly members. The user terminal provides the user with an interface for accessing this information, enabling them to check and input new information.

[0599] The server includes a conversion mechanism that uses speech recognition APIs (e.g., Google Cloud Speech-to-Text) to convert audio data into text data. If the text data needs to be translated into multiple languages, this is done using translation tools such as the Google Cloud Translation API. For analysis, it uses algorithms to optimize scheduling of meetings and local events, proposing the most suitable schedule.

[0600] In an example implemented in a nursing care facility, the server analyzes the voice and behavior of elderly residents using an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to understand their emotional state in real time. If an anomaly is detected through the monitoring system, an alert is quickly sent to the relevant parties.

[0601] Furthermore, the notification generation system constructs messages to inform users and caregivers of emotional states or abnormalities, and provides these messages to caregivers via smartphone or tablet displays. Here, a generation AI model uses prompts to create appropriate messages. For example, if a user shows signs of anxiety, a message such as, "The user appears to be emotionally unstable. Please speak to them gently to reassure them," might be displayed to the caregiver.

[0602] An example of a prompt message for the generating AI model would be: "Analyze the user's voice data and determine their emotional state. If stress or anxiety is detected, generate an appropriate message to notify the caregiver."

[0603] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0604] Step 1:

[0605] The user collects voice recordings of elderly individuals using a device. The voice data is then sent to a server. The input is the user's voice data, and the output is the voice data that has been transferred to the server.

[0606] Step 2:

[0607] The server receives audio data and converts it to text data using a speech recognition API. The input is the received audio data, and the output is the transcribed audio data. The audio is analyzed using a speech recognition API (e.g., Google Cloud Speech-to-Text) and the corresponding text is generated.

[0608] Step 3:

[0609] The server uses an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to analyze the user's emotional state from text data. The input is transcribed audio data, and the output is the analysis result indicating the user's emotional state. The text is analyzed to determine emotions such as positive or negative.

[0610] Step 4:

[0611] The server, as needed, uses the analysis results to create an appropriate notification message using a generative AI model. The input is the analysis result of the emotional state, and the output is the message to be provided to the caregiver. Prompt sentences are input to the generative AI model to construct an appropriate message based on the emotional state.

[0612] Step 5:

[0613] The server uses a notification generation mechanism to send a notification message to the caregiver's terminal. The input is the generated notification message, and the output is the message displayed on the caregiver's terminal. The message is sent to the terminal via the network to inform the caregiver of the situation.

[0614] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0615] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0616] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0617] [Fourth Embodiment]

[0618] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0619] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0620] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0621] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0622] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0623] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0624] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0625] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0626] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0627] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0628] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0629] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0630] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0631] This invention is a system designed to support the operation of local organizations, aiming to improve operational efficiency and revitalize communities by integrating and providing multiple functions. The system comprises multiple means, each playing a specific role.

[0632] First, the server comprehensively manages information related to the operation of the local organization. It has a database as a storage method, which stores member information, meeting records, schedules of local events, and information on monitoring elderly residents. User terminals provide an interface to access this database, allowing users to retrieve necessary information and input new information.

[0633] Audio data from meetings is collected using terminals and transmitted to a server in real time. The server uses a conversion device to convert the audio data into text data and records it. The generated meeting minutes can then be reviewed and edited by the user.

[0634] Furthermore, users can request translations of meeting minutes and other related documents into other languages. The server uses translation tools and generative AI to perform multilingual translations. The translation results are then displayed to the user in a usable format.

[0635] For coordinating local events, users input their preferred dates on a terminal, and the server uses analysis tools to adjust participants' schedules and determine the optimal schedule. This facilitates smooth communication among participants and enables efficient scheduling.

[0636] Furthermore, as part of the elderly monitoring service, the server monitors the elderly person's activity status using monitoring devices. Here, GPS information and activity data are utilized, and if an abnormality is detected, an alert is sent to the user's terminal.

[0637] As a concrete example, consider an annual general meeting of a local community organization. In this case, the user starts the meeting, and the device collects the audio. The server converts it to text and saves it as meeting minutes. If there are members speaking different languages, the server performs the necessary translations. After the meeting, the user reviews the translated minutes and shares them with other participants from their device. This collaboration enables all stakeholders to make quick decisions based on accurate information.

[0638] Thus, the system provided by the present invention efficiently and effectively contributes to the diverse operational needs of local organizations, supporting the revitalization and sustainable development of local communities.

[0639] The following describes the processing flow.

[0640] Step 1:

[0641] The user launches the application on their device and presses the "Start Meeting" button. The device begins collecting audio data and sends it to the server in real time.

[0642] Step 2:

[0643] The server converts the received audio data into text data using a speech recognition API. This text data is then stored in a database as meeting minutes.

[0644] Step 3:

[0645] The user operates the terminal to review the generated meeting minutes. If necessary, the user can modify the contents of the meeting minutes.

[0646] Step 4:

[0647] When a user requests multilingual translation, they press the translate button on their device and specify the required languages. The device then sends this request to the server.

[0648] Step 5:

[0649] The server uses AI-generated text to translate the meeting minutes into the specified language. The translated text is stored in a database, and a notification is sent to the user when the translation is complete.

[0650] Step 6:

[0651] Users can view translated meeting minutes on their devices. They can also use their devices to download translated meeting minutes as needed, or share them with other members.

[0652] Step 7:

[0653] When a user is coordinating the schedule for a local event, they enter potential dates on their device and send them to the server. The server then aggregates the participants' suggested dates and analyzes to determine the optimal schedule.

[0654] Step 8:

[0655] The server analyzes the optimal schedule and records it in a database, then notifies all participants of the decided schedule. The terminal then displays the schedule in a format that is easy for the user to understand.

[0656] Step 9:

[0657] In the elderly monitoring service, the server periodically monitors the activity data of the elderly person. If an anomaly is detected, the server immediately sends an alert to the registered contact. The user receives the alert on their device and can take the necessary action.

[0658] (Example 1)

[0659] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0660] The operation of local community organizations requires managing diverse information, facilitating communication among members who speak different languages, effectively coordinating event schedules, and ensuring the safe monitoring of the elderly. However, managing these tasks individually using conventional methods is inefficient and consumes a great deal of effort and time. This invention aims to efficiently meet these multiple needs with a single system and support the operation of local communities.

[0661] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0662] In this invention, the server includes a storage means for storing data related to the operation of a local organization, a conversion means for converting audio data into text information, and a translation means for translating into multiple languages. This enables centralized management of various types of information of the local organization, promotion of real-time communication, and support for understanding among participants through multilingual support.

[0663] A "memory device" is a device used to centrally store all data related to the operation of a local organization, for later reference or updating.

[0664] "Conversion method" refers to the process or technology used to convert audio data collected at meetings and other events into analyzable text information.

[0665] "Translation methods" refer to technologies that use generative AI models to convert text information into multiple languages, enabling mutual understanding between languages.

[0666] "Analysis means" refers to a device or program that has the function of calculating the optimal event schedule based on input date information or existing schedule data.

[0667] "Monitoring means" refers to devices or systems for observing the activity status of elderly people in real time and detecting abnormalities necessary to ensure their safety.

[0668] "Display means" refers to devices or methods that present information to users visually or audibly and enable interaction.

[0669] "Editing methods" refer to the processes and functions used to allow users to easily modify, supplement, and save generated meeting minutes and translations.

[0670] A "coordination tool" is a system that takes into account various conditions among participants and efficiently adjusts the schedule of local events.

[0671] An "alarm system" is a mechanism for issuing a warning to the user when an abnormality is detected by the monitoring system.

[0672] Modes for carrying out the invention

[0673] This invention aims to build a system to support the operation of local organizations and to achieve efficient and effective operation. This system integrates multiple digital means to manage, analyze, and present the information necessary for operation.

[0674] The server stores data related to the operation of the local organization in an SQL database. This database contains member information, meeting minutes, schedules for local events, and activity data for senior citizens. The server uses the Google Cloud Speech-to-Text API to convert audio data sent from devices during meetings into text data. This text data is recorded as meeting minutes and can be reviewed and modified by users later.

[0675] Furthermore, the server can use OpenAI's generative AI model to perform multilingual translation of text data. This translation function facilitates smoother communication among members who speak different languages.

[0676] The user terminal functions as an access interface to the server, allowing users to easily input and retrieve data. Users enter the desired event dates into the terminal, and the system proposes the optimal schedule. This enables efficient event management that avoids scheduling conflicts among participants.

[0677] In elderly monitoring services, a server monitors activity levels through various sensors and GPS devices, and sends an alert to the terminal if an anomaly is detected. Users can take quick action based on these alerts, ensuring the safety of the elderly.

[0678] A concrete example is the annual general meeting of a local community organization. At the start of the meeting, the user collects audio via their device, and the server instantly transcribes the audio into text. Furthermore, if there are members speaking different languages, the server performs translation and provides real-time support during the meeting. This collaboration allows all participants to make decisions based on accurate information.

[0679] Example of a prompt:

[0680] "This system should convert audio data collected during community group meetings into text and generate meeting minutes that are easy for participants to understand. It should also translate the minutes if there are members with multiple languages. The results should be displayed for easy review."

[0681] In this way, this system integrates a variety of functions necessary for the operation of local organizations, contributing to community revitalization and smooth management.

[0682] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0683] Step 1:

[0684] The server uses an SQL database to store operational information for local organizations. Users can input member information and event schedules into the database via their terminals. This input data is stored directly in digital format and can be searched and updated as needed. The stored data serves as foundational information used in subsequent processing steps.

[0685] Step 2:

[0686] The user starts a meeting and collects audio data in real time on their device. The collected audio data is sent from the device to the server. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input to this conversion process is an audio file, and the output is transcribed text. The generated text is recorded in a database as meeting minutes.

[0687] Step 3:

[0688] Users can view meeting minutes in the database and request translations as needed. The server utilizes OpenAI's generative AI model to translate text data into multiple languages. The text received as input is translated into the selected language, and the result is provided to the user. This prompt text is passed to the generative AI to improve translation accuracy.

[0689] Step 4:

[0690] Users enter their desired dates for local events from their devices. This input is analyzed by the server along with existing event information. An optimal schedule is then formulated using a coordination mechanism. When determining the schedule, adjustments are made based on the entered desired dates and existing event schedule data to check for any conflicts. The determined schedule is then notified to the user.

[0691] Step 5:

[0692] The server monitors the activity levels of elderly individuals and collects data in real time. Specifically, it analyzes input from GPS devices and various sensors, and issues an alarm if an anomaly is detected. This immediately sends an alert to the user's terminal, ensuring the safety of the elderly. In this process, the monitoring system identifies deviations from normal behavior patterns based on various activity data input.

[0693] (Application Example 1)

[0694] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0695] In the operation of local community organizations, challenges include inefficient information sharing, communication difficulties due to language barriers, and ensuring the safety of the elderly. Furthermore, there is a need to collect and share information in real time and promote cooperation among residents.

[0696] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0697] In this invention, the server includes data acquisition means for collecting audio materials in real time, information analysis means for coordinating the schedule of community activities, and dynamic monitoring means for monitoring the behavior of elderly individuals. This enables real-time information collection and sharing, facilitating efficient information exchange among community residents and ensuring the safety of the elderly.

[0698] A "community group" is a collection of residents or organizations that share a common purpose and engage in activities within a specific region.

[0699] "Business information" refers to all information, including records and data related to the operation of local organizations.

[0700] "Data management means" refers to a device or method for systematically storing, organizing, and retrieving information.

[0701] "Audio materials" refer to audio data collected at meetings, events, and other similar occasions.

[0702] "Textual data" refers to data obtained by converting audio data into text.

[0703] "Data conversion means" refers to a device or method for converting audio material into text material.

[0704] "Language conversion means" refers to a device or method for translating written material into a different language.

[0705] "Information analysis means" refers to a device or method that analyzes collected data and generates information to support decision-making.

[0706] An "elderly individual" refers to an individual belonging to the age group considered to be elderly.

[0707] "Dynamic monitoring means" refers to a device or method for monitoring an individual's actions and location and detecting abnormalities.

[0708] "Information presentation means" refers to a device or method for displaying analyzed information in an easily understandable manner to the user.

[0709] "Data acquisition means" refers to a device or method for collecting information such as audio or text.

[0710] "Information sharing means" refers to a device or method for providing collected and analyzed information to multiple users.

[0711] The system implementing this invention is designed to support the operation of local organizations. Specifically, a server collects audio materials using real-time data acquisition means. The audio materials are converted into text materials by data conversion means. This text material is translated into multiple languages ​​using language conversion means. The translated information is presented to the user by information presentation means.

[0712] Furthermore, the server is equipped with information analysis tools to analyze and adjust the schedule of community activities. In addition, for the safety management of the elderly, a dynamic monitoring system monitors the actions of individual elderly people and issues an alarm if an abnormality is detected. This enables both the safety of the elderly and the efficiency of community activities.

[0713] User terminals are configured to receive information provided by the server anytime, anywhere. The terminals utilize information sharing tools to share translated text materials and activity schedules with all residents of the community. This sharing function allows members of community organizations to access information smoothly, without being restricted by specific languages ​​or locations.

[0714] As a concrete example, in a certain area, there are regular cleaning activities, and audio guides and schedules are provided through an app. Residents can access the system through this app and easily check activity information. Furthermore, it is possible to utilize the information in the form of prompts generated by a generative AI model, such as, "An audio guide is needed for the local cleaning activity. Please use the app to gather participants, record the activity, translate and share it." In this way, this invention can dramatically improve the operational efficiency of community organizations.

[0715] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0716] Step 1:

[0717] The server collects audio data in real time. Users record audio from meetings and events using their devices and send it to the server. Based on this input, the data acquisition system records the audio data in digital format. This data is temporarily stored for use in subsequent processing.

[0718] Step 2:

[0719] The server converts audio data into text data. Using data conversion means, it applies a speech recognition algorithm to convert the audio data into text data. This conversion makes the audio information available in a format that users can review and edit.

[0720] Step 3:

[0721] The server translates textual materials into multiple languages. The resulting text data is then translated into the specified target language using language conversion tools. A generative AI model assists in this process, streamlining the translation. The translation results make the information available to users in different languages.

[0722] Step 4:

[0723] The terminal displays and shares translated information with the user. The translation results are displayed on the user's screen using an information presentation tool. Furthermore, the translated data is shared with other members of the local organization using an information sharing tool. This process removes language barriers and facilitates the smooth flow of information.

[0724] Step 5:

[0725] The server adjusts the schedule for local events. It analyzes the dates entered by users and uses information analysis tools to determine the optimal dates for participation. Participant schedule information is used as input, and the adjusted schedule is output as the analysis result.

[0726] Step 6:

[0727] The server monitors the movements of elderly individuals and detects any anomalies. A motion monitoring system receives GPS data as input and monitors their range of movement. It is configured to issue an alarm when an anomaly is detected. This operation ensures the safety of elderly individuals.

[0728] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0729] This invention is a system for streamlining the operation of local organizations and improving the user experience, and by incorporating an emotion engine, it enables the provision of services that respond to the emotions of users. This system comprises a storage means, a conversion means, a translation means, an analysis means, a monitoring means, a display means, and an emotion engine.

[0730] First, the server comprehensively manages information for the operation of local organizations. This information is stored in a database using storage devices and includes membership information, meeting minutes, schedules, and monitoring information for the elderly. User terminals provide an interface for accessing this database, allowing users to view information and enter new information as needed.

[0731] During meetings and events, users collect audio data via their devices, and the server converts the audio data into text using a conversion device. If translation is required, the server's translation device translates the text data into multiple languages ​​and provides it to the user. Furthermore, the server uses an analysis device to coordinate the schedule of local events, proposes the optimal date, and reflects it in a display device that allows the user to operate intuitively.

[0732] Furthermore, by using an emotion engine, the system can recognize emotions from the user's voice and text. For example, if tension or stress is detected during a meeting, the server generates a notification and sends a message to the user via a display device encouraging relaxation. In addition, in the elderly monitoring function, if an unusual emotional state is detected, an alert is quickly issued and relevant parties are notified.

[0733] As a concrete example, consider a local board meeting. Users collect opinions from the meeting using their devices and send them to a server. The server transcribes the audio into text and then translates it into the required language. An emotion engine analyzes the atmosphere of the meeting, and if it determines that participants are experiencing stress, it informs the resource manager and provides information to support better communication.

[0734] Thus, the system provided by the present invention significantly improves the operational efficiency of local organizations, enhances the user experience through emotion recognition, and helps maintain harmony throughout the community.

[0735] The following describes the processing flow.

[0736] Step 1:

[0737] The user launches the application on their device and presses the start meeting button. The device begins collecting audio data and sends it to the server in real time.

[0738] Step 2:

[0739] The server uses a conversion mechanism to convert the received audio data into text data. This text data is then stored in a database as meeting minutes.

[0740] Step 3:

[0741] The server uses an emotion engine to analyze the user's emotions from the transmitted voice and text data. Based on the analysis, if specific emotions such as tension or stress are detected, it generates support messages in real time.

[0742] Step 4:

[0743] When a user requests a translation, they select the desired language on their device and send the request to the server. The server uses a translation tool to translate the text data into the specified language and saves the result to the database.

[0744] Step 5:

[0745] The server uses analytical tools to coordinate the schedule of local events. Users input multiple candidate dates from their terminals, and the server calculates the optimal date based on these inputs and records it in the database.

[0746] Step 6:

[0747] Through the terminal, users can view meeting minutes and translation results provided by the server. Furthermore, an interface incorporating sentiment analysis results allows users to receive feedback tailored to their own stress levels.

[0748] Step 7:

[0749] In monitoring elderly individuals, the server uses monitoring tools to continuously analyze the user's behavior and emotional state. If abnormal emotions are detected, an alert is promptly sent to the relevant administrator's terminal, prompting them to take appropriate action.

[0750] (Example 2)

[0751] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0752] Managing diverse information and providing multilingual support are essential for community organizations. Furthermore, it's crucial to accurately understand participants' emotional states and facilitate smooth communication. However, comprehensive systems for efficiently achieving these goals are limited, making operational efficiency and participant satisfaction challenging.

[0753] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0754] In this invention, the server includes a storage means for storing operational information of a regional organization, a conversion means for converting audio information into text information, and a language conversion means for translating the text information into other languages. This enables the management of diverse information.

[0755] A "regional organization" is an organization or group that exists geographically or functionally within a specific area.

[0756] "Operational information" refers to all data and records related to the activities and management of an organization.

[0757] "Memory device" refers to a device or technology for storing information in digital format.

[0758] "Audio information" refers to recorded data that includes language and sounds.

[0759] "Textual information" refers to data expressed in text format.

[0760] "Conversion means" refers to a device or technology for converting information in one format into another format.

[0761] "Language conversion means" refers to a device or technology for translating information written in one language into another language.

[0762] "Analytical means" refers to devices or techniques for analyzing data and extracting specific conclusions or information.

[0763] "Surveillance means" refers to devices or technologies used to continuously observe individuals or situations.

[0764] "Emotion identification means" refers to a device or technology for detecting and identifying emotions from information.

[0765] "Notification generation means" refers to a device or technology for generating notifications based on specific information or conditions.

[0766] "Display means" refers to a device or technology for visually presenting information.

[0767] This system aims to streamline the operation of local organizations and improve communication by understanding participants' emotions. The system has a structure that combines multiple technological means.

[0768] The server plays a central role in information management and data processing, and operates on specific hardware. In particular, it uses high-capacity hard disk drives or solid-state drives as storage media for data. To convert audio information into text, the server employs speech recognition software. For multilingual support, a machine translation system is installed on the server to perform translation processing in real time. For emotion identification, a sentiment analysis engine utilizing a generative AI model is incorporated, allowing it to determine emotions from audio and text. This engine detects the user's stress level and level of comfort, generates notifications as needed, and communicates them to the user via the terminal.

[0769] User terminals can be smartphones, tablets, or personal computers, which provide the user interface. The terminals communicate with the server via an internet connection, sending and receiving data in real time. Furthermore, the display methods on the terminals utilize HTML, CSS, and JavaScript to create an intuitive interface and enhance usability.

[0770] As a concrete example, consider a scenario in a local board meeting where a user transmits the meeting audio to a server using their device. This audio is transcribed by the server and then translated into multiple languages. An emotion recognition engine analyzes the stress detected during the meeting, and the server provides feedback to the user through a notification generation mechanism. This feedback includes content designed to promote relaxation.

[0771] An example of a prompt message might be, "Detect the stress levels of all participants during the meeting and suggest appropriate actions as needed." This would not only improve the operational efficiency of local organizations but also support harmony within the community as a whole.

[0772] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0773] Step 1:

[0774] Users collect audio data from meetings and events using devices such as smartphones and tablets. They record the audio clearly using the device's microphone and send the data to a server. The input is audio data, and the output is data transmission to the server. This process digitizes the content of meetings.

[0775] Step 2:

[0776] The server converts the received audio data into text data using a conversion mechanism. This process utilizes speech recognition software to convert the audio data into text format. The input is audio data, and the output is text data. As a result, the audio is saved as text and used for subsequent processing.

[0777] Step 3:

[0778] If necessary, the server translates the converted text data into multiple languages ​​using translation tools. A machine translation system is used to convert the text data into the specified languages. The input is text data, and the output is text data in multiple languages. This allows information to be shared with users who speak different languages.

[0779] Step 4:

[0780] The server identifies emotions from text data using emotion recognition methods based on a generative AI model. The input is translated or untranslated text data, and the output is the emotion analysis result. The server analyzes emotions from the context of the text and detects tension and stress.

[0781] Step 5:

[0782] The server generates relaxation-promoting notifications based on identified emotions and presents them to the user using a notification generation mechanism. The input is the emotion analysis result, and the output is the notification message. This allows the user to receive feedback on their own and other participants' emotional states.

[0783] Step 6:

[0784] The user terminal delivers notifications from the server to the user using a display mechanism. The notification content is visually presented on the display screen. The input is the notification message, and the output is the user's confirmation of the display. Through this operation, the user receives specific suggestions and warnings.

[0785] (Application Example 2)

[0786] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0787] In the environments where elderly people live, there is a need to monitor not only their physical condition but also their emotional state in real time, and to take swift and appropriate action when abnormalities are detected. Furthermore, there is a challenge in the lack of information that allows caregivers to efficiently provide care to the elderly and to take approaches tailored to each individual's situation.

[0788] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0789] In this invention, the server includes a storage means for storing operational information of local organizations, a conversion means for converting audio data into text data, a translation means for translating into multiple languages, an analysis means for adjusting the schedule of local events, a monitoring means for monitoring the movements of elderly people, an emotion analysis means for analyzing the emotions of users and notifying relevant parties if an abnormality is detected, and a notification generation means for generating appropriate notifications in the care environment and prompting relevant parties. This makes it possible to continuously monitor the physical and emotional condition of elderly people in care facilities and respond quickly.

[0790] A "memory device" refers to a device or system that stores operational information and data of a local organization and saves it in a format that can be used as needed.

[0791] A "conversion means" is a device or system that performs the process of converting data acquired as audio into text information.

[0792] "Translation means" refers to a device or system that has the function of converting the translated text into another language.

[0793] "Analysis means" refers to a device or system that examines and analyzes information to create an optimal schedule or plan.

[0794] "Monitoring measures" refer to devices or systems that observe the behavior and circumstances of elderly people and detect any abnormalities.

[0795] "Display means" refers to a device or system for visually presenting obtained information to the user.

[0796] An "emotional analysis tool" is a device or system that analyzes a user's emotions from their voice and behavior and responds accordingly.

[0797] A "notification generation means" is a device or system that has the function of constructing messages for sending necessary information or warnings to users or caregivers.

[0798] The system of this invention consists of a server and a user terminal. The server has a storage means for centrally managing the operational information of local organizations, which includes membership information, meeting minutes, schedules, and information on the status of elderly members. The user terminal provides the user with an interface for accessing this information, enabling them to check and input new information.

[0799] The server includes a conversion mechanism that uses speech recognition APIs (e.g., Google Cloud Speech-to-Text) to convert audio data into text data. If the text data needs to be translated into multiple languages, this is done using translation tools such as the Google Cloud Translation API. For analysis, it uses algorithms to optimize scheduling of meetings and local events, proposing the most suitable schedule.

[0800] In an example implemented in a nursing care facility, the server analyzes the voice and behavior of elderly residents using an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to understand their emotional state in real time. If an anomaly is detected through the monitoring system, an alert is quickly sent to the relevant parties.

[0801] Furthermore, the notification generation system constructs messages to inform users and caregivers of emotional states or abnormalities, and provides these messages to caregivers via smartphone or tablet displays. Here, a generation AI model uses prompts to create appropriate messages. For example, if a user shows signs of anxiety, a message such as, "The user appears to be emotionally unstable. Please speak to them gently to reassure them," might be displayed to the caregiver.

[0802] An example of a prompt message for the generating AI model would be: "Analyze the user's voice data and determine their emotional state. If stress or anxiety is detected, generate an appropriate message to notify the caregiver."

[0803] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0804] Step 1:

[0805] The user collects voice recordings of elderly individuals using a device. The voice data is then sent to a server. The input is the user's voice data, and the output is the voice data that has been transferred to the server.

[0806] Step 2:

[0807] The server receives audio data and converts it to text data using a speech recognition API. The input is the received audio data, and the output is the transcribed audio data. The audio is analyzed using a speech recognition API (e.g., Google Cloud Speech-to-Text) and the corresponding text is generated.

[0808] Step 3:

[0809] The server uses an emotion analysis library (e.g., IBM Watson Natural Language Understanding) to analyze the user's emotional state from text data. The input is transcribed audio data, and the output is the analysis result indicating the user's emotional state. The text is analyzed to determine emotions such as positive or negative.

[0810] Step 4:

[0811] The server, as needed, uses the analysis results to create an appropriate notification message using a generative AI model. The input is the analysis result of the emotional state, and the output is the message to be provided to the caregiver. Prompt sentences are input to the generative AI model to construct an appropriate message based on the emotional state.

[0812] Step 5:

[0813] The server uses a notification generation mechanism to send a notification message to the caregiver's terminal. The input is the generated notification message, and the output is the message displayed on the caregiver's terminal. The message is sent to the terminal via the network to inform the caregiver of the situation.

[0814] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0815] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0816] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0817] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

[0818] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0819] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0820] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0821] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0822] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0823] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values ​​representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values ​​representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0824] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0825] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0826] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0827] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0828] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0829] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0830] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0831] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0832] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0833] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0834] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0835] The following is further disclosed regarding the embodiments described above.

[0836] (Claim 1)

[0837] A storage means for storing operational information of local organizations,

[0838] A conversion method for converting audio data into text data,

[0839] A translation means for translating the aforementioned text data into multiple languages,

[0840] Analytical means for adjusting the schedule of local events,

[0841] Monitoring means for monitoring the trends of the elderly,

[0842] A display means that provides the information generated from each of the above means to the user,

[0843] A system that includes this.

[0844] (Claim 2)

[0845] The system according to claim 1, wherein the translation means performs multilingual translation in real time.

[0846] (Claim 3)

[0847] The system according to claim 1, wherein the monitoring means grasps the activity of the elderly person and issues an alarm when it detects an abnormality.

[0848] "Example 1"

[0849] (Claim 1)

[0850] A storage means for storing data related to the operation of a local organization,

[0851] A conversion means for converting audio data into text information,

[0852] A translation means for translating the aforementioned text information into multiple languages,

[0853] Analytical tools for coordinating the schedule of local events,

[0854] Monitoring methods to monitor and predict the behavior of the elderly,

[0855] A display means that provides the user with the information obtained from each of the above means,

[0856] An editing method that allows users to edit the generated meeting minutes,

[0857] A translation method that uses a generative AI model to promote multilingual mutual understanding for participants using different languages,

[0858] A means of determining the optimal event date based on the schedule information stored in the database and the entered date proposals,

[0859] An alarm system that issues a warning when an abnormality is detected in the activity information of elderly people,

[0860] A system that includes this.

[0861] (Claim 2)

[0862] The system according to claim 1, wherein the translation means performs multilingual translation in real time and improves translation accuracy using a generative AI model.

[0863] (Claim 3)

[0864] The system according to claim 1, wherein the monitoring means grasps the activity of the elderly person and issues an alarm when it detects an abnormality using a predictive model.

[0865] "Application Example 1"

[0866] (Claim 1)

[0867] A data management system for storing business information of local organizations,

[0868] A data conversion method for converting audio materials into text materials,

[0869] A language conversion means for translating the aforementioned textual material into multiple languages,

[0870] Information analysis tools for coordinating schedules for community activities,

[0871] A dynamic monitoring system for monitoring the behavior of elderly individuals,

[0872] Information presentation means that provides information generated from each of the above means to the user,

[0873] A data acquisition method for collecting audio materials in real time,

[0874] A means for dynamically displaying the aforementioned information and sharing information among residents,

[0875] A system that includes this.

[0876] (Claim 2)

[0877] The system according to claim 1, wherein the language conversion means performs multilingual conversion in real time, thereby supporting the user in communicating in different languages.

[0878] (Claim 3)

[0879] The system according to claim 1, wherein the motion monitoring means grasps the range of movement of an elderly person and issues an alarm if it exceeds that range.

[0880] "Example 2 of combining an emotion engine"

[0881] (Claim 1)

[0882] A storage device for storing operational information of a regional organization,

[0883] A conversion means for converting audio information into text information,

[0884] A language conversion means for translating the aforementioned textual information into another language,

[0885] Analytical means for adjusting the plan for regional activities,

[0886] Monitoring means for monitoring the condition of elderly people,

[0887] An emotion recognition method that identifies emotions from audio and text information,

[0888] A notification generation means that generates notifications based on identified emotions,

[0889] A display means that provides information generated from each of the above means to the user,

[0890] A system that includes this.

[0891] (Claim 2)

[0892] The system according to claim 1, wherein the language conversion means performs multilingual translation immediately.

[0893] (Claim 3)

[0894] The system according to claim 1, wherein the monitoring means grasps the behavior of the elderly person and issues an alarm when it detects an abnormality.

[0895] "Application example 2 when combining with an emotional engine"

[0896] (Claim 1)

[0897] A storage means for storing operational information of local organizations,

[0898] A conversion method for converting audio data into text data,

[0899] A translation means for translating the aforementioned text data into multiple languages,

[0900] Analytical means for adjusting the schedule of local events,

[0901] Monitoring means for monitoring the trends of the elderly,

[0902] A display means that provides the information generated from each of the above means to the user,

[0903] An emotion analysis system for analyzing the user's emotions and notifying relevant parties if an anomaly is detected,

[0904] A notification generation method that generates appropriate notifications in a caregiving environment and prompts relevant parties,

[0905] A system that includes this.

[0906] (Claim 2)

[0907] The system according to claim 1, wherein the translation means performs multilingual translation in real time.

[0908] (Claim 3)

[0909] The system according to claim 1, wherein the monitoring means grasps the activity of the elderly person, issues an alarm when an abnormality is detected, and the emotion analysis means recognizes the emotional state of the user and notifies the caregiver as necessary. [Explanation of Symbols]

[0910] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>

Claims

1. A data management system for storing business information of local organizations, A data conversion method for converting audio materials into text materials, A language conversion means for translating the aforementioned textual material into multiple languages, Information analysis tools for coordinating schedules for community activities, A dynamic monitoring system for monitoring the behavior of elderly individuals, Information presentation means that provides information generated from each of the above means to the user, A data acquisition method for collecting audio materials in real time, A means for dynamically displaying the aforementioned information and sharing information among residents, A system that includes this.

2. The system according to claim 1, wherein the language conversion means performs multilingual conversion in real time, thereby supporting the user in communicating in different languages.

3. The system according to claim 1, wherein the motion monitoring means grasps the range of movement of an elderly person and issues an alarm if it exceeds that range.