System

A system that analyzes animal movements and vocalizations to enable two-way communication between animals and humans. It addresses the difficulty of understanding animals' needs and emotions and supports pet care and research.

JP2026104589A · Pending · Publication Date: 2026-06-25 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-13
Publication Date
2026-06-25


Abstract

[Problem] To provide a system. [Solution] A system including: means for acquiring the movement of an animal; means for acquiring the vocalization of the animal; means for analyzing the movement and vocalization to identify the demands or emotions of the animal; means for notifying humans of the demands or emotions of the animal based on the analysis results; means for converting an input from a human into a signal that the animal can understand; means for transmitting the signal to the animal; and means for managing the happiness level of the animal for humans.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Problems caused by absent or mistaken communication between animals and humans need to be solved. Specifically, pet owners and animal care facilities cannot accurately grasp animals' needs and emotions, and the stress and poor physical condition this causes in animals need to be prevented. In addition, animal behavior scientists lack efficient means to accurately analyze and understand animal behavior and emotions. In particular, elderly people and people with disabilities find it difficult to communicate smoothly with animals, and daily care places a burden on them.

Means for Solving the Problems

[0005] This invention provides means for acquiring and analyzing animal movements and vocalizations in real time. From the acquired movements and vocalizations, the system analyzes and identifies the animal's needs and emotions, and, based on these analysis results, it includes means for notifying humans of the animal's needs or emotions. Furthermore, it can convert human instructions into signals that animals can understand and transmit them to the animals. Additionally, the analysis results of animal movements and vocalizations can be recorded in a database and used for learning to improve analysis accuracy. This enables two-way communication between animals and humans, allowing animals' needs and emotions to be understood accurately and thus supporting smooth communication in pet care and animal research.
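
The overall flow of [0005] (acquire, analyze, notify, record, and convert human input) can be pictured as a small pipeline. This is a minimal sketch, not the disclosed implementation: the names `Observation`, `CommunicationPipeline`, and `toy_analyze`, the in-memory `history` list standing in for the database, and the toy analysis rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    movement: dict      # hypothetical movement features, e.g. {"tail_wagging": True}
    vocalization: dict  # hypothetical vocal features, e.g. {"loudness_db": 75.0}


@dataclass
class CommunicationPipeline:
    analyze: Callable[[Observation], str]  # movements + vocalizations -> need/emotion
    notify: Callable[[str], str]           # need/emotion -> human-readable message
    convert: Callable[[str], str]          # human instruction -> animal-facing signal
    # Stand-in for the database that keeps analysis results for later learning.
    history: List[Tuple[Observation, str]] = field(default_factory=list)

    def animal_to_human(self, obs: Observation) -> str:
        state = self.analyze(obs)
        self.history.append((obs, state))
        return self.notify(state)

    def human_to_animal(self, instruction: str) -> str:
        return self.convert(instruction)


def toy_analyze(obs: Observation) -> str:
    # Hypothetical rule for illustration only.
    if obs.movement.get("tail_wagging") and obs.vocalization.get("loudness_db", 0.0) > 70.0:
        return "excited"
    return "calm"


pipeline = CommunicationPipeline(
    analyze=toy_analyze,
    notify=lambda state: f"Your pet seems {state}.",
    convert=lambda text: f"<signal:{text}>",
)
message = pipeline.animal_to_human(Observation({"tail_wagging": True}, {"loudness_db": 75.0}))
```

Passing the stages in as callables keeps the sketch neutral about how each means is realized (an AI model, a rule table, or a remote service).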

[0006] "Means for acquiring animal movements" refers to methods for detecting and acquiring data on the body movements and postures of animals.

[0007] "Means for acquiring animal sounds" refers to methods for detecting sounds emitted by animals and acquiring that sound data.

[0008] "Means of analysis to identify an animal's needs or emotions" refers to methods of analyzing acquired behavioral and vocal data and using that data to clarify the animal's current needs and psychological state.

[0009] "Means of notifying humans" refers to means of communicating the needs or emotions of animals, identified through analysis, in a way that humans can understand.

[0010] "Means of converting into signals" refers to methods of converting instructions or messages from humans into signal formats such as sounds or vibrations that animals can understand.

[0011] "Means for transmitting signals to animals" refers to means of transmitting the converted signals to animals and conveying intentions to them.

[0012] "Means of recording in a database" refers to a system or technology that stores and retains the analysis results of acquired actions and vocalizations.

[0013] "Learning methods" refer to techniques that use past data and incorporate new information to improve the accuracy and efficiency of the analysis.

Brief Description of the Drawings

[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map in which multiple emotions are mapped.
[Figure 10] An emotion map in which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, in which an emotion engine is incorporated.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, in which an emotion engine is incorporated.

Embodiments for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as the "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used by the processor as a work memory.

[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same interpretation as "A and/or B" applies when three or more items are linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs reception output processing. The storage 50 stores a reception output program 60. The reception output program 60 is used by the data processing system 10 together with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] The system of this invention is designed to enable two-way communication between animals and humans. The system operates between a server, a terminal, and a user, each playing a specific role.

[0036] Server Role

[0037] The server plays a central role in receiving and analyzing animal movement and vocalization data transmitted from terminals. The server is equipped with an AI module that integrates video recognition and speech recognition technologies. Using video recognition technology, the server analyzes video footage of animals and extracts characteristic movements and facial expressions. Speech recognition technology converts animal vocalizations into frequency spectra and analyzes them to identify the animals' emotions and requests.

[0038] Based on these analysis results, the server identifies the animal's needs and emotions and notifies the user of the results. If the user enters instructions or messages they want to give to the animal, the server converts the content into signals that the animal can understand. This signal conversion includes customization based on the animal's species and individual needs.
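
The customization by species and individual described in [0038] can be illustrated as a two-level lookup: a species default for each instruction, then optional per-animal overrides. This is a minimal sketch under stated assumptions: `Signal`, `SPECIES_PROFILES`, `convert_instruction`, the override keys, and every numeric value are hypothetical placeholders, not data from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Signal:
    kind: str            # "tone" or "vibration"
    frequency_hz: float
    volume: float        # 0.0 (silent) to 1.0 (maximum)


# Hypothetical species defaults; the numbers are placeholders, not measured data.
SPECIES_PROFILES: Dict[str, Dict[str, Signal]] = {
    "dog": {
        "calm down": Signal("tone", 300.0, 0.4),
        "come here": Signal("tone", 1200.0, 0.7),
    },
    "cat": {
        "calm down": Signal("vibration", 25.0, 0.3),
    },
}


def convert_instruction(instruction: str, species: str,
                        individual: Optional[Dict[str, float]] = None) -> Signal:
    """Look up the species default for an instruction, then apply per-individual overrides."""
    base = SPECIES_PROFILES[species][instruction.strip().lower()]
    overrides = individual or {}
    return Signal(
        kind=base.kind,
        frequency_hz=overrides.get("frequency_hz", base.frequency_hz),
        # Scale the volume for this animal, but never exceed the maximum.
        volume=min(1.0, base.volume * overrides.get("volume_scale", 1.0)),
    )
```

For example, `convert_instruction("Calm down", "dog", {"volume_scale": 0.5})` yields a 300 Hz tone at half the species-default volume. Unknown instructions raise `KeyError` here; open-ended input would instead go through the language generation AI mentioned in Step 7, which a lookup table does not attempt to model.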

[0039] Terminal role

[0040] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. Analysis results and notifications are displayed to the user visually or audibly on the terminal. It also has the function of receiving signals in response to user input and transmitting them to the animals.

[0041] User roles

[0042] The user is the one who understands the animal's condition and takes appropriate action through this system. By receiving notifications of the animal's condition via the terminal, the user can understand the animal's needs and emotions. Furthermore, the user can input instructions and questions for the animal into the terminal, and this information is converted into signals appropriate for the animal by the server and transmitted.

[0043] Specific example

[0044] As a concrete example, consider a household in which the owner has difficulty communicating with the dog. The user points the device at the dog and records its actions and barks. The server analyzes this data, and if it determines that the dog is feeling anxious, it immediately notifies the user. The user then inputs an appropriate voice command, which the server converts into a short, easy-to-understand voice message and transmits through the device. Through this process, the user can understand what the dog is feeling and take appropriate action to reassure it.

[0045] Thus, the present invention provides an environment in which animals and humans can understand each other, and offers an effective means to realize higher quality pet care and animal research.

[0046] The following describes the processing flow.

[0047] Step 1:

[0048] The device captures animal movements with a camera and records animal sounds with a microphone. This data is collected in real time, compressed, and then transmitted to a server via a communication line.
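
Step 1's "collected in real time, compressed, and then transmitted" can be sketched with the Python standard library. The payload layout (`t`, `frame`, `audio`) and the function names are hypothetical; a production terminal would more likely use dedicated video and audio codecs than general-purpose `zlib`, which is used here only because it ships with Python.

```python
import json
import zlib
from typing import List


def pack_capture(frame: bytes, audio_samples: List[int], timestamp: float) -> bytes:
    """Terminal side: serialize one video frame plus an audio chunk and compress it."""
    payload = {"t": timestamp, "frame": frame.hex(), "audio": audio_samples}
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def unpack_capture(blob: bytes) -> dict:
    """Server side: decompress and restore the original fields."""
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    payload["frame"] = bytes.fromhex(payload["frame"])
    return payload


# A blank frame and a silent audio chunk compress to a very small blob.
blob = pack_capture(bytes(1000), [0] * 2000, timestamp=12.5)
restored = unpack_capture(blob)
```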

[0049] Step 2:

[0050] The server receives video and audio data transmitted from the terminal. Video recognition AI analyzes the animal's movement data and extracts movement patterns and facial expressions. Additionally, audio recognition AI analyzes the animal's vocalizations and analyzes the characteristics of its timbre and tone.

[0051] Step 3:

[0052] The server identifies the animal's needs and emotions based on the analysis results. It integrates video and audio results to make a comprehensive judgment about the animal's state. For example, "tail wagging" + "loud vocalizations" might be identified as "excited."
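
The "tail wagging" + "loud vocalizations" → "excited" example in Step 3 reads naturally as a rule over cues from the two modalities. The sketch below assumes the video and audio analyzers each emit a set of named cues; only the first rule comes from the description, while the other rules, the cue names, and the `fuse` function are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# (required video cues, required audio cues, resulting state).
# Only the first rule comes from the description; the others are hypothetical.
RULES: List[Tuple[FrozenSet[str], FrozenSet[str], str]] = [
    (frozenset({"tail_wagging"}), frozenset({"loud_vocalization"}), "excited"),
    (frozenset({"tail_tucked"}), frozenset({"whining"}), "anxious"),
    (frozenset({"lying_down"}), frozenset(), "relaxed"),
]


def fuse(video_cues: Set[str], audio_cues: Set[str]) -> str:
    """Return the state of the first rule whose video and audio cues are all present."""
    for video_required, audio_required, state in RULES:
        if video_required <= video_cues and audio_required <= audio_cues:
            return state
    return "unknown"
```

Rule order matters: the first matching rule wins, so more specific rules should be listed before general ones.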

[0053] Step 4:

[0054] The server generates a text notification detailing the status and requests of the identified animal. This notification is then sent from the server to the terminal.

[0055] Step 5:

[0056] The device displays notifications received from the server to the user. It informs the user of the animal's status through visual displays and audio alerts.

[0057] Step 6:

[0058] The user enters instructions or messages they want to convey to the animal into the device. After the input is complete, the content is sent to the server.

[0059] Step 7:

[0060] The server uses language generation AI to convert user input into signals that animals can understand. These signals can be customized to suit the animal species and individual characteristics.

[0061] Step 8:

[0062] The terminal receives the signals transmitted from the server and conveys them to the animal, delivering the instructions through voice output and a vibration device.
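
Step 8 routes each converted signal to a voice output or a vibration device. A minimal sketch, assuming signals arrive as dictionaries with hypothetical `kind` and `payload` keys; the stand-in devices below only log what a real speaker or vibration motor would do.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


def make_dispatcher(devices: Dict[str, Handler]) -> Handler:
    """Route each converted signal to the output device registered for its kind."""
    def dispatch(signal: dict) -> None:
        handler = devices.get(signal["kind"])
        if handler is None:
            raise ValueError(f"no output device for signal kind {signal['kind']!r}")
        handler(signal)
    return dispatch


# Stand-in devices that just record what they would output.
log: List[str] = []
dispatch = make_dispatcher({
    "voice": lambda s: log.append(f"speaker: {s['payload']}"),
    "vibration": lambda s: log.append(f"vibrate: {s['payload']} ms"),
})
dispatch({"kind": "voice", "payload": "sit"})
dispatch({"kind": "vibration", "payload": 200})
```

Raising an error for an unregistered kind makes a missing device visible instead of silently dropping an instruction meant for the animal.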

[0063] This series of steps enables effective communication between the animal and the user, allowing for responses that are tailored to the animal's needs and emotions.

[0064] (Example 1)

[0065] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0066] In animal-human communication, it is extremely difficult to accurately understand an animal's emotions and needs and to convey information from humans to the animal in a form it can understand. Furthermore, conventional technologies struggle to perform appropriate signal conversion for each animal species and individual, so there is no efficient means of two-way communication.

[0067] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0068] In this invention, the server includes means for acquiring animal movements, means for acquiring animal vocalizations, and means for customizing signal conversion according to the animal's species and individual characteristics. This enables accurate analysis of the animal's emotions and needs, and facilitates smooth communication between animals and humans based on this analysis.

[0069] "Means for acquiring animal movements" refers to technologies that detect an animal's posture, movement, and behavior using electronic sensors and cameras, and collect that information as digital data.

[0070] "Methods for acquiring animal sounds" refer to technologies that record sounds emitted by animals using acoustic sensors such as microphones and acquire those sounds as digital data.

[0071] An "information processing system" is a system consisting of programs and hardware for analyzing collected digital data, and is equipped with algorithms for determining the emotions and needs of animals.

[0072] "Information transmission means" refers to a system for communicating the analyzed results to humans, and is a technology that provides data to users through screen display, audio output, or other audiovisual means.

[0073] An "information conversion means" is a system for converting instructions provided by a user into signals in a format that animals can understand, and it is equipped with a variety of signal conversion algorithms.

[0074] "Information output means" refers to devices or systems for transmitting converted signals to animals, and is a technology that outputs signals by means such as sound, light, or vibration.

[0075] "Signal conversion customization means" refers to methods and techniques for performing signal conversion that takes into account the characteristics of each animal species and of each individual animal.

[0076] Modes for carrying out the invention

[0077] The system of this invention enables two-way communication between animals and humans through the cooperation of a server, a terminal, and a user.

[0078] Server Role

[0079] The server plays a central role in receiving and analyzing animal movement and vocalization data. Specifically, the server is equipped with an AI module that uses video recognition technology to identify animal movements and speech recognition technology to convert animal vocalizations into frequency spectra. These analysis techniques are implemented using open-source AI frameworks and dedicated hardware accelerators. For example, the server analyzes animal videos, extracts specific facial expressions and movements, and then estimates the animal's emotions from them.

[0080] Terminal role

[0081] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. The terminal has the function of notifying the user of the analysis results visually or audibly, and also transmits signals to convey user instructions to the animal. For example, the terminal records the movement of a dog's tail and its barks, and smoothly transmits that information to the server.

[0082] User roles

[0083] The user is the one who understands the animal's condition through the system and takes appropriate action. The user can use a terminal to grasp the animal's emotions and needs. The user also provides instructions and messages to the animal, and this information is converted into signals by the server and transmitted to the animal. For example, if the user enters "calm down" into the terminal, the server converts it into a format that the animal can understand and transmits it.

[0084] Specific example

[0085] For example, if a dog is showing signs of anxiety at home, the user can use a device to capture the dog's movements and barks. The server analyzes the data and, if it determines that the dog is feeling anxious, notifies the user. The user can then input "play some cheerful music" into the device, and the server converts this input into an audio signal that the dog can easily understand and plays it through the device. Through this process, the user can reassure the dog and deepen their mutual understanding.

[0086] An example of a prompt to the generative AI model is text such as: "If a dog is feeling anxious, how can I calm it down?"
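
A prompt like the one in [0086] could be assembled from the server's analysis result so that the generative AI model sees the observed context as well as the question. This is a hypothetical template: the field labels, the closing instruction line, and `build_prompt` are assumptions, and the call to the model itself is omitted because the document does not name a model or API.

```python
from typing import List


def build_prompt(species: str, emotion: str, cues: List[str], question: str) -> str:
    """Assemble a plain-text prompt from the analysis result and the user's question."""
    lines = [
        f"Species: {species}",
        f"Estimated emotion: {emotion}",
        "Observed cues: " + ", ".join(cues),
        f"Question: {question}",
        "Answer with short, concrete actions the owner can take.",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    "dog", "anxious", ["tail tucked", "repeated whining"],
    "If a dog is feeling anxious, how can I calm it down?",
)
```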

[0087] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0088] Step 1:

[0089] The device uses a camera and microphone to capture animal movements and sounds in real time. The input is visual and auditory information from the animal. The output is this information converted into digital data and packaged into data packets.
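
Packaging the digital data into data packets, as in this step, can be illustrated with numbered chunks that the server reorders and checks for gaps. This is a minimal sketch: the `(sequence, total, chunk)` tuple layout, the default payload size, and both function names are hypothetical, and real transport protocols already handle much of this.

```python
from typing import List, Tuple

Packet = Tuple[int, int, bytes]  # (sequence number, total packet count, chunk)


def packetize(data: bytes, max_payload: int = 1024) -> List[Packet]:
    """Split a byte string into numbered packets of at most max_payload bytes."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Reorder packets by sequence number and rebuild the payload, rejecting gaps."""
    ordered = sorted(packets, key=lambda p: p[0])
    total = ordered[0][1]
    if [seq for seq, _, _ in ordered] != list(range(total)):
        raise ValueError("missing or duplicate packets")
    return b"".join(chunk for _, _, chunk in ordered)
```

For example, 2,560 bytes split with `max_payload=1000` produce three packets, and `reassemble` restores the original bytes even if the packets arrive in reverse order.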

[0090] Step 2:

[0091] The terminal transmits the acquired digital data to the server using wireless communication. The input is the animal's movement data and vocalization data generated in step 1. The output is a notification that the data transmission to the server is complete.

[0092] Step 3:

[0093] The server uses an AI module to analyze the received motion data and vocal data. The input is digital data sent from the terminal. The output is the result of identifying the animal's characteristic movements, facial expressions, emotions, and requests. In this process, the server performs specific actions such as analyzing movements with a video processing algorithm and analyzing vocalizations with an audio processing algorithm.

[0094] Step 4:

[0095] The server identifies the animal's emotions and needs based on the analysis results and generates a message to notify the user. The input is the analysis results from step 3. The output is the notification message for the user. This helps the user understand what the animal wants.

[0096] Step 5:

[0097] The user inputs instructions and messages for the animal into the terminal based on notifications from the server. The input is the user's instructions. The output is displayed on the terminal as specific instruction data to be conveyed to the animal.

[0098] Step 6:

[0099] The server converts the user's input into signals that the animal can understand. The input is the user's instruction data from step 5. The output is the data converted into signals appropriate for the animal species and individual. At this stage, the server performs an operation to convert the instructions into a format suitable for the animal, such as voice or vibration.

[0100] Step 7:

[0101] The terminal transmits the signal converted by the server to the animal. The input is the converted signal data from step 6. The output is a signal emitted in a format that the animal can receive. Specific actions include playing voice commands using the terminal's speaker.

[0102] (Application Example 1)

[0103] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0104] When keeping animals, it is difficult for owners to accurately understand their animals' emotions and needs, and there is a particular need to monitor pet well-being in real time. This can lead to an inability to respond quickly and appropriately to the anxiety and stress the animal is experiencing, potentially compromising animal welfare. There is a need for a system that can solve this problem and enable better relationships between animals and humans.

[0105] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0106] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, means for notifying humans of the animal's requests or emotions based on the analysis results, and means for managing the animal's well-being for humans. This allows humans to understand the animal's state and well-being in real time and take appropriate action.

[0107] "Means for acquiring animal movements" refers to devices that capture the body movements and postures of animals using sensors such as cameras.

[0108] "Means for acquiring animal sounds" refers to a device that collects sounds made by animals using microphones or similar means and transmits that audio data to a server.

[0109] "Means for analyzing actions and vocalizations to identify an animal's needs or emotions" refers to algorithms or software that analyze acquired action and vocal data to determine what an animal wants or what emotions it is experiencing.

[0110] "Means of notifying humans of an animal's needs or emotions" refers to devices or interfaces that visualize or audibly communicate the analyzed state of an animal to humans.

[0111] "Means of converting human input into signals understandable to animals" refers to a system or program that transforms commands or messages input by humans into a form that animals can understand.

[0112] "Means of transmitting signals to animals" refers to devices that transmit converted signals to animals, often in the form of auditory or visual stimuli.

[0113] A "means of managing animal well-being" refers to a system that continuously evaluates changes in an animal's emotional state and needs, determines its level of well-being, and provides information to the owner.
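
[0113] does not specify how the well-being level is computed. One simple reading, sketched here purely as an assumption, is a rolling average of recent emotion estimates, each mapped to a valence between -1 (negative) and +1 (positive) and rescaled to a 0 to 100 score. The `VALENCE` table, the window size, the neutral default of 50, and `WellBeingTracker` are all hypothetical.

```python
from collections import deque

# Hypothetical valence for each emotion label (placeholders, not validated values).
VALENCE = {"relaxed": 1.0, "excited": 0.5, "bored": -0.3, "anxious": -1.0}


class WellBeingTracker:
    """Keeps the most recent emotion estimates and summarizes them as a 0-100 score."""

    def __init__(self, window: int = 10):
        self.recent = deque(maxlen=window)  # oldest entries drop out automatically

    def observe(self, emotion: str) -> None:
        self.recent.append(VALENCE.get(emotion, 0.0))  # unknown labels count as neutral

    def score(self) -> float:
        """Mean valence over the window, rescaled from [-1, 1] to [0, 100]."""
        if not self.recent:
            return 50.0
        mean = sum(self.recent) / len(self.recent)
        return round((mean + 1.0) * 50.0, 1)
```

With `window=3`, observing relaxed, anxious, anxious gives a score of 33.3; two further relaxed observations push the oldest entries out and raise it to 66.7, which is the kind of change over time the owner would be informed of.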

[0114] This system works in conjunction with servers, terminals, and users to facilitate smooth communication between animals and humans.

[0115] The server plays a central role in this system, analyzing behavioral and vocal data acquired from animals. It uses an AI module that integrates image and speech recognition technologies. Specifically, software such as TensorFlow (registered trademark) and PyTorch is used to train and run inference with machine learning models that identify animal behavior and emotional states. The server uses the analysis results to identify the animals' emotions and needs and notifies the user of this information.

[0116] The terminal is an interface device for acquiring animal movements and vocalizations. It is equipped with a camera and microphone and transmits the captured information to the server in real time. The terminal receives the analysis results and presents them to the user visually or audibly. The terminal also accepts instructions that the user wants to send to the animal, forwards them to the server, and outputs the converted result in a format that the animal can understand.

[0117] The user plays a role in understanding the pet's emotions and needs and responding appropriately. The user receives notifications about the animal's status through the device and can input messages and instructions for the animal. These instructions are converted into signals that the animal can easily understand by the server and transmitted via the device. This allows the user to manage their pet's well-being and provide better pet care.

[0118] As a concrete example, consider a scenario where a pet care robot detects unusual barking from a dog while the user is away. The server determines that "the dog is bored" and notifies the user of appropriate action. When the user inputs the command "give the dog a toy" from their smartphone, the robot provides the dog with a toy, relieving its boredom. An example of a prompt message could be, "Please suggest ways to reduce the dog's anxiety."

[0119] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0120] Step 1:

[0121] The device captures animal movements with a camera and records animal sounds with a microphone. The input consists of video and audio data. This data is transmitted to the server in real time. The output is raw data for analysis by the server.

[0122] Step 2:

[0123] The server analyzes the received motion data using video recognition technology. Specifically, it divides the acquired video into frames and extracts the animal's characteristic movements and postures from each frame. In this process, an AI model (for example, a generative AI model) built with a framework such as TensorFlow is used to determine the animal's motion patterns. The output is the analysis result regarding the animal's movements.
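
Dividing video into frames and comparing them can be illustrated with frame differencing, a classical technique swapped in here for the AI model the step describes: the mean absolute pixel change between consecutive grayscale frames serves as a crude motion measure. Frames are assumed to be nested lists of 0 to 255 values, and the threshold and function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, 0-255


def motion_energy(frames: List[Frame]) -> List[float]:
    """Mean absolute pixel difference between each pair of consecutive frames."""
    energies = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(abs(a - b) for row_p, row_c in zip(prev, curr) for a, b in zip(row_p, row_c))
        pixels = sum(len(row) for row in curr)
        energies.append(total / pixels)
    return energies


def label_activity(energies: List[float], threshold: float = 10.0) -> List[str]:
    """Label each frame transition as moving or still using a fixed threshold."""
    return ["moving" if e > threshold else "still" for e in energies]
```

This only detects that something moved; identifying postures such as tail wagging would need a trained model, which is what the step assigns to the AI framework.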

[0124] Step 3:

[0125] The server processes the received audio data using speech recognition technology. The audio data is converted into a frequency spectrum, and features related to emotions and requests are extracted from the vocalizations. This process uses, for example, a generative AI model implemented with PyTorch. The output is the result of analyzing the animal's vocalizations.
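
Converting audio into a frequency spectrum and extracting features can be shown with a naive discrete Fourier transform from the standard library. The two features chosen here (dominant frequency and spectral centroid) are illustrative assumptions, not features named in the document, and the function names are hypothetical. The naive DFT is O(N²); in practice an FFT such as NumPy's `numpy.fft.rfft` would be used.

```python
import cmath
import math
from typing import Dict, List, Tuple


def magnitude_spectrum(samples: List[float], sample_rate: float) -> List[Tuple[float, float]]:
    """Naive DFT: (frequency_hz, magnitude) for bins 0..N/2."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(acc)))
    return spectrum


def spectral_features(samples: List[float], sample_rate: float) -> Dict[str, float]:
    """Dominant frequency and magnitude-weighted mean frequency (spectral centroid)."""
    spectrum = magnitude_spectrum(samples, sample_rate)
    total = sum(m for _, m in spectrum)
    centroid = sum(f * m for f, m in spectrum) / total if total else 0.0
    dominant = max(spectrum, key=lambda fm: fm[1])[0]
    return {"dominant_hz": dominant, "centroid_hz": centroid}


# A pure 500 Hz tone sampled at 8 kHz (400 samples = 25 full cycles).
tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(400)]
features = spectral_features(tone, 8000.0)
```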

[0126] Step 4:

[0127] The server integrates motion analysis results and voice analysis results to identify the animal's emotions and needs. This uses data fusion technology to accurately predict the animal's emotional state from each analysis result. The output is an evaluation result regarding the animal's state.
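
Step 4 does not specify the "data fusion technology." One common option, assumed here, is weighted late fusion: each analyzer outputs a probability for each emotion, and the two distributions are averaged with a tunable weight and renormalized. The function name and the `video_weight` parameter are hypothetical.

```python
from typing import Dict


def late_fusion(video_probs: Dict[str, float], audio_probs: Dict[str, float],
                video_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of two per-emotion probability distributions, renormalized."""
    labels = set(video_probs) | set(audio_probs)
    fused = {
        label: video_weight * video_probs.get(label, 0.0)
        + (1.0 - video_weight) * audio_probs.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    return {label: p / total for label, p in fused.items()} if total else fused


fused = late_fusion({"anxious": 0.6, "excited": 0.4}, {"anxious": 0.8, "excited": 0.2})
most_likely = max(fused, key=fused.get)
```

Raising `video_weight` lets the system trust the video analyzer more, for example when background noise degrades the audio.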

[0128] Step 5:

[0129] The server notifies the terminal of the animal's emotions and needs extracted through analysis. The terminal converts this information into a visual or audio format for display on the user interface. The output is visualized or audio information for the user to understand.

[0130] Step 6:

[0131] The user checks the animal's condition through the terminal and inputs appropriate instructions into the terminal. These instructions are intended to elicit a response or behavior from the animal suited to its condition. The input consists of instructions from the user.

[0132] Step 7:

[0133] The server receives user instructions and converts them into signals that animals can understand. This conversion process is customized according to the animal species and individual differences. The output is a signal for communication with the animal.
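To make "customized according to the animal species and individual differences" concrete, the following sketch assumes a lookup table of species-level defaults plus optional per-animal overrides. The command names, signal kinds, and table contents are hypothetical placeholders, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Signal:
    kind: str      # e.g. "tone", "voice", "light", "vibration"
    payload: str   # what the output device should render


# Hypothetical species-level defaults: (species, command) -> signal
SPECIES_TABLE: Dict[Tuple[str, str], Signal] = {
    ("dog", "sit"): Signal("voice", "Sit!"),
    ("dog", "calm down"): Signal("tone", "low_hum_2s"),
    ("cat", "calm down"): Signal("voice", "soft_purr_clip"),
}


@dataclass
class AnimalProfile:
    name: str
    species: str
    # Per-individual overrides configured (or learned) for this animal
    overrides: Dict[str, Signal] = field(default_factory=dict)


def convert_instruction(command: str, animal: AnimalProfile) -> Optional[Signal]:
    """Individual override first, then species default, else None."""
    key = command.strip().lower()
    if key in animal.overrides:
        return animal.overrides[key]
    return SPECIES_TABLE.get((animal.species, key))
```

Checking the individual override before the species default lets one animal respond to, say, a light cue while other dogs keep the spoken default.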

[0134] Step 8:

[0135] The device transmits the converted signal to the animal. This allows the animal to understand instructions from humans and take corresponding actions. The output is the signal the animal hears.

[0136] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0137] This invention provides smoother and more effective interaction by incorporating an emotion engine that recognizes user emotions into a system that enables two-way communication between animals and humans. The system mainly consists of three components: a server, a terminal, and a user, with each component playing a specific role.

Server role

[0139] The server functions as the main module, receiving and analyzing animal behavior and vocalization data transmitted from the terminal. AI combining video and speech recognition technologies extracts characteristics of the animal's movements, facial expressions, and vocalizations to identify the animal's needs and emotions. Furthermore, a newly integrated emotion engine analyzes the user's video or audio data received from the terminal to assess the user's current emotional state. Based on these results, the server adjusts the output signal to the animal.

[0140] Terminal role

[0141] The terminal is a device for collecting animal and user movements and sounds. The terminal takes photos of animals and records audio, while simultaneously capturing the user's facial expressions and voice pitch, and sending this data to the server. This allows the server to analyze the data, including the user's emotional state.

User role

[0143] The user is the agent who understands the animal's condition and needs through this system and takes the necessary actions. The user receives notifications about the animal's condition and emotions via a terminal, and can also input instructions for the animal based on their own emotions. The commands entered by the user are analyzed by the server's emotion engine and transmitted to the animal in the optimal form.

[0144] Specific example

[0145] Consider a scenario in a home where a dog is seeking the user's attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server analyzes the dog's data and determines that the dog "wants attention," while simultaneously analyzing the user's video and recognizing that the user is "tired." Based on these two analysis results, the server transmits appropriate signals and sounds to the dog through the device to help it feel safe without becoming overly agitated. As a result, the dog feels somewhat safer, allowing the user to remain relaxed as well.

[0146] This system utilizes an emotion engine to provide feedback to animals that takes the user's emotions into account, resulting in more natural and personalized communication. This makes it possible to further deepen the relationship between animals and humans.

[0147] The following describes the processing flow.

[0148] Step 1:

[0149] The device captures animal movements with a camera and records animal sounds with a microphone. Furthermore, it collects the user's facial expressions and voice, and simultaneously prepares this data for transmission to a server.

[0150] Step 2:

[0151] The server receives the animal's motion and vocalization data and the user's facial-expression and voice data transmitted from the terminal. The animal's motion and vocalizations are analyzed using image recognition AI and voice recognition AI to identify the animal's needs and emotions.

[0152] Step 3:

[0153] The server uses an emotion engine to analyze the received facial-expression and voice data of the user and evaluate the user's emotional state. For example, it might determine whether the user is "tired" or "relaxed."

[0154] Step 4:

[0155] The server generates the optimal response based on the animal's needs and the user's emotional state. It converts instructions and messages to the animal into signals and voices that take the user's emotions into account.
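A rule-based stand-in for this step, assuming the animal's need and the user's state have already been reduced to text labels such as "wants attention" and "tired". The text leaves the generation method open (later examples use a generative AI model); the policy table, messages, and volume values below are illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical response policy: (animal need, user state) -> (message, volume 0..1)
POLICY: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("wants attention", "tired"): ("It's okay, I'm right here.", 0.3),
    ("wants attention", "relaxed"): ("Let's play!", 0.7),
    ("anxious", "tired"): ("Easy now, you're safe.", 0.3),
}
DEFAULT_RESPONSE: Tuple[str, float] = ("Good dog.", 0.5)


def plan_response(animal_need: str, user_state: str) -> Tuple[str, float]:
    """Pick a message and playback volume that respect the user's state."""
    message, volume = POLICY.get((animal_need, user_state), DEFAULT_RESPONSE)
    if user_state == "tired":
        volume = min(volume, 0.4)  # keep output quiet when the user is tired
    return message, volume
```

In the home scenario of [0145], `plan_response("wants attention", "tired")` yields a reassuring message at low volume, so the dog is calmed without raising the household's energy level.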

[0156] Step 5:

[0157] The device receives the converted signal sent from the server and communicates instructions to the animal through appropriate voice output or vibration. This allows the animal to understand the user's intentions.

[0158] Step 6:

[0159] The user observes the animal's responses through a terminal and enters additional instructions as needed. The server then analyzes this input again and continues to generate adaptive responses.

[0160] This process enables smooth, emotion-conscious communication between animals and users.

[0161] (Example 2)

[0162] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0163] Modern pet communication systems have limitations in their ability to recognize animal behavior and needs, making it difficult to achieve smooth communication between humans and animals. Furthermore, they often fail to provide feedback that takes into account human emotional states, potentially leading to insufficient interaction with animals. Therefore, a system is needed that enables optimal communication for both animals and humans, fostering deeper bonds.

[0164] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0165] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for acquiring human emotional states. This makes it possible to comprehensively analyze the movements and emotional states of humans and animals and generate feedback that facilitates mutual understanding.

[0166] "Means of acquiring motion" refers to technologies for detecting the body movements of animals and collecting that data.

[0167] "Means of acquiring animal sounds" refers to technologies for recording or analyzing sounds emitted by animals.

[0168] "Means for acquiring human emotional states" refers to technologies that detect the characteristics of human facial expressions and voices and identify those emotional states.

[0169] "Means of analysis" refers to techniques for identifying the intentions and emotions behind animal behavior based on collected animal actions and vocalizations.

[0170] "Means of notification" refers to technologies for conveying analyzed information to humans.

[0171] "Means of conversion" refers to technologies for converting instructions or inputs from humans into a format that animals can understand.

[0172] "Means of transmission" refers to technologies used to transmit signals and information to animals.

[0173] A "data recording device" refers to a device that stores the results of analyses of animal movements and vocalizations, and uses them to improve the accuracy of future analyses.

[0174] "Learning methods" refer to machine learning techniques used to improve the accuracy of analysis based on recorded data.

[0175] "Notification means" refers to technologies for quickly informing the user of abnormalities or emergencies.
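The "data recording device" and "learning methods" defined above could be combined in many ways. As one simple assumption, the sketch below records user-confirmed analysis results as numeric feature vectors and classifies new observations by the nearest class mean, so predictions change as more records accumulate. The feature meanings, labels, and class name are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class RecordedStateClassifier:
    """Nearest-centroid classifier built from recorded, labelled analyses."""

    def __init__(self) -> None:
        self._sums: Dict[str, List[float]] = {}
        self._counts: Dict[str, int] = defaultdict(int)

    def record(self, features: Sequence[float], label: str) -> None:
        """Store one confirmed analysis result (e.g. a user-verified label)."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            self._sums[label][i] += x
        self._counts[label] += 1

    def predict(self, features: Sequence[float]) -> str:
        """Return the label whose mean feature vector is closest (Euclidean)."""
        if not self._sums:
            raise ValueError("no recorded data yet")

        def distance(label: str) -> float:
            n = self._counts[label]
            centroid = [s / n for s in self._sums[label]]
            return math.dist(centroid, features)

        return min(self._sums, key=distance)
```

Keeping running sums rather than raw records keeps each update cheap; a deployed system would more likely retrain a richer model on the stored data.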

[0176] This invention provides a smoother and more effective two-way communication system with animals by taking into account the emotional state of the human being. The system mainly consists of three components: a server, a terminal, and a user, each playing a specific role.

Server role

[0178] The server is the main module that receives and analyzes animal behavior and vocal data transmitted from the terminal. The server uses a generative AI model to analyze animal behavior and vocalizations, identifying the animal's needs and emotions. Furthermore, the server's emotion engine analyzes human video and audio data received from the terminal, evaluating the user's emotional state. Based on this information, the server generates appropriate feedback for the animal and transmits it via the terminal.

[0179] Terminal role

[0180] The device is equipped with sensors, cameras, and microphones to collect animal movements and sounds, as well as the user's facial expressions and voice. The device transmits this data to a server, allowing the server to understand the state of both the animal and the user. This ensures smoother overall system operation and improves the user experience.

User role

[0182] The user is the one who understands the animal's condition and needs through the device and takes the necessary actions. The user can also input instructions that reflect their own emotions into the device. This input is analyzed by the server and reflected as optimal feedback to the animal. This allows the user to interact with the animal in a way that suits their own circumstances.

[0183] Specific example

[0184] Consider a real-world scenario where a dog in the home wants attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server uses a generative AI model to identify that the dog "wants attention" while recognizing that the user is "tired." As a result, a voice message to reassure the dog is generated and played back to the dog through the device. This system achieves more personalized and natural communication because the emotion engine provides feedback to the animal that takes the user's emotions into account.

[0185] An example of a prompt for the generative AI model is:

[0186] "Analyze the dog's barks to tell us what the pet wants. Also, determine the user's current emotional state and suggest the best feedback for the pet."

[0187] Other prompts of this kind are also possible.
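A prompt like the one in [0186] would normally be sent together with the observations it refers to. The sketch below only assembles the prompt text from hypothetical observation dictionaries; it does not call any model API, and the observation keys are assumptions.

```python
from typing import Dict, List


def build_feedback_prompt(animal_observations: Dict[str, str],
                          user_observations: Dict[str, str]) -> str:
    """Assemble a prompt in the spirit of the example in [0186]."""
    lines: List[str] = [
        "Analyze the dog's barks to tell us what the pet wants.",
        "Also, determine the user's current emotional state and "
        "suggest the best feedback for the pet.",
        "",
        "Animal observations:",
    ]
    # Sorted so the same observations always produce the same prompt text
    lines += [f"- {k}: {v}" for k, v in sorted(animal_observations.items())]
    lines.append("User observations:")
    lines += [f"- {k}: {v}" for k, v in sorted(user_observations.items())]
    return "\n".join(lines)
```

Keeping the instruction fixed and varying only the observation lines makes model outputs easier to compare across sessions.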

[0188] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0189] Step 1:

[0190] The device collects animal movements and sounds. Using a camera and microphone, it captures the dog's movements and barks, and sensors also collect the user's facial expressions and voice. This data is prepared to be sent to a server as it is necessary to identify the animal's needs and emotions.

[0191] Step 2:

[0192] The server receives animal and user data transmitted from the terminal. The video and audio data of the animals obtained as input are analyzed by a generative AI model to identify the characteristics of the animals' movements and determine their emotional states, such as "seeking attention." Similarly, the user's video and audio data are analyzed by an emotion engine to determine the user's emotional state, such as whether they are "tired." The data is processed by an AI algorithm and output as an emotion analysis result in text format.

[0193] Step 3:

[0194] The server generates appropriate feedback for the animal based on the analysis results. For example, it uses a prompt to generate a reassuring message for the dog, and a generative AI model creates an audio message. This message is designed to calm the animal's behavior. The generated audio data is then sent to the device.

[0195] Step 4:

[0196] The device plays audio data received from the server. This provides animals with audio that serves as a guide for their actions, and is expected to have effects such as making the animals feel safe. The device adjusts the volume and timing of the playback according to the animal's response, supporting effective communication.
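The volume and timing adjustment could be implemented as a small feedback rule. The sketch assumes an agitation score between 0 and 1 derived from the camera and microphone (how it is computed is not specified in the text); the thresholds, step size, and delay increments are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    volume: float = 0.5    # 0.0 (mute) .. 1.0 (max)
    delay_s: float = 0.0   # wait before the next playback, in seconds

    def adjust(self, agitation: float, step: float = 0.1) -> None:
        """Update volume/delay from a hypothetical agitation score in [0, 1]."""
        if agitation > 0.7:
            # Animal is getting more excited: play more quietly and wait longer
            self.volume = max(0.1, self.volume - step)
            self.delay_s += 2.0
        elif agitation < 0.3:
            # Little reaction: allow slightly louder, sooner playback
            self.volume = min(1.0, self.volume + step)
            self.delay_s = max(0.0, self.delay_s - 1.0)
        # Otherwise (moderate agitation) keep the current settings
```

The floor of 0.1 keeps the message audible even after repeated reductions.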

[0197] Step 5:

[0198] The user receives feedback from the device and uses that feedback to decide how to interact with the animal. The user inputs instructions into the device that reflect their own emotions and circumstances, and these instructions are then analyzed by the server and reflected as optimal feedback for the animal. This allows the user to take actions appropriate to their situation.

[0199] (Application Example 2)

[0200] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0201] Smooth communication between animals and humans requires each side to understand the other's emotions and desires and to respond appropriately based on that understanding. However, conventional systems analyze only signals from animals to humans and do not take human emotions into consideration, so they fail to achieve sufficient communication. There is therefore an urgent need for interaction that consistently considers the emotions of both parties.

[0202] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0203] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for analyzing human emotional states. This enables two-way interaction that takes into account not only animal emotions and requests, but also human emotional states.

[0204] "Means for acquiring motion" refers to a device or method that detects the movement of an animal's body and collects that information as data.

[0205] "Means for acquiring animal sounds" refers to a device or method for collecting animal sounds as audio data.

[0206] "Means for analyzing and identifying an animal's needs or emotions" refers to a device or method that analyzes acquired data on an animal's movements and vocalizations and uses that data to clarify the animal's desires and emotions.

[0207] "Means for analyzing human emotional states" refers to a device or method that analyzes audio and video data obtained from a user to identify their current emotional state.

[0208] "Means of notifying humans of requests or emotions" refers to a device or method that conveys information about emotions or requests analyzed from animals to a user.

[0209] "Means of converting human input into signals understandable to animals" refers to a device or method that converts commands or instructions from a user into signals in a format that animals can understand and respond to.

[0210] "Means for transmitting signals to animals" refers to a device or method that actually transmits the converted signals to animals to achieve a desired interaction.

[0211] The server receives data transmitted from terminals that capture animal movements and sounds, and uses AI that combines image recognition and speech recognition technologies to analyze it. Specifically, it uses OpenCV and TensorFlow to analyze animal movements and facial expressions, and Google Cloud AI and Microsoft Azure AI to process the audio data. This makes it possible to identify the animal's needs and emotions.

[0212] Furthermore, to analyze human emotional states, the server receives video or audio data transmitted by the user and analyzes it using a generative AI model. This allows the user's emotions to be identified, enabling the output signal to the animal to be appropriately adjusted. The Google Speech-to-Text API could be used as the voice analysis technology.

[0213] The device is equipped with a camera and a microphone to efficiently collect information about animals and users. Using them, the device transmits information such as the animals' movements and sounds, as well as the user's facial expressions and tone of voice, to the server.

[0214] This system allows users to receive notifications about the animal's condition and needs, and take necessary actions. Furthermore, user instructions are analyzed on the server and transmitted in a format recognizable to the animal, enabling natural communication.

[0215] For example, if a pet is lonely when its owner returns home tired, the server can sense the user's fatigue, generate a calming voice signal, and deliver it to the pet via the device. This allows the pet to feel secure and the owner to remain relaxed. Other examples of input prompts for the generative AI model include "Analyze the pet's condition and suggest ways to communicate that will help the owner relax" and "Please tell me how to alleviate the dog's loneliness."

[0216] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0217] Step 1:

[0218] The device uses its camera and microphone to capture animal movements and sounds, as well as the user's facial expressions and voice. It collects animal video and audio data, and user video and audio data as input. This data is then sent directly to the server.

[0219] Step 2:

[0220] The server receives video and audio data of animals sent from the terminal. Using video recognition technologies such as OpenCV and TensorFlow, it analyzes the animals' movements and facial expressions, and uses Google Cloud AI to identify the animals' needs and emotions from the audio data. As output, it generates data on the identified needs and emotions of the animals.

[0221] Step 3:

[0222] The server receives video and audio data from the user transmitted from the terminal. This data is processed by a generative AI model to identify the user's emotional state. The output is data related to the user's emotional state.

[0223] Step 4:

[0224] The server integrates the animal's emotional data obtained in step 2 with the user's emotional data obtained in step 3 to generate appropriate output signals for the animal. Specifically, it creates voice signals and behavioral commands that will reassure the pet. The output of this step is signal data for the animal.

[0225] Step 5:

[0226] The terminal receives signal data for animals transmitted from the server. Based on this, it outputs audio signals and behavioral commands to the animals using speakers, LED displays, etc.

[0227] Step 6:

[0228] Users can monitor the animal's condition and reactions in real time and input additional instructions into the terminal as needed. This process of sending user instructions back to the server and then relaying them to the animal in an optimized form is repeated.

[0229] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0230] The data generation model 58 is so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
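To make the inputs and outputs of the data generation model 58 concrete, the following hypothetical data structure bundles a prompt, the inference data, the inference task, and the requested output format. The class and field names are assumptions for illustration and do not correspond to the API of ChatGPT, Gemini, or any other service.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Modality = Literal["audio", "text", "image"]
Task = Literal["analysis", "classification", "prediction", "summarization"]


@dataclass
class InferenceItem:
    modality: Modality
    data: bytes


@dataclass
class GenerationRequest:
    prompt: str                                   # instruction for the model
    inputs: List[InferenceItem] = field(default_factory=list)
    task: Task = "analysis"
    output_format: Literal["audio", "text"] = "text"

    def summary(self) -> str:
        """Short human-readable description of the request."""
        kinds = ",".join(sorted({item.modality for item in self.inputs})) or "none"
        return f"{self.task} -> {self.output_format} (inputs: {kinds})"
```

Separating the instruction from the inference data matches the description above, where the prompt says what to do and the audio, text, or image data is what it is done to.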

[0231] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0232] [Second Embodiment]

[0233] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0234] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0235] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0236] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0237] The microphone 238 receives instructions from the user 20 in the form of speech. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0238] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0239] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0240] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0241] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0242] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0243] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0244] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0245] The system of this invention is designed to enable two-way communication between animals and humans. The system operates between a server, a terminal, and a user, each playing a specific role.

Server role

[0247] The server plays a central role in receiving and analyzing animal movement and vocalization data transmitted from terminals. The server is equipped with an AI module that integrates video recognition and speech recognition technologies. Using video recognition technology, the server analyzes video footage of animals and extracts characteristic movements and facial expressions. Speech recognition technology converts animal vocalizations into frequency spectra and analyzes them to identify the animals' emotions and requests.

[0248] Based on these analysis results, the server identifies the animal's needs and emotions and notifies the user of the results. If the user enters instructions or messages they want to give to the animal, the server converts the content into signals that the animal can understand. This signal conversion includes customization based on the animal's species and individual needs.

[0249] Terminal role

[0250] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. Analysis results and notifications are displayed to the user visually or audibly on the terminal. It also has the function of receiving signals in response to user input and transmitting them to the animals.

User role

[0252] The user is the one who understands the animal's condition and takes appropriate action through this system. By receiving notifications of the animal's condition via the terminal, the user can understand the animal's needs and emotions. Furthermore, the user can input instructions and questions for the animal into the terminal, and this information is converted into signals appropriate for the animal by the server and transmitted.

[0253] Specific example

[0254] As a concrete example, consider a case where a dog in a household is having difficulty communicating. The user points the device at the dog and records its actions and barks. The server analyzes this data, and if it determines that the dog is feeling anxious, it immediately notifies the user. The user then inputs an appropriate voice command, which the server converts into a short, easy-to-understand voice message and transmits through the device. Through this process, the user can understand what the dog is feeling and take appropriate action to reassure it.

[0255] Thus, the present invention provides an environment in which animals and humans can understand each other, and offers an effective means to realize higher quality pet care and animal research.

[0256] The following describes the processing flow.

[0257] Step 1:

[0258] The device captures animal movements with a camera and records animal sounds with a microphone. This data is collected in real time, compressed, and then transmitted to a server via a communication line.
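The text does not name a compression scheme or wire format. As a stand-in, the sketch compresses each captured chunk with zlib and prefixes a 5-byte header (stream type and payload length) so the server can split and restore it. A production system would more likely use dedicated video and audio codecs; the header layout and constants here are hypothetical.

```python
import struct
import zlib
from typing import Tuple

# Stream type (1 byte) + payload length (4 bytes), network byte order, no padding
HEADER = struct.Struct("!BI")
VIDEO, AUDIO = 1, 2


def pack_chunk(stream_type: int, raw: bytes, level: int = 6) -> bytes:
    """Terminal side: compress one captured chunk and prefix the header."""
    payload = zlib.compress(raw, level)
    return HEADER.pack(stream_type, len(payload)) + payload


def unpack_chunk(message: bytes) -> Tuple[int, bytes]:
    """Server side: read the header and restore the original bytes."""
    stream_type, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    return stream_type, zlib.decompress(payload)
```

The length prefix lets the server separate consecutive chunks arriving on the same connection.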

[0259] Step 2:

[0260] The server receives the video and audio data transmitted from the terminal. Video recognition AI analyzes the animal's movement data and extracts movement patterns and facial expressions. In addition, audio recognition AI analyzes the animal's vocalizations and extracts characteristics of their timbre and tone.

[0261] Step 3:

[0262] The server identifies the animal's needs and emotions based on the analysis results. It integrates video and audio results to make a comprehensive judgment about the animal's state. For example, "tail wagging" + "loud vocalizations" might be identified as "excited."
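The "tail wagging" + "loud vocalizations" example suggests a rule that fires only when cues from both modalities are present. The sketch below encodes that example as a subset test; the other rules and cue names are hypothetical additions for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical rules: every cue in the set must be observed for the state to apply.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"tail wagging", "loud vocalizations"}), "excited"),
    (frozenset({"tail tucked", "whining"}), "anxious"),
    (frozenset({"pacing", "whining"}), "wants to go out"),
]


def judge_state(video_cues: Set[str], audio_cues: Set[str]) -> str:
    """Combine cues from both modalities and return the first matching state."""
    observed = video_cues | audio_cues
    for required, state in RULES:
        if required <= observed:  # all required cues were observed
            return state
    return "undetermined"
```

Because the first matching rule wins, rule order acts as a priority; a learned model could replace the table once enough labelled data is available.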

[0263] Step 4:

[0264] The server generates a text notification describing the identified state and requests of the animal and sends it to the terminal.
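One possible shape for the text notification is a small JSON document. The field names and message template below are assumptions; the text only states that the notification describes the animal's state and requests.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_notification(animal_name: str, state: str, request: str,
                       when: Optional[datetime] = None) -> str:
    """Serialize one notification for the terminal (field names are hypothetical)."""
    when = when or datetime.now(timezone.utc)
    body = {
        "animal": animal_name,
        "state": state,
        "request": request,
        "message": f"{animal_name} seems {state} and may want to {request}.",
        "timestamp": when.isoformat(),
    }
    # ensure_ascii=False keeps non-ASCII pet names readable in the payload
    return json.dumps(body, ensure_ascii=False)
```

The terminal can display the `message` field directly or read it aloud, while the structured fields remain available for filtering and history.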

[0265] Step 5:

[0266] The device displays notifications received from the server to the user. It informs the user of the animal's status through visual displays and audio alerts.

[0267] Step 6:

[0268] The user enters instructions or messages they want to convey to the animal into the device. After the input is complete, the content is sent to the server.

[0269] Step 7:

[0270] The server uses language generation AI to convert user input into signals that animals can understand. These signals can be customized to suit the animal species and individual characteristics.

[0271] Step 8:

[0272] The terminal receives the signals transmitted from the server and conveys them to the animals, using voice output and vibration devices to deliver the instructions.

[0273] This series of steps enables effective communication between the animal and the user, allowing for responses that are tailored to the animal's needs and emotions.

[0274] (Example 1)

[0275] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0276] In animal-human communication, it is extremely difficult to accurately understand an animal's emotions and needs and to convey information between animals and humans in a form each side can understand. Furthermore, conventional technologies struggle to perform appropriate signal conversion according to the animal species and individual, resulting in a lack of means for efficient two-way communication.

[0277] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0278] In this invention, the server includes means for acquiring animal movements, means for acquiring animal vocalizations, and means for customizing signal conversion according to the type and individual animal. This enables accurate analysis of the animal's emotions and needs, and facilitates smooth communication between animals and humans based on this analysis.

[0279] "Means for acquiring animal movements" refers to technologies that detect an animal's posture, movement, and behavior using electronic sensors and cameras, and collect that information as digital data.

[0280] "Means for acquiring animal sounds" refers to technologies that record sounds emitted by animals using acoustic sensors such as microphones and acquire those sounds as digital data.

[0281] An "information processing system" is a system consisting of programs and hardware for analyzing collected digital data, and is equipped with algorithms for determining the emotions and needs of animals.

[0282] "Information transmission means" refers to a system for communicating the analyzed results to humans, and is a technology that provides data to users through screen display, audio output, or other audiovisual means.

[0283] An "information conversion means" is a system for converting instructions provided by a user into signals in a format that animals can understand, and it is equipped with a variety of signal conversion algorithms.

[0284] An "information output means" is a device or system for transmitting the converted signal to an animal, and is a technology for outputting the signal by means such as voice, light, or vibration.

[0285] A "signal conversion customization means" is a method and technology for performing signal conversion that takes individual characteristics into account according to the species and the individual animal.

[0286] Modes for Carrying Out the Invention

[0287] The system of the present invention realizes two-way communication between animals and humans through the cooperation of a server, a terminal, and a user.

[0288] Role of the Server

[0289] The server plays a central role in receiving and analyzing the motion data and vocalization data of animals. Specifically, the server is equipped with an AI module that uses video recognition technology to identify the movements of animals and voice recognition technology to convert the vocalizations of animals into a frequency spectrum. These analysis technologies are realized using open-source AI frameworks or dedicated hardware accelerators. For example, the server analyzes a video of an animal, extracts characteristic facial expressions and movements, and estimates the animal's emotions from them.
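To illustrate what converting a vocalization into a frequency spectrum involves, the following sketch computes a magnitude spectrum with a naive discrete Fourier transform using only the Python standard library and reports the strongest frequency. It is an illustration under stated assumptions, not the server's implementation: a production system would more likely use an FFT routine such as `numpy.fft.rfft`, and the synthetic 100 Hz tone stands in for a real recording.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N//2 (naive O(N^2) loop, for clarity only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples: List[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest bin, ignoring the DC bin 0."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(samples)


# Synthetic stand-in for a vocalization: a 100 Hz tone sampled at 800 Hz, 64 samples.
rate = 800.0
tone = [math.sin(2 * math.pi * 100.0 * t / rate) for t in range(64)]
print(dominant_frequency(tone, rate))  # 100.0
```

The frequency resolution is `sample_rate / N` (12.5 Hz here), so longer analysis windows separate nearby pitches better.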

[0290] Role of the Terminal

[0291] The terminal functions as a device for collecting the movements and vocalizations of animals. Equipped with a camera and a microphone, the terminal captures the movements and sounds of animals in real time and transmits the data to the server. The terminal also notifies the user of the analysis results visually or audibly, and emits the signal when conveying the user's instructions to the animal. For example, the terminal records a dog's tail movements and barking and transmits this information to the server.

[0292] Role of the User

[0293] The user understands the animal's condition through the system and takes appropriate action. Using the terminal, the user can grasp the animal's emotions and needs. The user also gives instructions and messages to the animal; the server converts this information into signals, which are then transmitted to the animal. For example, if the user enters "calm down" into the terminal, the server converts it into a form the animal can understand and transmits it.

[0294] Specific example

[0295] For example, if a dog is showing signs of anxiety at home, the user can use the terminal to capture the dog's movements and barks. The server analyzes the data and, if it determines that the dog is anxious, notifies the user. The user can then input "play some cheerful music" into the terminal, and the server converts this input into an audio signal that the dog can easily understand and plays it through the terminal. In this way, the user can reassure the dog and deepen their mutual understanding.

[0296] An example of a prompt to the generative AI model is: "If a dog is feeling anxious, how can I reassure it?"

[0297] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0298] Step 1:

[0299] The terminal uses a camera and microphone to capture animal movements and sounds in real time. The input is visual and auditory information from the animal. The output is this information converted into digital data and packaged into data packets.

[0300] Step 2:

[0301] The terminal transmits the acquired digital data to the server using wireless communication. The input is the animal's movement data and vocalization data generated in step 1. The output is a notification that the data transmission to the server is complete.

[0302] Step 3:

[0303] The server analyzes the received motion data and vocalization data using the AI module. The input is the digital data sent from the terminal. The output is the identified characteristic movements, expressions, emotions, and demands of the animal. Specifically, the server analyzes the movements with video processing algorithms and the vocalizations with audio processing algorithms.

[0304] Step 4:

[0305] The server identifies the emotions and demands of the animal based on the analysis results and generates a message notifying the user. The input is the analysis result of Step 3. The output is the notification message for the user, which lets the user understand what the animal is asking for.

[0306] Step 5:

[0307] Based on the notification from the server, the user inputs an instruction or message for the animal into the terminal. The input is the content of the user's instruction. The output is instruction data to be conveyed to the animal, which is displayed on the terminal.

[0308] Step 6:

[0309] The server converts the instruction input by the user into a signal that the animal can understand. The input is the user's instruction data from Step 5. The output is the data converted into a signal suited to the species and the individual animal. At this stage, the server converts the instruction content into a form suitable for the animal, such as voice or vibration.

[0310] Step 7:

[0311] The terminal transmits the signal converted by the server to the animal. The input is the converted signal data from Step 6. The output is a signal emitted in a form the animal can perceive, for example a voice command played through the terminal's speaker.
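Steps 1 to 7 form a single request/response loop between terminal, server, and user. The sketch below wires the steps together in Python under explicit assumptions: the `Observation` fields, the threshold rules in `analyze` (a placeholder for the AI module of Step 3), and the signal names in `DOG_SIGNALS` are hypothetical, and the user is modeled as a callback that answers the notification.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Observation:
    """Steps 1-2: features digitized on the terminal (hypothetical fields)."""
    tail_wag_rate: float  # wags per second, from the camera
    bark_rate: float      # barks per minute, from the microphone


def analyze(obs: Observation) -> str:
    """Step 3: placeholder rules standing in for the AI module."""
    if obs.bark_rate > 10 and obs.tail_wag_rate < 0.5:
        return "anxious"
    if obs.tail_wag_rate >= 2.0:
        return "excited"
    return "calm"


def notify(state: str) -> str:
    """Step 4: message shown or read out on the terminal."""
    return f"Your dog appears {state}."


# Step 6: hypothetical instruction-to-signal table for one species.
DOG_SIGNALS: Dict[str, str] = {
    "calm down": "play:soft_voice_cue",
    "play some cheerful music": "play:upbeat_track",
}


def to_signal(instruction: str) -> str:
    # Unknown instructions fall back to a synthesized voice message.
    return DOG_SIGNALS.get(instruction, "play:owner_voice_tts")


def round_trip(obs: Observation, user: Callable[[str], str]) -> Dict[str, str]:
    state = analyze(obs)             # Step 3
    message = notify(state)          # Step 4
    instruction = user(message)      # Step 5
    signal = to_signal(instruction)  # Step 6; Step 7 would play this signal
    return {"state": state, "message": message, "signal": signal}
```

For example, `round_trip(Observation(0.2, 15.0), lambda msg: "play some cheerful music")` yields the state `"anxious"` and the signal `"play:upbeat_track"`.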

[0312] (Application Example 1)

[0313] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0314] When keeping animals, it is difficult for owners to accurately understand their animals' emotions and needs. This can lead to an inability to respond quickly and appropriately to the anxiety and stress the animal is experiencing, potentially compromising animal welfare. In particular, there is a need to monitor pet well-being in real time. A system is needed that solves this problem and enables better relationships between animals and humans.

[0315] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0316] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, means for notifying humans of the animal's requests or emotions based on the analysis results, and means for managing the animal's well-being for humans. This allows humans to understand the animal's state and well-being in real time and take appropriate action.

[0317] "Means for acquiring animal movements" refers to devices that capture the body movements and postures of animals using sensors such as cameras.

[0318] "Means for acquiring animal sounds" refers to a device that collects sounds made by animals using microphones or similar means and transmits that audio data to a server.

[0319] "Means for analyzing actions and vocalizations to identify an animal's needs or emotions" refers to algorithms or software that analyze acquired action and vocal data to determine what an animal wants or what emotions it is experiencing.

[0320] "Means of notifying humans of an animal's needs or emotions" refers to devices or interfaces that visualize or audibly communicate the analyzed state of an animal to humans.

[0321] "Means of converting human input into signals understandable to animals" refers to a system or program that transforms commands or messages input by humans into a form that animals can understand.

[0322] "Means of transmitting signals to animals" refers to devices that transmit converted signals to animals, often in the form of auditory or visual stimuli.

[0323] A "means of managing animal well-being" refers to a system that continuously evaluates changes in an animal's emotional state and needs, determines its level of well-being, and provides information to the owner.
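One way the well-being management means of [0323] might turn a stream of emotion estimates into a level for the owner is an exponential moving average. This sketch assumes the analysis emits a valence score in [-1, 1] per observation and maps the smoothed value onto a 0 to 100 scale; the smoothing factor `alpha` and the alert threshold are illustrative choices, not parameters from the disclosure.

```python
class WellBeingTracker:
    """Hypothetical well-being level computed from smoothed valence scores."""

    def __init__(self, alpha: float = 0.2, alert_below: float = 40.0) -> None:
        self.alpha = alpha              # weight of the newest observation
        self.alert_below = alert_below  # level (0-100) that triggers a notice
        self.ema = 0.0                  # smoothed valence, starts neutral

    def update(self, valence: float) -> float:
        """Add one valence estimate (clamped to [-1, 1]) and return the new level."""
        valence = max(-1.0, min(1.0, valence))
        self.ema = self.alpha * valence + (1.0 - self.alpha) * self.ema
        return self.level()

    def level(self) -> float:
        """Map smoothed valence in [-1, 1] onto a 0-100 well-being level."""
        return 50.0 * (self.ema + 1.0)

    def needs_attention(self) -> bool:
        """True when the level has fallen below the alert threshold."""
        return self.level() < self.alert_below
```

A larger `alpha` reacts faster to a sudden change in mood, at the cost of a noisier level.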

[0324] This system works through the cooperation of a server, a terminal, and a user to facilitate smooth communication between animals and humans.

[0325] The server plays a central role in this system, analyzing behavioral and vocal data acquired from animals. This utilizes an AI module that integrates image and speech recognition technologies. Specifically, TensorFlow and PyTorch are used to train and infer machine learning models for identifying animal behavior and emotional states. The server uses the analysis results to identify the animals' emotions and needs and notifies the user of this information.
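The disclosure does not specify the behavior-recognition models beyond the frameworks named. As a simple illustration of the kind of per-frame motion feature such a model might consume, the sketch below measures motion energy by frame differencing: the fraction of pixels whose brightness changes between consecutive frames. Frame differencing is a swapped-in, hand-crafted technique shown for clarity, not the trained model of the disclosure; the frame format (grayscale lists of lists) and the threshold of 25 are assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel values 0-255


def motion_energy(prev: Frame, curr: Frame, threshold: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than `threshold`."""
    changed = total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_profile(frames: List[Frame], threshold: int = 25) -> List[float]:
    """Motion energy for each consecutive pair of frames."""
    return [motion_energy(a, b, threshold) for a, b in zip(frames, frames[1:])]
```

A burst of high values in the profile marks frames worth passing to a heavier pose or behavior model, which keeps the expensive inference off static footage.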

[0326] The terminal is an interface device for acquiring animal movements and vocalizations. It is equipped with a camera and microphone and transmits information to the server in real time. The terminal receives the analysis results and presents them to the user visually or audibly. The terminal also receives instructions that the user wants to send to the animal, forwards them to the server, and outputs the converted signal in a form the animal can understand.

[0327] The user's role is to understand the pet's emotions and needs and respond appropriately. The user receives notifications about the animal's status through the terminal and can input messages and instructions for the animal. The server converts these instructions into signals the animal can easily understand, and they are transmitted via the terminal. This allows the user to manage their pet's well-being and provide better pet care.

[0328] As a concrete example, consider a scenario in which a pet care robot detects unusual barking from a dog while the user is away. The server determines that "the dog is bored" and notifies the user of an appropriate action. When the user inputs the command "give the dog a toy" from their smartphone, the robot gives the dog a toy, relieving its boredom. An example prompt message is "Please suggest ways to reduce the dog's anxiety."

[0329] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0330] Step 1:

[0331] The device captures animal movements with a camera and records animal sounds with a microphone. The input consists of video and audio data. This data is transmitted to the server in real time. The output is raw data for analysis by the server.

[0332] Step 2:

[0333] The server analyzes the received motion data using video recognition technology. Specifically, it divides the acquired video into frames and extracts characteristic movements and postures of the animal from each frame. In this process, a model built with a framework such as TensorFlow (for example, a generative AI model) is used to determine the animal's motion patterns. The output is the analysis result regarding the animal's movements.

[0334] Step 3:

[0335] The server processes the received audio data using speech recognition technology. The audio data is converted into a frequency spectrum, and features related to the animal's emotions and requests are extracted from it. This process uses, for example, a generative AI model built with PyTorch. The output is the result of analyzing the animal's vocalizations.

[0336] Step 4:

[0337] The server integrates motion analysis results and voice analysis results to identify the animal's emotions and needs. This uses data fusion technology to accurately predict the animal's emotional state from each analysis result. The output is an evaluation result regarding the animal's state.

[0338] Step 5:

[0339] The server notifies the terminal of the animal's emotions and needs extracted through analysis. The terminal converts this information into a visual or audio format for display on the user interface. The output is visualized or audio information for the user to understand.

[0340] Step 6:

[0341] The user checks the animal's condition through the terminal and inputs appropriate instructions into it. These instructions are intended to elicit a response or behavior from the animal that suits its condition. The input consists of the user's instructions.

[0342] Step 7:

[0343] The server receives user instructions and converts them into signals that animals can understand. This conversion process is customized according to the animal species and individual differences. The output is a signal for communication with the animal.

[0344] Step 8:

[0345] The device transmits the converted signal to the animal. This allows the animal to understand instructions from humans and take corresponding actions. The output is the signal the animal hears.
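Step 4 above combines the motion and voice results by data fusion. A common and simple form is weighted late fusion, sketched below under the assumption that each analyzer outputs a probability per emotion label; the labels and the default equal weighting are hypothetical, and the disclosure does not state which fusion method is used.

```python
from typing import Dict, Tuple


def fuse(motion_probs: Dict[str, float], voice_probs: Dict[str, float],
         motion_weight: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Weighted late fusion of two per-label probability maps.

    Returns the most likely label and the renormalized fused distribution.
    """
    if not 0.0 <= motion_weight <= 1.0:
        raise ValueError("motion_weight must be in [0, 1]")
    labels = set(motion_probs) | set(voice_probs)
    # A label missing from one analyzer counts as probability 0 for that analyzer.
    fused = {
        label: motion_weight * motion_probs.get(label, 0.0)
        + (1.0 - motion_weight) * voice_probs.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total > 0:
        fused = {label: p / total for label, p in fused.items()}
    best = max(fused, key=fused.get)
    return best, fused
```

Raising `motion_weight` trusts the video analysis more, which may suit quiet animals whose vocal cues are sparse.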

[0346] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0347] This invention provides smoother and more effective interaction by incorporating an emotion engine that recognizes user emotions into a system that enables two-way communication between animals and humans. The system mainly consists of three components: a server, a terminal, and a user, with each component playing a specific role.

[0348] Server Role

[0349] The server functions as the main module, receiving and analyzing animal behavior and vocalization data transmitted from the terminal. AI combining video and speech recognition technologies extracts characteristics of the animal's movements, facial expressions, and vocalizations to identify the animal's needs and emotions. Furthermore, a newly integrated emotion engine analyzes the user's video or audio data received from the terminal to assess the user's current emotional state. Based on these results, the server adjusts the output signal to the animal.

[0350] Terminal role

[0351] The terminal is a device for collecting the movements and sounds of both the animal and the user. The terminal captures images and audio of the animal while simultaneously capturing the user's facial expressions and voice pitch, and sends this data to the server. This allows the server to analyze the data, including the user's emotional state.

[0352] User roles

[0353] The user understands the animal's condition and needs through this system and takes the necessary actions. The user receives notifications about the animal's condition and emotions via the terminal, and can also input instructions for the animal that reflect their own emotions. The server's emotion engine analyzes the commands entered by the user, and they are transmitted to the animal in the optimal form.

[0354] Specific example

[0355] Consider a scenario in a home where a dog is seeking the user's attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server analyzes the dog's data and determines that the dog "wants attention," while simultaneously analyzing the user's video and recognizing that the user is "tired." Based on these two analysis results, the server transmits appropriate signals and sounds to the dog through the device to help it feel safe without becoming overly agitated. As a result, the dog feels somewhat safer, allowing the user to remain relaxed as well.

[0356] This system utilizes an emotion engine to provide feedback to animals that takes the user's emotions into account, resulting in more natural and personalized communication. This makes it possible to further deepen the relationship between animals and humans.

[0357] The following describes the processing flow.

[0358] Step 1:

[0359] The device captures animal movements with a camera and records animal sounds with a microphone. Furthermore, it collects the user's facial expressions and voice, and simultaneously prepares this data for transmission to a server.

[0360] Step 2:

[0361] The server receives motion, vocalization, user facial expressions, and voice data transmitted from the terminal. The animal's motion and vocalizations are analyzed using image recognition AI and voice recognition AI to identify the animal's needs and emotions.

[0362] Step 3:

[0363] Using an emotion engine, the server analyzes the received facial-expression and voice data of the user to evaluate the user's emotional state. For example, it might determine whether the user is "tired" or "relaxed."

[0364] Step 4:

[0365] The server generates the optimal response based on the animal's needs and the user's emotional state. It converts instructions and messages for the animal into signals and voice output that take the user's emotions into account.

[0366] Step 5:

[0367] The device receives the converted signal sent from the server and communicates instructions to the animal through appropriate voice output or vibration. This allows the animal to understand the user's intentions.

[0368] Step 6:

[0369] The user observes the animal's responses through a terminal and enters additional instructions as needed. The server then analyzes this input again and continues to generate adaptive responses.

[0370] This process enables smooth, emotion-conscious communication between animals and users.
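Step 4 of this flow adjusts the response to the animal according to the user's emotional state. A minimal rule-based sketch of that adjustment follows; the message templates, the emotion labels ("tired", "stressed"), and the volume and tempo values are all hypothetical, and in the described system these choices would come from the emotion engine and a generative AI model rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass
class OutputPlan:
    message: str
    volume: float  # 0.0 (silent) to 1.0 (maximum)
    tempo: float   # playback speed multiplier


# Hypothetical message templates keyed by the animal's identified need.
MESSAGES = {
    "wants_attention": "Good dog, I'm right here.",
    "anxious": "It's okay, you're safe.",
}


def plan_output(animal_need: str, user_state: str) -> OutputPlan:
    """Choose the output for the animal from its need and the user's emotion."""
    message = MESSAGES.get(animal_need, "Good dog.")
    volume, tempo = 0.6, 1.0  # neutral defaults
    if user_state == "tired":
        # Quieter, slower output keeps the exchange calm for a tired user.
        volume, tempo = 0.4, 0.85
    elif user_state == "stressed":
        volume, tempo = 0.5, 0.9
    if animal_need == "anxious":
        # Never exceed a soft volume for an anxious animal.
        volume = min(volume, 0.4)
    return OutputPlan(message, volume, tempo)
```

Separating what is said (the message) from how it is delivered (volume, tempo) lets the user's emotional state shape the delivery without changing the content meant for the animal.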

[0371] (Example 2)

[0372] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0373] Modern pet communication systems have limitations in their ability to recognize animal behavior and needs, making it difficult to achieve smooth communication between humans and animals. Furthermore, they often fail to provide feedback that takes into account human emotional states, potentially leading to insufficient interaction with animals. Therefore, a system is needed that enables optimal communication for both animals and humans, fostering deeper bonds.

[0374] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0375] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for acquiring human emotional states. This makes it possible to comprehensively analyze the movements and emotional states of humans and animals and generate feedback that facilitates mutual understanding.

[0376] "Means of acquiring motion" refers to technologies for detecting the body movements of animals and collecting that data.

[0377] "Means of acquiring animal sounds" refers to technologies for recording or analyzing sounds emitted by animals.

[0378] "Means for acquiring human emotional states" refers to technologies that detect the characteristics of human facial expressions and voices and identify those emotional states.

[0379] "Means of analysis" refers to techniques for identifying the intentions and emotions behind animal behavior based on collected animal actions and vocalizations.

[0380] "Means of notification" refers to technologies for conveying analyzed information to humans.

[0381] "Means of conversion" refers to technologies for converting instructions or inputs from humans into a format that animals can understand.

[0382] "Means of transmission" refers to technologies used to transmit signals and information to animals.

[0383] A "data recording device" refers to a device that stores the results of analyses of animal movements and vocalizations, and uses them to improve the accuracy of future analyses.

[0384] "Learning methods" refer to machine learning techniques used to improve the accuracy of analysis based on recorded data.

[0385] "Notification means" refers to technologies for promptly informing the user of abnormalities or emergencies.
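The data recording device and learning methods of [0383] and [0384], and the abnormality notification of [0385], can be illustrated together. The sketch below stores confirmed analyses and classifies new observations by nearest class centroid, so that classification can improve as records accumulate, and flags an abnormal value with a z-score test against recent history. Both techniques are stand-ins chosen for brevity, not the learning method of the disclosure; the feature layout (barks per minute, tail wags per second) and the threshold of 3 standard deviations are assumptions.

```python
import math
import statistics
from typing import Dict, List


class AnalysisLog:
    """Hypothetical store of past analyses (feature vector plus confirmed label)."""

    def __init__(self) -> None:
        self._sums: Dict[str, List[float]] = {}
        self._counts: Dict[str, int] = {}

    def record(self, features: List[float], label: str) -> None:
        """Add one confirmed analysis result to the running per-label sums."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(features)
            self._counts[label] = 0
        self._sums[label] = [s + f for s, f in zip(self._sums[label], features)]
        self._counts[label] += 1

    def centroid(self, label: str) -> List[float]:
        n = self._counts[label]
        return [s / n for s in self._sums[label]]

    def classify(self, features: List[float]) -> str:
        """Nearest-centroid classification over all recorded labels."""
        if not self._sums:
            raise ValueError("no recorded analyses yet")
        return min(self._sums, key=lambda lbl: math.dist(features, self.centroid(lbl)))


def is_abnormal(history: List[float], current: float,
                z_threshold: float = 3.0, min_history: int = 5) -> bool:
    """Flag `current` if it lies more than `z_threshold` std devs from the history mean."""
    if len(history) < min_history:
        return False  # too little data to judge
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        return current != mean
    return abs(current - mean) / sd > z_threshold
```

An abnormal result would then be routed to the notification means instead of the regular analysis flow.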

[0386] This invention provides a smoother and more effective two-way communication system with animals by taking into account the emotional state of the human being. The system mainly consists of three components: a server, a terminal, and a user, each playing a specific role.

[0387] Server Role

[0388] The server is the main module that receives and analyzes animal behavior and vocal data transmitted from the terminal. The server uses a generative AI model to analyze animal behavior and vocalizations, identifying the animal's needs and emotions. Furthermore, the server's emotion engine analyzes human video and audio data received from the terminal, evaluating the user's emotional state. Based on this information, the server generates appropriate feedback for the animal and transmits it via the terminal.

[0389] Terminal role

[0390] The device is equipped with sensors, cameras, and microphones to collect animal movements and sounds, as well as the user's facial expressions and voice. The device transmits this data to a server, allowing the server to understand the state of both the animal and the user. This ensures smoother overall system operation and improves the user experience.

[0391] User roles

[0392] The user understands the animal's condition and needs through the terminal and takes the necessary actions. The user can also input instructions that reflect their own emotions into the terminal; the server analyzes this input and reflects it as optimal feedback to the animal. This allows the user to interact with the animal in a way that suits their own circumstances.

[0393] Specific example

[0394] Consider a real-world scenario where a dog in the home wants attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server uses a generative AI model to identify that the dog "wants attention" while recognizing that the user is "tired." As a result, a voice message to reassure the dog is generated and played back to the dog through the device. This system achieves more personalized and natural communication because the emotion engine provides feedback to the animal that takes the user's emotions into account.

[0395] Examples of prompts when using a generative AI model include:

[0396] "Analyze the dog's barks to tell us what the pet wants. Also, determine the user's current emotional state and suggest the best feedback for the pet."

[0397] This is one possible example.
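A prompt like the one in [0396] is typically sent together with the analysis results it refers to. The sketch below assembles such a request as text, appending the animal analysis and the user's emotional state as JSON; the field names and the layout are hypothetical, and no particular generative AI service or API is assumed.

```python
import json
from typing import Any, Dict, Optional


def build_prompt(instruction: str, animal_analysis: Dict[str, Any],
                 user_emotion: Optional[str] = None) -> str:
    """Combine an instruction with analysis results given as text inference data."""
    context: Dict[str, Any] = {"animal": animal_analysis}
    if user_emotion is not None:
        context["user_emotion"] = user_emotion
    return (
        instruction
        + "\n\nAnalysis results (JSON):\n"
        + json.dumps(context, ensure_ascii=False, sort_keys=True)
    )
```

Serializing the results as JSON keeps the instruction readable while giving the model structured values it can refer to.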

[0398] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0399] Step 1:

[0400] The terminal collects animal movements and sounds. Using a camera and microphone, it captures the dog's movements and barks, and its sensors also collect the user's facial expressions and voice. This data, which is needed to identify the animal's needs and emotions, is prepared for transmission to the server.

[0401] Step 2:

[0402] The server receives animal and user data transmitted from the terminal. The video and audio data of the animals obtained as input are analyzed by a generative AI model to identify the characteristics of the animals' movements and determine their emotional states, such as "seeking attention." Similarly, the user's video and audio data are analyzed by an emotion engine to determine the user's emotional state, such as whether they are "tired." The data is processed by an AI algorithm and output as an emotion analysis result in text format.

[0403] Step 3:

[0404] The server generates appropriate feedback for the animal based on the analysis results. For example, it uses a prompt to generate a reassuring message for the dog, and a generative AI model creates an audio message. This message is designed to calm the animal's behavior. The generated audio data is then sent to the device.

[0405] Step 4:

[0406] The device plays audio data received from the server. This provides animals with audio that serves as a guide for their actions, and is expected to have effects such as making the animals feel safe. The device adjusts the volume and timing of the playback according to the animal's response, supporting effective communication.

[0407] Step 5:

[0408] The user receives feedback from the device and uses that feedback to decide how to interact with the animal. The user inputs instructions into the device that reflect their own emotions and circumstances, and these instructions are then analyzed by the server and reflected as optimal feedback for the animal. This allows the user to take actions appropriate to their situation.
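Step 4 states that the terminal adjusts playback volume and timing according to the animal's response. A minimal feedback rule is sketched below, assuming the analysis yields an agitation score in [0, 1] before and after each playback; the step size, tolerance, clamping limits, and the 1.5x back-off of the delay are illustrative values, not part of the disclosure.

```python
from typing import Tuple


def adjust_playback(volume: float, delay_s: float,
                    agitation_before: float, agitation_after: float,
                    step: float = 0.1, tolerance: float = 0.05) -> Tuple[float, float]:
    """Return the next (volume, delay in seconds) from the change in agitation.

    Agitation is a hypothetical score in [0, 1] from the motion/vocal analysis.
    """
    if agitation_after > agitation_before + tolerance:
        # Playback seems to agitate the animal: play quieter and wait longer.
        volume -= step
        delay_s *= 1.5
    elif abs(agitation_after - agitation_before) <= tolerance:
        # No visible effect: slightly louder so the cue is noticed.
        volume += step / 2
    # If agitation dropped, keep the current settings.
    volume = min(1.0, max(0.1, volume))
    delay_s = min(30.0, max(1.0, delay_s))
    return volume, delay_s
```

Backing off when agitation rises and nudging up only when there is no effect keeps the rule conservative toward the animal.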

[0409] (Application Example 2)

[0410] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0411] To facilitate smooth communication between animals and humans, each side needs to understand the other's emotions and desires and respond appropriately based on that understanding. However, conventional systems only analyze signals from animals to humans and do not take human emotions into consideration, so they fail to achieve sufficient communication. A system that takes the emotions of both parties into account and achieves consistent interaction is therefore urgently needed.

[0412] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0413] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for analyzing human emotional states. This enables two-way interaction that takes into account not only animal emotions and requests, but also human emotional states.

[0414] "Means for acquiring motion" refers to a device or method that detects the movement of an animal's body and collects that information as data.

[0415] "Means for acquiring animal sounds" refers to a device or method for collecting animal sounds as audio data.

[0416] "Means for analyzing and identifying an animal's needs or emotions" refers to a device or method that analyzes acquired data on an animal's movements and vocalizations and uses that data to clarify the animal's desires and emotions.

[0417] "Means for analyzing human emotional states" refers to a device or method that analyzes audio and video data obtained from a user to identify their current emotional state.

[0418] "Means of notifying humans of requests or emotions" refers to a device or method that conveys information about emotions or requests analyzed from animals to a user.

[0419] "Means of converting human input into signals understandable to animals" refers to a device or method that converts commands or instructions from a user into signals in a format that animals can understand and respond to.

[0420] "Means for transmitting signals to animals" refers to a device or method that actually transmits the converted signals to animals to achieve a desired interaction.

[0421] The server receives data transmitted from terminals that capture animal movements and sounds, and uses AI that combines image recognition and speech recognition technologies to analyze it. Specifically, it uses OpenCV and TensorFlow to analyze animal movements and facial expressions, and Google Cloud AI and Microsoft Azure AI to process the audio data. This makes it possible to identify the animal's needs and emotions.

[0422] Furthermore, to analyze human emotional states, the server receives video or audio data transmitted by the user and analyzes it using a generative AI model. This allows the user's emotions to be identified, enabling the output signal to the animal to be appropriately adjusted. The Google Speech-to-Text API could be used as the voice analysis technology.

[0423] The terminal is equipped with a camera and microphone to efficiently collect information about animals and users. Using these, it transmits information such as the animals' movements and sounds, as well as the user's facial expressions and tone of voice, to the server.

[0424] This system allows users to receive notifications about the animal's condition and needs, and take necessary actions. Furthermore, user instructions are analyzed on the server and transmitted in a format recognizable to the animal, enabling natural communication.

[0425] For example, if a pet is lonely when its owner returns home tired, the server can sense the user's fatigue, generate a calming voice signal, and deliver it to the pet via the terminal. This allows the pet to feel secure and the owner to remain relaxed. Other examples of input prompts for the generative AI model include "Analyze the pet's condition and suggest ways to communicate that will help the owner relax" and "Please tell me how to alleviate the dog's loneliness."

[0426] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0427] Step 1:

[0428] The device uses its camera and microphone to capture animal movements and sounds, as well as the user's facial expressions and voice. It collects animal video and audio data, and user video and audio data as input. This data is then sent directly to the server.

[0429] Step 2:

[0430] The server receives video and audio data of animals sent from the terminal. Using video recognition technologies such as OpenCV and TensorFlow, it analyzes the animals' movements and facial expressions, and uses Google Cloud AI to identify the animals' needs and emotions from the audio data. As output, it generates data on the identified needs and emotions of the animals.

[0431] Step 3:

[0432] The server receives video and audio data from the user transmitted from the terminal. This data is processed by a generative AI model to identify the user's emotional state. The output is data related to the user's emotional state.

[0433] Step 4:

[0434] The server integrates the animal's emotional data obtained in step 2 with the user's emotional data obtained in step 3 to generate appropriate output signals for the animal. Specifically, it creates voice signals and behavioral commands that will reassure the pet. The output of this step is signal data for the animal.

[0435] Step 5:

[0436] The terminal receives signal data for animals transmitted from the server. Based on this, it outputs audio signals and behavioral commands to the animals using speakers, LED displays, etc.

[0437] Step 6:

[0438] Users can monitor the animal's condition and reactions in real time and input additional instructions into the terminal as needed. This process of sending user instructions back to the server and then relaying them to the animal in an optimized form is repeated.

[0439] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0440] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0441] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0442] [Third Embodiment]

[0443] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0444] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0445] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0446] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0447] The microphone 238 receives instructions from the user 20 in the form of speech. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0448] Camera 42 is a small digital camera equipped with an optical system including a lens, an aperture, and a shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. It captures images of the area around the user 20 (for example, an imaging range corresponding to the field of view of a person with typical healthy vision).

[0449] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0450] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0451] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0452] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0453] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0454] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0455] The system of this invention is designed to enable two-way communication between animals and humans. The system involves a server, a terminal, and a user, each playing a specific role.

Server role

[0457] The server plays a central role in receiving and analyzing animal movement and vocalization data transmitted from terminals. The server is equipped with an AI module that integrates video recognition and speech recognition technologies. Using video recognition technology, the server analyzes video footage of animals and extracts characteristic movements and facial expressions. Speech recognition technology converts animal vocalizations into frequency spectra and analyzes them to identify the animals' emotions and requests.

[0458] Based on these analysis results, the server identifies the animal's needs and emotions and notifies the user of the results. If the user enters instructions or messages they want to give to the animal, the server converts the content into signals that the animal can understand. This signal conversion includes customization based on the animal's species and individual needs.

[0459] Terminal role

[0460] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to the server. The terminal presents analysis results and notifications to the user visually or audibly. It also receives the signals generated from user input and transmits them to the animal.

User role

[0462] The user is the one who understands the animal's condition and takes appropriate action through this system. By receiving notifications of the animal's condition via the terminal, the user can understand the animal's needs and emotions. Furthermore, the user can input instructions and questions for the animal into the terminal, and this information is converted into signals appropriate for the animal by the server and transmitted.

[0463] Specific example

[0464] As a concrete example, consider a case where a dog in a household is having difficulty communicating. The user points the device at the dog and records its actions and barks. The server analyzes this data, and if it determines that the dog is feeling anxious, it immediately notifies the user. The user then inputs an appropriate voice command, which the server converts into a short, easy-to-understand voice message and transmits through the device. Through this process, the user can understand what the dog is feeling and take appropriate action to reassure it.

[0465] Thus, the present invention provides an environment in which animals and humans can understand each other, and offers an effective means to realize higher quality pet care and animal research.

[0466] The following describes the processing flow.

[0467] Step 1:

[0468] The device captures animal movements with a camera and records animal sounds with a microphone. This data is collected in real time, compressed, and then transmitted to a server via a communication line.
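The compression step above can be illustrated with a minimal Python sketch. It assumes a hypothetical payload layout (a length-prefixed JSON header followed by the raw frame and audio bytes) and uses zlib from the standard library; the disclosure does not specify a codec or format, and a real terminal would more likely use dedicated video and audio codecs.

```python
import json
import zlib


def build_payload(video_frames: list, audio_pcm: bytes, timestamp_ms: int) -> bytes:
    """Bundle one capture window into a single compressed payload (hypothetical format)."""
    header = json.dumps({
        "timestamp_ms": timestamp_ms,
        "frame_sizes": [len(f) for f in video_frames],
        "audio_size": len(audio_pcm),
    }).encode("utf-8")
    # A 4-byte length prefix tells the receiver where the header ends and raw media begins.
    body = len(header).to_bytes(4, "big") + header + b"".join(video_frames) + audio_pcm
    return zlib.compress(body, level=6)


def parse_payload(payload: bytes):
    """Server-side inverse of build_payload: returns (header, frames, audio)."""
    body = zlib.decompress(payload)
    header_len = int.from_bytes(body[:4], "big")
    header = json.loads(body[4:4 + header_len])
    offset = 4 + header_len
    frames = []
    for size in header["frame_sizes"]:
        frames.append(body[offset:offset + size])
        offset += size
    audio = body[offset:offset + header["audio_size"]]
    return header, frames, audio


frames = [b"\x00" * 100, b"\x01" * 50]   # stand-ins for encoded video frames
audio = b"\x10\x20" * 200                # stand-in for PCM audio
payload = build_payload(frames, audio, timestamp_ms=1234)
header, frames_out, audio_out = parse_payload(payload)
```

Keeping per-frame sizes in the header lets the server split the decompressed body back into frames without any delimiter scanning.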

[0469] Step 2:

[0470] The server receives the video and audio data transmitted from the terminal. Video recognition AI analyzes the animal's movement data and extracts movement patterns and facial expressions. In addition, audio recognition AI analyzes the animal's vocalizations and extracts characteristics such as timbre and tone.

[0471] Step 3:

[0472] The server identifies the animal's needs and emotions based on the analysis results. It integrates video and audio results to make a comprehensive judgment about the animal's state. For example, "tail wagging" + "loud vocalizations" might be identified as "excited."
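A minimal sketch of this rule-based integration, assuming the video and audio analyses each emit a set of discrete cue labels. Only the rule combining "tail wagging" and "loud vocalizations" into "excited" comes from the text; the cue names and the other rules are hypothetical placeholders.

```python
# Each rule: (cues that must all be observed, resulting state).
# Only the first rule comes from the text; the others are hypothetical placeholders.
RULES = [
    ({"tail_wagging", "loud_vocalization"}, "excited"),
    ({"tail_tucked", "whining"}, "anxious"),
    ({"lying_down", "quiet"}, "relaxed"),
]


def integrate(video_cues: set, audio_cues: set) -> str:
    """Combine cues from video and audio analysis into one overall state label."""
    observed = video_cues | audio_cues
    # Collect every rule whose required cues are all present.
    matches = [(len(required), state) for required, state in RULES if required <= observed]
    if not matches:
        return "unknown"
    # Prefer the most specific rule (the one requiring the most cues).
    return max(matches)[1]


print(integrate({"tail_wagging"}, {"loud_vocalization"}))  # excited
```

A rule table like this is easy to audit, but in practice it would likely serve as a fallback alongside learned models rather than replace them.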

[0473] Step 4:

[0474] The server generates a text notification detailing the status and requests of the identified animal. This notification is then sent from the server to the terminal.

[0475] Step 5:

[0476] The device displays notifications received from the server to the user. It informs the user of the animal's status through visual displays and audio alerts.

[0477] Step 6:

[0478] The user enters instructions or messages they want to convey to the animal into the device. After the input is complete, the content is sent to the server.

[0479] Step 7:

[0480] The server uses language generation AI to convert user input into signals that animals can understand. These signals can be customized to suit the animal species and individual characteristics.
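The following sketch shows one way species-level and individual-level customization could be layered onto the conversion. It is a simplification under stated assumptions: the profile table, the `SignalSpec` fields, and the truncate-to-a-short-cue rule are hypothetical stand-ins for the language generation AI described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSpec:
    channel: str       # "voice", "tone", or "vibration"
    content: str       # short cue to be played or encoded
    duration_s: float  # how long the output device should emit the signal


# Hypothetical per-species defaults; real values would come from behavioural research.
SPECIES_PROFILES = {
    "dog": {"channel": "voice", "max_words": 2, "duration_s": 1.0},
    "cat": {"channel": "tone", "max_words": 1, "duration_s": 0.5},
}
FALLBACK = {"channel": "vibration", "max_words": 1, "duration_s": 0.5}


def convert_instruction(text: str, species: str,
                        individual_overrides: Optional[dict] = None) -> SignalSpec:
    """Turn a user's instruction into a species- and individual-specific signal spec."""
    profile = dict(SPECIES_PROFILES.get(species, FALLBACK))
    # Per-animal settings (e.g. a dog that responds better to longer cues) win over defaults.
    profile.update(individual_overrides or {})
    # Stand-in for the language generation AI: keep only a short cue.
    words = text.lower().strip(".!? ").split()
    cue = " ".join(words[:profile["max_words"]])
    return SignalSpec(profile["channel"], cue, profile["duration_s"])


print(convert_instruction("Calm down.", "dog"))
# SignalSpec(channel='voice', content='calm down', duration_s=1.0)
```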

[0481] Step 8:

[0482] The terminal receives the signals transmitted from the server and outputs them to the animal, using voice output and vibration devices to convey the instructions.

[0483] This series of steps enables effective communication between the animal and the user, allowing for responses that are tailored to the animal's needs and emotions.

[0484] (Example 1)

[0485] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0486] In animal-human communication, it is extremely difficult to accurately understand an animal's emotions and needs and to transmit information to the animal in a form that the animal can understand. Furthermore, conventional technologies struggle to perform signal conversion appropriate to the animal species and individual, so there is no means for efficient two-way communication.

[0487] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0488] In this invention, the server includes means for acquiring animal movements, means for acquiring animal vocalizations, and means for customizing signal conversion according to the animal's species and individual characteristics. This enables accurate analysis of the animal's emotions and needs and, based on that analysis, facilitates smooth communication between animals and humans.

[0489] "Means for acquiring animal movements" refers to technologies that detect an animal's posture, movement, and behavior using electronic sensors and cameras, and collect that information as digital data.

[0490] "Means for acquiring animal sounds" refers to technologies that record sounds emitted by animals using acoustic sensors such as microphones and acquire those sounds as digital data.

[0491] An "information processing system" is a system consisting of programs and hardware for analyzing collected digital data, and is equipped with algorithms for determining the emotions and needs of animals.

[0492] "Information transmission means" refers to a system for communicating the analyzed results to humans, and is a technology that provides data to users through screen display, audio output, or other audiovisual means.

[0493] An "information conversion means" is a system for converting instructions provided by a user into signals in a format that animals can understand, and it is equipped with a variety of signal conversion algorithms.

[0494] "Information output means" refers to devices or systems for transmitting converted signals to animals, and is a technology that outputs signals by means such as sound, light, or vibration.

[0495] "Signal conversion customization means" refers to methods and techniques for performing signal conversion that takes into account the characteristics of both the animal's species and the individual animal.

[0496] Modes for carrying out the invention

[0497] The system of this invention enables two-way communication between animals and humans through the cooperation of a server, a terminal, and a user.

Server role

[0499] The server plays a central role in receiving and analyzing animal movement and vocalization data. Specifically, the server is equipped with an AI module that uses video recognition technology to identify animal movements and speech recognition technology to convert animal vocalizations into frequency spectra. These analysis techniques are implemented using open-source AI frameworks and dedicated hardware accelerators. For example, the server analyzes animal videos, extracts specific facial expressions and movements, and then estimates the animal's emotions from them.

[0500] Terminal role

[0501] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. The terminal has the function of notifying the user of the analysis results visually or audibly, and also transmits signals to convey user instructions to the animal. For example, the terminal records the movement of a dog's tail and its barks, and smoothly transmits that information to the server.

User role

[0503] The user is the one who understands the animal's condition through the system and takes appropriate action. The user can use a terminal to grasp the animal's emotions and needs. The user also provides instructions and messages to the animal, and this information is converted into signals by the server and transmitted to the animal. For example, if the user enters "calm down" into the terminal, the server converts it into a format that the animal can understand and transmits it.

[0504] Specific example

[0505] For example, if a dog is showing signs of anxiety at home, the user can use a device to capture the dog's movements and barks. The server analyzes the data and, if it determines that the dog is feeling anxious, notifies the user. The user can then input "play some cheerful music" into the device, and the server converts this input into an audio signal that the dog can easily understand and plays it through the device. Through this process, the user can reassure the dog and deepen their mutual understanding.

[0506] An example of a prompt to the generative AI model is: "If a dog is feeling anxious, how can I reassure it?"

[0507] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0508] Step 1:

[0509] The device uses a camera and microphone to capture animal movements and sounds in real time. The input consists of visual and auditory information from the animal. The output is digital data generated from this information and packaged into data packets.
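As an illustration of packaging captured data into packets, the sketch below splits a byte stream into fixed-size packets with a small header. The header layout (sequence number, stream type, payload length) and the 1,200-byte packet size are assumptions, not values given in the disclosure.

```python
import struct

# Hypothetical header: 4-byte sequence number, 1-byte stream type, 2-byte payload length.
HEADER = struct.Struct(">IBH")
STREAM_VIDEO, STREAM_AUDIO = 0, 1


def packetize(data: bytes, stream_type: int, start_seq: int = 0, mtu: int = 1200) -> list:
    """Split a captured byte stream into packets no larger than `mtu` bytes."""
    max_payload = mtu - HEADER.size
    packets = []
    for i, offset in enumerate(range(0, len(data), max_payload)):
        chunk = data[offset:offset + max_payload]
        packets.append(HEADER.pack(start_seq + i, stream_type, len(chunk)) + chunk)
    return packets


def reassemble(packets: list) -> bytes:
    """Server side: order packets by sequence number and concatenate their payloads."""
    parsed = []
    for packet in packets:
        seq, _stream_type, length = HEADER.unpack_from(packet)
        parsed.append((seq, packet[HEADER.size:HEADER.size + length]))
    parsed.sort(key=lambda item: item[0])
    return b"".join(chunk for _, chunk in parsed)


audio = bytes(range(256)) * 10            # 2,560 bytes of stand-in audio
packets = packetize(audio, STREAM_AUDIO)  # 3 packets: 1193 + 1193 + 174 payload bytes
```

Sequence numbers let the server restore order when packets arrive out of order over the wireless link used in Step 2.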

[0510] Step 2:

[0511] The terminal transmits the acquired digital data to the server using wireless communication. The input is the animal's movement data and vocalization data generated in step 1. The output is a notification that the data transmission to the server is complete.

[0512] Step 3:

[0513] The server uses an AI module to analyze the received motion data and vocal data. The input is digital data sent from the terminal. The output is the result of identifying the animal's characteristic movements, facial expressions, emotions, and requests. In this process, the server performs specific actions such as analyzing movements with a video processing algorithm and analyzing vocalizations with an audio processing algorithm.

[0514] Step 4:

[0515] The server identifies the animal's emotions and needs based on the analysis results and generates a message to notify the user. The input is the analysis results from step 3. The output is the notification message for the user. This helps the user understand what the animal wants.

[0516] Step 5:

[0517] The user inputs instructions and messages for the animal into the terminal based on notifications from the server. The input is the user's instructions. The output is displayed on the terminal as specific instruction data to be conveyed to the animal.

[0518] Step 6:

[0519] The server converts the user's input into signals that the animal can understand. The input is the user's instruction data from step 5. The output is the data converted into signals appropriate for the animal species and individual. At this stage, the server performs an operation to convert the instructions into a format suitable for the animal, such as voice or vibration.

[0520] Step 7:

[0521] The terminal transmits the signal converted by the server to the animal. The input is the converted signal data from step 6. The output is a signal emitted in a format that the animal can receive. Specific actions include playing voice commands using the terminal's speaker.

[0522] (Application Example 1)

[0523] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0524] When keeping animals, it is difficult for owners to accurately understand their animals' emotions and needs, and there is a particular need to monitor pet well-being in real time. This can lead to an inability to respond quickly and appropriately to the anxiety and stress the animal is experiencing, potentially compromising animal welfare. There is a need for a system that can solve this problem and enable better relationships between animals and humans.

[0525] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0526] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, means for notifying humans of the animal's requests or emotions based on the analysis results, and means for managing the animal's well-being for humans. This allows humans to understand the animal's state and well-being in real time and take appropriate action.

[0527] "Means for acquiring animal movements" refers to devices that capture the body movements and postures of animals using sensors such as cameras.

[0528] "Means for acquiring animal sounds" refers to a device that collects sounds made by animals using microphones or similar means and transmits that audio data to a server.

[0529] "Means for analyzing actions and vocalizations to identify an animal's needs or emotions" refers to algorithms or software that analyze acquired action and vocal data to determine what an animal wants or what emotions it is experiencing.

[0530] "Means of notifying humans of an animal's needs or emotions" refers to devices or interfaces that visualize or audibly communicate the analyzed state of an animal to humans.

[0531] "Means of converting human input into signals understandable to animals" refers to a system or program that transforms commands or messages input by humans into a form that animals can understand.

[0532] "Means of transmitting signals to animals" refers to devices that transmit converted signals to animals, often in the form of auditory or visual stimuli.

[0533] A "means of managing animal well-being" refers to a system that continuously evaluates changes in an animal's emotional state and needs, determines its level of well-being, and provides information to the owner.
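A minimal sketch of how such a means might continuously evaluate well-being, assuming each analysis yields one emotion label. The valence table, the exponential moving average, and the level thresholds are hypothetical choices for illustration, not values from the disclosure.

```python
# Hypothetical mapping from identified emotion to a valence in [-1, 1].
VALENCE = {"relaxed": 1.0, "excited": 0.5, "bored": -0.3, "anxious": -0.8, "stressed": -1.0}


class WellBeingTracker:
    """Continuously tracks an animal's well-being as a smoothed valence score."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha   # weight of the newest observation
        self.score = 0.0     # start neutral

    def update(self, emotion: str) -> float:
        valence = VALENCE.get(emotion, 0.0)  # unknown labels count as neutral
        # Exponential moving average: recent states matter more, older ones fade out.
        self.score = self.alpha * valence + (1 - self.alpha) * self.score
        return self.score

    def level(self) -> str:
        """Coarse level reported to the owner; thresholds are illustrative."""
        if self.score >= 0.3:
            return "good"
        if self.score <= -0.3:
            return "needs attention"
        return "neutral"


tracker = WellBeingTracker(alpha=0.5)
for emotion in ["anxious", "anxious", "relaxed"]:
    tracker.update(emotion)
print(round(tracker.score, 2), tracker.level())  # 0.2 neutral
```

Smoothing keeps a single anxious reading from triggering an alert while still letting a run of negative states push the level to "needs attention".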

[0534] This system works in conjunction with servers, terminals, and users to facilitate smooth communication between animals and humans.

[0535] The server plays a central role in this system, analyzing behavioral and vocal data acquired from animals. This utilizes an AI module that integrates image and speech recognition technologies. Specifically, TensorFlow and PyTorch are used to train and infer machine learning models for identifying animal behavior and emotional states. The server uses the analysis results to identify the animals' emotions and needs and notifies the user of this information.

[0536] The terminal is an interface device for acquiring animal movements and vocalizations. It is equipped with a camera and microphone and transmits information to the server in real time. The terminal receives the analysis results and displays them to the user visually or verbally. The terminal also receives instructions that the user wants to send to the animal, forwards them to the server, and outputs them in a format that the animal can understand.

[0537] The user plays a role in understanding the pet's emotions and needs and responding appropriately. The user receives notifications about the animal's status through the device and can input messages and instructions for the animal. The server converts these instructions into signals that the animal can easily understand, and they are transmitted via the device. This allows the user to manage their pet's well-being and provide better pet care.

[0538] As a concrete example, consider a scenario where a pet care robot detects unusual barking from a dog while the user is away. The server determines that "the dog is bored" and notifies the user of appropriate action. When the user inputs the command "give the dog a toy" from their smartphone, the robot provides the dog with a toy, relieving its boredom. An example of a prompt message could be, "Please suggest ways to reduce the dog's anxiety."

[0539] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0540] Step 1:

[0541] The device captures animal movements with a camera and records animal sounds with a microphone. The input consists of video and audio data. This data is transmitted to the server in real time. The output is raw data for analysis by the server.

[0542] Step 2:

[0543] The server analyzes the received motion data using video recognition technology. Specifically, it divides the acquired video into frames and extracts the animal's characteristic movements and postures from each frame. In this process, a generative AI model implemented with a framework such as TensorFlow is used to determine the animal's motion patterns. The output is the analysis result regarding the animal's movements.
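One simple way to derive motion features from the individual frames is frame differencing. The sketch below assumes grayscale frames given as nested lists of 0-255 intensities; it is a stand-in feature extractor, not the generative AI model the text describes, and the thresholds and the "active"/"still" labels are hypothetical.

```python
def motion_energy(frames: list, threshold: int = 25) -> list:
    """For each pair of consecutive grayscale frames, return the fraction of pixels
    whose intensity changed by more than `threshold`."""
    energies = []
    for prev, curr in zip(frames, frames[1:]):
        changed = total = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += 1
                if abs(a - b) > threshold:
                    changed += 1
        energies.append(changed / total if total else 0.0)
    return energies


def classify_motion(energies: list, active_threshold: float = 0.2) -> str:
    """Hypothetical coarse label from the mean motion energy across the clip."""
    mean = sum(energies) / len(energies) if energies else 0.0
    return "active" if mean > active_threshold else "still"


# Three 4x4 frames: the top half brightens between frame 0 and 1, then nothing moves.
f0 = [[0] * 4 for _ in range(4)]
f1 = [[255] * 4 for _ in range(2)] + [[0] * 4 for _ in range(2)]
f2 = [row[:] for row in f1]
energies = motion_energy([f0, f1, f2])
print(energies, classify_motion(energies))  # [0.5, 0.0] active
```

A per-frame energy series like this could be one of the inputs to the learned motion-pattern model.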

[0544] Step 3:

[0545] The server processes the received audio data using speech recognition technology. The audio data is converted into a frequency spectrum, and features related to emotions and requests are extracted from the vocalizations. This process uses, for example, a generative AI model implemented with PyTorch. The output is the result of analyzing the animal's vocalizations.
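The frequency-spectrum step can be illustrated with a small pure-Python sketch. It substitutes a naive discrete Fourier transform for whatever signal-processing front end the server actually uses, and the two features it extracts (dominant frequency and spectral centroid) are common acoustic descriptors chosen here as assumptions; the disclosure does not name specific features.

```python
import cmath
import math


def magnitude_spectrum(samples: list, sample_rate: int) -> list:
    """Naive discrete Fourier transform; returns (frequency_hz, magnitude) for bins 0..N/2.
    O(N^2), so only suitable for illustration; production code would use an FFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(acc)))
    return spectrum


def dominant_frequency(samples: list, sample_rate: int) -> float:
    """Frequency of the strongest non-DC bin, e.g. the fundamental of a whine or bark."""
    spectrum = magnitude_spectrum(samples, sample_rate)[1:]  # skip the DC bin
    freq, _magnitude = max(spectrum, key=lambda bin_: bin_[1])
    return freq


def spectral_centroid(samples: list, sample_rate: int) -> float:
    """Magnitude-weighted mean frequency, a rough 'brightness' feature of the timbre."""
    spectrum = magnitude_spectrum(samples, sample_rate)
    total = sum(m for _, m in spectrum)
    return sum(f * m for f, m in spectrum) / total if total else 0.0


rate = 800
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(400)]  # 100 Hz test tone
print(dominant_frequency(tone, rate))  # 100.0
```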

[0546] Step 4:

[0547] The server integrates motion analysis results and voice analysis results to identify the animal's emotions and needs. This uses data fusion technology to accurately predict the animal's emotional state from each analysis result. The output is an evaluation result regarding the animal's state.

[0548] Step 5:

[0549] The server notifies the terminal of the animal's emotions and needs extracted through analysis. The terminal converts this information into a visual or audio format for display on the user interface. The output is visualized or audio information for the user to understand.

[0550] Step 6:

[0551] The user checks the animal's condition through the terminal and inputs appropriate instructions into the terminal. These instructions are intended to elicit a response or behavior from the animal that suits its condition. The input consists of the user's instructions.

[0552] Step 7:

[0553] The server receives user instructions and converts them into signals that animals can understand. This conversion process is customized according to the animal species and individual differences. The output is a signal for communication with the animal.

[0554] Step 8:

[0555] The device transmits the converted signal to the animal. This allows the animal to understand instructions from humans and take corresponding actions. The output is the signal the animal hears.

[0556] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0557] This invention provides smoother and more effective interaction by incorporating an emotion engine that recognizes user emotions into a system that enables two-way communication between animals and humans. The system mainly consists of three components: a server, a terminal, and a user, with each component playing a specific role.

Server role

[0559] The server functions as the main module, receiving and analyzing animal behavior and vocalization data transmitted from the terminal. AI combining video and speech recognition technologies extracts characteristics of the animal's movements, facial expressions, and vocalizations to identify the animal's needs and emotions. Furthermore, a newly integrated emotion engine analyzes the user's video or audio data received from the terminal to assess the user's current emotional state. Based on these results, the server adjusts the output signal to the animal.

[0560] Terminal role

[0561] The terminal is a device for collecting the movements and sounds of both the animal and the user. The terminal captures images and audio of the animal while simultaneously capturing the user's facial expressions and voice pitch, and sends this data to the server. This allows the server to analyze the data, including the user's emotional state.

User role

[0563] The user is the agent who understands the animal's condition and needs through this system and takes the necessary actions. The user receives notifications about the animal's condition and emotions via the terminal, and can also input instructions for the animal based on their own emotions. The commands entered by the user are analyzed by the server's emotion engine and transmitted to the animal in the optimal way.

[0564] Specific example

[0565] Consider a scenario in a home where a dog is seeking the user's attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server analyzes the dog's data and determines that the dog "wants attention," while simultaneously analyzing the user's video and recognizing that the user is "tired." Based on these two analysis results, the server transmits appropriate signals and sounds to the dog through the device to help it feel safe without becoming overly agitated. As a result, the dog feels somewhat safer, allowing the user to remain relaxed as well.

[0566] This system utilizes an emotion engine to provide feedback to animals that takes the user's emotions into account, resulting in more natural and personalized communication. This makes it possible to further deepen the relationship between animals and humans.

[0567] The following describes the processing flow.

[0568] Step 1:

[0569] The device captures animal movements with a camera and records animal sounds with a microphone. Furthermore, it collects the user's facial expressions and voice, and simultaneously prepares this data for transmission to a server.

[0570] Step 2:

[0571] The server receives motion, vocalization, user facial expressions, and voice data transmitted from the terminal. The animal's motion and vocalizations are analyzed using image recognition AI and voice recognition AI to identify the animal's needs and emotions.

[0572] Step 3:

[0573] The server uses the emotion engine to analyze the received facial expression and voice data of the user and evaluate the user's emotional state. For example, it might determine whether the user is "tired" or "relaxed."

[0574] Step 4:

[0575] The server generates the optimal response based on the animal's needs and the user's emotional state. It converts instructions and messages to the animal into signals and voices that take the user's emotions into account.
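A minimal sketch of Step 4, assuming the earlier steps reduce the animal's state and the user's state to short labels. The response table, the volume field, and the rule that caps volume for a tired user are hypothetical; they only illustrate how the user's emotion could change the output for the same animal need.

```python
# Hypothetical response table keyed by (identified animal need, estimated user emotion).
RESPONSES = {
    ("wants_attention", "tired"): {"channel": "voice", "message": "good dog, settle down", "volume": 0.4},
    ("wants_attention", "relaxed"): {"channel": "voice", "message": "let's play!", "volume": 0.7},
}
DEFAULT_RESPONSE = {"channel": "tone", "message": "soft chime", "volume": 0.5}


def plan_response(animal_need: str, user_emotion: str) -> dict:
    """Choose the output to send to the animal, taking the user's emotional state into account."""
    response = dict(RESPONSES.get((animal_need, user_emotion), DEFAULT_RESPONSE))
    # In this sketch, a tired user is assumed to want the animal calmed rather than
    # stimulated, so the output volume is capped.
    if user_emotion == "tired":
        response["volume"] = min(response["volume"], 0.4)
    return response


print(plan_response("wants_attention", "tired")["message"])  # good dog, settle down
```

Copying the table entry with `dict(...)` keeps the volume cap from mutating the shared defaults.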

[0576] Step 5:

[0577] The device receives the converted signal sent from the server and communicates instructions to the animal through appropriate voice output or vibration. This allows the animal to understand the user's intentions.

[0578] Step 6:

[0579] The user observes the animal's responses through a terminal and enters additional instructions as needed. The server then analyzes this input again and continues to generate adaptive responses.

[0580] This process enables smooth, emotion-conscious communication between animals and users.

[0581] (Example 2)

[0582] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0583] Modern pet communication systems have limitations in their ability to recognize animal behavior and needs, making it difficult to achieve smooth communication between humans and animals. Furthermore, they often fail to provide feedback that takes into account human emotional states, potentially leading to insufficient interaction with animals. Therefore, a system is needed that enables optimal communication for both animals and humans, fostering deeper bonds.

[0584] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0585] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for acquiring human emotional states. This makes it possible to comprehensively analyze the movements and emotional states of humans and animals and generate feedback that facilitates mutual understanding.

[0586] "Means of acquiring motion" refers to technologies for detecting the body movements of animals and collecting that data.

[0587] "Means of acquiring animal sounds" refers to technologies for recording or analyzing sounds emitted by animals.

[0588] "Means for acquiring human emotional states" refers to technologies that detect the characteristics of human facial expressions and voices and identify those emotional states.

[0589] "Means of analysis" refers to techniques for identifying the intentions and emotions behind animal behavior based on collected animal actions and vocalizations.

[0590] "Means of notification" refers to technologies for conveying analyzed information to humans.

[0591] "Means of conversion" refers to technologies for converting instructions or inputs from humans into a format that animals can understand.

[0592] "Means of transmission" refers to technologies used to transmit signals and information to animals.

[0593] A "data recording device" refers to a device that stores the results of analyses of animal movements and vocalizations, and uses them to improve the accuracy of future analyses.

[0594] "Learning methods" refer to machine learning techniques used to improve the accuracy of analysis based on recorded data.

[0595] "Notification means" refers to technologies for quickly informing about abnormalities or emergencies.
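The data recording device, learning method, and abnormality notification means defined above can be illustrated together with one small class. This is a minimal sketch under loud assumptions: the two-element feature vector, the barks-per-minute metric, the nearest-centroid learner, and the z-score alert rule are all hypothetical choices, not techniques named in the disclosure.

```python
import math
import statistics
from collections import defaultdict


class AnalysisRecorder:
    """Stores past analysis results, learns per-label centroids from them,
    and flags readings that deviate sharply from the recorded history."""

    def __init__(self):
        self.sums = {}                   # label -> running sum of feature vectors
        self.counts = defaultdict(int)   # label -> number of recorded examples
        self.history = []                # scalar metric history for abnormality checks

    def record(self, features: list, label: str, metric: float) -> None:
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1
        self.history.append(metric)

    def predict(self, features: list) -> str:
        """Nearest-centroid classification; more recorded data refines the centroids."""
        best_label, best_dist = None, math.inf
        for label, total in self.sums.items():
            centroid = [v / self.counts[label] for v in total]
            dist = math.dist(centroid, features)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label

    def is_abnormal(self, metric: float, z_threshold: float = 3.0) -> bool:
        """Flag a reading more than `z_threshold` standard deviations from the recorded mean."""
        if len(self.history) < 2:
            return False
        mean = statistics.fmean(self.history)
        stdev = statistics.stdev(self.history)
        if stdev == 0:
            return metric != mean
        return abs(metric - mean) / stdev > z_threshold


rec = AnalysisRecorder()
# Hypothetical features: (normalised vocal pitch, motion energy); metric: barks per minute.
rec.record([0.9, 0.8], "anxious", metric=10)
rec.record([0.8, 0.9], "anxious", metric=11)
rec.record([0.2, 0.1], "relaxed", metric=9)
print(rec.predict([0.85, 0.7]), rec.is_abnormal(40))  # anxious True
```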

[0596] This invention provides a smoother and more effective two-way communication system with animals by taking into account the emotional state of the human being. The system mainly consists of three components: a server, a terminal, and a user, each playing a specific role.

Server role

[0598] The server is the main module that receives and analyzes animal behavior and vocal data transmitted from the terminal. The server uses a generative AI model to analyze animal behavior and vocalizations, identifying the animal's needs and emotions. Furthermore, the server's emotion engine analyzes human video and audio data received from the terminal, evaluating the user's emotional state. Based on this information, the server generates appropriate feedback for the animal and transmits it via the terminal.

[0599] Terminal role

[0600] The device is equipped with sensors, cameras, and microphones to collect animal movements and sounds, as well as the user's facial expressions and voice. The device transmits this data to a server, allowing the server to understand the state of both the animal and the user. This ensures smoother overall system operation and improves the user experience.

[0601] User roles

[0602] The user understands the animal's condition and needs through the device and takes the necessary actions. The user can also input instructions that reflect their own emotions into the device. The server analyzes this input and reflects it in optimal feedback to the animal. This allows the user to interact with the animal in a way that suits their own circumstances.

[0603] Specific example

[0604] Consider a real-world scenario where a dog in the home wants attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server uses a generative AI model to identify that the dog "wants attention" while recognizing that the user is "tired." As a result, a voice message to reassure the dog is generated and played back to the dog through the device. This system achieves more personalized and natural communication because the emotion engine provides feedback to the animal that takes the user's emotions into account.

[0605] An example of a prompt for the generative AI model is:

[0606] "Analyze the dog's barks to tell us what the pet wants. Also, determine the user's current emotional state and suggest the best feedback for the pet."

[0607] Other prompts are also conceivable.
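As an illustration, a prompt in the style of [0606] could be assembled from the analysis inputs as shown below. The function `build_feedback_prompt` and its parameters are hypothetical, and the disclosure does not specify how prompts are constructed. The sketch assumes that, when an emotion-engine result for the user is already available, it is passed to the model rather than left for the model to infer.

```python
from typing import Optional


def build_feedback_prompt(species: str, observation: str,
                          user_emotion: Optional[str] = None) -> str:
    """Assemble a prompt in the style of paragraph [0606] (hypothetical helper)."""
    lines = [
        f"Analyze the {species}'s behavior and vocalizations to tell us what the pet wants.",
        f"Observed behavior: {observation}.",
    ]
    if user_emotion is None:
        # No emotion-engine result yet: ask the model to infer the user's state.
        lines.append("Also, determine the user's current emotional state.")
    else:
        # Pass the emotion-engine result through so the model can take it into account.
        lines.append(f"The user's current emotional state is: {user_emotion}.")
    lines.append("Suggest the best feedback for the pet.")
    return "\n".join(lines)
```

For example, `build_feedback_prompt("dog", "repeated short barks", "tired")` yields a four-line prompt that carries the user's fatigue into the request.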

[0608] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0609] Step 1:

[0610] The device collects animal movements and sounds. Using its camera and microphone, it captures the dog's movements and barks, and its sensors also collect the user's facial expressions and voice. This data is then prepared for transmission to the server, which needs it to identify the animal's needs and emotions.

[0611] Step 2:

[0612] The server receives animal and user data transmitted from the terminal. The video and audio data of the animals obtained as input are analyzed by a generative AI model to identify the characteristics of the animals' movements and determine their emotional states, such as "seeking attention." Similarly, the user's video and audio data are analyzed by an emotion engine to determine the user's emotional state, such as whether they are "tired." The data is processed by an AI algorithm and output as an emotion analysis result in text format.

[0613] Step 3:

[0614] The server generates appropriate feedback for the animal based on the analysis results. For example, it uses a prompt to request a reassuring message for the dog, and a generative AI model creates the corresponding audio message. This message is designed to calm the animal. The generated audio data is then sent to the device.

[0615] Step 4:

[0616] The device plays the audio data received from the server. The audio serves as a cue that guides the animal's behavior and is expected to have effects such as helping the animal feel safe. The device adjusts the volume and timing of playback according to the animal's response, supporting effective communication.
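The volume and timing adjustment described in Step 4 could take the form of a simple feedback rule. The sketch below assumes that the analysis step labels the animal's reaction to the last playback as "calmer", "no_change" or "agitated". The labels, step sizes and limits are all hypothetical choices made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSettings:
    volume: float = 0.5   # hypothetical scale: 0.0 (mute) to 1.0 (maximum)
    delay_s: float = 2.0  # seconds to wait before the next playback


def adjust_playback(settings: PlaybackSettings, response: str) -> PlaybackSettings:
    """Return new settings based on the animal's reaction to the last playback."""
    volume, delay = settings.volume, settings.delay_s
    if response == "agitated":
        # The sound may be too intense: play more quietly and wait longer.
        volume -= 0.2
        delay *= 2
    elif response == "no_change":
        # The animal may not have noticed: play slightly louder and sooner.
        volume += 0.1
        delay = max(1.0, delay / 2)
    # "calmer" keeps the current settings. Volume is clamped to 0.1..1.0.
    return PlaybackSettings(volume=min(1.0, max(0.1, volume)), delay_s=delay)
```

Keeping a floor of 0.1 means that a run of "agitated" responses reduces the volume without ever muting the cue entirely.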

[0617] Step 5:

[0618] The user receives feedback from the device and uses that feedback to decide how to interact with the animal. The user inputs instructions into the device that reflect their own emotions and circumstances, and these instructions are then analyzed by the server and reflected as optimal feedback for the animal. This allows the user to take actions appropriate to their situation.

[0619] (Application Example 2)

[0620] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0621] Smooth communication between animals and humans requires each side to understand the other's emotions and desires and to respond appropriately based on that understanding. However, conventional systems only analyze signals from animals to humans and do not take human emotions into consideration, and therefore fail to achieve sufficient communication. There is thus an urgent need for consistent interaction that takes the emotions of both parties into account.

[0622] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0623] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for analyzing human emotional states. This enables two-way interaction that takes into account not only animal emotions and requests, but also human emotional states.

[0624] "Means for acquiring animal movements" refers to a device or method that detects the movement of an animal's body and collects that information as data.

[0625] "Means for acquiring animal sounds" refers to a device or method for collecting animal sounds as audio data.

[0626] "Means for analyzing and identifying an animal's needs or emotions" refers to a device or method that analyzes acquired data on an animal's movements and vocalizations and uses that data to clarify the animal's desires and emotions.

[0627] "Means for analyzing a person's emotional state" refers to a device or method that analyzes audio and video data obtained from a user to identify their current emotional state.

[0628] "Means of notifying humans of requests or emotions" refers to devices or methods that convey information about emotions or requests analyzed from animals to a user.

[0629] "Means of converting human input into signals understandable to animals" refers to a device or method that converts commands or instructions from a user into signals in a format that animals can understand and respond to.

[0630] "Means for transmitting signals to animals" refers to a device or method that actually transmits the converted signals to animals to achieve a desired interaction.

[0631] The server receives data transmitted from terminals that capture animal movements and sounds, and uses AI that combines image recognition and speech recognition technologies to analyze it. Specifically, it uses OpenCV and TensorFlow to analyze animal movements and facial expressions, and Google Cloud AI and Microsoft Azure AI to process the audio data. This makes it possible to identify the animal's needs and emotions.

[0632] Furthermore, to analyze human emotional states, the server receives video or audio data transmitted by the user and analyzes it using a generative AI model. This allows the user's emotions to be identified, enabling the output signal to the animal to be appropriately adjusted. The Google Speech-to-Text API could be used as the voice analysis technology.

[0633] The device is equipped with a camera and microphone to efficiently collect information about animals and users. Using these devices, the device transmits information such as the animals' movements and sounds, as well as the user's facial expressions and voice tone, to a server.

[0634] This system allows users to receive notifications about the animal's condition and needs, and take necessary actions. Furthermore, user instructions are analyzed on the server and transmitted in a format recognizable to the animal, enabling natural communication.

[0635] For example, if a pet is lonely when its owner returns home tired, the server can sense the user's fatigue, generate a calming voice signal, and deliver it to the pet via the device. This allows the pet to feel secure and the owner to remain relaxed. Other examples of input prompts for the generative AI model include "Analyze the pet's condition and suggest ways to communicate that will help the owner relax" and "Please tell me how to alleviate the dog's loneliness."

[0636] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0637] Step 1:

[0638] The device uses its camera and microphone to capture animal movements and sounds, as well as the user's facial expressions and voice. It collects animal video and audio data, and user video and audio data as input. This data is then sent directly to the server.

[0639] Step 2:

[0640] The server receives video and audio data of animals sent from the terminal. Using video recognition technologies such as OpenCV and TensorFlow, it analyzes the animals' movements and facial expressions, and uses Google Cloud AI to identify the animals' needs and emotions from the audio data. As output, it generates data on the identified needs and emotions of the animals.

[0641] Step 3:

[0642] The server receives the user's video and audio data transmitted from the terminal. This data is processed by a generative AI model to identify the user's emotional state. The output is data on the user's emotional state.

[0643] Step 4:

[0644] The server integrates the animal's emotional data obtained in step 2 with the user's emotional data obtained in step 3 to generate appropriate output signals for the animal. Specifically, it creates voice signals and behavioral commands that will reassure the pet. The output of this step is signal data for the animal.
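Step 4 could be sketched as a function that combines the animal's state from Step 2 with the user's state from Step 3 and produces both a voice message and a behavioral command. The response table, the LED pattern names and the rule that a tired or stressed user leads to low-energy reassurance are all hypothetical. In the disclosure this mapping is left to the AI models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnimalSignal:
    voice_message: str  # text to synthesize and play through the speaker
    led_pattern: str    # hypothetical LED display pattern name
    energy: str         # "low" or "high" interaction level


# Hypothetical response table for illustration only.
RESPONSES = {
    "lonely": "You're not alone, I'm right here.",
    "wants_attention": "Good dog, I see you.",
    "anxious": "It's okay, you're safe.",
}


def integrate(animal_state: str, user_state: str) -> AnimalSignal:
    """Combine the animal's state (step 2) with the user's state (step 3)."""
    message = RESPONSES.get(animal_state, "Good dog.")
    if user_state in {"tired", "stressed"}:
        # A tired or stressed owner gets calm, low-energy reassurance instead of play.
        return AnimalSignal(message, led_pattern="slow_warm_glow", energy="low")
    return AnimalSignal(message, led_pattern="bright_blink", energy="high")
```

In the scenario of [0635], `integrate("lonely", "tired")` selects a calming message with a low-energy display, so the pet is reassured without demanding anything from the owner.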

[0645] Step 5:

[0646] The terminal receives signal data for animals transmitted from the server. Based on this, it outputs audio signals and behavioral commands to the animals using speakers, LED displays, etc.

[0647] Step 6:

[0648] Users can monitor the animal's condition and reactions in real time and input additional instructions into the terminal as needed. This process of sending user instructions back to the server and then relaying them to the animal in an optimized form is repeated.

[0649] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0650] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
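The input and output contract described for the data generation model 58 (a prompt with instructions plus inference data in, an inference result in audio or text form out) can be written down as a typed interface. The `DataGenerationModel` protocol and the offline `KeywordStubModel` below are hypothetical and exist only to exercise the interface. They do not call ChatGPT, Gemini or any real API.

```python
from dataclasses import dataclass
from typing import List, Protocol, Union

InferenceData = Union[str, bytes]  # text, or raw audio / image bytes


@dataclass
class InferenceResult:
    task: str           # e.g. "analysis", "classification", "prediction", "summarization"
    text: str           # text-format output
    audio: bytes = b""  # optional audio-format output


class DataGenerationModel(Protocol):
    """Hypothetical interface: prompt with instructions plus inference data in,
    inference result out."""

    def infer(self, prompt: str, data: List[InferenceData]) -> InferenceResult:
        ...


class KeywordStubModel:
    """Offline stand-in used only to exercise the interface; it is not a
    generative model and performs no real inference."""

    def infer(self, prompt: str, data: List[InferenceData]) -> InferenceResult:
        # Choose the task from a keyword in the prompt's instructions.
        task = "summarization" if "summar" in prompt.lower() else "analysis"
        # Keep only text inputs; a real model would also consume audio and images.
        texts = [item for item in data if isinstance(item, str)]
        return InferenceResult(task=task, text=" ".join(texts))


def run_inference(model: DataGenerationModel, prompt: str,
                  data: List[InferenceData]) -> InferenceResult:
    return model.infer(prompt, data)
```

Because the server code depends only on the protocol, a client for an actual generative AI service could be substituted for the stub without changing the callers.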

[0651] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0652] [Fourth Embodiment]

[0653] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0654] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0655] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0656] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0657] The microphone 238 receives the voice of the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0658] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0659] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0660] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
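As a sketch of how an emotion of the robot 414 might be translated into commands for the controlled object 443, the code below pairs each emotion with an eye LED color, a blink rate and target joint angles. The emotion names, colors, angles and the command string format are all hypothetical. The disclosure states only that posture, gestures and facial expressions are produced by controlling the motors and the eye LEDs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expression:
    eye_rgb: Tuple[int, int, int]    # color of the eye LEDs
    eye_blink_hz: float              # 0.0 means steady illumination
    joint_targets: Dict[str, float]  # motor name -> target angle in degrees


# Hypothetical expression table for controlled object 443.
EXPRESSIONS = {
    "joy": Expression((255, 200, 0), 2.0, {"left_arm": 120.0, "right_arm": 120.0}),
    "calm": Expression((0, 120, 255), 0.0, {"left_arm": 10.0, "right_arm": 10.0}),
    "concern": Expression((255, 80, 80), 0.5, {"left_arm": 45.0, "right_arm": 0.0}),
}


def to_commands(emotion: str) -> List[str]:
    """Translate an emotion into low-level LED and motor command strings."""
    expr = EXPRESSIONS.get(emotion, EXPRESSIONS["calm"])  # unknown -> calm
    r, g, b = expr.eye_rgb
    commands = [f"LED eyes rgb={r},{g},{b} blink={expr.eye_blink_hz}"]
    for joint, angle in sorted(expr.joint_targets.items()):
        commands.append(f"MOTOR {joint} angle={angle}")
    return commands
```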

[0661] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0662] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0663] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0664] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0665] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0666] The system of this invention is designed to enable two-way communication between animals and humans. The system operates between a server, a terminal, and a user, each playing a specific role.

[0667] Server Role

[0668] The server plays a central role in receiving and analyzing animal movement and vocalization data transmitted from terminals. The server is equipped with an AI module that integrates video recognition and speech recognition technologies. Using video recognition technology, the server analyzes video footage of animals and extracts characteristic movements and facial expressions. Speech recognition technology converts animal vocalizations into frequency spectra and analyzes them to identify the animals' emotions and requests.

[0669] Based on these analysis results, the server identifies the animal's needs and emotions and notifies the user of the results. If the user enters instructions or messages they want to give to the animal, the server converts the content into signals that the animal can understand. This signal conversion includes customization based on the animal's species and individual needs.

[0670] Terminal role

[0671] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. Analysis results and notifications are displayed to the user visually or audibly on the terminal. It also has the function of receiving signals in response to user input and transmitting them to the animals.

[0672] User roles

[0673] The user is the one who understands the animal's condition and takes appropriate action through this system. By receiving notifications of the animal's condition via the terminal, the user can understand the animal's needs and emotions. Furthermore, the user can input instructions and questions for the animal into the terminal, and this information is converted into signals appropriate for the animal by the server and transmitted.

[0674] Specific example

[0675] As a concrete example, consider a case where a dog in a household is having difficulty communicating. The user points the device at the dog and records its actions and barks. The server analyzes this data, and if it determines that the dog is feeling anxious, it immediately notifies the user. The user then inputs an appropriate voice command, which the server converts into a short, easy-to-understand voice message and transmits through the device. Through this process, the user can understand what the dog is feeling and take appropriate action to reassure it.

[0676] Thus, the present invention provides an environment in which animals and humans can understand each other, and offers an effective means to realize higher quality pet care and animal research.

[0677] The following describes the processing flow.

[0678] Step 1:

[0679] The device captures animal movements with a camera and records animal sounds with a microphone. This data is collected in real time, compressed, and then transmitted to a server via a communication line.
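The compression and transmission in Step 1 could be sketched with a small length-prefixed message format. The format below is hypothetical: a 4-byte header length, a JSON header carrying a timestamp and the video length, and a zlib-compressed payload. A practical terminal would more likely use dedicated video and audio codecs, and zlib is used here only because it is in the standard library.

```python
import json
import struct
import zlib
from typing import Tuple


def encode_capture(video_frame: bytes, audio_chunk: bytes, timestamp_ms: int) -> bytes:
    """Pack one capture interval into a compressed, length-prefixed message."""
    header = json.dumps({"t": timestamp_ms, "video_len": len(video_frame)}).encode()
    payload = zlib.compress(video_frame + audio_chunk)
    # Frame layout: 4-byte big-endian header length, header, compressed payload.
    return struct.pack("!I", len(header)) + header + payload


def decode_capture(message: bytes) -> Tuple[int, bytes, bytes]:
    """Server side: recover (timestamp, video bytes, audio bytes)."""
    (header_len,) = struct.unpack("!I", message[:4])
    header = json.loads(message[4:4 + header_len])
    raw = zlib.decompress(message[4 + header_len:])
    split = header["video_len"]
    return header["t"], raw[:split], raw[split:]
```

Storing the video length in the header lets the server split the decompressed payload back into its video and audio parts without a separate delimiter.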

[0680] Step 2:

[0681] The server receives the video and audio data transmitted from the terminal. Video recognition AI analyzes the animal's movement data and extracts movement patterns and facial expressions. In addition, audio recognition AI analyzes the animal's vocalizations and extracts the characteristics of their timbre and tone.

[0682] Step 3:

[0683] The server identifies the animal's needs and emotions based on the analysis results. It integrates video and audio results to make a comprehensive judgment about the animal's state. For example, "tail wagging" + "loud vocalizations" might be identified as "excited."
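The comprehensive judgment in Step 3 can be illustrated with a small rule table that requires cues from both the video and the audio analysis. Only the first rule ("tail wagging" plus "loud vocalizations" giving "excited") comes from the text above. The cue names and the other rules are hypothetical, and a deployed system would more likely learn this mapping than hard-code it.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Each rule: (required video cues, required audio cues, resulting state).
RULES: List[Tuple[FrozenSet[str], FrozenSet[str], str]] = [
    (frozenset({"tail_wagging"}), frozenset({"loud_vocalization"}), "excited"),
    (frozenset({"tail_tucked"}), frozenset({"whining"}), "anxious"),      # hypothetical
    (frozenset({"lying_down"}), frozenset(), "relaxed"),                  # hypothetical
]


def judge_state(video_cues: Iterable[str], audio_cues: Iterable[str]) -> str:
    """Return the first state whose required video and audio cues are all present."""
    video, audio = set(video_cues), set(audio_cues)
    for need_video, need_audio, state in RULES:
        if need_video <= video and need_audio <= audio:
            return state
    return "unknown"
```

Requiring both cue sets means that tail wagging alone, without the loud vocalization, is not enough to conclude "excited".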

[0684] Step 4:

[0685] The server generates a text notification describing the identified status and requests of the animal. This notification is then sent from the server to the terminal.

[0686] Step 5:

[0687] The device displays notifications received from the server to the user. It informs the user of the animal's status through visual displays and audio alerts.

[0688] Step 6:

[0689] The user enters instructions or messages they want to convey to the animal into the device. After the input is complete, the content is sent to the server.

[0690] Step 7:

[0691] The server uses language generation AI to convert user input into signals that animals can understand. These signals can be customized to suit the animal species and individual characteristics.

[0692] Step 8:

[0693] The terminal receives the signals transmitted from the server and delivers them to the animal, conveying the instructions through voice output and vibration devices.

[0694] This series of steps enables effective communication between the animal and the user, allowing for responses that are tailored to the animal's needs and emotions.

[0695] (Example 1)

[0696] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0697] In animal-human communication, it is extremely difficult to accurately understand an animal's emotions and needs and to convey information to the animal in a form it can understand. Furthermore, conventional technologies struggle to adapt signal conversion to the animal's species and to the individual animal, so there is no efficient means of two-way communication.

[0698] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0699] In this invention, the server includes means for acquiring animal movements, means for acquiring animal vocalizations, and means for customizing signal conversion according to the animal's species and individual characteristics. This enables accurate analysis of the animal's emotions and needs, and facilitates smooth communication between animals and humans based on this analysis.

[0700] "Means for acquiring animal movements" refers to technologies that detect an animal's posture, movement, and behavior using electronic sensors and cameras, and collect that information as digital data.

[0701] "Means for acquiring animal sounds" refers to technologies that record sounds emitted by animals using acoustic sensors such as microphones and acquire those sounds as digital data.

[0702] An "information processing system" is a system consisting of programs and hardware for analyzing collected digital data, and is equipped with algorithms for determining the emotions and needs of animals.

[0703] "Information transmission means" refers to a system for communicating the analyzed results to humans, and is a technology that provides data to users through screen display, audio output, or other audiovisual means.

[0704] An "information conversion means" is a system for converting instructions provided by a user into signals in a format that animals can understand, and it is equipped with a variety of signal conversion algorithms.

[0705] "Information output means" refers to devices or systems for transmitting converted signals to animals, and is a technology that outputs signals by means such as sound, light, or vibration.

[0706] "Signal conversion customization means" refers to methods and techniques for performing signal conversion that takes into account the characteristics of both the animal's species and the individual animal.

[0707] Modes for carrying out the invention

[0708] The system of this invention enables two-way communication between animals and humans through the cooperation of a server, a terminal, and a user.

[0709] Server Role

[0710] The server plays a central role in receiving and analyzing animal movement and vocalization data. Specifically, the server is equipped with an AI module that uses video recognition technology to identify animal movements and speech recognition technology to convert animal vocalizations into frequency spectra. These analysis techniques are implemented using open-source AI frameworks and dedicated hardware accelerators. For example, the server analyzes animal videos, extracts specific facial expressions and movements, and then analyzes the animal's emotions based on that.

[0711] Terminal role

[0712] The terminal functions as a device for collecting animal movements and sounds. Equipped with a camera and microphone, the terminal captures animal movements and sounds in real time and transmits the data to a server. The terminal has the function of notifying the user of the analysis results visually or audibly, and also transmits signals to convey user instructions to the animal. For example, the terminal records the movement of a dog's tail and its barks, and smoothly transmits that information to the server.

[0713] User roles

[0714] The user is the one who understands the animal's condition through the system and takes appropriate action. The user can use a terminal to grasp the animal's emotions and needs. The user also provides instructions and messages to the animal, and this information is converted into signals by the server and transmitted to the animal. For example, if the user enters "calm down" into the terminal, the server converts it into a format that the animal can understand and transmits it.

[0715] Specific example

[0716] For example, if a dog is showing signs of anxiety at home, the user can use a device to capture the dog's movements and barks. The server analyzes the data and, if it determines that the dog is feeling anxious, notifies the user. The user can then input "play some cheerful music" into the device, and the server converts this input into an audio signal that the dog can easily understand and plays it through the device. Through this process, the user can reassure the dog and deepen their mutual understanding.

[0717] An example of a prompt to the generative AI model is text such as, "If a dog is feeling anxious, how can I reassure it?"

[0718] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0719] Step 1:

[0720] The device uses a camera and microphone to capture animal movements and sounds in real time. The input consists of visual and auditory information from the animal. The output is this information converted into digital data and packaged into data packets.

[0721] Step 2:

[0722] The terminal transmits the acquired digital data to the server using wireless communication. The input is the animal's movement data and vocalization data generated in step 1. The output is a notification that the data transmission to the server is complete.
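The packetization in Step 1 and the transmission and completion notice in Step 2 could be sketched as follows. The `Packet` fields, the 1024-byte default payload size and the `Receiver` class are hypothetical. The receiver tolerates out-of-order arrival and returns the reassembled data once every packet of a stream is present, which is the point at which a "transmission complete" notification could be sent back to the terminal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Packet:
    stream_id: str  # e.g. "motion" or "vocal"
    seq: int        # position of this packet within the stream
    total: int      # number of packets in the stream
    body: bytes


def packetize(stream_id: str, data: bytes, mtu: int = 1024) -> List[Packet]:
    """Split data into packets of at most mtu bytes (an empty stream gives one packet)."""
    chunks = [data[i:i + mtu] for i in range(0, len(data), mtu)] or [b""]
    return [Packet(stream_id, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


class Receiver:
    """Server side: reassembles packets and reports when a stream is complete."""

    def __init__(self) -> None:
        self.buffers: Dict[str, Dict[int, bytes]] = {}

    def accept(self, packet: Packet) -> Optional[bytes]:
        buf = self.buffers.setdefault(packet.stream_id, {})
        buf[packet.seq] = packet.body  # duplicates simply overwrite
        if len(buf) == packet.total:
            # Every packet has arrived: rebuild the stream in sequence order.
            data = b"".join(buf[i] for i in range(packet.total))
            del self.buffers[packet.stream_id]
            return data
        return None  # still waiting for more packets
```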

[0723] Step 3:

[0724] The server uses an AI module to analyze the received motion data and vocal data. The input is digital data sent from the terminal. The output is the result of identifying the animal's characteristic movements, facial expressions, emotions, and requests. In this process, the server performs specific actions such as analyzing movements with a video processing algorithm and analyzing vocalizations with an audio processing algorithm.

[0725] Step 4:

[0726] The server identifies the animal's emotions and needs based on the analysis results and generates a message to notify the user. The input is the analysis results from step 3. The output is the notification message for the user. This helps the user understand what the animal wants.

[0727] Step 5:

[0728] The user inputs instructions and messages for the animal into the terminal based on notifications from the server. The input is the user's instructions. The output is displayed on the terminal as specific instruction data to be conveyed to the animal.

[0729] Step 6:

[0730] The server converts the user's input into signals that the animal can understand. The input is the user's instruction data from step 5. The output is the data converted into signals appropriate for the animal species and individual. At this stage, the server performs an operation to convert the instructions into a format suitable for the animal, such as voice or vibration.
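As an illustration of Step 6, signal conversion can first consult cues the individual animal has been trained on, then fall back to species-level defaults, and finally change the output modality for an individual trait. The profile fields (including `hearing_impaired`), the species tables and the cue words are all hypothetical. The disclosure does not specify the conversion algorithm, and the text refers to language generation AI for this step.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnimalProfile:
    species: str
    name: str
    hearing_impaired: bool = False  # hypothetical individual trait
    known_cues: Dict[str, str] = field(default_factory=dict)  # instruction -> trained cue


# Hypothetical species-level defaults: instruction -> (modality, payload).
SPECIES_DEFAULTS = {
    "dog": {"calm down": ("voice", "easy"), "come": ("voice", "come")},
    "cat": {"calm down": ("sound", "soft_chirp")},
}


def convert(instruction: str, profile: AnimalProfile) -> Optional[Dict[str, str]]:
    """Convert a user instruction into a signal suited to this species and individual."""
    key = instruction.strip().lower()
    if key in profile.known_cues:
        # Individual customization: a cue this particular animal has learned.
        signal = {"modality": "voice", "payload": profile.known_cues[key]}
    elif key in SPECIES_DEFAULTS.get(profile.species, {}):
        # Fall back to the default for the species.
        modality, payload = SPECIES_DEFAULTS[profile.species][key]
        signal = {"modality": modality, "payload": payload}
    else:
        return None  # no known conversion; the user could be asked to rephrase
    if profile.hearing_impaired:
        # Replace sound-based output with a vibration pattern.
        signal = {"modality": "vibration", "payload": "pattern:" + signal["payload"]}
    return signal
```

With this ordering, the input "calm down" from [0714] becomes the trained word for one dog, the species default for another, and a vibration pattern for an animal that cannot hear the voice cue.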

[0731] Step 7:

[0732] The terminal transmits the signal converted by the server to the animal. The input is the converted signal data from step 6. The output is a signal emitted in a format that the animal can receive. Specific actions include playing voice commands using the terminal's speaker.

[0733] (Application Example 1)

[0734] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0735] When keeping animals, it is difficult for owners to accurately understand their animals' emotions and needs, and there is a particular need to monitor pet well-being in real time. This can lead to an inability to respond quickly and appropriately to the anxiety and stress the animal is experiencing, potentially compromising animal welfare. There is a need for a system that can solve this problem and enable better relationships between animals and humans.

[0736] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0737] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, means for notifying humans of the animal's requests or emotions based on the analysis results, and means for managing the animal's well-being for humans. This allows humans to understand the animal's state and well-being in real time and take appropriate action.

[0738] "Means for acquiring animal movements" refers to devices that capture the body movements and postures of animals using sensors such as cameras.

[0739] "Means for acquiring animal sounds" refers to a device that collects sounds made by animals using microphones or similar means and transmits that audio data to a server.

[0740] "Means for analyzing actions and vocalizations to identify an animal's needs or emotions" refers to algorithms or software that analyze acquired action and vocal data to determine what an animal wants or what emotions it is experiencing.

[0741] "Means of notifying humans of an animal's needs or emotions" refers to devices or interfaces that visualize or audibly communicate the analyzed state of an animal to humans.

[0742] "Means of converting human input into signals understandable to animals" refers to a system or program that transforms commands or messages input by humans into a form that animals can understand.

[0743] "Means of transmitting signals to animals" refers to devices that transmit converted signals to animals, often in the form of auditory or visual stimuli.

[0744] A "means of managing animal well-being" refers to a system that continuously evaluates changes in an animal's emotional state and needs, determines its level of well-being, and provides information to the owner.
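One possible reading of "continuously evaluates changes in an animal's emotional state ... determines its level of well-being" is an exponential moving average over the states identified over time. In the sketch below, the valence values, the smoothing factor `alpha` and the thresholds for the owner-facing label are hypothetical. The moving average is an assumed technique, not one named in the disclosure.

```python
from dataclasses import dataclass

# Hypothetical valence per identified state, from -1 (negative) to +1 (positive).
VALENCE = {"relaxed": 0.8, "excited": 0.5, "bored": -0.3, "anxious": -0.8}


@dataclass
class WellBeingTracker:
    alpha: float = 0.2  # weight of each new observation (0 < alpha <= 1)
    score: float = 0.0  # exponential moving average of valence

    def update(self, state: str) -> float:
        valence = VALENCE.get(state, 0.0)  # unknown states count as neutral
        self.score = self.alpha * valence + (1 - self.alpha) * self.score
        return self.score

    def level(self) -> str:
        # Hypothetical thresholds for the label shown to the owner.
        if self.score >= 0.3:
            return "good"
        if self.score <= -0.3:
            return "needs attention"
        return "fair"
```

A smaller `alpha` smooths out momentary states, so a single anxious episode moves the level less than a sustained one.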

[0745] This system facilitates smooth communication between animals and humans through the cooperation of a server, a terminal, and a user.

[0746] The server plays a central role in this system, analyzing behavioral and vocal data acquired from animals. This utilizes an AI module that integrates image and speech recognition technologies. Specifically, TensorFlow and PyTorch are used to train and infer machine learning models for identifying animal behavior and emotional states. The server uses the analysis results to identify the animals' emotions and needs and notifies the user of this information.

[0747] The terminal is an interface device for acquiring animal movements and vocalizations. It is equipped with a camera and microphone and transmits information to the server in real time. The terminal receives the analysis results and presents them to the user visually or audibly. The terminal also receives instructions that the user wants to send to the animal, forwards them to the server, and outputs them in a format that the animal can understand.

[0748] The user's role is to understand the pet's emotions and needs and respond appropriately. The user receives notifications about the animal's status through the device and can input messages and instructions for the animal. The server converts these instructions into signals that the animal can easily understand and transmits them via the device. This allows the user to manage their pet's well-being and provide better pet care.

[0749] As a concrete example, consider a scenario in which a pet care robot detects unusual barking from a dog while the user is away. The server determines that "the dog is bored," notifies the user, and suggests an appropriate action. When the user inputs the command "give the dog a toy" from a smartphone, the robot gives the dog a toy, relieving its boredom. An example of a prompt message in this scenario is: "Please suggest ways to reduce the dog's anxiety."

[0750] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0751] Step 1:

[0752] The device captures animal movements with a camera and records animal sounds with a microphone. The input consists of video and audio data. This data is transmitted to the server in real time. The output is raw data for analysis by the server.

[0753] Step 2:

[0754] The server analyzes the received motion data using video recognition technology. Specifically, it divides the acquired video into frames and extracts the animal's characteristic movements and postures from each frame. In this process, a generative AI model built with a framework such as TensorFlow is used to determine the animal's motion patterns. The output is the analysis result regarding the animal's movements.
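As a rough illustration of the frame-by-frame feature extraction in Step 2, the Python sketch below computes a simple motion-energy feature (the mean absolute pixel difference between consecutive frames) and a coarse activity label. It is a minimal stand-in under stated assumptions, not the TensorFlow-based model described above: frames are assumed to be already decoded into small grayscale arrays, and the function names and the `active_threshold` value are hypothetical. In practice, such features would be one input to the trained model rather than the final classification.

```python
from typing import List, Sequence

# Assumption: each frame is a small grayscale image given as rows of 0-255 ints.
Frame = Sequence[Sequence[int]]


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two consecutive frames."""
    diffs = [abs(c - p) for rp, rc in zip(prev, curr) for p, c in zip(rp, rc)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def motion_features(frames: List[Frame]) -> List[float]:
    """One motion-energy value per pair of consecutive frames."""
    return [motion_energy(a, b) for a, b in zip(frames, frames[1:])]


def label_motion(features: List[float], active_threshold: float = 10.0) -> str:
    """Coarse label from the mean energy; the threshold is a hypothetical value."""
    if not features:
        return "unknown"
    mean = sum(features) / len(features)
    return "active" if mean >= active_threshold else "resting"


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moved = [[40, 40], [40, 40]]
    feats = motion_features([still, moved, still])
    print(feats, label_motion(feats))  # [40.0, 40.0] active
```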

[0755] Step 3:

[0756] The server processes the received audio data using speech recognition technology. The audio data is converted into a frequency spectrum, and features related to the animal's emotions and requests are extracted from it. This process uses, for example, a generative AI model built with PyTorch. The output is the result of analyzing the animal's vocalizations.
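The conversion of audio into a frequency spectrum in Step 3 can be illustrated with a plain discrete Fourier transform. The sketch below is an assumption-laden example, not the PyTorch-based model of this disclosure: it computes DFT magnitudes directly (a production system would use an FFT library) and extracts one candidate feature, the dominant frequency, which could serve as a rough proxy for the pitch of a bark. The function names are hypothetical.

```python
import math
from typing import List


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..n//2 (O(n^2); fine for short clips)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags


def dominant_frequency(samples: List[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest non-DC bin, e.g. a rough bark pitch."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mags = magnitude_spectrum(samples)
    # Skip bin 0 (the DC offset) and pick the bin with the largest magnitude.
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)
```

For example, a 64-sample sine wave completing 8 cycles, sampled at 64 Hz, yields a dominant frequency of 8 Hz.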

[0757] Step 4:

[0758] The server integrates motion analysis results and voice analysis results to identify the animal's emotions and needs. This uses data fusion technology to accurately predict the animal's emotional state from each analysis result. The output is an evaluation result regarding the animal's state.
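One common way to realize the data fusion of Step 4 is late fusion, in which each modality produces its own scores and the scores are combined afterward. The sketch below assumes, hypothetically, that the motion and voice analyses each return a probability per emotion or need label. It combines the two with a fixed weight and renormalizes the result. The weight and the labels are illustrative, and the disclosure does not specify a particular fusion method.

```python
from typing import Dict

Scores = Dict[str, float]  # label (emotion or need) -> probability


def fuse(motion: Scores, voice: Scores, motion_weight: float = 0.5) -> Scores:
    """Weighted late fusion of two per-modality score maps, renormalized to sum to 1."""
    labels = set(motion) | set(voice)
    fused = {
        label: motion_weight * motion.get(label, 0.0)
        + (1.0 - motion_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    return {k: v / total for k, v in fused.items()} if total > 0 else fused


def top_state(scores: Scores) -> str:
    """Label with the highest fused score."""
    return max(scores, key=scores.get)
```

Raising `motion_weight` makes body language dominate. With motion scores of bored 0.6 and anxious 0.4 and voice scores of bored 0.2 and anxious 0.8, equal weights favor "anxious", while a motion weight of 0.8 tips the result to "bored".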

[0759] Step 5:

[0760] The server notifies the terminal of the animal's emotions and needs extracted through analysis. The terminal converts this information into a visual or audio format for display on the user interface. The output is visual or audio information that the user can understand.

[0761] Step 6:

[0762] The user checks the animal's condition through the terminal and inputs appropriate instructions into the terminal. These instructions are intended to elicit a response or behavior from the animal suited to its condition. The input consists of the user's instructions.

[0763] Step 7:

[0764] The server receives user instructions and converts them into signals that animals can understand. This conversion process is customized according to the animal species and individual differences. The output is a signal for communication with the animal.
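The species- and individual-specific customization in Step 7 could be organized as a layered lookup in which an individual override is used when one exists, and the species default is used otherwise. The following is a minimal sketch under that assumption. The `Signal` and `SignalConverter` types, the table keys, and the payload names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Signal:
    kind: str     # e.g. "sound", "light", "vibration"
    payload: str  # e.g. an audio clip name or LED pattern (hypothetical)


@dataclass
class SignalConverter:
    # Default signal per (species, command), e.g. ("dog", "come").
    species_table: Dict[Tuple[str, str], Signal]
    # Overrides per (animal_id, command) tuned for one individual animal.
    individual_table: Dict[Tuple[str, str], Signal] = field(default_factory=dict)

    def convert(self, species: str, animal_id: str, command: str) -> Optional[Signal]:
        """Prefer the individual override, then the species default, else None."""
        override = self.individual_table.get((animal_id, command))
        if override is not None:
            return override
        return self.species_table.get((species, command))
```

For example, a dog registered as "pochi" that responds best to a recording of its owner's voice would receive that recording for "come", while other dogs receive the species-default whistle.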

[0765] Step 8:

[0766] The device transmits the converted signal to the animal. This allows the animal to understand instructions from humans and take corresponding actions. The output is the signal the animal hears.

[0767] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0768] This invention provides smoother and more effective interaction by incorporating an emotion engine that recognizes user emotions into a system that enables two-way communication between animals and humans. The system mainly consists of three components: a server, a terminal, and a user, with each component playing a specific role.

Server role

[0770] The server functions as the main module, receiving and analyzing animal behavior and vocalization data transmitted from the terminal. AI combining video and speech recognition technologies extracts characteristics of the animal's movements, facial expressions, and vocalizations to identify the animal's needs and emotions. Furthermore, a newly integrated emotion engine analyzes the user's video or audio data received from the terminal to assess the user's current emotional state. Based on these results, the server adjusts the output signal to the animal.

[0771] Terminal role

[0772] The terminal is a device for collecting the movements and sounds of both the animal and the user. The terminal captures images and audio of the animal while simultaneously capturing the user's facial expressions and tone of voice, and sends this data to the server. This allows the server to include the user's emotional state in its analysis.

User role

[0774] The user is the agent who understands the animal's condition and needs through this system and takes the necessary actions. The user receives notifications about the animal's condition and emotions via the terminal and can also input instructions for the animal based on their own emotions. The commands entered by the user are analyzed by the server's emotion engine and transmitted to the animal in the optimal form.

[0775] Specific example

[0776] Consider a scenario in a home where a dog is seeking the user's attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server analyzes the dog's data and determines that the dog "wants attention," while simultaneously analyzing the user's video and recognizing that the user is "tired." Based on these two analysis results, the server transmits appropriate signals and sounds to the dog through the device to help it feel safe without becoming overly agitated. As a result, the dog feels somewhat safer, allowing the user to remain relaxed as well.

[0777] This system utilizes an emotion engine to provide feedback to animals that takes the user's emotions into account, resulting in more natural and personalized communication. This makes it possible to further deepen the relationship between animals and humans.

[0778] The following describes the processing flow.

[0779] Step 1:

[0780] The device captures animal movements with a camera and records animal sounds with a microphone. Furthermore, it collects the user's facial expressions and voice, and simultaneously prepares this data for transmission to a server.

[0781] Step 2:

[0782] The server receives the animal's motion and vocalization data and the user's facial expression and voice data transmitted from the terminal. The animal's motion and vocalizations are analyzed using image recognition AI and voice recognition AI to identify the animal's needs and emotions.

[0783] Step 3:

[0784] The server uses the emotion engine to analyze the received facial expression and voice data of the user and evaluate the user's emotional state. For example, it might determine whether the user is "tired" or "relaxed."

[0785] Step 4:

[0786] The server generates the optimal response based on the animal's needs and the user's emotional state. It converts instructions and messages to the animal into signals and voices that take the user's emotions into account.
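The response generation of Step 4 depends jointly on two inputs: the animal's need and the user's emotional state. The sketch below swaps in a simple lookup table for the generation step, purely to show how both inputs together select the output. The table entries, signal names, and volume values are hypothetical, and a deployed system might instead prompt a generative AI model.

```python
from typing import Dict, Tuple

Response = Tuple[str, float]  # (signal to play, volume from 0.0 to 1.0)

# Hypothetical policy keyed by (animal need, user emotion).
POLICY: Dict[Tuple[str, str], Response] = {
    ("wants_attention", "tired"): ("calm_voice:rest_now", 0.3),
    ("wants_attention", "relaxed"): ("play_sound:lets_play", 0.7),
    ("bored", "tired"): ("dispense_toy", 0.2),
}
FALLBACK: Response = ("calm_voice:its_ok", 0.4)


def generate_response(animal_need: str, user_emotion: str) -> Response:
    """Choose a signal that fits both the animal's need and the user's state."""
    return POLICY.get((animal_need, user_emotion), FALLBACK)
```

The same animal need maps to different outputs depending on the user: a dog wanting attention gets a quiet calming message when the user is tired, but an invitation to play when the user is relaxed.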

[0787] Step 5:

[0788] The device receives the converted signal sent from the server and communicates instructions to the animal through appropriate voice output or vibration. This allows the animal to understand the user's intentions.

[0789] Step 6:

[0790] The user observes the animal's responses through a terminal and enters additional instructions as needed. The server then analyzes this input again and continues to generate adaptive responses.

[0791] This process enables smooth, emotion-conscious communication between animals and users.

[0792] (Example 2)

[0793] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0794] Modern pet communication systems have limitations in their ability to recognize animal behavior and needs, making it difficult to achieve smooth communication between humans and animals. Furthermore, they often fail to provide feedback that takes into account human emotional states, potentially leading to insufficient interaction with animals. Therefore, a system is needed that enables optimal communication for both animals and humans, fostering deeper bonds.

[0795] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0796] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for acquiring human emotional states. This makes it possible to comprehensively analyze the movements and emotional states of humans and animals and generate feedback that facilitates mutual understanding.

[0797] "Means of acquiring motion" refers to technologies for detecting the body movements of animals and collecting that data.

[0798] "Means of acquiring animal sounds" refers to technologies for recording or analyzing sounds emitted by animals.

[0799] "Means for acquiring human emotional states" refers to technologies that detect the characteristics of human facial expressions and voices and identify those emotional states.

[0800] "Means of analysis" refers to techniques for identifying the intentions and emotions behind animal behavior based on collected animal actions and vocalizations.

[0801] "Means of notification" refers to technologies for conveying analyzed information to humans.

[0802] "Means of conversion" refers to technologies for converting instructions or inputs from humans into a format that animals can understand.

[0803] "Means of transmission" refers to technologies used to transmit signals and information to animals.

[0804] A "data recording device" refers to a device that stores the results of analyses of animal movements and vocalizations, and uses them to improve the accuracy of future analyses.

[0805] "Learning methods" refer to machine learning techniques used to improve the accuracy of analysis based on recorded data.
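The data recording device and the learning method described above could work together as follows: each confirmed analysis result is stored in a database, and future classifications draw on everything recorded so far. The Python sketch below uses an SQLite table and a nearest-centroid rule as a deliberately simple stand-in for the machine learning techniques mentioned. The two features (a motion-energy value and a vocal pitch in Hz), the table schema, and the class name are assumptions. A real system would normalize features of different scales before comparing distances.

```python
import sqlite3
from typing import Optional


class AnalysisLog:
    """Records confirmed analysis results and classifies new ones by nearest centroid."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps the sketch self-contained; a file path would persist data.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results (motion REAL, pitch REAL, label TEXT)"
        )

    def record(self, motion: float, pitch: float, label: str) -> None:
        """Store one analysis result together with its confirmed label."""
        self.db.execute("INSERT INTO results VALUES (?, ?, ?)", (motion, pitch, label))
        self.db.commit()

    def classify(self, motion: float, pitch: float) -> Optional[str]:
        """Predict a label from per-label feature averages over all recorded data."""
        rows = self.db.execute(
            "SELECT label, AVG(motion), AVG(pitch) FROM results GROUP BY label"
        ).fetchall()
        if not rows:
            return None
        nearest = min(rows, key=lambda r: (r[1] - motion) ** 2 + (r[2] - pitch) ** 2)
        return nearest[0]
```

Because the centroids are recomputed from the table on every call, each newly recorded result immediately influences later predictions, which is the sense in which accuracy can improve as data accumulates.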

[0806] "Notification means" refers to technologies for promptly alerting humans to abnormalities or emergencies.
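A notification means for abnormalities could, for example, compare each new measurement with the animal's recent baseline and raise an alert when the measurement deviates strongly. The sketch below uses a z-score test as one such rule. The choice of measurement (for example, barks per minute), the 3.0 limit, and the message text are assumptions rather than details taken from this disclosure.

```python
import statistics
from typing import List, Optional


def check_anomaly(history: List[float], latest: float, z_limit: float = 3.0) -> Optional[str]:
    """Return an alert message if `latest` lies far outside the recent baseline."""
    if len(history) < 2:
        return None  # not enough data to form a baseline yet
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return None if latest == mean else f"ALERT: {latest} differs from constant baseline {mean}"
    z = (latest - mean) / spread
    if abs(z) >= z_limit:
        return f"ALERT: {latest} is {z:.1f} standard deviations from baseline {mean:.1f}"
    return None
```

With a history of roughly 10 barks per minute, a reading of 10.5 passes silently, while a sudden jump to 20 produces an alert.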

[0807] This invention provides a smoother and more effective two-way communication system with animals by taking into account the emotional state of the human being. The system mainly consists of three components: a server, a terminal, and a user, each playing a specific role.

Server role

[0809] The server is the main module that receives and analyzes animal behavior and vocal data transmitted from the terminal. The server uses a generative AI model to analyze animal behavior and vocalizations, identifying the animal's needs and emotions. Furthermore, the server's emotion engine analyzes human video and audio data received from the terminal, evaluating the user's emotional state. Based on this information, the server generates appropriate feedback for the animal and transmits it via the terminal.

[0810] Terminal role

[0811] The device is equipped with sensors, cameras, and microphones to collect animal movements and sounds, as well as the user's facial expressions and voice. The device transmits this data to a server, allowing the server to understand the state of both the animal and the user. This ensures smoother overall system operation and improves the user experience.

User role

[0813] The user understands the animal's condition and needs through the device and takes the necessary actions. The user can also input instructions that reflect their own emotions into the device. This input is analyzed by the server and reflected in optimal feedback to the animal. This allows the user to interact with the animal in a way that suits their own circumstances.

[0814] Specific example

[0815] Consider a real-world scenario where a dog in the home wants attention. The device captures the dog's movements and barks, and also collects the user's facial expressions and voice. The server uses a generative AI model to identify that the dog "wants attention" while recognizing that the user is "tired." As a result, a voice message to reassure the dog is generated and played back to the dog through the device. This system achieves more personalized and natural communication because the emotion engine provides feedback to the animal that takes the user's emotions into account.

[0816] An example of a prompt for the generative AI model is the following.

[0817] "Analyze the dog's barks to tell us what the pet wants. Also, determine the user's current emotional state and suggest the best feedback for the pet."

[0818] Other prompts are also possible.
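A single prompt like the one above bundles analysis and feedback generation into one request. When the analysis results are already available as text, as in the processing flow of Example 2, a prompt for the feedback step could be assembled from those results. The following minimal sketch only builds the prompt string. The wording and function name are hypothetical, and no particular generative AI service or API is assumed.

```python
def build_feedback_prompt(animal: str, animal_state: str, user_emotion: str) -> str:
    """Compose a text prompt for a generative AI model from text analysis results."""
    return (
        f"The {animal}'s movements and vocalizations indicate that it {animal_state}. "
        f"The user currently appears {user_emotion}. "
        f"Suggest one short, reassuring message to play to the {animal} "
        f"that suits the user's current state."
    )


if __name__ == "__main__":
    print(build_feedback_prompt("dog", "wants attention", "tired"))
```

Keeping analysis and generation as separate steps lets the same analysis results be reused with different prompt templates.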

[0819] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0820] Step 1:

[0821] The device collects animal movements and sounds. Using a camera and microphone, it captures the dog's movements and barks, and its sensors also collect the user's facial expressions and voice. This data is prepared for transmission to the server, which needs it to identify the animal's needs and emotions.

[0822] Step 2:

[0823] The server receives animal and user data transmitted from the terminal. The video and audio data of the animals obtained as input are analyzed by a generative AI model to identify the characteristics of the animals' movements and determine their emotional states, such as "seeking attention." Similarly, the user's video and audio data are analyzed by an emotion engine to determine the user's emotional state, such as whether they are "tired." The data is processed by an AI algorithm and output as an emotion analysis result in text format.

[0824] Step 3:

[0825] The server generates appropriate feedback for the animal based on the analysis results. For example, it uses a prompt to generate a reassuring message for the dog, and a generative AI model creates an audio message. This message is designed to calm the animal's behavior. The generated audio data is then sent to the device.

[0826] Step 4:

[0827] The device plays audio data received from the server. This provides animals with audio that serves as a guide for their actions, and is expected to have effects such as making the animals feel safe. The device adjusts the volume and timing of the playback according to the animal's response, supporting effective communication.
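The volume and timing adjustment in Step 4 can be expressed as a small control rule driven by a measured response from the animal. The sketch below assumes a hypothetical agitation score between 0 and 1 (for example, derived from the motion and vocal analysis) and uses illustrative thresholds. Playback stops once the animal has settled, becomes softer and less frequent if the animal remains highly agitated, and is otherwise left unchanged.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Playback:
    volume: float = 0.5       # 0.0 (mute) to 1.0 (max)
    interval_s: float = 10.0  # seconds before the message is replayed


def adjust(p: Playback, agitation: float) -> Optional[Playback]:
    """Return the next playback setting, or None once the animal has settled."""
    if agitation < 0.3:  # calm enough: stop replaying
        return None
    if agitation > 0.7:  # audio may be over-stimulating: softer and less often
        return replace(p, volume=max(0.1, p.volume * 0.8), interval_s=p.interval_s * 1.5)
    return p  # moderate agitation: keep the current setting
```

The volume floor of 0.1 keeps the message audible even after repeated reductions.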

[0828] Step 5:

[0829] The user receives feedback from the device and uses that feedback to decide how to interact with the animal. The user inputs instructions into the device that reflect their own emotions and circumstances, and these instructions are then analyzed by the server and reflected as optimal feedback for the animal. This allows the user to take actions appropriate to their situation.

[0830] (Application Example 2)

[0831] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0832] Smooth communication between animals and humans requires that each party's emotions and desires be understood and responded to appropriately. However, conventional systems analyze only signals from animals to humans and do not take human emotions into consideration, and therefore fail to achieve sufficient communication. There is thus an urgent need for a system that considers the emotions of both parties and achieves consistent interaction.

[0833] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0834] In this invention, the server includes means for acquiring animal movements, means for acquiring animal sounds, and means for analyzing human emotional states. This enables two-way interaction that takes into account not only animal emotions and requests, but also human emotional states.

[0835] "Means for acquiring motion" refers to a device or method that detects the movement of an animal's body and collects that information as data.

[0836] "Means for acquiring animal sounds" refers to a device or method for collecting animal sounds as audio data.

[0837] "Means for analyzing and identifying an animal's needs or emotions" refers to a device or method that analyzes acquired data on an animal's movements and vocalizations and uses that data to clarify the animal's desires and emotions.

[0838] "Means for analyzing a person's emotional state" refers to a device or method that analyzes audio and video data obtained from a user to identify their current emotional state.

[0839] "Means of notifying humans of requests or emotions" refers to devices or methods that convey information about emotions or requests analyzed from animals to a user.

[0840] "Means of converting human input into signals understandable to animals" refers to a device or method that converts commands or instructions from a user into signals in a format that animals can understand and respond to.

[0841] "Means for transmitting signals to animals" refers to a device or method that actually transmits the converted signals to animals to achieve a desired interaction.

[0842] The server receives data transmitted from terminals that capture animal movements and sounds, and uses AI that combines image recognition and speech recognition technologies to analyze it. Specifically, it uses OpenCV and TensorFlow to analyze animal movements and facial expressions, and Google Cloud AI and Microsoft Azure AI to process the audio data. This makes it possible to identify the animal's needs and emotions.

[0843] Furthermore, to analyze human emotional states, the server receives video or audio data transmitted by the user and analyzes it using a generative AI model. This allows the user's emotions to be identified, enabling the output signal to the animal to be appropriately adjusted. The Google Speech-to-Text API could be used as the voice analysis technology.

[0844] The device is equipped with a camera and microphone to efficiently collect information about animals and users. Using these devices, the device transmits information such as the animals' movements and sounds, as well as the user's facial expressions and voice tone, to a server.

[0845] This system allows users to receive notifications about the animal's condition and needs, and take necessary actions. Furthermore, user instructions are analyzed on the server and transmitted in a format recognizable to the animal, enabling natural communication.

[0846] For example, if a pet is lonely when its owner returns home tired, the server can detect the user's fatigue, generate a calming voice signal, and deliver it to the pet via the device. This allows the pet to feel secure and the owner to remain relaxed. Other examples of input prompts for the generative AI model include "Analyze the pet's condition and suggest ways to communicate that will help the owner relax" and "Please tell me how to alleviate the dog's loneliness."

[0847] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0848] Step 1:

[0849] The device uses its camera and microphone to capture animal movements and sounds, as well as the user's facial expressions and voice. It collects animal video and audio data, and user video and audio data as input. This data is then sent directly to the server.

[0850] Step 2:

[0851] The server receives video and audio data of animals sent from the terminal. Using video recognition technologies such as OpenCV and TensorFlow, it analyzes the animals' movements and facial expressions, and uses Google Cloud AI to identify the animals' needs and emotions from the audio data. As output, it generates data on the identified needs and emotions of the animals.

[0852] Step 3:

[0853] The server receives video and audio data from the user transmitted from the terminal. This data is processed by a generative AI model to identify the user's emotional state. The output is data related to the user's emotional state.

[0854] Step 4:

[0855] The server integrates the animal's emotional data obtained in step 2 with the user's emotional data obtained in step 3 to generate appropriate output signals for the animal. Specifically, it creates voice signals and behavioral commands that will reassure the pet. The output of this step is signal data for the animal.

[0856] Step 5:

[0857] The terminal receives signal data for animals transmitted from the server. Based on this, it outputs audio signals and behavioral commands to the animals using speakers, LED displays, etc.
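On the terminal side, Step 5 amounts to routing each item of the received signal data to the appropriate output device. The sketch below assumes, hypothetically, that the server sends a list of items, each tagged with a channel name such as "speaker" or "led". Device drivers are represented as plain callback functions so that the routing logic can be shown without hardware.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class OutputRouter:
    """Routes each item of server signal data to the matching output device."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}

    def register(self, channel: str, handler: Handler) -> None:
        """Attach an output device (represented by a callback) to a channel name."""
        self.handlers[channel] = handler

    def dispatch(self, signal_data: List[Dict[str, str]]) -> List[str]:
        """Output each item in order; return the channels that had no device."""
        unhandled = []
        for item in signal_data:
            handler = self.handlers.get(item["channel"])
            if handler is None:
                unhandled.append(item["channel"])
            else:
                handler(item["content"])
        return unhandled
```

Returning the unhandled channels lets the terminal report back to the server when it lacks a requested output, such as a vibration motor.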

[0858] Step 6:

[0859] Users can monitor the animal's condition and reactions in real time and input additional instructions into the terminal as needed. This process of sending user instructions back to the server and then relaying them to the animal in an optimized form is repeated.

[0860] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0861] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0862] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0863] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0864] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the concentric circles, the more primitive it is. Farther out on the concentric circles are emotions representing states and actions that arise from mental states. Here, "emotion" is a concept that includes feelings and mental states. On the left side of the concentric circles are emotions that are generally generated by reactions occurring in the brain. On the right side are emotions that are generally induced by situational judgment. Above and below the concentric circles are emotions that are generally both generated by reactions in the brain and induced by situational judgment. In addition, "pleasure" is located on the upper side of the concentric circles and "displeasure" on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure by which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0865] These emotions are distributed around the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, which gives a calm impression.

[0866] The inside of the emotion map 400 represents inner thoughts, and the outside represents actions. Therefore, the farther outward an emotion lies on the emotion map 400, the more visible it becomes, that is, the more it is expressed in actions.

[0867] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, discomfort results; when they approach the ideal, pleasure results. Similarly, emotions can be created for robots, cars, motorcycles, and the like based on various balances, such as posture and remaining battery level: deviation from the ideal results in discomfort, and approaching the ideal results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
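The idea that pleasure and discomfort arise from how far various balances are from their ideal values can be written as a simple score. The sketch below is one hypothetical formulation, not the method of the cited emotion map. Each quantity's deviation is measured in units of a tolerance, capped at two units, and averaged, giving +1 at the ideal and -1 when every quantity is far off. The quantities (battery level, posture angle), ideals, and tolerances are illustrative.

```python
from typing import Dict


def comfort_score(state: Dict[str, float], ideal: Dict[str, float],
                  tolerance: Dict[str, float]) -> float:
    """Score in [-1, 1]: +1 at the ideal (pleasure), toward -1 as balances drift (discomfort)."""
    # Deviation of each quantity in tolerance units, capped at 2 so the score stays >= -1.
    devs = [min(abs(state[k] - ideal[k]) / tolerance[k], 2.0) for k in ideal]
    return 1.0 - sum(devs) / len(devs)
```

For a robot with an ideal battery level of 100 (tolerance 50) and an ideal posture of 0 degrees (tolerance 30), a full battery in an upright posture scores 1.0, an empty battery alone brings the score to 0.0, and an empty battery while tipped 90 degrees scores -1.0.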

[0868] The emotion map defines two emotions that promote learning. One lies around the middle of the negative emotions "repentance" and "reflection" on the situation side; that is, it arises when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other lies around the positive emotion "desire" on the response side; that is, it arises when the robot has positive feelings such as "I want more" or "I want to know more."

[0869] The emotion identification model 59 feeds the user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each of which combines a user input with the emotion values representing each emotion shown in the emotion map 400. Furthermore, the neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
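The requirement that neighboring emotions on the map receive similar values could be encouraged during training by adding a smoothness penalty to the usual error term. The sketch below shows one such loss: the mean squared error against the target emotion values plus a weighted sum of squared differences between neighboring emotions. It also shows an argmax rule for picking the user's emotion from the values. Both the penalty form and the argmax rule are assumptions for illustration; the disclosure does not state the training objective or the decision rule.

```python
from typing import List, Sequence, Tuple


def map_aware_loss(pred: Sequence[float], target: Sequence[float],
                   neighbors: List[Tuple[int, int]], lam: float = 0.1) -> float:
    """Mean squared error plus a penalty pulling neighboring emotions' values together."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    # Each (i, j) pair lists two emotions placed next to each other on the map.
    smooth = sum((pred[i] - pred[j]) ** 2 for i, j in neighbors)
    return mse + lam * smooth


def dominant_emotion(values: Sequence[float], names: Sequence[str]) -> str:
    """Take the emotion with the highest value as the user's emotion."""
    return names[max(range(len(values)), key=lambda i: values[i])]
```

For example, with the emotions "reassured", "calm", "confident", and "anxious", where the first three are neighbors, the penalty term grows whenever the network assigns "calm" a value very different from "reassured" or "confident".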

[0870] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0871] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0872] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0873] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0874] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0875] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0876] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0877] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0878] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0879] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0880] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0881] The following is further disclosed regarding the embodiments described above.

[0882] (Claim 1)

[0883] Means for acquiring animal movements,

[0884] Means for acquiring animal vocalizations,

[0885] Means for analyzing the aforementioned movements and vocalizations to identify the animal's requests or emotions,

[0886] Means for notifying a human of the animal's requests or emotions based on the results of the analysis,

[0887] Means for converting human input into signals that the animal can understand,

[0888] Means for transmitting the aforementioned signal to the animal,

[0889] A system comprising the above means.
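The means listed in claim 1 form a loop: sensing, analysis, notification to the human, then conversion and transmission back to the animal. The Python sketch below illustrates only that data flow, under stated assumptions. The feature names (`movement_energy`, `vocalization_rate`), the rule-based `analyze` function, the `SIGNAL_TABLE` mapping, and the `run_cycle` helper are all hypothetical; the disclosure does not specify how analysis or conversion is performed.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Observation:
    """Hypothetical features produced by the movement and vocalization acquisition means."""
    movement_energy: float    # e.g. normalized accelerometer magnitude, 0..1 (assumed)
    vocalization_rate: float  # vocalizations per minute (assumed)

def analyze(obs: Observation) -> str:
    """Placeholder analysis means: a rule-based mapping to a request/emotion label."""
    if obs.vocalization_rate > 10 and obs.movement_energy > 0.7:
        return "excited"
    if obs.vocalization_rate > 10:
        return "wants_attention"
    if obs.movement_energy < 0.1:
        return "resting"
    return "calm"

# Hypothetical conversion table: human instruction -> signal for the animal.
SIGNAL_TABLE = {
    "come here": "tone:high-short",
    "good job": "tone:rising",
    "stop": "tone:low-long",
}

def convert(instruction: str) -> str:
    """Placeholder conversion means; unrecognized instructions map to "none"."""
    return SIGNAL_TABLE.get(instruction.strip().lower(), "none")

def run_cycle(obs: Observation, instruction: str,
              notify: Callable[[str], None],
              transmit: Callable[[str], None]) -> str:
    """One pass through the claimed means; returns the identified label."""
    label = analyze(obs)
    notify(f"Your pet seems: {label}")  # notification means
    signal = convert(instruction)       # conversion means
    if signal != "none":
        transmit(signal)                # transmission means
    return label

if __name__ == "__main__":
    sent = []
    run_cycle(Observation(0.8, 12.0), "Good job", print, sent.append)
    print(sent)
```

Passing `notify` and `transmit` as callables keeps the sketch independent of any particular output device, whether a smart device, smart glasses, a headset-type terminal, or a robot.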

[0890] (Claim 2)

[0891] The system according to claim 1, further comprising a learning means for recording the results of analyzing animal movements and vocalizations in a database and improving the accuracy of the analysis.
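Claim 2 recites recording analysis results in a database and using them to improve accuracy, without naming a learning method. As one assumed possibility, the sketch below stores each result in an in-memory SQLite table (Python's standard `sqlite3` module) and refits a nearest-centroid classifier from the records a human has confirmed. The table layout, the `LearningStore` class, and the choice of nearest-centroid are hypothetical.

```python
import sqlite3
from math import dist
from typing import Dict, Optional, Tuple

class LearningStore:
    """Hypothetical learning means: records analysis results in SQLite and
    refits a nearest-centroid classifier from human-confirmed labels."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "movement REAL, vocal REAL, predicted TEXT, confirmed TEXT)"
        )
        self.centroids: Dict[str, Tuple[float, float]] = {}

    def record(self, movement: float, vocal: float,
               predicted: str, confirmed: Optional[str] = None) -> None:
        """Store one analysis result; `confirmed` is the label a human verified."""
        self.db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
                        (movement, vocal, predicted, confirmed))
        self.db.commit()

    def refit(self) -> None:
        """Recompute one centroid (mean feature vector) per confirmed label."""
        rows = self.db.execute(
            "SELECT confirmed, AVG(movement), AVG(vocal) FROM results "
            "WHERE confirmed IS NOT NULL GROUP BY confirmed"
        ).fetchall()
        self.centroids = {label: (m, v) for label, m, v in rows}

    def predict(self, movement: float, vocal: float) -> Optional[str]:
        """Return the label whose centroid is nearest, or None before any fit."""
        if not self.centroids:
            return None
        point = (movement, vocal)
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl], point))
```

Unconfirmed rows are kept for auditing but ignored when refitting. The features are not normalized here, so a practical version would likely scale them before computing distances.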

[0892] (Claim 3)

[0893] The system according to claim 1, further comprising a notification means for promptly issuing an alarm when a specific anomaly or emergency is detected from the movements and sounds of an animal.
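Claim 3 does not say how an anomaly or emergency is detected. One common technique, used here purely as an assumed example, is a rolling z-score: a new sample triggers an alarm when it lies more than `threshold` standard deviations from the recent baseline. The `AnomalyAlarm` class, its window size, threshold, and standard-deviation floor are hypothetical parameters.

```python
from collections import deque
from statistics import mean, pstdev

class AnomalyAlarm:
    """Hypothetical notification means: raises an alarm when a new sample deviates
    from a rolling baseline by more than `threshold` standard deviations."""

    def __init__(self, window: int = 30, threshold: float = 3.0,
                 min_history: int = 5, min_sigma: float = 1.0) -> None:
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_history = min_history  # samples required before any alarm
        self.min_sigma = min_sigma      # floor so a flat baseline is not hypersensitive

    def update(self, value: float) -> bool:
        """Feed one sample (e.g. vocalizations per minute); return True to alarm."""
        alarm = False
        if len(self.history) >= self.min_history:
            baseline = list(self.history)
            mu = mean(baseline)
            sigma = max(pstdev(baseline), self.min_sigma)
            alarm = abs(value - mu) / sigma > self.threshold
        self.history.append(value)
        return alarm
```

The same detector could be run on a movement metric in parallel, with an alarm from either stream triggering the prompt notification.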

[0894] "Example 1"

[0895] (Claim 1)

[0896] Means for acquiring animal movements,

[0897] Means for acquiring animal vocalizations,

[0898] Information processing means for analyzing the aforementioned movements and vocalizations to identify the animal's requests or emotions,

[0899] Information transmission means for notifying a human of the animal's requests or emotions based on the results of the analysis,

[0900] Information conversion means for converting instructions input by a human into signals that the animal can understand,

[0901] Information output means for transmitting the converted signal to the animal,

[0902] Means for customizing the signal conversion according to the species and the individual animal,

[0903] A system comprising the above means.
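Customizing the conversion "according to the type and individual of the animal" can be read as a layered lookup: a per-individual setting takes precedence over a per-species default. The sketch below assumes that reading; `SPECIES_DEFAULTS`, `AnimalProfile`, `customize_signal`, and the signal names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical species-level defaults: human instruction -> output signal.
SPECIES_DEFAULTS: Dict[str, Dict[str, str]] = {
    "dog": {"come": "whistle:2-short", "stop": "tone:low-long"},
    "cat": {"come": "click:3", "stop": "chime:low"},
}

@dataclass
class AnimalProfile:
    species: str
    name: str
    # Per-individual overrides set or learned for this particular animal.
    overrides: Dict[str, str] = field(default_factory=dict)

def customize_signal(profile: AnimalProfile, instruction: str) -> Optional[str]:
    """Resolve a signal: individual override first, then species default,
    else None (no known signal for this animal)."""
    if instruction in profile.overrides:
        return profile.overrides[instruction]
    return SPECIES_DEFAULTS.get(profile.species, {}).get(instruction)
```

Returning `None` for an unknown species or instruction lets the caller decide whether to fall back to a generic signal or ask the human to configure one.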

[0904] (Claim 2)

[0905] The system according to claim 1, further comprising a learning processing means for recording the results of analyzing animal movements and vocalizations on a data storage medium and improving the accuracy of the analysis.

[0906] (Claim 3)

[0907] The system according to claim 1, further comprising a notification generation means for rapidly issuing an alarm when a specific abnormality or emergency condition is detected from the movements and sounds of an animal.

[0908] "Application Example 1"

[0909] (Claim 1)

[0910] Means for acquiring animal movements,

[0911] Means for acquiring animal vocalizations,

[0912] Means for analyzing the aforementioned movements and vocalizations to identify the animal's requests or emotions,

[0913] Means for notifying a human of the animal's requests or emotions based on the results of the analysis,

[0914] Means for converting human input into signals that the animal can understand,

[0915] Means for transmitting the aforementioned signal to the animal,

[0916] Means for managing the animal's well-being for humans,

[0917] A system comprising the above means.
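The disclosure does not define how well-being is measured. As an assumed illustration, the well-being management means could combine daily indicators into a single 0 to 100 score with a weighted sum. The indicators in `DailyIndicators`, the `WEIGHTS` values, and `wellbeing_score` are hypothetical choices, not taken from the source.

```python
from dataclasses import dataclass

@dataclass
class DailyIndicators:
    activity: float        # fraction of the usual activity level reached, 0..1
    sleep: float           # fraction of the usual sleep obtained, 0..1
    negative_ratio: float  # share of analyzed states labeled negative, 0..1

# Hypothetical weights; the disclosure does not define a scoring formula.
WEIGHTS = {"activity": 0.4, "sleep": 0.3, "mood": 0.3}

def clamp01(x: float) -> float:
    """Limit a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))

def wellbeing_score(d: DailyIndicators) -> float:
    """Weighted sum of clamped indicators, scaled to 0..100 (higher is better)."""
    mood = 1.0 - clamp01(d.negative_ratio)
    score = (WEIGHTS["activity"] * clamp01(d.activity)
             + WEIGHTS["sleep"] * clamp01(d.sleep)
             + WEIGHTS["mood"] * mood)
    return round(100 * score, 1)
```

Because the weights sum to 1.0 and every input is clamped, the score always stays within 0 to 100.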

[0918] (Claim 2)

[0919] The system according to claim 1, further comprising a database that records the results of analyzing the animal's movements and vocalizations, and learning means that improves the accuracy of the analysis, the system providing data for managing the animal's well-being.

[0920] (Claim 3)

[0921] The system according to claim 1, further comprising a notification means for promptly issuing an alarm when a specific abnormality or emergency is detected from the animal's movements and vocalizations, and for providing humans with recommended actions based on the animal's well-being.
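Claim 3 of this example pairs the alarm with recommended actions based on the animal's well-being. A minimal rule table is sketched below, assuming a 0 to 100 well-being score (higher is better) and a boolean alarm flag as inputs; how the score is computed is left open. The function name `recommend_actions`, the thresholds, and the recommendation texts are hypothetical.

```python
from typing import List

def recommend_actions(score: float, alarm: bool) -> List[str]:
    """Hypothetical rules mapping a 0..100 well-being score and an alarm flag
    to recommended actions for the human. Thresholds are assumptions."""
    actions: List[str] = []
    if alarm:
        actions.append("Check on the animal immediately")
        if score < 40:
            actions.append("Consider contacting a veterinarian")
    if score < 60:
        actions.append("Increase play or walk time today")
    elif score < 80:
        actions.append("Keep the routine and watch sleep and activity")
    elif not alarm:
        actions.append("No action needed")
    return actions
```

Urgent items are appended first, so a user interface can simply display the list in order.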

[0922] "Example 2 combining an emotion engine"

[0923] (Claim 1)

[0924] Means for acquiring animal movements,

[0925] Means for acquiring animal vocalizations,

[0926] Means for acquiring the emotional state of a human,

[0927] Means for analyzing the aforementioned movements and vocalizations to identify the animal's requests or emotions,

[0928] Means for notifying the human of the animal's requests or emotions based on the results of the analysis and the emotional state of the human,

[0929] Means for converting human input into signals that the animal can understand and that take the human's emotions into account,

[0930] Means for transmitting the aforementioned signal to the animal,

[0931] A system comprising the above means.
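This variant feeds the human's emotional state into both the notification and the conversion means, but it does not say how. The sketch below assumes one possible policy: when the emotion engine reports high arousal, the notification is worded more calmly and the output signal intensity is reduced so the agitation is not passed on to the animal. `HumanState`, `arousal`, `phrase_notification`, `signal_intensity`, and the 0.7 and 0.5 constants are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HumanState:
    """Hypothetical emotion-engine output: arousal from 0.0 (calm) to 1.0 (agitated)."""
    arousal: float

def phrase_notification(animal_state: str, human: HumanState) -> str:
    """Notification means that adapts its wording to the human's emotional state."""
    if human.arousal > 0.7:
        return f"No rush: your pet may be {animal_state}. Take a moment first."
    return f"Your pet may be {animal_state}."

def signal_intensity(base: float, human: HumanState) -> float:
    """Conversion step that lowers output intensity as human arousal rises,
    down to half of `base` at maximum arousal (the 0.5 factor is an assumption)."""
    arousal = max(0.0, min(1.0, human.arousal))
    return round(base * (1.0 - 0.5 * arousal), 3)
```

Clamping the arousal value keeps the intensity between half and all of `base`, even if the emotion engine reports an out-of-range value.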

[0932] (Claim 2)

[0933] The system according to claim 1, further comprising a learning means for recording the results of the analysis of animal movements and vocalizations in a data recording device and improving the accuracy of the analysis.

[0934] (Claim 3)

[0935] The system according to claim 1, further comprising a notification means for promptly issuing an alarm when a specific anomaly or emergency is detected from the movements and sounds of an animal.

[0936] "Application Example 2 combining an emotion engine"

[0937] (Claim 1)

[0938] Means for acquiring animal movements,

[0939] Means for acquiring animal vocalizations,

[0940] Means for analyzing the aforementioned movements and vocalizations to identify the animal's requests or emotions,

[0941] Means for analyzing the emotional state of a human,

[0942] Means for notifying a human of the animal's requests or emotions based on the results of the analysis,

[0943] Means for converting human input into signals that the animal can understand,

[0944] Means for transmitting the aforementioned signal to the animal,

[0945] A system comprising the above means.

[0946] (Claim 2)

[0947] The system according to claim 1, further comprising a learning means for recording the results of analyzing animal movements and vocalizations in a data recording device and improving the accuracy of the analysis.

[0948] (Claim 3)

[0949] The system according to claim 1, further comprising a notification means for promptly issuing an alarm when a specific abnormality or emergency is detected from the movements and sounds of an animal.

[Explanation of Symbols]

[0950]
10, 210, 310, 410  Data processing system
12  Data processing device
14  Smart device
214  Smart glasses
314  Headset-type terminal
414  Robot

Claims

1. A system comprising: means for acquiring animal movements; means for acquiring animal vocalizations; means for analyzing the movements and vocalizations to identify the animal's requests or emotions; means for notifying a human of the animal's requests or emotions based on the results of the analysis; means for converting human input into signals that the animal can understand; means for transmitting the signals to the animal; and means for managing the animal's well-being for humans.

2. The system according to claim 1, further comprising a database that records the results of analyzing the animal's movements and vocalizations, and learning means that improves the accuracy of the analysis, the system providing data for managing the animal's well-being.

3. The system according to claim 1, further comprising a notification means for promptly issuing an alarm when a specific abnormality or emergency is detected from the animal's movements and vocalizations, and for providing humans with recommended actions based on the animal's well-being.