system

A system analyzing animal vocalizations through voice recognition and machine learning predicts emotions accurately, addressing the language barrier and enhancing pet-owner relationships by offering personalized advice and products.

JP2026100680A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19

Application Information

Patent Timeline

09 Dec 2024: Application
19 Jun 2026: Publication (JP2026100680A)

IPC: G06Q50/10; G06F16/63; G06F16/903; G10L19/00; G10L25/48; G16H50/20; G10L21/028; G10L13/02; G10L19/02; G10L13/04; G10L13/08; G10L21/18; G10L25/00; G06F3/01; G06F16/687; G10L19/16; G06F16/90; G16H20/70; G10L13/06; G16H10/00; G10L13/00

AI Tagging

Application Domain

Input/output for user-computer interaction Data processing applications

Technology Topics

Physical medicine and rehabilitation Mood

AI Technical Summary

Technical Problem

There is a language barrier between animals and humans, making it difficult to accurately understand the emotions of animals, which leads to stress and anxiety for pet owners due to the inability to grasp their needs and health conditions.

Method used

A system that analyzes animal vocalizations using input devices, performs voice recognition and machine learning to predict emotions, compares the results with a database for accuracy, and provides advice through a presentation interface.

Benefits of technology

Enables more accurate emotion determination in animals, allowing users to respond appropriately and improve their relationship with pets by providing tailored advice and product suggestions.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure 2026100680000001_ABST

Patent Text Reader

Abstract

A system is provided. [Solution] The system includes: an input means that accepts animal sounds as input; an analysis means that analyzes the animal's voice and predicts its emotions; a matching means that compares the predicted emotions with a database of past cases; a presentation means that provides advice to the user based on the matching results; and a proposal means that suggests products related to the advice.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] There is a problem that there is a language barrier between animals and humans, and it is difficult to accurately understand the emotions of animals. Especially for those who keep pets, the inability to grasp the needs and health conditions of animals causes stress and anxiety. Therefore, there is a need for a technology that can analyze the emotions of animals from their voices and propose appropriate responses to users.

Means for Solving the Problems

[0005] This invention provides a system for analyzing animal vocalizations and predicting their emotions. Specifically, it acquires animal vocalizations using an input means and analyzes the sounds using an analysis means. Based on the obtained data, it predicts the emotions and compares them with a past database using a matching means. This enables more accurate emotion determination, and provides advice to the user through a presentation means. This advice includes suggestions for necessary supplies and provides guidance for the user to take specific actions.

[0006] "Input means" refers to a device or function for acquiring animal sounds, enabling the recording and transmission of the sounds to a server.

[0007] "Analysis means" refers to programs and devices that analyze acquired animal voices and predict their emotions, and utilize voice recognition technology and machine learning algorithms.

[0008] "Matching means" refers to a device or program that has the function of comparing predicted emotions with a database of previously recorded cases, and is used to improve the accuracy of the analysis.

[0009] "Presentation means" refers to interface functions that notify the user of the results of the emotion analysis and provide appropriate advice, and includes visual displays and audio output.

[0010] "Proposal means" refers to functions that recommend necessary products and actions to users based on the emotion analysis, and that provide specific product information and action plans.

Brief Description of the Drawings

[0011]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, when an emotion engine is combined.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0016] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0017] In the following embodiments, a communication I / F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] The system of this invention aims to analyze animal sounds and understand their emotions, and is achieved through the following functions.

[0033] Voice acquisition and transmission by the device

[0034] The user uses a device to record the sounds their pet makes. A dedicated application is installed on the device, and this application is responsible for transmitting the animal's sounds to a server over the internet. The device compresses the audio data and transmits it securely, preventing data loss during transmission.
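The device-side behavior described above (compress, transmit, detect loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `zlib` stands in for whatever audio codec the application actually uses, the SHA-256 checksum is one possible way to detect loss or corruption, and the function names `package_audio`, `unpack_audio`, and `send_with_retry` are hypothetical. Transport encryption is assumed to be handled by the network layer and is not shown.

```python
import hashlib
import zlib
from typing import Callable


def package_audio(raw_audio: bytes) -> dict:
    """Compress the recording and attach a checksum of the original bytes."""
    return {
        "payload": zlib.compress(raw_audio, 6),  # lossless stand-in for a codec
        "sha256": hashlib.sha256(raw_audio).hexdigest(),
    }


def unpack_audio(package: dict) -> bytes:
    """Server side: decompress and verify that nothing was lost in transit."""
    raw_audio = zlib.decompress(package["payload"])
    if hashlib.sha256(raw_audio).hexdigest() != package["sha256"]:
        raise ValueError("checksum mismatch: audio was corrupted in transit")
    return raw_audio


def send_with_retry(
    package: dict, send: Callable[[dict], bool], max_attempts: int = 3
) -> int:
    """Call `send` until it acknowledges success; return the attempt count.

    `send` is a hypothetical transport callback that returns True when the
    server confirms receipt.
    """
    for attempt in range(1, max_attempts + 1):
        if send(package):
            return attempt
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")
```

Passing the transport in as a callback keeps the sketch independent of any particular network library.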

[0035] Server-based voice analysis and emotion prediction

[0036] The server analyzes the received audio data using speech recognition technology and then runs a machine learning model based on that analysis. The machine learning model has been pre-trained on a large amount of animal vocal data and has the ability to predict animal emotions from the characteristics of the vocals. After the model predicts an emotion, it compares the result with past cases in the database using a matching mechanism to improve accuracy.

[0037] Advice and product suggestions

[0038] Based on the predicted emotions and the comparison with past data, the server derives advice and recommendations, which are then communicated to the user via the device. As a specific application, consider a case where a dog barks repeatedly. The server analyzes the voice and, if it predicts that the dog is feeling anxious, provides advice to the user via the device, such as "pet the dog to reassure it" or "give it a treat." It also suggests specific treats or toys, offering concrete ways to soothe the dog.
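One simple way to picture the advice and product suggestion step is a lookup keyed by the predicted emotion. The sketch below is an assumption for illustration only: the table structure, the `Recommendation` type, the fallback entry, and the product placeholders are hypothetical; only the two advice strings for an anxious dog come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    advice: list[str]
    products: list[str] = field(default_factory=list)


# Hypothetical lookup table. The advice for "anxious" is taken from the
# dog example in the description; product names are placeholders.
ADVICE_TABLE = {
    "anxious": Recommendation(
        advice=["pet the dog to reassure it", "give it a treat"],
        products=["treat (placeholder)", "toy (placeholder)"],
    ),
}

# Hypothetical fallback for emotions that have no entry yet.
DEFAULT = Recommendation(advice=["no advice available for this emotion"])


def recommend(emotion: str) -> Recommendation:
    """Return advice and product suggestions for a predicted emotion."""
    return ADVICE_TABLE.get(emotion.lower(), DEFAULT)
```

For instance, `recommend("anxious").advice` returns the two actions quoted in the dog example.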

[0039] This system helps users understand their animals' emotions and respond appropriately, thereby supporting a better relationship with their pets.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The user launches a dedicated application on their device and uses the recording function to record their pet's barking. Once recording is complete, pressing the send button on the application prepares the recorded audio data for transmission to the server.

[0043] Step 2:

[0044] The device temporarily stores the recorded audio data, compresses and encrypts it, and then sends the audio data to the server via the internet. During transmission, it checks network stability and monitors the data until it receives a response indicating successful transmission.

[0045] Step 3:

[0046] The server processes the audio data received from the terminal using a speech recognition library, converting the audio data into text and extracting necessary audio features. This prepares the key features of animal voices for analysis.

[0047] Step 4:

[0048] The server uses a machine learning model to analyze voice feature data and predict animal emotions. The model is based on a large amount of training data and uses a proprietary algorithm to make accurate emotion predictions.

[0049] Step 5:

[0050] The server retrieves the predicted emotion data and compares it with past cases in its internal database. This allows it to verify results based on similar cases and improve the accuracy of emotion determination.

[0051] Step 6:

[0052] The server generates advice for the user based on the matching results and predicted emotions. This advice includes specific recommended actions tailored to the pet's condition. If necessary, a list of suggested pet supplies is also generated.

[0053] Step 7:

[0054] The server sends generated advice and suggestions to the terminal, and the terminal displays the received information to the user within the application. Because visual information and notifications are provided instantly through the user interface, users can respond in a timely manner.

[0055] (Example 1)

[0056] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0057] Understanding an animal's emotions from its vocalizations and providing appropriate responses is essential for building a good relationship with pets. However, technologies for analyzing animal vocalizations and predicting their emotions are still limited, making it difficult to accurately grasp emotions and provide appropriate advice. Furthermore, there is a need for efficient systems to safely and accurately process animal vocal data.

[0058] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0059] In this invention, the server includes an input means for acquiring and digitizing animal sounds, a transmission means for compressing the acquired animal sounds and securely transmitting them to an information processing device, and an analysis means for analyzing the digitized sounds using speech recognition technology and extracting feature quantities. This makes it possible to accurately understand the emotions of animals and provide users with appropriate advice and supplies.

[0060] An "input means" is a device for acquiring and digitizing sounds emitted by animals.

[0061] A "transmission means" is a device that has the function of compressing the acquired animal voice data and securely transmitting it to an information processing device.

[0062] An "analysis means" is a device that has the function of analyzing acquired audio data using speech recognition technology and extracting characteristic features of the audio.

[0063] A "prediction means" is a device that uses extracted audio features to run a machine learning model and predict the emotions of animals.

[0064] A "matching means" is a device that has the function of comparing predicted emotions with past examples in a database to improve the accuracy of predictions.

[0065] A "presentation means" is a device that provides users with guiding information about animal emotions based on the matching results.

[0066] A "recommendation means" is a device that has the function of suggesting items related to guidance information to the user.

[0067] This invention is a system for analyzing animal vocalizations and predicting their emotions. Specific embodiments of this system are described below.

[0068] The device has a recording device and a dedicated application installed to capture animal sounds. This application has the function of compressing the recorded sound and sending it to a server over the internet. At this stage, the digitized audio data is compressed using audio compression formats such as MP3 or AAC and transmitted securely.

[0069] The server receives audio data transmitted from the terminal and analyzes it using speech recognition technology. The server extracts phonemes and performs spectral analysis to clarify the features of the speech. Once the features are extracted, the server inputs them into a machine learning model to predict the animal's emotions. This model is pre-trained using a large amount of animal speech data and exhibits high accuracy in emotion prediction.
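The spectral analysis mentioned above can be illustrated with a few basic audio features. The sketch below is an assumption about what such features might look like, not the features the system actually uses: it computes RMS energy, zero-crossing rate, and the dominant frequency via a naive discrete Fourier transform, using only the standard library. Phoneme extraction is not covered, and all function names are hypothetical.

```python
import cmath
import math


def rms_energy(samples: list[float]) -> float:
    """Root-mean-square amplitude, a rough loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: list[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def dominant_frequency(samples: list[float], sample_rate: int) -> float:
    """Frequency (Hz) of the strongest DFT bin, ignoring the DC bin.

    Naive O(n^2) DFT: fine for short clips in a sketch, too slow for
    production use, where an FFT library would be the usual choice.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def extract_features(samples: list[float], sample_rate: int) -> dict:
    """Bundle the features into the vector passed to the emotion model."""
    return {
        "rms": rms_energy(samples),
        "zcr": zero_crossing_rate(samples),
        "dominant_hz": dominant_frequency(samples, sample_rate),
    }
```

A high-pitched whine and a low growl would differ mainly in `dominant_hz`, while barking intensity would show up in `rms`.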

[0070] Furthermore, the server compares the predicted emotion with multiple databases of past cases. This further improves the accuracy of the prediction and generates more reliable results. The server then generates advice and recommendations based on the comparison results and notifies the user through the terminal. The terminal presents the information to the user visually through the application.
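The description does not specify how the comparison with past cases refines the prediction. One plausible reading, shown here purely as an assumption, is a k-nearest-neighbor check: the server finds the stored cases whose features are closest to the new recording and overrides the model only when those neighbors unanimously point to a different emotion. The function name, the case format, and the override rule are all hypothetical.

```python
import math
from collections import Counter

# A past case is a (feature_vector, emotion_label) pair.
Case = tuple[tuple[float, ...], str]


def match_against_cases(
    features: tuple[float, ...],
    predicted: str,
    past_cases: list[Case],
    k: int = 3,
) -> tuple[str, float]:
    """Return (emotion, agreement) after checking the k nearest past cases.

    `agreement` is the share of the nearest cases that carry the returned
    label. The model's prediction is kept unless every neighbor agrees on
    a different label (an assumed rule, not one stated in the source).
    """
    if not past_cases:
        return predicted, 0.0
    nearest = sorted(
        past_cases, key=lambda case: math.dist(features, case[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    top_label, top_votes = votes.most_common(1)[0]
    if top_label != predicted and top_votes == len(nearest):
        return top_label, 1.0
    return predicted, votes[predicted] / len(nearest)
```

The agreement score could also be surfaced to the user as a rough confidence indicator alongside the advice.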

[0071] As a concrete example, consider a case where a user records their dog's voice and sends it to the server. In this case, the server predicts the dog's emotion as "excited" and provides advice to the user, such as "create a relaxing environment" or "use training toys."

[0072] An example of a prompt to be input to the generative AI model is: "Predict the emotion of the dog from the recorded audio: anxiety, excitement, or hunger. Based on the emotion prediction, suggest an appropriate course of action." This prompt allows the server to generate useful information for the user and help them derive appropriate actions.

[0073] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0074] Step 1:

[0075] The user records animal sounds using a device. The input is the sounds the animal makes, and the output is a digitized audio file. Audio acquisition begins by activating the recording function on the device and pressing the record button. This data is saved within a dedicated application.

[0076] Step 2:

[0077] The device compresses the recorded audio data and prepares it for transmission. The input is a digitized audio file, and the output is compressed audio data. A dedicated application within the device compresses the audio file into MP3 or AAC format and prepares it for transmission using a secure transmission protocol. This process reduces the amount of data and improves transmission efficiency.

[0078] Step 3:

[0079] The terminal sends compressed audio data to the server. The input is the compressed audio data, and the output is a confirmation that the data transfer to the server is complete. The HTTPS protocol is used for transmission to ensure data encryption and security, thereby reducing the risk of data loss.

[0080] Step 4:

[0081] The server analyzes the received audio data. The input is compressed audio data, and the output is audio features. The server uses speech recognition technology to perform phoneme extraction and spectral analysis to extract audio features. These features are then used as input for the next step.

[0082] Step 5:

[0083] The server inputs features into a machine learning model to predict animal emotions. The input is audio features, and the output is predicted data about the animal's emotions. A pre-trained generative AI model is run to predict emotions based on the audio features.

[0084] Step 6:

[0085] The server compares predicted emotions against a historical database. The input is predicted emotion data, and the output is an improved emotion prediction result. By comparing emotion data with examples in the database, it improves accuracy and generates reliable results.

[0086] Step 7:

[0087] The server generates advice for the user based on the improved prediction results. The input is the improved emotion prediction result, and the output is advice data. The advice and recommendations generated by the server are compiled and provided to the user in the next step.

[0088] Step 8:

[0089] The device receives advice from the server and presents it visually to the user. The input is advice data sent from the server, and the output is visual advice displayed on the screen. Through the device's application, it provides users with specific actionable guidelines to support their relationship with their pets.
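The eight steps above form a chain in which each step's output is the next step's input. A minimal sketch of that chaining is given below. Everything in it is a placeholder assumption: `zlib` stands in for MP3/AAC compression, the "send" stage stands in for the HTTPS upload, the prediction rule is a dummy threshold rather than a trained model, and the matching stage passes the prediction through unchanged. The advice strings for "excited" are taken from the dog example in the description.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Callable

ADVICE = {"excited": ["create a relaxing environment", "use training toys"]}


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]


def build_demo_stages() -> list[Stage]:
    """Placeholder stages for Steps 2 to 8 (Step 1 is the recording itself)."""
    return [
        Stage("compress", lambda pcm: zlib.compress(pcm)),
        Stage("send", lambda blob: blob),  # stands in for the HTTPS upload
        Stage("analyze", lambda blob: {"n_samples": len(zlib.decompress(blob)) // 2}),
        # Dummy rule standing in for the trained model.
        Stage("predict", lambda f: "excited" if f["n_samples"] > 500 else "calm"),
        Stage("match", lambda emotion: emotion),  # stands in for the database check
        Stage("advise", lambda e: {"emotion": e, "advice": ADVICE.get(e, [])}),
        Stage("present", lambda d: f"{d['emotion']}: " + "; ".join(d["advice"])),
    ]


def run_pipeline(stages: list[Stage], recording: bytes) -> tuple[Any, list[str]]:
    """Feed each stage's output into the next; return the result and a trace."""
    data: Any = recording
    trace = []
    for stage in stages:
        data = stage.run(data)
        trace.append(stage.name)
    return data, trace
```

Keeping each step as a separate named stage mirrors the input/output description given for each step and makes it easy to swap a placeholder for a real component.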

[0090] (Application Example 1)

[0091] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0092] In households with pets, understanding and appropriately addressing the emotions of animals is essential, but there are insufficient effective means to do so easily. Conventional technologies suffer from the problem of being time-consuming and laborious in accurately analyzing pet vocalizations, identifying emotions, and providing advice to users. Furthermore, it is difficult for users to select the appropriate products at the right time. A new system is needed to efficiently solve these challenges.

[0093] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0094] In this invention, the server includes receiving means for acquiring animal sounds, analysis means for analyzing the animal sounds and predicting emotions, and comparison means for comparing the predicted emotions with past case data. This allows users to instantly understand their pet's emotions and use that information to help them respond appropriately and choose the right supplies.

[0095] "Animal sounds" refers to the various types of sounds and noises that animals make, and these sounds may contain emotions or states of mind.

[0096] "Receiving means" refers to devices or functions that effectively acquire animal sounds and appropriately incorporate that data into the system.

[0097] "Analysis means" refers to devices or methods that perform detailed analysis of acquired animal sounds and process the results to predict the animal's emotions.

[0098] A "machine learning model" refers to a computational model that learns from large amounts of data and can make predictions and classifications about unknown data.

[0099] "Comparison means" refers to the process of comparing predicted emotions with a database of previously accumulated data to verify their accuracy and reliability.

[0100] "Notification means" refers to the means of providing users with necessary information and advice based on the results of analysis and comparison.

[0101] "Suggestion mechanisms" refer to systems that specifically recommend the necessary items or actions that users need when taking action.

[0102] "Display means" refers to devices or functions that visually present information on a user interface, making it easy for users to understand.

[0103] An "artificial intelligence model" refers to a computational model that performs processing that mimics human intelligence, such as recognizing, classifying, and predicting data patterns.

[0104] This system analyzes pet sounds and predicts their emotions. First, the user captures animal sounds using a device such as a smartphone or consumer robot. The device has a built-in dedicated receiving mechanism that can effectively capture animal sounds. The audio data is then transmitted to a server via the internet.

[0105] The server processes the received audio data using analysis tools and predicts the animal's emotions using a machine learning model. Specifically, the server uses an AI platform such as TensorFlow (registered trademark) to input data into a trained machine learning model for recognition and analysis. This enables the server to predict the animal's emotions with high accuracy.
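The inference step, in which features go into a trained model and an emotion label comes out, can be illustrated without any framework. The sketch below is a stand-in, not TensorFlow code and not the system's actual model: it assumes a simple linear classifier followed by a softmax, and the emotion labels, weights, and biases are hypothetical values chosen for illustration.

```python
import math

# Hypothetical label set, loosely following the emotions named in the
# example prompt (anxiety, excitement, hunger).
EMOTIONS = ["anxious", "excited", "hungry"]


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(
    features: list[float],
    weights: list[list[float]],
    biases: list[float],
) -> tuple[str, float]:
    """Return the most likely emotion and its probability.

    `weights` holds one row per emotion; in a real deployment these
    parameters would come from training, not be written by hand.
    """
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]
```

The returned probability can feed into the later comparison with past cases, for example by checking low-confidence predictions more strictly.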

[0106] Subsequently, the server compares the predicted emotions with a large amount of past case data to verify the accuracy of the results. Based on the results obtained from the comparison, a dedicated notification system is activated to inform the user of necessary information and advice. The notifications are displayed on the user's smartphone or the user interface of a consumer robot in a visually easy-to-understand format.

[0107] Furthermore, the server suggests products to the user that are appropriate for the predicted condition of the animal. The suggestion system displays automatically selected recommended products, making it easier for the user to respond to the animal's condition and choose the right products.

[0108] For example, if a dog barks frequently, and analysis predicts the dog is anxious, the user might be advised to "pet the dog to calm it down" or "give it a specific treat." Another example of a prompt to the generative AI model could be, "Analyze the dog's barking and suggest specific countermeasures if the cause is stress."

[0109] Thus, this system is a useful tool for understanding an animal's emotions through its vocalizations and supporting appropriate responses.

[0110] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0111] Step 1:

[0112] Audio acquisition by the device

[0113] The user records animal sounds using a device. The device is equipped with a receiving mechanism that acquires sound using a dedicated microphone, and converts the acquired sound data into a digital format. This data is stored for subsequent processing. The input is analog animal sounds, and the output is digitized sound data.

[0114] Step 2:

[0115] Sending audio data

[0116] The device transmits recorded audio data to the server via the internet. The audio data is compressed while maintaining sound quality and securely transferred to the server. The input is digitized audio data, which is compressed during transmission. The output is a confirmation of the transmission of the compressed audio data.

[0117] Step 3:

[0118] Analysis of audio data

[0119] The server processes the received audio data using an analysis tool. Here, speech recognition and emotion prediction are performed by using TensorFlow, an AI platform, to run a machine learning model. The input is compressed audio data, and the output is the predicted emotion of an animal.

[0120] Step 4:

[0121] Comparison with past cases

[0122] The server compares the predicted animal's emotion with past case data. This comparison method improves the accuracy and reliability of the prediction. The input is the predicted emotion data, and the output is the improved emotion prediction result.

[0123] Step 5:

[0124] Advice notification

[0125] The server provides the user with useful advice based on the comparison results. This information is displayed on the terminal's user interface via a notification system. The input is an improved emotion prediction result, and the output is specific advice for the user.

[0126] Step 6:

[0127] Display of proposals

[0128] The server suggests appropriate items based on predicted emotions. These suggestions are visually presented to the user using a suggestion mechanism. The input is an improved emotion prediction result, and the output is item suggestion information for the user.

[0129] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0130] This invention provides a system that offers more accurate advice by performing both animal emotion analysis and user emotion recognition. This system is implemented using the following components:

[0131] Analysis of animal sounds and prediction of emotions

[0132] The user records animal sounds using a device. The device sends the recording data to a server, which uses speech recognition technology to transcribe the speech into text. This information is analyzed by an analytical tool to predict the animal's emotions. A machine learning model supports this process, accurately determining the animal's state.

[0133] User emotion recognition

[0134] The device is equipped with an emotion engine that analyzes the user's emotions in real time. The emotion engine analyzes the user's voice and facial expression data to recognize their current emotional state. The server receives this information and determines the relationship between the user's emotions and the emotions of the animals.

[0135] Customized advice and suggestions

[0136] The server generates advice for the user based on both sets of emotion data it has acquired. This advice is specific and situational: the server suggests more careful responses if the user's emotion is negative, and provides additional information or product suggestions if it is positive. Furthermore, by comparing with past data, the server draws on successful cases in similar situations.
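The branching described above can be sketched as follows. The grouping of emotions into "negative", the `tone` field, and the rule of offering only the first action to a user in a negative state are assumptions introduced for illustration. The product name in the usage example is hypothetical.

```python
# Assumed grouping; the disclosure does not enumerate negative emotions.
NEGATIVE_USER_EMOTIONS = {"anxiety", "sadness", "anger"}


def compose_advice(animal_emotion, user_emotion, base_advice, products):
    """Adapt advice for the animal's emotion to the user's own emotional state."""
    advice = {
        "animal_emotion": animal_emotion,
        "actions": list(base_advice),
        "products": [],
    }
    if user_emotion in NEGATIVE_USER_EMOTIONS:
        # More careful response: one gentle action, no product suggestions.
        advice["tone"] = "gentle"
        advice["actions"] = advice["actions"][:1]
    else:
        # Positive or neutral state: add further information and products.
        advice["tone"] = "informative"
        advice["products"] = list(products)
    return advice
```

For example, `compose_advice("loneliness", "anxiety", ["spend more time with your pet"], ["calming bed"])` yields a single gentle action and no product suggestions.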

[0137] Specific example

[0138] For example, suppose a user records their pet's barking while feeling anxious, and the server analyzes the pet's voice as indicating "loneliness." In this case, the emotion engine provides specific suggestions to alleviate the user's anxiety, such as "spend more time with your pet" or "use pet-specific relaxation products." This reduces the psychological burden on the user when taking action and provides support for building a better relationship.

[0139] This system allows users to not only engage in rich, two-way communication with animals, but also to select the optimal response that takes their own emotional state into consideration.

[0140] The following describes the processing flow.

[0141] Step 1:

[0142] The user launches the application on their device and records animal sounds. Once recording is complete, they press the send button to prepare the audio data for transmission to the server.

[0143] Step 2:

[0144] The device compresses the recorded audio data and encrypts it for secure transmission. It then sends the audio data to the server via the internet. During this process, the device monitors the transmission progress and confirms that the transmission was successful.

[0145] Step 3:

[0146] The server analyzes the audio data received from the terminal. Using speech recognition technology, the data is converted into text, and a machine learning model is applied to predict the animal's emotions. This model performs highly accurate emotion estimation based on past training data.

[0147] Step 4:

[0148] Simultaneously, the device activates an emotion engine to recognize the user's emotions. It collects the user's voice and facial expression data and analyzes their emotional state. This data is transmitted to the server in real time.

[0149] Step 5:

[0150] The server integrates emotional data from both the animal and the user, generating advice that takes into account their emotional states. It also compares this data with past cases in the database, drawing on responses that were effective in similar situations. This process helps to create the optimal action plan for the user.

[0151] Step 6:

[0152] The server sends the generated advice to the terminal. The advice includes suggested pet supplies and action plans, as needed.

[0153] Step 7:

[0154] The device displays received advice and suggestions in its user interface. The user interface presents the advice visually in an easy-to-understand manner, helping the user take the necessary actions immediately.

[0155] Step 8:

[0156] Users begin taking action based on the advice provided and, if necessary, use the suggested products to care for their pets. Through this process, users can deepen their communication with animals and improve the emotional state of both parties.

[0157] (Example 2)

[0158] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0159] There is a lack of systems that can properly understand animal emotions and provide appropriate advice to users. Furthermore, there is no means to provide specific suggestions for properly caring for animals while considering the user's own emotional state. This makes it difficult for users to comprehensively understand their own and their pet's emotions and take appropriate action.

[0160] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0161] In this invention, the server includes a conversion means that converts animal sounds into text data using speech recognition technology, an analysis means that predicts the animal's emotions based on the converted data, and a recognition means that acquires and analyzes the user's emotional data in real time. This enables a comprehensive understanding of the emotions of both the animal and the user, and allows for the provision of accurate advice to the user.

[0162] "Receiving means" refers to a server or terminal function that directly receives animal sounds as input and processes them.

[0163] "Conversion means" refers to a part of a system that converts received audio information into text data using speech recognition technology.

[0164] "Analysis means" refers to a component that has the function of predicting animal emotions using a machine learning model based on text data.

[0165] "Recognition means" refers to components of a system that acquires and analyzes the user's voice and facial expression data in real time to understand their emotional state.

[0166] The "relevance determination means" is an analytical device that integrates animal and user emotional data and confirms the relationships between them.

[0167] A "comparison device" is a functional device that compares past case information with current emotion data to derive useful information.

[0168] A "guidance means" is a display device that includes an interface for providing integrated advice to the user.

[0169] "Supply means" refers to components of a system that proposes relevant tools and products to users based on advice.

[0170] This invention is a system that analyzes animal sounds and user emotions to provide comprehensive advice. The system mainly consists of a server and terminals.

[0171] The user records the sounds of animals, such as pets, using the microphone on their device. The device then sends this audio data to a server via the internet. Typically, a smartphone or tablet is used as the device for this purpose.

[0172] The server uses speech recognition technology to convert the audio data into text. Open-source speech recognition systems are used here. The converted text data is then input into a machine learning model to analyze the animal's emotions. This machine learning model, built using tools such as TensorFlow, is capable of predicting the animal's emotions with high accuracy.

[0173] Meanwhile, user emotions are captured as data through the device's camera and microphone and analyzed by an emotion engine. This emotion analysis utilizes computer vision libraries and speech analysis libraries to determine the user's emotional state in real time from the user's voice and facial expression data. This information is also sent to a server and integrated with the animal's emotion data.

[0174] The server analyzes this data using the relevance determination means, compares it with past case information, and generates integrated advice. A generative AI model is used to generate this advice; for example, a model from OpenAI® may be used. When a prompt such as "The animal is feeling anxious, and the user is also anxious" is input into this AI model, the model generates specific advice tailored to the situation.
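A prompt of the kind quoted above could be assembled from the two emotion labels and any matching past cases. The sketch below only builds the prompt text and does not call any model API. The function name `build_advice_prompt` and the exact wording of the prompt are assumptions.

```python
def build_advice_prompt(animal_emotion, user_emotion, past_cases=()):
    """Build a text prompt describing both emotional states for a generative model."""
    lines = [
        f"The animal is feeling {animal_emotion}, "
        f"and the user is feeling {user_emotion}."
    ]
    if past_cases:
        lines.append("Responses that worked in similar past cases:")
        lines.extend(f"- {case}" for case in past_cases)
    lines.append("Suggest specific advice tailored to this situation.")
    return "\n".join(lines)


prompt = build_advice_prompt(
    "anxious", "anxious", ["spend more time with your pet"]
)
```

Keeping prompt construction separate from the model call makes it straightforward to log or review what was sent to the model.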

[0175] The generated advice is presented to the user through the device. Based on this advice, the user can improve their relationship with their pet and take appropriate action considering their own emotional state. For example, specific suggestions such as "spend more time with your pet" or "use relaxation products" can help reduce the user's anxiety.

[0176] In this way, this system comprehensively understands the emotions of both the user and the animal, supporting better communication for both parties.

[0177] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0178] Step 1:

[0179] The user records animal sounds using the device's microphone. This recording becomes the input. Specifically, the user presses the record button and records for a set period of time. Once the recording is finished, it is saved to the device as audio data.

[0180] Step 2:

[0181] The terminal sends recorded audio data to the server. An internet connection is used for transmission. The input is the pre-recorded audio data, and the output is a network message sent to the server. Specifically, data conversion and encryption are performed, and the data is sent to the server using the HTTP protocol.

[0182] Step 3:

[0183] The server converts the received audio data into text using speech recognition technology. The input here is the audio data received by the server. The output is the converted text data. Specifically, it uses a speech recognition library to analyze the audio waveform and convert it into text in the corresponding language.

[0184] Step 4:

[0185] The server predicts the animal's emotions based on the converted text data. The input is text data obtained from speech, and the output is predicted animal emotion information. Specifically, a machine learning model analyzes the text and generates confidence scores for a set of emotion categories.
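Confidence scores over a set of emotion categories are commonly obtained by applying a softmax to the raw outputs of a model. The sketch below shows only that final step. The raw scores and category names are illustrative, and the disclosure does not state that softmax is the method used.

```python
import math


def confidence_scores(raw_scores):
    """Convert raw per-category scores into confidences that sum to 1 (softmax)."""
    peak = max(raw_scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in raw_scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


scores = confidence_scores({"anxiety": 2.0, "excitement": 1.0, "hunger": 0.0})
top_emotion = max(scores, key=scores.get)
```

The category with the highest confidence can be reported as the predicted emotion, while the full distribution remains available to later steps.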

[0186] Step 5:

[0187] The device analyzes the user's voice and facial expression data in real time using an emotion engine to recognize the user's emotions. The input is the user's video and audio data acquired from the camera and microphone, and the output is the user's emotional information. Specifically, image processing and audio analysis are performed to extract emotional features.

[0188] Step 6:

[0189] The server integrates animal emotion data and user emotion data. The input is animal emotion information and user emotion information, and the output is integrated data showing their relationships. Specifically, a relationship analysis algorithm is applied to calculate the strength of the relationship, taking into account the emotional states of both entities.
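The relationship analysis algorithm is not specified in the disclosure. As one hypothetical choice, the strength of the relationship can be measured as the cosine similarity between the animal's and the user's emotion score vectors, assuming both are expressed over a shared set of categories.

```python
import math


def relationship_strength(animal_scores, user_scores):
    """Cosine similarity between two emotion score dictionaries (0.0 to 1.0).

    Assumes non-negative scores over a shared set of emotion categories;
    a category missing from one side is treated as 0.
    """
    labels = sorted(set(animal_scores) | set(user_scores))
    a = [animal_scores.get(label, 0.0) for label in labels]
    u = [user_scores.get(label, 0.0) for label in labels]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in u))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, u)) / norm
```

A value near 1 suggests that the animal and the user are in similar emotional states, and a value near 0 suggests that their states are unrelated.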

[0190] Step 7:

[0191] The server generates advice for the user based on integrated data using a generative AI model. The input is integrated data, and the output is the generated advice. Specifically, it provides the AI model with prompts to generate candidate advice, and then selects the best one.
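Generating several candidates and selecting the best one can be sketched as below. The `generate` argument is a hypothetical callable standing in for the generative AI model, and the selection rule (prefer candidates that mention both emotions, then shorter text) is an assumption, since the disclosure does not define what "best" means.

```python
def select_best_advice(candidates, animal_emotion, user_emotion):
    """Pick the candidate mentioning the most emotions; shorter text wins ties."""
    def score(text):
        lowered = text.lower()
        mentions = int(animal_emotion.lower() in lowered) + int(
            user_emotion.lower() in lowered
        )
        return (mentions, -len(text))

    return max(candidates, key=score)


def generate_advice(prompt, generate, animal_emotion, user_emotion, n_candidates=3):
    """Request several candidate texts from the model and return the best one."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return select_best_advice(candidates, animal_emotion, user_emotion)
```

Passing the model call in as a function keeps the selection logic independent of any particular model provider.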

[0192] Step 8:

[0193] The terminal receives the advice generated by the server and presents it to the user. The input is the advice received from the server, and the output is what is displayed to the user. Specifically, the advice is displayed in the user interface, and the user can take action based on it.

[0194] (Application Example 2)

[0195] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0196] While systems exist that detect animal emotions and provide relevant advice, they have historically offered one-sided suggestions without simultaneously considering the user's emotions, thus failing to provide appropriate support tailored to the user's psychological state. Therefore, there is a need for technology that more accurately supports two-way communication between animals and users.

[0197] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0198] In this invention, the server includes a receiving means for receiving animal sounds as input, an analysis means for analyzing animal sounds and predicting emotions, and a recognition means for recognizing the user's emotions in real time. This makes it possible to simultaneously analyze the emotions of both the animal and the user and provide specific and situational advice tailored to the state of both.

[0199] "Receiving means" refers to devices or methods for receiving animal sounds as input.

[0200] "Analysis means" refers to devices or methods used to analyze animal sounds and predict their emotions.

[0201] "Recognition means" refers to devices or methods for recognizing a user's emotions in real time.

[0202] A "comparison tool" is a device or method for matching a set of past case data with predicted emotions.

[0203] "Presentation means" refers to devices or methods for providing advice to a user.

[0204] "Selection means" refers to devices or methods for suggesting products related to advice.

[0205] The system that realizes this invention collects and analyzes animal sounds and grasps the user's emotions in real time, thereby providing optimal advice based on the emotional states of both parties. The system mainly includes a terminal for receiving animal sounds, a server for analysis and recognition, and a terminal for providing information to the user.

[0206] The server uses libraries such as Python's sounddevice to record audio data for analysis and predicts emotions using machine learning models. Specific emotion analysis results are returned as predictions, such as "lonely." Additionally, user emotions are collected by the device via the camera and microphone and recognized by the emotion engine.

[0207] Based on this data, the server utilizes a generative AI model to provide optimal advice to the user. The user's device displays this advice through an interface, making it easy for the user to understand. This helps users build better relationships with animals.

[0208] For example, when the pet robot is deemed lonely, the user's device will receive advice such as, "Let's spend some time enjoying some relaxation items." This provides the user with emotional support and enriches their communication with their pet.

[0209] An example of a prompt message is, "If you record the sounds of a pet robot and perform voice emotion analysis, what emotions will be detected?"

[0210] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0211] Step 1:

[0212] The device records animal sounds through a microphone. The recorded audio is first digitized into a format suitable for subsequent analysis and is then sent to the server.

[0213] Step 2:

[0214] The server uses speech recognition technology to convert the received audio data into text data. It analyzes the waveform information of the audio and outputs it as text data. This text data is then used as input data for emotion prediction.

[0215] Step 3:

[0216] The server inputs the converted text data into a machine learning model to predict the animal's emotions. Based on past training data, this model determines the relationship between voice patterns and emotions and outputs emotion labels such as "lonely" or "happy."

[0217] Step 4:

[0218] The device uses the user's voice and camera images to recognize the user's emotions in real time. The data is analyzed using a voice analysis engine and facial recognition software, and the resulting emotional state, such as "anxiety" or "joy," is sent to the server.

[0219] Step 5:

[0220] The server combines animal emotion data and user emotion data, and uses a generative AI model to generate optimal advice for the user. Based on the input emotion data, it constructs an advice sentence and sends it back to the terminal in text format.

[0221] Step 6:

[0222] The device displays the received advice through a user interface. Users can visually review the advice and obtain specific guidance for optimal communication with animals.

[0223] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0224] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0225] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0226] [Second Embodiment]

[0227] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0228] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0229] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0230] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0231] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0232] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0233] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0234] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0235] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0236] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0237] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0238] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0239] The system of this invention aims to analyze animal sounds and understand their emotions, and is achieved through the following functions.

[0240] Voice acquisition and transmission by the device

[0241] The user uses a device to record the sounds their pet makes. A dedicated application is installed on the device, and this application is responsible for transmitting the animal's sounds to a server over the internet. The device compresses the audio data and transmits it securely, preventing data loss during transmission.

[0242] Server-based voice analysis and emotion prediction

[0243] The server analyzes the received audio data using speech recognition technology and then runs a machine learning model based on that analysis. The machine learning model has been pre-trained on a large amount of animal vocal data and has the ability to predict animal emotions from the characteristics of the vocals. After the model predicts an emotion, it compares the result with past cases in the database using a matching mechanism to improve accuracy.

[0244] Advice and product suggestions

[0245] Based on predicted emotions and comparisons with past data, the server derives advice and recommendations, which are then communicated to the user via the device. As a specific application, consider a case where a dog barks repeatedly. The server analyzes the voice and, if it predicts the dog is feeling anxious, provides advice to the user via the device, such as "pet the dog to reassure it" or "give it a treat." It also suggests specific treats or toys, offering concrete ways to soothe the dog.

[0246] This system helps users understand their animals' emotions and respond appropriately, thereby supporting a better relationship with their pets.

[0247] The following describes the processing flow.

[0248] Step 1:

[0249] The user launches a dedicated application on their device and uses the recording function to record their pet's barking. Once recording is complete, pressing the send button on the application prepares the recorded audio data for transmission to the server.

[0250] Step 2:

[0251] The device temporarily stores the recorded audio data, compresses and encrypts it, and then sends the audio data to the server via the internet. During transmission, it checks network stability and monitors the data until it receives a response indicating successful transmission.
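Monitoring a transmission until a success response arrives is often implemented as a bounded retry loop. The sketch below assumes a hypothetical `send` callable that returns `True` on success. The retry count and the exponential backoff delays are illustrative choices, not values from the disclosure.

```python
import time


def send_with_retry(send, payload, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(payload) until it succeeds; return the attempt number that worked.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Raises ConnectionError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(payload):
                return attempt
        except OSError:
            pass  # treat network errors like an unsuccessful response
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise ConnectionError(f"transmission failed after {max_attempts} attempts")
```

The `sleep` parameter is injectable so that the loop can be exercised without real delays.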

[0252] Step 3:

[0253] The server processes the audio data received from the terminal using a speech recognition library, converting the audio data into text and extracting necessary audio features. This prepares the key features of animal voices for analysis.

[0254] Step 4:

[0255] The server uses a machine learning model to analyze voice feature data and predict animal emotions. The model is based on a large amount of training data and uses a proprietary algorithm to make accurate emotion predictions.

[0256] Step 5:

[0257] The server retrieves the predicted emotion data and compares it with past cases in its internal database. This allows it to verify results based on similar cases and improve the accuracy of emotion determination.

[0258] Step 6:

[0259] The server generates advice for the user based on the matching results and predicted emotions. This advice includes specific recommended actions tailored to the pet's condition. If necessary, a list of suggested pet supplies is also generated.

[0260] Step 7:

[0261] The server sends generated advice and suggestions to the terminal, and the terminal displays the received information to the user within the application. Because visual information and notifications are provided instantly through the user interface, users can respond in a timely manner.

[0262] (Example 1)

[0263] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0264] Understanding an animal's emotions from its vocalizations and providing appropriate responses is essential for building a good relationship with pets. However, technologies for analyzing animal vocalizations and predicting their emotions are still limited, making it difficult to accurately grasp emotions and provide appropriate advice. Furthermore, there is a need for efficient systems to safely and accurately process animal vocal data.

[0265] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0266] In this invention, the server includes an input means for acquiring and digitizing animal sounds, a transmission means for compressing the acquired animal sounds and securely transmitting them to an information processing device, and an analysis means for analyzing the digitized sounds using speech recognition technology and extracting feature quantities. This makes it possible to accurately understand the emotions of animals and provide users with appropriate advice and supplies.

[0267] An "input device" is a device for acquiring and digitizing sounds emitted by animals.

[0268] A "transmission means" is a device that has the function of compressing the acquired animal voice data and securely transmitting it to an information processing device.

[0269] An "analysis means" is a device that has the function of analyzing acquired audio data using speech recognition technology and extracting characteristic features of the audio.

[0270] A "prediction device" is a device that uses extracted audio features to run a machine learning model and predict the emotions of animals.

[0271] A "matching device" is a device that has the function of comparing predicted emotions with past examples in a database to improve the accuracy of predictions.

[0272] A "presentation means" is a device that provides users with guiding information about animal emotions based on the matching results.

[0273] A "recommendation device" is a device that has the function of suggesting items related to guidance information to the user.

[0274] This invention is a system for analyzing animal vocalizations and predicting their emotions. Specific embodiments of this system are described below.

[0275] The device has a recording device and a dedicated application installed to capture animal sounds. This application compresses the recorded sound and sends it to a server over the internet. At this stage, the digitized audio data is compressed using an audio compression format such as MP3 or AAC and transmitted securely.

[0276] The server receives audio data transmitted from the terminal and analyzes it using speech recognition technology. The server extracts phonemes and performs spectral analysis to clarify the features of the speech. Once the features are extracted, the server inputs them into a machine learning model to predict the animal's emotions. This model is pre-trained using a large amount of animal speech data and exhibits high accuracy in emotion prediction.

[0277] Furthermore, the server compares the predicted emotion with multiple databases of past cases. This further improves the accuracy of the prediction and generates more reliable results. The server then generates advice and recommendations based on the comparison results and notifies the user through the terminal. The terminal presents the information to the user visually through the application.

[0278] As a concrete example, consider a case where a user records their dog's voice and sends it to the server. In this case, the server predicts the dog's emotion as "excited" and provides advice to the user, such as "create a relaxing environment" or "use training toys."

[0279] An example of a prompt to be input to the generative AI model is: "Predict the emotion of the dog from the recorded audio: anxiety, excitement, or hunger. Based on the emotion prediction, suggest an appropriate course of action." This prompt allows the server to generate useful information for the user and help them derive appropriate actions.

[0280] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0281] Step 1:

[0282] The user records animal sounds using a device. The input is the sounds the animal makes, and the output is a digitized audio file. Audio acquisition begins by activating the recording function on the device and pressing the record button. This data is saved within a dedicated application.

[0283] Step 2:

[0284] The device compresses the recorded audio data and prepares it for transmission. The input is a digitized audio file, and the output is compressed audio data. A dedicated application within the device compresses the audio file into MP3 or AAC format and prepares it for transmission using a secure transmission protocol. This process reduces the amount of data and improves transmission efficiency.

[0285] Step 3:

[0286] The terminal sends the compressed voice data to the server. The input is the compressed voice data, and the output is a confirmation that the data transfer to the server has completed. The HTTPS protocol is used for transmission so that the data is encrypted in transit. This reduces the risk of the data being leaked or corrupted during transfer.

[0287] Step 4:

[0288] The server analyzes the received voice data. The input is the compressed voice data, and the output is the feature quantity of the voice. The server uses voice recognition technology to perform phoneme extraction and spectral analysis to extract the feature quantity of the voice. This feature quantity is used as the input for the next step.

[0289] Step 5:

[0290] The server inputs the feature quantity into the machine learning model to predict the emotion of the animal. The input is the feature quantity of the voice, and the output is the prediction data regarding the emotion of the animal. The pre-trained generative AI model is operated to predict the emotion based on the features of the voice.

[0291] Step 6:

[0292] The server collates the predicted emotion with the past database. The input is the predicted emotion data, and the output is the emotion prediction result with improved accuracy. The cases and emotion data in the database are compared to improve the accuracy and generate a highly reliable result.

[0293] Step 7:

[0294] Based on the prediction result with improved accuracy, the server generates advice for the user. The input is the emotion prediction result with improved accuracy, and the output is the advice data. The server summarizes the advice and recommendations generated and provides them to the user in the next step.

[0295] Step 8:

[0296] The device receives advice from the server and presents it visually to the user. The input is advice data sent from the server, and the output is visual advice displayed on the screen. Through its application, the device gives the user specific, actionable guidelines that support the user's relationship with the pet.
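
The feature extraction of Step 4 is described only in general terms (spectral analysis of the voice). One possible, simplified form of such a feature quantity is sketched below. The specific features chosen here (RMS energy, zero-crossing rate, dominant frequency) and the name `extract_features` are assumptions made for illustration, and a naive discrete Fourier transform is used in place of whatever analysis the server actually performs.

```python
import cmath
import math
from typing import Dict, List


def extract_features(samples: List[float], sample_rate: int) -> Dict[str, float]:
    """Return simple time- and frequency-domain features for one audio frame."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    # Root-mean-square energy: overall loudness of the frame.
    rms = math.sqrt(sum(s * s for s in samples) / n)

    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)

    # Naive DFT over the positive-frequency bins. This is O(n^2), so it is only
    # suitable for short frames; an FFT library would normally be used instead.
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)

    return {
        "rms": rms,
        "zero_crossing_rate": zcr,
        "dominant_hz": best_bin * sample_rate / n,
    }


# Synthetic 440 Hz tone standing in for a decoded recording (50 ms at 8 kHz).
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / RATE) for i in range(400)]
features = extract_features(tone, RATE)
print(features)
```

The resulting dictionary corresponds to the "feature quantity of the voice" that Step 5 passes to the model; a real system would likely compute such features per frame over the whole recording.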

[0297] (Application Example 1)

[0298] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0299] In households with pets, understanding and appropriately responding to the animals' emotions is essential, but there are few effective means of doing so easily. With conventional technologies, accurately analyzing pet vocalizations, identifying emotions, and providing advice to users is time-consuming and laborious. Furthermore, it is difficult for users to select appropriate products at the right time. A new system is needed to solve these challenges efficiently.

[0300] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0301] In this invention, the server includes receiving means for acquiring animal sounds, analysis means for analyzing the animal sounds and predicting emotions, and comparison means for comparing the predicted emotions with past case data. This allows users to instantly understand their pet's emotions and use that information to help them respond appropriately and choose the right supplies.

[0302] "Animal sounds" refers to the various types of sounds and noises that animals make, and these sounds may contain emotions or states of mind.

[0303] "Receiving means" refers to a device or function for effectively acquiring the sounds of animals and appropriately incorporating the data into the system.

[0304] "Analysis means" refers to a device or method for analyzing in detail the acquired sounds of animals and performing processing to predict the emotions of the animals from the results.

[0305] "Machine learning model" refers to a computational model that can learn based on a large amount of data and perform predictions and classifications on unknown data.

[0306] "Comparison means" refers to a process for comparing the predicted emotions with a database accumulated in the past to confirm the accuracy and reliability.

[0307] "Notification means" refers to a means for providing necessary information and advice to the user based on the results of analysis and comparison.

[0308] "Proposal means" refers to a mechanism for recommending the specific supplies and actions the user needs in order to respond.

[0309] "Display means" refers to a device or function for visually presenting information on the user interface so that the user can easily understand it.

[0310] "Artificial intelligence model" refers to a computational model that performs recognition, classification, and prediction of data patterns and performs processing that mimics human intelligence.

[0311] This system enables the analysis of the sounds of pets and the prediction of their emotions. First, the user uses a terminal such as a smartphone or a consumer robot to receive the sounds of animals. The terminal has a built-in dedicated receiving means and can effectively acquire the sounds of animals. The audio data is transmitted to the server through the Internet.

[0312] The server processes the received audio data using analysis tools and predicts the animal's emotions using a machine learning model. Specifically, the server uses an AI platform such as TensorFlow to input the data into a pre-trained machine learning model for recognition and analysis. This allows the server to predict the animal's emotions with high accuracy.

[0313] Subsequently, the server compares the predicted emotions with a large amount of past case data to verify the accuracy of the results. Based on the results obtained from the comparison, a dedicated notification system is activated to inform the user of necessary information and advice. The notifications are displayed on the user's smartphone or the user interface of a consumer robot in a visually easy-to-understand format.

[0314] Furthermore, the server suggests products to the user that are appropriate for the predicted condition of the animal. The suggestion system displays automatically selected recommended products, making it easier for the user to respond to the animal's condition and choose the right products.

[0315] For example, if a dog barks frequently, and analysis predicts the dog is anxious, the user might be advised to "pet the dog to calm it down" or "give it a specific treat." Another example of a prompt to the generative AI model could be, "Analyze the dog's barking and suggest specific countermeasures if the cause is stress."

[0316] Thus, this system is a useful tool for understanding an animal's emotions through its vocalizations and supporting appropriate responses.

[0317] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0318] Step 1:

[0319] Audio acquisition by the device

[0320] The user records animal sounds using a device. The device is equipped with a receiving mechanism that acquires sound using a dedicated microphone, and converts the acquired sound data into a digital format. This data is stored for subsequent processing. The input is analog animal sounds, and the output is digitized sound data.

[0321] Step 2:

[0322] Sending audio data

[0323] The device transmits recorded audio data to the server via the internet. The audio data is compressed while maintaining sound quality and securely transferred to the server. The input is digitized audio data, which is compressed during transmission. The output is a confirmation of the transmission of the compressed audio data.

[0324] Step 3:

[0325] Analysis of audio data

[0326] The server processes the received audio data using an analysis tool. Here, speech recognition and emotion prediction are performed by using TensorFlow, an AI platform, to run a machine learning model. The input is compressed audio data, and the output is the predicted emotion of an animal.

[0327] Step 4:

[0328] Comparison with past cases

[0329] The server compares the predicted animal's emotion with past case data. This comparison method improves the accuracy and reliability of the prediction. The input is the predicted emotion data, and the output is the improved emotion prediction result.

[0330] Step 5:

[0331] Advice notification

[0332] The server provides the user with useful advice based on the comparison results. This information is displayed on the terminal's user interface via a notification system. The input is the improved emotion prediction result, and the output is specific advice for the user.

[0333] Step 6:

[0334] Display of proposals

[0335] The server suggests appropriate items based on predicted emotions. These suggestions are visually presented to the user using a suggestion mechanism. The input is an improved emotion prediction result, and the output is item suggestion information for the user.
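
The disclosure does not specify how the comparison with past cases (Step 4) refines the prediction or how items are selected (Step 6). The sketch below shows one way this might work, under stated assumptions: the model's scores are blended with a vote among the k most similar past cases, and an item list is looked up by the resulting label. The names `refine_with_past_cases` and `SUGGESTED_ITEMS`, the blending weight, and the sample data are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

PastCase = Tuple[Sequence[float], str]  # (feature vector, confirmed emotion)

# Hypothetical catalogue; the disclosure does not list concrete products.
SUGGESTED_ITEMS: Dict[str, List[str]] = {
    "anxious": ["calming treat", "comfort blanket"],
    "excited": ["training toy"],
}


def refine_with_past_cases(
    model_scores: Dict[str, float],
    features: Sequence[float],
    past_cases: List[PastCase],
    k: int = 3,
    weight: float = 0.5,
) -> Dict[str, float]:
    """Blend model scores with a vote among the k nearest past cases."""
    nearest = sorted(past_cases, key=lambda case: math.dist(features, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    labels = set(model_scores) | set(votes)
    return {
        label: (1 - weight) * model_scores.get(label, 0.0)
        + weight * (votes[label] / len(nearest) if nearest else 0.0)
        for label in labels
    }


# The model slightly favours "excited", but similar past cases were "anxious".
model_scores = {"anxious": 0.45, "excited": 0.55}
past_cases: List[PastCase] = [
    ((0.82, 0.18), "anxious"),
    ((0.78, 0.25), "anxious"),
    ((0.10, 0.90), "excited"),
    ((0.75, 0.20), "anxious"),
]
refined = refine_with_past_cases(model_scores, (0.80, 0.20), past_cases)
best = max(refined, key=refined.get)
items = SUGGESTED_ITEMS.get(best, [])
print(best, items)
```

In this example the past cases overturn a marginal model prediction, which is one concrete sense in which the comparison could "improve" the result.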

[0336] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0337] This invention provides a system that offers more accurate advice by performing both animal emotion analysis and user emotion recognition. This system is implemented using the following components:

[0338] Analysis of animal sounds and prediction of emotions

[0339] The user records animal sounds using a device. The device sends the recording data to a server, which uses speech recognition technology to transcribe the speech into text. This information is analyzed by an analytical tool to predict the animal's emotions. A machine learning model supports this process, accurately determining the animal's state.

[0340] User emotion recognition

[0341] The device is equipped with an emotion engine that analyzes the user's emotions in real time. The emotion engine analyzes the user's voice and facial expression data to recognize their current emotional state. The server receives this information and determines the relationship between the user's emotions and the emotions of the animals.

[0342] Customized advice and suggestions

[0343] The server generates advice for the user based on both sets of emotion data it has acquired. This advice is specific and situational: the server can suggest more careful responses if the user's emotion is negative, or provide additional information or product suggestions if it is positive. Furthermore, by comparing against past data, the server draws on responses that succeeded in similar situations.

[0344] Specific example

[0345] For example, suppose a user records their pet's barking while feeling anxious, and the server analyzes the pet's voice as indicating "loneliness." In this case, the emotion engine provides specific suggestions to alleviate the user's anxiety, such as "spend more time with your pet" or "use pet-specific relaxation products." This reduces the psychological burden on the user when taking action and provides support for building a better relationship.

[0346] This system allows users to not only engage in rich, two-way communication with animals, but also to select the optimal response that takes their own emotional state into consideration.

[0347] The following describes the processing flow.

[0348] Step 1:

[0349] The user launches the application on their device and records animal sounds. Once recording is complete, they press the send button to prepare the audio data for transmission to the server.

[0350] Step 2:

[0351] The device compresses the recorded audio data and encrypts it for secure transmission. It then sends the audio data to the server via the internet. During this process, the device monitors the transmission progress and confirms that the transmission was successful.

[0352] Step 3:

[0353] The server analyzes the audio data received from the terminal. Using speech recognition technology, the data is converted into text, and a machine learning model is applied to predict the animal's emotions. This model performs highly accurate emotion estimation based on past training data.

[0354] Step 4:

[0355] Simultaneously, the device activates an emotion engine to recognize the user's emotions. It collects the user's voice and facial expression data and analyzes their emotional state. This data is transmitted to the server in real time.

[0356] Step 5:

[0357] The server integrates emotional data from both the animal and the user, generating advice that takes into account their emotional states. It also compares this data with past database data, drawing on effective responses in similar situations. This process helps to create the optimal action plan for the user.

[0358] Step 6:

[0359] The server sends the generated advice to the terminal. The advice includes suggested pet supplies and action plans, as needed.

[0360] Step 7:

[0361] The device displays received advice and suggestions in its user interface. The user interface presents the advice visually in an easy-to-understand manner, helping the user take the necessary actions immediately.

[0362] Step 8:

[0363] Users begin taking action based on the advice provided and, if necessary, use the suggested products to care for their pets. Through this process, users can deepen their communication with animals and improve the emotional state of both parties.
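
The way Step 5 combines the two emotional states is left open in the description. A minimal rule-based sketch is given below, assuming the behaviour outlined in [0343]: a gentler response when the user's emotion is negative, and additional product suggestions otherwise. The grouping of emotions into `NEGATIVE_USER_EMOTIONS`, the `Advice` structure, the fallback action, and the product names are hypothetical; only the two example actions come from [0345].

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed grouping; the description only distinguishes negative from positive.
NEGATIVE_USER_EMOTIONS = {"anxiety", "sadness", "anger"}

# Example actions taken from [0345]; the product list is a placeholder.
ACTIONS: Dict[str, List[str]] = {
    "loneliness": [
        "spend more time with your pet",
        "use pet-specific relaxation products",
    ],
}
PRODUCTS: Dict[str, List[str]] = {"loneliness": ["interactive toy"]}


@dataclass
class Advice:
    tone: str
    actions: List[str]
    products: List[str] = field(default_factory=list)


def generate_advice(animal_emotion: str, user_emotion: str) -> Advice:
    """Pick actions for the animal's emotion and adapt them to the user's emotion."""
    # Placeholder fallback for emotions that have no entry in ACTIONS.
    actions = list(ACTIONS.get(animal_emotion, ["observe your pet for a while"]))
    if user_emotion in NEGATIVE_USER_EMOTIONS:
        # Negative user emotion: keep the advice gentle and omit extra suggestions.
        return Advice(tone="gentle", actions=actions)
    # Otherwise: add product suggestions as additional information.
    return Advice(
        tone="informative",
        actions=actions,
        products=list(PRODUCTS.get(animal_emotion, [])),
    )


anxious_user = generate_advice("loneliness", "anxiety")
happy_user = generate_advice("loneliness", "joy")
print(anxious_user)
print(happy_user)
```

A generative AI model could replace the lookup tables here; the sketch only makes the branching on the user's emotion explicit.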

[0364] (Example 2)

[0365] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0366] There is a lack of systems that can properly understand animal emotions and provide appropriate advice to users. Furthermore, there is no means to provide specific suggestions for properly caring for animals while considering the user's own emotional state. This makes it difficult for users to comprehensively understand their own and their pet's emotions and take appropriate action.

[0367] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0368] In this invention, the server includes a conversion means that converts animal sounds into text data using speech recognition technology, an analysis means that predicts the animal's emotions based on the converted data, and a recognition means that acquires and analyzes the user's emotional data in real time. This enables a comprehensive understanding of the emotions of both the animal and the user, and allows for the provision of accurate advice to the user.

[0369] "Receiving means" refers to a server or terminal function that directly receives animal sounds as input and processes them.

[0370] "Conversion means" refers to a part of a system that converts received audio information into text data using speech recognition technology.

[0371] "Analysis means" refers to a component that has the function of predicting animal emotions using a machine learning model based on text data.

[0372] "Recognition means" refers to components of a system that acquires and analyzes the user's voice and facial expression data in real time to understand their emotional state.

[0373] "Relevance determination means" refers to an analytical component that integrates animal and user emotion data and determines the relationships between them.

[0374] "Comparison means" refers to a functional component that compares past case information with current emotion data to derive useful information.

[0375] "Guidance means" refers to a display device that includes an interface for providing integrated advice to the user.

[0376] "Supply means" refers to components of a system that proposes relevant tools and products to users based on advice.

[0377] This invention is a system that analyzes animal sounds and user emotions to provide comprehensive advice. The system mainly consists of a server and terminals.

[0378] The user records the sounds of animals, such as pets, using the microphone on their device. The device then sends this audio data to a server via the internet. Typically, a smartphone or tablet is used as the device for this purpose.

[0379] The server uses speech recognition technology to convert the audio data into text. Open-source speech recognition systems are used here. The converted text data is then input into a machine learning model to analyze the animal's emotions. This machine learning model, built using tools such as TensorFlow, is capable of predicting the animal's emotions with high accuracy.

[0380] Meanwhile, user emotions are captured as data through the device's camera and microphone and analyzed by an emotion engine. This emotion analysis utilizes computer vision libraries and speech analysis libraries to determine the user's emotional state in real time from the user's voice and facial expression data. This information is also sent to a server and integrated with the animal's emotion data.

[0381] The server analyzes this data using the relevance determination means, compares it with past case information, and generates integrated advice. A generative AI model is used to generate this advice; for example, an OpenAI model may be used. When a prompt such as "The animal is feeling anxious, and the user is also anxious" is input into this AI model, the model generates specific advice tailored to the situation.

[0382] The generated advice is presented to the user through the device. Based on this advice, the user can improve their relationship with their pet and take appropriate action considering their own emotional state. For example, specific suggestions such as "spend more time with your pet" or "use relaxation products" can help reduce the user's anxiety.

[0383] In this way, this system comprehensively understands the emotions of both the user and the animal, supporting better communication for both parties.

[0384] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0385] Step 1:

[0386] The user records animal sounds using the device's microphone. This recording becomes the input. Specifically, the user presses the record button and records for a set period of time. Once the recording is finished, it is saved to the device as audio data.

[0387] Step 2:

[0388] The terminal sends recorded audio data to the server. An internet connection is used for transmission. The input is the pre-recorded audio data, and the output is a network message sent to the server. Specifically, data conversion and encryption are performed, and the data is sent to the server using the HTTP protocol.

[0389] Step 3:

[0390] The server converts the received audio data into text using speech recognition technology. The input here is the audio data received by the server. The output is the converted text data. Specifically, it uses a speech recognition library to analyze the audio waveform and convert it into text in the corresponding language.

[0391] Step 4:

[0392] The server predicts the animal's emotions based on the converted text data. The input is text data obtained from speech, and the output is predicted animal emotion information. Specifically, a machine learning model analyzes the text and generates confidence scores for a set of emotion categories.

[0393] Step 5:

[0394] The device analyzes the user's voice and facial expression data in real time using an emotion engine to recognize the user's emotions. The input is the user's video and audio data acquired from the camera or microphone, and the output is the user's emotion information. Specifically, image processing and audio analysis are performed to extract emotional features.

[0395] Step 6:

[0396] The server integrates animal emotion data and user emotion data. The input is animal emotion information and user emotion information, and the output is integrated data showing their relationships. Specifically, a relationship analysis algorithm is applied to calculate the strength of the relationship, taking into account the emotional states of both entities.

[0397] Step 7:

[0398] The server generates advice for the user from the integrated data using a generative AI model. The input is the integrated data, and the output is the generated advice. Specifically, the server provides the AI model with prompts to generate candidate advice and then selects the best candidate.

[0399] Step 8:

[0400] The terminal receives advice generated from the server and presents it to the user. The input is the advice received from the server, and the output is what is displayed to the user. Specifically, the advice is displayed in the user interface, and the user can take action based on it.
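
Steps 4 and 6 refer to "confidence scores for a set of emotion categories" and to a "strength of the relationship" without defining either. One possible reading is sketched below: raw model outputs are normalized with a softmax, and the relationship strength is taken as the closeness of the expected valence of the two score distributions. The `VALENCE` table and this particular definition of strength are assumptions introduced for illustration and are not part of the disclosure.

```python
import math
from typing import Dict

# Assumed valence per emotion label, in [-1, 1] (negative = unpleasant).
VALENCE: Dict[str, float] = {"anxious": -0.8, "lonely": -0.6, "happy": 0.8}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw model outputs into confidence scores that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def expected_valence(scores: Dict[str, float]) -> float:
    """Confidence-weighted average valence; unknown labels count as neutral."""
    return sum(p * VALENCE.get(label, 0.0) for label, p in scores.items())


def relationship_strength(animal: Dict[str, float], user: Dict[str, float]) -> float:
    """1.0 when both parties share the same expected valence, 0.0 when opposite."""
    return 1.0 - abs(expected_valence(animal) - expected_valence(user)) / 2.0


# Step 4: hypothetical raw outputs of the animal-emotion model.
animal_scores = softmax({"anxious": 2.0, "lonely": 1.0, "happy": 0.5})
# Step 5 output: user emotion scores as reported by the emotion engine.
user_scores = {"anxious": 0.7, "happy": 0.3}
# Step 6: integrated value describing how closely the two states align.
strength = relationship_strength(animal_scores, user_scores)
print(max(animal_scores, key=animal_scores.get), round(strength, 3))
```

A high strength value indicates that the animal and the user are in similar emotional states, which the advice generation of Step 7 could take into account.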

[0401] (Application Example 2)

[0402] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0403] While systems exist that detect animal emotions and provide relevant advice, they have historically offered one-sided suggestions without simultaneously considering the user's emotions, thus failing to provide appropriate support tailored to the user's psychological state. Therefore, there is a need for technology that more accurately supports two-way communication between animals and users.

[0404] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0405] In this invention, the server includes a receiving means for receiving animal sounds as input, an analysis means for analyzing animal sounds and predicting emotions, and a recognition means for recognizing the user's emotions in real time. This makes it possible to simultaneously analyze the emotions of both the animal and the user and provide specific and situational advice tailored to the state of both.

[0406] "Receiving means" refers to devices or methods for receiving animal sounds as input.

[0407] "Analysis means" refers to devices or methods used to analyze animal sounds and predict their emotions.

[0408] "Recognition means" refers to devices or methods for recognizing a user's emotions in real time.

[0409] "Comparison means" refers to devices or methods for matching predicted emotions against a set of past case data.

[0410] "Presentation means" refers to devices or methods for providing advice to a user.

[0411] "Selection means" refers to devices or methods for suggesting products related to advice.

[0412] The system that realizes this invention collects and analyzes animal sounds and grasps the user's emotions in real time, thereby providing optimal advice based on the emotional states of both parties. The system mainly includes a terminal for receiving animal sounds, a server for analysis and recognition, and a terminal for providing information to the user.

[0413] The server uses libraries such as Python's sounddevice to record audio data for analysis and predicts emotions using machine learning models. Specific emotion analysis results are returned as predictions, such as "lonely." Additionally, user emotions are collected by the device via the camera and microphone and recognized by the emotion engine.

[0414] Based on this data, the server utilizes a generative AI model to provide optimal advice to the user. The user's device displays this advice through an interface, making it easy for the user to understand. This helps users build better relationships with animals.

[0415] For example, when the pet robot is deemed lonely, the user's device will receive advice such as, "Let's spend some time enjoying some relaxation items." This provides the user with emotional support and enriches their communication with their pet.

[0416] An example of a prompt message is, "If you record the sounds of a pet robot and perform voice emotion analysis, what emotions will be detected?"

[0417] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0418] Step 1:

[0419] The device records animal sounds through a microphone. The input is the animal's sound, which is first digitized into the format required for subsequent analysis. The resulting audio data is then sent to the server.

[0420] Step 2:

[0421] The server uses speech recognition technology to convert the received audio data into text data. It analyzes the waveform information of the audio and outputs it as text data. This text data is then used as input data for emotion prediction.

[0422] Step 3:

[0423] The server inputs the converted text data into a machine learning model to predict the animal's emotions. Based on past training data, this model determines the relationship between voice patterns and emotions and outputs emotion labels such as "lonely" or "happy."

[0424] Step 4:

[0425] The device uses voice and camera data collected from the user to recognize the user's emotions in real time. A voice analysis engine and facial recognition software reduce this data to an emotional state, such as "anxiety" or "joy," which is then sent to the server.

[0426] Step 5:

[0427] The server combines animal emotion data and user emotion data, and uses a generative AI model to generate optimal advice for the user. Based on the input emotion data, it constructs an advice sentence and sends it back to the terminal in text format.

[0428] Step 6:

[0429] The device displays the received advice through a user interface. Users can visually review the advice and obtain specific guidance for optimal communication with animals.
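
The data flow of Steps 1 to 6 can be summarized in code. The sketch below is an assumption-laden outline rather than the disclosed implementation: the speech recognition, emotion prediction, and generative AI components are passed in as plain callables and replaced by trivial stand-ins, and the names `run_pipeline` and `PipelineResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    transcript: str
    animal_emotion: str
    user_emotion: str
    advice: str


def run_pipeline(
    audio: bytes,
    user_emotion: str,
    transcribe: Callable[[bytes], str],
    predict_emotion: Callable[[str], str],
    generate: Callable[[str], str],
) -> PipelineResult:
    """Server-side flow: audio -> text -> animal emotion -> advice text."""
    transcript = transcribe(audio)                # Step 2: audio to text data
    animal_emotion = predict_emotion(transcript)  # Step 3: text to emotion label
    prompt = (                                    # Step 5: combine both emotions
        f"The animal is feeling {animal_emotion}, and the user is feeling "
        f"{user_emotion}. Suggest one concrete action for the user."
    )
    advice = generate(prompt)
    return PipelineResult(transcript, animal_emotion, user_emotion, advice)


# Trivial stand-ins for the real components (all hypothetical).
prompts_seen: List[str] = []


def fake_transcribe(audio: bytes) -> str:
    return "whine whine"


def fake_predict(text: str) -> str:
    return "lonely" if "whine" in text else "happy"


def fake_generate(prompt: str) -> str:
    prompts_seen.append(prompt)
    return "Spend some quiet time together."


result = run_pipeline(b"\x00\x01", "anxiety", fake_transcribe, fake_predict, fake_generate)
print(result.advice)  # Step 6: the terminal would display this text
```

Injecting the components as callables keeps the flow testable without committing to a particular speech recognition library or generative AI service.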

[0430] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0431] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0432] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0433] [Third Embodiment]

[0434] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0435] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0436] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0437] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0438] The microphone 238 receives instructions and other input from the user 20 by capturing the user's voice. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0439] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0440] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0441] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0442] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0443] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0444] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0445] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0446] The system of this invention aims to analyze animal sounds and understand their emotions, and is achieved through the following functions.

[0447] Voice acquisition and transmission by the device

[0448] The user uses a device to record the sounds their pet makes. A dedicated application is installed on the device, and this application is responsible for transmitting the animal's sounds to a server over the internet. The device compresses the audio data and transmits it securely, preventing data loss during transmission.

[0449] Server-based voice analysis and emotion prediction

[0450] The server analyzes the received audio data using speech recognition technology and then runs a machine learning model on the result of that analysis. The machine learning model has been pre-trained on a large amount of animal vocal data and predicts animal emotions from the characteristics of the vocalizations. After the model predicts an emotion, the server compares the result with past cases in the database using a matching mechanism to improve accuracy.

[0451] Advice and product suggestions

[0452] Based on the predicted emotions and the comparison with past data, the server derives advice and recommendations, which are then communicated to the user via the device. As a specific application, consider a case where a dog barks repeatedly. The server analyzes the voice and, if it predicts the dog is feeling anxious, provides advice to the user via the device, such as "pet the dog to reassure it" or "give it a treat." It also suggests specific treats or toys, offering concrete ways to soothe the dog.

[0453] This system helps users understand their animals' emotions and respond appropriately, thereby supporting a better relationship with their pets.

[0454] The following describes the processing flow.

[0455] Step 1:

[0456] The user launches a dedicated application on their device and uses the recording function to record their pet's barking. Once recording is complete, pressing the send button on the application prepares the recorded audio data for transmission to the server.

[0457] Step 2:

[0458] The device temporarily stores the recorded audio data, compresses and encrypts it, and then sends the audio data to the server via the internet. During transmission, it checks network stability and monitors the data until it receives a response indicating successful transmission.

[0459] Step 3:

[0460] The server processes the audio data received from the terminal using a speech recognition library, converting the audio data into text and extracting necessary audio features. This prepares the key features of animal voices for analysis.

[0461] Step 4:

[0462] The server uses a machine learning model to analyze voice feature data and predict animal emotions. The model is based on a large amount of training data and uses a proprietary algorithm to make accurate emotion predictions.

[0463] Step 5:

[0464] The server retrieves the predicted emotion data and compares it with past cases in its internal database. This allows it to verify the result against similar cases and improve the accuracy of the emotion determination.

[0465] Step 6:

[0466] The server generates advice for the user based on the matching results and predicted emotions. This advice includes specific recommended actions tailored to the pet's condition. If necessary, a list of suggested pet supplies is also generated.

[0467] Step 7:

[0468] The server sends generated advice and suggestions to the terminal, and the terminal displays the received information to the user within the application. Because visual information and notifications are provided instantly through the user interface, users can respond in a timely manner.
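
Step 2 above (compress, send, and monitor until a success response arrives) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: `zlib` is a lossless general-purpose compressor used here only as a stand-in for an audio codec, encryption is assumed to be handled by the transport layer and is not shown, and the `send` callable, the payload layout, and the retry limit are hypothetical.

```python
import hashlib
import zlib
from typing import Callable


def prepare_payload(audio: bytes) -> dict:
    """Compress raw audio bytes and attach a checksum of the original data."""
    return {
        "body": zlib.compress(audio, level=6),  # stand-in for an audio codec
        "sha256": hashlib.sha256(audio).hexdigest(),
    }


def send_with_retry(payload: dict, send: Callable[[dict], bool],
                    max_attempts: int = 3) -> int:
    """Call send() until it reports success; return the attempts used.

    `send` is a hypothetical transport function that returns True when the
    server acknowledges the upload.
    """
    for attempt in range(1, max_attempts + 1):
        if send(payload):
            return attempt
    raise ConnectionError(f"upload failed after {max_attempts} attempts")


def receive(payload: dict) -> bytes:
    """Server side: decompress and verify that no data was lost or altered."""
    audio = zlib.decompress(payload["body"])
    if hashlib.sha256(audio).hexdigest() != payload["sha256"]:
        raise ValueError("checksum mismatch")
    return audio
```

The checksum lets the server confirm that the bytes it decompressed match what the terminal recorded, which is one simple way to detect loss during transmission.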

[0469] (Example 1)

[0470] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."

[0471] Understanding an animal's emotions from its vocalizations and providing appropriate responses is essential for building a good relationship with pets. However, technologies for analyzing animal vocalizations and predicting their emotions are still limited, making it difficult to accurately grasp emotions and provide appropriate advice. Furthermore, there is a need for efficient systems to safely and accurately process animal vocal data.

[0472] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0473] In this invention, the server includes an input means for acquiring and digitizing animal sounds, a transmission means for compressing the acquired animal sounds and securely transmitting them to an information processing device, and an analysis means for analyzing the digitized sounds using speech recognition technology and extracting feature quantities. This makes it possible to accurately understand the emotions of animals and provide users with appropriate advice and supplies.

[0474] An "input means" is a device for acquiring and digitizing sounds emitted by animals.

[0475] A "transmission means" is a device that has the function of compressing the acquired animal voice data and securely transmitting it to an information processing device.

[0476] An "analysis means" is a device that has the function of analyzing acquired audio data using speech recognition technology and extracting characteristic features of the audio.

[0477] A "prediction means" is a device that uses extracted audio features to run a machine learning model and predict the emotions of animals.

[0478] A "matching means" is a device that has the function of comparing predicted emotions with past examples in a database to improve the accuracy of predictions.

[0479] A "presentation means" is a device that provides users with guidance information about animal emotions based on the matching results.

[0480] A "recommendation means" is a device that has the function of suggesting items related to the guidance information to the user.

[0481] This invention is a system for analyzing animal vocalizations and predicting their emotions. Specific embodiments of this system are described below.

[0482] The terminal has a recording function and a dedicated application installed to capture animal sounds. This application compresses the recorded sound and sends it to a server over the internet. At this stage, the audio data is compressed using an audio compression format such as MP3 or AAC and transmitted securely.

[0483] The server receives audio data transmitted from the terminal and analyzes it using speech recognition technology. The server extracts phonemes and performs spectral analysis to clarify the features of the speech. Once the features are extracted, the server inputs them into a machine learning model to predict the animal's emotions. This model is pre-trained using a large amount of animal speech data and exhibits high accuracy in emotion prediction.
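
As a rough illustration of the spectral analysis mentioned above, the sketch below extracts two simple features from decoded samples: the dominant frequency (via a naive discrete Fourier transform) and the RMS energy. The choice of these two features and the function names are assumptions made for illustration; the disclosure does not specify which features are extracted, and a production system would use an FFT library rather than this O(n²) loop.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stay below Nyquist
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple loudness feature."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Usage with a synthetic 440 Hz tone standing in for a recorded vocalization.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(400)]
features = (dominant_frequency(tone, rate), rms_energy(tone))
```

A feature tuple like this is the kind of input that the emotion prediction model could consume.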

[0484] Furthermore, the server compares the predicted emotion against multiple databases of past cases. This further improves the accuracy of the prediction and yields more reliable results. The server then generates advice and recommendations based on the comparison results and notifies the user through the terminal. The terminal presents the information to the user visually through the application.
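
The comparison with past cases is not specified in detail. One plausible reading, shown here purely as a sketch, is to blend the model's confidence scores with a vote among the most similar past cases. The nearest-neighbor approach, the `case_weight` parameter, and the data layout are all assumptions, not the method of the disclosure.

```python
import math
from collections import Counter
from typing import Sequence


def refine_prediction(features: Sequence[float],
                      model_scores: dict,
                      past_cases: list,
                      k: int = 3,
                      case_weight: float = 0.5):
    """Blend model scores with votes from the k nearest past cases.

    past_cases is a list of (feature_vector, emotion_label) pairs.
    Features are assumed to be on comparable scales; in practice they
    would need normalization before distances are computed.
    """
    nearest = sorted(past_cases, key=lambda c: math.dist(features, c[0]))[:k]
    votes = Counter(label for _, label in nearest)
    blended = {}
    for label in set(model_scores) | set(votes):
        vote_share = votes[label] / len(nearest) if nearest else 0.0
        blended[label] = ((1 - case_weight) * model_scores.get(label, 0.0)
                          + case_weight * vote_share)
    return max(blended, key=blended.get), blended


# Usage: the model slightly favors "excited", but similar past cases
# were mostly labeled "anxious", so the blended result changes.
past = [((440.0, 0.7), "anxious"), ((450.0, 0.6), "anxious"),
        ((900.0, 0.2), "excited"), ((880.0, 0.3), "excited")]
label, scores = refine_prediction((445.0, 0.65),
                                  {"anxious": 0.45, "excited": 0.55}, past)
```

Setting `case_weight` to 0 reduces this to the model's own prediction, so the weight controls how much the historical database is allowed to override the model.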

[0485] As a concrete example, consider a case where a user records their dog's voice and sends it to the server. In this case, the server predicts the dog's emotion as "excited" and provides advice to the user, such as "create a relaxing environment" or "use training toys."

[0486] An example of a prompt to be input to the generative AI model is: "Predict the emotion of the dog from the recorded audio: anxiety, excitement, or hunger. Based on the emotion prediction, suggest an appropriate course of action." This prompt allows the server to generate useful information for the user and help them derive appropriate actions.
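
A prompt of this form could be assembled from a list of candidate emotions. The helper below is a hypothetical sketch: the function name and parameters are illustrative, and only the sentence template is taken from the example above.

```python
from typing import Sequence


def build_emotion_prompt(animal: str, candidate_emotions: Sequence[str]) -> str:
    """Build the example prompt for a given animal and candidate emotions."""
    if len(candidate_emotions) < 2:
        raise ValueError("at least two candidate emotions are required")
    if len(candidate_emotions) == 2:
        options = " or ".join(candidate_emotions)
    else:
        options = (", ".join(candidate_emotions[:-1])
                   + ", or " + candidate_emotions[-1])
    return (f"Predict the emotion of the {animal} from the recorded audio: "
            f"{options}. Based on the emotion prediction, suggest an "
            "appropriate course of action.")


prompt = build_emotion_prompt("dog", ["anxiety", "excitement", "hunger"])
```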

[0487] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0488] Step 1:

[0489] The user records animal sounds using a device. The input is the sounds the animal makes, and the output is a digitized audio file. Audio acquisition begins by activating the recording function on the device and pressing the record button. This data is saved within a dedicated application.

[0490] Step 2:

[0491] The device compresses the recorded audio data and prepares it for transmission. The input is a digitized audio file, and the output is compressed audio data. A dedicated application within the device compresses the audio file into MP3 or AAC format and prepares it for transmission using a secure transmission protocol. This process reduces the amount of data and improves transmission efficiency.

[0492] Step 3:

[0493] The terminal sends compressed audio data to the server. The input is the compressed audio data, and the output is a confirmation that the data transfer to the server is complete. The HTTPS protocol is used for transmission to ensure data encryption and security, thereby reducing the risk of data loss.

[0494] Step 4:

[0495] The server analyzes the received audio data. The input is compressed audio data, and the output is audio features. The server uses speech recognition technology to perform phoneme extraction and spectral analysis to extract audio features. These features are then used as input for the next step.

[0496] Step 5:

[0497] The server inputs features into a machine learning model to predict animal emotions. The input is audio features, and the output is predicted data about the animal's emotions. A pre-trained generative AI model is run to predict emotions based on the audio features.

[0498] Step 6:

[0499] The server compares predicted emotions against a historical database. The input is predicted emotion data, and the output is an improved emotion prediction result. By comparing emotion data with examples in the database, it improves accuracy and generates reliable results.

[0500] Step 7:

[0501] The server generates advice for the user based on the improved prediction results. The input is the improved emotion prediction result, and the output is advice data. The advice and recommendations generated by the server are compiled and provided to the user in the next step.

[0502] Step 8:

[0503] The device receives advice from the server and presents it visually to the user. The input is advice data sent from the server, and the output is visual advice displayed on the screen. Through the device's application, it provides users with specific actionable guidelines to support their relationship with their pets.

[0504] (Application Example 1)

[0505] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."

[0506] In households with pets, understanding and appropriately addressing the emotions of animals is essential, but there are few effective means of doing so easily. With conventional technologies, accurately analyzing pet vocalizations, identifying emotions, and providing advice to users is time-consuming and laborious. Furthermore, it is difficult for users to select the appropriate products at the right time. A new system is needed to solve these challenges efficiently.

[0507] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0508] In this invention, the server includes receiving means for acquiring animal sounds, analysis means for analyzing the animal sounds and predicting emotions, and comparison means for comparing the predicted emotions with past case data. This allows users to instantly understand their pet's emotions and use that information to help them respond appropriately and choose the right supplies.

[0509] "Animal sounds" refers to the various types of sounds and noises that animals make; these sounds may convey emotions or states of mind.

[0510] "Receiving means" refers to devices or functions that effectively acquire animal sounds and appropriately incorporate that data into the system.

[0511] "Analysis means" refers to devices or methods that perform detailed analysis of acquired animal sounds and process the results to predict the animal's emotions.

[0512] A "machine learning model" refers to a computational model that learns from large amounts of data and can make predictions and classifications about unknown data.

[0513] "Comparison means" refers to devices or methods that compare predicted emotions with a database of previously accumulated data to verify their accuracy and reliability.

[0514] "Notification means" refers to the means of providing users with necessary information and advice based on the results of analysis and comparison.

[0515] "Suggestion mechanisms" refer to systems that recommend the specific items or actions that users need when responding.

[0516] "Display means" refers to devices or functions that visually present information on a user interface, making it easy for users to understand.

[0517] An "artificial intelligence model" refers to a computational model that performs processing that mimics human intelligence, such as recognizing, classifying, and predicting data patterns.

[0518] This system analyzes pet sounds and predicts their emotions. First, the user captures animal sounds using a device such as a smartphone or consumer robot. The device has a built-in dedicated receiving mechanism that can effectively capture animal sounds. The audio data is then transmitted to a server via the internet.

[0519] The server processes the received audio data using analysis tools and predicts the animal's emotions using a machine learning model. Specifically, the server uses an AI platform such as TensorFlow to input the data into a pre-trained machine learning model for recognition and analysis. This allows the server to predict the animal's emotions with high accuracy.

[0520] Subsequently, the server compares the predicted emotions with a large amount of past case data to verify the accuracy of the results. Based on the results obtained from the comparison, a dedicated notification system is activated to inform the user of necessary information and advice. The notifications are displayed on the user's smartphone or the user interface of a consumer robot in a visually easy-to-understand format.

[0521] Furthermore, the server suggests products to the user that are appropriate for the predicted condition of the animal. The suggestion system displays automatically selected recommended products, making it easier for the user to respond to the animal's condition and choose the right products.
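
The mapping from a predicted emotion to advice and recommended products can be sketched as a simple lookup. The advice strings below are taken from the examples in this description; the product names, the fallback entry, and the `Suggestion` type are hypothetical placeholders, and an actual system might generate these dynamically rather than from a fixed table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    advice: tuple[str, ...]
    products: tuple[str, ...] = ()


# Advice text follows the examples in the description; products are
# illustrative placeholders only.
CATALOG = {
    "anxious": Suggestion(
        advice=("pet the dog to calm it down", "give it a specific treat"),
        products=("treat", "toy"),
    ),
    "excited": Suggestion(
        advice=("create a relaxing environment", "use training toys"),
        products=("training toy",),
    ),
}

# Hypothetical fallback for emotions that have no catalog entry.
DEFAULT = Suggestion(advice=("observe the animal and record again",))


def suggest(emotion: str) -> Suggestion:
    """Return advice and products for a predicted emotion label."""
    return CATALOG.get(emotion.lower(), DEFAULT)
```

For example, `suggest("Anxious")` returns the entry whose first advice item is "pet the dog to calm it down", while an unrecognized label falls back to `DEFAULT`.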

[0522] For example, if a dog barks frequently, and analysis predicts the dog is anxious, the user might be advised to "pet the dog to calm it down" or "give it a specific treat." Another example of a prompt to the generative AI model could be, "Analyze the dog's barking and suggest specific countermeasures if the cause is stress."

[0523] Thus, this system is a useful tool for understanding an animal's emotions through its vocalizations and supporting appropriate responses.

[0524] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0525] Step 1:

[0526] Audio acquisition by the device

[0527] The user records animal sounds using a device. The device is equipped with a receiving mechanism that acquires sound using a dedicated microphone, and converts the acquired sound data into a digital format. This data is stored for subsequent processing. The input is analog animal sounds, and the output is digitized sound data.

[0528] Step 2:

[0529] Sending audio data

[0530] The device transmits recorded audio data to the server via the internet. The audio data is compressed while maintaining sound quality and securely transferred to the server. The input is digitized audio data, which is compressed during transmission. The output is a confirmation of the transmission of the compressed audio data.

[0531] Step 3:

[0532] Analysis of audio data

[0533] The server processes the received audio data using an analysis tool. Here, speech recognition and emotion prediction are performed by using TensorFlow, an AI platform, to run a machine learning model. The input is compressed audio data, and the output is the predicted emotion of an animal.

[0534] Step 4:

[0535] Comparison with past cases

[0536] The server compares the predicted animal's emotion with past case data. This comparison method improves the accuracy and reliability of the prediction. The input is the predicted emotion data, and the output is the improved emotion prediction result.

[0537] Step 5:

[0538] Advice notification

[0539] The server provides the user with useful advice based on the comparison results. This information is displayed on the terminal's user interface via a notification system. The input is the improved emotion prediction result, and the output is specific advice for the user.

[0540] Step 6:

[0541] Display of proposals

[0542] The server suggests appropriate items based on predicted emotions. These suggestions are visually presented to the user using a suggestion mechanism. The input is an improved emotion prediction result, and the output is item suggestion information for the user.

[0543] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0544] This invention provides a system that offers more accurate advice by performing both animal emotion analysis and user emotion recognition. This system is implemented using the following components:

[0545] Analysis of animal sounds and prediction of emotions

[0546] The user records animal sounds using a device. The device sends the recording data to a server, which uses speech recognition technology to transcribe the speech into text. This information is analyzed by an analytical tool to predict the animal's emotions. A machine learning model supports this process, accurately determining the animal's state.

[0547] User emotion recognition

[0548] The device is equipped with an emotion engine that analyzes the user's emotions in real time. The emotion engine analyzes the user's voice and facial expression data to recognize their current emotional state. The server receives this information and determines the relationship between the user's emotions and the emotions of the animals.

[0549] Customized advice and suggestions

[0550] The server generates advice for the user based on both sets of emotion data it has acquired. This advice is specific and situational: the server can suggest more careful responses if the user's emotion is negative, or provide additional information or product suggestions if it is positive. Furthermore, by comparing with past data, it draws on responses that succeeded in similar situations.

[0551] Specific example

[0552] For example, suppose a user records their pet's barking while feeling anxious, and the server analyzes the pet's voice as indicating "loneliness." In this case, the emotion engine provides specific suggestions to alleviate the user's anxiety, such as "spend more time with your pet" or "use pet-specific relaxation products." This reduces the psychological burden on the user when taking action and provides support for building a better relationship.

[0553] This system allows users to not only engage in rich, two-way communication with animals, but also to select the optimal response that takes their own emotional state into consideration.
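
The idea of adapting advice to the user's own emotional state can be sketched as follows. The policy here is an assumption made for illustration: when the user's emotion is negative, only a short list of suggestions is returned, and when it is positive, the full advice plus product suggestions are returned. The set of "negative" labels, the `careful_limit` parameter, and the return layout are all hypothetical.

```python
from typing import Sequence

# Assumption: which user emotions count as negative is a hypothetical choice.
NEGATIVE_USER_EMOTIONS = {"anxious", "sad", "angry"}


def tailor_advice(user_emotion: str,
                  advice: Sequence[str],
                  products: Sequence[str],
                  careful_limit: int = 2) -> dict:
    """Adapt the amount of advice to the user's recognized emotion."""
    if user_emotion.lower() in NEGATIVE_USER_EMOTIONS:
        # Careful mode: a short list, without extra product suggestions.
        return {"mode": "careful",
                "advice": list(advice[:careful_limit]),
                "products": []}
    # Informative mode: everything, including product suggestions.
    return {"mode": "informative",
            "advice": list(advice),
            "products": list(products)}


# Usage: an anxious user whose pet was analyzed as lonely.
result = tailor_advice(
    "anxious",
    ["spend more time with your pet", "use pet-specific relaxation products"],
    ["relaxation product"],
)
```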

[0554] The following describes the processing flow.

[0555] Step 1:

[0556] The user launches the application on their device and records animal sounds. Once recording is complete, they press the send button to prepare the audio data for transmission to the server.

[0557] Step 2:

[0558] The device compresses the recorded audio data and encrypts it for secure transmission. It then sends the audio data to the server via the internet. During this process, the device monitors the transmission progress and confirms that the transmission was successful.

[0559] Step 3:

[0560] The server analyzes the audio data received from the terminal. Using speech recognition technology, the data is converted into text, and a machine learning model is applied to predict the animal's emotions. This model performs highly accurate emotion estimation based on past training data.

[0561] Step 4:

[0562] Simultaneously, the device activates an emotion engine to recognize the user's emotions. It collects the user's voice and facial expression data and analyzes their emotional state. This data is transmitted to the server in real time.

[0563] Step 5:

[0564] The server integrates emotional data from both the animal and the user, generating advice that takes into account their emotional states. It also compares this data with past database data, drawing on effective responses in similar situations. This process helps to create the optimal action plan for the user.

[0565] Step 6:

[0566] The server sends the generated advice to the terminal. The advice includes suggested pet supplies and action plans, as needed.

[0567] Step 7:

[0568] The device displays received advice and suggestions in its user interface. The user interface presents the advice visually in an easy-to-understand manner, helping the user take the necessary actions immediately.

[0569] Step 8:

[0570] Users begin taking action based on the advice provided and, if necessary, use the suggested products to care for their pets. Through this process, users can deepen their communication with animals and improve the emotional state of both parties.

[0571] (Example 2)

[0572] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."

[0573] There is a lack of systems that can properly understand animal emotions and provide appropriate advice to users. Furthermore, there is no means to provide specific suggestions for properly caring for animals while considering the user's own emotional state. This makes it difficult for users to comprehensively understand their own and their pet's emotions and take appropriate action.

[0574] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0575] In this invention, the server includes a conversion means that converts animal sounds into text data using speech recognition technology, an analysis means that predicts the animal's emotions based on the converted data, and a recognition means that acquires and analyzes the user's emotional data in real time. This enables a comprehensive understanding of the emotions of both the animal and the user, and allows for the provision of accurate advice to the user.

[0576] "Receiving means" refers to a server or terminal function that directly receives animal sounds as input and processes them.

[0577] "Conversion means" refers to a part of a system that converts received audio information into text data using speech recognition technology.

[0578] "Analysis means" refers to a component that has the function of predicting animal emotions using a machine learning model based on text data.

[0579] "Recognition means" refers to components of a system that acquires and analyzes the user's voice and facial expression data in real time to understand their emotional state.

[0580] The "relevance determination means" is an analytical device that integrates animal and user emotional data and confirms the relationships between them.

[0581] A "comparison means" is a functional device that compares past case information with current emotion data to derive useful information.

[0582] A "guidance means" is a display device that includes an interface for providing integrated advice to the user.

[0583] "Supply means" refers to components of a system that proposes relevant tools and products to users based on advice.

[0584] This invention is a system that analyzes animal sounds and user emotions to provide comprehensive advice. The system mainly consists of a server and terminals.

[0585] The user records the sounds of animals, such as pets, using the microphone on their device. The device then sends this audio data to a server via the internet. Typically, a smartphone or tablet is used as the device for this purpose.

[0586] The server uses speech recognition technology to convert the audio data into text. Open-source speech recognition systems are used here. The converted text data is then input into a machine learning model to analyze the animal's emotions. This machine learning model, built using tools such as TensorFlow, is capable of predicting the animal's emotions with high accuracy.

[0587] Meanwhile, user emotions are captured as data through the device's camera and microphone and analyzed by an emotion engine. This emotion analysis utilizes computer vision libraries and speech analysis libraries to determine the user's emotional state in real time from the user's voice and facial expression data. This information is also sent to a server and integrated with the animal's emotion data.

[0588] The server analyzes this data using a relevance determination tool, compares it with past case information, and generates integrated advice. A generative AI model is used to generate this advice; for example, an OpenAI model may be used. When a prompt such as "The animal is feeling anxious, and the user is also anxious" is input into this AI model, the model generates specific advice tailored to the situation.

[0589] The generated advice is presented to the user through the device. Based on this advice, the user can improve their relationship with their pet and take appropriate action considering their own emotional state. For example, specific suggestions such as "spend more time with your pet" or "use relaxation products" can help reduce the user's anxiety.

[0590] In this way, this system comprehensively understands the emotions of both the user and the animal, supporting better communication for both parties.

[0591] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0592] Step 1:

[0593] The user records animal sounds using the device's microphone. This recording becomes the input. Specifically, the user presses the record button and records for a set period of time. Once the recording is finished, it is saved to the device as audio data.

[0594] Step 2:

[0595] The terminal sends recorded audio data to the server. An internet connection is used for transmission. The input is the pre-recorded audio data, and the output is a network message sent to the server. Specifically, data conversion and encryption are performed, and the data is sent to the server using the HTTP protocol.

[0596] Step 3:

[0597] The server converts the received audio data into text using speech recognition technology. The input here is the audio data received by the server. The output is the converted text data. Specifically, it uses a speech recognition library to analyze the audio waveform and convert it into text in the corresponding language.

[0598] Step 4:

[0599] The server predicts the animal's emotions based on the converted text data. The input is text data obtained from speech, and the output is predicted animal emotion information. Specifically, a machine learning model analyzes the text and generates confidence scores for a set of emotion categories.

[0600] Step 5:

[0601] The device analyzes the user's voice and facial expression data in real time using an emotion engine to recognize the user's emotions. The input is the user's video and audio data acquired from the camera and microphone, and the output is the user's emotional information. Specifically, image processing and audio analysis are performed to extract emotional characteristics.

[0602] Step 6:

[0603] The server integrates animal emotion data and user emotion data. The input is animal emotion information and user emotion information, and the output is integrated data showing their relationships. Specifically, a relationship analysis algorithm is applied to calculate the strength of the relationship, taking into account the emotional states of both entities.

[0604] Step 7:

[0605] The server generates advice for the user based on integrated data using a generative AI model. The input is integrated data, and the output is the generated advice. Specifically, it provides the AI model with prompts to generate candidate advice, and then selects the best one.

[0606] Step 8:

[0607] The terminal receives advice generated from the server and presents it to the user. The input is the advice received from the server, and the output is what is displayed to the user. Specifically, the advice is displayed in the user interface, and the user can take action based on it.
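
The "relationship analysis algorithm" in Step 6 is not specified. As one hedged illustration, the strength of the relationship between the animal's and the user's emotional states could be measured as the cosine similarity of their confidence-score vectors. This choice of metric, and the assumption that both sets of scores use the same emotion labels, are not stated in the disclosure.

```python
import math


def relationship_strength(animal_scores: dict, user_scores: dict) -> float:
    """Cosine similarity between two emotion confidence-score mappings.

    Returns a value in [0, 1] for non-negative scores; 0.0 when either
    mapping is empty or all zeros.
    """
    labels = sorted(set(animal_scores) | set(user_scores))
    a = [animal_scores.get(label, 0.0) for label in labels]
    u = [user_scores.get(label, 0.0) for label in labels]
    norm = math.hypot(*a) * math.hypot(*u)
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, u)) / norm


# Usage: both the animal and the user are mostly anxious.
strength = relationship_strength(
    {"anxious": 0.8, "calm": 0.2},
    {"anxious": 0.7, "calm": 0.3},
)
```

A high value indicates that the two emotional states are aligned, which the server could take into account when composing the prompt for Step 7.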

[0608] (Application Example 2)

[0609] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."

[0610] While systems exist that detect animal emotions and provide relevant advice, they have historically offered one-sided suggestions without simultaneously considering the user's emotions, thus failing to provide appropriate support tailored to the user's psychological state. Therefore, there is a need for technology that more accurately supports two-way communication between animals and users.

[0611] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0612] In this invention, the server includes a receiving means for receiving animal sounds as input, an analysis means for analyzing animal sounds and predicting emotions, and a recognition means for recognizing the user's emotions in real time. This makes it possible to simultaneously analyze the emotions of both the animal and the user and provide specific and situational advice tailored to the state of both.

[0613] "Receiving means" refers to devices or methods for receiving animal sounds as input.

[0614] "Analysis means" refers to devices or methods used to analyze animal sounds and predict their emotions.

[0615] "Recognition means" refers to devices or methods for recognizing a user's emotions in real time.

[0616] "Comparison means" refers to devices or methods for matching a set of past case data with predicted emotions.

[0617] "Presentation means" refers to devices or methods for providing advice to a user.

[0618] "Selection means" refers to devices or methods for suggesting products related to advice.

[0619] The system that realizes this invention collects and analyzes animal sounds and grasps the user's emotions in real time, thereby providing optimal advice based on the emotional states of both parties. The system mainly includes a terminal for receiving animal sounds, a server for analysis and recognition, and a terminal for providing information to the user.

[0620] The server uses libraries such as Python's sounddevice to record audio data for analysis and predicts emotions using machine learning models. Specific emotion analysis results are returned as predictions, such as "lonely." Additionally, user emotions are collected by the device via the camera and microphone and recognized by the emotion engine.

[0621] Based on this data, the server utilizes a generative AI model to provide optimal advice to the user. The user's device displays this advice through an interface, making it easy for the user to understand. This helps users build better relationships with animals.

[0622] For example, when the pet robot is deemed lonely, the user's device will receive advice such as, "Let's spend some time enjoying some relaxation items." This provides the user with emotional support and enriches their communication with their pet.

[0623] An example of a prompt message is, "If you record the sounds of a pet robot and perform voice emotion analysis, what emotions will be detected?"

[0624] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0625] Step 1:

[0626] The device records animal sounds through a microphone. The input is audio data, which is then sent to a server. The audio data is first digitized to a format necessary for subsequent analysis.

[0627] Step 2:

[0628] The server uses speech recognition technology to convert the received audio data into text data. It analyzes the waveform information of the audio and outputs it as text data. This text data is then used as input data for emotion prediction.

[0629] Step 3:

[0630] The server inputs the converted text data into a machine learning model to predict the animal's emotions. Based on past training data, this model determines the relationship between voice patterns and emotions and outputs emotion labels such as "lonely" or "happy."

[0631] Step 4:

[0632] The device uses data collected through the user's voice and camera to recognize the user's emotions in real time. A voice analysis engine and facial recognition software classify this data into an emotional state, such as "anxiety" or "joy," which is then sent to the server.

[0633] Step 5:

[0634] The server combines animal emotion data and user emotion data, and uses a generative AI model to generate optimal advice for the user. Based on the input emotion data, it constructs an advice sentence and sends it back to the terminal in text format.

[0635] Step 6:

[0636] The device displays the received advice through a user interface. Users can visually review the advice and obtain specific guidance for optimal communication with animals.
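
Step 4 above combines a voice analysis engine with facial recognition software but does not say how the two results are merged. One possible approach, shown here purely as an assumption, is late fusion: each engine reports a score per emotion label, the scores are averaged with a weight, and the highest-scoring label is sent to the server. The function name fuse_emotion_scores and the voice_weight parameter are hypothetical.

```python
from typing import Dict


def fuse_emotion_scores(voice: Dict[str, float],
                        face: Dict[str, float],
                        voice_weight: float = 0.5) -> str:
    """Return the emotion label with the highest weighted-average score.

    voice and face map emotion labels (e.g. "anxiety", "joy") to scores
    produced by two separate engines. A label missing from one engine
    counts as 0.0 for that engine.
    """
    labels = set(voice) | set(face)
    if not labels:
        raise ValueError("no emotion scores supplied")
    fused = {
        label: voice_weight * voice.get(label, 0.0)
        + (1.0 - voice_weight) * face.get(label, 0.0)
        for label in labels
    }
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(fused), key=fused.get)
```

For example, with voice scores {"anxiety": 0.7, "joy": 0.3} and face scores {"anxiety": 0.4, "joy": 0.6}, equal weighting gives "anxiety" (0.55 against 0.45).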

[0637] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0638] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0639] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0640] [Fourth Embodiment]

[0641] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0642] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0643] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0644] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0645] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0646] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0647] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0648] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0649] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0650] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0651] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0652] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0653] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0654] The system of this invention aims to analyze animal sounds and understand their emotions, and is achieved through the following functions.

[0655] Voice acquisition and transmission by the device

[0656] The user uses a device to record the sounds their pet makes. A dedicated application is installed on the device, and this application is responsible for transmitting the animal's sounds to a server over the internet. The device compresses the audio data and transmits it securely, preventing data loss during transmission.

[0657] Server-based voice analysis and emotion prediction

[0658] The server analyzes the received audio data using speech recognition technology and then runs a machine learning model based on that analysis. The machine learning model has been pre-trained on a large amount of animal vocal data and has the ability to predict animal emotions from the characteristics of the vocals. After the model predicts an emotion, it compares the result with past cases in the database using a matching mechanism to improve accuracy.

[0659] Advice and product suggestions

[0660] Based on the predicted emotions and the comparison with past data, the server derives advice and recommendations, which are then communicated to the user via the device. As a specific example, consider a case where a dog barks repeatedly. The server analyzes the voice and, if it predicts that the dog is feeling anxious, provides advice to the user via the device, such as "pet the dog to reassure it" or "give it a treat." It also suggests specific treats or toys, offering concrete ways to soothe the dog.

[0661] This system helps users understand their animals' emotions and respond appropriately, thereby supporting a better relationship with their pets.

[0662] The following describes the processing flow.

[0663] Step 1:

[0664] The user launches a dedicated application on their device and uses the recording function to record their pet's barking. Once recording is complete, pressing the send button on the application prepares the recorded audio data for transmission to the server.

[0665] Step 2:

[0666] The device temporarily stores the recorded audio data, compresses and encrypts it, and then sends the audio data to the server via the internet. During transmission, it checks network stability and monitors the data until it receives a response indicating successful transmission.

[0667] Step 3:

[0668] The server processes the audio data received from the terminal using a speech recognition library, converting the audio data into text and extracting necessary audio features. This prepares the key features of animal voices for analysis.

[0669] Step 4:

[0670] The server uses a machine learning model to analyze voice feature data and predict animal emotions. The model is based on a large amount of training data and uses a proprietary algorithm to make accurate emotion predictions.

[0671] Step 5:

[0672] The server retrieves the predicted emotion data and compares it with past cases in its internal database. This allows it to verify results based on similar cases and improve the accuracy of emotion determination.

[0673] Step 6:

[0674] The server generates advice for the user based on the matching results and predicted emotions. This advice includes specific recommended actions tailored to the pet's condition. If necessary, a list of suggested pet supplies is also generated.

[0675] Step 7:

[0676] The server sends generated advice and suggestions to the terminal, and the terminal displays the received information to the user within the application. Because visual information and notifications are provided instantly through the user interface, users can respond in a timely manner.
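
The past-case matching in Step 5 of the flow above is not specified in detail. A minimal sketch under stated assumptions: each past case stores a feature vector and a confirmed emotion label, the k nearest cases (Euclidean distance) vote, and the vote share is blended with the model's own probabilities. The names refine_with_past_cases, k, and case_weight are hypothetical, and nearest-neighbour voting is a swapped-in technique chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

PastCase = Tuple[Sequence[float], str]  # (feature vector, confirmed emotion label)


def refine_with_past_cases(features: Sequence[float],
                           model_probs: Dict[str, float],
                           past_cases: Sequence[PastCase],
                           k: int = 3,
                           case_weight: float = 0.5) -> str:
    """Blend model probabilities with votes from the k most similar past cases."""
    nearest = sorted(past_cases,
                     key=lambda case: math.dist(features, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    total = sum(votes.values())

    blended = {}
    for label in set(model_probs) | set(votes):
        vote_share = votes[label] / total if total else 0.0
        blended[label] = ((1.0 - case_weight) * model_probs.get(label, 0.0)
                          + case_weight * vote_share)
    # Sorted iteration keeps the result deterministic when scores tie.
    return max(sorted(blended), key=blended.get)
```

When the database holds no past cases, the function falls back to the model's top-ranked label, so the matching step can only adjust a prediction where comparable cases exist.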

[0677] (Example 1)

[0678] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0679] Understanding an animal's emotions from its vocalizations and providing appropriate responses is essential for building a good relationship with pets. However, technologies for analyzing animal vocalizations and predicting their emotions are still limited, making it difficult to accurately grasp emotions and provide appropriate advice. Furthermore, there is a need for efficient systems to safely and accurately process animal vocal data.

[0680] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0681] In this invention, the server includes an input means for acquiring and digitizing animal sounds, a transmission means for compressing the acquired animal sounds and securely transmitting them to an information processing device, and an analysis means for analyzing the digitized sounds using speech recognition technology and extracting feature quantities. This makes it possible to accurately understand the emotions of animals and provide users with appropriate advice and supplies.

[0682] An "input device" is a device for acquiring and digitizing sounds emitted by animals.

[0683] A "transmission means" is a device that has the function of compressing the acquired animal voice data and securely transmitting it to an information processing device.

[0684] An "analysis means" is a device that has the function of analyzing acquired audio data using speech recognition technology and extracting characteristic features of the audio.

[0685] A "prediction device" is a device that uses extracted audio features to run a machine learning model and predict the emotions of animals.

[0686] A "matching device" is a device that has the function of comparing predicted emotions with past examples in a database to improve the accuracy of predictions.

[0687] A "presentation means" is a device that provides users with guiding information about animal emotions based on the matching results.

[0688] A "recommendation device" is a device that has the function of suggesting items related to guidance information to the user.

[0689] This invention is a system for analyzing animal vocalizations and predicting their emotions. Specific embodiments of this system are described below.

[0690] The device has a recording device and a dedicated application installed to capture animal sounds. This application compresses the recorded sound and sends it to a server over the internet. At this stage, the audio data is encoded using an audio compression format such as MP3 or AAC and transmitted securely.

[0691] The server receives audio data transmitted from the terminal and analyzes it using speech recognition technology. The server extracts phonemes and performs spectral analysis to clarify the features of the speech. Once the features are extracted, the server inputs them into a machine learning model to predict the animal's emotions. This model is pre-trained using a large amount of animal speech data and exhibits high accuracy in emotion prediction.
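
The spectral analysis mentioned above can be illustrated with a small, dependency-free sketch. It computes a magnitude spectrum with a direct discrete Fourier transform and derives two example features: the dominant frequency and the spectral centroid. The choice of these two features is an assumption, the function names are hypothetical, and phoneme extraction is not covered. A direct DFT is quadratic in the frame length, so a practical system would use an FFT routine instead.

```python
import cmath
from typing import List, Sequence, Tuple


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 for one frame of audio samples."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_features(samples: Sequence[float],
                      samplerate: float) -> Tuple[float, float]:
    """Return (dominant frequency in Hz, spectral centroid in Hz)."""
    if not samples:
        raise ValueError("empty frame")
    mags = magnitude_spectrum(samples)
    n = len(samples)
    freqs = [k * samplerate / n for k in range(len(mags))]
    peak = max(range(len(mags)), key=mags.__getitem__)
    total = sum(mags)
    # The centroid is the magnitude-weighted mean frequency of the frame.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0
    return freqs[peak], centroid
```

Feature values of this kind would then form the input vector for the machine learning model.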

[0692] Furthermore, the server compares the predicted emotion with past cases stored in multiple databases. This further improves the accuracy of the prediction and generates more reliable results. The server then generates advice and recommendations based on the comparison results and notifies the user through the terminal. The terminal presents the information to the user visually through the application.

[0693] As a concrete example, consider a case where a user records their dog's voice and sends it to the server. In this case, the server predicts the dog's emotion as "excited" and provides advice to the user, such as "create a relaxing environment" or "use training toys."

[0694] An example of a prompt to be input to the generative AI model is: "Predict the emotion of the dog from the recorded audio: anxiety, excitement, or hunger. Based on the emotion prediction, suggest an appropriate course of action." This prompt allows the server to generate useful information for the user and help them derive appropriate actions.

[0695] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0696] Step 1:

[0697] The user records animal sounds using a device. The input is the sounds the animal makes, and the output is a digitized audio file. Audio acquisition begins by activating the recording function on the device and pressing the record button. This data is saved within a dedicated application.

[0698] Step 2:

[0699] The device compresses the recorded audio data and prepares it for transmission. The input is a digitized audio file, and the output is compressed audio data. A dedicated application within the device compresses the audio file into MP3 or AAC format and prepares it for transmission using a secure transmission protocol. This process reduces the amount of data and improves transmission efficiency.

[0700] Step 3:

[0701] The terminal sends the compressed audio data to the server. The input is the compressed audio data, and the output is a confirmation that the data transfer to the server is complete. The HTTPS protocol is used for transmission so that the data is encrypted, thereby reducing the risk of the data being intercepted or altered in transit.

[0702] Step 4:

[0703] The server analyzes the received audio data. The input is compressed audio data, and the output is audio features. The server uses speech recognition technology to perform phoneme extraction and spectral analysis to extract audio features. These features are then used as input for the next step.

[0704] Step 5:

[0705] The server inputs features into a machine learning model to predict animal emotions. The input is audio features, and the output is predicted data about the animal's emotions. A pre-trained generative AI model is run to predict emotions based on the audio features.

[0706] Step 6:

[0707] The server compares predicted emotions against a historical database. The input is predicted emotion data, and the output is an improved emotion prediction result. By comparing emotion data with examples in the database, it improves accuracy and generates reliable results.

[0708] Step 7:

[0709] The server generates advice for the user based on the improved prediction results. The input is the improved emotion prediction result, and the output is advice data. The advice and recommendations generated by the server are compiled and provided to the user in the next step.

[0710] Step 8:

[0711] The device receives advice from the server and presents it visually to the user. The input is advice data sent from the server, and the output is visual advice displayed on the screen. Through the device's application, it provides users with specific actionable guidelines to support their relationship with their pets.
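
The transmission in Step 3 of the flow above can be sketched with the Python standard library. This is an illustration only: the endpoint URL is a placeholder, the function names build_upload_request and upload are hypothetical, and the audio is assumed to be already compressed to MP3 (hence the audio/mpeg content type). The helper rejects any URL that is not HTTPS so that audio is never sent unencrypted.

```python
import urllib.request
from urllib.parse import urlparse


def build_upload_request(audio: bytes, url: str,
                         content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Prepare an HTTPS POST carrying compressed audio (hypothetical endpoint)."""
    if urlparse(url).scheme != "https":
        raise ValueError("audio must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Content-Type": content_type},
    )


def upload(request: urllib.request.Request, timeout: float = 10.0) -> bool:
    """Send the request and report whether the server confirmed receipt."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 300
```

The boolean returned by upload corresponds to the transfer confirmation that Step 3 names as its output.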

[0712] (Application Example 1)

[0713] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0714] In households with pets, understanding and appropriately addressing the emotions of animals is essential, but there are insufficient effective means to do so easily. Conventional technologies suffer from the problem of being time-consuming and laborious in accurately analyzing pet vocalizations, identifying emotions, and providing advice to users. Furthermore, it is difficult for users to select the appropriate products at the right time. A new system is needed to efficiently solve these challenges.

[0715] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0716] In this invention, the server includes receiving means for acquiring animal sounds, analysis means for analyzing the animal sounds and predicting emotions, and comparison means for comparing the predicted emotions with past case data. This allows users to instantly understand their pet's emotions and use that information to help them respond appropriately and choose the right supplies.

[0717] "Animal sounds" refers to the various types of sounds and noises that animals make, and these sounds may contain emotions or states of mind.

[0718] "Receiving means" refers to devices or functions that effectively acquire animal sounds and appropriately incorporate that data into the system.

[0719] "Analysis means" refers to devices or methods that perform detailed analysis of acquired animal sounds and process the results to predict the animal's emotions.

[0720] A "machine learning model" refers to a computational model that learns from large amounts of data and can make predictions and classifications about unknown data.

[0721] "Comparison methods" refer to the process of comparing predicted emotions with a database of previously accumulated data to verify their accuracy and reliability.

[0722] "Notification means" refers to the means of providing users with necessary information and advice based on the results of analysis and comparison.

[0723] "Suggestion mechanisms" refer to systems that specifically recommend the necessary items or actions that users need when taking action.

[0724] "Display means" refers to devices or functions that visually present information on a user interface, making it easy for users to understand.

[0725] An "artificial intelligence model" refers to a computational model that performs processing that mimics human intelligence, such as recognizing, classifying, and predicting data patterns.

[0726] This system analyzes pet sounds and predicts their emotions. First, the user captures animal sounds using a device such as a smartphone or consumer robot. The device has a built-in dedicated receiving mechanism that can effectively capture animal sounds. The audio data is then transmitted to a server via the internet.

[0727] The server processes the received audio data using analysis tools and predicts the animal's emotions using a machine learning model. Specifically, the server uses an AI platform such as TensorFlow to input the data into a pre-trained machine learning model for recognition and analysis. This allows the server to predict the animal's emotions with high accuracy.

[0728] Subsequently, the server compares the predicted emotions with a large amount of past case data to verify the accuracy of the results. Based on the results obtained from the comparison, a dedicated notification system is activated to inform the user of necessary information and advice. The notifications are displayed on the user's smartphone or the user interface of a consumer robot in a visually easy-to-understand format.

[0729] Furthermore, the server suggests products to the user that are appropriate for the predicted condition of the animal. The suggestion system displays automatically selected recommended products, making it easier for the user to respond to the animal's condition and choose the right products.

[0730] For example, if a dog barks frequently, and analysis predicts the dog is anxious, the user might be advised to "pet the dog to calm it down" or "give it a specific treat." Another example of a prompt to the generative AI model could be, "Analyze the dog's barking and suggest specific countermeasures if the cause is stress."

[0731] Thus, this system is a useful tool for understanding an animal's emotions through its vocalizations and supporting appropriate responses.

[0732] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0733] Step 1:

[0734] Audio acquisition by the device

[0735] The user records animal sounds using a device. The device is equipped with a receiving mechanism that acquires sound using a dedicated microphone, and converts the acquired sound data into a digital format. This data is stored for subsequent processing. The input is analog animal sounds, and the output is digitized sound data.

[0736] Step 2:

[0737] Sending audio data

[0738] The device transmits recorded audio data to the server via the internet. The audio data is compressed while maintaining sound quality and securely transferred to the server. The input is digitized audio data, which is compressed during transmission. The output is a confirmation of the transmission of the compressed audio data.

[0739] Step 3:

[0740] Analysis of audio data

[0741] The server processes the received audio data using an analysis tool. Here, speech recognition and emotion prediction are performed by using TensorFlow, an AI platform, to run a machine learning model. The input is compressed audio data, and the output is the predicted emotion of an animal.

[0742] Step 4:

[0743] Comparison with past cases

[0744] The server compares the predicted animal's emotion with past case data. This comparison method improves the accuracy and reliability of the prediction. The input is the predicted emotion data, and the output is the improved emotion prediction result.

[0745] Step 5:

[0746] Advice notification

[0747] The server provides the user with useful advice based on the comparison results. This information is displayed on the terminal's user interface via a notification system. The input is the improved emotion prediction result, and the output is specific advice for the user.

[0748] Step 6:

[0749] Display of proposals

[0750] The server suggests appropriate items based on predicted emotions. These suggestions are visually presented to the user using a suggestion mechanism. The input is an improved emotion prediction result, and the output is item suggestion information for the user.
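
The advice and item suggestion in Steps 5 and 6 above can be pictured as a lookup keyed by the predicted emotion. The sketch below is only one possible form, since the text also allows a generative AI model to produce the advice. The advice strings are taken from the examples in this description, while the table layout, the item categories, the fallback entry, and the names Suggestion, CATALOG, and suggest are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Suggestion:
    advice: List[str]
    items: List[str] = field(default_factory=list)


# Illustrative table; a real system would draw entries from its database.
CATALOG: Dict[str, Suggestion] = {
    "anxious": Suggestion(
        advice=["Pet the dog to reassure it.", "Give it a treat."],
        items=["treats", "toys"],
    ),
    "excited": Suggestion(
        advice=["Create a relaxing environment.", "Use training toys."],
        items=["training toys"],
    ),
}

FALLBACK = Suggestion(advice=["No specific advice is available for this emotion."])


def suggest(emotion: str) -> Suggestion:
    """Look up advice and item suggestions for a predicted emotion label."""
    return CATALOG.get(emotion.strip().lower(), FALLBACK)
```

Normalising the label before the lookup lets the table tolerate differences in case or spacing in the model's output.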

[0751] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0752] This invention provides a system that offers more accurate advice by performing both animal emotion analysis and user emotion recognition. This system is implemented using the following components:

[0753] Analysis of animal sounds and prediction of emotions

[0754] The user records animal sounds using a device. The device sends the recording data to a server, which uses speech recognition technology to transcribe the speech into text. This information is analyzed by an analytical tool to predict the animal's emotions. A machine learning model supports this process, accurately determining the animal's state.

[0755] User emotion recognition

[0756] The device is equipped with an emotion engine that analyzes the user's emotions in real time. The emotion engine analyzes the user's voice and facial expression data to recognize their current emotional state. The server receives this information and determines the relationship between the user's emotions and the emotions of the animals.

[0757] Customized advice and suggestions

[0758] The server generates advice for the user based on both sets of emotion data it has acquired. This advice is specific and situational: the server can suggest more careful responses if the user's emotion is negative, or provide additional information or product suggestions if it is positive. Furthermore, by comparing with past data, it draws on successful cases in similar situations.
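
The branching on the user's emotion described above can be sketched as a simple rule. The set of emotions treated as negative, the interpretation of a "more careful" response as a single piece of advice with no product suggestions, and the name plan_response are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Assumed set of user emotions treated as negative; not defined in the text.
NEGATIVE_USER_EMOTIONS = {"anxiety", "sadness", "anger"}


def plan_response(animal_emotion: str,
                  user_emotion: str,
                  base_advice: Sequence[str],
                  products: Sequence[str]) -> Dict[str, object]:
    """Adapt advice to the user's emotional state.

    Negative user emotion: keep the response short and careful (first
    piece of advice only, no product suggestions). Otherwise: return all
    advice plus the product suggestions.
    """
    if user_emotion.strip().lower() in NEGATIVE_USER_EMOTIONS:
        advice: List[str] = list(base_advice[:1])
        suggested: List[str] = []
        tone = "careful"
    else:
        advice = list(base_advice)
        suggested = list(products)
        tone = "informative"
    return {
        "animal_emotion": animal_emotion,
        "tone": tone,
        "advice": advice,
        "products": suggested,
    }
```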

[0759] Specific example

[0760] For example, suppose a user records their pet's barking while feeling anxious, and the server analyzes the pet's voice as indicating "loneliness." In this case, the emotion engine provides specific suggestions to alleviate the user's anxiety, such as "spend more time with your pet" or "use pet-specific relaxation products." This reduces the psychological burden on the user when taking action and provides support for building a better relationship.

[0761] This system allows users to not only engage in rich, two-way communication with animals, but also to select the optimal response that takes their own emotional state into consideration.

[0762] The following describes the processing flow.

[0763] Step 1:

[0764] The user launches the application on their device and records animal sounds. Once recording is complete, they press the send button to prepare the audio data for transmission to the server.

[0765] Step 2:

[0766] The device compresses the recorded audio data and encrypts it for secure transmission. It then sends the audio data to the server via the internet. During this process, the device monitors the transmission progress and confirms that the transmission was successful.

[0767] Step 3:

[0768] The server analyzes the audio data received from the terminal. Using speech recognition technology, the data is converted into text, and a machine learning model is applied to predict the animal's emotions. This model performs highly accurate emotion estimation based on past training data.

[0769] Step 4:

[0770] Simultaneously, the device activates an emotion engine to recognize the user's emotions. It collects the user's voice and facial expression data and analyzes their emotional state. This data is transmitted to the server in real time.

[0771] Step 5:

[0772] The server integrates emotional data from both the animal and the user, generating advice that takes into account their emotional states. It also compares this data with past database data, drawing on effective responses in similar situations. This process helps to create the optimal action plan for the user.

[0773] Step 6:

[0774] The server sends the generated advice to the terminal. The advice includes suggested pet supplies and action plans, as needed.

[0775] Step 7:

[0776] The device displays received advice and suggestions in its user interface. The user interface presents the advice visually in an easy-to-understand manner, helping the user take the necessary actions immediately.

[0777] Step 8:

[0778] Users begin taking action based on the advice provided and, if necessary, use the suggested products to care for their pets. Through this process, users can deepen their communication with animals and improve the emotional state of both parties.

[0779] (Example 2)

[0780] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0781] There is a lack of systems that can properly understand animal emotions and provide appropriate advice to users. Furthermore, there is no means to provide specific suggestions for properly caring for animals while considering the user's own emotional state. This makes it difficult for users to comprehensively understand their own and their pet's emotions and take appropriate action.

[0782] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0783] In this invention, the server includes a conversion means that converts animal sounds into text data using speech recognition technology, an analysis means that predicts the animal's emotions based on the converted data, and a recognition means that acquires and analyzes the user's emotional data in real time. This enables a comprehensive understanding of the emotions of both the animal and the user, and allows for the provision of accurate advice to the user.

[0784] "Receiving means" refers to a server or terminal function that directly receives animal sounds as input and processes them.

[0785] "Conversion means" refers to a part of a system that converts received audio information into text data using speech recognition technology.

[0786] "Analysis means" refers to a component that has the function of predicting animal emotions using a machine learning model based on text data.

[0787] "Recognition means" refers to components of a system that acquires and analyzes the user's voice and facial expression data in real time to understand their emotional state.

[0788] The "relevance determination means" is an analytical device that integrates animal and user emotional data and confirms the relationships between them.

[0789] "Comparison means" refers to a functional component that compares past case information with current emotion data to derive useful information.

[0790] A "guidance means" is a display device that includes an interface for providing integrated advice to the user.

[0791] "Supply means" refers to components of a system that proposes relevant tools and products to users based on advice.

[0792] This invention is a system that analyzes animal sounds and user emotions to provide comprehensive advice. The system mainly consists of a server and terminals.

[0793] The user records the sounds of animals, such as pets, using the microphone on their device. The device then sends this audio data to a server via the internet. Typically, a smartphone or tablet is used as the device for this purpose.

[0794] The server uses speech recognition technology to convert the audio data into text. Open-source speech recognition systems are used here. The converted text data is then input into a machine learning model to analyze the animal's emotions. This machine learning model, built using tools such as TensorFlow, is capable of predicting the animal's emotions with high accuracy.

[0795] Meanwhile, user emotions are captured as data through the device's camera and microphone and analyzed by an emotion engine. This emotion analysis utilizes computer vision libraries and speech analysis libraries to determine the user's emotional state in real time from the user's voice and facial expression data. This information is also sent to a server and integrated with the animal's emotion data.

[0796] The server analyzes this data using the relevance determination means, compares it with past case information, and generates integrated advice. A generative AI model is used to generate this advice; for example, an OpenAI model may be used. When a prompt such as "The animal is feeling anxious, and the user is also anxious" is input into this AI model, it generates specific advice tailored to the situation.
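The comparison with past case information described above is not specified in detail. The following is a minimal sketch of one possible approach, assuming past cases are stored as records of emotion labels and advice and are ranked by how many labels they share with the current state. The `PastCase` type, the `PAST_CASES` records, and the `match_cases` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastCase:
    """One stored case: the two emotion labels and the advice that was given."""
    animal_emotion: str
    user_emotion: str
    advice: str


# Hypothetical case records, invented for illustration only.
PAST_CASES = [
    PastCase("anxious", "anxious", "Spend quiet time together in a familiar room."),
    PastCase("lonely", "calm", "Schedule an extra play session today."),
    PastCase("happy", "joy", "Keep up the current routine."),
]


def match_cases(animal_emotion, user_emotion, cases=PAST_CASES):
    """Return past cases that share at least one emotion label, best match first."""
    def score(case):
        # Booleans add as 0 or 1, so the score is the number of matching labels.
        return (case.animal_emotion == animal_emotion) + (case.user_emotion == user_emotion)

    ranked = sorted(cases, key=score, reverse=True)
    return [case for case in ranked if score(case) > 0]
```

A real system would likely use a database and a richer similarity measure; exact label matching is used here only to keep the sketch short.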

[0797] The generated advice is presented to the user through the device. Based on this advice, the user can improve their relationship with their pet and take appropriate action considering their own emotional state. For example, specific suggestions such as "spend more time with your pet" or "use relaxation products" can help reduce the user's anxiety.

[0798] In this way, this system comprehensively understands the emotions of both the user and the animal, supporting better communication for both parties.

[0799] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0800] Step 1:

[0801] The user records animal sounds using the device's microphone. This recording becomes the input. Specifically, the user presses the record button and records for a set period of time. Once the recording is finished, it is saved to the device as audio data.

[0802] Step 2:

[0803] The terminal sends recorded audio data to the server. An internet connection is used for transmission. The input is the pre-recorded audio data, and the output is a network message sent to the server. Specifically, data conversion and encryption are performed, and the data is sent to the server using the HTTP protocol.
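As a rough illustration of Step 2, the sketch below builds (but does not send) an HTTP POST request carrying the recorded audio. It assumes that "data conversion" means Base64 encoding into a JSON body and that encryption is provided by TLS through an `https` URL; the source specifies neither. The endpoint URL, the field names, and `build_upload_request` are hypothetical.

```python
import base64
import json
import urllib.request

SERVER_URL = "https://example.com/api/animal-audio"  # hypothetical endpoint


def build_upload_request(audio_bytes, sample_rate_hz, url=SERVER_URL):
    """Wrap raw audio bytes in a JSON body and return an unsent POST request."""
    payload = {
        "sample_rate_hz": sample_rate_hz,
        # Base64 makes the binary audio safe to embed in JSON text.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual transmission.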

[0804] Step 3:

[0805] The server converts the received audio data into text using speech recognition technology. The input here is the audio data received by the server. The output is the converted text data. Specifically, it uses a speech recognition library to analyze the audio waveform and convert it into text in the corresponding language.

[0806] Step 4:

[0807] The server predicts the animal's emotions based on the converted text data. The input is text data obtained from speech, and the output is predicted animal emotion information. Specifically, a machine learning model analyzes the text and generates confidence scores for a set of emotion categories.
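The confidence scores in Step 4 could be produced in many ways. One common choice, shown here as an assumption rather than as the method of this disclosure, is to apply a softmax to the raw outputs of a classifier. The category list `EMOTIONS` and both function names are hypothetical.

```python
import math

EMOTIONS = ["happy", "lonely", "anxious", "calm"]  # hypothetical category set


def confidence_scores(logits):
    """Convert raw model outputs (one per category) into probabilities via softmax."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one logit per emotion category")
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(EMOTIONS, exps)}


def top_emotion(scores):
    """Return the label with the highest confidence score."""
    return max(scores, key=scores.get)
```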

[0808] Step 5:

[0809] The device analyzes the user's voice and facial expression data in real time using an emotion engine to recognize the user's emotions. The input is the user's video and audio data acquired from the camera and microphone, and the output is the user's emotion information. Specifically, image processing and audio analysis are performed to extract emotional features.

[0810] Step 6:

[0811] The server integrates animal emotion data and user emotion data. The input is animal emotion information and user emotion information, and the output is integrated data showing their relationships. Specifically, a relationship analysis algorithm is applied to calculate the strength of the relationship, taking into account the emotional states of both entities.
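The relationship analysis algorithm in Step 6 is not named in the source. As one hedged possibility, the strength of the relationship could be measured as the cosine similarity between the two sets of emotion scores. The functions `relationship_strength` and `integrate`, and the layout of the integrated record, are assumptions made for this sketch.

```python
import math


def relationship_strength(animal_scores, user_scores):
    """Cosine similarity between two emotion-score dicts over the union of their labels."""
    labels = sorted(set(animal_scores) | set(user_scores))
    a = [animal_scores.get(label, 0.0) for label in labels]
    u = [user_scores.get(label, 0.0) for label in labels]
    dot = sum(x * y for x, y in zip(a, u))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in u))
    # An all-zero vector has no direction, so report no relationship.
    return dot / norm if norm else 0.0


def integrate(animal_scores, user_scores):
    """Bundle both emotion states with their relationship strength."""
    return {
        "animal": animal_scores,
        "user": user_scores,
        "relationship_strength": relationship_strength(animal_scores, user_scores),
    }
```

With non-negative scores, a value near 1 means the animal and the user show similar emotional profiles, and a value near 0 means they share little.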

[0812] Step 7:

[0813] The server generates advice for the user based on integrated data using a generative AI model. The input is integrated data, and the output is the generated advice. Specifically, it provides the AI model with prompts to generate candidate advice, and then selects the best one.
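Step 7 can be sketched as follows, assuming the generative AI model is reachable through a caller-supplied function and that "best" is decided by a caller-supplied scoring function. Both callables, the prompt wording after the first sentence, and all function names are hypothetical; no specific vendor API is assumed.

```python
def build_prompt(animal_emotion, user_emotion):
    """Compose a prompt in the style of the example given in the description."""
    return (
        f"The animal is feeling {animal_emotion}, and the user is feeling {user_emotion}. "
        "Suggest one specific action the user can take."
    )


def select_best(candidates, score):
    """Pick the candidate with the highest score."""
    if not candidates:
        raise ValueError("no candidate advice was generated")
    return max(candidates, key=score)


def generate_advice(animal_emotion, user_emotion, generate, score, n_candidates=3):
    """Ask the model for several candidates and return the best-scoring one.

    `generate` maps a prompt string to one advice string (for example, a thin
    wrapper around a generative AI service); `score` maps advice to a number.
    """
    prompt = build_prompt(animal_emotion, user_emotion)
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return select_best(candidates, score)
```

Keeping the model call behind a plain function makes the selection logic testable with a stub.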

[0814] Step 8:

[0815] The terminal receives advice generated from the server and presents it to the user. The input is the advice received from the server, and the output is what is displayed to the user. Specifically, the advice is displayed in the user interface, and the user can take action based on it.

[0816] (Application Example 2)

[0817] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0818] While systems exist that detect animal emotions and provide relevant advice, they have historically offered one-sided suggestions without simultaneously considering the user's emotions, thus failing to provide appropriate support tailored to the user's psychological state. Therefore, there is a need for technology that more accurately supports two-way communication between animals and users.

[0819] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0820] In this invention, the server includes a receiving means for receiving animal sounds as input, an analysis means for analyzing animal sounds and predicting emotions, and a recognition means for recognizing the user's emotions in real time. This makes it possible to simultaneously analyze the emotions of both the animal and the user and provide specific and situational advice tailored to the state of both.

[0821] "Receiving means" refers to devices or methods for receiving animal sounds as input.

[0822] "Analysis means" refers to devices or methods used to analyze animal sounds and predict their emotions.

[0823] "Recognition means" refers to devices or methods for recognizing a user's emotions in real time.

[0824] A "comparison tool" is a device or method for matching a set of past case data with predicted emotions.

[0825] "Presentation means" refers to devices or methods for providing advice to a user.

[0826] "Selection means" refers to devices or methods for suggesting products related to advice.

[0827] The system that realizes this invention collects and analyzes animal sounds and grasps the user's emotions in real time, thereby providing optimal advice based on the emotional states of both parties. The system mainly includes a terminal for receiving animal sounds, a server for analysis and recognition, and a terminal for providing information to the user.

[0828] The server uses libraries such as Python's sounddevice to record audio data for analysis and predicts emotions using machine learning models. Specific emotion analysis results are returned as predictions, such as "lonely." Additionally, user emotions are collected by the device via the camera and microphone and recognized by the emotion engine.

[0829] Based on this data, the server utilizes a generative AI model to provide optimal advice to the user. The user's device displays this advice through an interface, making it easy for the user to understand. This helps users build better relationships with animals.

[0830] For example, when the pet robot is deemed lonely, the user's device will receive advice such as, "Let's spend some time enjoying some relaxation items." This provides the user with emotional support and enriches their communication with their pet.

[0831] An example of a prompt message is, "If you record the sounds of a pet robot and perform voice emotion analysis, what emotions will be detected?"

[0832] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0833] Step 1:

[0834] The device records animal sounds through a microphone. The input is audio data, which is then sent to a server. The audio data is first digitized to a format necessary for subsequent analysis.

[0835] Step 2:

[0836] The server uses speech recognition technology to convert the received audio data into text data. It analyzes the waveform information of the audio and outputs it as text data. This text data is then used as input data for emotion prediction.

[0837] Step 3:

[0838] The server inputs the converted text data into a machine learning model to predict the animal's emotions. Based on past training data, this model determines the relationship between voice patterns and emotions and outputs emotion labels such as "lonely" or "happy."

[0839] Step 4:

[0840] The device recognizes the user's emotions in real time from data collected through the microphone and camera. A voice analysis engine and facial recognition software classify this data into an emotional state, such as "anxiety" or "joy," which is then sent to the server.

[0841] Step 5:

[0842] The server combines animal emotion data and user emotion data, and uses a generative AI model to generate optimal advice for the user. Based on the input emotion data, it constructs an advice sentence and sends it back to the terminal in text format.

[0843] Step 6:

[0844] The device displays the received advice through a user interface. Users can visually review the advice and obtain specific guidance for optimal communication with animals.

[0845] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0846] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0847] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0848] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the specific mapping may be an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0849] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0850] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0851] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion lies to the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).

[0852] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0853] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0854] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
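One way to obtain training targets in which emotions located close together have similar values is to let the target value decay with distance on the map. The sketch below assumes polar coordinates and a Gaussian decay; the coordinates are hypothetical and are not taken from Figure 9 or Figure 10, and `EMOTION_MAP`, `map_distance`, and `soft_targets` are names introduced only for illustration.

```python
import math

# Hypothetical positions as (radius, angle in degrees); not read from the figures.
EMOTION_MAP = {
    "calm": (1.0, 0.0),
    "reassured": (1.0, 10.0),
    "confident": (1.2, 20.0),
    "desire": (1.0, 170.0),
}


def to_xy(radius, angle_deg):
    """Convert a polar map position to Cartesian coordinates."""
    theta = math.radians(angle_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


def map_distance(a, b):
    """Euclidean distance between two emotions on the map."""
    return math.dist(to_xy(*EMOTION_MAP[a]), to_xy(*EMOTION_MAP[b]))


def soft_targets(label, width=0.5):
    """Target emotion values that fall off smoothly with map distance from `label`."""
    return {
        name: math.exp(-((map_distance(label, name) / width) ** 2))
        for name in EMOTION_MAP
    }
```

Training a network against targets of this kind would tend to give neighboring emotions such as "reassured," "calm," and "confident" similar output values, which is the property described for Figure 10.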

[0855] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0856] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0857] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0858] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0859] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0860] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0861] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0862] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0863] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0864] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0865] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0866] The following is further disclosed regarding the embodiments described above.

[0867] (Claim 1)

[0868] An input means that accepts animal sounds as input,

[0869] An analytical means for analyzing the animal's voice and predicting its emotions,

[0870] A matching means for comparing a database of past cases with the predicted emotions,

[0871] A presentation means that provides advice to the user based on the aforementioned matching results,

[0872] A proposal means for suggesting products related to the aforementioned advice,

[0873] A system that includes this.

[0874] (Claim 2)

[0875] The system according to claim 1, wherein the analysis means converts animal sounds into text data using speech recognition technology.

[0876] (Claim 3)

[0877] The system according to claim 1, wherein the presentation means displays the advice through a user interface.

[0878] "Example 1"

[0879] (Claim 1)

[0880] An input means for acquiring and digitizing animal sounds,

[0881] A transmission means for compressing acquired animal sounds and securely transmitting them to an information processing device,

[0882] An analysis means that analyzes the aforementioned digitized sound using speech recognition technology and extracts feature quantities,

[0883] A prediction means that inputs the aforementioned features into a machine learning model to predict the emotions of animals,

[0884] A matching means that compares the predicted emotions with multiple past case databases to improve accuracy,

[0885] A presentation means that provides the user with information to guide them regarding the emotions of animals based on the aforementioned matching results,

[0886] A recommendation means for recommending items related to the aforementioned guidance information,

[0887] A system that includes this.

[0888] (Claim 2)

[0889] The system according to claim 1, wherein the analysis means converts animal sounds into data format using speech recognition technology.

[0890] (Claim 3)

[0891] The system according to claim 1, wherein the presentation means visually presents the guidance information through a user interface.

[0892] "Application Example 1"

[0893] (Claim 1)

[0894] A receiving means for acquiring animal sounds,

[0895] An analysis means for analyzing the sounds of the aforementioned animals and predicting their emotions,

[0896] A computation means for processing the audio data using a machine learning model that has learned a large amount of audio data,

[0897] A comparison means for comparing past case data with the predicted emotions,

[0898] A notification means for providing advice based on the aforementioned comparison results,

[0899] A suggestion means for presenting supplies related to the aforementioned advice,

[0900] A display means for displaying the advice using the user interface of a smart device,

[0901] A system that includes this.

[0902] (Claim 2)

[0903] The system according to claim 1, wherein the analysis means converts animal sounds into data format using speech recognition technology.

[0904] (Claim 3)

[0905] The system according to claim 1, which uses an artificial intelligence model to analyze speech in a consumer robot or mobile device.

[0906] "Example 2 of combining an emotion engine"

[0907] (Claim 1)

[0908] A receiving means that accepts animal sounds as input,

[0909] A conversion means for converting the animal's voice into text data using speech recognition technology,

[0910] An analysis means for predicting animal emotions based on the converted data,

[0911] A recognition means that acquires and analyzes user emotion data in real time,

[0912] A related determination means that integrates the acquired animal and user emotional data and determines the relationship between them,

[0913] A matching means for comparing past case information with the aforementioned integrated data and performing comparative analysis,

[0914] A guidance means that provides integrated advice to the user based on the aforementioned matching results,

[0915] A supply means that proposes tools related to the aforementioned advice,

[0916] A system that includes this.

[0917] (Claim 2)

[0918] The system according to claim 1, wherein the analysis means uses a machine learning model to predict the emotions of animals with high accuracy.

[0919] (Claim 3)

[0920] The system according to claim 1, wherein the guidance means displays the advice via a user interface and facilitates a response according to the user's selection.

[0921] "Application example 2 when combining with an emotional engine"

[0922] (Claim 1)

[0923] A receiving means that accepts animal sounds as input,

[0924] An analytical means for analyzing the sounds of the aforementioned animals and predicting their emotions,

[0925] A recognition means that recognizes the user's emotions in real time,

[0926] A comparison means for matching a set of past case data with the predicted emotions,

[0927] A presentation means that provides advice based on the aforementioned matching results and the user's emotion information,

[0928] A selection means for proposing supplies related to the aforementioned advice,

[0929] A system that includes this.

[0930] (Claim 2)

[0931] The system according to claim 1, wherein the analysis means converts animal sounds into text data using acoustic information recognition technology.

[0932] (Claim 3)

[0933] The system according to claim 1, wherein the presentation means displays the advice through a user interface.

[Explanation of Symbols]

[0934]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: an input means that accepts animal sounds as input; an analysis means that analyzes the animal sounds and predicts the animal's emotions; a matching means that compares a database of past cases with the predicted emotions; a presentation means that provides advice to the user based on the matching results; and a proposal means that suggests products related to the advice.

2. The system according to claim 1, wherein the analysis means converts animal sounds into text data using speech recognition technology.

3. The system according to claim 1, wherein the presentation means displays the advice through a user interface.