system
The voice-based fraud detection system addresses the inadequacies of static fraud detection by integrating speech recognition, machine learning, and emotional analysis to provide real-time, comprehensive fraud alerts, ensuring elderly individuals are protected from evolving scams.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing fraud detection systems are inadequate in accurately and timely identifying evolving fraud methods targeting the elderly, particularly through voice-based scams, as they rely on static analysis and fail to consider emotional nuances and real-time adaptability.
A voice-based fraud detection system that utilizes a high-performance microphone to capture ambient sound, converts it into text data using speech recognition, analyzes it for fraud patterns with machine learning algorithms, and sends alerts to registered contacts, while incorporating an emotion engine to assess emotional states and continuously update fraud databases.
The system provides real-time, accurate fraud detection by analyzing both voice content and emotional cues, reducing the risk of elderly individuals falling victim to scams and ensuring the database remains updated with the latest fraud patterns.
Description
【Technical Field】
【0001】 The technology of the present disclosure relates to a system.
【Background Art】
【0002】 Patent Document 1 discloses a method for controlling a persona chatbot, performed by at least one processor, that includes the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
【Prior Art Documents】
【Patent Documents】
【0003】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2022-180282
【Summary of the Invention】
【Problems to be Solved by the Invention】
【0004】 Fraud methods targeting the elderly are diversifying, and as a result a large number of people are suffering from fraud. To improve this situation, there is a need for means to detect signs of fraud at an early stage with unprecedented accuracy. In particular, since fraud methods are constantly changing, a flexible, real-time detection system that can cope with them is required.
【Means for Solving the Problems】
【0005】 This invention solves the above problems by providing a system that captures, recognizes, and analyzes voice to detect signs of fraud. Voice acquired by an input device is converted into text data by a speech recognition processing device, and an analysis device analyzes the text data for conversations that may contain signs of fraud. The analysis device determines the likelihood of fraud by comparing the text data with past fraud pattern data and, if necessary, sends an alert to registered contacts via an output device. Furthermore, detection accuracy is improved by machine learning algorithms, enabling the system to keep responding to new fraud patterns.
【0006】 An "input device" is a device used to capture sound, and its role is to acquire ambient sounds as digital data.
【0007】 A "speech recognition processing device" is a computer program or hardware used to convert captured audio into text format.
【0008】 "Text data" refers to string data generated by a speech recognition processing device, representing speech information as textual information.
【0009】 "Fraud indicators" refer to specific words or phrases that are judged to be potentially fraudulent based on previously reported fraud methods and patterns.
【0010】 An "analysis device" is a computer system or algorithm used to analyze text data and detect signs of fraud.
【0011】 A "database" is a digital storage medium or information system that stores information about past fraud patterns for the analysis device to refer to.
【0012】 An "alert" is a warning message sent to a user or registered contact when signs of fraud are detected.
【0013】 An "output device" is hardware or software that notifies the user of detected alert information, and may perform actions such as audio output or display.
【0014】 A "machine learning algorithm" is an automated learning system used to improve the accuracy of fraud detection, and it performs pattern recognition based on past data.
【Brief Description of the Drawings】
【0015】
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when combined with an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.
【Mode for Carrying Out the Invention】
【0016】 Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
【0017】 First, the terms used in the following description will be explained.
【0018】 In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of a plurality of arithmetic units. The processor may also be one type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
【0019】 In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
【0020】 In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
【0021】 In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
【0022】 In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
【0023】 [First Embodiment]
【0024】 Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
【0025】 As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
【0026】 The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
【0027】 The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
【0028】 The reception device 38 is equipped with a touch panel 38A, a microphone 38B, and the like, and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
【0029】 The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
【0030】 The communication interface 44 is connected to the network 54. The communication interfaces 44 and 26 are responsible for the exchange of various types of information between the processor 46 and the processor 28 via the network 54.
【0031】 Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
【0032】 As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
【0033】 The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
【0034】 In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
【0035】 Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
【0036】 This invention is a voice-based fraud detection system, particularly aimed at responding quickly to fraud targeting the elderly. The embodiments of this invention will be described below with specific examples.
【0037】 First, the user engages in everyday conversation using a dedicated terminal. The terminal is equipped with a high-performance microphone that can continuously capture ambient sound. Once sound is captured, the terminal's speech recognition engine converts the speech into text in real time.
【0038】 This converted text data is sent to a server. The server has a processing unit that analyzes the received text data and, by referring to a database that stores past fraud patterns, detects specific phrases and keywords that indicate signs of fraud.
【0039】 If signs of fraud are detected, the server generates an alert and sends the alert information to registered contacts, such as the user's family or the police. The server also sends a warning signal to the terminal to inform the user of the danger.
【0040】 For example, if a user receives a phone call asking for bank account information, the terminal captures the audio and converts it into text through speech recognition. The server then compares the text against a fraud pattern database to detect potentially fraudulent keywords such as "bank account" and "information request." The server immediately generates an alert and sends notifications to the user's mobile phone, email, and registered emergency contacts.
【0041】 This system also utilizes machine learning algorithms, allowing it to accumulate more data over time and improve the accuracy of fraud pattern detection. When new fraud methods emerge, their unique patterns are learned, and the database is automatically updated.
【0042】 In this way, the system of the present invention can reduce the risk of elderly people becoming victims of fraud and provide an environment in which they can communicate by voice with peace of mind.
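The keyword comparison described in paragraphs 【0038】 to 【0040】 can be illustrated with a minimal Python sketch. The pattern names, keyword lists, and the `detect_fraud_signs` function are assumptions made for illustration; the disclosure does not specify how the fraud pattern database is structured or queried, and plain substring matching is only the simplest possible stand-in.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the fraud pattern database:
# pattern name -> keywords or phrases associated with that pattern.
FRAUD_PATTERNS = {
    "account_information_request": ["bank account", "account number"],
    "urgent_transfer": ["transfer money", "right away"],
}


@dataclass
class Detection:
    pattern: str            # name of the matched fraud pattern
    matched_keywords: list  # keywords from that pattern found in the text


def detect_fraud_signs(text, patterns=FRAUD_PATTERNS):
    """Return one Detection per pattern that has at least one keyword in the text."""
    lowered = text.lower()
    detections = []
    for name, keywords in patterns.items():
        hits = [kw for kw in keywords if kw in lowered]
        if hits:
            detections.append(Detection(name, hits))
    return detections


result = detect_fraud_signs("Please tell me your bank account number now.")
for detection in result:
    print(detection.pattern, detection.matched_keywords)
```

A production system would likely need tokenization and fuzzy matching, since substring checks miss paraphrases and can fire on unrelated words.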
【0043】 The following describes the processing flow.
【0044】 Step 1:
【0045】 The terminal continuously captures ambient sound using its built-in microphone. The sound is temporarily recorded as digital data.
【0046】 Step 2:
【0047】 The terminal converts captured audio data into text data using a speech recognition processing unit. This conversion process is performed in real time, taking into account audio interruptions and noise.
【0048】 Step 3:
【0049】 The terminal sends the converted text data to the server. The data is transmitted using a secure protocol and processed in a privacy-protected manner.
【0050】 Step 4:
【0051】 The server processes the received text data using an analysis device and compares it against a database of previously accumulated fraud patterns. At this stage, it checks whether specific keywords or phrases are included.
【0052】 Step 5:
【0053】 The server evaluates whether signs of fraud have been detected. If fraud is highly likely, it generates an alert and sets its details.
【0054】 Step 6:
【0055】 The server sends the generated alert information to registered contacts. These contacts include family members and police officers who can respond to emergencies.
【0056】 Step 7:
【0057】 The terminal immediately alerts the user. It communicates the warning through audio and visually on the display, prompting the user to interrupt or reconsider the conversation.
【0058】 Step 8:
【0059】 The server continuously monitors subsequent conversation data and incorporates new fraud patterns into its learning database. Machine learning algorithms improve future detection accuracy.
【0060】 (Example 1)
【0061】 Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
【0062】 Fraudulent schemes targeting the elderly are becoming more sophisticated, leaving elderly people vulnerable to becoming victims in their daily lives. Therefore, there is a need for technologies that can respond quickly and effectively when elderly people face the risk of fraud. Furthermore, because fraudulent methods are constantly evolving, static analysis based on past data is insufficient.
【0063】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
【0064】 In this invention, the server includes acquisition means for acquiring voice information, conversion means for converting the voice information into text data, and discrimination means for comparing the text data with existing fraud pattern information to determine the possibility of fraud. This enables real-time detection of the risk of voice fraud that may occur in the daily lives of elderly people, prompt warnings, and appropriate countermeasures.
【0065】 "Acquisition means for acquiring voice information" refers to a device that senses ambient sounds and converts them into electrical signals, and has the function of capturing audio data in real time.
【0066】 "Conversion means for converting voice information into text data" refers to a technology that analyzes acquired audio data and converts it into corresponding text, and is implemented by a speech recognition engine.
【0067】 "Discrimination means for comparing the text data with existing fraud pattern information to determine the possibility of fraud" refers to a process that compares converted text data with fraud pattern information in a database and automatically detects signs of fraud.
【0068】 "Notification means" refers to a means of communicating detected fraud alerts to the user and designated contacts in the form of voice messages or text messages.
【0069】 "Analysis means" refers to techniques that use natural language processing technology to analyze text data in depth and extract signs of fraud, and involves the use of large language model algorithms.
【0070】 "Means for analysis based on learning algorithms" refers to the process of using machine learning technology to learn from analysis data and improve the accuracy of recognizing fraud patterns.
【0071】 "Means for automatically updating information" refers to technology that quickly reflects newly detected fraud patterns in the database, ensuring that decisions are always based on the latest information.
【0072】 This invention describes a specific embodiment for implementing a voice-based fraud detection system targeting the elderly.
【0073】 The user carries a dedicated terminal to run this system. This terminal is equipped with a high-performance microphone and can capture ambient sound in high quality using an audio processing chip such as one made by Realtek. The terminal converts the acquired audio information into text data in real time using a service such as the Google® Speech-to-Text API. Once this process is complete, the text data is sent to the server via a secure communication protocol (e.g., HTTPS).
【0074】 When the server receives text data, it performs a detailed analysis using natural language processing techniques. This involves using large language models such as BERT and GPT to identify specific phrases and keywords that may indicate fraud. The server then compares the identified phrases and keywords with a database of past fraud patterns to determine the likelihood of fraud.
【0075】 If the analysis determines that there is a risk of fraud, the server quickly generates an alert. This alert is sent via email or SMS to the user or pre-registered emergency contacts (e.g., family or security agencies). A warning signal is also sent to the terminal, prompting the user to be vigilant through audio and screen notifications.
【0076】 This system also features a function that continuously learns fraud patterns based on machine learning algorithms. It therefore maintains the ability to detect new fraud methods as they emerge, and the database is automatically updated.
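As an illustration of the email alert mentioned in paragraph 【0075】, the following Python sketch builds, but does not send, an alert message using the standard library's `email.message.EmailMessage`. The function name `build_alert_email`, the addresses, and the wording of the message are hypothetical; the disclosure does not describe the alert's format.

```python
from email.message import EmailMessage


def build_alert_email(sender, contacts, matched_keywords):
    """Build (but do not send) a fraud alert email for the registered contacts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(contacts)
    msg["Subject"] = "Fraud alert: suspicious conversation detected"
    msg.set_content(
        "Possible signs of fraud were detected in a recent conversation.\n"
        "Matched keywords: " + ", ".join(matched_keywords) + "\n"
    )
    return msg


# Placeholder addresses and keywords, for illustration only.
alert = build_alert_email(
    "alerts@example.com",
    ["family@example.com", "support@example.com"],
    ["bank account", "information provision"],
)
print(alert)
```

Actual delivery could then go through `smtplib.SMTP.send_message`, with the mail server and credentials supplied by the deployment; an SMS alert would need a separate gateway that is not sketched here.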
【0077】 For example, if a user receives a call asking for their bank account information, the terminal immediately converts this audio into text. After analysis by the server, the keywords "bank account" and "information provision" are recognized as potentially fraudulent. The server then generates an alert and sends a warning to the user and their emergency contacts.
【0078】 An example prompt would instruct the system to detect signs of fraud from the user's voice inquiries and issue an alert. This would enable elderly people to communicate more safely in their daily lives.
【0079】 The flow of the specific processing in Example 1 will be explained using Figure 11.
【0080】 Step 1:
【0081】 The terminal captures ambient sound through a high-performance microphone. The input is ambient sound, and the output is digital audio data. Specifically, the terminal continuously collects sound and applies noise cancellation to improve the quality of the audio data.
【0082】 Step 2:
【0083】 The terminal converts captured audio data into text data using speech recognition technology such as the Google Speech-to-Text API. The input is digital audio data, and the output is recognized text data. Specifically, the terminal segments the audio data into chunks of a fixed size and passes them sequentially to the speech recognition engine.
【0084】 Step 3:
【0085】 The terminal sends the converted text data to the server via a secure communication protocol (e.g., HTTPS). The input is text data, and the output is a confirmation that the data was sent to the server successfully. Specifically, the terminal encodes the text data in packet format and initiates network communication.
【0086】 Step 4:
【0087】 The server analyzes the received text data by using natural language processing techniques to break it down into phrases and search for specific keywords that may indicate fraud. The input is continuous text data, and the output is a data structure showing the occurrence of keywords. Specifically, the server calls language models such as BERT or GPT to analyze the data.
【0088】 Step 5:
【0089】 The server compares the analysis results with a fraud pattern database to determine whether signs of fraud are present. The input is data on the occurrence of keywords, and the output is risk assessment data that quantifies the likelihood of fraud. Specifically, the server executes database queries and calculates the degree of match with the corresponding fraud pattern.
【0090】 Step 6:
【0091】 If the server determines that there is a high risk of fraud, it generates an alert and sends an alert notification to the user and registered contacts. The input is risk assessment data, and the output is a notification via email or SMS. Specifically, the server sends emails using SMTP and sends text messages via an SMS gateway.
【0092】 Step 7:
【0093】 The server updates its machine learning model based on the collected data, continuously improving the fraud detection algorithm. The input is a new dataset, and the output is the updated trained model. Specifically, the server periodically runs a batch training process to improve the model's accuracy.
【0094】 (Application Example 1)
【0095】 Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
【0096】 Fraud targeting the elderly remains a serious social problem, and many of these frauds involve sophisticated voice scams. Existing prevention measures struggle to quickly and accurately detect fraud and prevent victimization. The risk is particularly high for elderly people living alone. Therefore, there is a need for a system that can detect signs of fraudulent activity from everyday voice communications and issue prompt warnings.
【0097】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
【0098】 In this invention, the server includes an acoustic input means for acquiring an acoustic signal, a conversion means for converting the acoustic signal into text information, and a matching means for detecting signs of fraudulent activity by comparing the text information with past fraudulent activity pattern data. This makes it possible to quickly detect signs of fraud from everyday conversations, promptly notify registered communication recipients of warnings, and directly alert the user through dialogue means.
【0099】 "Acoustic input means" refers to devices or equipment for continuously acquiring ambient sounds and conversational sounds.
【0100】 "Conversion means" refers to a process or device that converts acquired acoustic signals into text information using an appropriate algorithm.
【0101】 "Matching means" refers to a process or system for detecting signs of fraud by comparing converted text information with existing fraud pattern data.
【0102】 "Notification means" refers to a process or function for sending a warning to a registered recipient regarding the potential for detected fraudulent activity.
【0103】 "Dialogue means" refers to a device or function that allows a system to communicate directly with the user and provide warnings.
【0104】 "Machine learning techniques" are algorithms and technologies used to automatically learn patterns and knowledge from data.
【0105】 An "information aggregation device" is a database or system for storing collected data and detected patterns, and updating them as needed.
【0106】 This invention provides a specific embodiment for implementing a system for detecting voice fraud targeting the elderly. The system comprises an acoustic input means, a conversion means using advanced speech recognition, a matching means for detecting signs of fraudulent activity, a notification means, and a dialogue means.
【0107】 The server uses a high-performance microphone as the acoustic input means to continuously capture everyday conversations and ambient sounds. The captured acoustic signals are converted into text in real time by the conversion means, using a cloud-based speech recognition service such as Google Cloud Speech-to-Text. The converted text is sent to the server and compared with past fraud pattern data by the matching means, which employs machine learning techniques. AI models built with Python and scikit-learn are used to quickly and accurately analyze signs of fraud.
【0108】 If signs of fraud are detected, the server generates a warning through the notification means and sends the warning to registered communication recipients. Simultaneously, a robot installed in the home uses the dialogue means to directly warn the user of the risk of fraud. This reduces the risk of elderly people becoming victims of fraud and provides a safe communication environment.
【0109】 As a concrete example, consider a scenario where a user receives a suspicious phone call and is asked for banking information. This audio is immediately captured and compared against a fraud pattern database to detect keywords such as "bank" and "account information." The server immediately generates an alert and notifies the user's family and the police. Furthermore, the robot directly informs the user that "providing this information is dangerous."
【0110】 An example of a prompt for the generative AI model is provided in the format: "The following text is from a phone conversation. Please analyze if it may be a scam: 'Do you need my account information?'"
【0111】 The flow of the specific processing in Application Example 1 will be explained using Figure 12.
【0112】 Step 1:
【0113】 The terminal uses a high-performance microphone to capture ambient sound. The input is ambient noise and conversation, and the output is the captured acoustic signal. Pre-processing, such as compression and noise reduction, is performed to capture the audio in real time.
【0114】 Step 2:
【0115】 The terminal uses a speech recognition service such as Google Cloud Speech-to-Text to convert the acoustic signal into text information. The input is the acoustic signal obtained in step 1, and the output is the converted text information. A speech recognition algorithm is applied to convert the acoustic signal into text data.
【0116】 Step 3:
【0117】 The server compares the text information with past fraudulent activity pattern data. The input is the text information obtained in step 2, and the output is information on signs of fraudulent activity. Known fraudulent patterns are referenced from the database and compared with the text information.
【0118】 Step 4:
【0119】 The server uses a machine learning model to analyze signs of fraudulent activity. The input is the information on signs of fraudulent activity obtained in step 3, and the output is decision information for generating alerts. The computation uses models built with Python and scikit-learn.
【0120】 Step 5:
【0121】 The server generates an alert and sends a warning to registered contacts via the notification means. The input is the decision information obtained in step 4, and the output is the warning notification sent to the contacts. The warning information is transmitted promptly using a communication means.
【0122】 Step 6:
【0123】 The user's home robot uses the dialogue means to directly warn the user of the risk of fraud. The input is the warning information generated in step 5, and the output is an audio warning to the user. A pre-configured message is played using speech synthesis technology.
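Steps 3 to 6 above can be pictured as a small chain of functions, each taking the input and returning the output named in its step. The sketch below is only an illustration under several assumptions: `PAST_PATTERNS`, `match_patterns`, `decide`, and `run_pipeline` are hypothetical names, the fixed threshold in `decide` is a deliberately simple stand-in for the scikit-learn model mentioned in step 4, and notifications are returned rather than actually transmitted.

```python
# Hypothetical past fraud pattern data (step 3); real data would come from the database.
PAST_PATTERNS = {
    "bank_details_request": {"bank", "account information", "pin code"},
}


def match_patterns(text):
    """Step 3: for each pattern, return the fraction of its phrases found in the text."""
    lowered = text.lower()
    return {
        name: sum(phrase in lowered for phrase in phrases) / len(phrases)
        for name, phrases in PAST_PATTERNS.items()
    }


def decide(signs, threshold=0.5):
    """Step 4: stand-in for the machine learning model, using a fixed threshold rule."""
    return any(score >= threshold for score in signs.values())


def run_pipeline(text, contacts):
    """Steps 3 to 6: return the warnings that would be issued, without sending them."""
    if not decide(match_patterns(text)):
        return []
    # Step 5: one warning per registered contact.
    notifications = [(c, "Possible fraud detected in a phone call.") for c in contacts]
    # Step 6: the message the home robot would speak to the user.
    robot_message = ("robot", "Providing this information is dangerous.")
    return notifications + [robot_message]


for recipient, message in run_pipeline(
    "Please give me your bank account information.", ["family", "police"]
):
    print(recipient, "->", message)
```

Keeping each step as a separate function mirrors the input and output pairs listed in the steps, so the threshold rule could later be swapped for a trained classifier without touching the notification logic.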
【0124】 Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
【0125】 This invention is a technology that combines an emotion engine with a voice-based fraud detection system to detect potential fraud with greater accuracy and provide appropriate alerts to the user. Specific forms for implementing this system are described below.
【0126】 The user uses a dedicated terminal for everyday communication. This terminal has a built-in high-performance microphone that continuously captures surrounding conversations. The captured audio is converted into text data by a speech recognition processing unit installed in the terminal.
【0127】 The converted text data is sent from the terminal to the server. The server has an analysis device that processes the received text data and detects signs of fraud by comparing it with fraud patterns stored in a database. In addition, the server has an emotion engine that analyzes the user's emotional state from the user's voice. This allows the likelihood of fraud to be evaluated by taking into account not only the content of the speech but also the user's emotional information.
【0128】 As a concrete example, consider a scenario where a user receives a phone call requesting a loan. The terminal captures this audio and converts it into text data in real time. The server analyzes the text data and compares it against existing fraud patterns. Simultaneously, the emotion engine estimates the user's emotional state from their tone and intonation, and the server comprehensively assesses the likelihood of fraud. Based on these results, the server automatically generates an alert and notifies the user and their registered contacts.
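The disclosure does not say how the text analysis result and the emotional state are combined into a comprehensive assessment. One simple possibility, shown here purely as an assumption, is a weighted average of two scores in the range 0 to 1. The function names, the 0.7 text weight, and the 0.6 alert threshold are all hypothetical values chosen for illustration.

```python
def combined_fraud_risk(text_score, emotion_score, text_weight=0.7):
    """Blend a text-based score and an emotion-based score, both in [0, 1].

    The weighting scheme is an illustrative assumption, not the disclosed method.
    """
    for value in (text_score, emotion_score, text_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores and weight must be within [0, 1]")
    return text_weight * text_score + (1.0 - text_weight) * emotion_score


def should_alert(text_score, emotion_score, threshold=0.6):
    """Return True when the blended risk reaches the (hypothetical) alert threshold."""
    return combined_fraud_risk(text_score, emotion_score) >= threshold


# Strong keyword match plus an agitated voice: alert.
print(should_alert(0.8, 0.9))
# Weak keyword match and a calm voice: no alert.
print(should_alert(0.2, 0.1))
```

A learned model could replace the fixed weights once labeled conversations are available, which would fit the continuous learning described elsewhere in this specification.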
【0129】 Furthermore, this system utilizes machine learning algorithms, enabling it to continuously learn new fraud techniques and emotional patterns. Over time, the system dynamically updates its database, always providing the most up-to-date fraud prevention measures. 【0130】 In this way, the system of the present invention not only detects fraud but also takes into account the user's emotions, providing alerts based on deeper insights and enabling fraud to be prevented. 【0131】 The following describes the processing flow. 【0132】 Step 1: 【0133】 The device continuously captures audio from the user's surroundings using a built-in high-performance microphone. This audio data is temporarily recorded in digital format. 【0134】 Step 2: 【0135】 The terminal converts the captured audio into text data in real time using a speech recognition processor. The converted text data is used for analyzing the possibility of fraud. 【0136】 Step 3: 【0137】 The terminal sends the converted text data to the server. Security protocols are used for transmission to ensure the data is processed safely. 【0138】 Step 4: 【0139】 The server analyzes the received text data. The analysis device refers to a database containing accumulated fraud patterns and checks if keywords and phrases match existing fraud techniques. 【0140】 Step 5: 【0141】 The server uses an emotion engine to understand the user's emotional state from their voice. It analyzes the tone, speed, and volume of their voice to determine their psychological state. 【0142】 Step 6: 【0143】 The server comprehensively evaluates the results of the text data analysis and the user's emotional state to determine the possibility of fraud and generate an alert. 【0144】 Step 7: 【0145】 The server sends the generated alert information to registered contacts. Family members, the police, and other relevant parties capable of taking appropriate action are notified. 【0146】 Step 8: 【0147】 The device issues a warning to the user. 
Through voice notifications and visual alerts on the display, it prompts the user to end the conversation or double-check. 【0148】 Step 9: 【0149】 The server continuously analyzes conversations and incorporates newly detected fraud patterns and emotional states into its learning database. Machine learning algorithms are used to continuously improve the system's accuracy. 【0150】 (Example 2) 【0151】 Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal". 【0152】 Traditional voice-based fraud detection systems focus on analyzing audio content, but evaluating fraud signs solely based on text data can be insufficient for accurate detection. In particular, they fail to consider emotional shifts and subtle nuances discernible from human speech, potentially leading to missed fraud risks. Furthermore, as fraud techniques evolve daily, existing databases must be constantly updated to accommodate the latest patterns. There is a need to solve these problems and provide a highly accurate and flexible fraud detection system. 【0153】 The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means. 【0154】 In this invention, the server includes means for acquiring voice information by an integration device, processing means for converting the voice information into natural language data, analysis means for comparing the natural language data with past fraud pattern information, and emotion analysis means for analyzing the acquired voice state information and evaluating emotion information. This makes it possible to consider not only the voice content but also human emotion information, evaluate the possibility of fraud with high accuracy, and notify warning information. 
Furthermore, by dynamically updating the information repository, it becomes possible to respond quickly to the latest fraud methods. 【0155】 "Audio information" refers to vibration data acquired from the surrounding audio input environment, and is data that is analyzed and treated as meaningful information. 【0156】 An "integration device" is a device that collects raw data acquired from various input environments and manages and processes it centrally. 【0157】 "Natural language data" refers to a collection of data in which linguistic expressions used by humans in everyday life have been converted into a text format that can be stored digitally. 【0158】 "Processing device means" refers to a combination of hardware and software for converting or analyzing input data into a specific format. 【0159】 "Analysis device means" refers to devices and technologies that examine input data and identify and evaluate its characteristics and patterns. 【0160】 "Fraudulent pattern information" refers to a collection of information stored in a database that records the characteristics of fraudulent behavior and fraudulent activities that have occurred in the past. 【0161】 "Emotional analysis device means" refers to a device or method for analyzing the tone, pitch, and other meta-information of speech to determine the emotional state of the speaker. 【0162】 "Warning information" refers to information generated to inform the recipient of the existence of a risk when certain conditions are met. 【0163】 "Signal output device means" refers to a device or technology for transferring analyzed and evaluated information to an end user or other system. 【0164】 This invention is implemented as a fraud detection system based on voice information. This system performs the collection, analysis, and warning issuance of voice information as a series of processes. 【0165】 The user first uses a dedicated device. 
This device has a built-in high-performance microphone that continuously acquires ambient sound information. The microphone is equipped with noise-canceling technology, which suppresses ambient noise and allows for clearer audio acquisition. For example, even in noisy environments, the voice of a specific speaker can be clearly captured. 【0166】 The terminal is equipped with a speech recognition processing unit to convert acquired voice information into natural language data. This unit incorporates an advanced speech recognition algorithm that converts raw voice data into text format. The converted text data is securely transmitted from the terminal to the server. Encryption technology is used during this process to protect data privacy. 【0167】 The server uses an analysis device to compare received text data with past fraud pattern information. This analysis device utilizes natural language processing technology to evaluate signs of fraud from context. It also has a sentiment analyzer that determines emotional states from voice information. This sentiment evaluation further improves the accuracy of the fraud likelihood assessment. For example, if certain word choices or the tone of voice sound unnatural, this is recognized as a sign of fraud. 【0168】 As a result, if fraud is deemed highly likely, the server automatically generates warning information and sends it to the relevant recipients via a signal output device. This process uses machine learning algorithms to continuously learn new fraud and sentiment patterns, dynamically updating the information repository to ensure that responses are always up-to-date. 【0169】 For example, when a user is asked to transfer money, the system analyzes the content of the message and the emotional tone of the voice, and if there is a possibility of fraud, it promptly notifies the user or their registered emergency contact. 
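To make the idea of determining emotional states from voice information more concrete, the following Python sketch computes two simple acoustic features that an emotion analysis stage could take as input. It is an assumption-laden illustration: the source does not specify which features are used, and the function names rms_volume, zero_crossing_rate, and voice_features are hypothetical. Root-mean-square amplitude serves as a rough proxy for volume, and the zero-crossing rate as a crude proxy for pitch.

```python
import math
from typing import Sequence


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a rough proxy for loudness."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def voice_features(samples: Sequence[float], sample_rate: int) -> dict:
    """Bundle the features for a downstream emotion model (not shown)."""
    return {
        "duration_s": len(samples) / sample_rate,
        "rms_volume": rms_volume(samples),
        "zero_crossing_rate": zero_crossing_rate(samples),
    }


if __name__ == "__main__":
    # Synthetic one-second 220 Hz tone sampled at 8 kHz, standing in for speech.
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
    print(voice_features(tone, rate))
```

A practical emotion analyzer would likely rely on richer features and a trained model; these two measures only show the kind of signal-level input involved.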
【0170】 An example of a prompt message would be, "Use this system to detect recently popular fraud techniques." In this way, the present invention is a system that provides more advanced fraud detection capabilities by simultaneously considering voice and emotional information. 【0171】 The flow of the specific processing in Example 2 will be explained using Figure 13. 【0172】 Step 1: 【0173】 The user uses a device to acquire audio information from the environment using a high-performance microphone. The microphone utilizes noise cancellation to eliminate unwanted background noise and capture clearer audio. The input is physical audio information, and the output is electronic audio data. 【0174】 Step 2: 【0175】 The device sends the received audio data to its internal speech recognition processing unit, where it converts it into natural language data. In this process, a speech recognition algorithm analyzes the audio waveform and converts it into text format. The input is audio data, and the output is text data. 【0176】 Step 3: 【0177】 The terminal sends the generated text data to the server. Before transmission, the data is encrypted to ensure its security. The input is the converted text data, and the output is the encrypted text data. 【0178】 Step 4: 【0179】 The server uses an analysis device to compare the received text data with past fraudulent patterns. Natural language processing techniques are used to analyze the data and search for signs of fraud. The input is encrypted text data, and the output is an assessment of the likelihood of fraud. 【0180】 Step 5: 【0181】 The server simultaneously performs emotion analysis of the voice. The emotion analysis device analyzes the tone and intonation of the voice obtained from the voice data and evaluates the user's emotional state. The input is voice data, and the output is the emotion evaluation result. 
【0182】 Step 6: 【0183】 The server combines the analyzed text data with the sentiment evaluation results to comprehensively assess the likelihood of fraud. If a high probability of fraud is detected, the server automatically generates a warning. The input is the evaluation results of the text data and the sentiment evaluation results, and the output is the warning information. 【0184】 Step 7: 【0185】 The server sends warning information to the user and pre-registered recipients. The information is sent via email or SMS, prompting the user to take immediate action. The input is warning information, and the output is notifications via email, SMS, etc. 【0186】 Step 8: 【0187】 The server uses machine learning algorithms to learn new fraud and sentiment patterns, updating its database. This optimizes the system to constantly respond to the latest fraudulent techniques. The input is the analyzed data set, and the output is the updated database. 【0188】 (Application Example 2) 【0189】 Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal." 【0190】 Conventional voice-based fraud detection systems analyze only the content of the voice, and as fraudulent methods become more sophisticated, their accuracy and reliability are limited. Furthermore, they cannot make comprehensive judgments that take into account the user's emotional state, leading to risks of false positives and missed scams. There is a need to solve these problems and prevent fraud before it occurs. 【0191】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means. 
【0192】 In this invention, the server includes means for acquiring an acoustic signal, means for recognizing the acoustic signal and converting it into text data, means for analyzing the text data and comparing it with past fraud pattern data to detect signs of fraud, and means equipped with an emotion engine that analyzes the emotional state based on the detected fraud pattern. This enables highly accurate fraud detection that takes into account not only the content of the conversation but also the emotional information of the user. 【0193】 An "acoustic signal" is an electrical signal obtained by converting air vibrations transmitted as sound, and it is the data that forms the basis of speech understanding. 【0194】 "Acquisition means" refers to a device or method for capturing an acoustic signal and inputting it into an electronic device. 【0195】 "Recognition means" refers to a device or method that processes acquired acoustic signals and interprets them as text data. 【0196】 "Text data" refers to data in text format output from the analysis of speech, and is information used for subsequent processing. 【0197】 "Analysis means" refers to an apparatus or method used for the purpose of detecting signs of fraud by comparing text data with past fraud patterns. 【0198】 "Means equipped with an emotion engine" refers to a device or method for analyzing the emotional state of a speaker based on text data obtained from an acoustic signal. 【0199】 The system for implementing this invention consists mainly of a terminal device and a server working in cooperation. The terminal device acquires acoustic signals using a high-performance microphone and converts them into text data in real time using a speech recognition library. Specifically, services such as Google Cloud Speech-to-Text are used. 【0200】 Text data produced by the terminal device is transmitted to the server via the internet. 
The server has analytical capabilities to compare the text data with a database of past fraud patterns. To identify signs of fraud, a machine learning model (e.g., one implemented with TensorFlow®) performs the text analysis. This analysis process also learns new fraud techniques and sentiment patterns, enabling decision-making based on the latest information. 【0201】 Furthermore, the server functions as a system equipped with an emotion engine. The emotion engine analyzes the tone and intonation of the acoustic signal in real time to estimate the user's emotional state. By comparing the results of the emotion analysis with fraud patterns, it comprehensively evaluates the likelihood of fraud and generates notification information. The generated notification information is immediately sent to the user's terminal and registered communication destinations. 【0202】 As a concrete example, consider a scenario where a user receives a "same-day loan" offer over the phone. In this case, the audio signal is captured by the terminal and converted into text data. The server performs fraud pattern and sentiment analysis on the received data, and if it determines that there is a high probability of fraud, it creates an alert and notifies the user. 【0203】 Examples of prompts generated using AI models include the following: 【0204】 "Analyze the situation in which a user receives a suspicious offer via voice communication, assess the risk from both a fraudulent and emotional perspective, and create a scenario that generates an alert." 【0205】 The flow of a specific process in Application Example 2 will be explained using Figure 14. 【0206】 Step 1: 【0207】 The device uses a high-performance microphone to acquire ambient acoustic signals. The input is physical sound, and the output is an electrical audio signal. This signal is processed by a digital audio library. Specifically, the microphone captures ambient sound. 
【0208】 Step 2: 【0209】 The device converts acquired audio signals into text data in real time using a speech recognition library (e.g., Google Cloud Speech-to-Text). The input is an audio signal, and the output is text data. Specifically, the process involves inputting the audio signal into a speech recognition model and converting it into text data. 【0210】 Step 3: 【0211】 The terminal sends the converted text data to the server via the internet. The input is the text data, and the output is the data transmitted to the server. Specifically, the terminal uploads text data to the server using its network connection. 【0212】 Step 4: 【0213】 The server compares the received text data with a fraud pattern database to analyze signs of fraud. The input is the received text data, and the output is the fraud detection result. Specifically, it uses a machine learning algorithm to perform the comparison with the database. 【0214】 Step 5: 【0215】 The server uses an emotion engine to analyze the speaker's emotional state from the text data. The input is text data, and the output is the emotion analysis result. Specifically, the emotion model evaluates the tone and context of the text to generate emotion information. 【0216】 Step 6: 【0217】 The server integrates fraud detection results and sentiment analysis results to comprehensively evaluate the likelihood of fraud. The inputs are fraud detection results and sentiment analysis results, and the output is the overall evaluation and notification information. Specifically, it calculates the probability of fraud risk and generates a notification message for the user based on that. 【0218】 Step 7: 【0219】 Based on the evaluation results, if the server determines there is a risk of fraud, it generates an alert and sends notification information to the user's terminal and registered communication destinations. The input is the overall evaluation and notification information, and the output is the final user notification. 
Specifically, an alert message is generated and delivered to the user and communication destinations in real time. 【0220】 The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data. 【0221】 Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. 【0222】 In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14. 
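As a rough illustration of how a prompt containing instructions might be combined with inference data before being passed to the data generation model 58, consider the Python sketch below. The function name build_prompt, the section labels, and the layout are assumptions made for illustration; the source does not define a prompt format, and no call to any actual generative AI service is shown.

```python
def build_prompt(instruction: str, transcript: str,
                 emotion_summary: str = "") -> str:
    """Assemble an instruction and inference data into one prompt string."""
    parts = [
        "Instruction:",
        instruction.strip(),
        "",
        "Transcript:",
        transcript.strip(),
    ]
    # The emotion section is optional and only added when a summary exists.
    if emotion_summary:
        parts += ["", "Estimated emotional state:", emotion_summary.strip()]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        "Assess whether this conversation shows signs of fraud.",
        "We can offer you a same-day loan if you pay a fee first.",
        emotion_summary="anxious, hesitant",
    )
    print(prompt)
```

The resulting string would then be sent to whichever model plays the role of the data generation model 58, and the returned text would be treated as the inference result.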
【0223】 [Second Embodiment] 【0224】 Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment. 【0225】 As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server. 【0226】 The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network). 【0227】 The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52. 【0228】 The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46. 
【0229】 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision). 【0230】 Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner. 【0231】 Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56. 【0232】 The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30. 【0233】 The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290. 【0234】 In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. 
The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48. 【0235】 Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal". 【0236】 This invention is a voice-based fraud detection system, particularly aimed at responding quickly to fraud targeting the elderly. The embodiments of this invention will be described below with specific examples. 【0237】 First, the user engages in everyday conversation using a dedicated device. The device is equipped with a high-performance microphone that can continuously capture ambient sound. Once sound is captured, the device's speech recognition engine converts the speech into text in real time. 【0238】 This converted text data is sent to a server. The server has a processing unit that analyzes the received text data, and by referring to a database that stores past fraud patterns, it detects specific phrases and keywords that contain signs of fraud. 【0239】 If signs of fraud are detected, the server generates an alert and sends the alert information to registered contacts, such as the user's family or the police. The server also sends a warning signal to the device to inform the user of the danger. 【0240】 For example, if a user receives a phone call asking for bank account information, the terminal captures the audio and converts it into text using speech recognition. The server then takes the text and compares it against a fraud pattern database to detect potentially fraudulent keywords such as "bank account" and "information request." 
The server instantly generates an alert and sends notifications to the user's mobile phone, email, and registered emergency contacts. 【0241】 This system also utilizes machine learning algorithms, allowing it to accumulate more data over time and improve the accuracy of fraud pattern detection. When new fraud methods emerge, their unique patterns are learned, and the database is automatically updated. 【0242】 In this way, the system of the present invention can reduce the risk of elderly people becoming victims of fraud and provide an environment in which they can communicate using voice with peace of mind. 【0243】 The following describes the processing flow. 【0244】 Step 1: 【0245】 The device continuously captures ambient sound using its built-in microphone. The sound is temporarily recorded as digital data. 【0246】 Step 2: 【0247】 The terminal converts captured audio data into text data using a speech recognition processing unit. This conversion process is performed in real time, taking into account audio interruptions and noise. 【0248】 Step 3: 【0249】 The device sends the converted text data to the server. The data is transmitted using a secure protocol and processed in a privacy-protected manner. 【0250】 Step 4: 【0251】 The server processes the received text data using an analysis device and compares it against a database of previously accumulated fraud patterns. At this stage, it checks whether specific keywords or phrases are included. 【0252】 Step 5: 【0253】 The server evaluates whether signs of fraud have been detected. If fraud is highly likely, it generates an alert and specifies its details. 【0254】 Step 6: 【0255】 The server sends the generated alert information to registered contacts. These contacts include family members and police officers who can respond to emergencies. 【0256】 Step 7: 【0257】 The device immediately alerts the user. It conveys the warning both audibly and visually on the display, prompting the user to interrupt or reconsider the conversation. 
【0258】 Step 8: 【0259】 The server continuously monitors subsequent conversation data and incorporates new fraud patterns into its learning database. Machine learning algorithms improve detection accuracy in the future. 【0260】 (Example 1) 【0261】 Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal." 【0262】 Fraudulent schemes targeting the elderly are becoming more sophisticated, making them vulnerable to becoming victims in their daily lives. Therefore, there is a need for technologies that can respond quickly and effectively when elderly people face the risk of fraud. Furthermore, because fraudulent methods are constantly evolving, static analysis based on past data is insufficient. 【0263】 The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means. 【0264】 In this invention, the server includes acquisition means for acquiring voice information, conversion means for converting voice information into text data, and discrimination means for comparing the text data with existing fraud pattern information to determine the possibility of fraud. This enables real-time detection of the risk of voice fraud that may occur in the daily lives of elderly people, prompt warnings, and appropriate countermeasures. 【0265】 "Means for acquiring audio information" refers to a device that senses ambient sounds and converts them into electrical signals, and has the function of capturing audio data in real time. 【0266】 "Conversion means for converting audio information into text data" refers to a technology that analyzes acquired audio data and converts it into corresponding text, and is implemented by a speech recognition engine. 
【0267】 "Discrimination means for comparing text data with existing fraud pattern information to determine the possibility of fraud" refers to a process that compares converted text data with fraud pattern information in a database and automatically detects signs of fraud. 【0268】 "Notification means" refers to a means of communicating detected fraud alerts to the user and designated contacts in the form of voice messages or text messages. 【0269】 "Analysis methods" refer to techniques that use natural language processing technology to analyze text data in depth and extract signs of fraud, and involve the use of large language models. 【0270】 "Means for analysis based on learning algorithms" refers to the process of using machine learning technology to learn from analysis data and improve the accuracy of recognizing fraud patterns. 【0271】 "Means for automatically updating information" refers to technology that quickly reflects newly detected fraud patterns in the database, ensuring that decisions are always based on the latest information. 【0272】 The following describes a specific embodiment for implementing a voice-based fraud detection system targeting the elderly. 【0273】 Users carry a dedicated terminal to run this system. This terminal is equipped with a high-performance microphone and can capture ambient sound in high quality using an audio processing chip such as one made by Realtek. The terminal converts the acquired audio information into text data in real time using a service such as the Google Speech-to-Text API. Once this process is complete, the text data is sent to the server via a secure communication protocol (e.g., HTTPS). 【0274】 When the server receives text data, it performs a detailed analysis using natural language processing techniques. This involves using large language models such as BERT and GPT to identify specific phrases and keywords that may indicate fraud. 
The server then compares these findings with a database of past fraud patterns to determine the likelihood of fraud. 【0275】 If the analysis determines that there is a risk of fraud, the server quickly generates an alert. This alert is sent via email or SMS to the user or pre-registered emergency contacts (e.g., family or security agencies). A warning signal is also sent to the device, prompting the user to be vigilant through audio and screen notifications. 【0276】 This system also features a function that continuously learns fraud patterns based on machine learning algorithms. As a result, it remains able to detect new fraud methods as they emerge, and the database is automatically updated. 【0277】 For example, if a user receives a call asking for their bank account information, the terminal immediately converts this audio into text. After analysis by the server, the keywords "bank account" and "information provision" are recognized as potentially fraudulent. The server then generates an alert and sends a warning to the user and their emergency contacts. 【0278】 An example of a prompt would be, "Detect signs of fraud from the user's voice inquiries and issue an alert." This enables elderly people to communicate more safely in their daily lives. 【0279】 The flow of the specific processing in Example 1 will be explained using Figure 11. 【0280】 Step 1: 【0281】 The terminal captures ambient voices through a high-performance microphone. The input is the ambient voice, and the output is voice data in digital format. As a specific operation, the terminal constantly collects voices and performs noise cancellation to improve the quality of the voice data. 【0282】 Step 2: 【0283】 The terminal converts the captured voice data into text data using speech recognition technologies such as the Google Speech-to-Text API. The input is voice data in digital format, and the output is recognized text data. 
As a specific operation, the terminal segments the voice data into chunks of a certain size and sequentially passes them to the speech recognition engine. 【0284】 Step 3: 【0285】 The terminal sends the converted text data to the server via a secure communication protocol (e.g., HTTPS). The input is the text data, and the output is a success message for data transmission to the server. As a specific operation, the terminal encodes the text data in packet format and initiates network communication. 【0286】 Step 4: 【0287】 To analyze the received text data, the server uses natural language processing technology to decompose the text into phrases and search for specific keywords that may indicate fraud. The input is continuous text data, and the output is a data structure indicating the occurrence status of the keywords. As a specific operation, the server calls language models such as BERT or GPT to analyze the data. 【0288】 Step 5: 【0289】 The server compares the analysis results with a fraud pattern database to determine whether there are signs of fraud. The input is data on the occurrence of keywords, and the output is risk assessment data that quantifies the likelihood of fraud. Specifically, the server executes database queries and calculates the degree of match with the corresponding fraud pattern. 【0290】 Step 6: 【0291】 If the server determines that there is a high risk of fraud, it generates an alert and sends an alert notification to the user and registered contacts. The input is risk assessment data, and the output is a notification via email or SMS. Specifically, the server sends emails using the SMTP protocol and sends text messages via an SMS gateway. 【0292】 Step 7: 【0293】 The server updates its machine learning model based on the collected data, continuously improving the fraud detection algorithm. The input is a new dataset, and the output is the updated trained model.
Specifically, the server periodically runs a batch training process to improve the model's accuracy. 【0294】 (Application Example 1) 【0295】 Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal." 【0296】 Fraud targeting the elderly remains a serious social problem, and many of these frauds involve sophisticated voice scams. Existing prevention measures struggle to quickly and accurately detect fraud and prevent victimization. The risk is particularly high for elderly people living alone. Therefore, there is a need for a system that can detect signs of fraudulent activity from everyday voice communications and issue prompt warnings. 【0297】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means. 【0298】 In this invention, the server includes an acoustic input means for acquiring an acoustic signal, a conversion means for converting the acoustic signal into text information, and a matching means for detecting signs of fraudulent activity by comparing the text information with past fraudulent activity pattern data. This makes it possible to quickly detect signs of fraud from everyday conversations, promptly notify registered communication recipients of warnings, and directly alert the user through dialogue means. 【0299】 "Acoustic input means" refers to devices or equipment for continuously acquiring ambient sounds and conversational sounds. 【0300】 "Conversion means" refers to a process or device that converts acquired acoustic signals into text information using an appropriate algorithm. 【0301】 "Matching means" refers to a process or system for detecting signs of fraud by comparing converted text information with existing fraud pattern data.
【0302】 "Notification means" refers to a process or function for sending a warning to a registered recipient regarding the potential for detected fraudulent activity. 【0303】 "Dialogue means" refers to a device or function that allows a system to communicate directly with the user and provide warnings. 【0304】 "Machine learning techniques" are algorithms and technologies used to automatically learn patterns and knowledge from data. 【0305】 An "information aggregation device" is a database or system for storing collected data and detected patterns, and updating them as needed. 【0306】 This invention presents a specific embodiment of a system for detecting voice fraud targeting the elderly. The system consists of an acoustic input means, a conversion means using advanced voice recognition, a matching means for detecting signs of fraud, a notification means, and a dialogue means. 【0307】 The server uses a high-performance microphone as the acoustic input means to continuously capture daily conversations and ambient sounds. The captured acoustic signals are converted into text information in real time via the conversion means using a cloud voice recognition service such as Google Cloud Speech-to-Text. The converted text information is sent to the server and compared with past fraud pattern data by the matching means using machine learning techniques. An AI model built with Python and scikit-learn is used to quickly and accurately analyze signs of fraud. 【0308】 When signs of fraud are detected, the server generates a warning through the notification means and sends the warning to the registered communication destination. At the same time, the robot installed in the home uses the dialogue means to directly warn the user of the danger of fraud. This reduces the risk that elderly people will fall victim to fraud and provides a safe communication environment.
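The learning-based comparison of text information with past fraud pattern data described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the source names Python and scikit-learn, whereas this sketch uses a hand-written multinomial naive Bayes classifier built only on the standard library as a stand-in. The class name `NaiveBayesFraudModel`, the function `tokenize`, and the tiny `TRAINING_SAMPLES` list are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical labelled transcripts standing in for "past fraud pattern data".
TRAINING_SAMPLES = [
    ("please tell me your bank account number", "fraud"),
    ("transfer the money today to this account", "fraud"),
    ("we need your account information to refund you", "fraud"),
    ("shall we have lunch together tomorrow", "benign"),
    ("the weather is nice today", "benign"),
    ("your package will arrive tomorrow", "benign"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFraudModel:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"fraud": Counter(), "benign": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, samples):
        # Count words per label and documents per label.
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def fraud_probability(self, text):
        """Return the estimated probability that the text is fraudulent."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + 1) / (total_words + len(self.vocab))
                )
            log_scores[label] = score
        # Normalise the log scores into probabilities.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp["fraud"] / sum(exp.values())


model = NaiveBayesFraudModel().fit(TRAINING_SAMPLES)
print(model.fraud_probability("tell me your account information"))
print(model.fraud_probability("the weather is nice tomorrow"))
```

In a real deployment the training data would come from the fraud pattern database, and the server would retrain the model periodically as new patterns are recorded.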
【0309】 As a specific example, consider the case where the user receives a suspicious call and is asked for bank information. This voice is immediately captured and compared with the fraud pattern database, and keywords such as "bank" and "account information" are detected. The server instantly creates an alert and notifies the user's family and the police. Furthermore, the robot directly tells the user, "Providing this information is dangerous." 【0310】 An example of a prompt sentence for the generative AI model is provided in the form of "The following text is the content of a phone conversation. Please analyze whether there is a possibility of fraud: 'Do you need my account information?'". 【0311】 The flow of specific processing in Application Example 1 will be described using FIG. 12. 【0312】 Step 1: 【0313】 The device uses a high-performance microphone to capture ambient sound. The input is ambient noise and conversation, and the output is the captured acoustic signal. Pre-processing, such as compression and noise reduction, is performed to capture the audio in real time. 【0314】 Step 2: 【0315】 The device uses a speech recognition service such as Google Cloud Speech-to-Text to convert the acoustic signal into text information. The input is the acoustic signal obtained in step 1, and the output is the converted text information. A speech recognition algorithm is applied to convert the acoustic signal into text data. 【0316】 Step 3: 【0317】 The server compares the text information with past fraudulent activity pattern data. The input is the text information obtained in step 2, and the output is information indicating signs of fraudulent activity. Known fraudulent patterns are referenced from the database and compared with the text information. 【0318】 Step 4: 【0319】 The server uses a machine learning model to analyze signs of fraudulent activity.
The input is the information about signs of fraudulent activity obtained in step 3, and the output is decision information for generating alerts. The analysis uses a model built with Python and scikit-learn. 【0320】 Step 5: 【0321】 The server generates an alert and sends a warning to registered contacts via the notification means. The input is the decision information obtained in step 4, and the output is the warning notification sent to the contacts. The warning information is transmitted promptly to those contacts. 【0322】 Step 6: 【0323】 The user's home robot uses the dialogue means to directly warn the user of the risk of fraud. The input is the warning information generated in step 5, and the output is an audio warning to the user. A pre-configured message is played using speech synthesis technology. 【0324】 Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions. 【0325】 This invention is a technology that combines an emotion engine with a voice-based fraud detection system to detect potential fraud with greater accuracy and provide appropriate alerts to the user. Specific forms for implementing this system are described below. 【0326】 Users use a dedicated device for everyday communication. This device has a built-in high-performance microphone that continuously captures surrounding conversations. The captured audio is converted into text data by a speech recognition processing unit installed in the device. 【0327】 The converted text data is sent from the terminal to the server. The server has an analysis device that processes the received text data and has the function of detecting signs of fraud by comparing it with fraud patterns stored in a database.
In addition, the server has an emotion engine that analyzes the emotional state from the user's voice. This allows the likelihood of fraud to be evaluated by taking into account not only the content of the voice but also the user's emotional information. 【0328】 As a concrete example, consider a scenario where a user receives a phone call requesting a loan. The device captures this audio and converts it into text data in real time. The server analyzes the text data and compares it against existing fraud patterns. Simultaneously, an emotion engine understands the user's emotional state from their tone and intonation, and comprehensively assesses the likelihood of fraud. Based on these results, the server automatically generates an alert and notifies the user and their registered contacts. 【0329】 Furthermore, this system utilizes machine learning algorithms, enabling it to continuously learn new fraud techniques and emotional patterns. Over time, the system dynamically updates its database, always providing the most up-to-date fraud prevention measures. 【0330】 In this way, the system of the present invention not only detects fraud but also takes into account the user's emotions, providing alerts based on deeper insights and enabling fraud to be prevented. 【0331】 The following describes the processing flow. 【0332】 Step 1: 【0333】 The device continuously captures audio from the user's surroundings using a built-in high-performance microphone. This audio data is temporarily recorded in digital format. 【0334】 Step 2: 【0335】 The terminal converts the captured audio into text data in real time using a speech recognition processor. The converted text data is used for analyzing the possibility of fraud. 【0336】 Step 3: 【0337】 The terminal sends the converted text data to the server. Security protocols are used for transmission to ensure the data is processed safely. 【0338】 Step 4: 【0339】 The server analyzes the received text data. 
The analysis device refers to a database containing accumulated fraud patterns and checks if keywords and phrases match existing fraud techniques. 【0340】 Step 5: 【0341】 The server uses an emotion engine to understand the user's emotional state from their voice. It analyzes the tone, speed, and volume of their voice to determine their psychological state. 【0342】 Step 6: 【0343】 The server comprehensively evaluates the results of the text data analysis and the user's emotional state to determine the possibility of fraud and generate an alert. 【0344】 Step 7: 【0345】 The server sends the generated alert information to registered contacts. Family members, the police, and other relevant parties capable of taking appropriate action are notified. 【0346】 Step 8: 【0347】 The device issues a warning to the user. Through voice notifications and visual alerts on the display, it prompts the user to end the conversation or double-check. 【0348】 Step 9: 【0349】 The server continuously analyzes conversations and incorporates newly detected fraud patterns and emotional states into its learning database. Machine learning algorithms are used to continuously improve the system's accuracy. 【0350】 (Example 2) 【0351】 Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal". 【0352】 Traditional voice-based fraud detection systems focus on analyzing audio content, but evaluating fraud signs solely based on text data can be insufficient for accurate detection. In particular, they fail to consider emotional shifts and subtle nuances discernible from human speech, potentially leading to missed fraud risks. Furthermore, as fraud techniques evolve daily, existing databases must be constantly updated to accommodate the latest patterns. There is a need to solve these problems and provide a highly accurate and flexible fraud detection system. 
【0353】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means. 【0354】 In this invention, the server includes means for acquiring voice information by an integration device, processing means for converting the voice information into natural language data, analysis means for comparing the natural language data with past fraud pattern information, and emotion analysis means for analyzing the acquired voice state information and evaluating emotion information. This makes it possible to consider not only the voice content but also human emotion information, evaluate the possibility of fraud with high accuracy, and notify warning information. Furthermore, by dynamically updating the information repository, it becomes possible to respond quickly to the latest fraud methods. 【0355】 "Voice information" refers to vibration data acquired from the surrounding audio input environment, and is data that is analyzed and treated as meaningful information. 【0356】 An "integration device" is a device that collects raw data acquired from various input environments and manages and processes it centrally. 【0357】 "Natural language data" refers to a collection of data in which linguistic expressions used by humans in everyday life have been converted into a text format that can be stored digitally. 【0358】 "Processing device means" refers to a combination of hardware and software for converting input data into a specific format or analyzing it. 【0359】 "Analysis device means" refers to devices and technologies that examine input data and identify and evaluate its characteristics and patterns. 【0360】 "Fraud pattern information" refers to a collection of information stored in a database that records the characteristics of fraudulent behavior and fraudulent activities that have occurred in the past.
【0361】 "Emotion analysis device means" refers to a device or method for analyzing the tone, pitch, and other meta-information of speech to determine the emotional state of the speaker. 【0362】 "Warning information" refers to information generated to inform the recipient of the existence of a risk when certain conditions are met. 【0363】 "Signal output device means" refers to a device or technology for transferring analyzed and evaluated information to an end user or other system. 【0364】 This invention is implemented as a fraud detection system based on voice information. This system performs the collection, analysis, and warning issuance of voice information as a series of processes. 【0365】 The user first uses a dedicated device. This device has a built-in high-performance microphone that continuously acquires ambient sound information. The microphone is equipped with noise-canceling technology, which suppresses ambient noise and allows for clearer audio acquisition. For example, even in noisy environments, the voice of a specific speaker can be clearly captured. 【0366】 The terminal is equipped with a speech recognition processing unit to convert acquired voice information into natural language data. This unit incorporates an advanced speech recognition algorithm that converts raw voice data into text format. The converted text data is securely transmitted from the terminal to the server. Encryption technology is used during this process to protect data privacy. 【0367】 The server uses an analysis device to compare received text data with past fraud pattern information. This analysis device utilizes natural language processing technology to evaluate signs of fraud from context. The server also has an emotion analysis device that determines emotional states from voice information. This emotion evaluation can further improve the accuracy of the fraud likelihood assessment.
For example, if certain word choices or tone of voice sound unnatural, it will be recognized as a sign of fraud. 【0368】 As a result, if fraud is deemed highly likely, the server automatically generates warning information and sends it to the relevant recipients via a signal output device. This process uses machine learning algorithms to continuously learn new fraud and emotion patterns, dynamically updating the information repository to ensure that responses are always up-to-date. 【0369】 For example, when a user is asked to transfer money, the system analyzes the content of the message and the emotional tone of the voice, and if there is a possibility of fraud, it promptly notifies the user or their registered emergency contact. 【0370】 An example of a prompt message would be, "Use this system to detect recently popular fraud techniques." In this way, the present invention is a system that provides more advanced fraud detection capabilities by simultaneously considering voice and emotional information. 【0371】 The flow of the specific processing in Example 2 will be explained using Figure 13. 【0372】 Step 1: 【0373】 The user uses a device to acquire audio information from the environment using a high-performance microphone. The microphone utilizes noise cancellation to suppress unwanted background noise and capture clearer audio. The input is physical audio information, and the output is electronic audio data. 【0374】 Step 2: 【0375】 The device sends the captured audio data to its internal speech recognition processing unit, which converts it into natural language data. In this process, a speech recognition algorithm analyzes the audio waveform and converts it into text format. The input is audio data, and the output is text data. 【0376】 Step 3: 【0377】 The terminal sends the generated text data to the server. Before transmission, the data is encrypted to ensure its security.
The input is the converted text data, and the output is the encrypted text data. 【0378】 Step 4: 【0379】 The server uses an analysis device to compare the received text data with past fraudulent patterns. Natural language processing techniques are used to analyze the data and search for signs of fraud. The input is encrypted text data, and the output is an assessment of the likelihood of fraud. 【0380】 Step 5: 【0381】 The server simultaneously performs emotion analysis of the voice. The emotion analysis device analyzes the tone and intonation of the voice obtained from the voice data and evaluates the user's emotional state. The input is voice data, and the output is the emotion evaluation result. 【0382】 Step 6: 【0383】 The server combines the analyzed text data with the sentiment evaluation results to comprehensively assess the likelihood of fraud. If a high probability of fraud is detected, the server automatically generates a warning. The input is the evaluation results of the text data and the sentiment evaluation results, and the output is the warning information. 【0384】 Step 7: 【0385】 The server sends warning information to the user and pre-registered recipients. The information is sent via email or SMS, prompting the user to take immediate action. The input is warning information, and the output is notifications via email, SMS, etc. 【0386】 Step 8: 【0387】 The server uses machine learning algorithms to learn new fraud and sentiment patterns, updating its database. This optimizes the system to constantly respond to the latest fraudulent techniques. The input is the analyzed data set, and the output is the updated database. 【0388】 (Application Example 2) 【0389】 Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal." 
【0390】 Conventional voice-based fraud detection systems analyze only the content of the voice, and as fraudulent methods become more sophisticated, their accuracy and reliability are limited. Furthermore, they cannot make comprehensive judgments that take into account the user's emotional state, leading to risks of false positives and missed scams. There is a need to solve these problems and prevent fraud before it occurs. 【0391】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means. 【0392】 In this invention, the server includes means for acquiring an acoustic signal, means for recognizing the acoustic signal and converting it into text data, means for analyzing the text data and comparing it with past fraud pattern data to detect signs of fraud, and means for having an emotion engine that analyzes the emotional state based on the detected fraud pattern. This enables highly accurate fraud detection that takes into account not only the content of the fraud but also the emotional information of the user. 【0393】 An "acoustic signal" is an electrical signal obtained by converting air vibrations transmitted as sound into an electrical signal, and it is the data that forms the basis of speech understanding. 【0394】 "Acquisition means" refers to a device or method for capturing an acoustic signal and inputting it into an electronic device. 【0395】 "Recognition means" refers to a device or method that processes acquired acoustic signals and interprets them as character data. 【0396】 "Text data" refers to data in text format output from the analysis of speech, and is information used for subsequent processing. 【0397】 "Analysis means" refers to an apparatus or method used for the purpose of detecting signs of fraud by comparing textual data with past fraud patterns. 
【0398】 "Means equipped with an emotion engine" refers to a device or method for analyzing the emotional state of a speaker based on text data obtained from an acoustic signal. 【0399】 The system for implementing this invention mainly consists of a terminal device and a server working in cooperation. The terminal device acquires acoustic signals using a high-performance microphone and converts them into text data in real time using a speech recognition library. Specifically, services such as Google Cloud Speech-to-Text are used. 【0400】 Text data generated by the terminal is transmitted to a server via the internet. The server has analytical capabilities to compare the text data with a database of past fraud patterns. To identify signs of fraud, text analysis is performed with machine learning algorithms implemented using a framework such as TensorFlow. This analysis process also learns new fraud techniques and emotion patterns, enabling decision-making based on the latest information. 【0401】 Furthermore, the server functions as a system equipped with an emotion engine. The emotion engine analyzes the tone and intonation of the acoustic signal in real time to estimate the user's emotional state. By comparing the results of the emotion analysis with fraud patterns, it comprehensively evaluates the likelihood of fraud and generates notification information. The generated notification information is immediately sent to the user's terminal and registered communication destinations. 【0402】 As a concrete example, consider a scenario where a user receives a "same-day loan" offer over the phone. In this case, the audio signal is captured by the terminal and converted into text data. The server performs fraud pattern and emotion analysis on the received data, and if it determines that there is a high probability of fraud, it creates an alert and notifies the user.
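The comprehensive evaluation described above, in which the fraud pattern result and the emotion analysis result are combined before notification information is generated, might look like the following minimal sketch. The source does not specify how the two results are combined; the weighted sum, the weights `TEXT_WEIGHT` and `EMOTION_WEIGHT`, the threshold `ALERT_THRESHOLD`, and all function and field names here are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tuning values; the source gives no concrete numbers.
TEXT_WEIGHT = 0.7
EMOTION_WEIGHT = 0.3
ALERT_THRESHOLD = 0.6


@dataclass
class Assessment:
    text_score: float     # 0..1, degree of match with known fraud patterns
    emotion_score: float  # 0..1, e.g. estimated anxiety or agitation


def overall_risk(assessment: Assessment) -> float:
    """Combine both scores into a single fraud risk value in 0..1."""
    return (TEXT_WEIGHT * assessment.text_score
            + EMOTION_WEIGHT * assessment.emotion_score)


def build_notification(assessment: Assessment,
                       recipients: list) -> Optional[dict]:
    """Return notification information, or None when the risk is low."""
    risk = overall_risk(assessment)
    if risk < ALERT_THRESHOLD:
        return None
    return {
        "risk": round(risk, 2),
        "recipients": list(recipients),
        "message": "This conversation shows signs of fraud. "
                   "Please end the call and check with your family.",
    }


# A "same-day loan" call with a strong pattern match and an agitated voice.
print(build_notification(Assessment(0.8, 0.9), ["terminal", "family"]))
# An ordinary call with a weak pattern match and a calm voice.
print(build_notification(Assessment(0.5, 0.2), ["terminal", "family"]))
```

A weighted sum is only one option; a trained model that takes both scores as features would serve the same role.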
【0403】 Examples of prompts for the generative AI model include the following: 【0404】 "Analyze the situation in which a user receives a suspicious offer via voice communication, assess the risk from both a fraudulent and emotional perspective, and create a scenario that generates an alert." 【0405】 The flow of a specific process in Application Example 2 will be explained using Figure 14. 【0406】 Step 1: 【0407】 The device uses a high-performance microphone to acquire ambient acoustic signals. The input is physical sound, and the output is an electrical audio signal. This signal is processed by a digital audio library. Specifically, the microphone captures ambient sound. 【0408】 Step 2: 【0409】 The device converts acquired audio signals into text data in real time using a speech recognition library (e.g., Google Cloud Speech-to-Text). The input is an audio signal, and the output is text data. Specifically, the process involves inputting the audio signal into a speech recognition model and converting it into text data. 【0410】 Step 3: 【0411】 The terminal sends the converted text data to the server via the internet. The input is text data, and the output is the transmission of data to the server. Specifically, the terminal uploads text data to the server using its network connection. 【0412】 Step 4: 【0413】 The server compares the received text data with a fraud pattern database to analyze signs of fraud. The input is the received text data, and the output is the fraud detection result. Specifically, it uses a machine learning algorithm to perform the comparison with the database. 【0414】 Step 5: 【0415】 The server uses an emotion engine to analyze the speaker's emotional state in relation to text data. The input is text data, and the output is the emotion analysis result. Specifically, the emotion model evaluates the tone and context of the text to generate emotion information.
【0416】 Step 6: 【0417】 The server integrates fraud detection results and emotion analysis results to comprehensively evaluate the likelihood of fraud. The inputs are fraud detection results and emotion analysis results, and the output is the overall evaluation and notification information. Specifically, it calculates the probability of fraud risk and generates a notification message for the user based on that. 【0418】 Step 7: 【0419】 Based on the evaluation results, if the server determines there is a risk of fraud, it generates an alert and sends notification information to the user's terminal and registered communication destinations. The input is the overall evaluation and notification information, and the output is the final user notification. Specifically, an alert message is generated and delivered to the user and communication destinations in real time. 【0420】 The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data. 【0421】 Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network.
The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization. 【0422】 In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214. 【0423】 [Third Embodiment] 【0424】 Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment. 【0425】 As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server. 【0426】 The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network). 【0427】 The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. 
The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52. 【0428】 The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46. 【0429】 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision). 【0430】 Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner. 【0431】 Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56. 【0432】 The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30. 
【0433】 The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290. 【0434】 In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48. 【0435】 Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal". 【0436】 This invention is a voice-based fraud detection system, particularly aimed at responding quickly to fraud targeting the elderly. The embodiments of this invention will be described below with specific examples. 【0437】 First, the user engages in everyday conversation using a dedicated device. The device is equipped with a high-performance microphone that can continuously capture ambient sound. Once sound is captured, the device's speech recognition engine converts the speech into text in real time. 【0438】 The converted text data is sent to the server. The server has a processing unit that analyzes the received text data and, by referring to a database that stores past fraud patterns, detects specific phrases and keywords that indicate signs of fraud. 【0439】 If signs of fraud are detected, the server generates an alert and sends the alert information to registered contacts, such as the user's family or the police. The server also sends a warning signal to the device to inform the user of the danger.
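The matching and alerting described in paragraphs 0438 and 0439 can be illustrated with a minimal sketch. It assumes the fraud pattern database is reduced to a short list of phrases and that an alert is a small record holding the matched phrases and the recipients. The names `FRAUD_PHRASES`, `Alert`, `find_fraud_phrases`, and `build_alert` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for the database of past fraud patterns.
FRAUD_PHRASES = ["bank account", "wire transfer", "pin number"]


@dataclass
class Alert:
    """Illustrative alert record; the disclosure does not fix its fields."""
    matched_phrases: List[str]
    recipients: List[str] = field(default_factory=list)


def find_fraud_phrases(text: str, phrases: List[str]) -> List[str]:
    """Return the stored phrases that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in phrases if phrase in lowered]


def build_alert(text: str, contacts: List[str]) -> Optional[Alert]:
    """Create an alert if at least one stored phrase is found, else None."""
    hits = find_fraud_phrases(text, FRAUD_PHRASES)
    if not hits:
        return None
    return Alert(matched_phrases=hits, recipients=list(contacts))
```

A plain substring match is used here only for brevity; the disclosure leaves the matching technique open and elsewhere mentions natural language processing and machine learning for this stage.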
【0440】 For example, if a user receives a phone call asking for bank account information, the terminal captures the audio and converts it into text using speech recognition. The server then takes the text and compares it against a fraud pattern database to detect potentially fraudulent keywords such as "bank account" and "information request." The server instantly generates an alert and sends notifications to the user's mobile phone, email, and registered emergency contacts. 【0441】 This system also utilizes machine learning algorithms, allowing it to accumulate more data over time and improve the accuracy of fraud pattern detection. When new fraud methods emerge, their unique patterns are learned, and the database is automatically updated. 【0442】 In this way, the system of the present invention can reduce the risk of elderly people becoming victims of fraud and provide an environment in which they can communicate using voice with peace of mind. 【0443】 The following describes the processing flow. 【0444】 Step 1: 【0445】 The device continuously captures ambient sound using its built-in microphone. The sound is temporarily recorded as digital data. 【0446】 Step 2: 【0447】 The terminal converts captured audio data into text data using a speech recognition processing unit. This conversion process is performed in real time, taking into account audio interruptions and noise. 【0448】 Step 3: 【0449】 The device sends the converted text data to the server. The data is transmitted using a secure protocol and processed in a privacy-protected manner. 【0450】 Step 4: 【0451】 The server processes the received text data using an analysis device and compares it against a database of previously accumulated fraud patterns. At this stage, it checks whether specific keywords or phrases are included. 【0452】 Step 5: 【0453】 The server evaluates whether signs of fraud have been detected. If fraud is highly likely, it generates an alert and sets its details.
【0454】 Step 6: 【0455】 The server sends the generated alert information to registered contacts. These contacts include family members and police officers who can respond to emergencies. 【0456】 Step 7: 【0457】 The device immediately alerts the user. It communicates the warning through audio and visually on the display, prompting the user to interrupt or reconsider the conversation. 【0458】 Step 8: 【0459】 The server continuously monitors subsequent conversation data and incorporates new fraud patterns into its learning database. Machine learning algorithms improve future detection accuracy. 【0460】 (Example 1) 【0461】 Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal." 【0462】 Fraudulent schemes targeting the elderly are becoming more sophisticated, leaving elderly people vulnerable to becoming victims in their daily lives. Therefore, there is a need for technologies that can respond quickly and effectively when elderly people face the risk of fraud. Furthermore, because fraudulent methods are constantly evolving, static analysis based on past data is insufficient. 【0463】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means. 【0464】 In this invention, the server includes acquisition means for acquiring voice information, conversion means for converting voice information into text data, and discrimination means for comparing the text data with existing fraud pattern information to determine the possibility of fraud. This enables real-time detection of the risk of voice fraud that may occur in the daily lives of elderly people, prompt warnings, and appropriate countermeasures.
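As a rough illustration of how the acquisition means, conversion means, and discrimination means of paragraph 0464 chain together, the sketch below models each means as a plain callable so that any engine could be plugged in. All names are hypothetical, and the three stub stages are placeholders for a real microphone driver, speech recognition engine, and pattern comparison.

```python
from typing import Callable

# Each means is modelled as a callable (an assumption made for illustration).
Acquire = Callable[[], bytes]         # acquisition means: returns raw audio
Convert = Callable[[bytes], str]      # conversion means: audio to text
Discriminate = Callable[[str], bool]  # discrimination means: is fraud suspected?


def run_pipeline(acquire: Acquire, convert: Convert, discriminate: Discriminate) -> bool:
    """Run the three means in order and return the fraud decision."""
    audio = acquire()
    text = convert(audio)
    return discriminate(text)


# Stub stages for illustration only.
def stub_acquire() -> bytes:
    # Pretends that the captured audio already carries its own transcript.
    return b"please read me your bank account number"


def stub_convert(audio: bytes) -> str:
    # Stands in for a speech recognition engine.
    return audio.decode("utf-8")


def stub_discriminate(text: str) -> bool:
    # Stands in for the comparison with fraud pattern information.
    return "bank account" in text.lower()
```

Keeping the stages separate in this way means the speech recognition engine or the pattern comparison can be replaced without touching the other stages.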
【0465】 "Means for acquiring audio information" refers to a device that senses ambient sounds and converts them into electrical signals, and has the function of capturing audio data in real time. 【0466】 "Conversion means for converting audio information into text data" refers to a technology that analyzes acquired audio data and converts it into corresponding text, and is implemented by a speech recognition engine. 【0467】 "A means of discrimination that compares text data with existing fraud pattern information to determine the possibility of fraud" refers to a process that compares converted text data with fraud pattern information in a database and automatically detects signs of fraud. 【0468】 "Notification means" refers to a means of communicating detected fraud alerts to the user and designated contacts in the form of voice messages or text messages. 【0469】 "Analysis means" refers to techniques that use natural language processing technology to analyze text data in depth and extract signs of fraud, and involves the use of large-scale language model algorithms. 【0470】 "Means for analysis based on learning algorithms" refers to the process of using machine learning technology to learn from analysis data and improve the accuracy of recognizing fraud patterns. 【0471】 "Means for automatically updating information" refers to technology that quickly reflects newly detected fraud patterns in the database, ensuring that decisions are always based on the latest information. 【0472】 The following describes a specific embodiment for implementing a voice-based system for detecting fraud that targets the elderly. 【0473】 To use this system, the user carries a dedicated terminal. This terminal is equipped with a high-performance microphone and can capture ambient sound in high quality using an audio processing chip such as one from Realtek. The terminal converts the acquired audio information into text data in real time using, for example, the Google Speech-to-Text API.
Once this process is complete, the text data is sent to the server via a secure communication protocol (e.g., HTTPS). 【0474】 When the server receives text data, it performs a detailed analysis using natural language processing techniques. This involves using large-scale language models such as BERT and GPT to identify specific phrases and keywords that may indicate fraud. The server then compares the identified phrases and keywords with a database of past fraud patterns to determine the likelihood of fraud. 【0475】 If the analysis determines that there is a risk of fraud, the server quickly generates an alert. This alert is sent via email or SMS to the user or pre-registered emergency contacts (e.g., family or security agencies). A warning signal is also sent to the device, prompting the user to be vigilant through audio and screen notifications. 【0476】 This system also features a function that continuously learns fraud patterns based on machine learning algorithms. As a result, it can continue to detect new fraud methods as they emerge, and the database is automatically updated. 【0477】 For example, if a user receives a call asking for their bank account information, the terminal immediately converts this audio into text. After analysis by the server, the keywords "bank account" and "information provision" are recognized as potentially fraudulent. The server then generates an alert and sends a warning to the user and their emergency contacts. 【0478】 An example of a prompt message would be a system that detects signs of fraud from user voice inquiries and issues an alert. This would enable elderly people to communicate more safely in their daily lives. 【0479】 The flow of the specific processing in Example 1 will be explained using Figure 11. 【0480】 Step 1: 【0481】 The device captures ambient sound through a high-performance microphone. The input is ambient sound, and the output is digital audio data.
Specifically, the device continuously collects sound and applies noise cancellation to improve the quality of the audio data. 【0482】 Step 2: 【0483】 The device converts captured audio data into text data using speech recognition technology such as the Google Speech-to-Text API. The input is digital audio data, and the output is recognized text data. Specifically, the device segments the audio data into chunks of a fixed size and passes them sequentially to the speech recognition engine. 【0484】 Step 3: 【0485】 The terminal sends the converted text data to the server via a secure communication protocol (e.g., HTTPS). The input is text data, and the output is a confirmation that the data was sent to the server. Specifically, the terminal encodes the text data in packet format and initiates network communication. 【0486】 Step 4: 【0487】 The server analyzes the received text data by using natural language processing techniques to break it down into phrases and search for specific keywords that may indicate fraud. The input is continuous text data, and the output is a data structure showing the occurrence of keywords. Specifically, the server calls language models such as BERT or GPT to analyze the data. 【0488】 Step 5: 【0489】 The server compares the analysis results with a fraud pattern database to determine whether there are signs of fraud. The input is data on the occurrence of keywords, and the output is risk assessment data that quantifies the likelihood of fraud. Specifically, the server executes database queries and calculates the degree of match with the corresponding fraud pattern. 【0490】 Step 6: 【0491】 If the server determines that there is a high risk of fraud, it generates an alert and sends an alert notification to the user and registered contacts. The input is risk assessment data, and the output is a notification via email or SMS. Specifically, the server sends emails using the SMTP protocol and sends text messages via an SMS gateway.
【0492】 Step 7: 【0493】 The server updates its machine learning model based on the collected data, continuously improving the fraud detection algorithm. The input is a new dataset, and the output is the updated trained model. Specifically, the server periodically runs a batch training process to improve the model's accuracy. 【0494】 (Application Example 1) 【0495】 Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal." 【0496】 Fraud targeting the elderly remains a serious social problem, and many of these frauds involve sophisticated voice scams. Existing prevention measures struggle to quickly and accurately detect fraud and prevent victimization. The risk is particularly high for elderly people living alone. Therefore, there is a need for a system that can detect signs of fraudulent activity from everyday voice communications and issue prompt warnings. 【0497】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means. 【0498】 In this invention, the server includes an acoustic input means for acquiring an acoustic signal, a conversion means for converting the acoustic signal into text information, and a matching means for detecting signs of fraudulent activity by comparing the text information with past fraudulent activity pattern data. This makes it possible to quickly detect signs of fraud from everyday conversations, promptly notify registered communication recipients of warnings, and directly alert the user through dialogue means. 【0499】 "Acoustic input means" refers to devices or equipment for continuously acquiring ambient sounds and conversational sounds. 【0500】 "Conversion means" refers to a process or device that converts acquired acoustic signals into textual information using an appropriate algorithm. 
【0501】 "Matching means" refers to a process or system for detecting signs of fraud by comparing converted text information with existing fraud pattern data. 【0502】 "Notification means" refers to a process or function for sending a warning to a registered recipient regarding the potential for detected fraudulent activity. 【0503】 "Dialogue means" refers to a device or function that allows a system to communicate directly with the user and provide warnings. 【0504】 "Machine learning techniques" are algorithms and technologies used to automatically learn patterns and knowledge from data. 【0505】 An "information aggregation device" is a database or system for storing collected data and detected patterns, and updating them as needed. 【0506】 This invention provides a specific embodiment for implementing a system for detecting voice fraud targeting the elderly. The system comprises an acoustic input means, a conversion means using advanced speech recognition, a matching means for detecting signs of fraudulent activity, a notification means, and a dialogue means. 【0507】 The server uses a high-performance microphone as the acoustic input means to continuously capture everyday conversations and ambient sounds. The captured acoustic signals are converted into text in real time by the conversion means, using a cloud-based speech recognition service such as Google Cloud Speech-to-Text. The converted text is sent to the server and compared with past fraud pattern data by the matching means, which employs machine learning techniques. AI models built with Python and scikit-learn are used to quickly and accurately analyze signs of fraud. 【0508】 If signs of fraud are detected, the server generates a warning through the notification means and sends the warning to registered communication recipients. Simultaneously, a robot installed in the home uses the dialogue means to directly warn the user of the risk of fraud.
This reduces the risk of elderly people becoming victims of fraud and provides a safe communication environment. 【0509】 As a concrete example, consider a scenario where a user receives a suspicious phone call and is asked for banking information. This audio is immediately captured, converted into text, and compared against a fraud pattern database to detect keywords such as "bank" and "account information." The server instantly generates an alert and notifies the user's family and the police. Furthermore, the robot directly informs the user that "providing this information is dangerous." 【0510】 An example of a prompt for the generative AI model is: "The following text is from a phone conversation. Please analyze if it may be a scam: 'Do you need my account information?'" 【0511】 The flow of the specific processing in Application Example 1 will be explained using Figure 12. 【0512】 Step 1: 【0513】 The device uses a high-performance microphone to capture ambient sound. The input is ambient noise and conversation, and the output is the captured acoustic signal. Pre-processing, such as compression and noise reduction, is performed to capture the audio in real time. 【0514】 Step 2: 【0515】 The device uses a speech recognition service such as Google Cloud Speech-to-Text to convert the acoustic signal into text information. The input is the acoustic signal obtained in step 1, and the output is the converted text information. A speech recognition algorithm is applied to convert the acoustic signal into text data. 【0516】 Step 3: 【0517】 The server compares the text information with past fraudulent activity pattern data. The input is the text information obtained in step 2, and the output is information about signs of fraudulent activity. Known fraudulent patterns are referenced from the database and compared with the text information. 【0518】 Step 4: 【0519】 The server uses a machine learning model to analyze signs of fraudulent activity.
The input is the information about signs of fraudulent activity obtained in step 3, and the output is decision information for generating alerts. The model is built with Python and scikit-learn. 【0520】 Step 5: 【0521】 The server generates an alert and sends a warning to registered contacts via the notification means. The input is the decision information obtained in step 4, and the output is the warning notification sent to the contacts. The warning information is transmitted promptly via a communication means. 【0522】 Step 6: 【0523】 The user's home robot uses the dialogue means to directly warn the user of the risk of fraud. The input is the warning information generated in step 5, and the output is an audio warning to the user. A pre-configured message is played using speech synthesis technology. 【0524】 Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions. 【0525】 This invention is a technology that combines an emotion engine with a voice-based fraud detection system to detect potential fraud with greater accuracy and provide appropriate alerts to the user. Specific embodiments of this system are described below. 【0526】 Users use a dedicated device for everyday communication. This device has a built-in high-performance microphone that continuously captures surrounding conversations. The captured audio is converted into text data by a speech recognition processing unit installed in the device. 【0527】 The converted text data is sent from the terminal to the server. The server has an analysis device that processes the received text data and has the function of detecting signs of fraud by comparing it with fraud patterns stored in a database.
In addition, the server has an emotion engine that analyzes the user's emotional state from the user's voice. This allows the likelihood of fraud to be evaluated by taking into account not only the content of the voice but also the user's emotional information. 【0528】 As a concrete example, consider a scenario where a user receives a phone call requesting a loan. The device captures this audio and converts it into text data in real time. The server analyzes the text data and compares it against existing fraud patterns. Simultaneously, the emotion engine estimates the user's emotional state from their tone and intonation, and the server comprehensively assesses the likelihood of fraud. Based on these results, the server automatically generates an alert and notifies the user and their registered contacts. 【0529】 Furthermore, this system utilizes machine learning algorithms, enabling it to continuously learn new fraud techniques and emotional patterns. Over time, the system dynamically updates its database, always providing the most up-to-date fraud prevention measures. 【0530】 In this way, the system of the present invention not only detects fraud but also takes into account the user's emotions, providing alerts based on deeper insights and enabling fraud to be prevented. 【0531】 The following describes the processing flow. 【0532】 Step 1: 【0533】 The device continuously captures audio from the user's surroundings using a built-in high-performance microphone. This audio data is temporarily recorded in digital format. 【0534】 Step 2: 【0535】 The terminal converts the captured audio into text data in real time using a speech recognition processor. The converted text data is used for analyzing the possibility of fraud. 【0536】 Step 3: 【0537】 The terminal sends the converted text data to the server. A secure protocol is used for transmission so that the data is handled safely. 【0538】 Step 4: 【0539】 The server analyzes the received text data.
The analysis device refers to a database containing accumulated fraud patterns and checks if keywords and phrases match existing fraud techniques. 【0540】 Step 5: 【0541】 The server uses an emotion engine to understand the user's emotional state from their voice. It analyzes the tone, speed, and volume of their voice to determine their psychological state. 【0542】 Step 6: 【0543】 The server comprehensively evaluates the results of the text data analysis and the user's emotional state to determine the possibility of fraud and generate an alert. 【0544】 Step 7: 【0545】 The server sends the generated alert information to registered contacts. Family members, the police, and other relevant parties capable of taking appropriate action are notified. 【0546】 Step 8: 【0547】 The device issues a warning to the user. Through voice notifications and visual alerts on the display, it prompts the user to end the conversation or double-check. 【0548】 Step 9: 【0549】 The server continuously analyzes conversations and incorporates newly detected fraud patterns and emotional states into its learning database. Machine learning algorithms are used to continuously improve the system's accuracy. 【0550】 (Example 2) 【0551】 Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal." 【0552】 Traditional voice-based fraud detection systems focus on analyzing audio content, but evaluating fraud signs solely based on text data can be insufficient for accurate detection. In particular, they fail to consider emotional shifts and subtle nuances discernible from human speech, potentially leading to missed fraud risks. Furthermore, as fraud techniques evolve daily, existing databases must be constantly updated to accommodate the latest patterns. There is a need to solve these problems and provide a highly accurate and flexible fraud detection system. 
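Step 5 of the flow above refers to the tone, speed, and volume of the voice. As a rough illustration of what such cues could look like numerically, the sketch below computes two of them: volume as the root-mean-square amplitude of the samples, and speed as words per second. It assumes that samples are floats scaled to the range -1.0 to 1.0 and that the transcript is whitespace-delimited, which would not hold for Japanese text without a tokenizer. Both function names are hypothetical, and tone (pitch) is left out because it requires signal processing beyond this sketch.

```python
import math
from typing import List


def rms_volume(samples: List[float]) -> float:
    """Root-mean-square amplitude of audio samples scaled to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speaking_rate(transcript: str, duration_seconds: float) -> float:
    """Words per second, counting whitespace-separated tokens as words."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) / duration_seconds
```

Values such as these could serve as inputs to an emotion model; how the emotion engine actually maps voice features to an emotional state is not specified in the disclosure.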
【0553】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means. 【0554】 In this invention, the server includes means for acquiring voice information by an integration device, processing means for converting the voice information into natural language data, analysis means for comparing the natural language data with past fraud pattern information, and emotion analysis means for analyzing the acquired voice state information and evaluating emotion information. This makes it possible to consider not only the voice content but also human emotion information, evaluate the possibility of fraud with high accuracy, and notify warning information. Furthermore, by dynamically updating the information repository, it becomes possible to respond quickly to the latest fraud methods. 【0555】 "Voice information" refers to vibration data acquired from the surrounding audio input environment, and is data that is analyzed and treated as meaningful information. 【0556】 An "integration device" is a device that collects raw data acquired from various input environments and manages and processes it centrally. 【0557】 "Natural language data" refers to a collection of data in which linguistic expressions used by humans in everyday life have been converted into a text format that can be stored digitally. 【0558】 "Processing device means" refers to a combination of hardware and software for converting or analyzing input data into a specific format. 【0559】 "Analysis device means" refers to devices and technologies that examine input data and identify and evaluate its characteristics and patterns. 【0560】 "Fraudulent pattern information" refers to a collection of information stored in a database that records the characteristics of fraudulent behavior and fraudulent activities that have occurred in the past.
【0561】 "Emotional analysis device means" refers to a device or method for analyzing the tone, pitch, and other meta-information of speech to determine the emotional state of the speaker. 【0562】 "Warning information" refers to information generated to inform the recipient of the existence of a risk when certain conditions are met. 【0563】 "Signal output device means" refers to a device or technology for transferring analyzed and evaluated information to an end user or other system. 【0564】 This invention is implemented as a fraud detection system based on voice information. This system performs the collection, analysis, and warning issuance of voice information as a series of processes. 【0565】 The user first uses a dedicated device. This device has a built-in high-performance microphone that continuously acquires ambient sound information. The microphone is equipped with noise-canceling technology, which eliminates ambient noise and allows for clearer audio acquisition. For example, even in noisy environments, the voice of a specific speaker can be clearly captured. 【0566】 The terminal is equipped with a speech recognition processing unit to convert acquired voice information into natural language data. This unit incorporates an advanced speech recognition algorithm that converts raw voice data into text format. The converted text data is securely transmitted from the terminal to the server. Encryption technology is used during this process to protect data privacy. 【0567】 The server uses an analysis device to compare received text data with past fraud pattern information. This analysis device utilizes natural language processing technology to evaluate signs of fraud from context. It also has an emotion analyzer that determines emotional states from voice information. This emotion evaluation can further improve the accuracy with which the likelihood of fraud is assessed.
For example, if certain word choices or the tone of voice sound unnatural, they are recognized as signs of fraud. 【0568】 If, as a result, fraud is deemed highly likely, the server automatically generates warning information and sends it to the relevant recipients via the signal output device means. The system uses machine learning algorithms to continuously learn new fraud and emotion patterns, dynamically updating the information repository so that responses are always based on the latest information. 【0569】 For example, when a user is asked to transfer money, the system analyzes the content of the message and the emotional tone of the voice, and if there is a possibility of fraud, it promptly notifies the user or their registered emergency contact. 【0570】 An example of a prompt message would be, "Use this system to detect recently popular fraud techniques." In this way, the present invention is a system that provides more advanced fraud detection capabilities by simultaneously considering voice and emotional information. 【0571】 The flow of the specific processing in Example 2 will be explained using Figure 13. 【0572】 Step 1: 【0573】 The device acquires audio information from the environment using a high-performance microphone. The microphone utilizes noise cancellation to eliminate unwanted background noise and capture clearer audio. The input is physical audio information, and the output is electronic audio data. 【0574】 Step 2: 【0575】 The device sends the received audio data to its internal speech recognition processing unit, which converts it into natural language data. In this process, a speech recognition algorithm analyzes the audio waveform and converts it into text format. The input is audio data, and the output is text data. 【0576】 Step 3: 【0577】 The terminal sends the generated text data to the server. Before transmission, the data is encrypted to ensure its security.
The input is the converted text data, and the output is the encrypted text data. 【0578】 Step 4: 【0579】 The server uses an analysis device to compare the received text data with past fraudulent patterns. Natural language processing techniques are used to analyze the data and search for signs of fraud. The input is encrypted text data, and the output is an assessment of the likelihood of fraud. 【0580】 Step 5: 【0581】 The server simultaneously performs emotion analysis of the voice. The emotion analysis device analyzes the tone and intonation of the voice obtained from the voice data and evaluates the user's emotional state. The input is voice data, and the output is the emotion evaluation result. 【0582】 Step 6: 【0583】 The server combines the analyzed text data with the emotion evaluation results to comprehensively assess the likelihood of fraud. If a high probability of fraud is detected, the server automatically generates a warning. The input is the evaluation results of the text data and the emotion evaluation results, and the output is the warning information. 【0584】 Step 7: 【0585】 The server sends warning information to the user and pre-registered recipients. The information is sent via email or SMS, prompting the user to take immediate action. The input is warning information, and the output is notifications via email, SMS, etc. 【0586】 Step 8: 【0587】 The server uses machine learning algorithms to learn new fraud and emotion patterns, updating its database. This keeps the system able to respond to the latest fraudulent techniques. The input is the analyzed data set, and the output is the updated database. 【0588】 (Application Example 2) 【0589】 Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
【0590】 Conventional voice-based fraud detection systems analyze only the content of the voice, and as fraudulent methods become more sophisticated, their accuracy and reliability are limited. Furthermore, they cannot make comprehensive judgments that take into account the user's emotional state, leading to risks of false positives and missed scams. There is a need to solve these problems and prevent fraud before it occurs. 【0591】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means. 【0592】 In this invention, the server includes means for acquiring an acoustic signal, means for recognizing the acoustic signal and converting it into text data, means for analyzing the text data and comparing it with past fraud pattern data to detect signs of fraud, and means equipped with an emotion engine that analyzes the emotional state based on the detected fraud pattern. This enables highly accurate fraud detection that takes into account not only the content of the voice but also the emotional information of the user. 【0593】 An "acoustic signal" is an electrical signal obtained by converting air vibrations transmitted as sound, and it is the data that forms the basis of speech understanding. 【0594】 "Acquisition means" refers to a device or method for capturing an acoustic signal and inputting it into an electronic device. 【0595】 "Recognition means" refers to a device or method that processes acquired acoustic signals and interprets them as text data. 【0596】 "Text data" refers to data in text format output from the analysis of speech, and is information used for subsequent processing. 【0597】 "Analysis means" refers to an apparatus or method used for the purpose of detecting signs of fraud by comparing text data with past fraud patterns.
【0598】 "Means equipped with an emotion engine" refers to a device or method for analyzing the emotional state of a speaker based on text data obtained from an acoustic signal. 【0599】 The system for implementing this invention mainly consists of the cooperation between a terminal device and a server. The terminal device acquires acoustic signals using a high-performance microphone and converts them into text data in real time using a speech recognition library. Specifically, services such as Google Cloud Speech-to-Text are used. 【0600】 Text data acquired by the device is transmitted to a server via the internet. The server has analytical capabilities to compare the text data with a database of past fraud patterns. To identify signs of fraud, machine learning algorithms (e.g., TensorFlow) are implemented to perform text analysis. This analysis process also learns new fraud techniques and sentiment patterns, enabling decision-making based on the latest information. 【0601】 Furthermore, the server functions as a system equipped with an emotion engine. This analyzes the tone and intonation of the acoustic signal in real time to analyze the user's emotional state. By comparing the results of the emotion analysis with fraud patterns, it comprehensively evaluates the likelihood of fraud and generates notification information. The generated notification information is immediately sent to the user's terminal and registered communication destinations. 【0602】 As a concrete example, consider a scenario where a user receives a "same-day loan" offer over the phone. In this case, the audio signal is captured by the terminal and converted into text data. The server performs fraud pattern and sentiment analysis on the received data, and if it determines that there is a high probability of fraud, it creates an alert and notifies the user. 
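The comprehensive evaluation described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the text analysis and the emotion engine each yield a score between 0 and 1. The weight, threshold, function names, and message text are illustrative assumptions; the disclosure does not specify how the two results are combined.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tuning values; the disclosure gives no concrete numbers.
TEXT_WEIGHT = 0.7
ALERT_THRESHOLD = 0.6


@dataclass
class Assessment:
    overall_score: float
    alert: Optional[str]  # None when no alert is warranted


def _clamp(value: float) -> float:
    """Restrict a score to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def assess_call(fraud_score: float, emotion_score: float) -> Assessment:
    """Combine a text-based fraud score with an emotion-based score."""
    overall = (TEXT_WEIGHT * _clamp(fraud_score)
               + (1.0 - TEXT_WEIGHT) * _clamp(emotion_score))
    alert = None
    if overall >= ALERT_THRESHOLD:
        alert = (f"Possible fraud detected (score {overall:.2f}). "
                 "Consider ending the call.")
    return Assessment(overall, alert)
```

A weighted sum is only one possible way to merge the two results; it is used here because it keeps the contribution of each analysis easy to inspect.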
【0603】 Examples of prompts generated using AI models include the following: 【0604】 "Analyze the situation in which a user receives a suspicious offer via voice communication, assess the risk from both a fraudulent and emotional perspective, and create a scenario that generates an alert." 【0605】 The flow of a specific process in Application Example 2 will be explained using Figure 14. 【0606】 Step 1: 【0607】 The device uses a high-performance microphone to acquire ambient acoustic signals. The input is physical sound, and the output is an electrical audio signal. This signal is processed by a digital audio library. Specifically, the microphone captures ambient sound. 【0608】 Step 2: 【0609】 The device converts acquired audio signals into text data in real time using a speech recognition library (e.g., Google Cloud Speech-to-Text). The input is an audio signal, and the output is text data. Specifically, the process involves inputting the audio signal into a speech recognition model and converting it into text data. 【0610】 Step 3: 【0611】 The terminal sends the converted character data to the server via the internet. The input is character data, and the output is the transmission of data to the server. Specifically, the terminal uploads text data to the server using its network connection. 【0612】 Step 4: 【0613】 The server compares the received text data with a fraud pattern database to analyze signs of fraud. The input is the received text data, and the output is the fraud detection result. Specifically, it uses a machine learning algorithm to perform the comparison with the database. 【0614】 Step 5: 【0615】 The server uses an emotion engine to analyze the speaker's emotional state in relation to text data. The input is text data, and the output is the emotion analysis result. Specifically, the emotion model evaluates the tone and context of the text to generate emotion information. 
【0616】 Step 6: 【0617】 The server integrates fraud detection results and sentiment analysis results to comprehensively evaluate the likelihood of fraud. The inputs are fraud detection results and sentiment analysis results, and the output is the overall evaluation and notification information. Specifically, it calculates the probability of fraud and generates a notification message for the user based on it. 【0618】 Step 7: 【0619】 Based on the evaluation results, if the server determines there is a risk of fraud, it generates an alert and sends notification information to the user's terminal and registered communication destinations. The input is the overall evaluation and notification information, and the output is the final user notification. Specifically, an alert message is generated and delivered to the user and communication destinations in real time. 【0620】 The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data. 【0621】 Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network.
The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization. 【0622】 In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314. 【0623】 [Fourth Embodiment] 【0624】 Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment. 【0625】 As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server. 【0626】 The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network). 【0627】 The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. 
The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52. 【0628】 The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46. 【0629】 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision). 【0630】 Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner. 【0631】 The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes. 【0632】 Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56. 
【0633】 The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30. 【0634】 The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290. 【0635】 In the robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48. 【0636】 Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal". 【0637】 This invention is a voice-based fraud detection system, particularly aimed at responding quickly to fraud targeting the elderly. The embodiments of this invention will be described below with specific examples. 【0638】 First, the user engages in everyday conversation using a dedicated device. The device is equipped with a high-performance microphone that can continuously capture ambient sound. Once sound is captured, the device's speech recognition engine converts the speech into text in real time. 【0639】 This converted text data is sent to a server.
The server has a processing unit that analyzes the received text data, and by referring to a database that stores past fraud patterns, it detects specific phrases and keywords that indicate signs of fraud. 【0640】 If signs of fraud are detected, the server generates an alert and sends the alert information to registered contacts, such as the user's family or the police. The server also sends a warning signal to the device to inform the user of the danger. 【0641】 For example, if a user receives a phone call asking for bank account information, the terminal captures the audio and converts it into text through speech recognition. The server then compares the text against a fraud pattern database to detect potentially fraudulent keywords such as "bank account" and "information request." The server instantly generates an alert and sends notifications to the user's mobile phone, email, and registered emergency contacts. 【0642】 This system also utilizes machine learning algorithms, allowing it to accumulate more data over time and improve the accuracy of fraud pattern detection. When new fraud methods emerge, their unique patterns are learned, and the database is automatically updated. 【0643】 In this way, the system of the present invention can reduce the risk of elderly people becoming victims of fraud and provide an environment in which they can communicate using voice with peace of mind. 【0644】 The following describes the processing flow. 【0645】 Step 1: 【0646】 The device continuously captures ambient sound using its built-in microphone. The sound is temporarily recorded as digital data. 【0647】 Step 2: 【0648】 The terminal converts captured audio data into text data using a speech recognition processing unit. This conversion process is performed in real time, taking into account audio interruptions and noise. 【0649】 Step 3: 【0650】 The device sends the converted text data to the server.
The data is transmitted using a secure protocol and processed in a privacy-protected manner. 【0651】 Step 4: 【0652】 The server processes the received text data using an analysis device and compares it against a database of previously accumulated fraud patterns. At this stage, it checks whether specific keywords or phrases are included. 【0653】 Step 5: 【0654】 The server evaluates whether signs of fraud have been detected. If fraud is highly likely, it generates an alert and sets its details. 【0655】 Step 6: 【0656】 The server sends the generated alert information to registered contacts. These contacts include family members and police officers who can respond to emergencies. 【0657】 Step 7: 【0658】 The device immediately alerts the user. It communicates the warning audibly and visually on the display, prompting the user to interrupt or reconsider the conversation. 【0659】 Step 8: 【0660】 The server continuously monitors subsequent conversation data and incorporates new fraud patterns into its learning database. Machine learning algorithms are used to improve future detection accuracy. 【0661】 (Example 1) 【0662】 Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal". 【0663】 Fraudulent schemes targeting the elderly are becoming more sophisticated, leaving elderly people vulnerable to becoming victims in their daily lives. Therefore, there is a need for technologies that can respond quickly and effectively when elderly people face the risk of fraud. Furthermore, because fraudulent methods are constantly evolving, static analysis based on past data is insufficient. 【0664】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
【0665】 In this invention, the server includes acquisition means for acquiring voice information, conversion means for converting voice information into text data, and discrimination means for comparing the text data with existing fraud pattern information to determine the possibility of fraud. This enables real-time detection of the risk of voice fraud that may occur in the daily lives of elderly people, prompt warnings, and appropriate countermeasures. 【0666】 "Means for acquiring audio information" refers to a device that senses ambient sounds and converts them into electrical signals, and has the function of capturing audio data in real time. 【0667】 "Conversion means for converting audio information into text data" refers to a technology that analyzes acquired audio data and converts it into corresponding text, and is implemented by a speech recognition engine. 【0668】 "A means of discrimination that compares text data with existing fraud pattern information to determine the possibility of fraud" refers to a process that compares converted text data with fraud pattern information in a database and automatically detects signs of fraud. 【0669】 "Notification means" refers to a means of communicating detected fraud alerts to the user and designated contacts in the form of voice messages or text messages. 【0670】 "Analysis means" refers to techniques that use natural language processing technology to perform advanced analysis of text data and extract signs of fraud, and involve the use of large-scale language model algorithms. 【0671】 "Means for analysis based on learning algorithms" refers to the process of using machine learning technology to learn from analysis data and improve the accuracy of recognizing fraud patterns. 【0672】 "Means for automatically updating information" refers to technology that quickly reflects newly detected fraud patterns in the database, ensuring that decisions are always based on the latest information.
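As one way to picture the "means for automatically updating information", the following minimal Python sketch keeps fraud phrases in an in-memory store and merges newly detected phrases into it. The class name, method names, and normalization rule are assumptions made for illustration; the disclosure does not describe the database schema or the update procedure.

```python
class FraudPatternStore:
    """In-memory stand-in for the fraud pattern database (hypothetical)."""

    def __init__(self, phrases=()):
        self._phrases = set()
        self.version = 0  # incremented whenever the store actually changes
        self.update(phrases)

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so duplicates are detected.
        return " ".join(phrase.lower().split())

    def update(self, new_phrases) -> int:
        """Merge newly detected phrases; return how many were actually added."""
        normalized = {self._normalize(p) for p in new_phrases}
        normalized.discard("")  # ignore empty entries
        added = normalized - self._phrases
        if added:
            self._phrases |= added
            self.version += 1
        return len(added)

    def phrases(self):
        """Return the stored phrases in sorted order."""
        return sorted(self._phrases)
```

The version counter is an assumed convenience that lets other components notice when the stored patterns have changed.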
【0673】 This invention describes a specific embodiment for implementing a voice-based fraud detection system targeting the elderly. 【0674】 Users carry a dedicated terminal to run this system. This terminal is equipped with a high-performance microphone and can capture ambient sound in high quality using an audio processing chip such as Realtek. The terminal converts the acquired audio information into text data in real time using the Google Speech-to-Text API, etc. Once this process is complete, the text data is sent to the server via a secure communication protocol (e.g., HTTPS). 【0675】 When the server receives text data, it performs a detailed analysis using natural language processing techniques. This involves using large-scale language models such as BERT and GPT to identify specific phrases and keywords that may indicate fraud. Based on this information, the server compares it with a database of past fraud patterns to determine the likelihood of fraud. 【0676】 If the analysis determines that there is a risk of fraud, the server quickly generates an alert. This alert is sent via email or SMS to the user or pre-registered emergency contacts (e.g., family or security agencies). A warning signal is also sent to the device, prompting the user to be vigilant through audio and screen notifications. 【0677】 This system also features a function that continuously learns fraud patterns based on machine learning algorithms. Therefore, it maintains the ability to detect new fraud methods even when they emerge, and the database is automatically updated. 【0678】 For example, if a user receives a call asking for their bank account information, the terminal immediately converts this audio into text. After analysis by the server, the keywords "bank account" and "information provision" are recognized as potentially fraudulent. The server then generates an alert and sends a warning to the user and their emergency contacts. 
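The keyword check in the bank account example above can be sketched as follows. This is a simplified Python illustration that uses plain substring matching rather than the BERT or GPT analysis mentioned in the text; the pattern weights, threshold, and function names are hypothetical values chosen for the example.

```python
# Hypothetical fraud patterns with illustrative weights.
FRAUD_PATTERNS = {
    "bank account": 0.5,
    "information provision": 0.3,
}
RISK_THRESHOLD = 0.5  # assumed cut-off for raising an alert


def detect_fraud_signs(text: str):
    """Return (matched keywords, risk score capped at 1.0)."""
    lowered = text.lower()
    matched = [kw for kw in FRAUD_PATTERNS if kw in lowered]
    score = min(1.0, sum(FRAUD_PATTERNS[kw] for kw in matched))
    return matched, score


def should_alert(text: str) -> bool:
    """Decide whether the recognized text warrants an alert."""
    _, score = detect_fraud_signs(text)
    return score >= RISK_THRESHOLD
```

In a deployment along the lines described, the dictionary would be replaced by queries against the fraud pattern database, and the score by the output of the language model analysis.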
【0679】 An example of a prompt would be: "A system that detects signs of fraud from user voice inquiries and issues an alert." This would enable elderly people to communicate more safely in their daily lives. 【0680】 The flow of the specific processing in Example 1 will be explained using Figure 11. 【0681】 Step 1: 【0682】 The device captures ambient sound through a high-performance microphone. The input is ambient sound, and the output is digital audio data. Specifically, the device continuously collects sound and applies noise cancellation to improve the quality of the audio data. 【0683】 Step 2: 【0684】 The device converts captured audio data into text data using speech recognition technology such as the Google Speech-to-Text API. The input is digital audio data, and the output is recognized text data. Specifically, the device segments the audio data into chunks of a fixed size and passes them sequentially to the speech recognition engine. 【0685】 Step 3: 【0686】 The terminal sends the converted text data to the server via a secure communication protocol (e.g., HTTPS). The input is text data, and the output is a confirmation that the data was sent to the server. Specifically, the terminal encodes the text data in packet format and initiates network communication. 【0687】 Step 4: 【0688】 The server analyzes the received text data by using natural language processing techniques to break it down into phrases and search for specific keywords that may indicate fraud. The input is continuous text data, and the output is a data structure showing the occurrence of keywords. Specifically, the server calls language models such as BERT or GPT to analyze the data. 【0689】 Step 5: 【0690】 The server compares the analysis results with a fraud pattern database to determine signs of fraud. The input is data on the occurrence of keywords, and the output is risk assessment data that quantifies the likelihood of fraud.
Specifically, the server executes database queries and calculates the degree of match with the corresponding fraud pattern. 【0691】 Step 6: 【0692】 If the server determines that there is a high risk of fraud, it generates an alert and sends an alert notification to the user and registered contacts. The input is risk assessment data, and the output is a notification via email or SMS. Specifically, the server sends emails using the SMTP protocol and sends text messages via an SMS gateway. 【0693】 Step 7: 【0694】 The server updates its machine learning model based on the collected data, continuously improving the fraud detection algorithm. The input is a new dataset, and the output is the updated trained model. Specifically, the server periodically runs a batch training process to improve the model's accuracy. 【0695】 (Application Example 1) 【0696】 Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal". 【0697】 Fraud targeting the elderly remains a serious social problem, and many of these frauds involve sophisticated voice scams. Existing prevention measures struggle to quickly and accurately detect fraud and prevent victimization. The risk is particularly high for elderly people living alone. Therefore, there is a need for a system that can detect signs of fraudulent activity from everyday voice communications and issue prompt warnings. 【0698】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means. 【0699】 In this invention, the server includes an acoustic input means for acquiring an acoustic signal, a conversion means for converting the acoustic signal into text information, and a matching means for detecting signs of fraudulent activity by comparing the text information with past fraudulent activity pattern data. 
This makes it possible to quickly detect signs of fraud from everyday conversations, promptly notify registered communication recipients of warnings, and directly alert the user through dialogue means. 【0700】 "Acoustic input means" refers to devices or equipment for continuously acquiring ambient sounds and conversational sounds. 【0701】 "Conversion means" refers to a process or device that converts acquired acoustic signals into textual information using an appropriate algorithm. 【0702】 "Verification means" refers to a process or system for detecting signs of fraud by comparing converted character information with existing fraud pattern data. 【0703】 "Notification means" refers to a process or function for sending a warning to a registered recipient regarding the potential for detected fraudulent activity. 【0704】 "Dialogue means" refers to a device or function that allows a system to communicate directly with the user and provide warnings. 【0705】 "Machine learning techniques" are algorithms and technologies used to automatically learn patterns and knowledge from data. 【0706】 An "information aggregation device" is a database or system for storing collected data and detected patterns, and updating them as needed. 【0707】 This invention provides a specific embodiment for implementing a system for detecting voice fraud targeting the elderly. The system comprises an acoustic input means, a conversion means using advanced speech recognition, a matching means for detecting signs of fraudulent activity, a notification means, and a dialogue means. 【0708】 The server uses a high-performance microphone as an acoustic input method to continuously capture everyday conversations and ambient sounds. The captured acoustic signals are converted into text in real time using a cloud-based speech recognition service such as Google Cloud Speech-to-Text via a conversion device. 
The converted text is sent to the server and compared with past fraud pattern data using a matching device employing machine learning techniques. AI models built with Python and scikit-learn are used to quickly and accurately analyze signs of fraud. 【0709】 If signs of fraud are detected, the server generates a warning through a notification system and sends the warning to registered communication recipients. Simultaneously, a robot installed in the home uses a dialogue system to directly warn the user of the risk of fraud. This reduces the risk of elderly people becoming victims of fraud and provides a safe communication environment. 【0710】 As a concrete example, consider a scenario where a user receives a suspicious phone call and is asked for banking information. This audio is immediately captured and compared against a fraud pattern database to detect keywords such as "bank" and "account information." The server instantly generates an alert and notifies the user's family and the police. Furthermore, the robot directly informs the user that "providing this information is dangerous." 【0711】 An example of a prompt for the generative AI model is provided in the format: "The following text is from a phone conversation. Please analyze if it may be a scam: 'Do you need my account information?'" 【0712】 The flow of a specific process in Application Example 1 will be explained using Figure 12. 【0713】 Step 1: 【0714】 The device uses a high-performance microphone to capture ambient sound. The input is ambient noise and conversation, and the output is the captured acoustic signal. Pre-processing, such as compression and noise reduction, is performed to capture the audio in real time. 【0715】 Step 2: 【0716】 The device uses a speech recognition service such as Google Cloud Speech-to-Text to convert the acoustic signal into text information. The input is the acoustic signal obtained in step 1, and the output is the converted text information.
A speech recognition algorithm is applied to convert the acoustic signal into text data. 【0717】 Step 3: 【0718】 The server compares the text information with past fraudulent activity pattern data. The input is the text information obtained in step 2, and the output is information indicating signs of fraudulent activity. Known fraudulent patterns are referenced from the database and compared with the text information. 【0719】 Step 4: 【0720】 The server uses a machine learning model to analyze signs of fraudulent activity. The input is the information about signs of fraudulent activity obtained in step 3, and the output is decision information for generating alerts. The analysis uses models built with Python and scikit-learn. 【0721】 Step 5: 【0722】 The server generates an alert and sends a warning to registered contacts via a notification system. The input is the decision information obtained in step 4, and the output is the warning notification sent to the contacts. The warning information is transmitted promptly. 【0723】 Step 6: 【0724】 The user's home robot uses a dialogue mechanism to directly warn the user of the risk of fraud. The input is the warning information generated in step 5, and the output is an audio warning to the user. A pre-configured message is played using speech synthesis technology. 【0725】 Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions. 【0726】 This invention is a technology that combines an emotion engine with a voice-based fraud detection system to detect potential fraud with greater accuracy and provide appropriate alerts to the user. Specific embodiments for implementing this system are described below. 【0727】 Users use a dedicated device for everyday communication.
This device has a built-in high-performance microphone that continuously captures surrounding conversations. The captured audio is converted into text data by a speech recognition processing unit installed in the device. 【0728】 The converted text data is sent from the terminal to the server. The server has an analysis device that processes the received text data and has the function of detecting signs of fraud by comparing it with fraud patterns stored in a database. In addition, the server has an emotion engine that analyzes the emotional state from the user's voice. This allows the likelihood of fraud to be evaluated by taking into account not only the content of the voice but also the user's emotional information. 【0729】 As a concrete example, consider a scenario where a user receives a phone call requesting a loan. The device captures this audio and converts it into text data in real time. The server analyzes the text data and compares it against existing fraud patterns. Simultaneously, an emotion engine understands the user's emotional state from their tone and intonation, and comprehensively assesses the likelihood of fraud. Based on these results, the server automatically generates an alert and notifies the user and their registered contacts. 【0730】 Furthermore, this system utilizes machine learning algorithms, enabling it to continuously learn new fraud techniques and emotional patterns. Over time, the system dynamically updates its database, always providing the most up-to-date fraud prevention measures. 【0731】 In this way, the system of the present invention not only detects fraud but also takes into account the user's emotions, providing alerts based on deeper insights and enabling fraud to be prevented. 【0732】 The following describes the processing flow. 【0733】 Step 1: 【0734】 The device continuously captures audio from the user's surroundings using a built-in high-performance microphone. This audio data is temporarily recorded in digital format. 
【0735】 Step 2: 【0736】 The terminal converts the captured audio into text data in real time using a speech recognition processor. The converted text data is used for analyzing the possibility of fraud. 【0737】 Step 3: 【0738】 The terminal sends the converted text data to the server. Security protocols are used for transmission to ensure the data is processed safely. 【0739】 Step 4: 【0740】 The server analyzes the received text data. The analysis device refers to a database containing accumulated fraud patterns and checks if keywords and phrases match existing fraud techniques. 【0741】 Step 5: 【0742】 The server uses an emotion engine to understand the user's emotional state from their voice. It analyzes the tone, speed, and volume of their voice to determine their psychological state. 【0743】 Step 6: 【0744】 The server comprehensively evaluates the results of the text data analysis and the user's emotional state to determine the possibility of fraud and generate an alert. 【0745】 Step 7: 【0746】 The server sends the generated alert information to registered contacts. Family members, the police, and other relevant parties capable of taking appropriate action are notified. 【0747】 Step 8: 【0748】 The device issues a warning to the user. Through voice notifications and visual alerts on the display, it prompts the user to end the conversation or double-check. 【0749】 Step 9: 【0750】 The server continuously analyzes conversations and incorporates newly detected fraud patterns and emotional states into its learning database. Machine learning algorithms are used to continuously improve the system's accuracy. 【0751】 (Example 2) 【0752】 Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal". 【0753】 Traditional voice-based fraud detection systems focus on analyzing audio content, but evaluating fraud signs solely based on text data can be insufficient for accurate detection. 
In particular, they fail to consider emotional shifts and subtle nuances discernible from human speech, potentially leading to missed fraud risks. Furthermore, as fraud techniques evolve daily, existing databases must be constantly updated to accommodate the latest patterns. There is a need to solve these problems and provide a highly accurate and flexible fraud detection system. 【0754】 The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means. 【0755】 In this invention, the server includes means for acquiring voice information by an integration device, processing means for converting the voice information into natural language data, analysis means for comparing the natural language data with past fraud pattern information, and emotion analysis means for analyzing the acquired voice state information and evaluating emotion information. This makes it possible to consider not only the voice content but also human emotion information, evaluate the possibility of fraud with high accuracy, and notify warning information. Furthermore, by dynamically updating the information repository, it becomes possible to respond quickly to the latest fraud methods. 【0756】 "Audio information" refers to vibration data acquired from the surrounding audio input environment, and is data that is analyzed and treated as meaningful information. 【0757】 An "integration device" is a device that collects raw data acquired from various input environments and manages and processes it centrally. 【0758】 "Natural language data" refers to a collection of data in which linguistic expressions used by humans in everyday life have been converted into a text format that can be stored digitally. 【0759】 "Processing device means" refers to a combination of hardware and software for converting or analyzing input data into a specific format. 
【0760】 "Analysis device means" refers to devices and technologies that examine input data and identify and evaluate its characteristics and patterns. 【0761】 "Fraudulent pattern information" refers to a collection of information stored in a database that records the characteristics of fraudulent behavior and fraudulent activities that have occurred in the past. 【0762】 "Emotional analysis device means" refers to a device or method for analyzing the tone, pitch, and other meta-information of speech to determine the emotional state of the speaker. 【0763】 "Warning information" refers to information generated to inform the recipient of the existence of a risk when certain conditions are met. 【0764】 "Signal output device means" refers to a device or technology for transferring analyzed and evaluated information to an end user or other system. 【0765】 This invention is implemented as a fraud detection system based on voice information. This system performs the collection, analysis, and warning issuance of voice information as a series of processes. 【0766】 The user first uses a dedicated device. This device has a built-in high-performance microphone that continuously acquires ambient sound information. During this process, the microphone is equipped with noise-canceling technology, which eliminates ambient noise and allows for clearer audio acquisition. For example, even in noisy environments, the voice of a specific speaker can be clearly captured. 【0767】 The terminal is equipped with a speech recognition processing unit to convert acquired voice information into natural language data. This unit incorporates an advanced speech recognition algorithm that converts raw voice data into text format. The converted text data is securely transmitted from the terminal to the server. Encryption technology is used during this process to protect data privacy. 【0768】 The server uses an analysis device to compare received text data with past fraud pattern information. 
This analysis device uses natural language processing technology to evaluate signs of fraud from context. It also has an emotion analyzer that determines emotional states from voice information. This emotion evaluation can further improve the accuracy of the fraud likelihood assessment. For example, if certain word choices or the tone of voice sound unnatural, this is recognized as a sign of fraud. 【0769】 As a result, if fraud is deemed highly likely, the server automatically generates warning information and sends it to the relevant recipients via the signal output device. This process uses machine learning algorithms to continuously learn new fraud and emotion patterns, dynamically updating the information repository so that responses reflect the latest patterns. 【0770】 For example, when the user receives a request to transfer money, the system analyzes the content of the message and the emotional tone of the voice, and if there is a possibility of fraud, it promptly notifies the user or their registered emergency contact. 【0771】 An example of a prompt message would be, "Use this system to detect recently popular fraud techniques." In this way, the present invention is a system that provides more advanced fraud detection capabilities by simultaneously considering voice and emotional information. 【0772】 The flow of the specific processing in Example 2 will be explained using Figure 13. 【0773】 Step 1: 【0774】 The user's device acquires audio information from the environment with a high-performance microphone. The microphone utilizes noise cancellation to eliminate unwanted background noise and capture clearer audio. The input is physical audio information, and the output is electronic audio data. 【0775】 Step 2: 【0776】 The device sends the acquired audio data to its internal speech recognition processing unit, which converts it into natural language data. In this process, a speech recognition algorithm analyzes the audio waveform and converts it into text format. 
The input is audio data, and the output is text data. 【0777】 Step 3: 【0778】 The terminal sends the generated text data to the server. Before transmission, the data is encrypted to ensure its security. The input is the converted text data, and the output is the encrypted text data. 【0779】 Step 4: 【0780】 The server uses an analysis device to compare the received text data with past fraudulent patterns. Natural language processing techniques are used to analyze the data and search for signs of fraud. The input is encrypted text data, and the output is an assessment of the likelihood of fraud. 【0781】 Step 5: 【0782】 The server simultaneously performs emotion analysis of the voice. The emotion analysis device analyzes the tone and intonation of the voice obtained from the voice data and evaluates the user's emotional state. The input is voice data, and the output is the emotion evaluation result. 【0783】 Step 6: 【0784】 The server combines the analyzed text data with the sentiment evaluation results to comprehensively assess the likelihood of fraud. If a high probability of fraud is detected, the server automatically generates a warning. The input is the evaluation results of the text data and the sentiment evaluation results, and the output is the warning information. 【0785】 Step 7: 【0786】 The server sends warning information to the user and pre-registered recipients. The information is sent via email or SMS, prompting the user to take immediate action. The input is warning information, and the output is notifications via email, SMS, etc. 【0787】 Step 8: 【0788】 The server uses machine learning algorithms to learn new fraud and sentiment patterns, updating its database. This optimizes the system to constantly respond to the latest fraudulent techniques. The input is the analyzed data set, and the output is the updated database. 【0789】 (Application Example 2) 【0790】 Next, we will explain application example 2. 
In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal". 【0791】 Conventional voice-based fraud detection systems analyze only the content of the voice, and as fraudulent methods become more sophisticated, their accuracy and reliability are limited. Furthermore, they cannot make comprehensive judgments that take into account the user's emotional state, leading to risks of false positives and missed scams. There is a need to solve these problems and prevent fraud before it occurs. 【0792】 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means. 【0793】 In this invention, the server includes means for acquiring an acoustic signal, means for recognizing the acoustic signal and converting it into text data, means for analyzing the text data and comparing it with past fraud pattern data to detect signs of fraud, and means for having an emotion engine that analyzes the emotional state based on the detected fraud pattern. This enables highly accurate fraud detection that takes into account not only the content of the fraud but also the emotional information of the user. 【0794】 An "acoustic signal" is an electrical signal obtained by converting air vibrations transmitted as sound into an electrical signal, and it is the data that forms the basis of speech understanding. 【0795】 "Acquisition means" refers to a device or method for capturing an acoustic signal and inputting it into an electronic device. 【0796】 "Recognition means" refers to a device or method that processes acquired acoustic signals and interprets them as character data. 【0797】 "Text data" refers to data in text format output from the analysis of speech, and is information used for subsequent processing. 
【0798】 "Analysis means" refers to an apparatus or method used for the purpose of detecting signs of fraud by comparing textual data with past fraud patterns. 【0799】 "Means equipped with an emotion engine" refers to a device or method for analyzing the emotional state of a speaker based on text data obtained from an acoustic signal. 【0800】 The system for implementing this invention mainly consists of the cooperation between a terminal device and a server. The terminal device acquires acoustic signals using a high-performance microphone and converts them into text data in real time using a speech recognition library. Specifically, services such as Google Cloud Speech-to-Text are used. 【0801】 Text data acquired by the device is transmitted to a server via the internet. The server has analytical capabilities to compare the text data with a database of past fraud patterns. To identify signs of fraud, machine learning algorithms (e.g., TensorFlow) are implemented to perform text analysis. This analysis process also learns new fraud techniques and sentiment patterns, enabling decision-making based on the latest information. 【0802】 Furthermore, the server functions as a system equipped with an emotion engine. This analyzes the tone and intonation of the acoustic signal in real time to analyze the user's emotional state. By comparing the results of the emotion analysis with fraud patterns, it comprehensively evaluates the likelihood of fraud and generates notification information. The generated notification information is immediately sent to the user's terminal and registered communication destinations. 【0803】 As a concrete example, consider a scenario where a user receives a "same-day loan" offer over the phone. In this case, the audio signal is captured by the terminal and converted into text data. 
The server performs fraud pattern and sentiment analysis on the received data, and if it determines that there is a high probability of fraud, it creates an alert and notifies the user. 【0804】 Examples of prompts generated using AI models include the following: 【0805】 "Analyze the situation in which a user receives a suspicious offer via voice communication, assess the risk from both a fraudulent and emotional perspective, and create a scenario that generates an alert." 【0806】 The flow of a specific process in Application Example 2 will be explained using Figure 14. 【0807】 Step 1: 【0808】 The device uses a high-performance microphone to acquire ambient acoustic signals. The input is physical sound, and the output is an electrical audio signal. This signal is processed by a digital audio library. Specifically, the microphone captures ambient sound. 【0809】 Step 2: 【0810】 The device converts acquired audio signals into text data in real time using a speech recognition library (e.g., Google Cloud Speech-to-Text). The input is an audio signal, and the output is text data. Specifically, the process involves inputting the audio signal into a speech recognition model and converting it into text data. 【0811】 Step 3: 【0812】 The terminal sends the converted character data to the server via the internet. The input is character data, and the output is the transmission of data to the server. Specifically, the terminal uploads text data to the server using its network connection. 【0813】 Step 4: 【0814】 The server compares the received text data with a fraud pattern database to analyze signs of fraud. The input is the received text data, and the output is the fraud detection result. Specifically, it uses a machine learning algorithm to perform the comparison with the database. 【0815】 Step 5: 【0816】 The server uses an emotion engine to analyze the speaker's emotional state in relation to text data. The input is text data, and the output is the emotion analysis result. 
Specifically, the emotion model evaluates the tone and context of the text to generate emotion information. 【0817】 Step 6: 【0818】 The server integrates fraud detection results and sentiment analysis results to comprehensively evaluate the likelihood of fraud. The inputs are fraud detection results and sentiment analysis results, and the output is the overall evaluation and notification information. Specifically, it calculates the probability of fraud risk and generates a notification message for the user based on that. 【0819】 Step 7: 【0820】 Based on the evaluation results, if the server determines there is a risk of fraud, it generates an alert and sends notification information to the user's terminal and registered communication destinations. The input is the overall evaluation and notification information, and the output is the final user notification. Specifically, an alert message is generated and delivered to the user and communication destinations in real time. 【0821】 The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data. 【0822】 Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. 
The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. 【0823】 In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414. 【0824】 Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion. 【0825】 Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions located there. Further out on the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. 
Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together. 【0826】 These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression. 【0827】 The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible (expressed in actions) it becomes. 【0828】 Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. 
The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant. 【0829】 The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more." 【0830】 The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values. 【0831】 The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format. 
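As one illustration of the software-program form mentioned above, the combined evaluation in Steps 4 to 6 of Application Example 2 might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the phrase weights in FRAUD_PATTERNS, the function names, the 0.7 text weight, and the 0.5 alert threshold are all hypothetical, and the emotion score is assumed to arrive from the emotion engine as a number between 0 (calm) and 1 (highly agitated). Simple substring matching stands in here for the machine learning comparison described in Step 4.

```python
from dataclasses import dataclass

# Hypothetical fraud phrases and weights; the actual contents of the
# fraud pattern database are not disclosed in the source text.
FRAUD_PATTERNS = {
    "same-day loan": 0.6,
    "transfer the money": 0.7,
    "gift card": 0.8,
}


@dataclass
class Evaluation:
    text_score: float         # Step 4 result
    emotion_score: float      # Step 5 result (assumed to be supplied)
    fraud_probability: float  # Step 6 combined result
    alert: bool               # whether notification should be generated


def text_fraud_score(text: str) -> float:
    """Step 4 stand-in: highest weight among patterns found in the transcript."""
    lowered = text.lower()
    return max(
        (weight for phrase, weight in FRAUD_PATTERNS.items() if phrase in lowered),
        default=0.0,
    )


def evaluate(text: str, emotion_score: float,
             text_weight: float = 0.7, threshold: float = 0.5) -> Evaluation:
    """Step 6 stand-in: weighted combination of text and emotion scores."""
    if not 0.0 <= emotion_score <= 1.0:
        raise ValueError("emotion_score must be between 0 and 1")
    t = text_fraud_score(text)
    probability = text_weight * t + (1.0 - text_weight) * emotion_score
    return Evaluation(t, emotion_score, probability, probability >= threshold)
```

Under these assumed weights, evaluate("We can offer you a same-day loan today", emotion_score=0.9) sets the alert flag, whereas the same emotion score with a transcript that matches no pattern does not. A weighted sum is only one possible way to integrate the two results; the source text does not specify the combination rule.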
【0832】 In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data. 【0833】 In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56. 【0834】 Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12. 【0835】 Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56. 【0836】 The following types of processors can be used as hardware resources to perform specific processing. 
Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory. 【0837】 The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor. 【0838】 Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources. 【0839】 Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. 
Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose. 【0840】 The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above. 【0841】 All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference. 【0842】 The following is further disclosed regarding the embodiments described above. 【0843】 (Claim 1) 【0844】 An input device for capturing audio, 【0845】 A speech recognition processing device for converting speech into text data, 【0846】 A processing device that compares text data with past fraud pattern data to detect signs of fraud, 【0847】 An output device that outputs an alert about the possibility of fraud and notifies registered contacts, 【0848】 A system that includes this. 
【0849】 (Claim 2) 【0850】 The system according to claim 1, further comprising a processing unit for analyzing signs of fraud based on a machine learning algorithm. 【0851】 (Claim 3) 【0852】 The system according to claim 1, further comprising a processing device for dynamically updating a database based on detected fraud patterns. 【0853】 【0854】 "Example 1" 【0855】 (Claim 1) 【0856】 A means for acquiring audio information, 【0857】 A conversion means for converting audio information into text data, 【0858】 A means of discrimination that compares text data with existing fraud pattern information in order to determine the possibility of fraud, 【0859】 A notification system that outputs an alert about the possibility of fraud and notifies registered contact information, 【0860】 An analysis method that utilizes natural language processing technology to analyze character data, 【0861】 A system that includes this. 【0862】 (Claim 2) 【0863】 The system according to claim 1, further comprising means for analyzing signs of fraud based on a learning algorithm. 【0864】 (Claim 3) 【0865】 The system according to claim 1, further comprising means for automatically updating information based on identified fraud patterns. 【0866】 "Application Example 1" 【0867】 (Claim 1) 【0868】 An acoustic input means for acquiring an acoustic signal, 【0869】 A conversion means for converting acoustic signals into text information, 【0870】 A matching means for detecting signs of fraudulent activity by comparing textual information with past fraudulent activity pattern data, 【0871】 A notification mechanism for outputting a warning about the possibility of fraudulent activity and notifying registered communication recipients, 【0872】 A means of dialogue for directly interacting with humans and providing warnings, 【0873】 A system that includes this. 【0874】 (Claim 2) 【0875】 The system according to claim 1, further comprising analytical means for analyzing signs of fraudulent activity based on machine learning techniques. 
【0876】 (Claim 3) 【0877】 The system according to claim 1, further comprising update means for dynamically updating an information accumulating device based on detected fraudulent activity patterns. 【0878】 "Example 2 of combining an emotion engine" 【0879】 (Claim 1) 【0880】 A means for acquiring audio information by an integrated device, 【0881】 Processing device for converting speech information into natural language data, 【0882】 An analysis device means for comparing natural language data with past fraudulent pattern information, 【0883】 An emotion analysis device means that analyzes acquired voice state information and evaluates emotion information, 【0884】 A signal output device means that generates warning information considering the possibility of fraud and notifies relevant information recipients, 【0885】 A system that includes this. 【0886】 (Claim 2) 【0887】 The system according to claim 1, further comprising analytical device means for analyzing signs of fraud based on a machine learning algorithm. 【0888】 (Claim 3) 【0889】 The system according to claim 1, further comprising an analysis device means for dynamically updating the information repository based on detected fraudulent patterns. 【0890】 "Application example 2 when combining with an emotional engine" 【0891】 (Claim 1) 【0892】 A means for acquiring an acoustic signal, 【0893】 A recognition means for converting acoustic signals into text data, 【0894】 An analytical method that compares text data with past fraud pattern data in order to detect signs of fraud, 【0895】 A notification method that outputs information about the possibility of fraud and transmits it to registered communication recipients, 【0896】 A means equipped with an emotion engine that analyzes emotional states based on detected fraud patterns, 【0897】 A system that includes this. 
【0898】 (Claim 2) 【0899】 The system according to claim 1, comprising a learning means for analyzing signs of fraud based on a machine learning algorithm and learning novel fraud methods and emotional patterns. 【0900】 (Claim 3) 【0901】 The system according to claim 1, further comprising update means for dynamically updating an information set based on detected fraud patterns and sentiment analysis results.
[Explanation of Symbols] 【0902】
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
[Claim 1] A system comprising: an input device for capturing audio; a speech recognition processing device for converting speech into text data; a processing device that compares the text data with past fraud pattern data to detect signs of fraud; and an output device that outputs an alert about the possibility of fraud and notifies registered contacts.
[Claim 2] The system according to claim 1, further comprising a processing unit for analyzing signs of fraud based on a machine learning algorithm.
[Claim 3] The system according to claim 1, further comprising a processing device for dynamically updating a database based on detected fraud patterns.
Citation Information
Patent Citations
Persona chatbot control method and system
JP2022180282A