system

A real-time fraud detection system for the elderly uses voice data processing and machine learning to identify potential fraud and alert family members or law enforcement, addressing the challenge of sophisticated scams targeting the elderly.

JP2026105506A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Elderly individuals are increasingly targeted by sophisticated frauds, and existing systems struggle to detect these in real time, leading to economic and mental damage.

Method used

A system that collects user voices in real time, processes the voice data through speech recognition and natural language processing, and uses machine learning to detect potential fraud, notifying family members or law enforcement via an alarm system.

Benefits of technology

Effectively prevents fraudulent activities by quickly alerting relevant parties, adapting to new fraud patterns, and ensuring the safety of elderly individuals.


[Figure 2026105506000001_ABST]

Abstract

[Problem] To provide a system. [Solution] A system including: a device for acquiring the user's voice; conversion means for converting the voice into data; communication means for securely transferring the data; recognition means for converting the data into text information; evaluation means for analyzing the text information and evaluating the possibility of fraud; alarm means for issuing a warning when it is determined that there is a possibility of fraud; display means that is worn by the user and visually displays the warning; and external communication means for sending notifications to family members or supporters.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot's character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance that responds to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In recent years, fraud targeting the elderly has been increasing, and as a result, many elderly people have suffered economic and mental damage. Such fraud is sophisticated, and it is difficult for the elderly themselves to take countermeasures. Therefore, an effective preventive measure that can detect these frauds in advance and notify them promptly is required.

Means for Solving the Problems

[0005]

[0006] "User" refers to an individual whose voice is collected through this system and who receives fraud detection services.

[0007] "A device for collecting sound" refers to a device that includes microphones and peripheral equipment used to capture users' conversations in real time.

[0008] A "processing device" refers to a device or program used to convert collected audio data into a format that is easy to analyze.

[0009] A "communication device" refers to a device equipped with network connectivity for securely transmitting processed audio data to a remote server.

[0010] "Speech recognition device" refers to software and hardware used to convert speech data into text data.

[0011] "Analysis device" refers to a device or program used to evaluate whether text data obtained through speech recognition has the potential to be fraudulent.

[0012] An "alarm system" refers to a device that notifies the relevant individual or organization when it is determined that there is a high probability of fraud.

[0013] A "fraud detection system" refers to the entire system that combines the above-mentioned devices to detect and prevent fraudulent activities.

Brief Description of the Drawings

[0014]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0019] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention provides a fraud detection system that monitors user conversations in real time and detects potential fraud. This system mainly consists of a series of devices and methods for collecting, analyzing, and detecting anomalies in voice data.

[0036] The device is equipped with a microphone that continuously collects the user's conversations. The collected audio data is preprocessed, including noise reduction and volume normalization, and converted into a format that is easy to process. This audio data is encrypted and transmitted to the server via a communication device.
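The preprocessing described above can be illustrated with a minimal sketch. The text does not specify which noise reduction or normalization technique is used, so the following assumes a simple amplitude gate and peak normalization; the function names, the gate threshold, and the target peak are all hypothetical.

```python
from typing import List


def noise_gate(samples: List[float], threshold: float = 0.02) -> List[float]:
    """Crude noise reduction: zero out samples whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one reaches target_peak; silence is returned unchanged."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples: List[float]) -> List[float]:
    """Apply the gate first, then normalize the volume of what remains."""
    return normalize_peak(noise_gate(samples))
```

A production system would more likely use spectral noise suppression and loudness-based normalization; the gate is used here only because it keeps the example self-contained.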

[0037] The server converts the received audio data into text using a speech recognition device. The text obtained through speech recognition is then analyzed in detail by an analysis device. The analysis device refers to past fraud data and, when it detects keywords or phrases indicating potential fraud, identifies patterns that suggest fraud.

[0038] When a fraudulent activity is deemed highly likely, the server activates an alarm system and sends a notification to pre-registered family members or law enforcement agencies. This notification is sent via SMS, email, or push notification to a dedicated application. This allows for a swift response to prevent users from becoming victims of fraudulent activity.

[0039] As a concrete example, when an elderly person receives a phone call, the call is automatically monitored by the system. If phrases suggesting fraud, such as "transfer money," are heard during the call, the system detects this and immediately sends a warning notification to the family. The family can then receive this notification and contact the elderly person directly to confirm whether a problem has occurred.
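As a rough illustration of the phrase detection in this example, the sketch below scans a call transcript for listed phrases and builds a warning message. Only "transfer money" comes from the text; the other phrases, the function names, and the message wording are assumptions made for illustration.

```python
import re
from typing import List, Optional, Sequence

# Phrases must be lowercase. Only "transfer money" appears in the text;
# the remaining entries are hypothetical examples.
SUSPICIOUS_PHRASES = ("transfer money", "gift card", "do not tell your family")


def find_suspicious_phrases(
    transcript: str, phrases: Sequence[str] = SUSPICIOUS_PHRASES
) -> List[str]:
    """Return the listed phrases found in the transcript, ignoring case and extra whitespace."""
    normalized = re.sub(r"\s+", " ", transcript.lower())
    return [p for p in phrases if p in normalized]


def build_warning(transcript: str) -> Optional[str]:
    """Return a warning message for the family, or None when nothing suspicious is found."""
    hits = find_suspicious_phrases(transcript)
    if not hits:
        return None
    return "Possible fraud detected in call. Matched phrases: " + ", ".join(hits)
```

Plain substring matching is easy to evade and prone to false positives, which is presumably why the text pairs keyword detection with machine learning models.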

[0040] In this way, the system utilizes machine learning models and updates them with the latest data to adapt to new fraud methods. This makes it possible to effectively protect the elderly from fraud in real time.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The device continuously collects ambient sound from the user's surroundings using a microphone. During this process, the device automatically performs noise reduction to ensure clear audio at an appropriate volume.

[0044] Step 2:

[0045] The terminal converts the collected voice data into a standard communication format and compresses the data as needed. This process reduces the load associated with transmitting data.

[0046] Step 3:

[0047] The device encrypts the voice data and sends it to the server via the internet connection. This ensures data security and privacy.

[0048] Step 4:

[0049] The server decrypts the received encrypted data and sends it to the speech recognition engine. The speech recognition engine converts the speech data into text data.

[0050] Step 5:

[0051] The server passes text data to an analysis device, which uses natural language processing to search for keywords and phrases that may indicate potential fraud. The analysis device uses machine learning models to detect anomalies.

[0052] Step 6:

[0053] The server evaluates the likelihood of fraud based on the analysis results. If signs of fraud are detected, it generates an appropriate alert.

[0054] Step 7:

[0055] The server sends alerts to the device or designated contacts. Notification methods include SMS, email, and notifications via a dedicated app.

[0056] Step 8:

[0057] Users receive notifications and respond according to pre-configured instructions. For example, they may be advised to immediately end a call or consult with a specific contact.
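Step 2 of the flow above (conversion to a standard format and compression) could look roughly like the following. The text names neither a container format nor a codec, so this sketch assumes 16-bit mono WAV at 16 kHz and lossless zlib compression; a real deployment would more likely use a dedicated speech codec. All function names are hypothetical.

```python
import io
import struct
import wave
import zlib
from typing import List


def to_wav_bytes(samples: List[float], sample_rate: int = 16000) -> bytes:
    """Encode float samples in [-1.0, 1.0] as 16-bit mono PCM inside a WAV container."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def compress_payload(wav_bytes: bytes) -> bytes:
    """Losslessly compress the WAV bytes before transmission."""
    return zlib.compress(wav_bytes, 6)


def decompress_payload(payload: bytes) -> bytes:
    """Server-side inverse of compress_payload."""
    return zlib.decompress(payload)
```

Encryption (Step 3) would be applied to the compressed payload afterwards, since encrypted data does not compress well.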

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] In modern society, fraudulent activities are becoming increasingly sophisticated, and there is a particular problem in that the elderly and people unfamiliar with technology are often targeted. Traditional methods make it difficult to detect fraudulent activities or respond immediately, which can result in the damage being exacerbated. In response to this, there is a need to develop a system that uses voice data to detect signs of fraud in real time and issue a rapid warning.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes speech recognition means for converting audio data into text, analysis means for evaluating the likelihood of fraud, and detection means for identifying keywords or phrases that indicate the likelihood of fraud. This makes it possible to detect signs of fraud from audio in real time and issue an immediate warning before the user becomes involved in fraudulent activity.

[0063] "Voice acquisition means" refers to a device or function for continuously collecting the voice emitted by a user in real time.

[0064] "Preprocessing means" refers to a device or function that performs noise reduction or volume normalization in order to convert the collected audio data into a format that is easy to analyze.

[0065] "Communication means" refers to functions and devices for encrypting processed data and transmitting it securely.

[0066] "Speech recognition means" refers to technologies and devices that convert speech data into text.

[0067] "Analysis means" refers to a device or function for analyzing text data and evaluating the likelihood of fraud.

[0068] "Detection means" refers to functions or devices that identify keywords or phrases within text and identify signs of fraud.

[0069] An "alarm system" is a device or function that issues a warning when it is determined that there is a high probability of fraud.

[0070] "Communication method" refers to the means of transmitting alarms or notifications to designated recipients.

[0071] A "generative artificial intelligence model" is an algorithm that uses machine learning to learn from past data and detect new fraud patterns.

[0072] This invention is for realizing a fraud detection system, and is particularly designed to detect signs of fraudulent activity in real time.

[0073] The terminal is equipped with a standard microphone as a means of voice acquisition to capture the user's natural conversation. It incorporates preprocessing means that reduce background noise and clarify the voice signal using multiple noise reduction techniques. The voice data is converted to a disturbance-resistant format and then transmitted to the server via a communication means that uses an encryption algorithm such as AES.

[0074] The server converts the audio data into text using publicly known speech recognition software as a speech recognition means. This text data is then analyzed by an analysis means that implements a generative AI model. The model refers to past fraud data and evaluates the probability of fraud, while utilizing detection means to identify keywords or phrases that indicate potential fraud.

[0075] If a transaction is highly likely to be fraudulent, the server will use its alarm system to send alerts via SMS, email, or other communication methods to pre-registered family members or monitoring organizations. The notification will include specific instructions, such as, "A suspicious transaction has been detected. Please contact the user for verification."

[0076] For example, elderly people may receive phone calls containing phrases that suggest fraud, such as "bank account verification" or "transfer request." In such cases, the system immediately detects these words and quickly sends a warning notification to the relevant family members. This allows for early intervention and can prevent potential harm.

[0077] The use of generative AI models improves responsiveness to new fraud techniques and enhances accuracy by constantly referencing the latest fraud database. An example of a prompt message is: "If a suspicious situation is detected, please explain how to respond quickly."
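One way to read the paragraph above is that the transcript and the latest known fraud patterns are combined into a prompt for the generative AI model. The sketch below only assembles such a prompt as a string; the layout, the function name, and the idea of listing patterns inline are assumptions, and the model call itself is deliberately left out because the text does not identify a model or API.

```python
from typing import Sequence


def build_fraud_prompt(transcript: str, known_patterns: Sequence[str]) -> str:
    """Assemble a plain-text prompt; sending it to a model is out of scope here."""
    pattern_lines = "\n".join(f"- {p}" for p in known_patterns) or "- (none supplied)"
    return (
        "You are assisting a fraud detection system.\n"
        "Known fraud patterns:\n"
        f"{pattern_lines}\n"
        "Conversation transcript:\n"
        f"{transcript.strip()}\n"
        # Closing instruction quoted from the example prompt in the text.
        "If a suspicious situation is detected, please explain how to respond quickly."
    )
```

Passing the pattern list in at call time, rather than hard-coding it, is what would let the prompt reflect an updated fraud database without changing the code.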

[0078] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0079] Step 1:

[0080] The device uses a voice acquisition method to collect the user's conversation using a microphone.

[0081] Input: User's voice

[0082] Processing: Capture the collected audio as digital data.

[0083] Output: Digital audio data

[0084] Step 2:

[0085] The terminal applies noise reduction technology as a pre-processing step to clarify the audio. In addition, it normalizes the volume to process the data into a stable format.

[0086] Input: Digital audio data

[0087] Processing: Noise reduction, volume normalization

[0088] Output: Preprocessed audio data

[0089] Step 3:

[0090] The terminal encrypts the pre-processed audio data using an encryption algorithm and sends it to the server via a communication method.

[0091] Input: Preprocessed audio data

[0092] Processing: AES encryption, data transmission

[0093] Output: Encrypted audio data, transmission complete.

[0094] Step 4:

[0095] The server converts the received audio data into text using speech recognition technology. Specific speech recognition software is then used to extract textual information from the audio.

[0096] Input: Encrypted audio data

[0097] Processing: Speech recognition, text conversion

[0098] Output: Text data

[0099] Step 5:

[0100] The server applies the analysis means to the text data, analyzing it with a generative AI model while referring to a database of fraudulent activities. This process identifies keywords that indicate potential fraud.

[0101] Input: Text data

[0102] Processing: Analysis using a generative AI model, detection of malicious words.

[0103] Output: Fraud probability assessment data

[0104] Step 6:

[0105] If the server determines, based on the analysis results, that there is a high probability of fraud, it will use an alerting mechanism to warn the relevant recipient. For example, it might send a detailed notification via SMS.

[0106] Input: Fraud probability assessment data

[0107] Processing: Evaluate alert conditions, send warning.

[0108] Output: Alarm notification sent successfully.

[0109] This series of steps enables the system to detect and notify users of fraudulent activity in real time, preventing them from becoming victims.

[0110] (Application Example 1)

[0111] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0112] There is a need to prevent elderly people from becoming victims of fraud through telephone or face-to-face conversations. However, constant monitoring by a third party is difficult from a privacy and feasibility standpoint. In addition, elderly people often have difficulty recognizing fraudulent activity themselves. A system is needed to address these problems and ensure the safety of the elderly.

[0113] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0114] In this invention, the server includes a device for acquiring the user's voice, a conversion means for converting the voice into data, and a communication means for securely transferring the data. This makes it possible to prevent elderly people from becoming involved in fraudulent activities and to ensure their safety.

[0115] A "device for acquiring user voices" is a device used to effectively collect ambient sounds and acquire target voice data.

[0116] A "conversion means for converting audio to data" is a mechanism for converting acquired audio signals into data in a format that is easy to process.

[0117] "Communication methods for securely transferring data" refer to means that incorporate technologies and protocols for encrypting and securely transmitting converted data.

[0118] "Recognition means for converting data into text information" refers to a device or mechanism that incorporates speech recognition technology to convert speech or speech data into text information.

[0119] "An evaluation method that analyzes text information and assesses the likelihood of fraudulent activity" refers to a technology that uses text information to determine the likelihood of fraud using machine learning or rule-based approaches.

[0120] An "alarm system that issues a warning when fraudulent activity is suspected" is a means of alerting users or guardians when fraud is suspected, and it is a device or function that emits visual and auditory warnings.

[0121] "A display means worn by the user to visually display a warning" refers to a device worn by the user to visually display a message through a digital display.

[0122] "External communication means for sending notifications to family or caregivers" refers to a communication device that has the function of sending notifications to family or caregivers in remote locations, triggered by an alarm.

[0123] To implement this invention, it is necessary to combine a smart device worn by the user with a system that includes a cloud server for processing data. The purpose of this system is to ensure user safety by acquiring voice data in real time and determining the possibility of fraudulent activity.

[0124] The server receives voice data transmitted from the terminal and converts it into text information using a speech recognition engine. A highly accurate speech recognition service such as Google® Speech-to-Text is recommended for speech recognition. The converted text information is then analyzed to assess the risk of fraudulent activity. This process can utilize machine learning models to perform pattern recognition by referencing past fraud data. If the assessment determines a high risk, an alert is issued, and a notification is sent to the designated contact. This notification is sent via SMS or push notification to enable a quick response.
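The text leaves the machine learning model unspecified. As a stand-in that shows the general shape of "pattern recognition by referencing past fraud data", the sketch below scores a transcript by its bag-of-words cosine similarity to the closest past fraud case. This is an assumed, much simpler technique than a trained model, and every name in it is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Sequence


def _bag_of_words(text: str) -> Counter:
    """Count lowercase word occurrences in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def risk_score(transcript: str, past_fraud_cases: Sequence[str]) -> float:
    """Highest similarity between the transcript and any past fraud case (about 0.0 to 1.0)."""
    words = _bag_of_words(transcript)
    return max(
        (cosine_similarity(words, _bag_of_words(case)) for case in past_fraud_cases),
        default=0.0,
    )
```

The resulting score can then be compared against a threshold to decide whether an alert is issued.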

[0125] The device is equipped with a highly sensitive microphone that can clearly capture conversations around the user. The device temporarily records this audio data, performs noise reduction and volume normalization processing, and then securely transmits the data to the server. To enhance communication security during data transfer, it is desirable to utilize encryption technology.

[0126] If a user is wearing a smart device, a warning will be displayed on the device's screen when a risk is detected. This visual alert allows the user to immediately recognize the possibility of fraud. For example, if an elderly person is approached while doing their daily shopping and becomes a target of fraud, the system can detect the risk and issue a warning, preventing them from becoming a victim.

[0127] An example of a prompt using a generative AI model is: "Design an AI that protects seniors from fraud through real-time voice analysis. Include a process that uses a microphone and smart display to detect conversations suggestive of fraud, display a warning, and notify family members."

[0128] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0129] Step 1:

[0130] The device acquires the user's voice through a microphone. The input audio signal is captured as digital data, and noise reduction and volume normalization are performed. This process results in the output of clear and easily processable audio data.

[0131] Step 2:

[0132] The terminal encrypts the pre-processed audio data and transmits it to the server via a communication method. Algorithms such as AES are used for encryption to ensure data security. This allows the data to be transferred to the server while protected from unauthorized access.

[0133] Step 3:

[0134] The server inputs the received audio data into a speech recognition engine and converts it into text information. Using speech recognition services such as Google Speech-to-Text, the system converts the audio signal into string data, obtaining human-readable text information.

[0135] Step 4:

[0136] The server uses a machine learning model to analyze the converted text information. It refers to a database of past fraud cases and evaluates whether the text data contains patterns that suggest fraud. If it determines that there is a possibility of fraud, it outputs the corresponding risk score.

[0137] Step 5:

[0138] The server issues a warning through an alarm system if the risk score exceeds a certain threshold. Specifically, it displays a warning message on the display of a smart device worn by the user via an interface. It also sends a notification to a designated contact using an external communication method. The notification is sent quickly as an SMS, email, or push notification.

[0139] Step 6:

[0140] Users can recognize the risk of fraud by checking the warning displayed on their smart device's screen. In practice, they can then take appropriate measures, such as contacting family members.
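Steps 5 and 6 above amount to a threshold check followed by fan-out to several notification channels. A minimal sketch follows, assuming a threshold of 0.7 and channels modeled as plain callables; the class name, the threshold value, and the message wording are hypothetical, and real SMS, email, or push delivery is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AlertDispatcher:
    threshold: float = 0.7  # hypothetical value; the text only says "a certain threshold"
    channels: Dict[str, Callable[[str], None]] = field(default_factory=dict)

    def register(self, name: str, send: Callable[[str], None]) -> None:
        """Add a channel such as the wearable display, SMS, email, or push."""
        self.channels[name] = send

    def handle(self, risk_score: float) -> List[str]:
        """Send a warning on every channel if the score exceeds the threshold.

        Returns the names of the channels that were notified.
        """
        if risk_score <= self.threshold:
            return []
        message = f"Warning: possible fraud (risk score {risk_score:.2f})."
        for send in self.channels.values():
            send(message)
        return list(self.channels)
```

For example, registering a `display` callable for the worn device and an `sms` callable for a family contact would notify both with a single `handle` call.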

[0141] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0142] This invention provides a fraud detection system that monitors users' voices in real time, detects potential fraud, and analyzes the users' emotional states. This system consists of multiple devices and technologies for collecting, analyzing, recognizing emotions, and detecting anomalies in voice data.

[0143] The device has a high-performance microphone to capture the user's natural conversation, and the acquired audio data is clarified through noise reduction. After preprocessing, this collected audio data is encrypted and transmitted to the server using a communication device.

[0144] The server converts the received audio data into text data using a speech recognition device. Next, an analysis device analyzes this text data to identify potentially fraudulent phrases and contexts. In addition, an emotion engine analyzes the user's emotional state from the audio data to improve the accuracy of the fraud assessment.

[0145] The emotion engine recognizes in real time when a user is experiencing emotional states such as anxiety, tension, or excitement, and feeds this information back into the analysis system's evaluation, helping to more accurately determine the likelihood of fraud. For example, if a user is clearly experiencing anxiety or tension, this information may be added to the trigger conditions for fraud alerts.
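One possible way to add emotional state to the trigger conditions is to lower the alert threshold when anxiety, tension, or excitement is detected. The sketch below assumes the emotion engine outputs an intensity between 0.0 and 1.0 per emotion; the reduction weights, the base threshold, and the floor of 0.1 are invented for illustration and are not taken from the text.

```python
from typing import Dict

# Hypothetical weights: how far each emotion at full intensity lowers the threshold.
THRESHOLD_REDUCTION = {"anxiety": 0.2, "tension": 0.15, "excitement": 0.05}


def effective_threshold(base_threshold: float, emotions: Dict[str, float]) -> float:
    """Lower the alert threshold in proportion to detected emotion intensities."""
    reduction = sum(
        THRESHOLD_REDUCTION.get(name, 0.0) * max(0.0, min(1.0, level))
        for name, level in emotions.items()
    )
    # Keep a floor so that strong emotion alone never makes every call an alert.
    return max(0.1, base_threshold - reduction)


def should_alert(
    text_risk: float, emotions: Dict[str, float], base_threshold: float = 0.7
) -> bool:
    """Alert when the text-based risk reaches the emotion-adjusted threshold."""
    return text_risk >= effective_threshold(base_threshold, emotions)
```

With these assumed numbers, a text risk of 0.6 does not trigger an alert on its own but does when the user is strongly anxious.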

[0146] When a potential scam is detected, the server activates an alarm system and quickly sends a warning to family members or law enforcement agencies. This notification is typically delivered via SMS, email, or a dedicated app. For example, if an elderly person receives a suspicious phone call and becomes anxious, the system's emotion engine can detect this anxiety and send a scam alert earlier than usual.

[0147] This system can adapt to the latest fraud techniques by continuously updating its machine learning model based on past fraud data, and functions as a powerful tool to protect the elderly from fraud. This makes it possible to prevent damage from fraudulent activities.

[0148] The following describes the processing flow.

[0149] Step 1:

[0150] The device collects the user's conversation in real time via a microphone. During this process, noise reduction technology is applied to improve the quality of the audio data.

[0151] Step 2:

[0152] The device converts the collected audio data into a format that is easy to process. This process includes compressing and standardizing the audio data.

[0153] Step 3:

[0154] The terminal encrypts the processed voice data for secure transmission and sends it to the server via a communication device.

[0155] Step 4:

[0156] The server converts the received audio data into text data using a speech recognition device. The speech-to-text conversion process is performed using a speech recognition algorithm.

[0157] Step 5:

[0158] The server analyzes text data using an analysis device to detect keywords and phrases indicating fraudulent activity. A machine learning model using past fraud data supports this process.

[0159] Step 6:

[0160] The server uses an emotion engine to analyze the user's emotions from the voice data. If a specific emotional state is detected, the server integrates this into the fraud possibility assessment.

[0161] Step 7:

[0162] The server combines the text analysis results with the emotion analysis results and issues an alert if it determines that there is a high probability of fraud. This alert is sent to the user's family or law enforcement agencies via a communication device.

[0163] Step 8:

[0164] Users can receive notifications from family members or law enforcement agencies, prompting them to take further action. Family members can contact the user, check on the situation, and intervene as needed.

[0165] (Example 2)

[0166] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0167] In modern society, fraud and scams are becoming more sophisticated, and many people are at risk of becoming victims. In particular, some scammers who are sensitive to emotional nuances not only use language and tone, but also appeal to people's emotions to carry out their fraudulent activities. However, conventional fraud detection systems have a problem in that they cannot adequately recognize and analyze these emotionally related elements, and there is a possibility that fraud will be overlooked.

[0168] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0169] In this invention, the server includes means for acquiring the user's voice, conversion means for converting the voice into a format that is easy to process, and emotion analysis means for analyzing the emotional state from the acoustic data and improving the accuracy of evaluating the possibility of fraud. This makes it possible to recognize suspicious emotional states associated with fraudulent activity in real time and to track the possibility of fraud more accurately.

[0170] "Means for acquiring user speech" refers to devices and technologies for accurately and effectively collecting the voices emitted by users.

[0171] "Conversion means for converting to an easily processable format" refers to devices and technologies that prepare collected audio data into an optimal data format so that subsequent analysis and interpretation can be easily performed.

[0172] "Transmission means" refers to technologies and devices for securely and quickly transmitting data to other devices or servers.

[0173] "Acoustic analysis means" refers to technologies and devices that analyze audio data and convert its content into textual information.

[0174] "Analysis means" refers to technologies and devices that analyze textual information to identify phrases and contexts that can detect fraudulent activity.

[0175] "Emotion analysis means" refers to techniques and devices that analyze the emotional state contained in speech and use that analysis to improve the accuracy of the fraud assessment.

[0176] "Alarm means" refers to devices or technologies that immediately send an alert to relevant parties when potential fraud is detected.

[0177] A "generative AI model" refers to an algorithm or technology that utilizes artificial intelligence technology to generate new insights and make decisions based on information learned from past data.

[0178] This invention provides a specific embodiment of a system that monitors users' voices in real time and detects potential fraudulent activity or deception. This system is realized by combining the following technologies and means.

[0179] First, the device is equipped with a high-performance microphone, which allows for accurate capture of the user's voice. This microphone utilizes noise reduction technology to eliminate ambient noise, resulting in clearer voice data. This ensures reliable collection of user voice data even in noisy environments such as cafes.
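
The disclosure does not name a particular noise reduction technique. Purely as a stand-in, the following Python sketch shows a frame-based noise gate, which is far simpler than practical noise reduction: frames whose RMS level falls below a threshold are silenced. The function name `noise_gate`, the frame size, and the threshold are assumptions made for illustration.

```python
import math
from typing import List


def noise_gate(samples: List[float], frame_size: int = 4,
               threshold: float = 0.05) -> List[float]:
    """Zero out every frame whose RMS level is below the threshold."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    out: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Keep frames that are loud enough; replace quiet frames with silence.
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out
```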

[0180] The voice data collected by the device is pre-processed and then transmitted to the server in an encrypted state via a communication method. This encrypted communication ensures the privacy and security of the data.

[0181] The server converts the received audio data into text using a speech recognition device. An analysis device then analyzes this text data to identify potentially fraudulent phrases and contexts, and the likelihood of fraud is further evaluated based on the user's emotional state extracted from the audio through emotion analysis.

[0182] The specific analysis utilizes an emotion engine to determine whether the user is experiencing anxiety, tension, excitement, or other states of mind, and this information is fed back into the fraud detection process. For example, if an elderly person receives a suspicious phone call and is feeling tense, the system will detect this tension and issue an alert earlier than usual.

[0183] Ultimately, if a fraud is deemed highly likely, the server will use alerting mechanisms to quickly notify family members and law enforcement agencies. This notification will be sent via SMS, email, and a dedicated app, enabling a swift response.

[0184] Furthermore, this system utilizes a generative AI model based on past fraud data, enabling it to respond to new fraud methods. An example of a prompt message to the generative AI model is: "Convert the content of suspicious phone calls received by elderly people into text format and detect the possibility of fraud. Also, analyze the emotional state from the voice and incorporate this into the decision to issue a fraud alert." In this way, the system constantly learns the latest information and takes the most appropriate action.
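
As a minimal sketch of how such a prompt might be assembled before being passed to a generative AI model, the Python function below combines the instruction quoted above with a transcript and an estimated emotion. The function name `build_prompt` and the layout of the resulting text are assumptions; the call to the model itself is intentionally omitted.

```python
# Instruction text taken from the example prompt in paragraph [0184].
PROMPT_INSTRUCTION = (
    "Convert the content of suspicious phone calls received by elderly people "
    "into text format and detect the possibility of fraud. Also, analyze the "
    "emotional state from the voice and incorporate this into the decision to "
    "issue a fraud alert."
)


def build_prompt(transcript: str, emotion: str) -> str:
    """Combine the instruction, the transcript, and the estimated emotion."""
    # The section labels below are a hypothetical layout, not part of the disclosure.
    return (
        f"{PROMPT_INSTRUCTION}\n\n"
        f"Transcript:\n{transcript.strip()}\n\n"
        f"Estimated emotional state: {emotion}"
    )
```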

[0185] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0186] Step 1:

[0187] The device uses a high-performance microphone to acquire the user's voice. Noise reduction is performed to remove ambient noise, resulting in clear audio data. The input to this process is the user's voice, and the output is the noise-reduced audio data. Specifically, the system analyzes and filters noise in real time.

[0188] Step 2:

[0189] The terminal performs preprocessing on the acquired audio data. This preprocessing includes volume normalization and sample data formatting to improve data quality. The input is audio data with noise removed, and the output is preprocessed audio data. Specifically, it performs acoustic signal analysis and data format conversion.

[0190] Step 3:

[0191] The terminal encrypts the pre-processed audio data and sends it to the server. At this stage, encryption technology is used to ensure the data is transmitted securely. The input is pre-processed audio data, and the output is encrypted data. Specifically, the data is protected using an encryption algorithm.

[0192] Step 4:

[0193] The server decrypts the received encrypted audio data and converts it to text using a speech recognition device. This conversion process uses AI technology to render the audio data as textual information. The input is the decrypted audio data, and the output is text data. Specifically, it recognizes words and sentences from the audio and converts them into strings.

[0194] Step 5:

[0195] The server analyzes text data using an analysis device to identify potentially fraudulent phrases. This analysis uses a generative AI model to evaluate features associated with fraud. The input is text data, and the output is an evaluation of the likelihood of fraud. Specifically, it involves applying a pre-trained fraud detection algorithm.

[0196] Step 6:

[0197] The server analyzes the user's emotional state using emotion analysis techniques based on the audio data. Here, AI extracts emotional patterns from the audio, providing supplementary information for fraud assessment. The input is audio data (including text), and the output is the emotion analysis result. Specifically, the system estimates emotions from tone and word choice.

[0198] Step 7:

[0199] If fraud is deemed highly likely, the server immediately alerts family members and law enforcement agencies using its alarm system. This alert is sent via SMS or email as needed. The inputs are the results of the fraud assessment and the emotion analysis, and the output is the notification action. Specifically, an automated notification is sent to emergency contacts.
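
The notification action in Step 7 can be sketched as follows. This is a minimal Python illustration in which the transport for each channel is supplied by the caller; the names `Contact`, `Sender`, and `dispatch_alert` are hypothetical, and no real SMS or email service is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A sender takes (address, message); real transports are outside this sketch.
Sender = Callable[[str, str], None]


@dataclass(frozen=True)
class Contact:
    name: str
    channel: str   # for example "sms" or "email"
    address: str


def dispatch_alert(contacts: List[Contact], message: str,
                   senders: Dict[str, Sender]) -> List[str]:
    """Send the message to each contact and return the names that were notified."""
    notified: List[str] = []
    for contact in contacts:
        sender = senders.get(contact.channel)
        if sender is None:
            continue  # no transport configured for this channel
        sender(contact.address, message)
        notified.append(contact.name)
    return notified
```

Passing the senders as a dictionary keeps the sketch independent of any particular messaging provider, so a stub can be used in tests.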

[0200] (Application Example 2)

[0201] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0202] In modern society, vulnerable individuals, particularly the elderly, are at increased risk of falling victim to telephone scams. However, there are limited means to detect these voice-based scams in real time and prevent harm. This invention aims to protect users from fraud by quickly and accurately detecting such fraudulent activities and providing the necessary notifications.

[0203] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0204] In this invention, the server includes a sound receiving means for acquiring the user's voice, a sound processing means for pre-processing the voice, and a transmission means for encrypting and transmitting the pre-processed voice information. This makes it possible to protect users from fraudulent telephone scams.

[0205] "Sound receiving means" refers to a device used to accurately acquire the user's voice.

[0206] "Sound processing means" refers to the techniques and processes used to preprocess the acquired voice into a format that is easy to analyze.

[0207] A "transmission means" is a device that provides the function of securely encrypting pre-processed audio information and transmitting it to other systems or servers.

[0208] "Speech recognition means" refers to technologies and devices for converting speech information into text information.

[0209] "Analysis means" refers to processes and devices used to evaluate the possibility of fraudulent activity based on textual information.

[0210] "Emotion analysis means" refers to technologies and devices used to evaluate a user's psychological state from voice data.

[0211] "Alarm generation means" refers to devices or functions that issue notifications when there is a high probability of fraudulent activity.

[0212] A "communication element" is a component used to transmit alarms and notifications to remote recipients.

[0213] "Machine learning techniques" are algorithms and processes used to evaluate the likelihood of fraudulent activity based on data from past fraudulent activities and to improve the system.

[0214] To realize this invention, the system operates based on the following configuration. First, the terminal acquires the user's voice using a high-performance microphone. The acquired voice is pre-processed by the terminal's voice processing means to reduce noise. This pre-processed voice data is encrypted by the terminal's transmission means and sent to the server.

[0215] The server receives this data and converts the speech into text using the speech recognition means. The converted text is then evaluated by the analysis means to determine whether it contains potentially fraudulent content. Machine learning services such as the Google Cloud Speech-to-Text API and AWS® Comprehend may be used for the speech recognition and the text analysis. Furthermore, the emotion analysis means evaluates the user's psychological state from the audio data; if emotions such as anxiety or tension are detected, they are taken into account to improve the accuracy of the fraud assessment.

[0216] If fraudulent activity is deemed highly likely, the server will use alarm generation mechanisms to issue warnings and notifications. These methods include SMS, email, or a dedicated app, ensuring rapid information transmission through the communication element. For example, if a user receives a suspicious phone call, it is crucial that the system detects the unusual tension and immediately sends a notification to protect the individual.

[0217] An example of a prompt is, "Explain how the AI monitors and detects fraudulent activity in real time, and how it alerts users." Entering this prompt into the AI can facilitate further detailed analysis and improvements.

[0218] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0219] Step 1:

[0220] The device uses a high-performance microphone to capture the user's voice in real time. This input audio is raw, unprocessed data and contains noise and ambient sounds.

[0221] Step 2:

[0222] The terminal uses audio processing equipment to apply noise reduction to the acquired audio. In this step, unnecessary audio information is removed, and clear audio data suitable for analysis is output.

[0223] Step 3:

[0224] The terminal encodes the pre-processed audio data, encrypts it, and transmits it to the server via the transmission means. The transmitted data is protected using advanced encryption technology to prevent eavesdropping and data tampering.

[0225] Step 4:

[0226] The server receives encrypted audio data and uses speech recognition to convert the audio into text. In this step, the speech recognition algorithm outputs meaningful text from the input audio data.

[0227] Step 5:

[0228] The server uses analytical tools to analyze text information and assess the likelihood of fraudulent activity. In this step, a machine learning model is applied to output the risk of fraud based on specific keywords or phrases.

[0229] Step 6:

[0230] The server utilizes emotion analysis techniques to evaluate the user's psychological state based on voice data. If anxiety or tension is detected through this analysis, it is used as output to improve the accuracy of fraud detection.

[0231] Step 7:

[0232] If a fraudulent activity is deemed highly likely, the server will generate an alert using its alarm generation system and send notifications to the user and designated emergency contacts. This notification may be delivered via SMS, email, or a dedicated app.
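
Step 3 above states that the transmitted data is protected against tampering. As a sketch of the tamper detection part only, the following Python code appends an HMAC-SHA256 tag to a payload and verifies it on receipt. This is a swapped-in illustration, not the encryption described in the disclosure; confidentiality is not provided here, and the names `seal` and `open_sealed` and the shared key are assumptions.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def seal(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect modification."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def open_sealed(message: bytes, key: bytes) -> bytes:
    """Verify the tag and return the payload; raise ValueError if it was altered."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short")
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload
```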

[0233] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0234] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, a prompt containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0235] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0236] [Second Embodiment]

[0237] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0238] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0239] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0240] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0241] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0242] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0243] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0244] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0245] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0246] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0247] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0248] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0249] This invention provides a fraud detection system that monitors user conversations in real time and detects potential fraud. This system mainly consists of a series of devices and methods for collecting, analyzing, and detecting anomalies in voice data.

[0250] The device is equipped with a microphone that continuously collects the user's conversation. The collected audio data is preprocessed, including noise reduction and volume normalization, and converted into a format that is easy to process. This audio data is encrypted and transmitted to the server via a communication device.
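
Volume normalization can take several forms, and the disclosure does not say which is used. The following Python sketch shows peak normalization as one simple possibility: every sample is scaled so that the loudest sample reaches a target level. The function name `normalize_peak` and the default target of 0.9 are assumptions.

```python
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the samples so that the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```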

[0251] The server converts the received audio data into text using a speech recognition device. The text obtained through speech recognition is then analyzed in detail by an analysis device. The analysis device refers to past fraud data and identifies patterns suggesting fraud when it detects keywords or phrases that indicate potential fraud.

[0252] When a fraudulent activity is deemed highly likely, the server activates an alarm system and sends a notification to pre-registered family members or law enforcement agencies. This notification is sent via SMS, email, or push notification to a dedicated application. This allows for a swift response to prevent users from becoming victims of fraudulent activity.

[0253] As a concrete example, when an elderly person receives a phone call, the call is automatically monitored by the system. If phrases suggesting fraud, such as "transfer money," are heard during the call, the system detects this and immediately sends a warning notification to the family. The family can then receive this notification and contact the elderly person directly to confirm whether a problem has occurred.
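
Because a call is transcribed incrementally, a phrase such as "transfer money" may be split across two pieces of recognized text. The following Python sketch, offered only as an illustration, keeps a short tail of the previous text so that such a split phrase is still found. The function name `detect_in_stream` and the chunked transcript format are assumptions.

```python
from typing import Iterable, Iterator


def detect_in_stream(chunks: Iterable[str],
                     phrase: str = "transfer money") -> Iterator[int]:
    """Yield the index of each chunk in which the phrase is completed."""
    phrase = phrase.lower()
    keep = len(phrase) - 1  # longest tail that cannot itself contain the phrase
    tail = ""
    for index, chunk in enumerate(chunks):
        window = tail + chunk.lower()
        if phrase in window:
            yield index
            tail = ""  # reset so the same occurrence is not reported twice
        else:
            # Keep the end of the window to catch a phrase split across chunks.
            tail = window[-keep:] if keep > 0 else ""
```

For example, the chunks "Please trans", "fer money to", and " this account" produce a detection at index 1, the chunk that completes the phrase.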

[0254] Thus, this system utilizes machine learning models and constantly updates them with the latest data to adapt to new fraud methods. This makes it possible to effectively protect the elderly from fraud in real time.
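
The disclosure does not specify the model or the update procedure. As one hedged illustration of a model that can be updated with new data without retraining from scratch, the Python sketch below implements a small word-count naive Bayes classifier with Laplace smoothing. The class name `IncrementalFraudModel` and whitespace tokenization are assumptions, and the generative AI model described elsewhere in this disclosure would work differently.

```python
import math
from collections import Counter
from typing import List


class IncrementalFraudModel:
    """Word-count naive Bayes classifier updated one labeled example at a time."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    @staticmethod
    def _tokens(text: str) -> List[str]:
        return text.lower().split()

    def update(self, text: str, is_fraud: bool) -> None:
        """Add one labeled transcript to the counts."""
        self.word_counts[is_fraud].update(self._tokens(text))
        self.doc_counts[is_fraud] += 1

    def _log_score(self, tokens: List[str], label: bool) -> float:
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Laplace-smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for token in tokens:
            # Laplace-smoothed word likelihood; the extra 1 covers unseen words.
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def is_fraud(self, text: str) -> bool:
        tokens = self._tokens(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)
```

Calling `update` with a newly confirmed fraud transcript changes later predictions immediately, which is the sense in which such a model adapts to new fraud methods.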

[0255] The following describes the processing flow.

[0256] Step 1:

[0257] The device continuously collects ambient sound from the user's surroundings using a microphone. During this process, the device automatically performs noise reduction to ensure clear audio at an appropriate volume.

[0258] Step 2:

[0259] The terminal converts the collected voice data into a standard communication format and compresses the data as needed. This process reduces the load associated with transmitting data.

[0260] Step 3:

[0261] The device encrypts the voice data and sends it to the server via the internet connection. This ensures data security and privacy.

[0262] Step 4:

[0263] The server decrypts the received encrypted data and sends it to the speech recognition engine. The speech recognition engine converts the speech data into text data.

[0264] Step 5:

[0265] The server passes text data to an analysis device, which uses natural language processing to search for keywords and phrases that may indicate potential fraud. The analysis device uses machine learning models to detect anomalies.

[0266] Step 6:

[0267] The server evaluates the likelihood of fraud based on the analysis results. If signs of fraud are detected, it generates an appropriate alert.

[0268] Step 7:

[0269] The server sends alerts to the device or designated contacts. Notification methods include SMS, email, and notifications via a dedicated app.

[0270] Step 8:

[0271] Users receive notifications and respond according to pre-configured instructions. For example, they may be advised to immediately end a call or consult with a specific contact.
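
Step 2 above mentions format conversion and compression without naming a format or codec. The following Python sketch is one assumption-laden illustration: samples are packed as little-endian 16-bit PCM and then compressed with zlib, a lossless general-purpose compressor rather than an audio codec. The function names are hypothetical.

```python
import struct
import zlib
from typing import List


def pack_pcm16(samples: List[float]) -> bytes:
    """Convert float samples to little-endian 16-bit PCM, clipping to [-1.0, 1.0]."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def compress_audio(pcm: bytes) -> bytes:
    """Losslessly compress PCM bytes to reduce the transmission load."""
    return zlib.compress(pcm, level=6)


def decompress_audio(blob: bytes) -> bytes:
    """Restore the original PCM bytes on the receiving side."""
    return zlib.decompress(blob)
```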

[0272] (Example 1)

[0273] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0274] In modern society, fraudulent activities are becoming increasingly sophisticated, and there is a particular problem in that the elderly and people unfamiliar with technology are often targeted. Traditional methods make it difficult to detect fraudulent activities or respond immediately, which can result in the damage being exacerbated. In response to this, there is a need to develop a system that uses voice data to detect signs of fraud in real time and issue a rapid warning.

[0275] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0276] In this invention, the server includes speech recognition means for converting audio data into text, analysis means for evaluating the likelihood of fraud, and detection means for identifying keywords or phrases that indicate the likelihood of fraud. This makes it possible to detect signs of fraud from audio in real time and issue an immediate warning before the user becomes involved in fraudulent activity.

[0277] "Voice acquisition means" refers to a device or function for continuously collecting the voice emitted by a user in real time.

[0278] "Preprocessing means" refers to a device or function that performs noise reduction or volume normalization in order to convert the collected audio data into a format that is easy to analyze.

[0279] "Communication means" refers to functions and devices for encrypting processed data and transmitting it securely.

[0280] "Speech recognition means" refers to technologies and devices that convert speech data into text.

[0281] The "analysis means" is a device or function for analyzing text data and evaluating the possibility of fraud.

[0282] The "detection means" is a function or device for identifying keywords and phrases in text and identifying signs of fraud.

[0283] The "warning means" is a device or function for issuing a warning when it is determined that the possibility of fraud is high.

[0284] The "communication method" is a means for transmitting warnings and notifications to designated recipients.

[0285] The "generative artificial intelligence model" is an algorithm that learns from past data using machine learning and detects new fraud patterns.

[0286] This invention realizes a fraud detection system and is designed in particular to detect signs of fraud in real time.

[0287] The terminal is equipped with a general-purpose microphone as the voice acquisition means for capturing the user's natural conversation. It also incorporates a preprocessing means that reduces background noise and clarifies the voice signal using multiple noise reduction technologies. After the voice data is converted into a format resistant to interference, it is encrypted with an encryption algorithm such as AES and transmitted to the server by the communication means.

[0288] The server converts the voice data into text using voice recognition software serving as the speech recognition means. This text data is analyzed by the analysis means, which implements a generative AI model. The model refers to past fraud data and evaluates the probability of fraud while using the detection means to identify keywords or phrases indicating the possibility of fraud.

[0289] If a transaction is highly likely to be fraudulent, the server will use its alarm system to send alerts via SMS, email, or other communication methods to pre-registered family members or monitoring organizations. The notification will include specific instructions, such as, "A suspicious transaction has been detected. Please contact the user for verification."

[0290] For example, elderly people may receive phone calls containing phrases that suggest fraud, such as "bank account verification" or "transfer request." In such cases, the system immediately detects these words and quickly sends a warning notification to the relevant family members. This allows for early intervention and can prevent potential harm.
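
As a minimal sketch of phrase-based detection, the Python function below sums weights for phrases found in the transcript and caps the result at 1.0. The phrases are taken from the examples in this description, while the weights, the function name `fraud_risk`, and the capped-sum scoring rule are hypothetical; a trained model would replace them in practice.

```python
import re
from typing import Dict

# Hypothetical weights; only the phrases themselves come from the description.
PHRASE_WEIGHTS: Dict[str, float] = {
    "transfer money": 0.5,
    "bank account verification": 0.4,
    "transfer request": 0.4,
}


def fraud_risk(text: str, weights: Dict[str, float] = PHRASE_WEIGHTS) -> float:
    """Return a risk score in [0.0, 1.0] based on which phrases appear in the text."""
    # Lowercase and collapse whitespace so spacing and case do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    score = sum((w for phrase, w in weights.items() if phrase in normalized), 0.0)
    return min(score, 1.0)
```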

[0291] The use of generative AI models improves responsiveness to new fraud techniques and enhances accuracy by constantly referencing the latest fraud database. An example of a prompt message is: "If a suspicious situation is detected, please explain how to respond quickly."

[0292] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0293] Step 1:

[0294] The device uses the voice acquisition means to collect the user's conversation through a microphone.

[0295] Input: User's voice

[0296] Processing: Capture the collected audio as digital data.

[0297] Output: Digital audio data

[0298] Step 2:

[0299] The terminal applies noise reduction technology as a pre-processing step to clarify the audio. In addition, it normalizes the volume to process the data into a stable format.

[0300] Input: Digital audio data

[0301] Processing: Noise removal, volume normalization

[0302] Output: Preprocessed audio data

[0303] Step 3:

[0304] The terminal encrypts the preprocessed audio data using an encryption algorithm and transmits it to the server via communication means.

[0305] Input: Preprocessed audio data

[0306] Processing: AES encryption, data transmission

[0307] Output: Encrypted audio data, transmission completed

[0308] Step 4:

[0309] The server converts the received audio data into text using speech recognition means. Specific speech recognition software is used to extract character information from the audio.

[0310] Input: Encrypted audio data

[0311] Processing: Speech recognition, text conversion

[0312] Output: Text data

[0313] Step 5:

[0314] The server uses analysis means for the text data and analyzes it with an AI model while referring to the fraud database. Through this process, keywords indicating the possibility of fraud are identified.

[0315] Input: Text data

[0316] Processing: Analysis using a generative AI model, detection of malicious words.

[0317] Output: Fraud probability assessment data

[0318] Step 6:

[0319] If the server determines, based on the analysis results, that there is a high probability of fraud, it will use an alerting mechanism to warn the relevant recipient. For example, it might send a detailed notification via SMS.

[0320] Input: Fraud risk assessment data

[0321] Processing: Evaluate alert conditions, send warning.

[0322] Output: Alarm notification sent successfully.

[0323] This series of steps enables the system to detect and notify users of fraudulent activity in real time, preventing them from becoming victims.
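
The Input, Processing, and Output entries above form a chain in which each step's output is the next step's input. The following Python sketch illustrates that chaining only; the three steps are placeholders (the "recognize" step merely decodes bytes instead of performing speech recognition), and all names and the 0.5 threshold are assumptions.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[object], object]]


def run_pipeline(stages: List[Stage], data: object) -> Tuple[object, List[str]]:
    """Feed each stage's output into the next and record the order of execution."""
    trace: List[str] = []
    for name, step in stages:
        data = step(data)
        trace.append(name)
    return data, trace


def recognize(audio: bytes) -> str:
    # Placeholder for speech recognition: treats the bytes as UTF-8 text.
    return audio.decode("utf-8")


def analyze(text: str) -> float:
    # Placeholder for the AI analysis: a single phrase check.
    return 1.0 if "transfer request" in text.lower() else 0.0


def alert(score: float) -> str:
    # Placeholder for the warning step, using an assumed threshold of 0.5.
    return "alert sent" if score >= 0.5 else "no alert"
```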

[0324] (Application Example 1)

[0325] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0326] There is a need to prevent elderly people from becoming victims of fraud through telephone or face-to-face conversations. However, constant monitoring by a third party is difficult from a privacy and feasibility standpoint. In addition, elderly people often have difficulty recognizing fraudulent activity themselves. A system is needed to address these problems and ensure the safety of the elderly.

[0327] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0328] In this invention, the server includes a device for acquiring the user's voice, a conversion means for converting the voice into data, and a communication means for securely transferring the data. This makes it possible to prevent elderly people from becoming involved in fraudulent activities and to ensure their safety.

[0329] A "device for acquiring user voices" is a device used to effectively collect ambient sounds and acquire target voice data.

[0330] A "conversion means for converting audio to data" is a mechanism for converting acquired audio signals into data in a format that is easy to process.

[0331] "Communication means for securely transferring data" refers to means that incorporate technologies and protocols for encrypting and securely transmitting the converted data.

[0332] "Recognition means for converting data into text information" refers to a device or mechanism that incorporates speech recognition technology to convert speech or speech data into text information.

[0333] "Evaluation means for analyzing text information and evaluating the possibility of fraud" refers to a technology that determines the likelihood of fraud from text information using machine learning or rule-based approaches.

[0334] "Alarm means for issuing a warning when it is determined that there is a possibility of fraud" refers to means of alerting users or guardians when fraud is suspected, and includes devices or functions that emit visual and auditory warnings.

[0335] "A display means worn by the user to visually display a warning" refers to a device worn by the user to visually display a message through a digital display.

[0336] "External communication means for sending notifications to family or caregivers" refers to a communication device that has the function of sending notifications to family or caregivers in remote locations, triggered by an alarm.

[0337] To implement this invention, it is necessary to combine a smart device worn by the user with a system that includes a cloud server for processing data. The purpose of this system is to ensure user safety by acquiring voice data in real time and determining the possibility of fraudulent activity.

[0338] The server receives voice data transmitted from the terminal and converts it into text information using a speech recognition engine. A highly accurate speech recognition service such as Google Speech-to-Text is recommended for speech recognition. The converted text information is then analyzed to assess the risk of fraudulent activity. This process can utilize machine learning models to perform pattern recognition by referencing past fraud data. If the assessment determines a high risk, an alert is issued, and a notification is sent to the designated contact. This notification is sent via SMS or push notification to enable a quick response.

[0339] The device is equipped with a highly sensitive microphone that can clearly capture conversations around the user. The device temporarily records this audio data, performs noise reduction and volume normalization processing, and then securely transmits the data to the server. To enhance communication security during data transfer, it is desirable to utilize encryption technology.

[0340] If a user is wearing a smart device, a warning will be displayed on the device's screen when a risk is detected. This visual alert allows the user to immediately recognize the possibility of fraud. For example, if an elderly person is approached while doing their daily shopping and becomes a target of fraud, the system can detect the risk and issue a warning, preventing them from becoming a victim.

[0341] An example of a prompt using a generative AI model is: "Design an AI that protects seniors from fraud through real-time voice analysis. Include a process that uses a microphone and smart display to detect conversations suggestive of fraud, display a warning, and notify family members."

[0342] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0343] Step 1:

[0344] The device acquires the user's voice through a microphone. The input audio signal is captured as digital data, and noise reduction and volume normalization are performed. This process results in the output of clear and easily processable audio data.

[0345] Step 2:

[0346] The terminal encrypts the pre-processed audio data and transmits it to the server via a communication method. Algorithms such as AES are used for encryption to ensure data security. This allows the data to be transferred to the server while protected from unauthorized access.

[0347] Step 3:

[0348] The server inputs the received audio data into a speech recognition engine and converts it into text information. Using speech recognition services such as Google Speech-to-Text, the system converts the audio signal into string data, obtaining human-readable text information.

[0349] Step 4:

[0350] The server uses a machine learning model to analyze the converted text information. It refers to a database of past fraud cases and evaluates whether the text data contains patterns that suggest fraud. If it determines that there is a possibility of fraud, it outputs the corresponding risk score.

[0351] Step 5:

[0352] The server issues a warning through an alarm system if the risk score exceeds a certain threshold. Specifically, it displays a warning message on the display of a smart device worn by the user via an interface. It also sends a notification to a designated contact using an external communication method. The notification is sent quickly as an SMS, email, or push notification.

[0353] Step 6:

[0354] The user can recognize the risk of fraud by checking the warning displayed on the smart device's screen. The user can then take appropriate action, such as contacting family members.
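The server-side portion of the flow above (Steps 3 to 5) can be sketched as a small pipeline in which each stage is supplied as a function. This is only an outline under stated assumptions: `run_pipeline`, `PipelineResult`, the parameter names, and the default threshold are hypothetical, and the speech recognition engine and machine learning model are replaced by stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    text: str
    risk_score: float
    alerted: bool


def run_pipeline(audio: bytes,
                 recognize: Callable[[bytes], str],
                 score: Callable[[str], float],
                 show_warning: Callable[[str], None],
                 notify_contact: Callable[[str], None],
                 threshold: float = 0.7) -> PipelineResult:
    """Recognize speech, score the text, then warn and notify if the score exceeds the threshold."""
    text = recognize(audio)           # Step 3: audio -> text
    risk = score(text)                # Step 4: text -> risk score
    alerted = risk > threshold        # Step 5: compare with the threshold
    if alerted:
        show_warning("Possible fraud detected")
        notify_contact(f"Possible fraud detected (risk score {risk:.2f})")
    return PipelineResult(text, risk, alerted)


# Stand-in stages for demonstration only.
result = run_pipeline(
    b"\x00\x01",
    recognize=lambda audio: "please transfer money right away",
    score=lambda text: 0.9 if "transfer money" in text else 0.1,
    show_warning=print,
    notify_contact=print,
)
print(result)
```

Passing the stages in as functions keeps the control flow separate from the choice of speech recognition service or model.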

[0355] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0356] This invention provides a fraud detection system that monitors users' voices in real time, detects potential fraud, and analyzes the users' emotional states. This system consists of multiple devices and technologies for collecting and analyzing voice data, recognizing emotions, and detecting anomalies.

[0357] The device has a high-performance microphone to capture the user's natural conversation, and the acquired audio data is clarified through noise reduction. After preprocessing, this collected audio data is encrypted and transmitted to the server using a communication device.

[0358] The server converts the received audio data into text data using a speech recognition device. Next, an analysis device analyzes this text data to identify potentially fraudulent phrases and contexts. In addition, an emotion engine analyzes the user's emotional state from the audio data to improve the accuracy of the fraud assessment.

[0359] The emotion engine recognizes in real time when a user is experiencing emotional states such as anxiety, tension, or excitement, and feeds this information back into the analysis system's evaluation, helping to more accurately determine the likelihood of fraud. For example, if a user is clearly experiencing anxiety or tension, this information may be added to the trigger conditions for fraud alerts.

[0360] When a potential scam is detected, the server activates an alarm system and quickly sends a warning to family members or law enforcement agencies. This notification is typically delivered via SMS, email, or a dedicated app. For example, if an elderly person receives a suspicious phone call and becomes anxious, the system's emotion engine can detect this anxiety and send a scam alert earlier than usual.
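One possible way to send an alert "earlier than usual" when the user is anxious is to lower the alert threshold as the estimated anxiety rises. The following Python sketch assumes that the emotion engine outputs an anxiety level between 0 and 1; the constants and the names `effective_threshold` and `should_alert` are hypothetical.

```python
BASE_THRESHOLD = 0.7    # assumed threshold when the user is calm
ANXIETY_DISCOUNT = 0.2  # assumed maximum reduction at full anxiety


def effective_threshold(anxiety: float, base: float = BASE_THRESHOLD,
                        discount: float = ANXIETY_DISCOUNT) -> float:
    """Lower the alert threshold in proportion to anxiety (0.0 = calm, 1.0 = highly anxious)."""
    anxiety = min(1.0, max(0.0, anxiety))  # clamp out-of-range estimates
    return base - discount * anxiety


def should_alert(fraud_score: float, anxiety: float) -> bool:
    """Alert when the fraud score reaches the anxiety-adjusted threshold."""
    return fraud_score >= effective_threshold(anxiety)


# The same fraud score triggers an alert only when the user is anxious.
print(should_alert(0.6, anxiety=0.0), should_alert(0.6, anxiety=1.0))
```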

[0361] This system can adapt to the latest fraud techniques by continuously updating its machine learning model based on past fraud data, and functions as a powerful tool to protect the elderly from fraud. This makes it possible to prevent damage from fraudulent activities.

[0362] The following describes the processing flow.

[0363] Step 1:

[0364] The device collects the user's conversation in real time via a microphone. During this process, noise reduction technology is applied to improve the quality of the audio data.

[0365] Step 2:

[0366] The device converts the collected audio data into a format that is easy to process. This process includes compressing and standardizing the audio data.

[0367] Step 3:

[0368] The terminal encrypts the processed voice data for secure transmission and sends it to the server via a communication device.

[0369] Step 4:

[0370] The server converts the received audio data into text data using a speech recognition device. The speech-to-text conversion process is performed using a speech recognition algorithm.

[0371] Step 5:

[0372] The server analyzes text data using an analysis device to detect keywords and phrases indicating fraudulent activity. A machine learning model using past fraud data supports this process.

[0373] Step 6:

[0374] The server uses an emotion engine to analyze the user's emotions from the voice data. If a specific emotional state is detected, the server integrates this into the fraud possibility assessment.

[0375] Step 7:

[0376] The server combines the analysis results and sentiment analysis results and issues an alert if it determines that there is a high probability of fraud. This alert is notified to the user's family or law enforcement agencies via a communication device.

[0377] Step 8:

[0378] Users can receive notifications from family members and law enforcement agencies, prompting them to take further action. Family members can contact the user, check on the situation, and intervene as needed.
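Step 2 of the flow above compresses and standardizes the audio data. A minimal sketch of that step is shown below, assuming that standardization means conversion to 16-bit little-endian PCM and that compression is lossless; the description names neither a format nor a codec. The function names are hypothetical, and `zlib` is a general-purpose compressor used here in place of a dedicated audio codec.

```python
import struct
import zlib
from typing import List


def pack_pcm16(samples: List[float]) -> bytes:
    """Standardize float samples (clipped to [-1, 1]) as little-endian 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def compress_audio(samples: List[float]) -> bytes:
    """Standardize the samples and compress the resulting bytes."""
    return zlib.compress(pack_pcm16(samples))


def decompress_audio(payload: bytes) -> List[int]:
    """Reverse compress_audio, returning the 16-bit integer samples."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


payload = compress_audio([0.0, 0.25, -0.5, 1.0])
print(len(payload), decompress_audio(payload))
```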

[0379] (Example 2)

[0380] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0381] In modern society, fraud and scams are becoming more sophisticated, and many people are at risk of becoming victims. In particular, some scammers who are sensitive to emotional nuances not only use language and tone, but also appeal to people's emotions to carry out their fraudulent activities. However, conventional fraud detection systems have a problem in that they cannot adequately recognize and analyze these emotionally related elements, and there is a possibility that fraud will be overlooked.

[0382] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0383] In this invention, the server includes means for acquiring the user's voice, conversion means for converting the voice into a format that is easy to process, and emotion analysis means for analyzing the emotional state from the acoustic data and improving the accuracy of evaluating the possibility of fraud. This makes it possible to recognize suspicious emotional states associated with fraudulent activity in real time and to track the possibility of fraud more accurately.

[0384] "Means for acquiring user speech" refers to devices and technologies for accurately and effectively collecting the voices emitted by users.

[0385] "Conversion means for converting to an easily processable format" refers to devices and technologies that prepare collected audio data into an optimal data format so that subsequent analysis and interpretation can be easily performed.

[0386] "Transmission means" refers to technologies and devices for securely and quickly transmitting data to other devices or servers.

[0387] "Acoustic analysis means" refers to technologies and devices that analyze audio data and convert its content into textual information.

[0388] "Analysis means" refers to technologies and devices that analyze textual information to identify phrases and contexts that can detect fraudulent activity.

[0389] "Emotional analysis means" refers to analytical techniques and devices that analyze the emotional state contained in speech and use the result to improve the accuracy of the fraud evaluation.

[0390] "Alarm means" refers to devices or technologies that immediately send an alert to relevant parties when potential fraud is detected.

[0391] A "generative AI model" refers to an algorithm or technology that utilizes artificial intelligence technology to generate new insights and make decisions based on information learned from past data.

[0392] This invention provides a specific embodiment of a system that monitors users' voices in real time and detects potential fraudulent activity or deception. This system is realized by combining the following technologies and means.

[0393] First, the device is equipped with a high-performance microphone, which allows for accurate capture of the user's voice. This microphone utilizes noise reduction technology to eliminate ambient noise, resulting in clearer voice data. This ensures reliable collection of user voice data even in noisy environments such as cafes.

[0394] The voice data collected by the device is pre-processed and then transmitted to the server in an encrypted state via a communication method. This encrypted communication ensures the privacy and security of the data.

[0395] The server analyzes the received audio data using a speech recognition device and converts it into text format. This text data is then analyzed using an analysis device to identify potentially fraudulent phrases and contexts, and further evaluates the likelihood of fraud based on the user's emotional state extracted from the audio through sentiment analysis.

[0396] The specific analysis utilizes an emotion engine to determine whether the user is experiencing anxiety, tension, excitement, or other states of mind, and this information is fed back into the fraud detection process. For example, if an elderly person receives a suspicious phone call and is feeling tense, the system will detect this tension and issue an alert earlier than usual.

[0397] Ultimately, if a fraud is deemed highly likely, the server will use alerting mechanisms to quickly notify family members and law enforcement agencies. This notification will be sent via SMS, email, and a dedicated app, enabling a swift response.

[0398] Furthermore, this system utilizes a generative AI model based on past fraud data, enabling it to respond to new fraud methods. An example of a prompt message to the generative AI model is: "Convert the content of suspicious phone calls received by elderly people into text format and detect the possibility of fraud. Also, analyze the emotional state from the voice and incorporate this into the decision to issue a fraud alert." In this way, the system constantly learns the latest information and takes the most appropriate action.
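The way such a prompt might be combined with inference data can be sketched as follows. The instruction text is adapted from the example prompt above; the names `INSTRUCTION`, `build_prompt`, and `ask_model` are hypothetical, and the generative AI model is represented by an arbitrary function supplied by the caller rather than by any specific service.

```python
from typing import Callable

# Instruction adapted from the example prompt in the description.
INSTRUCTION = (
    "Detect the possibility of fraud in the following call transcript. "
    "Also take the speaker's emotional state into account when deciding "
    "whether to issue a fraud alert."
)


def build_prompt(transcript: str, emotion: str) -> str:
    """Combine the instruction with the inference data (emotion label and transcript)."""
    return f"{INSTRUCTION}\n\nEmotional state: {emotion}\nTranscript:\n{transcript}"


def ask_model(generate: Callable[[str], str], transcript: str, emotion: str) -> str:
    """Send the prompt through `generate`, any function that returns the model's reply."""
    return generate(build_prompt(transcript, emotion))


# Stand-in model that only reports the prompt length.
print(ask_model(lambda prompt: f"received {len(prompt)} characters",
                "Please transfer money today.", "anxious"))
```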

[0399] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0400] Step 1:

[0401] The device uses a high-performance microphone to acquire the user's voice. Noise reduction is performed to remove ambient noise, resulting in clear audio data. The input to this process is the user's voice, and the output is the noise-reduced audio data. Specifically, the system analyzes and filters noise in real time.

[0402] Step 2:

[0403] The terminal performs preprocessing on the acquired audio data. This preprocessing includes volume normalization and sample data formatting to improve data quality. The input is audio data with noise removed, and the output is preprocessed audio data. Specifically, it performs acoustic signal analysis and data format conversion.

[0404] Step 3:

[0405] The terminal encrypts the pre-processed audio data and sends it to the server. At this stage, encryption technology is used to ensure the data is transmitted securely. The input is pre-processed audio data, and the output is encrypted data. Specifically, the data is protected using an encryption algorithm.

[0406] Step 4:

[0407] The server decrypts the received encrypted audio data and converts it to text using a speech recognition device. This conversion process utilizes AI technology to materialize the audio data as textual information. The input is the decrypted audio data, and the output is text data. Specifically, it recognizes words and sentences from the audio and converts them into strings.

[0408] Step 5:

[0409] The server analyzes text data using an analysis device to identify potentially fraudulent phrases. This analysis uses a generative AI model to evaluate features associated with fraud. The input is text data, and the output is an evaluation of the likelihood of fraud. Specifically, it involves applying a pre-trained fraud detection algorithm.

[0410] Step 6:

[0411] The server analyzes the user's emotional state using emotion analysis techniques based on the audio data. Here, AI extracts emotional patterns from the audio, providing supplementary information for fraud assessment. The input is audio data (including text), and the output is the emotion analysis result. Specifically, the system estimates emotions from tone and word choice.

[0412] Step 7:

[0413] If a fraudulent activity is deemed highly likely, the server will immediately alert family members and law enforcement agencies using its alarm system. This alert will be sent via SMS or email as needed. The input is the result of fraud assessment and sentiment analysis, and the output is the notification action. Specifically, an automated notification will be sent to emergency contacts.
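Step 7 takes both the fraud assessment and the emotion analysis result as input. One simple way to combine the two, shown here only as an assumption, is a weighted average followed by a threshold test. The class `Assessment`, the weight, the threshold, and the function names are hypothetical; the description does not state how the two results are combined.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    fraud_score: float    # output of Step 5, assumed to lie in [0, 1]
    emotion_score: float  # output of Step 6, assumed to lie in [0, 1] (1 = strong anxiety or tension)


def combined_risk(a: Assessment, emotion_weight: float = 0.3) -> float:
    """Weighted average of the text-based score and the emotion-based score."""
    if not 0.0 <= emotion_weight <= 1.0:
        raise ValueError("emotion_weight must be between 0 and 1")
    return (1.0 - emotion_weight) * a.fraud_score + emotion_weight * a.emotion_score


def notification_action(a: Assessment, threshold: float = 0.6) -> str:
    """Return 'notify' when the combined risk reaches the threshold, otherwise 'none'."""
    return "notify" if combined_risk(a) >= threshold else "none"


print(notification_action(Assessment(fraud_score=0.5, emotion_score=1.0)))
print(notification_action(Assessment(fraud_score=0.5, emotion_score=0.0)))
```

With these assumed values, a moderate text-based score leads to a notification only when the emotion analysis also indicates strong tension.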

[0414] (Application Example 2)

[0415] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0416] In modern society, vulnerable individuals, particularly the elderly, are at increased risk of falling victim to telephone scams. However, there are limited means to detect these voice-based scams in real time and protect the targeted person. This invention aims to prevent users from becoming victims of fraud by quickly and accurately detecting such fraudulent activities and providing the necessary notifications.

[0417] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0418] In this invention, the server includes a sound receiving means for acquiring the user's voice, a sound processing means for pre-processing the voice, and a transmission means for encrypting and transmitting the pre-processed voice information. This makes it possible to protect users from telephone scams.

[0419] "Sound receiving means" refers to a device used to accurately acquire the user's voice.

[0420] "Sound processing means" refers to the techniques and processes used to preprocess acquired speech into a format that is easy to analyze.

[0421] A "transmission means" is a device that provides the function of securely encrypting pre-processed audio information and transmitting it to other systems or servers.

[0422] "Speech recognition means" refers to technologies and devices for converting speech information into text information.

[0423] "Analysis means" refers to processes and devices used to evaluate the possibility of fraudulent activity based on textual information.

[0424] "Emotional analysis methods" refer to technologies and devices used to evaluate a user's psychological state from voice data.

[0425] "Alarm generation means" refers to devices or functions that issue notifications when there is a high probability of fraudulent activity.

[0426] A "communication element" is a component used to transmit alarms and notifications to remote recipients.

[0427] "Machine learning techniques" are algorithms and processes used to evaluate the likelihood of fraudulent activity based on data from past fraudulent activities and to improve the system.

[0428] To realize this invention, the system operates based on the following configuration. First, the terminal acquires the user's voice using a high-performance microphone. The acquired voice is pre-processed by the terminal's voice processing means to reduce noise. This pre-processed voice data is encrypted by the terminal's transmission means and sent to the server.

[0429] The server receives this data and uses speech recognition to convert the speech into text. The converted text is then evaluated by the analysis means to determine whether it contains potentially fraudulent content. These steps can use machine learning services such as the Google Cloud Speech-to-Text API for speech recognition and Amazon Comprehend (AWS) for text analysis. Furthermore, sentiment analysis evaluates the user's psychological state from the audio data; if emotions such as anxiety or tension are detected, they are used to improve the accuracy of the fraud assessment.

[0430] If fraudulent activity is deemed highly likely, the server will use alarm generation mechanisms to issue warnings and notifications. These methods include SMS, email, or a dedicated app, ensuring rapid information transmission through the communication element. For example, if a user receives a suspicious phone call, it is crucial that the system detects the unusual tension and immediately sends a notification to protect the individual.

[0431] An example of a prompt is, "Explain how the AI monitors and detects fraudulent activity in real time, and how it alerts users." Entering this prompt into the AI can facilitate further detailed analysis and improvements.

[0432] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0433] Step 1:

[0434] The device uses a high-performance microphone to capture the user's voice in real time. This input audio is raw, unprocessed data and contains noise and ambient sounds.

[0435] Step 2:

[0436] The terminal uses audio processing equipment to apply noise reduction to the acquired audio. In this step, unnecessary audio information is removed, and clear audio data suitable for analysis is output.

[0437] Step 3:

[0438] The terminal encodes the pre-processed audio data and transmits it to the server in an encrypted form via a transmission method. This output is protected using advanced encryption technology to prevent eavesdropping and data tampering.

[0439] Step 4:

[0440] The server receives encrypted audio data and uses speech recognition to convert the audio into text. In this step, the speech recognition algorithm outputs meaningful text from the input audio data.

[0441] Step 5:

[0442] The server uses analytical tools to analyze text information and assess the likelihood of fraudulent activity. In this step, a machine learning model is applied to output the risk of fraud based on specific keywords or phrases.

[0443] Step 6:

[0444] The server utilizes emotion analysis techniques to evaluate the user's psychological state based on voice data. If anxiety or tension is detected through this analysis, the result is output and used to improve the accuracy of fraud detection.

[0445] Step 7:

[0446] If a fraudulent activity is deemed highly likely, the server will generate an alert using its alarm generation system and send notifications to the user and designated emergency contacts. This notification may be delivered via SMS, email, or a dedicated app.
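The notification in Step 7 goes to several recipients over several channels. The following sketch shows one way to dispatch a message so that a failure on one channel does not prevent the remaining contacts from being notified. The channel names, the `Sender` type, and `dispatch_alert` are hypothetical; real SMS, email, or app delivery would be provided by the functions registered in `senders`.

```python
from typing import Callable, Dict, List, Tuple

Sender = Callable[[str, str], None]  # (address, message) -> None


def dispatch_alert(message: str,
                   contacts: List[Tuple[str, str]],
                   senders: Dict[str, Sender]) -> List[Tuple[str, str]]:
    """Send `message` to each (channel, address) pair and return the pairs that failed."""
    failed = []
    for channel, address in contacts:
        sender = senders.get(channel)
        if sender is None:
            failed.append((channel, address))  # no handler registered for this channel
            continue
        try:
            sender(address, message)
        except Exception:
            failed.append((channel, address))  # continue so other contacts are still notified
    return failed


# Stand-in senders that print instead of delivering anything.
senders = {
    "sms": lambda addr, msg: print(f"SMS to {addr}: {msg}"),
    "email": lambda addr, msg: print(f"Email to {addr}: {msg}"),
}
contacts = [("sms", "user"), ("email", "emergency-contact"), ("app", "emergency-contact")]
print(dispatch_alert("Possible fraud detected", contacts, senders))
```

Returning the failed pairs leaves the retry policy to the caller.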

[0447] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0448] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0449] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0450] [Third Embodiment]

[0451] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0452] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0453] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0454] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0455] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0456] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0457] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0458] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0459] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0460] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0461] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0462] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0463] This invention provides a fraud detection system that monitors user conversations in real time and detects potential fraud. This system mainly consists of a series of devices and methods for collecting, analyzing, and detecting anomalies in voice data.

[0464] The device is equipped with a microphone that continuously collects the user's conversation. The collected audio data is preprocessed, including noise reduction and volume normalization, and converted into a format that is easy to process. This audio data is encrypted and transmitted to the server via a communication device.

[0465] The server converts the received audio data into text using a speech recognition device. The text obtained through speech recognition is then analyzed in detail by an analysis device. The analysis device refers to past fraud data and, when it detects keywords or phrases indicating potential fraud, identifies them as patterns suggesting fraud.

[0466] When a fraudulent activity is deemed highly likely, the server activates an alarm system and sends a notification to pre-registered family members or law enforcement agencies. This notification is sent via SMS, email, or push notification to a dedicated application. This allows for a swift response to prevent users from becoming victims of fraudulent activity.

[0467] As a concrete example, when an elderly person receives a phone call, the call is automatically monitored by the system. If phrases suggesting fraud, such as "transfer money," are heard during the call, the system detects this and immediately sends a warning notification to the family. The family can then receive this notification and contact the elderly person directly to confirm whether a problem has occurred.
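Detecting a phrase such as "transfer money" during a call, rather than after it, requires checking the transcript as it arrives. The sketch below assumes that the speech recognition engine delivers the transcript in consecutive text chunks that preserve spaces, and it keeps a short tail of earlier text so that a phrase split across two chunks is still found. The function name `watch_call` and the chunked input are assumptions; phrases are expected in lower case.

```python
from typing import Iterable, Iterator, Sequence


def watch_call(chunks: Iterable[str], phrases: Sequence[str]) -> Iterator[str]:
    """Yield each watched phrase the first time it appears in the running transcript."""
    if not phrases:
        return
    # Tail length needed to catch a phrase that straddles a chunk boundary.
    keep = max(len(p) for p in phrases) - 1
    tail = ""
    seen = set()
    for chunk in chunks:
        window = (tail + chunk).lower()
        for phrase in phrases:
            if phrase not in seen and phrase in window:
                seen.add(phrase)
                yield phrase
        tail = window[-keep:] if keep > 0 else ""


# Hypothetical transcript chunks; the phrase is split across three of them.
stream = ["Hello, please trans", "fer mon", "ey to this account"]
for hit in watch_call(stream, ["transfer money"]):
    print("detected:", hit)
```

Because the function is a generator, the caller can send the warning notification as soon as a phrase is yielded, without waiting for the call to end.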

[0468] Thus, this system utilizes machine learning models and constantly updates them with the latest data to adapt to new fraud methods. This makes it possible to effectively protect the elderly from fraud in real time.

[0469] The following describes the processing flow.

[0470] Step 1:

[0471] The device continuously collects ambient sound from the user's surroundings using a microphone. During this process, the device automatically performs noise reduction to ensure clear audio at an appropriate volume.

[0472] Step 2:

[0473] The terminal converts the collected voice data into a standard communication format and compresses the data as needed. This process reduces the load associated with transmitting data.

[0474] Step 3:

[0475] The device encrypts the voice data and sends it to the server via the internet connection. This ensures data security and privacy.

[0476] Step 4:

[0477] The server decrypts the received encrypted data and sends it to the speech recognition engine. The speech recognition engine converts the speech data into text data.

[0478] Step 5:

[0479] The server passes text data to an analysis device, which uses natural language processing to search for keywords and phrases that may indicate potential fraud. The analysis device uses machine learning models to detect anomalies.

[0480] Step 6:

[0481] The server evaluates the likelihood of fraud based on the analysis results. If signs of fraud are detected, it generates an appropriate alert.

[0482] Step 7:

[0483] The server sends alerts to the device or designated contacts. Notification methods include SMS, email, and notifications via a dedicated app.

[0484] Step 8:

[0485] Users receive notifications and respond according to pre-configured instructions. For example, they may be advised to immediately end a call or consult with a specific contact.
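Step 2 of the flow above (conversion into a standard format and compression as needed) can be sketched as follows. The description names neither the format nor the codec, so 16-bit little-endian PCM and zlib compression are assumptions, as are the function names `encode_for_transmission` and `decode_on_server`.

```python
import struct
import zlib
from typing import List, Sequence


def encode_for_transmission(samples: Sequence[float], compress: bool = True) -> bytes:
    """Clip samples to [-1.0, 1.0], pack them as 16-bit PCM, and optionally compress."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    return zlib.compress(pcm) if compress else pcm


def decode_on_server(payload: bytes, compressed: bool = True) -> List[float]:
    """Reverse encode_for_transmission and return float samples in [-1.0, 1.0]."""
    pcm = zlib.decompress(payload) if compressed else payload
    count = len(pcm) // 2
    return [v / 32767 for v in struct.unpack(f"<{count}h", pcm)]
```

Lossless compression is used here only to keep the sketch short; a speech codec would normally reduce the transmission load far more.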

[0486] (Example 1)

[0487] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0488] In modern society, fraudulent activities are becoming increasingly sophisticated, and there is a particular problem in that the elderly and people unfamiliar with technology are often targeted. Traditional methods make it difficult to detect fraudulent activities or respond immediately, which can result in the damage being exacerbated. In response to this, there is a need to develop a system that uses voice data to detect signs of fraud in real time and issue a rapid warning.

[0489] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0490] In this invention, the server includes speech recognition means for converting audio data into text, analysis means for evaluating the likelihood of fraud, and detection means for identifying keywords or phrases that indicate the likelihood of fraud. This makes it possible to detect signs of fraud from audio in real time and issue an immediate warning before the user becomes involved in fraudulent activity.

[0491] "Voice acquisition means" refers to a device or function for continuously collecting the voice emitted by a user in real time.

[0492] "Preprocessing means" refers to a device or function that performs noise reduction or volume normalization in order to convert the collected audio data into a format that is easy to analyze.

[0493] "Communication means" refers to functions and devices for encrypting processed data and transmitting it securely.

[0494] "Speech recognition means" refers to technologies and devices that convert speech data into text.

[0495] "Analysis means" refers to a device or function for analyzing text data and evaluating the likelihood of fraud.

[0496] "Detection means" refers to functions or devices that identify keywords or phrases within text and identify signs of fraud.

[0497] An "alarm system" is a device or function that issues a warning when it is determined that there is a high probability of fraud.

[0498] "Communication method" refers to the means of transmitting alarms or notifications to designated recipients.

[0499] A "generative artificial intelligence model" is an algorithm that uses machine learning to learn from past data and detect new fraud patterns.

[0500] This invention is for realizing a fraud detection system, and is particularly designed to detect signs of fraudulent activity in real time.

[0501] The terminal is equipped with a standard microphone as the voice acquisition means to capture the user's natural conversation. It incorporates preprocessing means that reduce background noise and clarify the voice signal using multiple noise reduction techniques. The voice data is converted to a disturbance-resistant format and then transmitted to the server by the communication means, encrypted with an algorithm such as AES.

[0502] The server converts the audio data into text using publicly known speech recognition software as a speech recognition means. This text data is then analyzed by an analysis means that implements a generative AI model. The model refers to past fraud data and evaluates the probability of fraud, while utilizing detection means to identify keywords or phrases that indicate potential fraud.
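The evaluation of fraud probability from detected phrases can be illustrated with a rule-based stand-in. This sketch does not use a generative AI model: it combines hypothetical per-phrase weights with a noisy-OR rule, and the weights, the combination rule, and the name `fraud_probability` are all assumptions introduced for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical weights in (0, 1); the description gives no numeric values.
PHRASE_WEIGHTS: Dict[str, float] = {
    "bank account verification": 0.5,
    "transfer request": 0.6,
    "transfer money": 0.6,
}


def _normalize(text: str) -> str:
    """Lowercase, collapse punctuation to spaces, and pad for whole-word matching."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def fraud_probability(transcript: str,
                      weights: Dict[str, float] = PHRASE_WEIGHTS
                      ) -> Tuple[float, List[str]]:
    """Return (probability, matched phrases) using a noisy-OR over matched weights."""
    text = _normalize(transcript)
    miss = 1.0  # probability that none of the matched phrases indicates fraud
    matched = []
    for phrase, weight in weights.items():
        if _normalize(phrase) in text:
            matched.append(phrase)
            miss *= 1.0 - weight
    return 1.0 - miss, matched
```

With these weights, a transcript containing both "transfer request" and "bank account verification" scores about 0.8, higher than either phrase alone.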

[0503] If a transaction is highly likely to be fraudulent, the server will use its alarm system to send alerts via SMS, email, or other communication methods to pre-registered family members or monitoring organizations. The notification will include specific instructions, such as, "A suspicious transaction has been detected. Please contact the user for verification."

[0504] For example, elderly people may receive phone calls containing phrases that suggest fraud, such as "bank account verification" or "transfer request." In such cases, the system immediately detects these words and quickly sends a warning notification to the relevant family members. This allows for early intervention and can prevent potential harm.

[0505] The use of generative AI models improves responsiveness to new fraud techniques and enhances accuracy by constantly referencing the latest fraud database. An example of a prompt message is: "If a suspicious situation is detected, please explain how to respond quickly."

[0506] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0507] Step 1:

[0508] The terminal collects the user's conversation with a microphone, which serves as the voice acquisition means.

[0509] Input: User's voice

[0510] Processing: Capture the collected audio as digital data.

[0511] Output: Digital audio data

[0512] Step 2:

[0513] The terminal applies noise reduction technology as a pre-processing step to clarify the audio. In addition, it normalizes the volume to process the data into a stable format.

[0514] Input: Digital audio data

[0515] Processing: Noise reduction, volume normalization

[0516] Output: Preprocessed audio data

[0517] Step 3:

[0518] The terminal encrypts the pre-processed audio data using an encryption algorithm and transmits it to the server via a communication method.

[0519] Input: Preprocessed audio data

[0520] Processing: AES encryption, data transmission

[0521] Output: Encrypted audio data, transmission complete.

[0522] Step 4:

[0523] The server converts the received audio data into text using speech recognition technology. Specific speech recognition software is then used to extract textual information from the audio.

[0524] Input: Encrypted audio data

[0525] Processing: Speech recognition, text conversion

[0526] Output: Text data

[0527] Step 5:

[0528] The server applies the analysis means to the text data, analyzing it with a generative AI model while referring to a database of fraudulent activities. This process identifies keywords that indicate potential fraud.

[0529] Input: Text data

[0530] Processing: Analysis using a generative AI model, detection of keywords indicating fraud.

[0531] Output: Fraud probability assessment data

[0532] Step 6:

[0533] If the server determines, based on the analysis results, that there is a high probability of fraud, it will use an alerting mechanism to warn the relevant recipient. For example, it might send a detailed notification via SMS.

[0534] Input: Fraud probability assessment data

[0535] Processing: Evaluate alert conditions, send warning.

[0536] Output: Alarm notification sent successfully.

[0537] This series of steps enables the system to detect and notify users of fraudulent activity in real time, preventing them from becoming victims.
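The chain of inputs and outputs in Steps 1 to 6 can be sketched as a pipeline whose stages are supplied by the caller. The stage signatures, the `Stages` and `run_pipeline` names, and the threshold value 0.7 are assumptions; an explicit decryption stage on the server is also assumed, since the steps of Example 1 do not list one.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Stages:
    preprocess: Callable[[Sequence[float]], Sequence[float]]  # Step 2 (terminal)
    encrypt: Callable[[Sequence[float]], bytes]               # Step 3 (terminal)
    decrypt: Callable[[bytes], Sequence[float]]               # assumed server-side stage
    transcribe: Callable[[Sequence[float]], str]              # Step 4 (server)
    assess: Callable[[str], float]                            # Step 5 (server)
    notify: Callable[[str], None]                             # Step 6 (server)


def run_pipeline(audio: Sequence[float], stages: Stages,
                 threshold: float = 0.7) -> Tuple[str, float, bool]:
    """Run one audio chunk through all stages; return (text, probability, alerted)."""
    clean = stages.preprocess(audio)
    payload = stages.encrypt(clean)
    received = stages.decrypt(payload)
    text = stages.transcribe(received)
    probability = stages.assess(text)
    alerted = probability >= threshold  # hypothetical alert condition
    if alerted:
        stages.notify(f"Possible fraud detected (probability {probability:.2f}).")
    return text, probability, alerted
```

Passing the stages in as callables keeps the sketch independent of any particular speech recognition service, model, or notification channel.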

[0538] (Application Example 1)

[0539] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0540] There is a need to prevent elderly people from becoming victims of fraud through telephone or face-to-face conversations. However, constant monitoring by a third party is difficult from a privacy and feasibility standpoint. In addition, elderly people often have difficulty recognizing fraudulent activity themselves. A system is needed to address these problems and ensure the safety of the elderly.

[0541] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0542] In this invention, the server includes a device for acquiring the user's voice, a conversion means for converting the voice into data, and a communication means for securely transferring the data. This makes it possible to prevent elderly people from becoming involved in fraudulent activities and to ensure their safety.

[0543] A "device for acquiring user voices" is a device used to effectively collect ambient sounds and acquire target voice data.

[0544] A "conversion means for converting audio to data" is a mechanism for converting acquired audio signals into data in a format that is easy to process.

[0545] "Communication methods for securely transferring data" refer to means that incorporate technologies and protocols for encrypting and securely transmitting converted data.

[0546] "Recognition means for converting data into text information" refers to a device or mechanism that incorporates speech recognition technology to convert speech or speech data into text information.

[0547] "An evaluation method that analyzes text information to assess the likelihood of fraudulent activity" refers to a technology that uses text information to determine the likelihood of fraud using machine learning or rule-based approaches.

[0548] "Alert systems that issue warnings when fraudulent activity is suspected" are means of alerting users or guardians when fraud is suspected, and include devices or functions that emit visual and auditory warnings.

[0549] "A display means worn by the user to visually display a warning" refers to a device worn by the user to visually display a message through a digital display.

[0550] "External communication means for sending notifications to family or caregivers" refers to a communication device that has the function of sending notifications to family or caregivers in remote locations, triggered by an alarm.

[0551] To implement this invention, it is necessary to combine a smart device worn by the user with a system that includes a cloud server for processing data. The purpose of this system is to ensure user safety by acquiring voice data in real time and determining the possibility of fraudulent activity.

[0552] The server receives voice data transmitted from the terminal and converts it into text information using a speech recognition engine. A highly accurate speech recognition service such as Google Speech-to-Text is recommended for speech recognition. The converted text information is then analyzed to assess the risk of fraudulent activity. This process can utilize machine learning models to perform pattern recognition by referencing past fraud data. If the assessment determines a high risk, an alert is issued, and a notification is sent to the designated contact. This notification is sent via SMS or push notification to enable a quick response.

[0553] The device is equipped with a highly sensitive microphone that can clearly capture conversations around the user. The device temporarily records this audio data, performs noise reduction and volume normalization processing, and then securely transmits the data to the server. To enhance communication security during data transfer, it is desirable to utilize encryption technology.
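The noise reduction and volume normalization mentioned above can be illustrated in a very reduced form. The description does not specify the algorithms, so this sketch substitutes a simple noise gate for noise reduction and peak normalization for volume normalization; the threshold, the target peak, and the function names are assumptions.

```python
from typing import List, Sequence


def noise_gate(samples: Sequence[float], threshold: float = 0.02) -> List[float]:
    """Zero out samples whose magnitude is below the threshold (crude noise reduction)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_peak(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale the samples so that the largest magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples: Sequence[float]) -> List[float]:
    """Apply the noise gate first, then normalize the remaining signal."""
    return normalize_peak(noise_gate(samples))
```

Gating before normalization keeps low-level background noise from being amplified along with the speech.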

[0554] If a user is wearing a smart device, a warning will be displayed on the device's screen when a risk is detected. This visual alert allows the user to immediately recognize the possibility of fraud. For example, if an elderly person is approached while doing their daily shopping and becomes a target of fraud, the system can detect the risk and issue a warning, preventing them from becoming a victim.

[0555] An example of a prompt using a generative AI model is: "Design an AI that protects seniors from fraud through real-time voice analysis. Include a process that uses a microphone and smart display to detect conversations suggestive of fraud, display a warning, and notify family members."

[0556] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0557] Step 1:

[0558] The device acquires the user's voice through a microphone. The input audio signal is captured as digital data, and noise reduction and volume normalization are performed. This process results in the output of clear and easily processable audio data.

[0559] Step 2:

[0560] The terminal encrypts the pre-processed audio data and transmits it to the server via a communication method. Algorithms such as AES are used for encryption to ensure data security. This allows the data to be transferred to the server while protected from unauthorized access.

[0561] Step 3:

[0562] The server inputs the received audio data into a speech recognition engine and converts it into text information. Using speech recognition services such as Google Speech-to-Text, the system converts the audio signal into string data, obtaining human-readable text information.

[0563] Step 4:

[0564] The server uses a machine learning model to analyze the converted text information. It refers to a database of past fraud cases and evaluates whether the text data contains patterns that suggest fraud. If it determines that there is a possibility of fraud, it outputs the corresponding risk score.

[0565] Step 5:

[0566] The server issues a warning through an alarm system if the risk score exceeds a certain threshold. Specifically, it displays a warning message on the display of a smart device worn by the user via an interface. It also sends a notification to a designated contact using an external communication method. The notification is sent quickly as an SMS, email, or push notification.

[0567] Step 6:

[0568] Users can recognize the risk of fraud by checking the warning displayed on their smart device's screen. In response, users can take appropriate measures, such as contacting family members.
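The threshold check and notification of Step 5 can be sketched as follows. The `Contact` type, the sender callables, and the message wording are hypothetical; real SMS, email, or push delivery would be provided by whatever messaging services the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Contact:
    name: str
    channel: str   # "sms", "email", or "push"
    address: str


Sender = Callable[[str, str], None]  # (address, message) -> None


def issue_alerts(risk_score: float, threshold: float, contacts: Sequence[Contact],
                 senders: Dict[str, Sender],
                 show_on_display: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Warn the wearer and notify contacts only when the score exceeds the threshold."""
    if risk_score <= threshold:
        return []
    message = f"Warning: possible fraud detected (risk score {risk_score:.2f})."
    show_on_display(message)  # visual warning on the worn smart device
    delivered = []
    for contact in contacts:
        send = senders.get(contact.channel)
        if send is None:
            continue  # no sender configured for this channel
        send(contact.address, message)
        delivered.append((contact.channel, contact.address))
    return delivered
```

The returned list records which contacts were actually notified, which makes the behavior easy to check without sending real messages.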

[0569] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0570] This invention provides a fraud detection system that monitors users' voices in real time, detects potential fraud, and analyzes the users' emotional states. This system consists of multiple devices and technologies for collecting, analyzing, recognizing emotions, and detecting anomalies in voice data.

[0571] The device has a high-performance microphone to capture the user's natural conversation, and the acquired audio data is clarified through noise reduction. After preprocessing, this collected audio data is encrypted and transmitted to the server using a communication device.

[0572] The server converts the received audio data into text data using a speech recognition device. Next, an analysis device analyzes this text data to identify potentially fraudulent phrases and contexts. In addition, an emotion engine analyzes the user's emotional state from the audio data to improve the accuracy of the fraud assessment.

[0573] The emotion engine recognizes in real time when a user is experiencing emotional states such as anxiety, tension, or excitement, and feeds this information back into the analysis system's evaluation, helping to more accurately determine the likelihood of fraud. For example, if a user is clearly experiencing anxiety or tension, this information may be added to the trigger conditions for fraud alerts.
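One way to add the emotional state to the trigger conditions is to lower the alert threshold when anxiety or tension is detected. The sketch below assumes this approach; the base threshold, the discount values, and the function names are hypothetical, and the description does not state how the two signals are combined.

```python
# Hypothetical values; the description gives no numeric thresholds.
BASE_THRESHOLD = 0.7
EMOTION_DISCOUNT = {"anxiety": 0.2, "tension": 0.2}


def effective_threshold(emotion: str, confidence: float,
                        base: float = BASE_THRESHOLD) -> float:
    """Lower the alert threshold in proportion to confidence in a risk-related emotion."""
    bounded = max(0.0, min(1.0, confidence))
    return base - EMOTION_DISCOUNT.get(emotion, 0.0) * bounded


def should_alert(fraud_score: float, emotion: str, confidence: float) -> bool:
    """Alert when the text-based fraud score reaches the emotion-adjusted threshold."""
    return fraud_score >= effective_threshold(emotion, confidence)
```

Under these assumed values, a fraud score of 0.6 does not trigger an alert for a calm user but does when anxiety is detected with full confidence, so the alert is raised at an earlier point in the call.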

[0574] When a potential scam is detected, the server activates an alarm system and quickly sends a warning to family members or law enforcement agencies. This notification is typically delivered via SMS, email, or a dedicated app. For example, if an elderly person receives a suspicious phone call and becomes anxious, the system's emotion engine can detect this anxiety and send a scam alert earlier than usual.

[0575] This system can adapt to the latest fraud techniques by continuously updating its machine learning model based on past fraud data, and functions as a powerful tool to protect the elderly from fraud. This makes it possible to prevent damage from fraudulent activities.

[0576] The following describes the processing flow.

[0577] Step 1:

[0578] The device collects the user's conversation in real time via a microphone. During this process, noise reduction technology is applied to improve the quality of the audio data.

[0579] Step 2:

[0580] The device converts the collected audio data into a format that is easy to process. This process includes compressing and standardizing the audio data.

[0581] Step 3:

[0582] The terminal encrypts the processed voice data for secure transmission and sends it to the server via a communication device.

[0583] Step 4:

[0584] The server converts the received audio data into text data using a speech recognition device. The speech-to-text conversion process is performed using a speech recognition algorithm.

[0585] Step 5:

[0586] The server analyzes text data using an analysis device to detect keywords and phrases indicating fraudulent activity. A machine learning model using past fraud data supports this process.

[0587] Step 6:

[0588] The server uses an emotion engine to analyze the user's emotions from the voice data. If a specific emotional state is detected, the server integrates this into the fraud possibility assessment.

[0589] Step 7:

[0590] The server combines the text analysis results with the emotion analysis results and issues an alert if it determines that there is a high probability of fraud. This alert is sent to the user's family or law enforcement agencies via a communication device.

[0591] Step 8:

[0592] Users can receive follow-up contact from family members and law enforcement agencies, which prompts them to take further action. Family members can contact the user, check on the situation, and intervene as needed.

[0593] (Example 2)

[0594] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0595] In modern society, fraud and scams are becoming more sophisticated, and many people are at risk of becoming victims. In particular, some scammers who are sensitive to emotional nuances not only use language and tone, but also appeal to people's emotions to carry out their fraudulent activities. However, conventional fraud detection systems have a problem in that they cannot adequately recognize and analyze these emotionally related elements, and there is a possibility that fraud will be overlooked.

[0596] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0597] In this invention, the server includes means for acquiring the user's voice, conversion means for converting the voice into a format that is easy to process, and emotion analysis means for analyzing the emotional state from the acoustic data and improving the accuracy of evaluating the possibility of fraud. This makes it possible to recognize suspicious emotional states associated with fraudulent activity in real time and to track the possibility of fraud more accurately.

[0598] "Means for acquiring user speech" refers to devices and technologies for accurately and effectively collecting the voices emitted by users.

[0599] "Conversion means for converting to an easily processable format" refers to devices and technologies that prepare collected audio data into an optimal data format so that subsequent analysis and interpretation can be easily performed.

[0600] "Transmission means" refers to technologies and devices for securely and quickly transmitting data to other devices or servers.

[0601] "Acoustic analysis means" refers to technologies and devices that analyze audio data and convert its content into textual information.

[0602] "Analysis means" refers to technologies and devices that analyze textual information to identify phrases and contexts that can detect fraudulent activity.

[0603] "Emotional analysis means" refers to analytical techniques and devices that analyze the emotional state contained in speech and use the result to improve the accuracy of the fraud evaluation.

[0604] "Alarming measures" refer to devices or technologies that immediately send an alert to relevant parties when a potential fraud is detected.

[0605] A "generative AI model" refers to an algorithm or technology that utilizes artificial intelligence technology to generate new insights and make decisions based on information learned from past data.

[0606] This invention provides a specific embodiment of a system that monitors users' voices in real time and detects potential fraudulent activity or deception. This system is realized by combining the following technologies and means.

[0607] First, the device is equipped with a high-performance microphone, which allows for accurate capture of the user's voice. This microphone utilizes noise reduction technology to eliminate ambient noise, resulting in clearer voice data. This ensures reliable collection of user voice data even in noisy environments such as cafes.

[0608] The voice data collected by the device is pre-processed and then transmitted to the server in an encrypted state via a communication method. This encrypted communication ensures the privacy and security of the data.

[0609] The server analyzes the received audio data using a speech recognition device and converts it into text format. This text data is then analyzed using an analysis device to identify potentially fraudulent phrases and contexts, and further evaluates the likelihood of fraud based on the user's emotional state extracted from the audio through sentiment analysis.

[0610] The specific analysis utilizes an emotion engine to determine whether the user is experiencing anxiety, tension, excitement, or other states of mind, and this information is fed back into the fraud detection process. For example, if an elderly person receives a suspicious phone call and is feeling tense, the system will detect this tension and issue an alert earlier than usual.

[0611] Ultimately, if a fraud is deemed highly likely, the server will use alerting mechanisms to quickly notify family members and law enforcement agencies. This notification will be sent via SMS, email, and a dedicated app, enabling a swift response.

[0612] Furthermore, this system utilizes a generative AI model based on past fraud data, enabling it to respond to new fraud methods. An example of a prompt message to the generative AI model is: "Convert the content of suspicious phone calls received by elderly people into text format and detect the possibility of fraud. Also, analyze the emotional state from the voice and incorporate this into the decision to issue a fraud alert." In this way, the system constantly learns the latest information and takes the most appropriate action.
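How the prompt message and the data to be analyzed might be combined before being passed to the generative AI model can be sketched as follows. The layout of the prompt, the field labels, and the name `build_prompt` are assumptions; only the instruction text is taken from the example above, and no call to an actual model is shown.

```python
INSTRUCTION = (
    "Convert the content of suspicious phone calls received by elderly people into text format "
    "and detect the possibility of fraud. Also, analyze the emotional state from the voice and "
    "incorporate this into the decision to issue a fraud alert."
)


def build_prompt(transcript: str, emotion: str, instruction: str = INSTRUCTION) -> str:
    """Join the instruction with the estimated emotion and the call transcript."""
    return "\n".join([
        instruction,
        "",
        f"Estimated emotional state: {emotion}",
        "Call transcript:",
        transcript.strip(),
    ])
```

Keeping the instruction separate from the per-call data allows the same instruction to be reused for every monitored call.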

[0613] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0614] Step 1:

[0615] The device uses a high-performance microphone to acquire the user's voice. Noise reduction is performed to remove ambient noise, resulting in clear audio data. The input to this process is the user's voice, and the output is the noise-reduced audio data. Specifically, the system analyzes and filters noise in real time.

[0616] Step 2:

[0617] The terminal performs preprocessing on the acquired audio data. This preprocessing includes volume normalization and sample data formatting to improve data quality. The input is audio data with noise removed, and the output is preprocessed audio data. Specifically, it performs acoustic signal analysis and data format conversion.

[0618] Step 3:

[0619] The terminal encrypts the pre-processed audio data and sends it to the server. At this stage, encryption technology is used to ensure the data is transmitted securely. The input is pre-processed audio data, and the output is encrypted data. Specifically, the data is protected using an encryption algorithm.

[0620] Step 4:

[0621] The server decrypts the received encrypted audio data and converts it to text using a speech recognition device. This conversion process utilizes AI technology to render the audio data as textual information. The input is the decrypted audio data, and the output is text data. Specifically, it recognizes words and sentences from the audio and converts them into strings.

[0622] Step 5:

[0623] The server analyzes text data using an analysis device to identify potentially fraudulent phrases. This analysis uses a generative AI model to evaluate features associated with fraud. The input is text data, and the output is an evaluation of the likelihood of fraud. Specifically, it involves applying a pre-trained fraud detection algorithm.

[0624] Step 6:

[0625] The server analyzes the user's emotional state using emotion analysis techniques based on the audio data. Here, AI extracts emotional patterns from the audio, providing supplementary information for fraud assessment. The input is audio data (including text), and the output is the emotion analysis result. Specifically, the system estimates emotions from tone and word choice.

[0626] Step 7:

[0627] If a fraudulent activity is deemed highly likely, the server will immediately alert family members and law enforcement agencies using its alarm system. This alert will be sent via SMS or email as needed. The input is the result of fraud assessment and sentiment analysis, and the output is the notification action. Specifically, an automated notification will be sent to emergency contacts.

[0628] (Application Example 2)

[0629] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0630] In modern society, vulnerable individuals, particularly the elderly, are at increased risk of becoming victims of telephone scams. However, there are limited means to detect these scams, which are conducted by voice, in real time and to protect the people targeted. This invention aims to prevent users from becoming victims of fraud by quickly and accurately detecting such fraudulent activities and providing necessary notifications.

[0631] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0632] In this invention, the server includes a sound receiving means for acquiring the user's voice, a sound processing means for pre-processing the voice, and a transmission means for encrypting and transmitting the pre-processed voice information. This makes it possible to protect users from fraudulent telephone scams.

[0633] A "sound receiving device" is a device used to accurately acquire the user's voice.

[0634] "Speech processing means" refers to the techniques and processes used to preprocess acquired speech into a format that is easy to analyze.

[0635] A "transmission means" is a device that provides the function of securely encrypting pre-processed audio information and transmitting it to other systems or servers.

[0636] "Speech recognition means" refers to technologies and devices for converting speech information into text information.

[0637] "Analysis means" refers to processes and devices used to evaluate the possibility of fraudulent activity based on textual information.

[0638] "Emotional analysis methods" refer to technologies and devices used to evaluate a user's psychological state from voice data.

[0639] "Alarm generation means" refers to devices or functions that issue notifications when there is a high probability of fraudulent activity.

[0640] A "communication element" is a component used to transmit alarms and notifications to remote recipients.

[0641] "Machine learning techniques" are algorithms and processes used to evaluate the likelihood of fraudulent activity based on data from past fraudulent activities and to improve the system.

[0642] To realize this invention, the system operates based on the following configuration. First, the terminal acquires the user's voice using a high-performance microphone. The acquired voice is pre-processed by the terminal's voice processing means to reduce noise. This pre-processed voice data is encrypted by the terminal's transmission means and sent to the server.

[0643] The server receives this data and uses speech recognition to convert the speech into text. The converted text is then evaluated by an analysis tool to determine if it contains potentially fraudulent content. This processing can use cloud services built on machine learning, such as the Google Cloud Speech-to-Text API for speech recognition and Amazon Comprehend (AWS) for text analysis. Furthermore, emotion analysis evaluates the user's psychological state from the audio data; if emotions such as anxiety or tension are detected, this result is used to improve the accuracy of the fraud assessment.

[0644] If fraudulent activity is deemed highly likely, the server will use alarm generation mechanisms to issue warnings and notifications. These methods include SMS, email, or a dedicated app, ensuring rapid information transmission through the communication element. For example, if a user receives a suspicious phone call, it is crucial that the system detects the unusual tension and immediately sends a notification to protect the individual.

[0645] An example of a prompt is, "Explain how the AI monitors and detects fraudulent activity in real time, and how it alerts users." Entering this prompt into the AI can facilitate further detailed analysis and improvements.

[0646] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0647] Step 1:

[0648] The device uses a high-performance microphone to capture the user's voice in real time. This input audio is raw, unprocessed data and contains noise and ambient sounds.

[0649] Step 2:

[0650] The terminal uses audio processing equipment to apply noise reduction to the acquired audio. In this step, unnecessary audio information is removed, and clear audio data suitable for analysis is output.

[0651] Step 3:

[0652] The terminal encodes the pre-processed audio data and transmits it to the server in an encrypted form via a transmission method. This output is protected using advanced encryption technology to prevent eavesdropping and data tampering.

[0653] Step 4:

[0654] The server receives encrypted audio data and uses speech recognition to convert the audio into text. In this step, the speech recognition algorithm outputs meaningful text from the input audio data.

[0655] Step 5:

[0656] The server uses analytical tools to analyze text information and assess the likelihood of fraudulent activity. In this step, a machine learning model is applied to output the risk of fraud based on specific keywords or phrases.

[0657] Step 6:

[0658] The server utilizes emotion analysis techniques to evaluate the user's psychological state based on voice data. If anxiety or tension is detected through this analysis, it is used as output to improve the accuracy of fraud detection.

[0659] Step 7:

[0660] If a fraudulent activity is deemed highly likely, the server will generate an alert using its alarm generation system and send notifications to the user and designated emergency contacts. This notification may be delivered via SMS, email, or a dedicated app.
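
As a non-limiting illustration, the way Steps 5 to 7 combine the text-based fraud risk with the emotion analysis result can be sketched as follows. The names `EmotionReading` and `fused_fraud_score`, the 0.0 to 1.0 score ranges, and the weight of 0.3 are assumptions introduced for this sketch and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionReading:
    """Hypothetical output of the emotion analysis, each value in [0.0, 1.0]."""
    anxiety: float
    tension: float


def fused_fraud_score(text_risk: float,
                      emotion: EmotionReading,
                      emotion_weight: float = 0.3) -> float:
    """Blend the text-based risk with the strongest emotion signal.

    The weighted average is an assumed fusion rule; the result is
    clamped to the range [0.0, 1.0].
    """
    if not 0.0 <= emotion_weight <= 1.0:
        raise ValueError("emotion_weight must be between 0.0 and 1.0")
    emotion_signal = max(emotion.anxiety, emotion.tension)
    score = (1.0 - emotion_weight) * text_risk + emotion_weight * emotion_signal
    return min(1.0, max(0.0, score))


calm = fused_fraud_score(0.5, EmotionReading(anxiety=0.0, tension=0.0))
anxious = fused_fraud_score(0.5, EmotionReading(anxiety=0.9, tension=0.4))
```

With the same text risk, the reading that shows anxiety yields the higher fused score, which corresponds to using the emotion result to refine the fraud assessment.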

[0661] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0662] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0663] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0664] [Fourth Embodiment]

[0665] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0666] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0667] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0668] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0669] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0670] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0671] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0672] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0673] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0674] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0675] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0676] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0677] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0678] This invention provides a fraud detection system that monitors user conversations in real time and detects potential fraud. This system mainly consists of a series of devices and methods for collecting, analyzing, and detecting anomalies in voice data.

[0679] The device is equipped with a microphone that continuously collects the user's conversation. The collected audio data is preprocessed, including noise reduction and volume normalization, and converted into a format that is easy to process. This audio data is encrypted and transmitted to the server via a communication device.

[0680] The server converts the received audio data into text using a speech recognition device. The text obtained through speech recognition is then analyzed in detail by an analysis device. The analysis device refers to past fraud data and, when it detects keywords or phrases indicating potential fraud, identifies the conversation as matching a pattern suggestive of fraud.

[0681] When a fraudulent activity is deemed highly likely, the server activates an alarm system and sends a notification to pre-registered family members or law enforcement agencies. This notification is sent via SMS, email, or push notification to a dedicated application. This allows for a swift response to prevent users from becoming victims of fraudulent activity.

[0682] As a concrete example, when an elderly person receives a phone call, the call is automatically monitored by the system. If phrases suggesting fraud, such as "transfer money," are heard during the call, the system detects this and immediately sends a warning notification to the family. The family can then receive this notification and contact the elderly person directly to confirm whether a problem has occurred.
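
As a non-limiting illustration, the phrase detection in this example can be sketched as a case-insensitive search of the transcript. The function name `find_suspicious_phrases` and the phrase list are assumptions made for this sketch; only "transfer money" is taken from the example above, and a real system would draw its phrases from past fraud data.

```python
import re

# Illustrative phrase list (assumption); only "transfer money" comes from the text.
SUSPICIOUS_PHRASES = ("transfer money", "wire the funds", "gift card")


def find_suspicious_phrases(transcript, phrases=SUSPICIOUS_PHRASES):
    """Return the phrases that occur in the transcript.

    The transcript is lowercased and runs of whitespace are collapsed
    so that spacing and capitalization from speech recognition do not
    prevent a match.
    """
    normalized = re.sub(r"\s+", " ", transcript.lower())
    return [phrase for phrase in phrases if phrase in normalized]


hits = find_suspicious_phrases("Please TRANSFER   money to this account today.")
```

A non-empty result would be one possible trigger for the warning notification to the family; plain substring matching is used here only for brevity and does not represent the machine learning model described in this disclosure.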

[0683] Thus, this system utilizes machine learning models and constantly updates them with the latest data to adapt to new fraud methods. This makes it possible to effectively protect the elderly from fraud in real time.

[0684] The following describes the processing flow.

[0685] Step 1:

[0686] The device continuously collects ambient sound from the user's surroundings using a microphone. During this process, the device automatically performs noise reduction to ensure clear audio at an appropriate volume.

[0687] Step 2:

[0688] The terminal converts the collected voice data into a standard communication format and compresses the data as needed. This process reduces the load associated with transmitting data.

[0689] Step 3:

[0690] The device encrypts the voice data and sends it to the server via the internet connection. This ensures data security and privacy.

[0691] Step 4:

[0692] The server decrypts the received encrypted data and sends it to the speech recognition engine. The speech recognition engine converts the speech data into text data.

[0693] Step 5:

[0694] The server passes text data to an analysis device, which uses natural language processing to search for keywords and phrases that may indicate potential fraud. The analysis device uses machine learning models to detect anomalies.

[0695] Step 6:

[0696] The server evaluates the likelihood of fraud based on the analysis results. If signs of fraud are detected, it generates an appropriate alert.

[0697] Step 7:

[0698] The server sends alerts to the device or designated contacts. Notification methods include SMS, email, and notifications via a dedicated app.

[0699] Step 8:

[0700] Users receive notifications and respond according to pre-configured instructions. For example, they may be advised to immediately end a call or consult with a specific contact.
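
As a non-limiting illustration, the "compress as needed" behavior of Step 2 can be sketched as follows. General-purpose zlib compression stands in here for whatever audio codec an actual terminal would use, and the one-byte flag, `pack_audio`, and `unpack_audio` are assumptions introduced for this sketch.

```python
import zlib

# One-byte flags marking how the payload body is encoded (assumed framing).
RAW = b"\x00"
DEFLATE = b"\x01"


def pack_audio(pcm: bytes) -> bytes:
    """Compress the audio bytes only when doing so shrinks the payload."""
    compressed = zlib.compress(pcm, 6)
    if len(compressed) < len(pcm):
        return DEFLATE + compressed
    return RAW + pcm


def unpack_audio(payload: bytes) -> bytes:
    """Reverse pack_audio on the server side, after decryption."""
    flag, body = payload[:1], payload[1:]
    if flag == DEFLATE:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError("unknown payload flag")


silence = bytes(1000)            # highly compressible input
packed = pack_audio(silence)
restored = unpack_audio(packed)
```

In this sketch the payload would be encrypted after packing (Step 3) and unpacked after decryption (Step 4), so the reduction in transmission load applies to the encrypted data as well.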

[0701] (Example 1)

[0702] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0703] In modern society, fraudulent activities are becoming increasingly sophisticated, and there is a particular problem in that the elderly and people unfamiliar with technology are often targeted. Traditional methods make it difficult to detect fraudulent activities or respond immediately, which can result in the damage being exacerbated. In response to this, there is a need to develop a system that uses voice data to detect signs of fraud in real time and issue a rapid warning.

[0704] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0705] In this invention, the server includes speech recognition means for converting audio data into text, analysis means for evaluating the likelihood of fraud, and detection means for identifying keywords or phrases that indicate the likelihood of fraud. This makes it possible to detect signs of fraud from audio in real time and issue an immediate warning before the user becomes involved in fraudulent activity.

[0706] "Voice acquisition means" refers to a device or function for continuously collecting the voice emitted by a user in real time.

[0707] "Preprocessing means" refers to a device or function that performs noise reduction or volume normalization in order to convert the collected audio data into a format that is easy to analyze.

[0708] "Communication means" refers to functions and devices for encrypting processed data and transmitting it securely.

[0709] "Speech recognition means" refers to technologies and devices that convert speech data into text.

[0710] "Analysis means" refers to a device or function for analyzing text data and evaluating the likelihood of fraud.

[0711] "Detection means" refers to functions or devices that identify keywords or phrases within text and identify signs of fraud.

[0712] An "alarm system" is a device or function that issues a warning when it is determined that there is a high probability of fraud.

[0713] "Communication method" refers to the means of transmitting alarms or notifications to designated recipients.

[0714] A "generative artificial intelligence model" is an algorithm that uses machine learning to learn from past data and detect new fraud patterns.

[0715] This invention is for realizing a fraud detection system, and is particularly designed to detect signs of fraudulent activity in real time.

[0716] The terminal is equipped with a standard microphone as a means of voice acquisition to capture the user's natural conversation. It incorporates pre-processing measures that reduce background noise and clarify the voice signal using multiple noise reduction techniques. The voice data is converted to a disturbance-resistant format and then transmitted to the server via a communication method using an encryption protocol such as AES.

[0717] The server converts the audio data into text using publicly known speech recognition software as a speech recognition means. This text data is then analyzed by an analysis means that implements a generative AI model. The model refers to past fraud data and evaluates the probability of fraud, while utilizing detection means to identify keywords or phrases that indicate potential fraud.

[0718] If a transaction is highly likely to be fraudulent, the server will use its alarm system to send alerts via SMS, email, or other communication methods to pre-registered family members or monitoring organizations. The notification will include specific instructions, such as, "A suspicious transaction has been detected. Please contact the user for verification."

[0719] For example, elderly people may receive phone calls containing phrases that suggest fraud, such as "bank account verification" or "transfer request." In such cases, the system immediately detects these words and quickly sends a warning notification to the relevant family members. This allows for early intervention and can prevent potential harm.

[0720] The use of generative AI models improves responsiveness to new fraud techniques and enhances accuracy by constantly referencing the latest fraud database. An example of a prompt message is: "If a suspicious situation is detected, please explain how to respond quickly."

[0721] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0722] Step 1:

[0723] The device uses a voice acquisition method to collect the user's conversation using a microphone.

[0724] Input: User's voice

[0725] Processing: Capture the collected audio as digital data.

[0726] Output: Digital audio data

[0727] Step 2:

[0728] The terminal applies noise reduction technology as a pre-processing step to clarify the audio. In addition, it normalizes the volume to process the data into a stable format.

[0729] Input: Digital audio data

[0730] Processing: Noise reduction, volume normalization

[0731] Output: Preprocessed audio data

[0732] Step 3:

[0733] The terminal encrypts the pre-processed audio data using an encryption algorithm and transmits it to the server via a communication method.

[0734] Input: Preprocessed audio data

[0735] Processing: AES encryption, data transmission

[0736] Output: Encrypted audio data, transmission complete.

[0737] Step 4:

[0738] The server converts the received audio data into text using speech recognition technology. Specific speech recognition software is then used to extract textual information from the audio.

[0739] Input: Encrypted audio data

[0740] Processing: Speech recognition, text conversion

[0741] Output: Text data

[0742] Step 5:

[0743] The server applies analytical tools to the text data and analyzes it with a generative AI model while referring to a database of fraudulent activities. This process identifies keywords that indicate potential fraud.

[0744] Input: Text data

[0745] Processing: Analysis using a generative AI model, detection of malicious words.

[0746] Output: Fraud probability assessment data

[0747] Step 6:

[0748] If the server determines, based on the analysis results, that there is a high probability of fraud, it will use an alerting mechanism to warn the relevant recipient. For example, it might send a detailed notification via SMS.

[0749] Input: Fraud risk assessment data

[0750] Processing: Evaluate alert conditions, send warning.

[0751] Output: Alarm notification sent successfully.

[0752] This series of steps enables the system to detect and notify users of fraudulent activity in real time, preventing them from becoming victims.
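
As a non-limiting illustration, the volume normalization of Step 2 can be sketched as peak normalization. The function name `normalize_peak`, the representation of samples as floats in the range -1.0 to 1.0, and the target peak of 0.9 are assumptions for this sketch; this disclosure does not specify a particular normalization method.

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale samples so that the largest absolute value equals target_peak.

    Silent or empty input is returned unchanged, because no gain can be
    computed when the peak is zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet = [0.1, -0.2, 0.05]
normalized = normalize_peak(quiet)
```

Applying a single gain to the whole buffer preserves the waveform shape, so quiet recordings are brought to a consistent level before speech recognition without distorting the signal.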

[0753] (Application Example 1)

[0754] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0755] There is a need to prevent elderly people from becoming victims of fraud through telephone or face-to-face conversations. However, constant monitoring by a third party is difficult from a privacy and feasibility standpoint. In addition, elderly people often have difficulty recognizing fraudulent activity themselves. A system is needed to address these problems and ensure the safety of the elderly.

[0756] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0757] In this invention, the server includes a device for acquiring the user's voice, a conversion means for converting the voice into data, and a communication means for securely transferring the data. This makes it possible to prevent elderly people from becoming involved in fraudulent activities and to ensure their safety.

[0758] A "device for acquiring user voices" is a device used to effectively collect ambient sounds and acquire target voice data.

[0759] A "conversion means for converting audio to data" is a mechanism for converting acquired audio signals into data in a format that is easy to process.

[0760] "Communication methods for securely transferring data" refer to means that incorporate technologies and protocols for encrypting and securely transmitting converted data.

[0761] "Recognition means for converting data into text information" refers to a device or mechanism that incorporates speech recognition technology to convert speech or speech data into text information.

[0762] "An evaluation method that analyzes text information to assess the likelihood of fraudulent activity" refers to a technology that uses text information to determine the likelihood of fraud using machine learning or rule-based approaches.

[0763] "Alert systems that issue warnings when fraudulent activity is suspected" are means of alerting users or guardians when fraud is suspected, and include devices or functions that emit visual and auditory warnings.

[0764] "A display means worn by the user to visually display a warning" refers to a device worn by the user to visually display a message through a digital display.

[0765] "External communication means for sending notifications to family or caregivers" refers to a communication device that has the function of sending notifications to family or caregivers in remote locations, triggered by an alarm.

[0766] To implement this invention, it is necessary to combine a smart device worn by the user with a system that includes a cloud server for processing data. The purpose of this system is to ensure user safety by acquiring voice data in real time and determining the possibility of fraudulent activity.

[0767] The server receives voice data transmitted from the terminal and converts it into text information using a speech recognition engine. A highly accurate speech recognition service such as Google Speech-to-Text is recommended for speech recognition. The converted text information is then analyzed to assess the risk of fraudulent activity. This process can utilize machine learning models to perform pattern recognition by referencing past fraud data. If the assessment determines a high risk, an alert is issued, and a notification is sent to the designated contact. This notification is sent via SMS or push notification to enable a quick response.

[0768] The device is equipped with a highly sensitive microphone that can clearly capture conversations around the user. The device temporarily records this audio data, performs noise reduction and volume normalization processing, and then securely transmits the data to the server. To enhance communication security during data transfer, it is desirable to utilize encryption technology.

[0769] If a user is wearing a smart device, a warning will be displayed on the device's screen when a risk is detected. This visual alert allows the user to immediately recognize the possibility of fraud. For example, if an elderly person is approached while doing their daily shopping and becomes a target of fraud, the system can detect the risk and issue a warning, preventing them from becoming a victim.

[0770] An example of a prompt using a generative AI model is: "Design an AI that protects seniors from fraud through real-time voice analysis. Include a process that uses a microphone and smart display to detect conversations suggestive of fraud, display a warning, and notify family members."

[0771] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0772] Step 1:

[0773] The device acquires the user's voice through a microphone. The input audio signal is captured as digital data, and noise reduction and volume normalization are performed. This process results in the output of clear and easily processable audio data.

[0774] Step 2:

[0775] The terminal encrypts the pre-processed audio data and transmits it to the server via a communication method. Algorithms such as AES are used for encryption to ensure data security. This allows the data to be transferred to the server while protected from unauthorized access.

[0776] Step 3:

[0777] The server inputs the received audio data into a speech recognition engine and converts it into text information. Using speech recognition services such as Google Speech-to-Text, the system converts the audio signal into string data, obtaining human-readable text information.

[0778] Step 4:

[0779] The server uses a machine learning model to analyze the converted text information. It refers to a database of past fraud cases and evaluates whether the text data contains patterns that suggest fraud. If it determines that there is a possibility of fraud, it outputs the corresponding risk score.

[0780] Step 5:

[0781] The server issues a warning through an alarm system if the risk score exceeds a certain threshold. Specifically, it displays a warning message on the display of a smart device worn by the user via an interface. It also sends a notification to a designated contact using an external communication method. The notification is sent quickly as an SMS, email, or push notification.

[0782] Step 6:

[0783] Users can recognize the risk of fraud by checking the warning displayed on their smart device's screen. In response, users can take appropriate measures, such as contacting family members.
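
As a non-limiting illustration, the threshold check and notification of Step 5 can be sketched as follows. The function `dispatch_alert`, the message wording, and the representation of each notification channel as a callable are assumptions for this sketch; actual delivery by SMS, email, or push notification would sit behind those callables.

```python
from typing import Callable, Iterable

# A channel is any callable that delivers a message (assumed abstraction).
Channel = Callable[[str], None]


def dispatch_alert(risk_score: float,
                   threshold: float,
                   wearable_display: Channel,
                   contact_channels: Iterable[Channel]) -> bool:
    """Show a warning and notify contacts when the score exceeds the threshold.

    Returns True when an alert was issued, False otherwise.
    """
    if risk_score <= threshold:
        return False
    message = f"Warning: possible fraud (risk score {risk_score:.2f})."
    wearable_display(message)
    for send in contact_channels:
        try:
            send(message)
        except Exception:
            # One failing channel should not block the remaining ones.
            continue
    return True


shown, sent = [], []
issued = dispatch_alert(0.9, 0.7, shown.append, [sent.append, sent.append])
```

The wearable display is updated first so that the user sees the warning even if an external channel fails.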

[0784] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0785] This invention provides a fraud detection system that monitors users' voices in real time, detects potential fraud, and analyzes the users' emotional states. This system consists of multiple devices and technologies for collecting, analyzing, recognizing emotions, and detecting anomalies in voice data.

[0786] The device has a high-performance microphone to capture the user's natural conversation, and the acquired audio data is clarified through noise reduction. After preprocessing, this collected audio data is encrypted and transmitted to the server using a communication device.

[0787] The server converts the received audio data into text data using a speech recognition device. Next, an analysis device analyzes this text data to identify potentially fraudulent phrases and contexts. In addition, an emotion engine analyzes the user's emotional state from the audio data to improve the accuracy of the fraud assessment.

[0788] The emotion engine recognizes in real time when a user is experiencing emotional states such as anxiety, tension, or excitement, and feeds this information back into the analysis system's evaluation, helping to more accurately determine the likelihood of fraud. For example, if a user is clearly experiencing anxiety or tension, this information may be added to the trigger conditions for fraud alerts.

[0789] When a potential scam is detected, the server activates an alarm system and quickly sends a warning to family members or law enforcement agencies. This notification is typically delivered via SMS, email, or a dedicated app. For example, if an elderly person receives a suspicious phone call and becomes anxious, the system's emotion engine can detect this anxiety and send a scam alert earlier than usual.
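
As a non-limiting illustration, one way to send an alert "earlier than usual" is to lower the alert threshold when the emotion engine reports anxiety. The functions `effective_threshold` and `should_alert`, the 0.0 to 1.0 anxiety scale, and the maximum reduction of 0.2 are assumptions for this sketch and are not specified in this disclosure.

```python
def effective_threshold(base_threshold: float,
                        anxiety: float,
                        max_reduction: float = 0.2) -> float:
    """Lower the alert threshold in proportion to the detected anxiety."""
    anxiety = min(1.0, max(0.0, anxiety))  # clamp to the assumed scale
    return max(0.0, base_threshold - max_reduction * anxiety)


def should_alert(risk_score: float, base_threshold: float, anxiety: float) -> bool:
    """Decide whether to alert, using the emotion-adjusted threshold."""
    return risk_score >= effective_threshold(base_threshold, anxiety)


calm_alert = should_alert(0.7, base_threshold=0.8, anxiety=0.0)
anxious_alert = should_alert(0.7, base_threshold=0.8, anxiety=1.0)
```

With the same risk score, the alert is withheld for a calm user but issued for an anxious one, which matches adding the emotional state to the trigger conditions.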

[0790] This system can adapt to the latest fraud techniques by continuously updating its machine learning model based on past fraud data, and functions as a powerful tool to protect the elderly from fraud. This makes it possible to prevent damage from fraudulent activities.

[0791] The following describes the processing flow.

[0792] Step 1:

[0793] The device collects the user's conversation in real time via a microphone. During this process, noise reduction technology is applied to improve the quality of the audio data.

[0794] Step 2:

[0795] The device converts the collected audio data into a format that is easy to process. This process includes compressing and standardizing the audio data.

[0796] Step 3:

[0797] The terminal encrypts the processed voice data for secure transmission and sends it to the server via a communication device.

[0798] Step 4:

[0799] The server converts the received audio data into text data using a speech recognition device. The speech-to-text conversion process is performed using a speech recognition algorithm.

[0800] Step 5:

[0801] The server analyzes text data using an analysis device to detect keywords and phrases indicating fraudulent activity. A machine learning model using past fraud data supports this process.

[0802] Step 6:

[0803] The server uses an emotion engine to analyze the user's emotions from the voice data. If a specific emotional state is detected, the server integrates this into the fraud possibility assessment.

[0804] Step 7:

[0805] The server combines the analysis results and sentiment analysis results and issues an alert if it determines that there is a high probability of fraud. This alert is notified to the user's family or law enforcement agencies via a communication device.

[0806] Step 8:

[0807] Users can receive notifications from family members and law enforcement agencies, prompting them to take further action. Family members can contact the user, check on the situation, and intervene as needed.

[0808] (Example 2)

[0809] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0810] In modern society, fraud and scams are becoming more sophisticated, and many people are at risk of becoming victims. In particular, some scammers, attuned to emotional nuances, rely not only on wording and tone but also on appeals to people's emotions to carry out their fraudulent activities. However, conventional fraud detection systems cannot adequately recognize and analyze these emotion-related elements, so fraud may be overlooked.

[0811] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0812] In this invention, the server includes means for acquiring the user's voice, conversion means for converting the voice into a format that is easy to process, and emotion analysis means for analyzing the emotional state from the acoustic data and improving the accuracy of evaluating the possibility of fraud. This makes it possible to recognize suspicious emotional states associated with fraudulent activity in real time and to track the possibility of fraud more accurately.

[0813] "Means for acquiring user speech" refers to devices and technologies for accurately and effectively collecting the voices emitted by users.

[0814] "Conversion means for converting to an easily processable format" refers to devices and technologies that prepare collected audio data into an optimal data format so that subsequent analysis and interpretation can be easily performed.

[0815] "Transmission means" refers to technologies and devices for securely and quickly transmitting data to other devices or servers.

[0816] "Acoustic analysis means" refers to technologies and devices that analyze audio data and convert its content into textual information.

[0817] "Analysis means" refers to technologies and devices that analyze textual information to identify phrases and contexts that can detect fraudulent activity.

[0818] "Emotional analysis means" refers to analytical techniques and devices that analyze the emotional state contained in speech and use the result to improve the accuracy of the fraud assessment.

[0819] "Alarming measures" refer to devices or technologies that immediately send an alert to relevant parties when a potential fraud is detected.

[0820] A "generative AI model" refers to an algorithm or technology that utilizes artificial intelligence technology to generate new insights and make decisions based on information learned from past data.

[0821] This invention provides a specific embodiment of a system that monitors users' voices in real time and detects potential fraudulent activity or deception. This system is realized by combining the following technologies and means.

[0822] First, the device is equipped with a high-performance microphone, which allows for accurate capture of the user's voice. This microphone utilizes noise reduction technology to eliminate ambient noise, resulting in clearer voice data. This ensures reliable collection of user voice data even in noisy environments such as cafes.

[0823] The voice data collected by the device is pre-processed and then transmitted to the server in an encrypted state via a communication method. This encrypted communication ensures the privacy and security of the data.

[0824] The server analyzes the received audio data using a speech recognition device and converts it into text format. This text data is then analyzed using an analysis device to identify potentially fraudulent phrases and contexts, and further evaluates the likelihood of fraud based on the user's emotional state extracted from the audio through sentiment analysis.

[0825] The specific analysis utilizes an emotion engine to determine whether the user is experiencing anxiety, tension, excitement, or other states of mind, and this information is fed back into the fraud detection process. For example, if an elderly person receives a suspicious phone call and is feeling tense, the system will detect this tension and issue an alert earlier than usual.
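
The idea of issuing an alert "earlier than usual" when the user is tense can be illustrated as a threshold that drops as stress rises. The following is a minimal sketch only: the names `EmotionState`, `alert_threshold`, and `should_alert`, and the numeric constants, are assumptions introduced for illustration and do not appear in the specification.

```python
from dataclasses import dataclass

# Hypothetical values chosen only for illustration; the source gives no numbers.
BASE_THRESHOLD = 0.8
TENSION_DISCOUNT = 0.3


@dataclass
class EmotionState:
    anxiety: float  # 0.0 (none) to 1.0 (strong)
    tension: float  # 0.0 (none) to 1.0 (strong)


def alert_threshold(emotion: EmotionState) -> float:
    """Lower the alert threshold as anxiety or tension rises."""
    stress = max(emotion.anxiety, emotion.tension)
    return BASE_THRESHOLD - TENSION_DISCOUNT * stress


def should_alert(fraud_score: float, emotion: EmotionState) -> bool:
    """Alert when the text-based fraud score reaches the adjusted threshold."""
    return fraud_score >= alert_threshold(emotion)
```

With these assumed constants, a fraud score of 0.6 does not trigger an alert for a calm user, but does once tension is high, because the threshold has moved from 0.8 toward 0.5.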

[0826] Ultimately, if a fraud is deemed highly likely, the server will use alerting mechanisms to quickly notify family members and law enforcement agencies. This notification will be sent via SMS, email, and a dedicated app, enabling a swift response.

[0827] Furthermore, this system utilizes a generative AI model based on past fraud data, enabling it to respond to new fraud methods. An example of a prompt message to the generative AI model is: "Convert the content of suspicious phone calls received by elderly people into text format and detect the possibility of fraud. Also, analyze the emotional state from the voice and incorporate this into the decision to issue a fraud alert." In this way, the system constantly learns the latest information and takes the most appropriate action.
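
As a rough illustration of how the example prompt might be combined with call data before it is passed to a generative AI model, the sketch below assembles a single prompt string. The function name `build_prompt` and the section labels are hypothetical; the specification does not define a prompt layout, and no particular model API is assumed.

```python
# Instruction text quoted from the example prompt in the description.
INSTRUCTION = (
    "Convert the content of suspicious phone calls received by elderly people "
    "into text format and detect the possibility of fraud. Also, analyze the "
    "emotional state from the voice and incorporate this into the decision to "
    "issue a fraud alert."
)


def build_prompt(transcript: str, emotion_summary: str) -> str:
    """Join the instruction with the inference data (hypothetical layout)."""
    sections = [
        INSTRUCTION,
        f"Transcript:\n{transcript}",
        f"Emotional state:\n{emotion_summary}",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt("Hello, this is your bank.", "tension: high"))
```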

[0828] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0829] Step 1:

[0830] The device uses a high-performance microphone to acquire the user's voice. Noise reduction is performed to remove ambient noise, resulting in clear audio data. The input to this process is the user's voice, and the output is the noise-reduced audio data. Specifically, the system analyzes and filters noise in real time.

[0831] Step 2:

[0832] The terminal performs preprocessing on the acquired audio data. This preprocessing includes volume normalization and sample data formatting to improve data quality. The input is audio data with noise removed, and the output is preprocessed audio data. Specifically, it performs acoustic signal analysis and data format conversion.
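
Volume normalization can take several forms; one simple possibility is peak normalization of 16-bit PCM samples, sketched below. The function name `normalize_peak` and the default target level are assumptions for illustration, and the specification does not state which normalization method is used.

```python
from array import array


def normalize_peak(samples: array, target_peak: int = 30000) -> array:
    """Scale signed 16-bit samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Silence (or empty input): nothing to scale.
        return array("h", samples)
    gain = target_peak / peak
    return array("h", (int(round(s * gain)) for s in samples))


if __name__ == "__main__":
    quiet = array("h", [100, -200, 50])
    print(list(normalize_peak(quiet)))
```

Keeping the target below the 16-bit maximum of 32767 leaves a little headroom so that rounding cannot overflow the sample type.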

[0833] Step 3:

[0834] The terminal encrypts the pre-processed audio data and sends it to the server. At this stage, encryption technology is used to ensure the data is transmitted securely. The input is pre-processed audio data, and the output is encrypted data. Specifically, the data is protected using an encryption algorithm.

[0835] Step 4:

[0836] The server decrypts the received encrypted audio data and converts it to text using a speech recognition device. This conversion uses AI technology to render the audio data as text. The input is the decrypted audio data, and the output is text data. Specifically, it recognizes words and sentences from the audio and converts them into strings.

[0837] Step 5:

[0838] The server analyzes text data using an analysis device to identify potentially fraudulent phrases. This analysis uses a generative AI model to evaluate features associated with fraud. The input is text data, and the output is an evaluation of the likelihood of fraud. Specifically, it involves applying a pre-trained fraud detection algorithm.
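
The specification relies on a generative AI model or pre-trained algorithm for this step. Purely to make the input and output concrete, the sketch below substitutes a much simpler weighted phrase match. The phrase list, the weights, and the function names are all hypothetical and are not taken from the source.

```python
# Hypothetical phrases and weights, for illustration only.
SUSPICIOUS_PHRASES = {
    "bank transfer": 0.4,
    "gift card": 0.5,
    "do not tell anyone": 0.6,
    "your account has been frozen": 0.5,
}


def matched_phrases(text: str) -> list:
    """Return the suspicious phrases that occur in the transcript."""
    lowered = text.lower()
    return [phrase for phrase in SUSPICIOUS_PHRASES if phrase in lowered]


def fraud_score(text: str) -> float:
    """Sum the weights of matched phrases, capped at 1.0."""
    total = sum(SUSPICIOUS_PHRASES[p] for p in matched_phrases(text))
    return min(float(total), 1.0)
```

A trained model would replace `fraud_score`, but the interface stays the same: text in, a likelihood estimate out.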

[0839] Step 6:

[0840] The server analyzes the user's emotional state using emotion analysis techniques based on the audio data. Here, AI extracts emotional patterns from the audio, providing supplementary information for fraud assessment. The input is audio data (including text), and the output is the emotion analysis result. Specifically, the system estimates emotions from tone and word choice.

[0841] Step 7:

[0842] If a fraudulent activity is deemed highly likely, the server will immediately alert family members and law enforcement agencies using its alarm system. This alert will be sent via SMS or email as needed. The input is the result of fraud assessment and sentiment analysis, and the output is the notification action. Specifically, an automated notification will be sent to emergency contacts.
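
The notification action in Step 7 could be organized as a small dispatcher that routes one message to each emergency contact over its channel. In the sketch below the sender functions are in-memory stand-ins; the names `notify_contacts` and `Sender` are assumptions, and a real deployment would plug in actual SMS or email gateways, which the specification does not name.

```python
from typing import Callable, Dict, List, Tuple

# A sender takes (recipient, message); real gateways are out of scope here.
Sender = Callable[[str, str], None]


def notify_contacts(
    contacts: List[Tuple[str, str]],
    senders: Dict[str, Sender],
    message: str,
) -> List[str]:
    """Send the message to each (channel, recipient) pair with a known sender."""
    delivered = []
    for channel, recipient in contacts:
        sender = senders.get(channel)
        if sender is None:
            continue  # No sender configured for this channel; skip it.
        sender(recipient, message)
        delivered.append(f"{channel}:{recipient}")
    return delivered


if __name__ == "__main__":
    outbox = []
    senders = {
        "sms": lambda to, msg: outbox.append(("sms", to, msg)),
        "email": lambda to, msg: outbox.append(("email", to, msg)),
    }
    contacts = [("sms", "family-member"), ("email", "supporter@example.com")]
    print(notify_contacts(contacts, senders, "Possible fraud call detected."))
```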

[0843] (Application Example 2)

[0844] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0845] In modern society, vulnerable individuals, particularly the elderly, are at increased risk of falling victim to telephone scams. However, there are few means of detecting these voice-based scams in real time and preventing harm. This invention aims to protect potential victims by quickly and accurately detecting such fraudulent activity and issuing the necessary notifications.

[0846] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0847] In this invention, the server includes a sound receiving means for acquiring the user's voice, a sound processing means for pre-processing the voice, and a transmission means for encrypting and transmitting the pre-processed voice information. This makes it possible to protect users from fraudulent telephone scams.

[0848] "Sound receiving means" refers to a device used to accurately acquire the user's voice.

[0849] "Speech processing means" refers to the techniques and processes used to preprocess acquired speech into a format that is easy to analyze.

[0850] A "transmission means" is a device that provides the function of securely encrypting pre-processed audio information and transmitting it to other systems or servers.

[0851] "Speech recognition means" refers to technologies and devices for converting speech information into text information.

[0852] "Analysis means" refers to processes and devices used to evaluate the possibility of fraudulent activity based on textual information.

[0853] "Emotion analysis means" refers to technologies and devices used to evaluate a user's psychological state from voice data.

[0854] "Alarm generation means" refers to devices or functions that issue notifications when there is a high probability of fraudulent activity.

[0855] A "communication element" is a component used to transmit alarms and notifications to remote recipients.

[0856] "Machine learning techniques" are algorithms and processes used to evaluate the likelihood of fraudulent activity based on data from past fraudulent activities and to improve the system.

[0857] To realize this invention, the system operates based on the following configuration. First, the terminal acquires the user's voice using a high-performance microphone. The acquired voice is pre-processed by the terminal's voice processing means to reduce noise. This pre-processed voice data is encrypted by the terminal's transmission means and sent to the server.

[0858] The server receives this data and uses speech recognition to convert the speech into text. The converted text is then evaluated by an analysis tool to determine whether it contains potentially fraudulent content. The speech recognition and text analysis may use machine learning services such as the Google Cloud Speech-to-Text API and AWS Comprehend. Furthermore, emotion analysis evaluates the user's psychological state from the audio data; if emotions such as anxiety or tension are detected, this result is used to improve the accuracy of the fraud assessment.

[0859] If fraudulent activity is deemed highly likely, the server will use alarm generation mechanisms to issue warnings and notifications. These methods include SMS, email, or a dedicated app, ensuring rapid information transmission through the communication element. For example, if a user receives a suspicious phone call, it is crucial that the system detects the unusual tension and immediately sends a notification to protect the individual.

[0860] An example of a prompt is, "Explain how the AI monitors and detects fraudulent activity in real time, and how it alerts users." Entering this prompt into the AI can facilitate further detailed analysis and improvements.

[0861] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0862] Step 1:

[0863] The device uses a high-performance microphone to capture the user's voice in real time. This input audio is raw, unprocessed data and contains noise and ambient sounds.

[0864] Step 2:

[0865] The terminal uses audio processing equipment to apply noise reduction to the acquired audio. In this step, unnecessary audio information is removed, and clear audio data suitable for analysis is output.

[0866] Step 3:

[0867] The terminal encodes the pre-processed audio data and transmits it to the server in an encrypted form via a transmission method. This output is protected using advanced encryption technology to prevent eavesdropping and data tampering.
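
The specification does not name an encryption algorithm. As a partial illustration of the tamper-protection aspect only, the sketch below appends an HMAC-SHA256 tag to each audio packet and verifies it on receipt. It does not provide confidentiality; protection against eavesdropping would additionally require a cipher, which is omitted here. The function names `seal` and `open_sealed` and the shared-key arrangement are assumptions.

```python
import hashlib
import hmac

TAG_LEN = 32  # Size of a SHA-256 digest in bytes.


def seal(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def open_sealed(key: bytes, packet: bytes) -> bytes:
    """Verify the tag and return the payload, or raise ValueError."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short to contain a tag")
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("audio packet failed integrity check")
    return payload
```

The constant-time comparison via `hmac.compare_digest` avoids leaking how many tag bytes matched.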

[0868] Step 4:

[0869] The server receives encrypted audio data and uses speech recognition to convert the audio into text. In this step, the speech recognition algorithm outputs meaningful text from the input audio data.

[0870] Step 5:

[0871] The server uses analytical tools to analyze text information and assess the likelihood of fraudulent activity. In this step, a machine learning model is applied to output the risk of fraud based on specific keywords or phrases.

[0872] Step 6:

[0873] The server utilizes emotion analysis techniques to evaluate the user's psychological state based on voice data. If anxiety or tension is detected through this analysis, it is used as output to improve the accuracy of fraud detection.

[0874] Step 7:

[0875] If a fraudulent activity is deemed highly likely, the server will generate an alert using its alarm generation system and send notifications to the user and designated emergency contacts. This notification may be delivered via SMS, email, or a dedicated app.
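
The server-side portion of this flow (Steps 4 to 7) can be viewed as a chain of interchangeable stages. The sketch below wires such stages together without committing to any particular recognizer or model: every stage is passed in as a function. The names `Assessment` and `assess_call`, and the stand-in stages in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assessment:
    transcript: str
    fraud_risk: float
    stress: float
    alert: bool


def assess_call(
    audio: bytes,
    recognize: Callable[[bytes], str],
    score_text: Callable[[str], float],
    score_stress: Callable[[bytes], float],
    decide: Callable[[float, float], bool],
) -> Assessment:
    """Run Steps 4 to 7 on already-decrypted audio using injected stages."""
    transcript = recognize(audio)         # Step 4: speech to text
    fraud_risk = score_text(transcript)   # Step 5: text-based fraud risk
    stress = score_stress(audio)          # Step 6: emotion analysis
    alert = decide(fraud_risk, stress)    # Step 7: alert decision
    return Assessment(transcript, fraud_risk, stress, alert)


if __name__ == "__main__":
    result = assess_call(
        b"please make a transfer today",
        recognize=lambda a: a.decode("utf-8"),  # stand-in recognizer
        score_text=lambda t: 0.9 if "transfer" in t else 0.1,
        score_stress=lambda a: 0.5,
        decide=lambda risk, stress: risk >= 0.7,
    )
    print(result)
```

Injecting the stages keeps the flow testable and lets a deployment swap in a different recognizer or model without changing the control flow.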

[0876] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0877] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0878] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0879] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0880] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
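
One way to picture this layout is as polar coordinates: the radius indicates how far from the center an emotion lies, and the angle indicates the side of the map. The sketch below is an assumption-laden illustration only. The class `MappedEmotion`, the helper `distance`, and the example positions are all hypothetical and are not taken from Figure 9.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedEmotion:
    name: str
    radius: float     # 0 = center (primitive); larger = further outward
    angle_deg: float  # 0 = right, 90 = up, 180 = left, 270 = down

    def xy(self) -> tuple:
        rad = math.radians(self.angle_deg)
        return (self.radius * math.cos(rad), self.radius * math.sin(rad))

    def side(self) -> str:
        """Left half: brain-reaction emotions; right half: situational."""
        return "situation" if self.xy()[0] >= 0 else "reaction"

    def is_pleasant(self) -> bool:
        """Upper half is pleasure; lower half is displeasure."""
        return self.xy()[1] > 0


def distance(a: MappedEmotion, b: MappedEmotion) -> float:
    """Euclidean distance on the map; small values mean likely co-occurrence."""
    (ax, ay), (bx, by) = a.xy(), b.xy()
    return math.hypot(ax - bx, ay - by)


# Hypothetical placements, for illustration only.
REASSURED = MappedEmotion("reassured", 2.0, 10.0)
ANXIOUS = MappedEmotion("anxious", 2.0, 350.0)
DESIRE = MappedEmotion("desire", 2.0, 170.0)
```

Under these assumed positions, "reassured" and "anxious" sit close together on the right side, on either side of the horizontal axis, while "desire" lies far away on the left.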

[0881] These emotions are distributed at the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0882] The inner part of the emotion map 400 represents what is in the mind, while the outer part represents behavior. Therefore, the further outward an emotion lies on the emotion map 400, the more visible it becomes (the more it manifests in behavior).

[0883] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
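
The relationship between balance and pleasure described above could be expressed numerically in many ways; the specification gives no formula. The sketch below shows one possible mapping, in which the average normalized deviation from the ideal is turned into a value between -1 (discomfort) and 1 (pleasure). The function name `pleasure_level`, the tolerance parameter, and the cap of 2.0 are assumptions.

```python
def pleasure_level(readings: dict, ideals: dict, tolerances: dict) -> float:
    """Return a value in [-1, 1]: 1 at the ideal, falling as balances drift.

    Each deviation is divided by its tolerance and capped at 2.0, so a
    reading one tolerance away from the ideal contributes a neutral 0.
    """
    deviations = [
        min(abs(readings[key] - ideals[key]) / tolerances[key], 2.0)
        for key in ideals
    ]
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - mean_deviation


if __name__ == "__main__":
    ideals = {"battery": 1.0, "tilt_deg": 0.0}
    tolerances = {"battery": 0.4, "tilt_deg": 10.0}
    print(pleasure_level({"battery": 0.9, "tilt_deg": 2.0}, ideals, tolerances))
```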

[0884] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0885] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
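
Once emotion values are available, determining the user's emotion can be as simple as ranking them. The sketch below assumes the network output is a mapping from emotion name to value. The function names and the example values are hypothetical, and the neural network itself is not modeled.

```python
def dominant_emotion(emotion_values: dict) -> str:
    """Return the emotion with the highest value."""
    return max(emotion_values, key=emotion_values.get)


def top_emotions(emotion_values: dict, count: int) -> list:
    """Return the `count` highest-valued emotions, strongest first."""
    ranked = sorted(emotion_values, key=emotion_values.get, reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    # Hypothetical values; neighboring emotions receive similar scores.
    values = {"reassured": 0.82, "calm": 0.80, "confident": 0.78, "anxious": 0.10}
    print(dominant_emotion(values), top_emotions(values, 3))
```

Because neighboring emotions are trained to have similar values, inspecting the top few entries rather than only the maximum gives a more stable picture of the user's state.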

[0886] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0887] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0888] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0889] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0890] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0891] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0892] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0893] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0894] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0895] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0896] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0897] The following is further disclosed regarding the embodiments described above.

[0898] (Claim 1)

[0899] A device for collecting user voices,

[0900] A processing device that converts the aforementioned audio into a format that is easy to process,

[0901] A communication device for securely transmitting the processed audio data,

[0902] A speech recognition device that converts the aforementioned speech data into text,

[0903] An analytical device that analyzes the aforementioned text and evaluates the possibility of fraud,

[0904] An alarm system that sounds an alarm when it is determined that there is a high probability of fraud,

[0905] A fraud detection system that includes the above.

[0906] (Claim 2)

[0907] The fraud detection system according to claim 1,

[0908] wherein the alarm device is equipped with communication means for sending notifications to the user's family or law enforcement agencies.

[0909] (Claim 3)

[0910] The fraud detection system according to claim 1,

[0911] wherein the analysis device uses a machine learning model to evaluate the likelihood of fraud based on past fraudulent activity data.

[0912] "Example 1"

[0913] (Claim 1)

[0914] A means for acquiring voice data to collect user voices,

[0915] Preprocessing means for converting the aforementioned audio into a format that is easy to process,

[0916] A communication means for encrypting and transmitting the audio data after the aforementioned processing,

[0917] A speech recognition means for converting the aforementioned audio data into text,

[0918] An analytical means for analyzing the aforementioned text and evaluating the possibility of fraud,

[0919] A detection means for identifying keywords and phrases that indicate potential fraud,

[0920] An alarm system that issues a warning when it is determined that there is a high probability of fraud,

[0921] A system that includes this.

[0922] (Claim 2)

[0923] The system according to claim 1, wherein the alarm means comprises a communication method for transmitting a notification to a person related to the user or a monitoring organization.

[0924] (Claim 3)

[0925] The system according to claim 1, wherein the analysis means uses a generative artificial intelligence model to evaluate the likelihood of fraud based on past fraudulent activity data.

[0926] "Application Example 1"

[0927] (Claim 1)

[0928] A device for acquiring the user's voice,

[0929] A conversion means for converting the aforementioned audio into data,

[0930] A communication means for securely transferring the aforementioned data,

[0931] A recognition means for converting the aforementioned data into text information,

[0932] An evaluation means for analyzing the aforementioned text information and evaluating the possibility of fraudulent activity,

[0933] An alarm system that issues a warning when it is determined that fraudulent activity may be occurring,

[0934] A display means that is attached to the user and visually displays a warning,

[0935] External means of communication for sending notifications to family or supporters,

[0936] A system that includes this.

[0937] (Claim 2)

[0938] The system according to claim 1, wherein the display means operates based on a smart device worn by the user.

[0939] (Claim 3)

[0940] The system according to claim 1, wherein the evaluation means uses a learning algorithm to evaluate the likelihood of fraud based on a historical database.

[0941] "Example 2 of combining an emotion engine"

[0942] (Claim 1)

[0943] Means for acquiring the user's voice,

[0944] A conversion means for converting the aforementioned speech into a format that is easy to process,

[0945] A transmission means for securely transmitting the converted acoustic data,

[0946] An acoustic analysis means for converting the aforementioned acoustic data into textual information,

[0947] An analysis means for analyzing the aforementioned textual information and evaluating the possibility of fraud,

[0948] An emotion analysis means for analyzing emotional states from the aforementioned acoustic data and improving the accuracy of evaluating the possibility of fraud,

[0949] An alarm system that issues an alert when it is determined that there is a high probability of fraud,

[0950] A system that includes this.

[0951] (Claim 2)

[0952] The system according to claim 1, wherein the alarm means comprises communication means for transmitting a notification to a user's related party or a law enforcement agency.

[0953] (Claim 3)

[0954] The system according to claim 1, wherein the analysis means uses a generative AI model to evaluate the possibility of fraud based on past fraudulent activity data.

[0955] "Application example 2 when combining with an emotional engine"

[0956] (Claim 1)

[0957] A means for receiving the user's voice,

[0958] Audio processing means for preprocessing the aforementioned audio,

[0959] A transmission means for encrypting and transmitting the pre-processed audio information,

[0960] A speech recognition means that converts the aforementioned speech information into text information,

[0961] An analytical means for analyzing the aforementioned textual information and evaluating the possibility of fraudulent activity,

[0962] An emotion analysis means for evaluating the user's psychological state from the aforementioned audio,

[0963] An alarm generating means that issues a notification when the analysis means and the sentiment analysis means determine that there is a high probability of fraudulent activity,

[0964] A system that includes this.

[0965] (Claim 2)

[0966] The system according to claim 1, wherein the alarm generating means includes a communication element for transmitting information to the user's relatives or a public institution.

[0967] (Claim 3)

[0968] The system according to claim 1, wherein the analysis means uses a machine learning method to evaluate the likelihood of fraudulent activity based on information regarding past fraudulent activity.

[Explanation of Symbols]

[0969]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising:
a device for acquiring the user's voice;
a conversion means for converting the voice into data;
a communication means for securely transferring the data;
a recognition means for converting the data into text information;
an evaluation means for analyzing the text information and evaluating the possibility of fraudulent activity;
an alarm means for issuing a warning when it is determined that fraudulent activity may be occurring;
a display means that is worn by the user and visually displays the warning; and
an external communication means for sending notifications to family members or supporters.

2. The system according to claim 1, wherein the display means operates based on a smart device worn by the user.

3. The system according to claim 1, wherein the evaluation means uses a learning algorithm to evaluate the likelihood of fraud based on a historical database.