system
The system addresses manual input errors and delays by using OCR and machine learning to automate application form data entry, ensuring accuracy and efficiency with user-tailored feedback.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Conventional manual application form information input processes are prone to errors and time delays, leading to decreased work efficiency and increased labor costs.
A system that utilizes OCR to convert scanned application forms into digital text, extracts necessary information using machine learning and natural language processing, formats it for system registration, verifies data integrity, and automatically registers it, with optional emotion engine feedback.
Significantly reduces human error and improves operational efficiency by enabling accurate, rapid, and automated information registration with user-friendly feedback.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In the conventional manual application form information input process, input errors and time delays are likely to occur, which cause a decrease in work efficiency. Furthermore, since it leads to an increase in labor costs, it is also a problem from the perspective of cost reduction. The purpose of this invention is to solve these problems and improve work efficiency and accuracy by automating the process of accurately and quickly registering the information of the application form in the system.
Means for Solving the Problems
[0005] To achieve this objective, the present invention first acquires image data of a scanned application form and performs character recognition processing using OCR based on that image. Next, it extracts the necessary information from the recognized text data and formats it into the format required by the system. Furthermore, it provides a mechanism to check the validity of the formatted information and, if there are no problems, automatically register it in the system. By constructing a series of automated processes, including notifying the user of any deficiencies, efficient and accurate information registration is achieved.
[0006] "Image data" refers to digital data generated when paper documents such as application forms are scanned.
[0007] "Character recognition processing" is the process of analyzing character information within image data and converting it into corresponding text data.
[0008] "Text data" refers to data in which character information extracted through character recognition processing is represented in digital format.
[0009] "Means of extracting information" refers to methods and techniques for identifying and extracting necessary data points from text data.
[0010] "Means of formatting to conform to a format" refers to methods of converting or adjusting extracted information into a specific data format.
[0011] "Means of validity verification" refers to methods for confirming whether formatted information conforms to the required standards and format.
[0012] "Methods for automatically registering information into a system" refer to methods of inputting formatted and verified information into a system database or similar without human intervention.
[0013] "Means of notifying users when defects are detected" refers to communication methods or techniques used to inform users when errors or defects are found in the data.
Brief Description of the Drawings
[0014]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Mode for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] This invention provides a system for automatically registering information from application forms into a system. First, the user scans the paper application form and captures it as digital image data on a terminal. Next, the processing begins when the image data is uploaded to a server. Upon receiving the image data, the server uses an OCR (Optical Character Recognition) engine to analyze the character information within the image and convert it into text data.
[0036] From the converted text data, the server extracts the necessary information using machine learning algorithms and natural language processing techniques. The extracted information is formatted into a specific format and transformed into a structure that meets the system's registration requirements. During this process, the server verifies the validity of the data from multiple angles to check for errors and inconsistencies. If any deficiencies are found, the server immediately notifies the user and requests correction. After the user makes the necessary corrections, the data is verified again.
[0037] Finally, validated data is automatically registered directly from the server into the system. This registration process is carried out via the system's API, ensuring speed and accuracy.
[0038] For example, if a user registers an application form containing customer and address information, the system automatically extracts the important information from the scanned image and registers it accurately. This process is expected to reduce human error and significantly improve operational efficiency.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] The user scans the application form and saves the digital image data to the terminal. The terminal then uploads the scanned image data to the server via a dedicated web portal or application.
[0042] Step 2:
[0043] The server performs OCR processing on image data received from the terminal. Using the OCR engine, it analyzes handwritten and printed characters within the image and converts them into text data.
[0044] Step 3:
[0045] The server analyzes the text data generated by OCR. Using machine learning algorithms and natural language processing techniques, it identifies important information and extracts necessary details such as customer names, addresses, and phone numbers.
[0046] Step 4:
[0047] The server formats the extracted information into the system format. It transforms the information to match the required data format and item structure, maintaining format consistency.
[0048] Step 5:
[0049] The server verifies the formatted data and checks its validity. It checks the accuracy and completeness of the data items and reformats them if necessary.
[0050] Step 6:
[0051] If the server finds any issues during verification, it will send a notification to the user. The user will then review and correct the data based on the details in the notification.
[0052] Step 7:
[0053] The user resubmits the corrected data to the server. The server reformats and validates the data, confirming that the deficiencies have been resolved.
[0054] Step 8:
[0055] The server automatically registers validated data into the system. The system's registration API is used to quickly and accurately save the data to the database.
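Steps 2 to 6 and Step 8 above can be sketched as a single server-side function. The following is a minimal illustration, not the disclosed implementation: the OCR engine, the registration API, and the notification channel are passed in as callables, and every name here (`process_form`, `extract_fields`, `format_record`, `validate`, the `Name`/`Address`/`Phone` labels, the ten-digit phone rule) is a hypothetical choice made for the example. A regular expression stands in for the machine learning and natural language processing extraction described in Step 3, and the resubmission in Step 7 is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import re


@dataclass
class PipelineResult:
    registered: bool
    record: Dict[str, str]
    problems: List[str] = field(default_factory=list)


def extract_fields(text: str) -> Dict[str, str]:
    """Step 3 stand-in: pull 'Label: value' pairs out of the OCR text.

    The source uses machine learning and NLP here; a regular expression
    is substituted only to keep the sketch self-contained.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Name|Address|Phone)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def format_record(fields: Dict[str, str]) -> Dict[str, str]:
    """Step 4: normalise values into the (hypothetical) target layout."""
    record = dict(fields)
    if "phone" in record:
        record["phone"] = re.sub(r"\D", "", record["phone"])  # keep digits only
    return record


def validate(record: Dict[str, str]) -> List[str]:
    """Step 5: return a list of problems; an empty list means valid."""
    problems = [f"missing field: {key}"
                for key in ("name", "address", "phone")
                if not record.get(key)]
    if record.get("phone") and len(record["phone"]) < 10:
        problems.append("phone number looks too short")
    return problems


def process_form(image: bytes,
                 ocr: Callable[[bytes], str],
                 register: Callable[[Dict[str, str]], None],
                 notify: Callable[[List[str]], None]) -> PipelineResult:
    text = ocr(image)                              # Step 2: OCR
    record = format_record(extract_fields(text))   # Steps 3 and 4
    problems = validate(record)                    # Step 5
    if problems:
        notify(problems)                           # Step 6: ask for correction
        return PipelineResult(False, record, problems)
    register(record)                               # Step 8: automatic registration
    return PipelineResult(True, record)
```

Injecting `ocr`, `register`, and `notify` keeps the control flow testable without an OCR engine or a live registration system; in a deployment they would wrap whatever engine and API the system actually uses.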
[0056] (Example 1)
[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0058] Conventional manual input and registration of digital information is time-consuming and labor-intensive, and prone to human error. Furthermore, verifying the consistency of analyzed data and detecting deficiencies is difficult, and these deficiencies could potentially impact subsequent operations. This invention aims to improve operational efficiency by enabling automated registration of digital data and early detection of deficiencies.
[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0060] In this invention, the server includes means for converting an input document into digital information, means for identifying text information based on the digital information, and means for obtaining attribute data from the identified text information. This enables rapid and accurate registration of digital data.
[0061] An "input document" is an information recording medium submitted in paper or electronic format.
[0062] "Digital information" refers to information that has been converted from analog data into a digital format.
[0063] "Text information" refers to string data identified or extracted from digital information.
[0064] "Attribute data" refers to information with specific properties or characteristics extracted from the aforementioned text information.
[0065] A "data structure" is a set of formats and methods for organizing data.
[0066] "Consistency" refers to the property that data remains free of contradictions when checked against specific standards or rules.
[0067] An "information processing device" is a computing device used to collect, process, store, and output data.
[0068] An "operator" is a person or entity that operates a system or device.
[0069] A "data analysis algorithm" is a computational method or set of rules used to process data and derive useful information.
[0070] "Language processing technology" refers to the techniques and methods for analyzing and understanding natural language data.
[0071] This invention is a system designed to promote paperless operations and improve the efficiency of information registration. The system consists of a series of steps for converting input documents into digital information and automatically registering the extracted information into an information processing device.
[0072] The user first converts the paper input document into a digital image format using a scanner (a common hardware device, such as a document scanner). This step utilizes OCR (Optical Character Recognition) technology to convert the information into digital format. Examples of such software include Tesseract OCR and Adobe PDF OCR.
[0073] Next, the user uploads the digital image file from the terminal to the server. The server identifies text information based on the received digital information. Data analysis algorithms and natural language processing techniques are used here; software such as spaCy and Hugging Face Transformers is particularly relevant.
[0074] The server extracts attribute data such as customer information and addresses from the identified text information and adapts it to the required data structure. It then evaluates the integrity of the formatted data to check for errors or deficiencies. If deficiencies are found, the server generates a notification and provides the user with instructions for correction.
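Adapting extracted attribute data to a required data structure might look like the sketch below. It assumes, purely for illustration, that extraction yields a dictionary keyed by `name`, `zip`, and `address`, and that the target structure is the hypothetical `CustomerRecord`; neither is specified in the source. Unicode NFKC normalisation is used because forms filled in Japanese often mix full-width and half-width characters.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional
import unicodedata


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical target structure expected by the registration system."""
    full_name: str
    address: str
    postal_code: Optional[str] = None


# Hypothetical mapping from extracted attribute names to record fields.
FIELD_MAP = {"name": "full_name", "zip": "postal_code", "address": "address"}


def to_record(attributes: Dict[str, str]) -> CustomerRecord:
    """Map extracted attributes onto CustomerRecord.

    Unknown attributes are ignored. A missing full_name or address raises
    TypeError, which the caller can treat as a detected deficiency.
    """
    normalised = {
        FIELD_MAP[key]: unicodedata.normalize("NFKC", value).strip()
        for key, value in attributes.items()
        if key in FIELD_MAP
    }
    return CustomerRecord(**normalised)
```

`asdict(to_record(...))` then gives a plain dictionary that can be checked for integrity and serialised for registration.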
[0075] As a concrete example, there is a process where a user scans a customer registration form and uploads it to a server, thereby accurately registering the customer's name and address in the system. This invention significantly simplifies the data entry process into the information processing device and improves its accuracy.
[0076] An example of a prompt message might be: "Please scan the application form containing customer information and upload it to the server. The system will automatically identify, format, and register the data. If there are any errors, you will be notified with instructions to correct them."
[0077] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0078] Step 1:
[0079] The user scans a paper document using a scanner and saves it to the terminal as a digital image. The input is a paper application form, and the output is a digital image in PDF or JPEG format. Specifically, the user adjusts the scanner settings, setting the resolution to 300 dpi to ensure the text information is clearly legible.
[0080] Step 2:
[0081] The user uploads digital images generated using the terminal's software to the server. The input is a digital image file on the terminal, and this data is transferred to the server. The output is the receipt of the image data sent to the server. During this process, the user selects files and performs account authentication as needed.
[0082] Step 3:
[0083] The server analyzes the received digital image data using an OCR engine and converts the text information within the image into digital character data. Here, the input is digital image data, and the output is text data. Specifically, the server uses software such as Tesseract OCR to perform image analysis and character recognition.
[0084] Step 4:
[0085] The server extracts necessary attribute data from the converted text data using machine learning algorithms. The input is text data converted by OCR, and the output is attribute data such as important customer information and addresses. The server uses natural language processing techniques, such as Hugging Face Transformers, to extract meaning from the text.
[0086] Step 5:
[0087] The server formats the extracted attribute data into a predetermined data structure and ensures consistency as data for registration in the information processing device. The input is the extracted attribute data, and the output is the formatted registration data. Specifically, it checks for compatibility with the database and scrutinizes it for errors.
[0088] Step 6:
[0089] The server evaluates the integrity of the formatted data and verifies that there are no deficiencies. The input is the formatted data, and the output is the verification result. Data whose integrity has been confirmed is automatically registered in the information processing device.
[0090] Step 7:
[0091] If a flaw is detected during integrity verification, the server will send a notification to the user requesting correction. The input is the data in which the flaw was detected, and the output is the notification message. This message will include specific instructions for correction.
[0092] Step 8:
[0093] After the user makes the necessary corrections, they send the data back to the server and repeat step 6. The input is the corrected data, and the server verifies its integrity again. The output is the final registered data.
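The verify, notify, correct, and re-verify cycle of Steps 6 to 8 can be sketched as a bounded loop. This is an illustrative sketch under assumptions: `check_integrity`, `register_with_corrections`, the required `name` and `address` fields, and the `max_rounds` cap are all hypothetical (the source does not state how many correction rounds are allowed), and `request_correction` stands in for the notification sent to the user and the corrected data the user sends back.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def check_integrity(record: Record) -> List[str]:
    """Step 6: return correction instructions; an empty list means the data is sound."""
    issues = []
    for key in ("name", "address"):
        if not record.get(key, "").strip():
            issues.append(f"'{key}' is empty; please fill it in.")
    return issues


def register_with_corrections(
    record: Record,
    request_correction: Callable[[Record, List[str]], Record],
    max_rounds: int = 3,
) -> Optional[Record]:
    """Repeat Steps 6 to 8 until the record passes or the rounds run out.

    Returns the record that should be registered, or None if it still
    fails after max_rounds correction requests.
    """
    for _ in range(max_rounds):
        issues = check_integrity(record)
        if not issues:
            return record
        # Step 7: notify the user; Step 8: receive the corrected data.
        record = request_correction(record, issues)
    return record if not check_integrity(record) else None
```

Capping the number of rounds is a design choice for the sketch; it avoids looping forever when a form cannot be corrected automatically.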
[0094] (Application Example 1)
[0095] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0096] Digitizing paper application forms and various documents presents challenges such as the time-consuming and labor-intensive process of manual data entry, as well as the high likelihood of human error. Furthermore, there is a demand for digitized information to be integrated into systems and used as readily available data, rather than simply being stored as electronic files. Ultimately, the goal is to enable users to easily digitize and manage data using smart devices and wearable devices.
[0097] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0098] In this invention, the server includes means for acquiring image information, means for performing character recognition processing based on the image information, and means for extracting necessary data from the character information obtained by the character recognition processing. This enables efficient digitalization and automated data registration using smart devices and wearable devices.
[0099] "Means for acquiring image information" refers to a device or method for capturing paper documents as images in order to convert them into a digital format.
[0100] "Means for character recognition processing" refers to technology that detects characters from acquired image information and converts them into digital text.
[0101] "Means for extracting necessary data" refers to processing methods for identifying and extracting specific information from recognized text data.
[0102] "Means of formatting data to conform to the format of an information processing device" refers to the process of converting extracted data into a format suitable for the system used by the user.
[0103] "Means of validating validity" refer to checking methods to confirm that formatted data is accurate.
[0104] "Means for automatically registering data in an information processing device" refers to a method of directly inputting and saving validated data into the system without manual operation.
[0105] "Means of supporting digitalization using smart devices and wearable devices" refers to functions that utilize advanced devices such as smartphones and smart glasses to easily and quickly digitize documents.
[0106] "Means of notifying users" refers to a system that promptly communicates information to users when data inconsistencies or errors are detected.
[0107] To implement this invention, a system is required in which a smart device or wearable device (e.g., smartphone, smart glasses) and a cloud server work together. First, the user uses the camera function of the smart device to take a picture of a paper application form or document and acquire image data. Next, the acquired image data is transmitted to the cloud server via the internet.
[0108] On the server, the Tesseract OCR engine identifies characters in image data and converts them into digital text data. The generated text data is then processed using machine learning algorithms and natural language processing techniques to extract important data. The extracted data is then formatted to conform to the format of the information processing device (computer system) being used.
[0109] The formatted data is validated. If errors or inconsistencies are detected, the server notifies the user and requests correction. After validation is confirmed, the data is automatically registered in the information processing device via API. In this process, users can easily digitize documents and register data by utilizing smart devices and wearable devices.
[0110] As a concrete example, when digitizing volunteer activity registration forms, users can take a picture of the form with their smartphones, and the information is automatically registered in the system. An example of a prompt message would be, "Please extract and format the volunteer registration information written in this image. Pay particular attention to the activity details and contact information." In this way, it becomes possible to digitize and manage documents quickly and efficiently.
[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0112] Step 1:
[0113] The user takes a picture of the application form with their smart device's camera. The captured image data is saved on the device and is then available as an image file for the next processing step.
[0114] Step 2:
[0115] The device sends the acquired image data to a cloud server via the internet. The arrival of the transmitted image data at the server triggers the start of the next process.
[0116] Step 3:
[0117] The server receives image data and performs character recognition processing using the Tesseract OCR engine. Text data is extracted from the image. The input is image data, and the output is text data.
[0118] Step 4:
[0119] The server uses the generated text data to extract the necessary data by applying natural language processing techniques and machine learning algorithms. This step involves filtering and structuring the information that is important to the process.
[0120] Step 5:
[0121] The server formats the extracted data into the format of the information processing device. The input is the extracted data, and the output is the formatted data. Data processing related to formatting is performed.
[0122] Step 6:
[0123] The server validates the formatted data and sends a notification to the user if there are any problems. Data checking and error reporting are performed at this stage.
[0124] Step 7:
[0125] Once the formatted data has been validated, it is automatically registered from the server to the information processing device via API. After registration is complete, the data becomes available for use by the user in the system.
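Registration via API in Step 7 could take the form of an HTTP POST carrying the validated record as JSON. The source does not describe the API, so the endpoint URL, the bearer-token authentication, and the function name `build_registration_request` below are assumptions made for illustration. The sketch only builds the request; it does not send it.

```python
import json
from typing import Dict
from urllib import request


def build_registration_request(record: Dict[str, str],
                               endpoint: str,
                               token: str) -> request.Request:
    """Build (but do not send) a POST request that registers one record.

    The endpoint and the bearer-token scheme are hypothetical; substitute
    whatever the target information processing device actually exposes.
    """
    body = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending lets the payload be inspected or logged before anything is written to the target system.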
[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0127] This invention combines a system for accurately and efficiently automatically registering information from application forms with an emotion engine that recognizes user emotions. First, the user scans the paper application form as a digital image file on the terminal using a dedicated scanner. Processing begins when the user uploads this image data to the server from the terminal.
[0128] When the server acquires image data, it performs OCR processing to convert handwritten or printed text information into text data. Then, it uses machine learning algorithms and natural language processing techniques to identify the necessary information and format the extracted data according to the system's specified format. The formatted data is verified to ensure it is in the correct format and content, and if there are no errors, it is registered directly into the system.
[0129] A distinctive feature of this invention is that the server uses an emotion engine to flexibly adjust its interaction with the user. Specifically, when notifying the user, the emotion engine estimates the user's emotional state through facial recognition and voice analysis, and provides feedback in an appropriate tone and content. For example, if the server determines that the user is irritated, it can display more polite guidance or provide additional hints for solving the problem.
[0130] For example, if a user makes a mistake when scanning an application form, the server analyzes the image data and notifies the user if there are any unclear parts. If the emotion engine detects user dissatisfaction, the server expresses its apologies and provides guidance on how to improve scan quality. This allows the user to resolve the problem smoothly and complete data registration efficiently.
[0131] The following describes the processing flow.
[0132] Step 1:
[0133] The user scans the paper application form with a scanner and imports the digital image data into the terminal. The user then uploads this image data to the server using a dedicated interface.
[0134] Step 2:
[0135] The server receives the uploaded image data. After receiving the data, the server activates its OCR engine and analyzes the text information within the image, converting it into text data. At this stage, handwritten and printed characters are recognized.
[0136] Step 3:
[0137] The server processes the text data using machine learning algorithms and extracts the necessary information. Data points required by the system, such as customer name, address, and phone number, are identified.
[0138] Step 4:
[0139] The server formats the extracted information into the system's format. The format is converted to ensure the data conforms to registration requirements and maintains a consistent data structure.
[0140] Step 5:
[0141] The server validates the data. It checks the formatted information and verifies the accuracy and conformity of the fields. If any discrepancies are found, it records the details.
[0142] Step 6:
[0143] If the server detects a flaw during validation, it activates the emotion engine to recognize the user's emotions. The emotion engine estimates the user's emotional state through voice input and facial expression analysis, and the server adjusts the content and tone of notifications accordingly.
[0144] Step 7:
[0145] The server notifies the user of the problem. Depending on the user's feelings, it sends a message containing appropriate guidance and support information, and suggests the next steps to resolve the issue.
[0146] Step 8:
[0147] The user corrects the data based on the notification. The corrected data is sent back to the server, and reprocessing is initiated automatically.
[0148] Step 9:
[0149] The server automatically registers the data that has been finally validated into the system. The data is then stored in the database via the registration API, completing the process.
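Steps 5 through 9 above can be sketched as a validate, notify, and register routine. This is a minimal sketch under stated assumptions, not the disclosed implementation: the field names, the rules in `validate`, the emotion label `"irritated"`, the wording of the messages, and the in-memory `registry` list standing in for the registration API are all hypothetical.

```python
import re

REQUIRED_FIELDS = ("name", "address", "phone")  # hypothetical field set


def validate(record):
    """Return a list of problem descriptions; an empty list means the record is valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            problems.append(f"{field} is missing")
    phone = record.get("phone", "")
    # Assumed phone rule: 10 to 13 characters made of digits and hyphens.
    if phone and not re.fullmatch(r"[0-9\-]{10,13}", phone):
        problems.append("phone has an unexpected format")
    return problems


def build_notification(problems, emotion):
    """Adjust the tone of the notification to the estimated emotion (hypothetical labels)."""
    body = "; ".join(problems)
    if emotion == "irritated":
        return (
            "We apologize for the trouble. Please check the following: "
            f"{body}. Hint: rescanning the form may improve recognition."
        )
    return f"Please correct the following: {body}."


def process(record, emotion, registry):
    """Register a valid record, or return a notification message for the user."""
    problems = validate(record)
    if problems:
        return build_notification(problems, emotion)
    registry.append(dict(record))  # stand-in for the registration API call
    return None
```

A caller would invoke `process` again with the corrected record after the user responds to the notification, which corresponds to the reprocessing in Step 8.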
[0150] (Example 2)
[0151] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0152] Conventional application form information registration systems were prone to errors when digitizing paper application forms and registering them in the system, particularly in handwritten character recognition and in the extraction of necessary information. Furthermore, notifications to users when deficiencies occurred were uniform and did not respond flexibly to the user's emotional state.
[0153] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0154] In this invention, the server includes means for acquiring image information, means for performing character recognition processing, and means for extracting necessary data. This enables highly accurate character recognition and data extraction, as well as detailed notifications to the user based on their emotional state.
[0155] "Image information" refers to visual data digitized from paper application forms using a scanner.
[0156] "Optical character recognition (OCR) processing" is the process of extracting text information from image information using OCR technology.
[0157] "Text information" refers to digital character data obtained through character recognition processing.
[0158] "Required data" refers to the specific information items that the system needs to process.
[0159] "Formatting" means converting the extracted necessary data into the format required by the system.
[0160] "Verifying validity" is the process of verifying whether formatted data is accurate and complete.
[0161] An "information management system" refers to a database or file system in which digital data is stored and managed.
[0162] "Emotional state" refers to the user's psychological and emotional condition at a given time.
[0163] "Adjusting notification content" means tailoring the information provided to users to their emotional state, using appropriate content and tone.
[0164] A description of embodiments for carrying out this invention will be given.
[0165] This invention aims to enable users to digitize paper application forms and effectively register them in a system. First, the user captures the paper application form as a digital image on a terminal using a dedicated scanner. A high-precision, general-purpose scanner is used.
[0166] The user then uploads the image data to the server using their device. An application on the device verifies that the file format is correct and performs any necessary conversions. The software used in this process includes common format conversion tools.
[0167] The server performs character recognition processing on the received image data using OCR software. This process utilizes an industry-standard OCR engine, converting handwritten and printed character information into text data. The server also automatically performs image noise reduction and character skew correction.
[0168] The server then uses machine learning algorithms and natural language processing techniques to extract the necessary data from the text data. The AI models used include the latest natural language processing frameworks, specifically designed to accurately recognize information such as addresses and names.
[0169] Next, the server formats the extracted data into a specified format and then verifies the validity of the formatted data. Once verification is complete, data without errors is automatically registered in the information management system.
[0170] Finally, the server is equipped with an emotion engine that analyzes image and audio data to estimate the user's emotional state. Based on this emotion estimation, notifications to the user are tailored, and the system is designed to improve the user experience, especially when problems are detected.
[0171] For example, if a user scans an application form and uploads it at an angle, the server will automatically correct the content to ensure accurate data extraction. Furthermore, if a user shows difficulty during the upload process, the emotion engine will detect this and the server will provide guidance in a gentle tone.
[0172] A concrete example of a prompt message is, "Generate a notification message for when a user has scanned an application form incorrectly, in a way that is helpful even if the user is frustrated."
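A prompt of this kind could be assembled from the detected problems and the estimated emotion. The following is only an illustrative sketch: the function name `build_prompt`, the template wording, and the emotion label `"frustrated"` are assumptions, and the call to the generative AI model itself is deliberately left out.

```python
def build_prompt(problems, emotion):
    """Compose an instruction for a generative model (hypothetical template)."""
    lines = [
        "Generate a notification message for a user who has scanned an application form incorrectly.",
        f"Estimated user emotion: {emotion}.",
        "Detected problems:",
    ]
    # One bullet per detected problem so the model can address each in turn.
    lines += [f"- {p}" for p in problems]
    if emotion == "frustrated":
        lines.append("Keep the message helpful and calm even though the user is frustrated.")
    return "\n".join(lines)
```

Keeping the template in one function makes it straightforward to adjust the tone instruction without touching the validation logic.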
[0173] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0174] Step 1:
[0175] The user scans a paper application form using a dedicated scanner. The input is the paper application form, and the output is a high-resolution digital image. The scanner converts the scanned information into image data and saves it to the device. It is especially important to check the paper's position and resolution to ensure a clear scan.
[0176] Step 2:
[0177] The user uses a terminal to upload scanned image data to the server. The input is a digital image file, and the output is the image data stored on the server. The terminal checks the extension and size of the input file and converts it to the optimal format as needed. JPEG or PNG formats are common.
[0178] Step 3:
[0179] The server performs OCR processing on the received image data. The input is a scanned image, and the output is recognized text data. The server runs OCR software to convert handwritten or printed characters in the image into digital strings. This process automatically performs noise reduction and skew correction, achieving highly accurate character recognition.
[0180] Step 4:
[0181] The server extracts necessary data from recognized text data. The input is OCR-processed text data, and the output is specific data elements that have been extracted. The server uses machine learning algorithms and natural language processing techniques to accurately identify addresses, names, dates, etc., from the text. A generative AI model is used for analyzing the data elements.
[0182] Step 5:
[0183] The server formats the extracted data into a specified format and verifies its validity. The input consists of extracted data elements, and the output is the formatted data and its verification results. The server formats the data to conform to a defined database format. The validity of each data element is checked using a cross-check function, and if there are no problems, the process moves to the next stage.
[0184] Step 6:
[0185] The server analyzes image and audio data to estimate the user's emotional state. The input is the user's image and audio data, and the output is the estimated emotional state. The server then drives an emotion engine to adjust notification content based on the recognized emotion. If the user's emotion is irritation, the notification content is set to be more friendly and include more specific instructions.
[0186] Step 7:
[0187] The server registers formatted and validated data into the information management system. Validated data is taken as input, and registered data is obtained from the database as output. Once the registration process is complete, the data becomes officially usable within the system. Security protocols are applied to data storage during this process.
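The terminal-side check in Step 2 (extension and size, with conversion as needed) might look like the sketch below. The size limit, the set of accepted extensions beyond JPEG and PNG, and the return labels are assumptions for illustration; the source does not specify them.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # JPEG and PNG, as named in Step 2
MAX_BYTES = 10 * 1024 * 1024  # assumed limit; not stated in the source


def check_upload(filename, size_bytes):
    """Decide whether a scanned file can be uploaded as-is, needs conversion, or is rejected."""
    ext = Path(filename).suffix.lower()
    if size_bytes <= 0:
        return "reject: empty file"
    if size_bytes > MAX_BYTES:
        return "reject: file too large"
    if ext in ALLOWED_EXTENSIONS:
        return "accept"
    # Any other format would be handed to a conversion tool before upload.
    return "convert"
```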
[0188] (Application Example 2)
[0189] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0190] Conventional data registration systems had problems such as users not receiving appropriate feedback when scanning and digitizing application forms, which caused stress and delays in the registration process. Notifications that did not consider the user's emotional state also made the interface inefficient.
[0191] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0192] In this invention, the server includes a device for acquiring image information, a device for performing encoded data recognition processing, and a device for estimating emotions and adjusting notification content. This makes it possible to provide appropriate feedback according to the user's emotional state and to proceed with the registration process efficiently.
[0193] "Image information" refers to data stored in a format that can be perceived visually, and is acquired by devices such as scanners and digital cameras.
[0194] "Encoded data recognition processing" is the process of converting characters and symbols contained in image information into a digital format, making them machine-readable data.
[0195] "Format data" refers to a collection of information generated by encoded data recognition processing, and is digital data that has a specific format and structure.
[0196] A "control device" is a system or platform used for inputting, processing, and outputting data.
[0197] "Processing" refers to the process of changing the format and structure of data according to the purpose, and adapting it to the format required by the control device.
[0198] "Appropriateness" is a concept that indicates whether data or information meets specific standards or conditions, and it serves as a criterion for determining whether or not to register it.
[0199] A "user" is an entity that operates a system or manages information through its interface, and is usually a person.
[0200] "Emotions" represent the user's psychological state and are internal reactions estimated through facial expressions and vocal characteristics.
[0201] "Feedback" refers to information or responses that a system outputs to a user, and is usually provided in real time.
[0202] The system for realizing this application is configured as follows: First, the terminal uses a dedicated device to acquire image information such as application forms. The hardware used includes scanners and smartphone cameras. The acquired image information is sent to a cloud server.
[0203] The server uses OCR (Optical Character Recognition) software to perform encoded data recognition processing on the received image information. This process generates format data and extracts the relevant information. To extract the necessary character information from the image as digital data, commonly used OCR tools and services such as "Google® Cloud Vision API" and "Tesseract OCR" are utilized.
[0204] Next, the server applies machine learning algorithms to extract the necessary information and process the data to conform to the control device's format. These algorithms include natural language processing techniques to format the information, and the formatted data is then validated for appropriateness. This validation verifies consistency, and data without issues is automatically registered in the system.
[0205] Simultaneously, the server integrates facial recognition technology (e.g., the OpenCV library) and voice analysis APIs (e.g., IBM Watson®) to operate a device that estimates emotions from the user's facial expressions and voice. This allows notifications and feedback to be tailored to the user's psychological state, providing appropriate guidance. For example, if the user indicates discomfort, the server will select feedback that uses gentle language to help resolve the problem.
[0206] In this way, users can operate smoothly and efficiently register information. An example of a prompt message for the generative AI model is: "Develop an algorithm that analyzes the user's current emotional state and provides optimal feedback. The five main emotions are joy, surprise, anger, sadness, and fear."
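One simple way to turn the five emotions named in the prompt into a feedback style is a lookup on the highest-scoring emotion. This is a hedged sketch: the score dictionary format, the tone labels in `TONE_BY_EMOTION`, and both function names are assumptions, not something the source specifies.

```python
EMOTIONS = ("joy", "surprise", "anger", "sadness", "fear")

# Hypothetical mapping from dominant emotion to feedback style.
TONE_BY_EMOTION = {
    "joy": "brief",
    "surprise": "explanatory",
    "anger": "apologetic",
    "sadness": "encouraging",
    "fear": "reassuring",
}


def dominant_emotion(scores):
    """Pick the highest-scoring label among the five emotions (missing labels count as 0)."""
    return max(EMOTIONS, key=lambda e: scores.get(e, 0.0))


def select_tone(scores):
    """Return the feedback style for the dominant emotion."""
    return TONE_BY_EMOTION[dominant_emotion(scores)]
```

For instance, `select_tone({"anger": 0.7, "joy": 0.1})` selects the apologetic style, matching the gentle-language feedback described for a user who shows discomfort.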
[0207] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0208] Step 1:
[0209] The terminal acquires image information, such as application forms, via a scanner used by the user or a camera on a smart device. The input is a physical application form, and the output is scanned image data. This image data is sent to a cloud server for subsequent digital processing.
[0210] Step 2:
[0211] The server performs OCR (Optical Character Recognition) processing on the received image data using OCR software. The input is image data, and the output is text data. OCR processing analyzes the characters within the image and converts them into digital character format. A general-purpose OCR tool is used for this purpose.
[0212] Step 3:
[0213] The server applies machine learning algorithms to text data, extracts necessary information, and formats it into the control device's specified format. Input is text data generated by OCR, and output is formatted data. This process utilizes natural language processing techniques to analyze the text and identify important sections.
[0214] Step 4:
[0215] The server validates the formatted data for appropriateness. The input is the formatted data, and the output is either validated data for registration or an error message. Validation checks if the data meets the specified conditions; if there are no problems, the data is sent for registration.
[0216] Step 5:
[0217] If inappropriate data is detected, the server sends an error notification to the user and provides feedback. The input is the error message, and the output is the notification content to the user. The notification content includes instructions and advice on how to correct the operation.
[0218] Step 6:
[0219] Simultaneously, the server uses facial recognition technology and voice analysis to estimate the user's emotions. Input is real-time facial images and voice data, while output is an evaluation of the emotional state. Based on this, notifications and feedback are further refined according to the user's psychological state.
[0220] Step 7:
[0221] Users receive appropriate and reassuring guidance, enabling them to improve their operations. Input is feedback from the system, and output is the result of the user's new actions. This allows for the smooth completion of the registration process.
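Step 6 combines facial and voice analysis but does not say how the two estimates are merged. One possible approach, shown purely as an assumption, is a weighted average of per-emotion scores; `fuse_scores` and `face_weight` are hypothetical names, and the upstream face and voice analyzers are not modeled here.

```python
def fuse_scores(face, voice, face_weight=0.5):
    """Blend per-emotion scores from two modalities with a weighted average."""
    labels = set(face) | set(voice)
    return {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
```

A label missing from one modality is treated as a score of zero, so an emotion detected only in the voice still contributes to the fused estimate.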
[0222] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0223] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0224] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0225] [Second Embodiment]
[0226] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0227] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0228] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0229] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0230] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20, converting the captured voice into audio data, and outputting the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0231] Camera 42 is a small digital camera equipped with an optical system including a lens, an aperture, and a shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. It captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to that of a typical healthy person).
[0232] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0233] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0234] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0235] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0236] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0237] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0238] This invention provides a system for automatically registering information from application forms into a system. First, the user scans the paper application form and captures it as digital image data on a terminal. Next, the processing begins when the image data is uploaded to a server. Upon receiving the image data, the server uses an OCR (Optical Character Recognition) engine to analyze the character information within the image and convert it into text data.
[0239] From the converted text data, the server extracts the necessary information using machine learning algorithms and natural language processing techniques. The extracted information is formatted into a specific format and transformed into a structure that meets the system's registration requirements. During this process, the server verifies the validity of the data from multiple angles to check for errors and inconsistencies. If any deficiencies are found, the server immediately notifies the user and requests correction. After the user makes the necessary corrections, the data is verified again.
[0240] Finally, validated data is automatically registered directly from the server into the system. This registration process is carried out via the system's API, ensuring speed and accuracy.
[0241] For example, when a user registers an application form containing customer and address information, the system automatically extracts the important information from the scanned image and accurately registers it. This process is expected to reduce human error and significantly improve operational efficiency.
[0242] The following describes the processing flow.
[0243] Step 1:
[0244] The user scans the application form and saves the digital image data to their device. The device then uploads the scanned image data to a server via a dedicated web portal or application.
[0245] Step 2:
[0246] The server performs OCR processing on image data received from the terminal. Using the OCR engine, it analyzes handwritten and printed characters within the image and converts them into text data.
[0247] Step 3:
[0248] The server analyzes the text data generated by OCR. Using machine learning algorithms and natural language processing techniques, it identifies important information and extracts necessary details such as customer names, addresses, and phone numbers.
[0249] Step 4:
[0250] The server formats the extracted information into the system format. It transforms the information to match the required data format and item structure, maintaining format consistency.
[0251] Step 5:
[0252] The server verifies the formatted data and checks its validity. It checks the accuracy and completeness of the data items and reformats them if necessary.
[0253] Step 6:
[0254] If the server finds any issues during verification, it will send a notification to the user. The user will then review and correct the data based on the details in the notification.
[0255] Step 7:
[0256] The user resubmits the corrected data to the server. The server reformats and validates the data, confirming that the deficiencies have been resolved.
[0257] Step 8:
[0258] The server automatically registers validated data into the system. The system's registration API is used to quickly and accurately save the data to the database.
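The extraction in Step 3 relies on machine learning and natural language processing. As a much simpler stand-in, the sketch below uses regular expressions over labeled lines to show the shape of the input and output. The regular expressions are a swapped-in technique, not the method described above, and the `Name:`, `Address:`, and `Phone:` labels are assumed to appear in the OCR text.

```python
import re

# Assumed layout: each field appears on its own labeled line in the OCR output.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name\s*:\s*(.+)$", re.MULTILINE),
    "address": re.compile(r"^Address\s*:\s*(.+)$", re.MULTILINE),
    "phone": re.compile(r"^Phone\s*:\s*([0-9\-]+)\s*$", re.MULTILINE),
}


def extract_fields(ocr_text):
    """Return a dict of extracted fields; a field that is not found maps to None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[field] = match.group(1).strip() if match else None
    return record
```

Fields that come back as `None` are natural candidates for the deficiency notification in Step 6.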
[0259] (Example 1)
[0260] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0261] Conventional manual input and registration of digital information is time-consuming and labor-intensive, and prone to human error. Furthermore, verifying the consistency of analyzed data and detecting deficiencies is difficult, and these deficiencies could potentially impact subsequent operations. This invention aims to improve operational efficiency by enabling automated registration of digital data and early detection of deficiencies.
[0262] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0263] In this invention, the server includes means for converting an input document into digital information, means for identifying text information based on the digital information, and means for obtaining attribute data from the identified text information. This enables rapid and accurate registration of digital data.
[0264] An "input document" is an information recording medium submitted in paper or electronic format.
[0265] "Digital information" refers to information that has been converted from analog data into a digital format.
[0266] "Text information" refers to string data identified or extracted from digital information.
[0267] "Attribute data" refers to information with specific properties or characteristics extracted from the aforementioned text information.
[0268] A "data structure" is a set of formats and methods for organizing data.
[0269] "Consistency" refers to the property that data contains no contradictions and conforms to specific standards or rules.
[0270] An "information processing device" is a computing device used to collect, process, store, and output data.
[0271] An "operator" is a person or entity that operates a system or device.
[0272] A "data analysis algorithm" is a computational method or set of rules used to process data and derive useful information.
[0273] "Language processing technology" refers to the techniques and methods for analyzing and understanding natural language data.
[0274] This invention is a system designed to promote paperless operations and improve the efficiency of information registration. The system consists of a series of steps for converting input documents into digital information and automatically registering the extracted information into an information processing device.
[0275] The user first converts the paper input document into a digital image format using a scanner (a common hardware device, such as a document scanner). This step utilizes OCR (Optical Character Recognition) technology to convert the information into digital format. Examples of such software include Tesseract OCR and Adobe PDF OCR.
[0276] Next, the user uploads the digital image file generated using the device to the server. The server identifies text information based on the received digital information. Data analysis algorithms and natural language processing techniques are used here, with software such as spaCy and Hugging Face Transformers being particularly relevant.
[0277] The server extracts attribute data such as customer information and addresses from the identified text information and adapts it to the required data structure. It then evaluates the integrity of the formatted data to check for errors or deficiencies. If deficiencies are found, the server generates a notification and provides the user with instructions for correction.
[0278] As a specific example, there is a process in which a user scans a customer registration form and uploads it to a server, and the customer name and address are accurately registered in the system. According to this invention, the data input operation to the information processing device is greatly simplified and the accuracy is also improved.
[0279] An example of the prompt sentence is: "Please scan the application form containing customer information and upload it to the server. The system will automatically identify, format, and register the data. If there is an error, you will be notified with instructions for correction."
[0280] The flow of the specific process in Example 1 will be described using Figure 11.
[0281] Step 1:
[0282] The user scans the paper-based input document with a scanner and saves it as a digital image format on the terminal. The input here is a paper application form, and the output is a digital image in PDF or JPEG format. As a specific operation, the user adjusts the scanner settings and sets the resolution to 300 dpi so that the character information can be clearly read.
[0283] Step 2:
[0284] The user uploads the digital image generated using the terminal software to the server. The input is the digital image file on the terminal, and the data is transferred to the server side. The output is the reception of the image data transmitted to the server. In this process, the user selects the file and performs account authentication if necessary.
[0285] Step 3:
[0286] The server analyzes the received digital image data with an OCR engine and converts the text information in the image into digital character data. The input here is digital image data, and the output is text data. Specifically, the server uses software such as Tesseract OCR to perform image analysis and character recognition.
[0287] Step 4:
[0288] The server extracts the necessary attribute data from the converted text data using a machine learning algorithm. The input is the text data converted by OCR, and attribute data such as important customer information and addresses are output. The server uses, for example, Hugging Face Transformers and utilizes natural language processing technology to extract meaning from the text.
[0289] Step 5:
[0290] The server formats the extracted attribute data into a predetermined data structure and ensures consistency as registration data for the information processing device. The input is the extracted attribute data, and the output is the formatted registration data. As a specific operation, it checks the compatibility with the database and examines whether there are any errors.
[0291] Step 6:
[0292] The server evaluates the consistency of the formatted data and verifies whether there are any deficiencies. The input is the formatted data, and the output is the verification result. The data with confirmed consistency is automatically registered in the information processing device.
[0293] Step 7:
[0294] If there are deficiencies in the consistency verification, the server sends a notification to the user and requests correction. The input is the data in which deficiencies are detected, and the output is a notification message. This communication includes specific instructions for correction.
[0295] Step 8:
[0296] After the user makes the necessary corrections, they send the data back to the server and repeat step 6. The input is the corrected data, and the server verifies its integrity again. The output is the final registered data.
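The database compatibility check in Step 5 and the automatic registration in Step 6 can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customers` table, its two columns, and the `register` helper are hypothetical, and an in-memory SQLite database stands in for the information processing device.

```python
import sqlite3

# Hypothetical predetermined data structure for the registration data.
SCHEMA = "CREATE TABLE customers (name TEXT NOT NULL, address TEXT NOT NULL)"


def register(conn, record):
    """Insert one record; return True on success, False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (name, address) VALUES (:name, :address)",
                {"name": record.get("name"), "address": record.get("address")},
            )
        return True
    except sqlite3.IntegrityError:
        # A missing attribute violates NOT NULL, which corresponds to a detected deficiency.
        return False
```

A `False` result would trigger the correction request in Step 7, and the corrected record would be passed to `register` again as in Step 8.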
[0297] (Application Example 1)
[0298] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0299] Digitizing paper application forms and various documents presents challenges such as the time-consuming and labor-intensive process of manual data entry, as well as the high likelihood of human error. Furthermore, there is a demand for digitized information to be integrated into systems and used as readily available data, rather than simply being stored as electronic files. Ultimately, the goal is to enable users to easily digitize and manage data using smart devices and wearable devices.
[0300] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0301] In this invention, the server includes means for acquiring image information, means for performing character recognition processing based on the image information, and means for extracting necessary data from the character information obtained by the character recognition processing. This enables efficient digitalization and automated data registration using smart devices and wearable devices.
[0302] "Means for acquiring image information" refers to a device or method for capturing paper documents as images in order to convert them into a digital format.
[0303] "Means for character recognition processing" refers to technology that detects characters from acquired image information and converts them into digital text.
[0304] The "means for extracting necessary data" is a processing method for identifying and extracting specific information from the recognized text data.
[0305] The "means for formatting to conform to the format of the information processing device" is a process of converting the extracted data into a format suitable for the system used by the user.
[0306] The "means for verifying validity" is a checking method for confirming that the formatted data is accurate.
[0307] The "means for automatically registering in the information processing device" is a method of directly inputting and storing the data whose validity has been confirmed into the system without manual operation.
[0308] The "means for assisting digitization using smart devices and wearable devices" is a function that utilizes advanced devices such as smartphones and smart glasses to easily and quickly digitize documents.
[0309] The "means for notifying the user" is a mechanism that promptly transmits such information to the user when data inconsistencies or errors are detected.
[0310] To implement this invention, a system in which a smart device or wearable device (e.g., smartphone, smart glasses) and a cloud server operate in cooperation is required. First, the user uses the camera function of the smart device to take pictures of paper applications or documents to obtain image data. Next, the obtained image data is transmitted to the cloud server via the Internet.
[0311] On the server, the Tesseract OCR engine identifies characters in image data and converts them into digital text data. The generated text data is then processed using machine learning algorithms and natural language processing techniques to extract important data. The extracted data is then formatted to conform to the format of the information processing device (computer system) being used.
[0312] The formatted data is validated. If errors or inconsistencies are detected, the server notifies the user and requests correction. After validation is confirmed, the data is automatically registered in the information processing device via API. In this process, users can easily digitize documents and register data by utilizing smart devices and wearable devices.
[0313] As a concrete example, when digitizing volunteer activity registration forms, users can take a picture of the form with their smartphones, and the information is automatically registered in the system. An example of a prompt message would be, "Please extract and format the volunteer registration information written in this image. Pay particular attention to the activity details and contact information." In this way, it becomes possible to digitize and manage documents quickly and efficiently.
[0314] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0315] Step 1:
[0316] The user takes a picture of the application form with their smart device's camera. The captured image data is saved on the device and prepared as an image file for the next processing step.
[0317] Step 2:
[0318] The device sends the acquired image data to a cloud server via the internet. The arrival of the transmitted image data at the server triggers the start of the next process.
[0319] Step 3:
[0320] The server receives image data and performs character recognition processing using the Tesseract OCR engine. Text data is extracted from the image. The input is image data, and the output is text data.
[0321] Step 4:
[0322] The server uses the generated text data to extract the necessary data by applying natural language processing techniques and machine learning algorithms. This step involves filtering and structuring the information that is important to the process.
[0323] Step 5:
[0324] The server formats the extracted data into the format of the information processing device. The input is the extracted data, and the output is the formatted data. Data processing related to formatting is performed.
[0325] Step 6:
[0326] The server validates the formatted data and sends a notification to the user if there are any problems. Data checking and error reporting are performed at this stage.
[0327] Step 7:
[0328] Once the formatted data has been validated, it is automatically registered from the server to the information processing device via API. After registration is complete, the data becomes available for use by the user in the system.
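The data flow of Steps 3 to 7 (image data to text, text to extracted data, extracted data to formatted data, then validation and registration) can be sketched as a chain of replaceable stages. The `Pipeline` class and every stage passed to it below are hypothetical stand-ins chosen for illustration; in particular, the `ocr` stage here only decodes bytes and does not call the Tesseract OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    ocr: Callable[[bytes], str]                      # Step 3: image -> text
    extract: Callable[[str], Dict[str, str]]         # Step 4: text -> fields
    fmt: Callable[[Dict[str, str]], Dict[str, str]]  # Step 5: fields -> record
    validate: Callable[[Dict[str, str]], List[str]]  # Step 6: record -> problems
    notify: Callable[[List[str]], None]              # Step 6: report problems
    register: Callable[[Dict[str, str]], None]       # Step 7: registration via API

    def run(self, image: bytes) -> bool:
        """Run one image through all stages; return True if it was registered."""
        text = self.ocr(image)
        record = self.fmt(self.extract(text))
        problems = self.validate(record)
        if problems:
            self.notify(problems)
            return False
        self.register(record)
        return True


sent, stored = [], []
pipeline = Pipeline(
    # Stand-in OCR: pretends the image bytes are already text.
    ocr=lambda image: image.decode("utf-8"),
    # Stand-in extraction: reads "Label: value" lines.
    extract=lambda text: dict(
        line.split(": ", 1) for line in text.splitlines() if ": " in line
    ),
    fmt=lambda fields: {k.strip().lower(): v.strip() for k, v in fields.items()},
    validate=lambda rec: [] if rec.get("contact") else ["contact is missing"],
    notify=sent.append,
    register=stored.append,
)
ok = pipeline.run(b"Activity: Beach cleanup\nContact: 090-0000-0000")
```

Each stage can be swapped for a real OCR engine, extraction model, or registration client without changing the control flow in `run`.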
[0329] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing that makes use of the estimated emotions.
[0330] This invention combines a system for accurately and efficiently automatically registering information from application forms with an emotion engine that recognizes user emotions. First, the user scans the paper application form as a digital image file on the terminal using a dedicated scanner. Processing begins when the user uploads this image data to the server from the terminal.
[0331] When the server acquires image data, it performs OCR processing to convert handwritten or printed text information into text data. Then, it uses machine learning algorithms and natural language processing techniques to identify the necessary information and format the extracted data according to the system's specified format. The formatted data is verified to ensure it is in the correct format and content, and if there are no errors, it is registered directly into the system.
[0332] A distinctive feature of this invention is that the server uses an emotion engine to flexibly adjust its interaction with the user. Specifically, when notifying the user, the emotion engine estimates the user's emotional state through facial recognition and voice analysis, and provides feedback in an appropriate tone and content. For example, if the server determines that the user is irritated, it can display more polite guidance or provide additional hints for solving the problem.
[0333] For example, if a user makes a mistake when scanning an application form, the server analyzes the image data and notifies the user if there are any unclear parts. If the emotion engine detects user dissatisfaction, the server expresses its apologies and provides guidance on how to improve scan quality. This allows the user to resolve the problem smoothly and complete data registration efficiently.
[0334] The following describes the processing flow.
[0335] Step 1:
[0336] The user scans the paper application form with a scanner and imports the digital image data into the terminal. The user then uploads this image data to the server using a dedicated interface.
[0337] Step 2:
[0338] The server receives the uploaded image data. After receiving the data, the server activates its OCR engine and analyzes the text information within the image, converting it into text data. At this stage, handwritten and printed characters are recognized.
[0339] Step 3:
[0340] The server processes the text data using machine learning algorithms and extracts the necessary information. Data points required by the system, such as customer name, address, and phone number, are identified.
[0341] Step 4:
[0342] The server formats the extracted information into the system's format. The format is converted to ensure the data conforms to registration requirements and maintains a consistent data structure.
[0343] Step 5:
[0344] The server validates the data. It checks the formatted information and verifies the accuracy and conformity of the fields. If any discrepancies are found, it records the details.
[0345] Step 6:
[0346] If the server detects a flaw during validation, it activates the emotion engine to recognize the user's emotions. It understands the user's emotional state through voice input and facial expression analysis, and adjusts the content and tone of notifications accordingly.
[0347] Step 7:
[0348] The server notifies the user of the problem. Depending on the user's feelings, it sends a message containing appropriate guidance and support information, and suggests the next steps to resolve the issue.
[0349] Step 8:
[0350] The user corrects the data based on the notification. The corrected data is sent back to the server, and reprocessing is initiated automatically.
[0351] Step 9:
[0352] The server automatically registers the data that has been finally validated into the system. The data is then stored in the database via the registration API, completing the process.
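The tone adjustment in Steps 6 and 7 can be illustrated with a small message builder. The emotion labels, the wording of each opening line, and the scan hint are all assumptions made for this sketch; the disclosure states only that the content and tone of the notification are adjusted to the estimated emotion.

```python
# Hypothetical mapping: emotion label -> (opening line, whether to add a hint).
TONE_TEMPLATES = {
    "irritated": ("We apologize for the inconvenience.", True),
    "confused": ("Let's go through this one step at a time.", True),
    "neutral": ("Some items need your attention.", False),
}

# Hypothetical wording for the extra guidance.
SCAN_HINT = "Hint: rescan the form on a flat surface with even lighting, then upload it again."


def build_notification(problems, emotion="neutral"):
    """Compose a notification whose tone depends on the estimated emotion."""
    # Unknown labels fall back to the neutral template.
    opening, add_hint = TONE_TEMPLATES.get(emotion, TONE_TEMPLATES["neutral"])
    lines = [opening] + [f"- {problem}" for problem in problems]
    if add_hint:
        lines.append(SCAN_HINT)
    return "\n".join(lines)


message = build_notification(["The address field is unreadable."], "irritated")
```

The list of problems is the same for every user; only the opening line and the optional hint change with the estimated emotion.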
[0353] (Example 2)
[0354] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0355] Conventional application form information registration systems were prone to errors when digitizing paper application forms and registering them in the system, particularly because of limited accuracy in handwritten character recognition and in the extraction of necessary information. Furthermore, notifications to users when deficiencies occurred were uniform, lacking flexible responses that took into account the user's emotional state.
[0356] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0357] In this invention, the server includes means for acquiring image information, means for performing character recognition processing, and means for extracting necessary data. This enables highly accurate character recognition and data extraction, as well as detailed notifications to the user based on their emotional state.
[0358] "Image information" refers to visual data digitized from paper application forms using a scanner.
[0359] "Optical character recognition (OCR) processing" is the process of extracting text information from image information using OCR technology.
[0360] "Text information" refers to digital character data obtained through character recognition processing.
[0361] "Required data" refers to the specific information items that the system needs to process.
[0362] "Formatting" means converting the extracted necessary data into the format required by the system.
[0363] "Verifying validity" is the process of checking whether formatted data is accurate and complete.
[0364] An "information management system" refers to a database or file system in which digital data is stored and managed.
[0365] "Emotional state" refers to a situation that demonstrates a user's psychological and emotional response.
[0366] "Adjusting notification content" means tailoring the information provided to users to their emotional state, using appropriate content and tone.
[0367] A description of embodiments for carrying out this invention will be given.
[0368] This invention aims to enable users to digitize paper application forms and effectively register them in a system. First, the user captures the paper application form as a digital image on a terminal using a dedicated scanner. A high-precision, general-purpose scanner is used.
[0369] The user then uploads the image data to the server using their device. An application on the device verifies that the file format is correct and performs any necessary conversions. The software used in this process includes common format conversion tools.
[0370] The server performs character recognition processing on the received image data using OCR software. This process utilizes an industry-standard OCR engine, converting handwritten and printed character information into text data. The server also automatically performs image noise reduction and character skew correction.
[0371] The server then uses machine learning algorithms and natural language processing techniques to extract the necessary data from the text data. The AI model used includes the latest natural language processing frameworks, specifically designed to accurately recognize information such as addresses and names.
[0372] Next, the server formats the extracted data into a specified format and then verifies the validity of the formatted data. Once verification is complete, data without errors is automatically registered in the information management system.
[0373] Finally, the server is equipped with an emotion engine that analyzes image and audio data to estimate the user's emotional state. Based on this emotion estimation, notifications to the user are tailored, and the system is designed to improve the user experience, especially when problems are detected.
[0374] For example, if a user scans an application form and uploads it at an angle, the server automatically corrects the skew to ensure accurate data extraction. Furthermore, if a user shows signs of difficulty during the upload process, the emotion engine detects this and the server provides guidance in a gentle tone.
[0375] A concrete example of a prompt message is, "Generate a notification message for when a user has scanned an application form incorrectly, in a way that is helpful even if the user is frustrated."
[0376] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0377] Step 1:
[0378] The user scans a paper application form using a dedicated scanner. The input is the paper application form, and the output is a high-resolution digital image. The scanner converts the scanned information into image data and saves it to the device. It is especially important to check the paper's position and resolution to ensure a clear scan.
[0379] Step 2:
[0380] The user uses a terminal to upload scanned image data to the server. The input is a digital image file, and the output is the image data stored on the server. The terminal checks the extension and size of the input file and converts it to the optimal format as needed. JPEG or PNG formats are common.
[0381] Step 3:
[0382] The server performs OCR processing on the received image data. The input is a scanned image, and the output is recognized text data. The server runs OCR software to convert handwritten or printed characters in the image into digital strings. This process automatically performs noise reduction and skew correction, achieving highly accurate character recognition.
[0383] Step 4:
[0384] The server extracts necessary data from recognized text data. The input is OCR-processed text data, and the output is specific data elements that have been extracted. The server uses machine learning algorithms and natural language processing techniques to accurately identify addresses, names, dates, etc., from the text. A generative AI model is used for analyzing the data elements.
[0385] Step 5:
[0386] The server formats the extracted data into a specified format and verifies its validity. The input consists of extracted data elements, and the output is the formatted data and its verification results. The server formats the data to conform to a defined database format. The validity of each data element is checked using a cross-check function, and if there are no problems, the process moves to the next stage.
[0387] Step 6:
[0388] The server analyzes image and audio data to estimate the user's emotional state. The input is the user's image and audio data, and the output is the estimated emotional state. The server then drives an emotion engine to adjust notification content based on the recognized emotion. If the user's emotion is irritation, the notification content is set to be more friendly and include more specific instructions.
[0389] Step 7:
[0390] The server registers formatted and validated data into the information management system. The input is the validated data, and the output is the data registered in the database. Once the registration process is complete, the data becomes officially usable within the system. Security protocols are applied to data storage during this process.
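The terminal-side check in Step 2, which inspects the extension and size of the input file before upload, might look like the following sketch. The list of convertible extensions, the size limit, and the three return values are hypothetical; the disclosure says only that JPEG and PNG are common and that conversion is performed as needed.

```python
import os

ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}     # JPEG or PNG, as in Step 2
CONVERTIBLE_EXTENSIONS = {".bmp", ".tif", ".tiff"}  # hypothetical list
MAX_SIZE_BYTES = 10 * 1024 * 1024                   # hypothetical 10 MiB limit


def check_upload(filename, size_bytes):
    """Return 'upload', 'convert', or 'reject' for a scanned file."""
    extension = os.path.splitext(filename)[1].lower()
    if size_bytes <= 0 or size_bytes > MAX_SIZE_BYTES:
        return "reject"
    if extension in ACCEPTED_EXTENSIONS:
        return "upload"   # already in a common format
    if extension in CONVERTIBLE_EXTENSIONS:
        return "convert"  # convert to JPEG or PNG before uploading
    return "reject"


decision = check_upload("form.PNG", 2048)
```

Lower-casing the extension lets files such as `form.PNG` pass the same check as `form.png`.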
[0391] (Application Example 2)
[0392] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0393] Conventional data registration systems had problems such as users not receiving appropriate feedback when scanning and digitizing application forms, causing stress and delays in the registration process. Furthermore, notifications that did not take the user's emotional state into account made the interface inefficient, which was also a challenge.
[0394] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0395] In this invention, the server includes a device for acquiring image information, a device for performing encoded data recognition processing, and a device for estimating emotions and adjusting notification content. This makes it possible to provide appropriate feedback according to the user's emotional state and to proceed with the registration process efficiently.
[0396] "Image information" refers to data stored in a format that can be perceived visually, and is acquired by devices such as scanners and digital cameras.
[0397] "Encoded data recognition processing" is the process of converting characters and symbols contained in image information into a digital format, making them machine-readable data.
[0398] "Format data" refers to a collection of information generated by encoded data recognition processing, and is digital data that has a specific format and structure.
[0399] A "control device" is a system or platform used for inputting, processing, and outputting data.
[0400] "Processing" refers to the process of changing the format and structure of data according to the purpose, and adapting it to the format required by the control device.
[0401] "Appropriateness" is a concept that indicates whether data or information meets specific standards or conditions, and it serves as a criterion for determining whether or not to register it.
[0402] A "user" is an entity that operates a system or manages information through its interface, and is usually a person.
[0403] "Emotions" represent the user's psychological state and are internal reactions estimated through facial expressions and vocal characteristics.
[0404] "Feedback" refers to information or responses that a system outputs to a user, and is usually provided in real time.
[0405] The system for realizing this application is configured as follows: First, the terminal uses a dedicated device to acquire image information such as application forms. The hardware used includes scanners and smartphone cameras. The acquired image information is sent to a cloud server.
[0406] The server uses OCR (Optical Character Recognition) software to perform encoded data recognition processing on the received image information. This process generates format data and extracts the relevant information. To extract the necessary character information from the image as digital data, commonly used OCR processing services such as "Google Cloud Vision API" and "Tesseract OCR" are utilized.
[0407] Next, the server applies machine learning algorithms to extract the necessary information and process the data to conform to the control device's format. These algorithms include natural language processing techniques to formalize the information, and the formatted data is then validated for appropriateness. This verifies consistency, and data without issues is automatically registered in the system.
[0408] Simultaneously, the server integrates facial recognition technology (e.g., the OpenCV library) and a voice analysis API (e.g., IBM Watson) to operate a device that estimates emotions from the user's facial expressions and voice. This allows notifications and feedback to be tailored to the user's psychological state, providing appropriate guidance. For example, if the user indicates discomfort, the server will select feedback that uses gentle language to help resolve the problem.
[0409] In this way, users can operate smoothly and efficiently register information. An example of a prompt message for the generative AI model is: "Develop an algorithm that analyzes the user's current emotional state and provides optimal feedback. The five main emotions are joy, surprise, anger, sadness, and fear."
[0410] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0411] Step 1:
[0412] The terminal acquires image information, such as application forms, via a scanner used by the user or a camera on a smart device. The input is a physical application form, and the output is scanned image data. This image data is sent to a cloud server for subsequent digital processing.
[0413] Step 2:
[0414] The server performs OCR (Optical Character Recognition) processing on the received image data using OCR software. The input is image data, and the output is text data. OCR processing analyzes the characters within the image and converts them into digital character format. General-purpose OCR tools are used for this purpose.
[0415] Step 3:
[0416] The server applies machine learning algorithms to text data, extracts necessary information, and formats it into the control device's specified format. Input is text data generated by OCR, and output is formatted data. This process utilizes natural language processing techniques to analyze the text and identify important sections.
[0417] Step 4:
[0418] The server validates the formatted data for appropriateness. The input is the formatted data, and the output is either validated data for registration or an error message. Validation checks if the data meets the specified conditions; if there are no problems, the data is sent for registration.
[0419] Step 5:
[0420] If inappropriate data is detected, the server sends an error notification to the user and provides feedback. The input is the error message, and the output is the notification content to the user. The notification content includes instructions and advice on how to correct the operation.
[0421] Step 6:
[0422] Simultaneously, the server uses facial recognition technology and voice analysis to estimate the user's emotions. Input is real-time facial images and voice data, while output is an evaluation of the emotional state. Based on this, notifications and feedback are further refined according to the user's psychological state.
[0423] Step 7:
[0424] Users receive appropriate and reassuring guidance, enabling them to improve their operations. Input is feedback from the system, and output is the result of the user's new actions. This allows for the smooth completion of the registration process.
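Step 6 combines facial expression and voice analysis into one estimate of the emotional state. One simple way to illustrate this is a weighted average over the five emotions named in the prompt example (joy, surprise, anger, sadness, fear). The averaging rule, the `face_weight` parameter, and the feedback styles below are assumptions for this sketch, not the method of the disclosure; the score dictionaries stand in for the outputs of the facial recognition and voice analysis components.

```python
EMOTIONS = ("joy", "surprise", "anger", "sadness", "fear")

# Hypothetical feedback style for each dominant emotion.
FEEDBACK_STYLE = {
    "joy": "brief",
    "surprise": "explanatory",
    "anger": "apologetic",
    "sadness": "gentle",
    "fear": "reassuring",
}


def fuse_emotions(face_scores, voice_scores, face_weight=0.5):
    """Weighted average of two score dicts; missing emotions count as 0."""
    voice_weight = 1.0 - face_weight
    return {
        emotion: face_weight * face_scores.get(emotion, 0.0)
        + voice_weight * voice_scores.get(emotion, 0.0)
        for emotion in EMOTIONS
    }


def dominant_emotion(scores):
    """Return the highest-scoring emotion; ties go to the earlier entry in EMOTIONS."""
    return max(EMOTIONS, key=lambda emotion: scores[emotion])


fused = fuse_emotions({"anger": 0.7, "joy": 0.1}, {"anger": 0.5, "sadness": 0.4})
style = FEEDBACK_STYLE[dominant_emotion(fused)]
```

The selected style can then be used to refine the notification described in Step 5.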
[0425] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0426] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0427] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0428] [Third Embodiment]
[0429] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0430] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0431] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0432] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0433] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0434] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0435] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0436] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0437] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0438] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0439] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0440] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0441] This invention provides a system for automatically registering information from application forms into a system. First, the user scans the paper application form and captures it as digital image data on a terminal. Next, the processing begins when the image data is uploaded to a server. Upon receiving the image data, the server uses an OCR (Optical Character Recognition) engine to analyze the character information within the image and convert it into text data.
[0442] From the converted text data, the server extracts the necessary information using machine learning algorithms and natural language processing techniques. The extracted information is formatted into a specific format and transformed into a structure that meets the system's registration requirements. During this process, the server verifies the validity of the data from multiple angles to check for errors and inconsistencies. If any deficiencies are found, the server immediately notifies the user and requests correction. After the user makes the necessary corrections, the data is verified again.
[0443] Finally, validated data is automatically registered directly from the server into the system. This registration process is carried out via the system's API, ensuring speed and accuracy.
[0444] For example, when a user registers an application form containing customer and address information, the system automatically extracts the important information from the scanned image and accurately registers it in the system. Through this process, human error is expected to be reduced, and operational efficiency is expected to improve significantly.
[0445] The following describes the processing flow.
[0446] Step 1:
[0447] The user scans the application form and saves the digital image data to their device. The device then uploads the scanned image data to a server via a dedicated web portal or application.
[0448] Step 2:
[0449] The server performs OCR processing on image data received from the terminal. Using the OCR engine, it analyzes handwritten and printed characters within the image and converts them into text data.
[0450] Step 3:
[0451] The server analyzes the text data generated by OCR. Using machine learning algorithms and natural language processing techniques, it identifies important information and extracts necessary details such as customer names, addresses, and phone numbers.
[0452] Step 4:
[0453] The server formats the extracted information into the system format. It transforms the information to match the required data format and item structure, maintaining format consistency.
[0454] Step 5:
[0455] The server verifies the formatted data and checks its validity. It checks the accuracy and completeness of the data items and reformats them if necessary.
[0456] Step 6:
[0457] If the server finds any issues during verification, it will send a notification to the user. The user will then review and correct the data based on the details in the notification.
[0458] Step 7:
[0459] The user resubmits the corrected data to the server. The server reformats and validates the data, confirming that the deficiencies have been resolved.
[0460] Step 8:
[0461] The server automatically registers validated data into the system. The system's registration API is used to quickly and accurately save the data to the database.
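By way of a non-limiting illustration, the verification and correction loop of Steps 5 to 8 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the field names in `REQUIRED_FIELDS`, the rules in `validate`, and the `notify`, `request_correction`, and `register` callables are all hypothetical stand-ins for the notification channel, the user's resubmission, and the system's registration API.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical item structure required by the target system.
REQUIRED_FIELDS = ("name", "address", "phone")


def validate(record: Record) -> list[str]:
    """Return human-readable problems; an empty list means the record is valid."""
    problems = [
        f"'{field}' is missing"
        for field in REQUIRED_FIELDS
        if not record.get(field, "").strip()
    ]
    phone = record.get("phone", "")
    if phone and not phone.replace("-", "").isdigit():
        problems.append("'phone' must contain only digits and hyphens")
    return problems


def register_with_corrections(
    record: Record,
    notify: Callable[[list[str]], None],
    request_correction: Callable[[Record, list[str]], Record],
    register: Callable[[Record], None],
    max_rounds: int = 3,
) -> bool:
    """Validate, ask the user to fix problems, re-validate, then register."""
    problems = validate(record)                         # Step 5: verify
    rounds = 0
    while problems and rounds < max_rounds:
        notify(problems)                                # Step 6: notify the user
        record = request_correction(record, problems)   # Step 7: user resubmits
        problems = validate(record)                     # Step 7: verify again
        rounds += 1
    if problems:
        return False                                    # still invalid; nothing is registered
    register(record)                                    # Step 8: register validated data
    return True
```

Passing the notification and registration steps in as callables keeps the loop independent of any particular portal or API; `max_rounds` is an assumed safeguard against an endless correction cycle.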
[0462] (Example 1)
[0463] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0464] Conventional manual input and registration of digital information is time-consuming and labor-intensive, and prone to human error. Furthermore, verifying the consistency of analyzed data and detecting deficiencies is difficult, and these deficiencies could potentially impact subsequent operations. This invention aims to improve operational efficiency by enabling automated registration of digital data and early detection of deficiencies.
[0465] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0466] In this invention, the server includes means for converting an input document into digital information, means for identifying text information based on the digital information, and means for obtaining attribute data from the identified text information. This enables rapid and accurate registration of digital data.
[0467] An "input document" is an information recording medium submitted in paper or electronic format.
[0468] "Digital information" refers to information that has been converted from analog data into a digital format.
[0469] "Text information" refers to string data identified or extracted from digital information.
[0470] "Attribute data" refers to information with specific properties or characteristics extracted from the aforementioned text information.
[0471] A "data structure" is a set of formats and methods for organizing data.
[0472] "Consistency" refers to the property of data being free of contradictions with respect to specific standards or rules.
[0473] An "information processing device" is a computing device used to collect, process, store, and output data.
[0474] An "operator" is a person or entity that operates a system or device.
[0475] A "data analysis algorithm" is a computational method or set of rules used to process data and derive useful information.
[0476] "Language processing technology" refers to the techniques and methods for analyzing and understanding natural language data.
[0477] This invention is a system designed to promote paperless operations and improve the efficiency of information registration. The system consists of a series of steps for converting input documents into digital information and automatically registering the extracted information into an information processing device.
[0478] The user first converts the paper input document into a digital image using a scanner (a common hardware device, such as a document scanner). OCR (Optical Character Recognition) technology is then used to convert the information in the image into digital text; examples of such software include Tesseract OCR and Adobe PDF OCR.
[0479] Next, the user uploads the generated digital image file from the terminal to the server. The server identifies text information based on the received digital information. Data analysis algorithms and natural language processing techniques are used here; relevant software includes spaCy and Hugging Face Transformers.
[0480] The server extracts attribute data such as customer information and addresses from the identified text information and adapts it to the required data structure. It then evaluates the integrity of the formatted data to check for errors or deficiencies. If deficiencies are found, the server generates a notification and provides the user with instructions for correction.
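As a hedged illustration of how identified text might be mapped onto a data structure, the following Python sketch pulls attribute data out of OCR text into a typed record. Regular expressions are used here purely as a simple stand-in for the machine learning and natural language processing described above; the `CustomerRecord` fields and the "Label: value" layout of the form are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical data structure expected by the information processing device."""
    name: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """List the attributes that could not be extracted."""
        return [key for key, value in vars(self).items() if not value]


# Assumed "Label: value" lines; a real form would need a trained extractor.
FIELD_PATTERNS = {
    "name": re.compile(r"^[ \t]*Name[ \t]*[:：][ \t]*(.+?)[ \t]*$", re.I | re.M),
    "address": re.compile(r"^[ \t]*Address[ \t]*[:：][ \t]*(.+?)[ \t]*$", re.I | re.M),
    "phone": re.compile(r"^[ \t]*(?:Phone|Tel)[ \t]*[:：][ \t]*([0-9-]+)[ \t]*$", re.I | re.M),
}


def extract_attributes(text: str) -> CustomerRecord:
    """Fill a CustomerRecord from OCR text; unmatched fields stay None."""
    record = CustomerRecord()
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(record, field_name, match.group(1))
    return record
```

The `missing_fields` result is the kind of information the integrity check could turn into a correction notice for the user.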
[0481] As a concrete example, there is a process in which a user scans a customer registration form and uploads it to a server, thereby accurately registering the customer's name and address in the system. This invention significantly simplifies the data entry process into the information processing device and improves its accuracy.
[0482] An example of a prompt message might be: "Please scan the application form containing customer information and upload it to the server. The system will automatically identify, format, and register the data. If there are any errors, you will be notified with instructions to correct them."
[0483] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0484] Step 1:
[0485] The user scans a paper document using a scanner and saves it to the terminal as a digital image. The input is a paper application form, and the output is a digital image in PDF or JPEG format. Specifically, the user adjusts the scanner settings, setting the resolution to 300 dpi to ensure the text information is clearly legible.
[0486] Step 2:
[0487] The user uploads digital images generated using the terminal's software to the server. The input is a digital image file on the terminal, and this data is transferred to the server. The output is the receipt of the image data sent to the server. During this process, the user selects files and performs account authentication as needed.
[0488] Step 3:
[0489] The server analyzes the received digital image data using an OCR engine and converts the text information within the image into digital character data. Here, the input is digital image data, and the output is text data. Specifically, the server uses software such as Tesseract OCR to perform image analysis and character recognition.
[0490] Step 4:
[0491] The server extracts necessary attribute data from the converted text data using machine learning algorithms. The input is text data converted by OCR, and the output is attribute data such as important customer information and addresses. The server uses natural language processing tools, such as Hugging Face Transformers, to extract meaning from the text.
[0492] Step 5:
[0493] The server formats the extracted attribute data into a predetermined data structure and ensures its consistency as data for registration in the information processing device. The input is the extracted attribute data, and the output is the formatted registration data. Specifically, the server checks the data for compatibility with the database and scrutinizes it for errors.
[0494] Step 6:
[0495] The server evaluates the integrity of the formatted data and verifies that there are no deficiencies. The input is the formatted data, and the output is the verification result. Data whose integrity has been confirmed is automatically registered in the information processing device.
[0496] Step 7:
[0497] If a flaw is detected during integrity verification, the server will send a notification to the user requesting correction. The input is the data in which the flaw was detected, and the output is the notification message. This message will include specific instructions for correction.
[0498] Step 8:
[0499] After the user makes the necessary corrections, they send the data back to the server and repeat step 6. The input is the corrected data, and the server verifies its integrity again. The output is the final registered data.
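The character recognition of Step 3 in the flow above could, for instance, be wrapped as shown in the following Python sketch. It assumes the `pytesseract` wrapper, Pillow, and a local Tesseract installation with the Japanese and English language data; the `OcrEngine` type and the `image_to_text` helper are hypothetical names introduced only for illustration.

```python
from pathlib import Path
from typing import Callable

# Any callable that turns an image file into raw text can act as the OCR engine.
OcrEngine = Callable[[Path], str]


def tesseract_engine(image_path: Path) -> str:
    """Run Tesseract on one image (requires pytesseract, Pillow, and the tesseract binary)."""
    import pytesseract  # imported lazily so the module loads without it
    from PIL import Image

    with Image.open(image_path) as image:
        # "jpn+eng" assumes both language packs are installed; adjust as needed.
        return pytesseract.image_to_string(image, lang="jpn+eng")


def image_to_text(image_path: Path, engine: OcrEngine = tesseract_engine) -> str:
    """Convert a scanned image to text and tidy the raw OCR output."""
    raw = engine(image_path)
    # OCR output usually carries blank lines and stray whitespace; drop them.
    lines = [line.strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)
```

Because the engine is injected, a different OCR service can be substituted, or a stub can be used in tests, without changing the later steps.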
[0500] (Application Example 1)
[0501] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0502] Digitizing paper application forms and various documents presents challenges such as the time-consuming and labor-intensive process of manual data entry, as well as the high likelihood of human error. Furthermore, there is a demand for digitized information to be integrated into systems and used as readily available data, rather than simply being stored as electronic files. Ultimately, the goal is to enable users to easily digitize and manage data using smart devices and wearable devices.
[0503] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0504] In this invention, the server includes means for acquiring image information, means for performing character recognition processing based on the image information, and means for extracting necessary data from the character information obtained by the character recognition processing. This enables efficient digitalization and automated data registration using smart devices and wearable devices.
[0505] "Means for acquiring image information" refers to a device or method for capturing paper documents as images in order to convert them into a digital format.
[0506] "Means for character recognition processing" refers to technology that detects characters from acquired image information and converts them into digital text.
[0507] "Means for extracting necessary data" refers to processing methods for identifying and extracting specific information from recognized text data.
[0508] "Means for formatting data to conform to the format of an information processing device" refers to the process of converting extracted data into a format suitable for the system used by the user.
[0509] "Means for verifying validity" refers to checking methods to confirm that formatted data is accurate.
[0510] "Means for automatically registering data in an information processing device" refers to a method of directly inputting and saving validated data into the system without manual operation.
[0511] "Means for supporting digitalization using smart devices and wearable devices" refers to functions that utilize advanced devices such as smartphones and smart glasses to easily and quickly digitize documents.
[0512] "Means for notifying users" refers to a system that promptly communicates information to users when data inconsistencies or errors are detected.
[0513] To implement this invention, a system is required in which a smart device or wearable device (e.g., smartphone, smart glasses) and a cloud server work together. First, the user uses the camera function of the smart device to take a picture of a paper application form or document and acquire image data. Next, the acquired image data is transmitted to the cloud server via the internet.
[0514] On the server, the Tesseract OCR engine identifies characters in image data and converts them into digital text data. The generated text data is then processed using machine learning algorithms and natural language processing techniques to extract important data. The extracted data is then formatted to conform to the format of the information processing device (computer system) being used.
[0515] The formatted data is validated. If errors or inconsistencies are detected, the server notifies the user and requests correction. After validation is confirmed, the data is automatically registered in the information processing device via API. In this process, users can easily digitize documents and register data by utilizing smart devices and wearable devices.
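As one possible illustration of registration via API, the following Python sketch builds an HTTP POST request carrying the validated record as JSON. The endpoint URL, the bearer-token authentication, and the JSON payload shape are assumptions for the example; the disclosure does not specify the API of the information processing device.

```python
import json
import urllib.request


def build_registration_request(
    record: dict, endpoint: str, api_token: str
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hypothetical registration API."""
    body = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes it straightforward to inspect or log exactly what would be registered before any network call is made.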
[0516] As a concrete example, when digitizing volunteer activity registration forms, users can take a picture of the form with their smartphones, and the information is automatically registered in the system. An example of a prompt message would be, "Please extract and format the volunteer registration information written in this image. Pay particular attention to the activity details and contact information." In this way, it becomes possible to digitize and manage documents quickly and efficiently.
[0517] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0518] Step 1:
[0519] The user takes a picture of the application form with their smart device's camera. The captured image data is saved on the device. It is then prepared as an image file for the user's next processing step.
[0520] Step 2:
[0521] The device sends the acquired image data to a cloud server via the internet. The arrival of the transmitted image data at the server triggers the start of the next process.
[0522] Step 3:
[0523] The server receives image data and performs character recognition processing using the Tesseract OCR engine. Text data is extracted from the image. The input is image data, and the output is text data.
[0524] Step 4:
[0525] The server uses the generated text data to extract the necessary data by applying natural language processing techniques and machine learning algorithms. This step involves filtering and structuring the information that is important to the process.
[0526] Step 5:
[0527] The server formats the extracted data into the format of the information processing device. The input is the extracted data, and the output is the formatted data. Data processing related to formatting is performed.
[0528] Step 6:
[0529] The server validates the formatted data and sends a notification to the user if there are any problems. Data checking and error reporting are performed at this stage.
[0530] Step 7:
[0531] Once the formatted data has been validated, it is automatically registered from the server to the information processing device via API. After registration is complete, the data becomes available for use by the user in the system.
[0532] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0533] This invention combines a system for accurately and efficiently automatically registering information from application forms with an emotion engine that recognizes user emotions. First, the user scans the paper application form as a digital image file on the terminal using a dedicated scanner. Processing begins when the user uploads this image data to the server from the terminal.
[0534] When the server acquires image data, it performs OCR processing to convert handwritten or printed text information into text data. Then, it uses machine learning algorithms and natural language processing techniques to identify the necessary information and formats the extracted data according to the system's specified format. The formatted data is verified for correct format and content, and if there are no errors, it is registered directly into the system.
[0535] A distinctive feature of this invention is that the server uses an emotion engine to flexibly adjust its interaction with the user. Specifically, when notifying the user, the emotion engine estimates the user's emotional state through facial recognition and voice analysis, and provides feedback in an appropriate tone and content. For example, if the server determines that the user is irritated, it can display more polite guidance or provide additional hints for solving the problem.
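The tone adjustment described above might look like the following Python sketch. It assumes the emotion engine has already reduced its estimate to a single label; the label names, the `Notification` type, and the message wording are hypothetical and serve only to show the branching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    tone: str
    message: str


# Assumed labels that should trigger a more polite, more detailed message.
NEGATIVE_EMOTIONS = {"irritated", "angry"}


def compose_notification(issue: str, emotion: str, hint: str = "") -> Notification:
    """Choose wording for a problem notice based on the estimated emotion."""
    if emotion in NEGATIVE_EMOTIONS:
        parts = [
            "We apologize for the inconvenience.",
            f"We could not process the form because {issue}.",
        ]
        if hint:
            # Additional hints are offered only when the user seems irritated.
            parts.append(f"Hint: {hint}")
        return Notification("polite", " ".join(parts))
    return Notification("neutral", f"The form could not be processed because {issue}.")
```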
[0536] For example, if a user makes a mistake when scanning an application form, the server analyzes the image data and notifies the user if there are any unclear parts. If the emotion engine detects user dissatisfaction, the server expresses its apologies and provides guidance on how to improve scan quality. This allows the user to resolve the problem smoothly and complete data registration efficiently.
[0537] The following describes the processing flow.
[0538] Step 1:
[0539] The user scans the paper application form with a scanner and imports the digital image data into the terminal. The user then uploads this image data to the server using a dedicated interface.
[0540] Step 2:
[0541] The server receives the uploaded image data. After receiving the data, the server activates its OCR engine and analyzes the text information within the image, converting it into text data. At this stage, handwritten and printed characters are recognized.
[0542] Step 3:
[0543] The server processes the text data using machine learning algorithms and extracts the necessary information. Data points required by the system, such as customer name, address, and phone number, are identified.
[0544] Step 4:
[0545] The server formats the extracted information into the system's format. The format is converted to ensure the data conforms to registration requirements and maintains a consistent data structure.
[0546] Step 5:
[0547] The server validates the data. It checks the formatted information and verifies the accuracy and conformity of the fields. If any discrepancies are found, it records the details.
[0548] Step 6:
[0549] If the server detects a flaw during validation, it activates the emotion engine to recognize the user's emotions. It understands the user's emotional state through voice input and facial expression analysis, and adjusts the content and tone of notifications accordingly.
[0550] Step 7:
[0551] The server notifies the user of the problem. Depending on the user's feelings, it sends a message containing appropriate guidance and support information, and suggests the next steps to resolve the issue.
[0552] Step 8:
[0553] The user corrects the data based on the notification. The corrected data is sent back to the server, and reprocessing starts automatically.
[0554] Step 9:
[0555] The server automatically registers the data that has been finally validated into the system. The data is then stored in the database via the registration API, completing the process.
[0556] (Example 2)
[0557] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0558] Conventional application form information registration systems were prone to errors when digitizing paper application forms and registering them in the system, particularly in handwritten character recognition and in the extraction of necessary information. Furthermore, notifications to users when deficiencies occurred were uniform and lacked flexible responses that took the user's emotional state into account.
[0559] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0560] In this invention, the server includes means for acquiring image information, means for performing character recognition processing, and means for extracting necessary data. This enables highly accurate character recognition and data extraction, as well as detailed notifications to the user based on their emotional state.
[0561] "Image information" refers to visual data digitized from paper application forms using a scanner.
[0562] "Optical character recognition (OCR) processing" is the process of extracting text information from image information using OCR technology.
[0563] "Text information" refers to digital character data obtained through character recognition processing.
[0564] "Required data" refers to the specific information items that the system needs to process.
[0565] "Formatting" means converting the extracted necessary data into the format required by the system.
[0566] "Verifying validity" is the process of verifying whether formatted data is accurate and complete.
[0567] An "information management system" refers to a database or file system in which digital data is stored and managed.
[0568] "Emotional state" refers to a situation that demonstrates a user's psychological and emotional response.
[0569] "Adjusting notification content" means tailoring the information provided to users to their emotional state, using appropriate content and tone.
[0570] A description of embodiments for carrying out this invention will be given.
[0571] This invention aims to enable users to digitize paper application forms and effectively register them in a system. First, the user captures the paper application form as a digital image on a terminal using a dedicated scanner. A high-precision, general-purpose scanner is used.
[0572] The user then uploads the image data to the server using their device. An application on the device verifies that the file format is correct and performs any necessary conversions. The software used in this process includes common format conversion tools.
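A file format check of this kind could be sketched in Python as follows. The sketch inspects the leading bytes (file signatures) of the upload rather than trusting the extension alone; the set of accepted formats, the 10 MB size limit, and the function names are assumptions for illustration.

```python
from typing import Optional

# Well-known file signatures ("magic numbers") for the accepted formats.
MAGIC_NUMBERS = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
}
EXTENSIONS = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png", "pdf": "pdf"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical upload limit


def detect_format(data: bytes) -> Optional[str]:
    """Return the format implied by the file signature, or None if unknown."""
    for name, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


def check_upload(filename: str, data: bytes) -> list[str]:
    """Return a list of problems with the file; empty means it can be uploaded."""
    problems = []
    detected = detect_format(data)
    if detected is None:
        problems.append("unsupported file format")
    else:
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        if EXTENSIONS.get(ext) != detected:
            problems.append(f"extension '.{ext}' does not match {detected} content")
    if len(data) > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the upload size limit")
    return problems
```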
[0573] The server performs character recognition processing on the received image data using OCR software. This process utilizes an industry-standard OCR engine, converting handwritten and printed character information into text data. The server also automatically performs image noise reduction and character skew correction.
[0574] The server then uses machine learning algorithms and natural language processing techniques to extract the necessary data from the text data. The AI model used includes the latest natural language processing frameworks, specifically designed to accurately recognize information such as addresses and names.
[0575] Next, the server formats the extracted data into a specified format and then verifies the validity of the formatted data. Once verification is complete, data without errors is automatically registered in the information management system.
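As a hedged example of the formatting step, the following Python sketch normalizes extracted values before they are verified and registered. It assumes that forms may contain full-width digits, letters, and spaces, which Unicode NFKC normalization folds to their ASCII equivalents; the output keys `customer_name`, `address`, and `phone_number` are a hypothetical target schema.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Fold compatibility characters (e.g. full-width forms) and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", folded).strip()


def normalize_phone(value: str) -> str:
    """Keep digits only, after folding full-width digits to ASCII."""
    folded = unicodedata.normalize("NFKC", value)
    return re.sub(r"[^0-9]", "", folded)


def format_for_registration(extracted: dict[str, str]) -> dict[str, str]:
    """Map extracted items onto the (assumed) schema of the information management system."""
    return {
        "customer_name": normalize_text(extracted.get("name", "")),
        "address": normalize_text(extracted.get("address", "")),
        "phone_number": normalize_phone(extracted.get("phone", "")),
    }
```

Normalizing before verification means that the validity checks only need to handle one canonical representation of each item.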
[0576] Finally, the server is equipped with an emotion engine that analyzes image and audio data to estimate the user's emotional state. Based on this emotion estimation, notifications to the user are tailored, and the system is designed to improve the user experience, especially when problems are detected.
[0577] For example, if a user scans an application form at an angle and uploads it, the server automatically corrects the skew so that data can be extracted accurately. Furthermore, if a user appears to be having difficulty during the upload process, the emotion engine detects this and the server provides guidance in a gentle tone.
[0578] A concrete example of a prompt message is, "Generate a notification message for when a user has scanned an application form incorrectly, in a way that is helpful even if the user is frustrated."
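One way such a prompt message could be assembled is sketched below in Python. The prompt wording beyond the example above, the `build_notification_prompt` helper, and the `generate` callable (standing in for whatever generative AI model is used) are all hypothetical; no specific model API is implied.

```python
from typing import Callable


def build_notification_prompt(issues: list[str], emotion: str) -> str:
    """Assemble a prompt asking a generative model to draft a notification."""
    lines = [
        "Generate a notification message for a user who has scanned an "
        "application form incorrectly, in a way that is helpful even if the "
        "user is frustrated.",
        f"Estimated emotional state of the user: {emotion}.",
        "Problems detected:",
    ]
    lines += [f"- {issue}" for issue in issues]
    lines.append("Include concrete steps for rescanning.")
    return "\n".join(lines)


def generate_notification(
    issues: list[str], emotion: str, generate: Callable[[str], str]
) -> str:
    """Send the prompt to the supplied model callable and tidy its reply."""
    return generate(build_notification_prompt(issues, emotion)).strip()
```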
[0579] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0580] Step 1:
[0581] The user scans a paper application form using a dedicated scanner. The input is the paper application form, and the output is a high-resolution digital image. The scanner converts the scanned information into image data and saves it to the device. It is especially important to check the paper's position and resolution to ensure a clear scan.
[0582] Step 2:
[0583] The user uses a terminal to upload scanned image data to the server. The input is a digital image file, and the output is the image data stored on the server. The terminal checks the extension and size of the input file and converts it to the optimal format as needed. JPEG or PNG formats are common.
[0584] Step 3:
[0585] The server performs OCR processing on the received image data. The input is a scanned image, and the output is recognized text data. The server runs OCR software to convert handwritten or printed characters in the image into digital strings. This process automatically performs noise reduction and skew correction, achieving highly accurate character recognition.
[0586] Step 4:
[0587] The server extracts necessary data from recognized text data. The input is OCR-processed text data, and the output is specific data elements that have been extracted. The server uses machine learning algorithms and natural language processing techniques to accurately identify addresses, names, dates, etc., from the text. A generative AI model is used for analyzing the data elements.
[0588] Step 5:
[0589] The server formats the extracted data into a specified format and verifies its validity. The input consists of extracted data elements, and the output is the formatted data and its verification results. The server formats the data to conform to a defined database format. The validity of each data element is checked using a cross-check function, and if there are no problems, the process moves to the next stage.
[0590] Step 6:
[0591] The server analyzes image and audio data to estimate the user's emotional state. The input is the user's image and audio data, and the output is the estimated emotional state. The server then drives an emotion engine to adjust notification content based on the recognized emotion. If the user's emotion is irritation, the notification content is set to be more friendly and include more specific instructions.
[0592] Step 7:
[0593] The server registers formatted and validated data into the information management system. Validated data is taken as input, and registered data is obtained from the database as output. Once the registration process is complete, the data becomes officially usable within the system. Security protocols are applied to data storage during this process.
[0594] (Application Example 2)
[0595] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0596] Conventional data registration systems had problems such as users not receiving appropriate feedback when scanning and digitizing application forms, which caused stress and delays in the registration process. Furthermore, notifications that did not consider the user's emotional state made the interface inefficient, which was also a challenge.
[0597] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0598] In this invention, the server includes a device for acquiring image information, a device for performing encoded data recognition processing, and a device for estimating emotions and adjusting notification content. This makes it possible to provide appropriate feedback according to the user's emotional state and to proceed with the registration process efficiently.
[0599] "Image information" refers to data stored in a format that can be perceived visually, and is acquired by devices such as scanners and digital cameras.
[0600] "Encoded data recognition processing" is the process of converting characters and symbols contained in image information into a digital format, making them machine-readable data.
[0601] "Format data" refers to a collection of information generated by encoded data recognition processing, and is digital data that has a specific format and structure.
[0602] A "control device" is a system or platform used for inputting, processing, and outputting data.
[0603] "Processing" refers to the process of changing the format and structure of data according to the purpose, and adapting it to the format required by the control device.
[0604] "Appropriateness" is a concept that indicates whether data or information meets specific standards or conditions, and it serves as a criterion for determining whether or not to register it.
[0605] A "user" is an entity that operates a system or manages information through its interface, and is usually a person.
[0606] "Emotions" represent the user's psychological state and are internal reactions estimated through facial expressions and vocal characteristics.
[0607] "Feedback" refers to information or responses that a system outputs to a user, and is usually provided in real time.
[0608] The system for realizing this application is configured as follows: First, the terminal uses a dedicated device to acquire image information such as application forms. The hardware used includes scanners and smartphone cameras. The acquired image information is sent to a cloud server.
[0609] The server uses OCR (Optical Character Recognition) software to perform encoded data recognition processing on the received image information. This process generates format data and extracts the relevant information. To extract the necessary character information from the image as digital data, commonly used OCR services and tools such as "Google Cloud Vision API" and "Tesseract OCR" are utilized.
[0610] Next, the server applies machine learning algorithms to extract the necessary information and process the data to conform to the control device's format. These algorithms include natural language processing techniques to formalize the information, and the formatted data is then validated for appropriateness. This verifies consistency, and data without issues is automatically registered in the system.
[0611] Simultaneously, the server integrates facial recognition technology (e.g., the OpenCV library) and a voice analysis API (e.g., IBM Watson) to operate a device that estimates emotions from the user's facial expressions and voice. This allows notifications and feedback to be tailored to the user's psychological state, providing appropriate guidance. For example, if the user indicates discomfort, the server will select feedback that uses gentle language to help resolve the problem.
[0612] In this way, users can operate smoothly and efficiently register information. An example of a prompt message for the generative AI model is: "Develop an algorithm that analyzes the user's current emotional state and provides optimal feedback. The five main emotions are joy, surprise, anger, sadness, and fear."
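As a non-limiting sketch of how estimates from facial expressions and voice might be combined, the following Python example performs a simple weighted average over the five emotions named above. It assumes that upstream face and voice analyzers each return a score per emotion; the equal default weighting and the function names are assumptions, not part of the disclosure.

```python
EMOTIONS = ("joy", "surprise", "anger", "sadness", "fear")


def fuse_emotion_scores(
    face: dict[str, float], voice: dict[str, float], face_weight: float = 0.5
) -> dict[str, float]:
    """Blend per-emotion scores from face and voice into one normalized distribution."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    fused = {
        emotion: face_weight * face.get(emotion, 0.0)
        + (1.0 - face_weight) * voice.get(emotion, 0.0)
        for emotion in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0:
        # No evidence from either source: fall back to a uniform distribution.
        return {emotion: 1.0 / len(EMOTIONS) for emotion in EMOTIONS}
    return {emotion: score / total for emotion, score in fused.items()}


def dominant_emotion(scores: dict[str, float]) -> str:
    """Return the emotion with the highest fused score."""
    return max(EMOTIONS, key=lambda emotion: scores[emotion])
```

The dominant emotion can then be used to select the wording of notifications and feedback.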
[0613] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0614] Step 1:
[0615] The terminal acquires image information, such as application forms, via a scanner used by the user or a camera on a smart device. The input is a physical application form, and the output is scanned image data. This image data is sent to a cloud server for subsequent digital processing.
[0616] Step 2:
[0617] The server performs OCR (Optical Character Recognition) processing on the received image data using OCR software. The input is image data, and the output is text data. OCR processing analyzes the characters within the image and converts them into digital character format. General-purpose OCR tools are used for this purpose.
[0618] Step 3:
[0619] The server applies machine learning algorithms to text data, extracts necessary information, and formats it into the control device's specified format. Input is text data generated by OCR, and output is formatted data. This process utilizes natural language processing techniques to analyze the text and identify important sections.
[0620] Step 4:
[0621] The server validates the formatted data for appropriateness. The input is the formatted data, and the output is either validated data for registration or an error message. Validation checks if the data meets the specified conditions; if there are no problems, the data is sent for registration.
[0622] Step 5:
[0623] If inappropriate data is detected, the server sends an error notification to the user and provides feedback. The input is the error message, and the output is the notification content to the user. The notification content includes instructions and advice on how to correct the operation.
[0624] Step 6:
[0625] Simultaneously, the server uses facial recognition technology and voice analysis to estimate the user's emotions. Input is real-time facial images and voice data, while output is an evaluation of the emotional state. Based on this, notifications and feedback are further refined according to the user's psychological state.
[0626] Step 7:
[0627] Users receive appropriate and reassuring guidance, enabling them to improve their operations. Input is feedback from the system, and output is the result of the user's new actions. This allows for the smooth completion of the registration process.
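As a rough illustration of Steps 5 to 7, the following Python sketch selects a feedback tone from an estimated emotional state. The five emotion labels come from the prompt example above; the score dictionary, the `TONE_PREFIX` wording table, and the function names are assumptions made for illustration only, since the disclosure does not specify how the emotion estimate is represented or how the feedback wording is chosen.

```python
from __future__ import annotations

# Hypothetical sketch: choose a feedback tone from estimated emotion scores.
# The labels follow the prompt example in the text; the score format and
# the wording table below are assumptions, not part of the disclosure.
EMOTIONS = ("joy", "surprise", "anger", "sadness", "fear")

# Assumed opening phrase for each dominant emotion.
TONE_PREFIX = {
    "joy": "Thank you. ",
    "surprise": "There is no need to worry. ",
    "anger": "We apologize for the inconvenience. ",
    "sadness": "We are here to help. ",
    "fear": "Your data has not been lost. ",
}


def dominant_emotion(scores: dict[str, float]) -> str:
    """Return the label with the highest score; missing labels count as 0."""
    return max(EMOTIONS, key=lambda label: scores.get(label, 0.0))


def build_feedback(error_message: str, scores: dict[str, float]) -> str:
    """Prefix the correction instructions with a tone matched to the user."""
    return TONE_PREFIX[dominant_emotion(scores)] + error_message


if __name__ == "__main__":
    print(build_feedback(
        "The phone number is missing. Please rescan the form.",
        {"anger": 0.7, "sadness": 0.2},
    ))
```

A lookup table is used here only to keep the sketch short; in the described system the wording could instead be produced by the generative AI model.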
[0628] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0629] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0630] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0631] [Fourth Embodiment]
[0632] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0633] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0634] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0635] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0636] The microphone 238 receives instructions from the user 20 in the form of speech. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0637] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0638] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0639] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0640] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0641] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0642] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0643] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0644] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0645] This invention provides a system for automatically registering information from application forms into a system. First, the user scans the paper application form and captures it as digital image data on a terminal. Next, the processing begins when the image data is uploaded to a server. Upon receiving the image data, the server uses an OCR (Optical Character Recognition) engine to analyze the character information within the image and convert it into text data.
[0646] From the converted text data, the server extracts the necessary information using machine learning algorithms and natural language processing techniques. The extracted information is formatted into a specific format and transformed into a structure that meets the system's registration requirements. During this process, the server verifies the validity of the data from multiple angles to check for errors and inconsistencies. If any deficiencies are found, the server immediately notifies the user and requests correction. After the user makes the necessary corrections, the data is verified again.
[0647] Finally, validated data is automatically registered directly from the server into the system. This registration process is carried out via the system's API, ensuring speed and accuracy.
[0648] For example, when a user registers an application form containing customer and address information, the server automatically extracts the important information from the scanned image and registers it accurately in the system. This process is expected to reduce human error and significantly improve operational efficiency.
[0649] The following describes the processing flow.
[0650] Step 1:
[0651] The user scans the application form and saves the digital image data to their device. The device then uploads the scanned image data to a server via a dedicated web portal or application.
[0652] Step 2:
[0653] The server performs OCR processing on image data received from the terminal. Using the OCR engine, it analyzes handwritten and printed characters within the image and converts them into text data.
[0654] Step 3:
[0655] The server analyzes the text data generated by OCR. Using machine learning algorithms and natural language processing techniques, it identifies important information and extracts necessary details such as customer names, addresses, and phone numbers.
[0656] Step 4:
[0657] The server formats the extracted information into the system format. It transforms the information to match the required data format and item structure, maintaining format consistency.
[0658] Step 5:
[0659] The server verifies the formatted data and checks its validity. It checks the accuracy and completeness of the data items and reformats them if necessary.
[0660] Step 6:
[0661] If the server finds any issues during verification, it will send a notification to the user. The user will then review and correct the data based on the details in the notification.
[0662] Step 7:
[0663] The user resubmits the corrected data to the server. The server reformats and validates the data, confirming that the deficiencies have been resolved.
[0664] Step 8:
[0665] The server automatically registers validated data into the system. The system's registration API is used to quickly and accurately save the data to the database.
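The verification, notification, and resubmission cycle in Steps 5 to 8 can be sketched as a bounded loop. This is a minimal sketch under stated assumptions: the required item list, the `ask_user` and `register` callbacks, and the `max_rounds` limit are hypothetical names introduced here, and the disclosure does not state how many correction rounds are allowed.

```python
from typing import Callable, Dict, List

# Assumed item structure required by the registration target.
REQUIRED_FIELDS = ("name", "address", "phone")


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    return [f"missing field: {field}"
            for field in REQUIRED_FIELDS if not record.get(field)]


def register_with_corrections(
    record: Dict[str, str],
    ask_user: Callable[[Dict[str, str], List[str]], Dict[str, str]],
    register: Callable[[Dict[str, str]], None],
    max_rounds: int = 3,
) -> bool:
    """Validate the record, request corrections on failure, register on success.

    ask_user stands in for Steps 6 and 7 (notify the user, receive corrected
    data); register stands in for the registration API call in Step 8.
    Returns True if the record was registered, False if it was still invalid
    after max_rounds corrections.
    """
    for round_no in range(max_rounds + 1):
        problems = validate(record)              # Step 5 (and re-check in Step 7)
        if not problems:
            register(record)                     # Step 8
            return True
        if round_no == max_rounds:
            break                                # give up; nothing is registered
        record = ask_user(record, problems)      # Steps 6 and 7
    return False
```

For example, passing `store.append` as `register` and a function that fills in the missing phone number as `ask_user` registers the record on the second validation pass.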
[0666] (Example 1)
[0667] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0668] Conventional manual input and registration of digital information is time-consuming and labor-intensive, and prone to human error. Furthermore, verifying the consistency of analyzed data and detecting deficiencies is difficult, and these deficiencies could potentially impact subsequent operations. This invention aims to improve operational efficiency by enabling automated registration of digital data and early detection of deficiencies.
[0669] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0670] In this invention, the server includes means for converting an input document into digital information, means for identifying text information based on the digital information, and means for obtaining attribute data from the identified text information. This enables rapid and accurate registration of digital data.
[0671] An "input document" is an information recording medium submitted in paper or electronic format.
[0672] "Digital information" refers to information that has been converted from analog data into a digital format.
[0673] "Text information" refers to string data identified or extracted from digital information.
[0674] "Attribute data" refers to information with specific properties or characteristics extracted from the aforementioned text information.
[0675] A "data structure" is a set of formats and methods for organizing data.
[0676] "Consistency" refers to the property that data conforms to specific standards or rules without contradiction.
[0677] An "information processing device" is a computing device used to collect, process, store, and output data.
[0678] An "operator" is a person or entity that operates a system or device.
[0679] A "data analysis algorithm" is a computational method or set of rules used to process data and derive useful information.
[0680] "Language processing technology" refers to the techniques and methods for analyzing and understanding natural language data.
[0681] This invention is a system designed to promote paperless operations and improve the efficiency of information registration. The system consists of a series of steps for converting input documents into digital information and automatically registering the extracted information into an information processing device.
[0682] The user first converts the paper input document into a digital image format using a scanner (a common hardware device, such as a document scanner). This step utilizes OCR (Optical Character Recognition) technology to convert the information into digital format. Examples of such software include Tesseract OCR and Adobe PDF OCR.
[0683] Next, the user uploads the digital image file from the terminal to the server. The server identifies text information based on the received digital information. Data analysis algorithms and natural language processing techniques are used here; examples of applicable software include spaCy and Hugging Face Transformers.
[0684] The server extracts attribute data such as customer information and addresses from the identified text information and adapts it to the required data structure. It then evaluates the integrity of the formatted data to check for errors or deficiencies. If deficiencies are found, the server generates a notification and provides the user with instructions for correction.
[0685] As a concrete example, there is a process in which a user scans a customer registration form and uploads it to a server, thereby accurately registering the customer's name and address in the system. This invention significantly simplifies the data entry process into the information processing device and improves its accuracy.
[0686] An example of a prompt message might be: "Please scan the application form containing customer information and upload it to the server. The system will automatically identify, format, and register the data. If there are any errors, you will be notified with instructions to correct them."
[0687] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0688] Step 1:
[0689] The user scans a paper document using a scanner and saves it to the terminal as a digital image. The input is a paper application form, and the output is a digital image in PDF or JPEG format. Specifically, the user adjusts the scanner settings, setting the resolution to 300 dpi to ensure the text information is clearly legible.
[0690] Step 2:
[0691] The user uploads digital images generated using the terminal's software to the server. The input is a digital image file on the terminal, and this data is transferred to the server. The output is the receipt of the image data sent to the server. During this process, the user selects files and performs account authentication as needed.
[0692] Step 3:
[0693] The server analyzes the received digital image data using an OCR engine and converts the text information within the image into digital character data. Here, the input is digital image data, and the output is text data. Specifically, the server uses software such as Tesseract OCR to perform image analysis and character recognition.
[0694] Step 4:
[0695] The server extracts necessary attribute data from the converted text data using machine learning algorithms. The input is text data converted by OCR, and the output is attribute data such as important customer information and addresses. The server uses natural language processing techniques, such as Hugging Face Transformers, to extract meaning from the text.
[0696] Step 5:
[0697] The server extracts attribute data, formats it into a predetermined data structure, and ensures consistency as data for registration in the information processing device. The input is the extracted attribute data, and the output is the formatted registration data. Specifically, it checks for compatibility with the database and scrutinizes it for errors.
[0698] Step 6:
[0699] The server evaluates the integrity of the formatted data and verifies that there are no deficiencies. The input is the formatted data, and the output is the verification result. Data whose integrity has been confirmed is automatically registered in the information processing device.
[0700] Step 7:
[0701] If a flaw is detected during integrity verification, the server will send a notification to the user requesting correction. The input is the data in which the flaw was detected, and the output is the notification message. This message will include specific instructions for correction.
[0702] Step 8:
[0703] After the user makes the necessary corrections, they send the data back to the server and repeat step 6. The input is the corrected data, and the server verifies its integrity again. The output is the final registered data.
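To make Steps 4 to 7 more concrete, the following Python sketch obtains attribute data from OCR text and reports deficiencies. Note the substitution: the text describes machine learning and natural language processing for extraction, whereas this sketch uses a simple regular expression as a stand-in. The "Label: value" form layout, the `CustomerRecord` structure, and the function names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerRecord:
    """Assumed target data structure for registration."""
    name: Optional[str]
    address: Optional[str]


# Assumed layout: one "Label: value" pair per line of OCR output.
# A regular expression stands in for the ML/NLP extraction in the text.
FIELD_PATTERN = re.compile(
    r"^[ \t]*(name|address)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_attributes(ocr_text: str) -> CustomerRecord:
    """Pick labeled values out of OCR text and fit them to the record."""
    found = {label.lower(): value
             for label, value in FIELD_PATTERN.findall(ocr_text)}
    return CustomerRecord(name=found.get("name"), address=found.get("address"))


def find_deficiencies(record: CustomerRecord) -> List[str]:
    """Return the names of attributes that could not be extracted."""
    return [field for field in ("name", "address")
            if getattr(record, field) is None]


if __name__ == "__main__":
    text = "Application Form\nName: Taro Yamada\nAddress: 1-2-3 Shibuya, Tokyo\n"
    record = extract_attributes(text)
    print(record, find_deficiencies(record))
```

A non-empty result from `find_deficiencies` corresponds to the case in Step 7 where the server sends a correction request instead of registering the data.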
[0704] (Application Example 1)
[0705] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0706] Digitizing paper application forms and various documents presents challenges such as the time-consuming and labor-intensive process of manual data entry, as well as the high likelihood of human error. Furthermore, there is a demand for digitized information to be integrated into systems and used as readily available data, rather than simply being stored as electronic files. Ultimately, the goal is to enable users to easily digitize and manage data using smart devices and wearable devices.
[0707] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0708] In this invention, the server includes means for acquiring image information, means for performing character recognition processing based on the image information, and means for extracting necessary data from the character information obtained by the character recognition processing. This enables efficient digitalization and automated data registration using smart devices and wearable devices.
[0709] "Means for acquiring image information" refers to a device or method for capturing paper documents as images in order to convert them into a digital format.
[0710] "Means for character recognition processing" refers to technology that detects characters from acquired image information and converts them into digital text.
[0711] "Means for extracting necessary data" refers to processing methods for identifying and extracting specific information from recognized text data.
[0712] "Means for formatting data to conform to the format of an information processing device" refers to the process of converting extracted data into a format suitable for the system used by the user.
[0713] "Means for verifying validity" refers to checking methods to confirm that formatted data is accurate.
[0714] "Means for automatically registering data in an information processing device" refers to a method of directly inputting and saving validated data into the system without manual operation.
[0715] "Means for supporting digitalization using smart devices and wearable devices" refers to functions that utilize devices such as smartphones and smart glasses to digitize documents easily and quickly.
[0716] "Means for notifying users" refers to a mechanism that promptly communicates information to users when data inconsistencies or errors are detected.
[0717] To implement this invention, a system is required in which a smart device or wearable device (e.g., smartphone, smart glasses) and a cloud server work together. First, the user uses the camera function of the smart device to take a picture of a paper application form or document and acquire image data. Next, the acquired image data is transmitted to the cloud server via the internet.
[0718] On the server, the Tesseract OCR engine identifies characters in image data and converts them into digital text data. The generated text data is then processed using machine learning algorithms and natural language processing techniques to extract important data. The extracted data is then formatted to conform to the format of the information processing device (computer system) being used.
[0719] The formatted data is validated. If errors or inconsistencies are detected, the server notifies the user and requests correction. After validation is confirmed, the data is automatically registered in the information processing device via API. In this process, users can easily digitize documents and register data by utilizing smart devices and wearable devices.
[0720] As a concrete example, when digitizing volunteer activity registration forms, users can take a picture of the form with their smartphones, and the information is automatically registered in the system. An example of a prompt message would be, "Please extract and format the volunteer registration information written in this image. Pay particular attention to the activity details and contact information." In this way, it becomes possible to digitize and manage documents quickly and efficiently.
[0721] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0722] Step 1:
[0723] The user takes a picture of the application form with their smart device's camera. The captured image data is saved on the device as an image file, ready for the next processing step.
[0724] Step 2:
[0725] The device sends the acquired image data to a cloud server via the internet. The arrival of the transmitted image data at the server triggers the start of the next process.
[0726] Step 3:
[0727] The server receives image data and performs character recognition processing using the Tesseract OCR engine. Text data is extracted from the image. The input is image data, and the output is text data.
[0728] Step 4:
[0729] The server uses the generated text data to extract the necessary data by applying natural language processing techniques and machine learning algorithms. This step involves filtering and structuring the information that is important to the process.
[0730] Step 5:
[0731] The server formats the extracted data into the format of the information processing device. The input is the extracted data, and the output is the formatted data. Data processing related to formatting is performed.
[0732] Step 6:
[0733] The server validates the formatted data and sends a notification to the user if there are any problems. Data checking and error reporting are performed at this stage.
[0734] Step 7:
[0735] Once the formatted data has been validated, it is automatically registered from the server to the information processing device via API. After registration is complete, the data becomes available for use by the user in the system.
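The API registration in Step 7 can be pictured as an HTTP request carrying the validated data. The following Python sketch only builds such a request and does not send it. The JSON payload shape, the POST method, and the endpoint URL in the usage comment are assumptions, since the disclosure does not describe the API of the information processing device.

```python
import json
from urllib.request import Request


def build_registration_request(record: dict, endpoint: str) -> Request:
    """Build (but do not send) a POST request carrying the validated record.

    The endpoint and the JSON body format are hypothetical; a real
    information processing device would define its own API contract.
    """
    body = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


if __name__ == "__main__":
    # Hypothetical endpoint used purely for illustration.
    req = build_registration_request(
        {"name": "Taro Yamada", "activity": "beach cleanup"},
        "https://example.invalid/api/registrations",
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending the request would be a separate step, for example with `urllib.request.urlopen(req)`, and would normally also involve authentication and error handling that are omitted here.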
[0736] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0737] This invention combines a system for accurately and efficiently automatically registering information from application forms with an emotion engine that recognizes user emotions. First, the user scans the paper application form as a digital image file on the terminal using a dedicated scanner. Processing begins when the user uploads this image data to the server from the terminal.
[0738] When the server acquires image data, it performs OCR processing to convert handwritten or printed text information into text data. Then, it uses machine learning algorithms and natural language processing techniques to identify the necessary information and format the extracted data according to the system's specified format. The formatted data is verified to ensure it is in the correct format and content, and if there are no errors, it is registered directly into the system.
[0739] A distinctive feature of this invention is that the server uses an emotion engine to flexibly adjust its interaction with the user. Specifically, when notifying the user, the emotion engine estimates the user's emotional state through facial recognition and voice analysis, and provides feedback in an appropriate tone and content. For example, if the server determines that the user is irritated, it can display more polite guidance or provide additional hints for solving the problem.
[0740] For example, if a user makes a mistake when scanning an application form, the server analyzes the image data and notifies the user if there are any unclear parts. If the emotion engine detects user dissatisfaction, the server expresses its apologies and provides guidance on how to improve scan quality. This allows the user to resolve the problem smoothly and complete data registration efficiently.
[0741] The following describes the processing flow.
[0742] Step 1:
[0743] The user scans the paper application form with a scanner and imports the digital image data into the terminal. The user then uploads this image data to the server using a dedicated interface.
[0744] Step 2:
[0745] The server receives the uploaded image data. After receiving the data, the server activates its OCR engine and analyzes the text information within the image, converting it into text data. At this stage, handwritten and printed characters are recognized.
[0746] Step 3:
[0747] The server processes the text data using machine learning algorithms and extracts the necessary information. Data points required by the system, such as customer name, address, and phone number, are identified.
[0748] Step 4:
[0749] The server formats the extracted information into the system's format. The format is converted to ensure the data conforms to registration requirements and maintains a consistent data structure.
[0750] Step 5:
[0751] The server validates the data. It checks the formatted information and verifies the accuracy and conformity of the fields. If any discrepancies are found, it records the details.
[0752] Step 6:
[0753] If the server detects a flaw during validation, it activates the emotion engine to recognize the user's emotions. It understands the user's emotional state through voice input and facial expression analysis, and adjusts the content and tone of notifications accordingly.
[0754] Step 7:
[0755] The server notifies the user of the problem. Depending on the user's feelings, it sends a message containing appropriate guidance and support information, and suggests the next steps to resolve the issue.
[0756] Step 8:
[0757] The user corrects the data based on the notification. The corrected data is sent back to the server, and reprocessing is initiated automatically.
[0758] Step 9:
[0759] The server automatically registers the data that has been finally validated into the system. The data is then stored in the database via the registration API, completing the process.
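Steps 6 and 7 of this flow combine two signals, facial expression and voice, and adjust the notification only when validation has found a problem. The following Python sketch shows one possible way to do this. The weighted-average fusion, the `anger` label used as a proxy for irritation, the threshold value, and the message wording are all assumptions for illustration; the disclosure does not specify how the two estimates are combined.

```python
from typing import Dict, List, Optional


def fuse_scores(face: Dict[str, float], voice: Dict[str, float],
                face_weight: float = 0.5) -> Dict[str, float]:
    """Combine per-emotion scores from face and voice by weighted average."""
    labels = set(face) | set(voice)
    return {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }


def notify(problems: List[str], face: Dict[str, float],
           voice: Dict[str, float],
           irritation_threshold: float = 0.5) -> Optional[str]:
    """Return a notification, or None when validation found no problems.

    The emotion estimate is consulted only when there is something to report,
    mirroring Step 6, where the emotion engine is activated on a detected flaw.
    """
    if not problems:
        return None
    fused = fuse_scores(face, voice)
    detail = "Please correct the following: " + "; ".join(problems) + "."
    if fused.get("anger", 0.0) >= irritation_threshold:
        # Assumed politer wording plus an extra hint for an irritated user.
        return ("We apologize for the trouble. " + detail
                + " Hint: rescanning on a flat surface often helps.")
    return detail
```

With facial and voice anger scores of 0.8 and 0.6 and equal weights, the fused score is 0.7, which exceeds the assumed threshold and yields the apologetic variant of the message.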
[0760] (Example 2)
[0761] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0762] Conventional application form information registration systems were prone to errors when digitizing paper application forms and registering them in the system, particularly in handwritten character recognition and in the extraction of necessary information. Furthermore, notifications to users when deficiencies occurred were uniform and lacked flexible responses that took the user's emotional state into account.
[0763] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0764] In this invention, the server includes means for acquiring image information, means for performing character recognition processing, and means for extracting necessary data. This enables highly accurate character recognition and data extraction, as well as detailed notifications to the user based on their emotional state.
[0765] "Image information" refers to visual data digitized from paper application forms using a scanner.
[0766] "Optical character recognition (OCR) processing" is the process of extracting text information from image information using OCR technology.
[0767] "Text information" refers to digital character data obtained through character recognition processing.
[0768] "Required data" refers to the specific information items that the system needs to process.
[0769] "Formatting" means converting the extracted necessary data into the format required by the system.
[0770] "Verifying validity" is the process of checking whether formatted data is accurate and complete.
[0771] An "information management system" refers to a database or file system in which digital data is stored and managed.
[0772] "Emotional state" refers to a situation that demonstrates a user's psychological and emotional response.
[0773] "Adjusting notification content" means tailoring the information provided to users to their emotional state, using appropriate content and tone.
[0774] A description of embodiments for carrying out this invention will be given.
[0775] This invention aims to enable users to digitize paper application forms and effectively register them in a system. First, the user captures the paper application form as a digital image on a terminal using a dedicated scanner. A high-precision, general-purpose scanner is used.
[0776] The user then uploads the image data to the server using their device. An application on the device verifies that the file format is correct and performs any necessary conversions. The software used in this process includes common format conversion tools.
[0777] The server performs character recognition processing on the received image data using OCR software. This process utilizes an industry-standard OCR engine, converting handwritten and printed character information into text data. The server also automatically performs image noise reduction and character skew correction.
[0778] The server then uses machine learning algorithms and natural language processing techniques to extract the necessary data from the text data. The AI model used includes the latest natural language processing frameworks, specifically designed to accurately recognize information such as addresses and names.
[0779] Next, the server formats the extracted data into a specified format and then verifies the validity of the formatted data. Once verification is complete, data without errors is automatically registered in the information management system.
[0780] Finally, the server is equipped with an emotion engine that analyzes image and audio data to estimate the user's emotional state. Based on this emotion estimation, notifications to the user are tailored, and the system is designed to improve the user experience, especially when problems are detected.
[0781] For example, if a user scans an application form at an angle and uploads it, the server automatically corrects the skew to ensure accurate data extraction. Furthermore, if the user appears to be having difficulty during the upload process, the emotion engine detects this and the server provides guidance in a gentle tone.
[0782] A concrete example of a prompt message is, "Generate a notification message for when a user has scanned an application form incorrectly, in a way that is helpful even if the user is frustrated."
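A prompt of this kind could be assembled from the detected problem and the estimated emotional state before being passed to the generative AI model. The following Python sketch shows one such assembly. The instruction sentence is taken from the example above, while the two appended context lines, their labels, and the function name `build_prompt` are assumptions made for illustration.

```python
# The first sentence is the prompt example from the text; the two context
# lines appended to it are a hypothetical extension for illustration.
PROMPT_TEMPLATE = (
    "Generate a notification message for when a user has scanned an "
    "application form incorrectly, in a way that is helpful even if the "
    "user is frustrated.\n"
    "Detected problem: {problem}\n"
    "Estimated emotional state: {emotion}\n"
)


def build_prompt(problem: str, emotion: str) -> str:
    """Fill the template with the validation result and the emotion label."""
    return PROMPT_TEMPLATE.format(problem=problem.strip(),
                                  emotion=emotion.strip())


if __name__ == "__main__":
    print(build_prompt("the page is skewed and partly cut off", "anger"))
```

How the resulting string is submitted depends on the generative AI service in use and is outside the scope of this sketch.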
[0783] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0784] Step 1:
[0785] The user scans a paper application form using a dedicated scanner. The input is the paper application form, and the output is a high-resolution digital image. The scanner converts the scanned information into image data and saves it to the device. It is especially important to check the paper's position and resolution to ensure a clear scan.
[0786] Step 2:
[0787] The user uses a terminal to upload scanned image data to the server. The input is a digital image file, and the output is the image data stored on the server. The terminal checks the extension and size of the input file and converts it to the optimal format as needed. JPEG or PNG formats are common.
[0788] Step 3:
[0789] The server performs OCR processing on the received image data. The input is a scanned image, and the output is recognized text data. The server runs OCR software to convert handwritten or printed characters in the image into digital strings. This process automatically performs noise reduction and skew correction, achieving highly accurate character recognition.
[0790] Step 4:
[0791] The server extracts necessary data from recognized text data. The input is OCR-processed text data, and the output is specific data elements that have been extracted. The server uses machine learning algorithms and natural language processing techniques to accurately identify addresses, names, dates, etc., from the text. A generative AI model is used for analyzing the data elements.
[0792] Step 5:
[0793] The server formats the extracted data into a specified format and verifies its validity. The input consists of extracted data elements, and the output is the formatted data and its verification results. The server formats the data to conform to a defined database format. The validity of each data element is checked using a cross-check function, and if there are no problems, the process moves to the next stage.
[0794] Step 6:
[0795] The server analyzes image and audio data to estimate the user's emotional state. The input is the user's image and audio data, and the output is the estimated emotional state. The server then drives an emotion engine to adjust notification content based on the recognized emotion. If the user's emotion is irritation, the notification content is set to be more friendly and include more specific instructions.
[0796] Step 7:
[0797] The server registers formatted and validated data into the information management system. Validated data is taken as input, and registered data is obtained from the database as output. Once the registration process is complete, the data becomes officially usable within the system. Security protocols are applied to data storage during this process.
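The registration in Step 7 can be sketched with an in-memory SQLite database standing in for the information management system. The table name, the columns, and the functions `register` and `fetch` are assumptions for illustration. The security protocols mentioned above are not modelled.

```python
import sqlite3


def register(conn, record):
    """Insert one validated record and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applications ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "address TEXT NOT NULL, date TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if the insert raises
        cursor = conn.execute(
            "INSERT INTO applications (name, address, date) VALUES (?, ?, ?)",
            (record["name"], record["address"], record["date"]),
        )
    return cursor.lastrowid


def fetch(conn, row_id):
    """Read a registered record back from the database, or None if absent."""
    row = conn.execute(
        "SELECT name, address, date FROM applications WHERE id = ?", (row_id,)
    ).fetchone()
    return dict(zip(("name", "address", "date"), row)) if row else None
```

Parameterised queries (`?` placeholders) keep OCR-derived text from being interpreted as SQL.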
[0798] (Application Example 2)
[0799] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0800] Conventional data registration systems had problems such as users not receiving appropriate feedback when scanning and digitizing application forms, causing stress and delays in the registration process. Furthermore, the inefficient interface due to notifications that did not consider the user's emotional state was also a challenge.
[0801] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0802] In this invention, the server includes a device for acquiring image information, a device for processing encoded data recognition, and a device for estimating emotions and adjusting notification content. This makes it possible to provide appropriate feedback according to the user's emotional state and to proceed with the registration process efficiently.
[0803] "Image information" refers to data stored in a format that can be perceived visually, and is acquired by devices such as scanners and digital cameras.
[0804] "Encoded data recognition processing" is the process of converting characters and symbols contained in image information into a digital format, making them machine-readable data.
[0805] "Format data" refers to a collection of information generated by encoded data recognition processing, and is digital data that has a specific format and structure.
[0806] A "control device" is a system or platform used for inputting, processing, and outputting data.
[0807] "Processing" refers to the process of changing the format and structure of data according to the purpose, and adapting it to the format required by the control device.
[0808] "Appropriateness" is a concept that indicates whether data or information meets specific standards or conditions, and it serves as a criterion for determining whether or not to register it.
[0809] A "user" is an entity that operates a system or manages information through its interface, and is usually a person.
[0810] "Emotions" represent the user's psychological state and are internal reactions estimated through facial expressions and vocal characteristics.
[0811] "Feedback" refers to information or responses that a system outputs to a user, and is usually provided in real time.
[0812] The system for realizing this application is configured as follows: First, the terminal uses a dedicated device to acquire image information such as application forms. The hardware used includes scanners and smartphone cameras. The acquired image information is sent to a cloud server.
[0813] The server uses OCR (Optical Character Recognition) software to perform encoded data recognition processing on the received image information. This process generates format data, from which the relevant information is extracted. To extract the necessary character information from the image as digital data, commonly used OCR tools and services such as "Google Cloud Vision API" and "Tesseract OCR" are utilized.
[0814] Next, the server applies machine learning algorithms to extract the necessary information and process the data to conform to the control device's format. These algorithms include natural language processing techniques to formalize the information, and the formatted data is then validated for appropriateness. This verifies consistency, and data without issues is automatically registered in the system.
[0815] Simultaneously, the server integrates facial recognition technology (e.g., the OpenCV library) and a voice analysis API (e.g., IBM Watson) to operate a device that estimates emotions from the user's facial expressions and voice. This allows notifications and feedback to be tailored to the user's psychological state, providing appropriate guidance. For example, if the user indicates discomfort, the server will select feedback that uses gentle language to help resolve the problem.
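One way to combine the two signals is a weighted average of per-emotion scores, followed by a choice of tone. The sketch below assumes the facial and voice analyzers each return a dict of emotion name to score; it does not call OpenCV or IBM Watson. The equal weighting, the grouping in `GENTLE_EMOTIONS`, and the function names are hypothetical.

```python
# Hypothetical grouping of emotions that call for gentler wording.
GENTLE_EMOTIONS = {"anger", "sadness", "fear"}


def fuse_scores(face, voice, face_weight=0.5):
    """Weighted average of per-emotion scores from the two analyzers."""
    emotions = sorted(set(face) | set(voice))
    return {
        e: face_weight * face.get(e, 0.0) + (1 - face_weight) * voice.get(e, 0.0)
        for e in emotions
    }


def choose_tone(face, voice):
    """Return (dominant emotion, tone) for the next notification."""
    fused = fuse_scores(face, voice)
    dominant = max(fused, key=fused.get)
    tone = "gentle" if dominant in GENTLE_EMOTIONS else "neutral"
    return dominant, tone
```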
[0816] In this way, users can operate the system smoothly and register information efficiently. An example of a prompt message for the generative AI model is: "Develop an algorithm that analyzes the user's current emotional state and provides optimal feedback. The five main emotions are joy, surprise, anger, sadness, and fear."
[0817] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0818] Step 1:
[0819] The terminal acquires image information, such as application forms, via a scanner used by the user or a camera on a smart device. The input is a physical application form, and the output is scanned image data. This image data is sent to a cloud server for subsequent digital processing.
[0820] Step 2:
[0821] The server performs OCR (Optical Character Recognition) processing on the received image data using OCR software. The input is image data, and the output is text data. OCR processing analyzes the characters within the image and converts them into digital character format. General-purpose OCR tools are used for this purpose.
[0822] Step 3:
[0823] The server applies machine learning algorithms to text data, extracts necessary information, and formats it into the control device's specified format. Input is text data generated by OCR, and output is formatted data. This process utilizes natural language processing techniques to analyze the text and identify important sections.
[0824] Step 4:
[0825] The server validates the formatted data for appropriateness. The input is the formatted data, and the output is either validated data for registration or an error message. Validation checks if the data meets the specified conditions; if there are no problems, the data is sent for registration.
[0826] Step 5:
[0827] If inappropriate data is detected, the server sends an error notification to the user and provides feedback. The input is the error message, and the output is the notification content to the user. The notification content includes instructions and advice on how to correct the operation.
[0828] Step 6:
[0829] Simultaneously, the server uses facial recognition technology and voice analysis to estimate the user's emotions. Input is real-time facial images and voice data, while output is an evaluation of the emotional state. Based on this, notifications and feedback are further refined according to the user's psychological state.
[0830] Step 7:
[0831] Users receive appropriate and reassuring guidance, enabling them to improve their operations. Input is feedback from the system, and output is the result of the user's new actions. This allows for the smooth completion of the registration process.
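Steps 4 to 7 can be read as a single decision: register the data, or notify the user with wording adjusted to the estimated emotion. The following sketch shows that control flow under stated assumptions. The `Outcome` type, the `process` function, the message texts, and the set of emotions that trigger reassurance are all hypothetical, and the validation function is supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    registered: bool
    notification: str
    errors: list = field(default_factory=list)


def process(formatted, validate, emotion):
    """Validate formatted data, then build either a success or a corrective notification."""
    errors = validate(formatted)
    if not errors:
        return Outcome(True, "Your application has been registered.")
    advice = "Please correct the following and try again: " + "; ".join(errors)
    if emotion in ("anger", "sadness", "fear"):
        # Prepend reassurance when the user appears to be distressed.
        advice = "No need to worry, this is easy to fix. " + advice
    return Outcome(False, advice, errors)
```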
[0832] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0833] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0834] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0835] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0836] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0837] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0838] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.
[0839] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0840] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0841] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
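The disclosure does not state how the network is made to give nearby emotions similar values. One possible approach, shown here purely as an assumption, is to build soft training targets with a Gaussian kernel over distances on the map. The coordinates in `POSITIONS` are invented for illustration and are not the layout of the emotion map 400; `soft_targets` and `sigma` are hypothetical names.

```python
import math

# Hypothetical (x, y) positions; the real layout is defined by the emotion map.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.1, 0.3),
    "confident": (1.2, 0.1),
    "fear": (-1.0, -0.8),
}


def soft_targets(label, positions, sigma=0.5):
    """Give each emotion a target value that decays with its map distance from `label`."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in positions.items()
    }
```

With these placeholder coordinates, a sample labelled "calm" yields high targets for "reassured" and "confident" and a value close to zero for "fear", which mirrors the behaviour described for Figure 10.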
[0842] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0843] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0844] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0845] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0846] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0847] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0848] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0849] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0850] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0851] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0852] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0853] The following is further disclosed regarding the embodiments described above.
[0854] (Claim 1)
[0855] Means for acquiring image data,
[0856] Means for performing character recognition processing based on the aforementioned image data,
[0857] A means for extracting necessary information from the text data obtained by the aforementioned character recognition process,
[0858] Means for formatting the extracted information to conform to the system's format,
[0859] A means for verifying the validity of the formatted information,
[0860] A means for automatically registering information that has been determined to be free of problems through the aforementioned validation into the system,
[0861] A means of notifying the user when a deficiency is detected through the aforementioned validation process,
[0862] A system that includes this.
[0863] (Claim 2)
[0864] The system according to claim 1, wherein the character recognition process is performed using a machine learning algorithm.
[0865] (Claim 3)
[0866] The system according to claim 1, wherein the extraction of the necessary information is performed using natural language processing technology.
[0867] "Example 1"
[0868] (Claim 1)
[0869] A means of converting input documents into digital information,
[0870] Means for identifying text information based on the aforementioned digital information,
[0871] A means for obtaining attribute data from the identified text information,
[0872] Means for adapting the acquired attribute data to a specified data structure,
[0873] Means for evaluating the consistency of the adapted data,
[0874] A means for automatically registering data that has been determined to be accurate by the aforementioned consistency evaluation into an information processing device,
[0875] A means for notifying the operator when a deficiency is detected in the aforementioned consistency evaluation,
[0876] A system that includes this.
[0877] (Claim 2)
[0878] The system according to claim 1, wherein the identification of the text information is performed using a data analysis algorithm.
[0879] (Claim 3)
[0880] The system according to claim 1, wherein the acquisition of the attribute data is performed using language processing technology.
[0881] "Application Example 1"
[0882] (Claim 1)
[0883] Means for acquiring image information,
[0884] Means for performing character recognition processing based on the aforementioned image information,
[0885] A means for extracting necessary data from the character information obtained by the aforementioned character recognition process,
[0886] Means for formatting the extracted data to conform to the format of the information processing device,
[0887] A means for verifying the validity of the formatted data,
[0888] A means for automatically registering data that has been determined to be free of problems through the validation process into an information processing device,
[0889] The aforementioned information processing device includes means for supporting digitalization using smart devices and wearable devices,
[0890] A means of notifying the user when a deficiency is detected through the aforementioned validation process,
[0891] A system that includes this.
[0892] (Claim 2)
[0893] The system according to claim 1, wherein the character recognition processing is performed using a machine learning method.
[0894] (Claim 3)
[0895] The system according to claim 1, wherein the extraction of the necessary information is performed using natural language processing technology.
[0896] "Example 2 of combining an emotion engine"
[0897] (Claim 1)
[0898] Means for acquiring image information,
[0899] Means for performing character recognition processing based on the aforementioned image information,
[0900] Means for extracting necessary data from text information obtained by the aforementioned character recognition process,
[0901] A means for formatting the extracted data to conform to a unified format,
[0902] A means for verifying the validity of the formatted data,
[0903] A means for automatically registering data that has been determined to be free of problems through the aforementioned validation into the information management system,
[0904] A means of notifying the user when a deficiency is detected through the aforementioned validation process,
[0905] A means for estimating the user's emotional state and adjusting the content of the notification,
[0906] A system that includes this.
[0907] (Claim 2)
[0908] The system according to claim 1, wherein the character recognition processing is performed using a machine learning method.
[0909] (Claim 3)
[0910] The system according to claim 1, wherein the extraction of the necessary data is performed using natural language processing technology.
[0911] "Application example 2 when combining with an emotional engine"
[0912] (Claim 1)
[0913] A device for acquiring image information,
[0914] A device that performs encoded data recognition processing based on the aforementioned image information,
[0915] A device for extracting necessary information from format data obtained by the above-mentioned encoded data recognition process,
[0916] A device for processing the extracted information to conform to the format of the control device,
[0917] A device for verifying the appropriateness of the processed information,
[0918] A device that automatically registers information that has been determined to be free of problems through the aforementioned appropriateness verification into the control device,
[0919] A device that notifies the user when a deficiency is detected through the aforementioned appropriateness verification,
[0920] A device that analyzes the user's facial expressions and estimates their emotions,
[0921] A device that adjusts the output content of the notification device according to the estimated emotion,
[0922] A system that includes this.
[0923] (Claim 2)
[0924] The system according to claim 1, wherein the encoded data recognition process is performed using a machine learning method.
[0925] (Claim 3)
[0926] The system according to claim 1, wherein the extraction of the necessary information is performed using language processing technology.
[Explanation of Symbols]
[0927] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. Means for acquiring image information,
Means for performing character recognition processing based on the aforementioned image information,
A means for extracting necessary data from the character information obtained by the aforementioned character recognition process,
Means for formatting the extracted data to conform to the format of the information processing device,
A means for verifying the validity of the formatted data,
A means for automatically registering data that has been determined to be free of problems through the validation process into an information processing device,
The aforementioned information processing device includes means for supporting digitalization using smart devices and wearable devices,
A means of notifying the user when a deficiency is detected through the aforementioned validation process,
A system that includes this.
2. The system according to claim 1, wherein the character recognition processing is performed using a machine learning method.
3. The system according to claim 1, wherein the extraction of the necessary information is performed using natural language processing technology.