system
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-25
AI Technical Summary
Legal and intellectual property departments face inefficiencies in analyzing large volumes of contract documents and patent drawings, leading to human errors and insufficient risk management due to labor-intensive manual processes.
A system comprising document receiving, text extraction, risk identification, image analysis, report generation, and interaction means, with multilingual support, to automate document analysis and enhance risk management.
Enables efficient, accurate, and interactive document analysis across various formats and languages, supporting risk management and decision-making in legal and intellectual property departments.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In legal and intellectual property departments, large numbers of contract documents and patent drawings must be analyzed manually, which requires considerable time and labor. Especially in organizations with limited personnel and resources, this leads to problems such as human error and inadequate risk management. There is therefore a need for a method that improves the efficiency of document analysis and strengthens risk management capability.
Means for Solving the Problems
[0005] The present invention provides a system comprising a document receiving means, a text extraction means, a risk identification means, an image analysis means, a report generation means, and an interaction means. This enables efficient risk management by automatically analyzing documents and identifying risk items and important information. Furthermore, by determining the necessity of OCR processing using a format determination means and applying it when necessary, the system can handle various document formats, and by including a multilingual support means, it enables international use.
[0006] The term "document" refers to all types of document files, including contracts and patent drawings.
[0007] "Document receiving means" refers to a device or program that imports documents uploaded by a user into the system.
[0008] "Text extraction means" refers to technology that recognizes character information in a received document and extracts it as digital text.
[0009] "Risk identification means" refers to technology that analyzes the extracted text and automatically identifies potential risk clauses contained in contracts and other documents.
[0010] "Image analysis means" refers to technology that analyzes image information contained in a document and extracts important patent-related designs and information from it.
[0011] "Report generation means" refers to a function that summarizes the analyzed text and image information in a user-friendly format and creates a report document.
[0012] "Interaction means" refers to interactive technology that provides information based on the analysis results in response to questions and requests from users.
[0013] "Format determination means" refers to a function that automatically identifies the format of a document and selects the appropriate processing method for that format.
[0014] "Multilingual support means" refers to technology that makes the analysis results and system operation available in multiple languages.
Brief Description of the Drawings
[0015] [Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when the emotion engine is combined.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.
Embodiment for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
[0020] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.
[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] This invention is a system for efficiently analyzing contracts and patent drawings handled by legal and intellectual property departments, and for promoting risk management. Specifically, the present invention can be implemented as follows.
[0037] Users using a terminal upload contracts or patent drawings they wish to have analyzed to the system. This sends the document files to the server.
[0038] The server begins analyzing the received document. It checks the document format and, if necessary, applies OCR (Optical Character Recognition) to extract text. This text extraction means converts all character information, including text contained in images, into text data without omission.
[0039] Next, the server analyzes the extracted text with a text analysis tool based on natural language processing technology. Here, important risk information is extracted automatically by identifying risk clauses within the contract. For example, important clauses such as "payment terms" and "contract termination" are identified.
[0040] In parallel, the server runs an image analysis tool to analyze the patent drawings. Using computer vision technology, it identifies design elements within the images and extracts drawing features that may be relevant to patent examination. This analysis helps resolve questions about a design and check for similarities with competing patents.
[0041] The server aggregates the results of the text and image analysis and generates a user-friendly report. This report generation means allows users to grasp important information efficiently. The report includes the extracted risk information, design points of interest, and, as needed, recommended countermeasures.
[0042] Furthermore, users can interact with the AI to ask questions about the analysis results and obtain additional information. This interaction means deepens the user's understanding and supports decision-making.
[0043] Because the system provides multilingual support on a single platform and is configured for international use, it can be used in the risk management operations of many companies both domestically and internationally. For example, it is effective when a company concluding international trade agreements analyzes contract documents in various languages and manages the associated risks.
[0044] The following describes the processing flow.
[0045] Step 1:
[0046] The user operates the terminal, specifies the contracts or patent drawings they want to analyze, and uploads them to the server. The user selects files through the system interface and transfers the data to the server by clicking the send button.
[0047] Step 2:
[0048] The server receives the uploaded document. After receiving it, it automatically determines the document format (PDF, PNG, etc.) and, if necessary, extracts text from the image using OCR technology. This process makes all text information within the document available for analysis.
[0049] Step 3:
[0050] The server uses a text analysis tool to analyze in detail the text data extracted through OCR processing. Using natural language processing techniques, it identifies the risk-related clauses in the contract.
[0051] Step 4:
[0052] In parallel, the server runs an image analysis tool to analyze the patent drawings. Using computer vision technology, it analyzes structures and design elements within the drawings and identifies key points. This process makes it possible to detect design flaws and potentially competing patents.
[0053] Step 5:
[0054] The server integrates the results of text and image analysis and generates a user-friendly report. The report generation mechanism organizes the analysis results in an easy-to-understand format and provides them to the user.
[0055] Step 6:
[0056] The user receives the generated report on the terminal and reviews it. If necessary, the user can interact with the AI through the interface to request a detailed explanation of the analysis results or to view additional risk information.
[0057] Step 7:
[0058] The server stores the analysis results and the user interaction history, and extends multilingual support to meet the diverse language needs of users. At this stage, the analysis data is managed so that it can be reused in the future.
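The storage described in Step 7 can be illustrated with a minimal sketch. The source does not specify a storage technology or schema; the SQLite database, the table names (`analysis`, `interaction`), and the function names below are all assumptions made for illustration only.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store for analysis results and dialogue history."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis ("
        "id INTEGER PRIMARY KEY, document TEXT, language TEXT, result TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interaction ("
        "id INTEGER PRIMARY KEY, "
        "analysis_id INTEGER REFERENCES analysis(id), "
        "question TEXT, answer TEXT)"
    )
    return conn


def save_analysis(conn: sqlite3.Connection, document: str,
                  language: str, result: dict) -> int:
    """Store one analysis result as JSON and return its row id."""
    cur = conn.execute(
        "INSERT INTO analysis (document, language, result) VALUES (?, ?, ?)",
        # ensure_ascii=False keeps non-Latin text readable in the database.
        (document, language, json.dumps(result, ensure_ascii=False)),
    )
    conn.commit()
    return cur.lastrowid


def save_interaction(conn: sqlite3.Connection, analysis_id: int,
                     question: str, answer: str) -> None:
    """Append one question/answer pair to the history of an analysis."""
    conn.execute(
        "INSERT INTO interaction (analysis_id, question, answer) VALUES (?, ?, ?)",
        (analysis_id, question, answer),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, analysis_id: int) -> list:
    """Return the (question, answer) pairs for an analysis, oldest first."""
    rows = conn.execute(
        "SELECT question, answer FROM interaction "
        "WHERE analysis_id = ? ORDER BY id",
        (analysis_id,),
    )
    return list(rows)
```

Storing the result as JSON text keeps the sketch independent of the report layout; a production system would likely need access control and retention rules that are outside the scope of this illustration.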
[0059] (Example 1)
[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0061] Conventional document analysis systems performed text and image analysis separately, resulting in the loss of some information when integrating the results. Furthermore, analyzing international documents requiring multilingual support and optical character recognition presented challenges in balancing processing complexity and accuracy. Additionally, the lack of interactive information provision tools for efficiently utilizing analysis results made it difficult to effectively support user decision-making.
[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0063] In this invention, the server includes receiving means for receiving information, discrimination means for determining the format of the information and applying optical character recognition as necessary, and multilingual support means using natural language processing technology to process the extracted character data. This enables highly accurate analysis regardless of the format of the information and can be used internationally. Furthermore, by integrating the analysis results and providing them interactively, it becomes possible to effectively support the user's decision-making.
[0064] A "receiving means" is a function for acquiring information from the outside and incorporating it into the system.
[0065] A "character data extraction means" is a function that identifies character strings in received information and extracts them as electronic data.
[0066] A "risk element identification means" is a function that analyzes extracted text data and automatically identifies potential risks and important clauses.
[0067] "Image analysis means" refers to a function that processes visual information and extracts important information and features contained within an image.
[0068] "Report generation means" refers to a function that integrates analysis results and creates a report in an easy-to-understand format.
[0069] "Dialogue means" refers to a function that provides information interactively through interaction with the user, thereby deepening the user's understanding.
[0070] A "discrimination means" is a function that recognizes the format of received information and selects the appropriate processing method.
[0071] Optical Character Recognition (OCR) is a technology that analyzes character information contained in images and other data, and converts it into text data.
[0072] "Multilingual support means" refers to a function that converts analysis results into multiple languages to accommodate users who use different languages.
[0073] "Element identification means" refers to a function that identifies specific design elements from drawings and other information, and evaluates their similarities and relationships.
[0074] A "generative artificial intelligence model" is an advanced computer algorithm that generates and provides information based on user requests.
[0075] This invention uses an information system to analyze contracts and patent drawings efficiently, thereby supporting risk management. Specifically, the server, the terminal, and the user operate in coordination.
[0076] Users upload the documents they wish to analyze to the system via their terminal. This process sends the documents to the server. The server first uses recognition tools to verify the format of the received documents. If necessary, it extracts character data from the file using Optical Character Recognition (OCR) technology. Tesseract OCR is commonly used as the specific software for this purpose.
[0077] The server applies a generative AI model and natural language processing (NLP) techniques to the extracted text data. This allows risk elements to be identified even in complex contract documents. Tools such as the spaCy library and the BERT language model are useful for behavioral analysis and legal risk assessment.
[0078] Simultaneously, the server uses computer vision technology for drawing analysis, identifying design elements in patent drawings. OpenCV and TensorFlow (registered trademark) are used in this process. The server applies machine learning algorithms to identify key design structures and features, and compares them with other similar designs.
[0079] Once the analysis is complete, the server generates a report in a user-friendly format. This report often includes identified risks, design considerations, and recommended countermeasures. The report is typically provided in PDF format or made accessible through a web interface.
[0080] Furthermore, users can interact with the server using prompts related to the generated report. This helps them extract additional information and deepen their understanding of risk management. Examples of specific prompts include inquiries such as, "Please list the risk items in this contract," or "What are the distinctive design elements in the patent drawings?"
[0081] This system streamlines document analysis in legal and intellectual property departments and can handle multilingual international risk management tasks. A key feature of this invention is its flexible analysis capabilities across diverse document formats and information languages.
[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0083] Step 1:
[0084] The user selects the documents to be analyzed on their terminal and uploads them to the system. Input files include contracts and patent drawings. The files are then sent to the server.
[0085] Step 2:
[0086] The server uses the discrimination means to determine the format of the received document. The input is the received document, and the output is the format determination result. At this stage, the server supports various file formats such as PDF, DOCX, and JPEG. Depending on the format, it sets a flag indicating that OCR is to be applied.
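As an illustration of Step 2, the following is a minimal sketch of a format determination that outputs an OCR flag. The extension-based check, the `FormatResult` type, and the `has_text_layer` parameter are assumptions introduced here; the source names PDF, DOCX, and JPEG only as examples and does not say how the format is detected.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical grouping of formats; not taken from the source.
IMAGE_FORMATS = {".jpg", ".jpeg", ".png"}
TEXT_FORMATS = {".docx"}
MIXED_FORMATS = {".pdf"}  # a PDF may or may not carry a text layer


@dataclass(frozen=True)
class FormatResult:
    extension: str
    needs_ocr: bool  # the "OCR application flag" passed to the next step


def determine_format(filename: str, has_text_layer: bool = False) -> FormatResult:
    """Return the detected format and whether OCR should be applied."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_FORMATS:
        return FormatResult(ext, needs_ocr=True)
    if ext in TEXT_FORMATS:
        return FormatResult(ext, needs_ocr=False)
    if ext in MIXED_FORMATS:
        # Scanned PDFs have no embedded text, so they are routed to OCR.
        return FormatResult(ext, needs_ocr=not has_text_layer)
    raise ValueError(f"unsupported document format: {ext or filename!r}")
```

For example, `determine_format("scan.pdf")` sets the flag, while `determine_format("born_digital.pdf", has_text_layer=True)` does not. A real implementation would more likely inspect file contents than trust the extension.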
[0087] Step 3:
[0088] The server extracts character data using OCR technology depending on the format. The input is the document that has been identified and the OCR application flag, and the output is the extracted character data. Specifically, it performs optical character recognition to obtain text from image-based files.
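Step 3 can be sketched as a function that takes the document and the OCR flag and returns character data. The sketch below assumes the `pytesseract` wrapper and Pillow as the Tesseract backend, which is one possible choice and is not stated in the source; the function names and the `embedded_text` parameter are hypothetical. The backend is passed in as a callable so it can be replaced.

```python
from typing import Callable, Optional


def tesseract_ocr(image_path: str) -> str:
    """OCR backend; requires the Tesseract binary, pytesseract, and Pillow."""
    import pytesseract  # imported lazily so the sketch loads without it
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def extract_text(
    path: str,
    needs_ocr: bool,
    embedded_text: Optional[str] = None,
    ocr: Callable[[str], str] = tesseract_ocr,
) -> str:
    """Return character data, applying OCR only when the flag is set."""
    if needs_ocr:
        return ocr(path).strip()
    if embedded_text is None:
        raise ValueError("no embedded text supplied for a non-OCR document")
    return embedded_text.strip()
```

Keeping the OCR engine behind a callable means the flag-driven branching can be exercised without Tesseract installed, and a different engine can be substituted for languages where it performs better.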
[0089] Step 4:
[0090] The server analyzes the extracted text data using a generative AI model and natural language processing technology. The input is text data, and the output is a list of risk elements. This process identifies risk items such as "payment terms" and "contract termination" from contract documents.
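The input and output of Step 4 can be illustrated with a deliberately simple stand-in. The source describes a generative AI model and natural language processing; the sketch below substitutes keyword patterns for that analysis, purely to show the shape of "text in, list of risk elements out". The patterns, the `RiskItem` type, and the function name are assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword patterns; the source names only "payment terms" and
# "contract termination" as example risk items.
RISK_PATTERNS = {
    "payment terms": re.compile(r"\b(payment|invoice|net \d+ days)\b", re.I),
    "contract termination": re.compile(r"\b(terminat\w*|cancel\w*)\b", re.I),
}


@dataclass(frozen=True)
class RiskItem:
    category: str
    sentence: str


def identify_risks(text: str) -> list[RiskItem]:
    """Return one RiskItem per (sentence, category) match, in text order."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        RiskItem(category, sentence)
        for sentence in sentences
        for category, pattern in RISK_PATTERNS.items()
        if pattern.search(sentence)
    ]
```

Keyword matching misses paraphrased clauses and produces false positives, which is presumably why the source relies on language models; the sketch only fixes the interface that such a model would sit behind.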
[0091] Step 5:
[0092] The server analyzes patent drawings using image analysis technology. The input is the data of the patent drawing, and the output is identified design elements. Computer vision algorithms are used to identify design elements within the image and verify their similarity.
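The similarity check in Step 5 can be sketched under a strong simplifying assumption: that an upstream computer vision stage (not shown, and not specified in the source) has already reduced each drawing to a set of design-element labels. The sketch then compares drawings by Jaccard similarity over those sets. The labels and function names are hypothetical, and Jaccard similarity is a technique chosen here for illustration, not one named in the source.

```python
def jaccard_similarity(elements_a: set[str], elements_b: set[str]) -> float:
    """Return |A & B| / |A | B|, or 0.0 when both sets are empty."""
    if not elements_a and not elements_b:
        return 0.0
    return len(elements_a & elements_b) / len(elements_a | elements_b)


def rank_similar_drawings(
    target: set[str], references: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Rank reference drawings by similarity to the target, highest first."""
    scored = [
        (name, jaccard_similarity(target, elements))
        for name, elements in references.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A label-set comparison ignores geometry and layout, so it can only serve as a coarse first filter before a closer comparison of the drawings themselves.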
[0093] Step 6:
[0094] The server compiles the results of text and image analysis and generates a report using a report generation system. The input is the analysis results, and the output is the final report. The report includes extracted risk information and key design considerations.
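A minimal sketch of Step 6 follows, assuming a plain-text report. The source mentions PDF output or a web interface; plain text is used here only to keep the example self-contained, and the `AnalysisReport` type and its field names are hypothetical.

```python
from dataclasses import dataclass, field


def _section(title: str, items: list[str]) -> list[str]:
    """Render one titled bullet section, with a placeholder when empty."""
    body = [f"  - {item}" for item in items] or ["  (none identified)"]
    return [title, *body]


@dataclass
class AnalysisReport:
    document_name: str
    risk_items: list[str] = field(default_factory=list)
    design_elements: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Combine text and image analysis results into one report."""
        lines = [
            f"Analysis report: {self.document_name}",
            "",
            *_section("Risk items:", self.risk_items),
            "",
            *_section("Design elements:", self.design_elements),
        ]
        return "\n".join(lines)
```

Keeping the results in a structured object and rendering them last makes it straightforward to add other output formats, or a translated rendering, without touching the analysis stages.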
[0095] Step 7:
[0096] The user receives the generated report and interacts with the server using prompts as needed. Input consists of user inquiries, and output consists of additional information and advice. Through this interaction, the user gains further insights into risk management.
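The dialogue in Step 7 can be sketched as prompt construction around the generated report. The source does not name a generative model or its API, so the model is represented below by a caller-supplied `generate` function; the prompt wording and function names are assumptions.

```python
from typing import Callable


def build_prompt(report_text: str, question: str) -> str:
    """Combine the report and the user's inquiry into one model prompt."""
    return (
        "You are assisting with contract and patent drawing review.\n"
        "Answer using only the report below.\n\n"
        f"Report:\n{report_text}\n\n"
        f"Question: {question}\n"
    )


def answer_question(
    report_text: str,
    question: str,
    generate: Callable[[str], str],
    history: list[tuple[str, str]],
) -> str:
    """Ask the model one question and record the exchange in the history."""
    answer = generate(build_prompt(report_text, question))
    history.append((question, answer))
    return answer
```

Grounding the prompt in the report text is one way to keep answers tied to the analysis results; the recorded history is what a later storage step would persist.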
[0097] (Application Example 1)
[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0099] In today's business environment, quickly and accurately analyzing documents such as contracts and patent drawings, and assessing risks, is crucial for smooth business operations. However, manually analyzing vast amounts of documents is time-consuming, labor-intensive, and prone to errors. Furthermore, international transactions require multilingual support, adding further complexity. Therefore, there is a need for technologies that can efficiently solve these challenges on portable devices using automated systems.
[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0101] In this invention, the server includes means for receiving data, means for extracting information from the received data, and means for identifying risk elements by analyzing the extracted information. This allows users to easily upload contracts and patent drawings from portable devices and quickly understand their risks. Furthermore, by providing multilingual report generation and interactive dialogue functions, it can be used in international business environments.
[0102] "Means of receiving data" refers to devices or programs that acquire information such as contracts and patent drawings provided by the user.
[0103] "Information extraction means" refers to a technology or device that extracts necessary text or image information from received data.
[0104] "Methods for identifying elements" refer to techniques or processes for analyzing extracted information to identify risks and critical elements.
[0105] "Visual analysis means" refers to a technology that analyzes images within a document and extracts important features and information related to patent examination.
[0106] A "report generation method" refers to a device or program that summarizes analysis results in a format that is easy for the user to understand.
[0107] A "means of dialogue" refers to a function or device that allows a user to communicate interactively with a system and obtain additional information.
[0108] "Means for operating on portable devices" refers to designs and programs that run on portable devices such as smartphones and tablets.
[0109] "Optical recognition processing" is a technology that extracts text from documents and images as digital data.
[0110] "Language support means" refers to a technology or program that makes analysis results available in multiple languages.
[0111] In a system implementing this invention, the user photographs contracts and patent drawings with a portable device such as a smartphone or tablet, and the server receives the data. The received data is then processed on the server as follows.
[0112] The server runs software written in Python to determine the type of the received data and, if necessary, performs optical recognition processing using Tesseract OCR. This extracts text information from the image. The extracted text is then analyzed with spaCy, a natural language processing library, to identify risks and important elements.
[0113] Subsequently, visual analysis is performed using OpenCV to extract important design elements from the image information of the patent data. The analysis results are compiled into a user-friendly report and sent to the user's device.
[0114] Users can also obtain additional information using a dialogue system powered by a generative AI model. This interactive dialogue feature allows users to ask questions about the analysis and obtain further information.
[0115] As a concrete example, a businessperson handling international transactions can photograph a new contract with a smartphone during a business trip, have its risks assessed, and take appropriate action on the spot. In this case, the following prompt can be used:
[0116] "Please analyze the new contract and identify the risk factors. Specifically, I'd like to know about important clauses regarding payment terms and contract termination. Please display the results in a report on my smartphone."
[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0118] Step 1:
[0119] Users use a portable device to photograph the contracts and patent drawings they wish to analyze and upload them to the server via the device. The input is an image file of the contract or patent drawing, and the output is the unprocessed document data transferred to the server.
[0120] Step 2:
[0121] The server checks the format of the received data and performs optical recognition processing using Tesseract OCR as needed. The input is image data sent to the server, and the output is information converted into text by OCR. At this stage, data processing involves extracting character information from the image and generating text data.
[0122] Step 3:
[0123] The server analyzes the extracted text data with spaCy, using natural language processing, to identify risk elements in the contract. The input is text data obtained by OCR, and the output is the identified risk items. At this stage, keywords and risk-related context within the text are examined, and important elements are extracted.
[0124] Step 4:
[0125] The server uses OpenCV to perform image analysis on patent drawings and extract design elements and important information. The input is image data of the patent drawings, and the output is information on design elements related to patent examination. Here, the features of the images are analyzed, and useful information is organized as data.
[0126] Step 5:
[0127] The server combines the results of text and image analysis to generate a user-friendly report. The input consists of data on risk and design elements, while the output is a detailed analysis report. At this stage, the analysis results are integrated, organized, and presented in report form.
[0128] Step 6:
[0129] Users can use a dialogue function powered by a generative AI model to communicate interactively with the server and obtain additional information. Input is a prompt or question from the user, and output is the AI's answer or additional information. Through the dialogue, users can obtain even more detailed information.
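The server-side portion of the flow above (Steps 2 to 5) can be sketched as a small pipeline. Each stage is passed in as a callable so that Tesseract OCR, spaCy, and OpenCV, or any replacement, can be plugged in; the `DocumentPipeline` type, its field names, and the result keys are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DocumentPipeline:
    ocr: Callable[[bytes], str]                         # Step 2: image -> text
    find_risks: Callable[[str], list[str]]              # Step 3: text -> risks
    find_design_elements: Callable[[bytes], list[str]]  # Step 4: image -> elements

    def run(self, image: bytes) -> dict:
        """Run all stages on one uploaded image and merge the results (Step 5)."""
        text = self.ocr(image)
        return {
            "text": text,
            "risks": self.find_risks(text),
            "design_elements": self.find_design_elements(image),
        }
```

In this sketch the text and image stages run one after the other; since they do not depend on each other, they could also run concurrently.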
[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0131] This invention is a system that, in addition to analyzing documents, can recognize user emotions in real time and adjust the presentation of results and dialogue based on that emotional information. A specific embodiment is shown below.
[0132] Users upload contracts and patent drawings they wish to have analyzed via their terminal. The uploaded documents are sent to the server, which receives them using the receiving means.
[0133] The server first checks the format of the received document and applies OCR processing as needed. This extracts all the text information from the document.
[0134] After text extraction is complete, the server analyzes the text data to identify key risk items. Risk identification tools are then used to identify potential risks and deficiencies within the contract.
[0135] Furthermore, the server uses image analysis tools to analyze the design information contained in the patent drawings. It utilizes computer vision technology to identify important design elements and similarities with competing patents within the drawings.
[0136] These analysis results are integrated and organized by a report generation system and provided to the user in an easy-to-understand format. The report includes risk items, important design information, and related recommendations.
[0137] Furthermore, this system incorporates an emotion engine on the server that recognizes the user's emotions in real time as they review reports. This emotion engine analyzes the user's facial expressions and tone of voice to infer their current emotional state.
[0138] Based on the user's perceived emotions, the server dynamically adjusts the presentation of reports and the tone of dialogue. For example, if the user is stressed, the results will be presented more concisely and positively to provide reassurance. Furthermore, emotional feedback allows for flexible modification of responses through dialogue channels, improving the user experience.
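A minimal sketch of this adjustment, assuming the emotion engine returns a single label per user. The labels `"stressed"` and `"anxious"`, the `Presentation` type, and the three-item limit are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    summary: str
    tone: str  # hint for the dialogue layer, e.g. "neutral" or "reassuring"


def adjust_presentation(findings: list[str], emotion: str, max_when_stressed: int = 3) -> Presentation:
    """Shorten the summary and soften the tone when the user appears stressed."""
    if emotion in {"stressed", "anxious"}:
        shown = findings[:max_when_stressed]
        hidden = len(findings) - len(shown)
        summary = "; ".join(shown)
        if hidden > 0:
            # Nothing is dropped; the remaining findings stay available on request.
            summary += f" (+{hidden} more available on request)"
        return Presentation(summary=summary, tone="reassuring")
    return Presentation(summary="; ".join(findings), tone="neutral")
```

The concise view only defers findings rather than discarding them, so the full analysis remains reachable through dialogue.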
[0139] In this way, by realizing sophisticated interactions that incorporate emotions, it is possible to promote user understanding and maximize the utilization of analysis results. For example, when using the system in the context of international contract negotiations, it becomes possible to quickly grasp legal risks while simultaneously responding in a way that takes into account the emotions of the person in charge in the other country.
[0140] The following describes the processing flow.
[0141] Step 1:
[0142] The user uses a terminal to select the contracts and patent drawings to be analyzed and uploads them to the system. The user then specifies the relevant file from the operation interface and presses the send button, at which point the document is transferred to the server.
[0143] Step 2:
[0144] The server receives the uploaded document via a receiving device. It determines the format of the received document (PDF, JPEG, etc.) and, if necessary, extracts the text from the document using OCR technology. This allows text information to be obtained from the image.
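The format check in this step can be sketched with leading magic bytes, using only the standard library. The function names are hypothetical, and the OCR engine itself is deliberately left out; the sketch only decides whether OCR is needed.

```python
def detect_format(data: bytes) -> str:
    """Identify a file type from its leading magic bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def needs_ocr(fmt: str, embedded_text: str = "") -> bool:
    """Images always need OCR; a PDF needs it only when it has no text layer."""
    if fmt in {"jpeg", "png"}:
        return True
    if fmt == "pdf":
        # embedded_text is whatever a PDF text extractor returned (assumed input).
        return not embedded_text.strip()
    return False
```

Checking file contents rather than the file extension avoids misrouting a document whose name does not match its actual format.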
[0145] Step 3:
[0146] The server analyzes the extracted text data. Natural language processing techniques are used to identify risk items within the contract, and machine learning algorithms are used to identify risk clauses and inappropriate conditions.
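As a rough illustration of risk-clause identification, the sketch below substitutes hand-written regular expressions for the machine learning algorithms mentioned in the text. The pattern list and the `find_risks` name are assumptions made for illustration only.

```python
import re

# Hypothetical patterns; a deployed system would learn or curate its own.
RISK_PATTERNS = {
    "unlimited liability": re.compile(r"\bunlimited liability\b", re.I),
    "automatic renewal": re.compile(r"\bautomatic(?:ally)? renew", re.I),
    "unilateral termination": re.compile(r"\bterminate\b.*\bwithout (?:cause|notice)\b", re.I),
}


def find_risks(contract_text: str) -> list[tuple[int, str, str]]:
    """Return (sentence_index, risk_label, sentence) for every pattern match."""
    # Split after a period or semicolon that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", contract_text) if s.strip()]
    hits = []
    for i, sentence in enumerate(sentences):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((i, label, sentence))
    return hits
```

A rule-based pass like this is easy to audit but misses paraphrased clauses, which is the gap a trained classifier would be expected to cover.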
[0147] Step 4:
[0148] The server uses image analysis techniques to analyze patent drawings. Computer vision is used to identify design features and evaluate the novelty and similarity of the patent. This clarifies the relevant technical elements.
[0149] Step 5:
[0150] The server generates a comprehensive report based on the analysis results using a report generation system. The report includes extracted risk clauses, image analysis results, and proposed action plans. This report is designed to support user decision-making.
[0151] Step 6:
[0152] The user receives the generated report on their device and reviews it. At the same time, the emotion engine uses the device's camera and microphone to analyze the user's facial expressions and tone of voice and evaluates their emotional state in real time.
[0153] Step 7:
[0154] The server adjusts how the report is presented according to the user's emotions, based on information from the emotion engine. For example, if anxiety is detected, the content is modified to emphasize positive language and provide a sense of reassurance. The content of the interactive dialogue is also changed according to the emotional state.
[0155] Step 8:
[0156] Based on reports tailored to their emotions, users can consider necessary actions and engage in further dialogue with the AI through the server. This allows users to make the most of the analysis results and supports quick and effective decision-making.
[0157] (Example 2)
[0158] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0159] Traditional systems only analyze information within documents and do not provide information based on the user's emotions. As a result, users may experience stress and anxiety. Furthermore, providing analysis results in different languages is difficult, limiting global use.
[0160] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0161] In this invention, the server includes means for receiving information, means for recognizing the user's emotional state and adjusting the analysis results, and means for providing the analysis results in multiple languages. This enables information provision that takes the user's emotions into consideration and allows for global use.
[0162] "Means of receiving information" refers to the function of allowing a server to receive input information from an external source.
[0163] "Means for extracting character information" refers to a function that processes information to obtain string data from received information.
[0164] "Means for identifying hazards" refers to a function that analyzes extracted textual information to identify potential risks.
[0165] "Means for analyzing visual information" refers to functions for detecting and analyzing important elements from image data contained in information.
[0166] "Means of generating and providing reports" refers to a function that organizes analysis results and outputs them in a format that is easy for users to understand.
[0167] "Means for recognizing the user's emotional state" refers to a function that analyzes the user's emotions in real time and provides information appropriate to that situation.
[0168] "Means for interacting with users, providing additional information, and coordinating feedback" refers to functions that interact with users, provide additional information as needed, and adjust the content.
[0169] "Means for performing character recognition processing" refers to a function that applies OCR technology to extract characters according to the format of the received document.
[0170] "Means of providing analysis results in multiple languages" refers to a function that translates the analyzed information into multiple languages and presents it to the user.
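The means defined above can be read as separate interfaces. The following sketch expresses three of them as Python protocols; the protocol and method names are hypothetical, chosen only to mirror the definitions, and the disclosure does not prescribe any particular decomposition.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextExtractor(Protocol):
    """Means for extracting character information."""
    def extract_text(self, document: bytes) -> str: ...


@runtime_checkable
class RiskIdentifier(Protocol):
    """Means for identifying hazards."""
    def identify_risks(self, text: str) -> list[str]: ...


@runtime_checkable
class ReportTranslator(Protocol):
    """Means of providing analysis results in multiple languages."""
    def translate(self, report: str, target_language: str) -> str: ...


class PlainTextExtractor:
    """Trivial extractor for documents that are already UTF-8 text."""

    def extract_text(self, document: bytes) -> str:
        return document.decode("utf-8", errors="replace")
```

Structuring each means as an interface lets an OCR-backed extractor or a different risk model be swapped in without touching the rest of the pipeline.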
[0171] This invention provides a system that allows users to easily analyze documents and receive emotion-based feedback. Specifically, users upload the documents they wish to have analyzed to the system using a terminal. These documents can be in the format of contracts or patent drawings.
[0172] First, the server receives the document using a receiving device and determines its format. If necessary, it extracts text information using OCR software (e.g., Tesseract). This process allows text information to be obtained even from documents uploaded in image format.
[0173] Next, the server performs analysis on the extracted text data using natural language processing. This makes it possible to identify risk items hidden in the contract and identify potential dangers. In addition, computer vision technology (e.g., OpenCV) is used to analyze design elements in patent drawings and their similarities to competing technologies.
[0174] The server generates a report based on the analysis results and provides it to users in an easy-to-understand manner through its multilingual support function. This ensures that users from different language regions can fully understand the analysis results.
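One small part of multilingual support, localizing the fixed headings of a report, can be sketched with a message catalog. The catalog contents and function names are assumptions; translating the analysis text itself would require a translation model or service, which is not shown.

```python
# Hypothetical heading catalog keyed by language code.
HEADINGS = {
    "en": {"title": "Analysis report", "risks": "Risk items", "design": "Design elements"},
    "ja": {"title": "分析レポート", "risks": "リスク項目", "design": "設計要素"},
}


def localized_heading(key: str, language: str, fallback: str = "en") -> str:
    """Look up a report heading, falling back to English for unsupported languages."""
    catalog = HEADINGS.get(language, HEADINGS[fallback])
    return catalog.get(key, HEADINGS[fallback][key])


def render_skeleton(language: str) -> str:
    """Return the report's section headings in the requested language."""
    return "\n".join(localized_heading(k, language) for k in ("title", "risks", "design"))
```

The fallback keeps the report readable when a user's language has not been added to the catalog yet.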
[0175] Furthermore, the server uses the user's device camera and microphone to recognize the user's emotions in real time. By analyzing facial expressions and voice tone, it infers how the user is receiving the information. Based on this information, the server dynamically adjusts the presentation of reports and the dialogue style to ensure the user is comfortable receiving the information.
[0176] For example, using this system in international contract negotiations allows for the rapid identification of legal risks and facilitates smoother negotiations while taking into account the feelings of the other party's representatives.
[0177] An example of a prompt message would be, "Please identify the main risk items in this contract and suggest improvements if there are any deficiencies. Also, please summarize the key points included in the patent drawings." This allows the user to communicate specific analysis requests to the system through prompt messages.
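A prompt message of this kind has to be combined with the extracted material before it is sent to a generative model. The sketch below shows one way to do that; the section markers, the `build_prompt` name, and the character budget are assumptions rather than details from the disclosure.

```python
INSTRUCTION = (
    "Please identify the main risk items in this contract and suggest improvements "
    "if there are any deficiencies. Also, please summarize the key points included "
    "in the patent drawings."
)


def build_prompt(contract_text: str, drawing_notes: list[str], max_chars: int = 4000) -> str:
    """Attach extracted contract text and drawing notes to the instruction."""
    excerpt = contract_text[:max_chars]  # naive truncation to respect a length budget
    notes = "\n".join(f"- {n}" for n in drawing_notes) or "- (no drawings supplied)"
    return f"{INSTRUCTION}\n\n[Contract]\n{excerpt}\n\n[Patent drawing notes]\n{notes}"
```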
[0178] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0179] Step 1:
[0180] The user uses a terminal to select the documents they wish to analyze and upload them to the system. The input files are often in PDF or image format, and this information is sent to the server. The server receives the documents and prepares for the next processing step.
[0181] Step 2:
[0182] The server determines the format of the received document. Based on this determination, it uses OCR software to extract text information from images or PDFs if necessary. The input is an image or PDF, and the output is text data. This process extracts character data from visual information.
[0183] Step 3:
[0184] The server analyzes the extracted text data and uses natural language processing techniques to identify key risk items. The input is text data, and the output is analytical information including identified risk items and potential hazards. Processing includes keyword extraction and pattern recognition.
[0185] Step 4:
[0186] The server analyzes images contained in received documents and uses computer vision technology to find important design elements and similarities. The input is image data, and the output is the analysis results of design information and similarity with competing technologies. Feature points are extracted and analyzed using an image processing library.
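The similarity comparison can be sketched independently of the image library, assuming each drawing has already been reduced to a fixed-length feature vector (how that vector is produced is outside this sketch). Cosine similarity is used here as one common choice; the disclosure does not name a specific measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], references: dict[str, list[float]]) -> tuple[str, float]:
    """Return the reference drawing whose vector is closest to the query."""
    return max(
        ((name, cosine_similarity(query, vec)) for name, vec in references.items()),
        key=lambda pair: pair[1],
    )
```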
[0187] Step 5:
[0188] The server generates a report based on the analyzed data and provides it to the user. At this time, it utilizes multilingual support to create reports translated into different languages. The input is analyzed information, and the output is a report in a format that is easy for the user to understand. The report generation system systematically organizes the information.
[0189] Step 6:
[0190] The server uses the device's camera and microphone to analyze facial expressions and voice in real time to recognize the user's emotional state. The input is the user's voice and video data, and the output is an estimate of the user's emotional state. This allows the information provided to be tailored based on the user's current emotions.
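Combining the facial and voice signals into one estimate can be sketched as a weighted average of per-emotion scores. This assumes two upstream recognizers that each output a score per emotion label; the labels, the `fuse_emotions` name, and the 0.6 default weight are illustrative assumptions.

```python
def fuse_emotions(
    face: dict[str, float],
    voice: dict[str, float],
    face_weight: float = 0.6,
) -> tuple[str, float]:
    """Weighted average of per-emotion scores from two modalities."""
    labels = set(face) | set(voice)
    if not labels:
        return "unknown", 0.0
    fused = {
        label: face_weight * face.get(label, 0.0) + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    best = max(sorted(fused), key=fused.get)  # sorted() makes ties deterministic
    return best, fused[best]
```

Fusing scores after each recognizer has run keeps the two models independent, so one modality can be missing (for example, a muted microphone) without breaking the estimate.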
[0191] Step 7:
[0192] The server appropriately adjusts the presentation of materials and the content of dialogue based on the user's emotional state. The input is the result of an analysis of the emotional state, and the output is the adjusted information presentation and dialogue content. This optimizes the user experience and provides situation-appropriate feedback.
[0193] (Application Example 2)
[0194] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0195] This invention relates to an interactive system that combines document analysis and user emotion recognition. Conventional systems fail to adequately optimize the user experience because they do not consider user emotions when providing document analysis results. Therefore, there is a challenge in realizing flexible dialogue and information provision based on user emotions and improving convenience for users.
[0196] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0197] In this invention, the server includes means for receiving information, means for extracting textual information from the received information, means for analyzing the extracted textual information and identifying risk elements, means for analyzing diagrams and extracting important elements, means for generating and providing the analysis results as a report, means for providing additional information through dialogue with the user, and emotion recognition means for estimating the user's emotional state and adjusting the content of the dialogue accordingly. This enables the provision of information and dynamic adjustment of dialogue content in accordance with the user's emotions.
[0198] "Means of receiving information" refers to devices or software that have the function of acquiring data or documents transmitted electronically from an external source and enabling processing within the system.
[0199] "Means for extracting textual information" refers to a device or program that has the function of recognizing textual characters from received documents or images and extracting them as digital data.
[0200] "Means for identifying risk factors" refers to devices or software that have the function of analyzing extracted textual information and identifying risks or defects hidden within a document.
[0201] "Means for analyzing diagrams and extracting important elements" refers to a device or software that has the function of extracting important data such as design information and similarity by analyzing the graphic information contained in a document.
[0202] "Means of generating and providing a report" refers to a device or program that has the function of organizing and integrating analysis results and presenting them as information in a format that is easy for users to understand.
[0203] "Means of providing additional information through interaction with users" refers to devices or software that have the function of enabling a system to communicate interactively with users and appropriately provide necessary information and advice.
[0204] "Emotion recognition means" refers to a device or program that analyzes the user's facial expressions and tone of voice, infers their emotional state, and appropriately adjusts the content of the dialogue and the presentation of information based on that.
[0205] The system that realizes this invention relies primarily on three elements: a server, a terminal, and a user. The server plays a central role, handling various processes such as receiving information, extracting text information, identifying risk elements, analyzing diagrams, extracting important elements, and generating and providing the results as a report.
[0206] The server processes images from received documents using OpenCV and performs OCR on text using the Google Cloud Vision API. Furthermore, it has the capability to recognize user emotions in real time from facial expressions and voice using TensorFlow. This enables flexible dialogue and information provision based on the user's emotional state.
[0207] The terminal functions as a user input device, uploading and receiving information, and collecting data using the camera and microphone. This data is sent to a server, where it is analyzed and information is provided. Users can interact with the system through the terminal and obtain any additional information they need.
[0208] As a concrete example, if a user is interested in a particular product, the server will display a detailed description of that product and related products, adjusting the way information is presented based on the user's response. For instance, if it is determined that the user is hesitant about purchasing, the server may highlight the product's advantages and reviews from other users.
[0209] The generative AI model is used to further optimize information delivery using prompt messages. The following is an example of such a prompt message.
[0210] "Please enter the name of the product that the customer is considering purchasing. Considering the product's characteristics, please generate a reassuring description."
[0211] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0212] Step 1:
[0213] The terminal receives input from the user and uploads documents and information to the server. Input may include contracts and product information. The terminal receives this data and sends it to the server. The output is the data file sent to the server.
[0214] Step 2:
[0215] The server determines the format of the received document and performs OCR processing using the Google Cloud Vision API as needed. The input is an unformatted data file, and the output is data with the text extracted. The server converts this data into digital text and prepares it for analysis.
[0216] Step 3:
[0217] The server uses OpenCV to analyze images within a document and extract important elements. The input is a document containing image data, and the output is information about the important elements obtained from the image analysis. The server identifies the image information and performs further analysis based on it.
[0218] Step 4:
[0219] The server uses TensorFlow to analyze face and voice data sent from the device and estimate the user's emotional state in real time. The input is face and voice data sent from the device, and the output is the estimated emotional state. Based on the emotion recognition, the server prepares to adjust the dialogue.
[0220] Step 5:
[0221] The server uses natural language processing tools such as TextBlob to analyze the text information and identify risk factors and recommendations. The input consists of the text information together with the data obtained from the image analysis and the emotion data. Based on this, the server generates and outputs a report.
[0222] Step 6:
[0223] The server utilizes a generative AI model to provide optimal information using prompt messages. For example, it might create prompts such as, "Please enter the name of a product the customer is considering purchasing. Considering the product's characteristics, generate a reassuring description." The output is user-optimized information.
[0224] Step 7:
[0225] The user reviews the report through their device and asks additional questions or engages in dialogue as needed. The server receives real-time emotional feedback from the user and dynamically adjusts the dialogue to achieve the optimal user experience. The input is the user's feedback, and the output is the optimized dialogue.
[0226] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0227] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt and outputs the inference result in one or more data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0228] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0229] [Second Embodiment]
[0230] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0231] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0232] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0233] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0234] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0235] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0236] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0237] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0238] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0239] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0240] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0241] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0242] This invention is a system for efficiently analyzing contracts and patent drawings handled by legal and intellectual property departments, and for promoting risk management. Specifically, the present invention can be implemented as follows.
[0243] Users using a terminal upload contracts or patent drawings they wish to have analyzed to the system. This sends the document files to the server.
[0244] The server begins processing the received document for analysis. It checks the document format and, if necessary, applies OCR (Optical Character Recognition) to extract text. This text extraction method ensures that all text, including text information within images, is converted into text without omission.
[0245] Next, the server analyzes the text extracted using text analysis tools with natural language processing technology. Here, important risk information is automatically extracted by identifying risk clauses within the contract. For example, important clauses such as "payment terms" and "contract termination" are identified.
[0246] In parallel, the server runs image analysis tools on the patent drawings. Using computer vision technology, it identifies design elements within the images and extracts drawing features that may be relevant to patent examination. This analysis helps to resolve open design questions and to check for similarities with competing patents.
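Running the text analysis and the image analysis in parallel can be sketched with a thread pool. The two analysis functions are passed in as parameters because their implementations are outside this sketch; `analyze_in_parallel` and the result keys are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def analyze_in_parallel(
    text: str,
    image: bytes,
    analyze_text: Callable[[str], list[str]],
    analyze_image: Callable[[bytes], list[str]],
) -> dict[str, list[str]]:
    """Submit both analyses at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(analyze_text, text)
        image_future = pool.submit(analyze_image, image)
        # result() blocks until the task finishes and re-raises any exception.
        return {"risks": text_future.result(), "design": image_future.result()}
```

Threads suit this when each analysis mostly waits on an external service or a native library; CPU-bound pure-Python analyses would more likely need separate processes.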
[0247] The server aggregates the results of text and image analysis and generates user-friendly reports. This report generation method allows users to efficiently grasp important information. The reports include extracted risk information, design points of interest, and recommended countermeasures as needed.
[0248] Furthermore, users can interactively engage with the AI to ask questions about the analysis results and obtain additional information. This interactive method can deepen the user's understanding and support their decision-making.
[0249] Because this system provides multilingual support on a single platform and is configured for international use, it can be used in the risk management operations of many companies, both domestic and overseas. For example, it is effective when a company concludes international trade agreements and needs to analyze contract documents in various languages and manage the associated risks.
[0250] The following describes the processing flow.
[0251] Step 1:
[0252] The user operates the terminal, specifies the contracts or patent drawings they want to analyze, and uploads them to the server. The user selects files through the system interface and transfers the data to the server by clicking the send button.
[0253] Step 2:
[0254] The server receives the uploaded document. After receiving it, it automatically determines the document format (PDF, PNG, etc.) and, if necessary, extracts text from the image using OCR technology. This process makes all text information within the document available for analysis.
[0255] Step 3:
[0256] The server uses text analysis tools to perform a detailed analysis of the text data extracted by OCR processing. It employs natural language processing techniques to identify specific risk-related clauses from the contract and pinpoint the risk clauses.
[0257] Step 4:
[0258] At the same time, the server runs image analysis tools on the patent drawings. Using computer vision technology, it analyzes structures and design elements within the drawings and identifies key points. This process makes it possible to detect design flaws and potentially competing patents.
[0259] Step 5:
[0260] The server integrates the results of text and image analysis and generates a user-friendly report. The report generation mechanism organizes the analysis results in an easy-to-understand format and provides them to the user.
[0261] Step 6:
[0262] The user receives the generated report on their terminal and reviews it. If necessary, the user can interact with the AI through the interface to request a detailed explanation of the analysis results or to view additional risk information.
[0263] Step 7:
[0264] The server stores the analysis results and the user's interaction history, and extends its multilingual support so that it can meet users' diverse language needs. At this stage, the analysis data is managed so that it can be reused later.
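Storing analysis results together with the interaction history can be sketched with an embedded database. The table layout and function names below are assumptions made for illustration; the disclosure does not specify a storage format.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the two tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis ("
        "id INTEGER PRIMARY KEY, document TEXT, language TEXT, report TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dialogue ("
        "id INTEGER PRIMARY KEY, analysis_id INTEGER REFERENCES analysis(id), "
        "role TEXT, message TEXT)"
    )
    return conn


def save_analysis(conn: sqlite3.Connection, document: str, language: str, report: str) -> int:
    """Insert one analysis result and return its row id."""
    cur = conn.execute(
        "INSERT INTO analysis (document, language, report) VALUES (?, ?, ?)",
        (document, language, report),
    )
    conn.commit()
    return cur.lastrowid


def log_turn(conn: sqlite3.Connection, analysis_id: int, role: str, message: str) -> None:
    """Append one dialogue turn linked to an analysis."""
    conn.execute(
        "INSERT INTO dialogue (analysis_id, role, message) VALUES (?, ?, ?)",
        (analysis_id, role, message),
    )
    conn.commit()


def history(conn: sqlite3.Connection, analysis_id: int) -> list[tuple[str, str]]:
    """Return the dialogue turns for one analysis in insertion order."""
    return conn.execute(
        "SELECT role, message FROM dialogue WHERE analysis_id = ? ORDER BY id",
        (analysis_id,),
    ).fetchall()
```

Recording the report language alongside each result makes it possible to serve the same analysis again in another language later.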
[0265] (Example 1)
[0266] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0267] Conventional document analysis systems performed text and image analysis separately, resulting in the loss of some information when integrating the results. Furthermore, analyzing international documents requiring multilingual support and optical character recognition presented challenges in balancing processing complexity and accuracy. Additionally, the lack of interactive information provision tools for efficiently utilizing analysis results made it difficult to effectively support user decision-making.
[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0269] In this invention, the server includes receiving means for receiving information, discrimination means for determining the format of the information and applying optical character recognition as necessary, and multilingual support means using natural language processing technology to process the extracted character data. This enables highly accurate analysis regardless of the format of the information and can be used internationally. Furthermore, by integrating the analysis results and providing them interactively, it becomes possible to effectively support the user's decision-making.
[0270] "Receiving means" refers to a function for acquiring information from an external source and incorporating it into the system.
[0271] "Character data extraction means" refers to a function that identifies character strings in received information and extracts them as electronic data.
[0272] "Risk factor identification means" refers to a function that analyzes extracted text data and automatically identifies potential risks and important clauses.
[0273] "Image analysis means" refers to a function that processes visual information and extracts important information and features contained within an image.
[0274] "Report generation means" refers to a function that integrates analysis results and creates a report in an easy-to-understand format.
[0275] "Dialogue means" refers to a function that provides information interactively through interaction with the user, thereby deepening the user's understanding.
[0276] "Discrimination means" refers to a function that recognizes the format of received information and selects the appropriate processing method.
[0277] "Optical character recognition (OCR)" refers to a technology that analyzes character information contained in images and other data and converts it into text data.
[0278] "Multilingual support means" refers to a function that converts analysis results into multiple languages to accommodate users who use different languages.
[0279] "Element identification means" refers to a function that identifies specific design elements from drawings and other information, and evaluates their similarities and relationships.
[0280] A "generative artificial intelligence model" is an advanced computer algorithm that generates and provides information based on user requests.
[0281] In this invention, an information system is used to efficiently analyze contract documents and patent drawings to support risk management. Specifically, each element of the server, terminal, and user operates in cooperation.
[0282] The user uploads the document to be analyzed to the system through the terminal, and the document is thereby sent to the server. The server first checks the format of the received document using the discrimination means. If necessary, optical character recognition (OCR) is applied to extract character data from the file. Tesseract OCR is one example of software that can be used for this purpose.
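As a non-limiting illustration, the format check and the decision to apply OCR described above might be sketched as follows. The signature table, the function names `detect_format` and `needs_ocr`, and the `has_text_layer` parameter are assumptions introduced for this sketch and do not appear in the disclosure; the actual OCR call (for example to Tesseract OCR) is left out.

```python
from typing import Optional

# Leading "magic" bytes for a few common formats (illustrative, not exhaustive).
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "docx",  # DOCX is a ZIP container; other ZIP files match too
}

IMAGE_FORMATS = {"jpeg", "png"}


def detect_format(data: bytes) -> Optional[str]:
    """Return a format label based on the file's leading bytes, or None."""
    for signature, label in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return label
    return None


def needs_ocr(fmt: Optional[str], has_text_layer: bool = False) -> bool:
    """Decide whether the OCR flag should be set for a detected format."""
    if fmt in IMAGE_FORMATS:
        return True  # pure images carry no machine-readable text
    if fmt == "pdf":
        return not has_text_layer  # scanned PDFs need OCR, digital ones do not
    return False  # e.g. DOCX already contains text; unknown formats are skipped


fmt = detect_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
print(fmt, needs_ocr(fmt))
```

Checking the leading bytes rather than the file extension is one possible design choice; it keeps the decision independent of how the uploaded file happens to be named.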
[0283] The server applies a generative AI model and natural language processing (NLP) technology to the extracted character data. As a result, risk factors can be identified even in particularly complex contract documents. Tools such as the spaCy library and BERT-based language models are applicable here, for example for behavior analysis and legal risk assessment.
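For illustration only, the identification of risk factors from extracted character data can be approximated by a simple pattern matcher. This is a deliberately simplified stand-in for the generative AI model and NLP technology named above, not the disclosed method; the category names, the trigger patterns, and the `RiskItem` type are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical risk categories and trigger patterns. A real deployment would
# replace this table with an NLP pipeline or a generative AI model.
RISK_PATTERNS = {
    "payment terms": re.compile(r"\b(payment|invoice|late fee)\b", re.IGNORECASE),
    "contract termination": re.compile(r"\b(terminat\w*|cancel\w*)\b", re.IGNORECASE),
}


@dataclass
class RiskItem:
    category: str
    sentence: str


def identify_risks(text: str) -> list[RiskItem]:
    """Return one RiskItem for every sentence/category pair that matches."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    found = []
    for sentence in sentences:
        for category, pattern in RISK_PATTERNS.items():
            if pattern.search(sentence):
                found.append(RiskItem(category, sentence))
    return found


sample = ("Payment is due within 30 days. "
          "Either party may terminate this agreement without notice.")
for item in identify_risks(sample):
    print(f"[{item.category}] {item.sentence}")
```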
[0284] At the same time, the server makes full use of computer vision technology for drawing analysis to identify design elements from patent drawings. In this process, OpenCV and TensorFlow are utilized. The server applies machine learning algorithms to identify important design structures and features and compare them with other similar designs.
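The comparison with other similar designs can be illustrated, under simplifying assumptions, by ranking stored designs by the cosine similarity of their feature vectors. The sketch assumes that each drawing has already been reduced to a fixed-length numeric descriptor (for example by a computer vision library); how that descriptor is computed is outside the sketch, and the function names and catalog entries are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero descriptor as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_similar_designs(query: list[float], catalog: dict[str, list[float]]):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


catalog = {"design_A": [1.0, 0.0, 1.0], "design_B": [0.0, 1.0, 0.0]}
for name, score in rank_similar_designs([1.0, 0.0, 1.0], catalog):
    print(f"{name}: {score:.3f}")
```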
[0285] When the analysis is completed, the server generates a report in a format that is easy for the user to understand. This report often includes the extracted risk factors, design highlights, and recommended countermeasures. The report is usually provided in PDF format or can be viewed through a web interface.
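One possible way to assemble such a report is sketched below as plain text. The `AnalysisResult` structure, the section headings, and the placeholder for empty sections are assumptions made for this illustration; rendering to PDF or to a web interface is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    risk_items: list[str] = field(default_factory=list)
    design_highlights: list[str] = field(default_factory=list)
    countermeasures: list[str] = field(default_factory=list)


def render_report(result: AnalysisResult, title: str = "Analysis Report") -> str:
    """Render the three report sections as a plain-text document."""
    sections = [
        ("Risk factors", result.risk_items),
        ("Design highlights", result.design_highlights),
        ("Recommended countermeasures", result.countermeasures),
    ]
    lines = [title, "=" * len(title)]
    for heading, items in sections:
        lines.append("")
        lines.append(heading)
        # An empty section is made explicit rather than silently dropped.
        lines.extend(f"  - {item}" for item in (items or ["(none identified)"]))
    return "\n".join(lines)


print(render_report(AnalysisResult(
    risk_items=["Late payment penalty is uncapped"],
    countermeasures=["Negotiate a cap on late payment penalties"],
)))
```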
[0286] Furthermore, the user can interact with the server using prompt sentences related to the generated report, which helps to elicit additional information and deepen the understanding of risk management. Examples of specific prompt sentences include inquiries such as "Please list the risk items in this contract" and "What are the characteristic design elements in the patent drawings?"
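As a hedged sketch, a prompt sentence entered by the user might be combined with the generated report before being passed to the generative AI model. The wording of the instruction text, the delimiter lines, and the `max_report_chars` limit are assumptions; the call to the model itself is intentionally omitted because the disclosure does not specify an interface for it.

```python
def build_prompt(report_text: str, question: str, max_report_chars: int = 4000) -> str:
    """Combine the report and the user's question into one prompt string."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # Truncate very long reports so the prompt stays within a fixed budget.
    context = report_text[:max_report_chars]
    return (
        "You are assisting with contract and patent-drawing risk review.\n"
        "Answer using only the report below.\n\n"
        f"--- REPORT ---\n{context}\n--- END REPORT ---\n\n"
        f"Question: {question}"
    )


print(build_prompt("Risk: uncapped late payment penalty.",
                   "Please list the risk items in this contract"))
```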
[0287] This system enables the efficient analysis of documents in the legal and intellectual property departments and can also handle international risk management operations in multiple languages. The ability to flexibly analyze various document formats and information languages is a major feature of this invention.
[0288] The flow of the specific process in Example 1 will be described using Figure 11.
[0289] Step 1:
[0290] The user selects the document to be analyzed on the terminal and uploads it to the system. The inputs include files of contracts and patent drawings. As a result, the files are sent to the server.
[0291] Step 2:
[0292] The server uses the discrimination means to check the format of the received document. The input is the received document, and the output is the result of the format determination. At this stage, various file formats such as PDF, DOCX, and JPEG are supported. Depending on the determined format, the server sets a flag indicating whether OCR is to be applied.
[0293] Step 3:
[0294] The server extracts character data using OCR technology according to the format. The inputs are the determined document and the OCR application flag, and the output is the extracted character data. Specifically, optical character recognition for obtaining text from image-based files is executed.
[0295] Step 4:
[0296] The server analyzes the extracted text data using a generative AI model and natural language processing technology. The input is text data, and the output is a list of risk elements. This process identifies risk items such as "payment terms" and "contract termination" from contract documents.
[0297] Step 5:
[0298] The server analyzes patent drawings using image analysis technology. The input is the data of the patent drawing, and the output is identified design elements. Computer vision algorithms are used to identify design elements within the image and verify their similarity.
[0299] Step 6:
[0300] The server compiles the results of text and image analysis and generates a report using a report generation system. The input is the analysis results, and the output is the final report. The report includes extracted risk information and key design considerations.
[0301] Step 7:
[0302] The user receives the generated report and interacts with the server using prompts as needed. Input consists of user inquiries, and output consists of additional information and advice. Through this interaction, the user gains further insights into risk management.
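Steps 2 to 6 above can be viewed as a chain of interchangeable stages. The following sketch, which is only one possible arrangement, wires the stages together through callables so that each one can be replaced independently; the `Pipeline` class and the trivial stand-in functions in the demonstration are hypothetical, and Step 7 (dialogue) is not included.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    needs_ocr: Callable[[bytes], bool]              # Step 2: format determination
    ocr: Callable[[bytes], str]                     # Step 3: OCR path
    decode_text: Callable[[bytes], str]             # Step 3: path without OCR
    find_risks: Callable[[str], list]               # Step 4: text analysis
    find_design_elements: Callable[[bytes], list]   # Step 5: image analysis
    make_report: Callable[[list, list], str]        # Step 6: report generation

    def run(self, document: bytes, drawing: bytes) -> str:
        if self.needs_ocr(document):
            text = self.ocr(document)
        else:
            text = self.decode_text(document)
        risks = self.find_risks(text)
        elements = self.find_design_elements(drawing)
        return self.make_report(risks, elements)


# Trivial stand-ins so the flow can be exercised end to end.
demo = Pipeline(
    needs_ocr=lambda data: data.startswith(b"\xff\xd8\xff"),  # JPEG signature
    ocr=lambda data: "<text recognized by OCR>",
    decode_text=lambda data: data.decode("utf-8", errors="replace"),
    find_risks=lambda text: [w for w in ("payment", "termination") if w in text.lower()],
    find_design_elements=lambda image: ["outline"] if image else [],
    make_report=lambda risks, elements: f"risks={risks}; design_elements={elements}",
)
print(demo.run(b"Payment is due on termination.", b"\x00"))
```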
[0303] (Application Example 1)
[0304] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0305] In a modern business environment, it is important to analyze documents such as contracts and patent drawings quickly and accurately and to evaluate risks so that business operations run smoothly. However, manually analyzing a large volume of documents requires time and effort and can lead to mistakes. In addition, international transactions require multilingual support, which further increases complexity. Therefore, there is a need for an automated system that can solve these problems efficiently on a portable device.
[0306] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0307] In this invention, the server includes means for receiving data, information extraction means for extracting information from the received data, and element identification means for analyzing the extracted information and identifying risk factors. As a result, the user can easily upload contracts and patent drawings from a portable device and quickly grasp the risks. In addition, by providing report generation in multiple languages and an interactive dialogue function, it can also be used in an international business environment.
[0308] The "means for receiving data" is a device or program that acquires information such as contracts and patent drawings provided by the user.
[0309] The "information extraction means" is a technology or device that extracts necessary text and image information from the received data.
[0310] The "element identification means" is a technology or process for analyzing the extracted information and identifying risks and important elements.
[0311] The "visual analysis means" is a technology that analyzes images in a document and extracts important features and information related to patent examination.
[0312] The "report generation means" is a device or program that summarizes the results in an easy-to-understand form for the user based on the analysis results.
[0313] A "means of dialogue" refers to a function or device that allows a user to communicate interactively with a system and obtain additional information.
[0314] "Means for operating on portable devices" refers to designs and programs that run on portable devices such as smartphones and tablets.
[0315] "Optical recognition processing" is a technology that extracts text from documents and images as digital data.
[0316] "Language support means" refers to a technology or program that makes analysis results available in multiple languages.
[0317] In the system implementing this invention, the user takes photos of contracts and patent drawings using a portable device such as a smartphone or tablet, and the data is received. The received data is then processed on the server as follows.
[0318] The server uses software written in Python to determine the type of the received data and, if necessary, performs optical recognition processing using Tesseract OCR. This extracts text information from the image. The extracted text is then analyzed through natural language processing with spaCy to identify risks and important elements.
[0319] Subsequently, visual analysis is performed using OpenCV to extract important design elements from the image information of the patent data. The analysis results are generated in a user-friendly report format and sent to the user's device.
[0320] Users can also obtain additional information using a dialogue system powered by a generative AI model. This interactive dialogue feature allows users to ask questions about the analysis and obtain further information.
[0321] As a concrete example, a businessperson handling international transactions can, while on a business trip, photograph a new contract with a smartphone, have a risk assessment performed, and take appropriate action on the spot. In this case, the following prompt can be used:
[0322] "Please analyze the new contract and identify the risk factors. Specifically, I'd like to know about important clauses regarding payment terms and contract termination. Please display the results in a report on my smartphone."
[0323] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0324] Step 1:
[0325] Users use a portable device to photograph the contracts and patent drawings they wish to analyze and upload them to the server via the device. The input is an image file of the contract or patent drawing, and the output is the unprocessed document data transferred to the server.
[0326] Step 2:
[0327] The server checks the format of the received data and performs optical recognition processing using Tesseract OCR as needed. The input is image data sent to the server, and the output is information converted into text by OCR. At this stage, data processing involves extracting character information from the image and generating text data.
[0328] Step 3:
[0329] The server analyzes the extracted text data through natural language processing with spaCy to identify risk elements in the contract. The input is text data obtained by OCR, and the output is the identified risk items. At this stage, keywords and risk-related context within the text are examined, and important elements are extracted.
[0330] Step 4:
[0331] The server uses OpenCV to perform image analysis on patent drawings and extract design elements and important information. The input is image data of the patent drawings, and the output is information on design elements related to patent examination. Here, the features of the images are analyzed, and useful information is organized as data.
[0332] Step 5:
[0333] The server combines the results of text and image analysis to generate a user-friendly report. The input consists of data on risk and design elements, while the output is a detailed analysis report. At this stage, the analysis results are integrated, organized, and displayed in report form.
[0334] Step 6:
[0335] Users can use a dialogue function powered by a generative AI model to communicate interactively with the server and obtain additional information. Input is a prompt or question from the user, and output is the AI's answer or additional information. Through the dialogue, users can obtain even more detailed information.
[0336] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0337] This invention is a system that, in addition to analyzing documents, can recognize user emotions in real time and adjust the presentation of results and dialogue based on that emotional information. A specific embodiment is shown below.
[0338] Users upload contracts and patent drawings they wish to have analyzed via their terminal. The uploaded documents are sent to the server, which receives them using a receiving device.
[0339] The server first checks the format of the received document and applies OCR processing as needed. This extracts all the text information from the document.
[0340] After text extraction is complete, the server analyzes the text data to identify key risk items. Risk identification tools are then used to identify potential risks and deficiencies within the contract.
[0341] Furthermore, the server uses image analysis tools to analyze the design information contained in the patent drawings. It utilizes computer vision technology to identify important design elements and similarities with competing patents within the drawings.
[0342] These analysis results are integrated and organized by a report generation system and provided to the user in an easy-to-understand format. The report includes risk items, important design information, and related recommendations.
[0343] Furthermore, this system incorporates an emotion engine on the server that recognizes the user's emotions in real time as they review reports. This emotion engine analyzes the user's facial expressions and tone of voice to infer their current emotional state.
[0344] Based on the user's perceived emotions, the server dynamically adjusts the presentation of reports and the tone of dialogue. For example, if the user is stressed, the results will be presented more concisely and positively to provide reassurance. Furthermore, emotional feedback allows for flexible modification of responses through dialogue channels, improving the user experience.
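The adjustment described above may be illustrated, purely as an example, by a function that shortens and softens the presentation when the estimated emotion indicates stress. The emotion labels "stressed" and "anxious", the item limit, and the wording of the headers are assumptions for this sketch; the emotion estimate itself is taken as an input.

```python
def adjust_presentation(findings: list[str], emotion: str,
                        max_items_when_stressed: int = 3) -> str:
    """Return a presentation of the findings adapted to the estimated emotion."""
    if emotion in {"stressed", "anxious"}:
        # Concise and reassuring: show only the first few items.
        shown = findings[:max_items_when_stressed]
        header = "Here are the key points. Each one has a clear next step:"
        hidden = len(findings) - len(shown)
        footer = f"({hidden} further item(s) available on request.)" if hidden else ""
    else:
        shown = findings
        header = "Full list of findings:"
        footer = ""
    lines = [header] + [f"- {item}" for item in shown]
    if footer:
        lines.append(footer)
    return "\n".join(lines)


findings = ["Uncapped penalty", "Short notice period", "Unclear governing law", "No audit right"]
print(adjust_presentation(findings, "stressed"))
```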
[0345] In this way, by realizing sophisticated interactions that incorporate emotions, it is possible to promote user understanding and maximize the utilization of analysis results. For example, when using the system in the context of international contract negotiations, it becomes possible to quickly grasp legal risks while simultaneously responding in a way that takes into account the emotions of the person in charge in the other country.
[0346] The following describes the processing flow.
[0347] Step 1:
[0348] The user uses a terminal to select the contracts and patent drawings to be analyzed and uploads them to the system. Specifically, the user specifies the relevant file in the operation interface and presses the send button, at which point the document is transferred to the server.
[0349] Step 2:
[0350] The server receives the uploaded document via a receiving device. It determines the format of the received document (PDF, JPEG, etc.) and, if necessary, extracts the text from the document using OCR technology. This allows text information to be obtained from the image.
[0351] Step 3:
[0352] The server uses text extraction methods to analyze the extracted text data. Natural language processing techniques are used to identify risk items within the contract. In this process, risk clauses and inappropriate conditions are identified through machine learning algorithms.
[0353] Step 4:
[0354] The server uses image analysis techniques to analyze patent drawings. Computer vision is used to identify design features and evaluate the novelty and similarity of the patent. This clarifies the relevant technical elements.
[0355] Step 5:
[0356] The server generates a comprehensive report based on the analysis results using a report generation system. The report includes extracted risk clauses, image analysis results, and proposed action plans. This report is designed to support user decision-making.
[0357] Step 6:
[0358] The user receives and reviews the generated report on their device. Simultaneously, the emotion engine recognizes the user's emotions. Using the device's camera and microphone, it analyzes the user's facial expressions and tone of voice to evaluate their emotional state in real time.
[0359] Step 7:
[0360] The server adjusts the presentation of the report according to the user's emotions as estimated by the emotion engine. For example, if anxiety is detected, the content is modified to emphasize positive language and provide a sense of reassurance. The content of the interactive dialogue is also changed based on the emotional state.
[0361] Step 8:
[0362] Based on reports tailored to their emotions, users can consider necessary actions and engage in further dialogue with the AI through the server. This allows users to make the most of the analysis results and supports quick and effective decision-making.
[0363] (Example 2)
[0364] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0365] Traditional systems only analyze information within documents and do not provide information based on the user's emotions. As a result, users may experience stress and anxiety. Furthermore, providing analysis results in different languages is difficult, limiting global use.
[0366] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0367] In this invention, the server includes means for receiving information, means for recognizing the user's emotional state and adjusting the analysis results, and means for providing the analysis results in multiple languages. This enables information provision that takes the user's emotions into consideration and allows for global use.
[0368] "Means of receiving information" refers to the function of allowing a server to receive input information from an external source.
[0369] "Means for extracting character information" refers to a function that processes information to obtain string data from received information.
[0370] "Means for identifying hazards" refers to a function that analyzes extracted textual information to identify potential risks.
[0371] "Means for analyzing visual information" refers to functions for detecting and analyzing important elements from image data contained in information.
[0372] "Means of generating and providing reports" refers to a function that organizes analysis results and outputs them in a format that is easy for users to understand.
[0373] "Means for recognizing the user's emotional state" refers to a function that analyzes the user's emotions in real time and provides information appropriate to that situation.
[0374] "Means for interacting with users, providing additional information, and coordinating feedback" refers to functions that interact with users, provide additional information as needed, and adjust the content.
[0375] "Means for performing character recognition processing" refers to a function that applies OCR technology to extract characters according to the format of the received document.
[0376] "Means of providing analysis results in multiple languages" refers to a function that translates the analyzed information into multiple languages and presents it to the user.
[0377] This invention provides a system that allows users to easily analyze documents and receive emotion-based feedback. Specifically, users upload the documents they wish to have analyzed to the system using a terminal. These documents can be in the format of contracts or patent drawings.
[0378] First, the server receives the document using a receiving device and determines its format. If necessary, it extracts text information using OCR software (e.g., Tesseract). This process allows text information to be obtained even from documents uploaded in image format.
[0379] Next, the server performs analysis on the extracted text data using natural language processing. This makes it possible to identify risk items hidden in the contract and identify potential dangers. In addition, computer vision technology (e.g., OpenCV) is used to analyze design elements in patent drawings and their similarities to competing technologies.
[0380] The server generates a report based on the analysis results and provides it to users in an easy-to-understand manner through its multilingual support function. This ensures that users from different language regions can fully understand the analysis results.
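A minimal sketch of the multilingual support function is given below, limited to switching the fixed section headings of the report by language code. The heading table, the language codes, and the fallback to English are assumptions; translating the analysis findings themselves would require a translation model, which this sketch does not include.

```python
# Hypothetical heading table; only the fixed labels are localized here.
HEADINGS = {
    "en": {"risks": "Risk items", "design": "Design elements"},
    "ja": {"risks": "リスク項目", "design": "設計要素"},
}


def localized_report(risks: list[str], design: list[str], lang: str = "en") -> str:
    """Render the report with section headings in the requested language."""
    labels = HEADINGS.get(lang, HEADINGS["en"])  # fall back to English
    lines = [labels["risks"]]
    lines += [f"- {item}" for item in risks]
    lines.append(labels["design"])
    lines += [f"- {item}" for item in design]
    return "\n".join(lines)


print(localized_report(["Uncapped penalty"], ["Hinge layout"], lang="ja"))
```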
[0381] Furthermore, the server uses the user's device camera and microphone to recognize the user's emotions in real time. By analyzing facial expressions and voice tone, it infers how the user is receiving the information. Based on this information, the server dynamically adjusts the presentation of reports and the dialogue style to ensure the user is comfortable receiving the information.
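Combining facial expressions and voice tone into one estimate can be done in many ways; the sketch below uses a weighted average of per-emotion scores from the two sources (often called late fusion), which is an assumption of this illustration rather than a technique named in the disclosure. The score dictionaries are taken as given, since the recognition models that produce them are outside the sketch.

```python
def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.5) -> tuple[str, dict[str, float]]:
    """Blend two per-emotion score tables and return the top label and all scores."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face) | set(voice)
    if not labels:
        raise ValueError("at least one emotion score is required")
    combined = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(combined.values())
    if total > 0:
        combined = {label: score / total for label, score in combined.items()}
    # Sorting first makes ties resolve deterministically (alphabetical order).
    top = max(sorted(combined), key=combined.get)
    return top, combined


label, scores = fuse_emotions({"calm": 0.2, "anxious": 0.8},
                              {"calm": 0.6, "anxious": 0.4})
print(label, {k: round(v, 2) for k, v in sorted(scores.items())})
```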
[0382] For example, using this system in international contract negotiations allows for the rapid identification of legal risks and facilitates smoother negotiations while taking into account the feelings of the other party's representatives.
[0383] An example of a prompt message would be, "Please identify the main risk items in this contract and suggest improvements if there are any deficiencies. Also, please summarize the key points included in the patent drawings." This allows the user to communicate specific analysis requests to the system through prompt messages.
[0384] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0385] Step 1:
[0386] The user uses a terminal to select the documents they wish to analyze and upload them to the system. The input files are often in PDF or image format, and this information is sent to the server. The server receives the documents and prepares for the next processing step.
[0387] Step 2:
[0388] The server determines the format of the received document. Based on this determination, it uses OCR software to extract text information from images or PDFs if necessary. The input is an image or PDF, and the output is text data. This process extracts character data from visual information.
[0389] Step 3:
[0390] The server analyzes the extracted text data and uses natural language processing techniques to identify key risk items. The input is text data, and the output is analytical information including identified risk items and potential hazards. Processing includes keyword extraction and pattern recognition.
[0391] Step 4:
[0392] The server analyzes images contained in received documents and uses computer vision technology to find important design elements and similarities. The input is image data, and the output is the analysis results of design information and similarity with competing technologies. Feature points are extracted and analyzed using an image processing library.
[0393] Step 5:
[0394] The server generates a report based on the analyzed data and provides it to the user. At this time, it utilizes multilingual support to create reports translated into different languages. The input is analyzed information, and the output is a report in a format that is easy for the user to understand. The report generation system systematically organizes the information.
[0395] Step 6:
[0396] The server uses the device's camera and microphone to analyze facial expressions and voice in real time to recognize the user's emotional state. The input is the user's voice and video data, and the output is an estimate of the user's emotional state. This allows the information provided to be tailored based on the user's current emotions.
[0397] Step 7:
[0398] The server appropriately adjusts the presentation of materials and the content of dialogue based on the user's emotional state. The input is the result of an analysis of the emotional state, and the output is the adjusted information presentation and dialogue content. This optimizes the user experience and provides situation-appropriate feedback.
[0399] (Application Example 2)
[0400] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".
[0401] This invention relates to an interactive system that combines document analysis and user emotion recognition. Conventional systems fail to adequately optimize the user experience because they do not consider user emotions when providing document analysis results. Therefore, there is a challenge in realizing flexible dialogue and information provision based on user emotions and improving convenience for users.
[0402] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0403] In this invention, the server includes means for receiving information, means for extracting textual information from the received information, means for analyzing the extracted textual information and identifying risk elements, means for analyzing diagrams and extracting important elements, means for generating and providing the analysis results as a report, means for providing additional information through dialogue with the user, and emotion recognition means for estimating the user's emotional state and adjusting the content of the dialogue accordingly. This enables the provision of information and dynamic adjustment of dialogue content in accordance with the user's emotions.
[0404] "Means of receiving information" refers to devices or software that have the function of acquiring data or documents transmitted electronically from an external source and enabling processing within the system.
[0405] "Means for extracting textual information" refers to a device or program that has the function of recognizing textual characters from received documents or images and extracting them as digital data.
[0406] "Means for identifying risk factors" refers to devices or software that have the function of analyzing extracted textual information and identifying risks or defects hidden within a document.
[0407] "Means for analyzing diagrams and extracting important elements" refers to a device or software that has the function of extracting important data such as design information and similarity by analyzing the graphic information contained in a document.
[0408] "Means of generating and providing a report" refers to a device or program that has the function of organizing and integrating analysis results and presenting them as information in a format that is easy for users to understand.
[0409] "Means of providing additional information through interaction with users" refers to devices or software that have the function of enabling a system to communicate interactively with users and appropriately provide necessary information and advice.
[0410] "Emotion recognition means" refers to a device or program that analyzes the user's facial expressions and tone of voice, infers their emotional state, and appropriately adjusts the content of the dialogue and the presentation of information based on that.
[0411] The system that realizes this invention relies primarily on three elements: a server, a terminal, and a user. The server plays a central role, handling various processes such as receiving information, extracting text information, identifying risk elements, analyzing diagrams, extracting important elements, and generating and providing the results as a report.
[0412] The server processes images from received documents using OpenCV and performs OCR on text using the Google Cloud Vision API. Furthermore, it has the capability to recognize user emotions in real time from facial expressions and voice using TensorFlow. This enables flexible dialogue and information provision based on the user's emotional state.
[0413] The terminal functions as a user input device, uploading and receiving information, and collecting data using the camera and microphone. This data is sent to a server, where it is analyzed and information is provided. Users can interact with the system through the terminal and obtain any additional information they need.
[0414] As a concrete example, if a user is interested in a particular product, the server will display a detailed description of that product and related products, adjusting the way information is presented based on the user's response. For instance, if it is determined that the user is hesitant about purchasing, the server may highlight the product's advantages and reviews from other users.
[0415] The generative AI model is used to further optimize information delivery using prompt messages. The following is an example of such a prompt message.
[0416] "Please enter the name of the product that the customer is considering purchasing. Considering the product's characteristics, please generate a reassuring description."
[0417] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0418] Step 1:
[0419] The terminal receives input from the user and uploads documents and information to the server. Input may include contracts and product information. The terminal receives this data and sends it to the server. The output is the data file sent to the server.
[0420] Step 2:
[0421] The server determines the format of the received document and performs OCR processing using the Google Cloud Vision API as needed. The input is an unformatted data file, and the output is data with the text extracted. The server converts this data into digital text and prepares it for analysis.
[0422] Step 3:
[0423] The server uses OpenCV to analyze images within a document and extract important elements. The input is a document containing image data, and the output is information about the important elements obtained from the image analysis. The server identifies the image information and performs further analysis based on it.
[0424] Step 4:
[0425] The server uses TensorFlow to analyze face and voice data sent from the device and estimate the user's emotional state in real time. The input is face and voice data sent from the device, and the output is the estimated emotional state. Based on the emotion recognition, the server prepares to adjust the dialogue.
[0426] Step 5:
[0427] The server uses natural language processing tools such as TextBlob to analyze the textual information and identify risk factors and recommendations. The input consists of the textual information together with the data obtained from the image analysis and the emotion data. Based on this, the server generates and outputs a report.
[0428] Step 6:
[0429] The server utilizes a generative AI model to provide optimal information using prompt messages. For example, it might create prompts such as, "Please enter the name of a product the customer is considering purchasing. Considering the product's characteristics, generate a reassuring description." The output is user-optimized information.
[0430] Step 7:
[0431] The user reviews the report through their device and asks additional questions or engages in dialogue as needed. The server receives real-time emotional feedback from the user and dynamically adjusts the dialogue to achieve the optimal user experience. The input is the user's feedback, and the output is the optimized dialogue.
[0432] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0433] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
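The input to the data generation model 58, namely a prompt together with inference data of several kinds, could be represented for illustration by a small request structure such as the following. The class names, field names, and validation rules are hypothetical and do not correspond to the interface of any particular generative AI service.

```python
from dataclasses import dataclass

ALLOWED_MODALITIES = ("audio", "text", "image")


@dataclass(frozen=True)
class InferenceData:
    modality: str   # one of ALLOWED_MODALITIES
    payload: bytes  # raw audio, encoded text, or image bytes


@dataclass(frozen=True)
class InferenceRequest:
    prompt: str                       # instructions for the model
    data: tuple[InferenceData, ...]   # inference data accompanying the prompt

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must contain instructions")
        for item in self.data:
            if item.modality not in ALLOWED_MODALITIES:
                raise ValueError(f"unsupported modality: {item.modality}")

    def modalities(self) -> list[str]:
        """List the distinct kinds of inference data attached to the request."""
        return sorted({item.modality for item in self.data})


request = InferenceRequest(
    prompt="Summarize the risk items in the attached contract.",
    data=(InferenceData("text", "Payment is due in 30 days.".encode("utf-8")),
          InferenceData("image", b"\xff\xd8\xff")),
)
print(request.modalities())
```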
[0434] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0435] [Third Embodiment]
[0436] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0437] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0438] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0439] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0440] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice uttered by the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0441] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0442] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0443] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0444] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0445] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0446] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0447] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0448] This invention is a system for efficiently analyzing contracts and patent drawings handled by legal and intellectual property departments, and for promoting risk management. Specifically, the present invention can be implemented as follows.
[0449] Users using a terminal upload contracts or patent drawings they wish to have analyzed to the system. This sends the document files to the server.
[0450] The server begins processing the received document for analysis. It checks the document format and, if necessary, applies OCR (Optical Character Recognition) to extract text. This text extraction method ensures that all text, including text information within images, is converted into text without omission.
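The format check and OCR decision described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `DocFormat`, `detect_format`, and `needs_ocr` are hypothetical, and the assumption that a PDF with an existing text layer can skip OCR is ours.

```python
from enum import Enum


class DocFormat(Enum):
    PDF = "pdf"
    PNG = "png"
    JPEG = "jpeg"
    DOCX = "docx"  # DOCX is a ZIP container, so this is only a coarse guess
    UNKNOWN = "unknown"


# Leading "magic bytes" that identify each file type.
_SIGNATURES = [
    (b"%PDF-", DocFormat.PDF),
    (b"\x89PNG\r\n\x1a\n", DocFormat.PNG),
    (b"\xff\xd8\xff", DocFormat.JPEG),
    (b"PK\x03\x04", DocFormat.DOCX),
]


def detect_format(data: bytes) -> DocFormat:
    """Return the format whose signature matches the start of the file."""
    for signature, fmt in _SIGNATURES:
        if data.startswith(signature):
            return fmt
    return DocFormat.UNKNOWN


def needs_ocr(fmt: DocFormat, has_text_layer: bool = False) -> bool:
    """Raster images always need OCR; a PDF needs it only without a text layer."""
    if fmt in (DocFormat.PNG, DocFormat.JPEG):
        return True
    if fmt is DocFormat.PDF:
        return not has_text_layer
    return False
```

Checking the leading bytes rather than the file extension keeps the decision robust when an uploaded file is misnamed.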
[0451] Next, the server analyzes the text extracted using text analysis tools with natural language processing technology. Here, important risk information is automatically extracted by identifying risk clauses within the contract. For example, important clauses such as "payment terms" and "contract termination" are identified.
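As a rough illustration of risk clause identification, the sketch below flags sentences by keyword patterns. It is a deliberately simple stand-in for the natural language processing the text refers to: the pattern table `RISK_PATTERNS` and the names `RiskClause` and `identify_risk_clauses` are hypothetical, and a production system would more likely use a trained model.

```python
import re
from dataclasses import dataclass

# Hypothetical category -> pattern table (an assumption for illustration).
RISK_PATTERNS = {
    "payment terms": re.compile(
        r"\b(payment|invoice|due within|late fee)\b", re.IGNORECASE
    ),
    "contract termination": re.compile(
        r"\b(terminat\w*|cancel\w*|expir\w*)\b", re.IGNORECASE
    ),
}


@dataclass(frozen=True)
class RiskClause:
    category: str
    sentence: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def identify_risk_clauses(text: str) -> list[RiskClause]:
    """Return every sentence that matches at least one risk category."""
    found = []
    for sentence in split_sentences(text):
        for category, pattern in RISK_PATTERNS.items():
            if pattern.search(sentence):
                found.append(RiskClause(category, sentence))
    return found
```

A sentence that matches several categories is reported once per category, so the caller can group results by clause type.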
[0452] In parallel, the server runs image analysis tools on the patent drawings. Using computer vision technology, it identifies design elements within the images and extracts drawing features that may be relevant to patent examination. This analysis helps to resolve open design questions and to check for similarities with competing patents.
[0453] The server aggregates the results of text and image analysis and generates user-friendly reports. This report generation method allows users to efficiently grasp important information. The reports include extracted risk information, design points of interest, and recommended countermeasures as needed.
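One way to picture the aggregation step is a small report structure with a plain-text renderer. The sketch below is an assumption-laden illustration: the class `AnalysisReport`, its field names, and the text layout are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisReport:
    document_name: str
    risk_items: list[str] = field(default_factory=list)
    design_points: list[str] = field(default_factory=list)
    countermeasures: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text, skipping empty sections."""
        lines = [f"Analysis report: {self.document_name}"]
        sections = [
            ("Extracted risk information", self.risk_items),
            ("Design points of interest", self.design_points),
            ("Recommended countermeasures", self.countermeasures),
        ]
        for title, items in sections:
            if not items:
                continue  # e.g. countermeasures are included only as needed
            lines.append("")
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)
```

For example, `AnalysisReport("nda.pdf", risk_items=["Payment terms: 30 days"]).render()` produces a title line followed by a single risk section.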
[0454] Furthermore, users can interactively engage with the AI to ask questions about the analysis results and obtain additional information. This interactive method can deepen the user's understanding and support their decision-making.
[0455] Because this system provides multilingual support on a single platform and is configured for international use, it can be used in the risk management operations of many companies both domestically and internationally. For example, it is effective when a company entering into international trade agreements needs to analyze contract documents in various languages and manage the associated risks.
[0456] The following describes the processing flow.
[0457] Step 1:
[0458] The user operates the terminal, specifies the contracts or patent drawings they want to analyze, and uploads them to the server. The user selects files through the system interface and transfers the data to the server by clicking the send button.
[0459] Step 2:
[0460] The server receives the uploaded document. After receiving it, it automatically determines the document format (PDF, PNG, etc.) and, if necessary, extracts text from the image using OCR technology. This process makes all text information within the document available for analysis.
[0461] Step 3:
[0462] The server uses text analysis tools to perform a detailed analysis of the text data extracted by OCR processing. It employs natural language processing techniques to identify specific risk-related clauses from the contract and pinpoint the risk clauses.
[0463] Step 4:
[0464] The server simultaneously drives image analysis tools to analyze patent drawings. Using computer vision technology, it analyzes structures and design elements within the drawings, identifying key points. This process makes it possible to detect design flaws and potentially competing patents.
[0465] Step 5:
[0466] The server integrates the results of text and image analysis and generates a user-friendly report. The report generation mechanism organizes the analysis results in an easy-to-understand format and provides them to the user.
[0467] Step 6:
[0468] The user receives the generated report on their device and reviews it. If necessary, the user can interact with the AI through the interface to request a detailed explanation of the analysis results or to view additional risk information.
[0469] Step 7:
[0470] The server stores the analysis results and the user interaction history, and extends multilingual support to meet users' diverse language needs. At this stage, the analysis data is managed so that it can be reused later.
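Storing analysis results together with the interaction history could look like the following sketch, which uses Python's built-in `sqlite3` module. The table layout and the function names `open_store`, `save_analysis`, `log_interaction`, and `history` are hypothetical; the disclosure does not specify a storage technology.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS analyses (
            id INTEGER PRIMARY KEY,
            document_name TEXT NOT NULL,
            language TEXT NOT NULL,
            result_json TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS interactions (
            id INTEGER PRIMARY KEY,
            analysis_id INTEGER NOT NULL REFERENCES analyses(id),
            role TEXT NOT NULL,
            message TEXT NOT NULL
        );
        """
    )
    return conn


def save_analysis(conn, document_name: str, language: str, result: dict) -> int:
    cur = conn.execute(
        "INSERT INTO analyses (document_name, language, result_json) VALUES (?, ?, ?)",
        (document_name, language, json.dumps(result, ensure_ascii=False)),
    )
    conn.commit()
    return cur.lastrowid


def log_interaction(conn, analysis_id: int, role: str, message: str) -> None:
    conn.execute(
        "INSERT INTO interactions (analysis_id, role, message) VALUES (?, ?, ?)",
        (analysis_id, role, message),
    )
    conn.commit()


def history(conn, analysis_id: int) -> list[tuple[str, str]]:
    """Return (role, message) pairs for one analysis in insertion order."""
    rows = conn.execute(
        "SELECT role, message FROM interactions WHERE analysis_id = ? ORDER BY id",
        (analysis_id,),
    )
    return [(role, message) for role, message in rows]
```

Keeping the document language next to each result is one simple way to make stored analyses reusable across languages later.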
[0471] (Example 1)
[0472] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0473] Conventional document analysis systems performed text and image analysis separately, resulting in the loss of some information when integrating the results. Furthermore, analyzing international documents requiring multilingual support and optical character recognition presented challenges in balancing processing complexity and accuracy. Additionally, the lack of interactive information provision tools for efficiently utilizing analysis results made it difficult to effectively support user decision-making.
[0474] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0475] In this invention, the server includes receiving means for receiving information, discrimination means for determining the format of the information and applying optical character recognition as necessary, and multilingual support means using natural language processing technology to process the extracted character data. This enables highly accurate analysis regardless of the format of the information and can be used internationally. Furthermore, by integrating the analysis results and providing them interactively, it becomes possible to effectively support the user's decision-making.
[0476] "Receiving means" refers to a function for acquiring information from an external source and incorporating it into the system.
[0477] "Character data extraction means" refers to a function that identifies character strings in the received information and extracts them as electronic data.
[0478] "Risk element identification means" refers to a function that analyzes the extracted text data and automatically identifies potential risks and important clauses.
[0479] "Image analysis means" refers to a function that processes visual information and extracts important information and features contained within an image.
[0480] "Report generation means" refers to a function that integrates analysis results and creates a report in an easy-to-understand format.
[0481] "Dialogue means" refers to a function that provides information interactively through interaction with the user, thereby deepening the user's understanding.
[0482] A "discrimination means" is a function that recognizes the format of received information and selects the appropriate processing method.
[0483] Optical Character Recognition (OCR) is a technology that analyzes character information contained in images and other data, and converts it into text data.
[0484] "Multilingual support means" refers to a function that converts analysis results into multiple languages to accommodate users who use different languages.
[0485] "Element identification means" refers to a function that identifies specific design elements from drawings and other information, and evaluates their similarities and relationships.
[0486] A "generative artificial intelligence model" is an advanced computer algorithm that generates and provides information based on user requests.
[0487] This invention uses an information system to efficiently analyze contracts and patent drawings, thereby supporting risk management. Specifically, the server, terminal, and user elements work together in coordination.
[0488] Users upload the documents they wish to analyze to the system via their terminal. This process sends the documents to the server. The server first uses recognition tools to verify the format of the received documents. If necessary, it extracts character data from the file using Optical Character Recognition (OCR) technology. Tesseract OCR is commonly used as the specific software for this purpose.
[0489] The server applies generative AI models and natural language processing (NLP) techniques to the extracted text data. This allows risk elements to be identified even in complex contract documents. Tools such as the spaCy library and BERT-based models are useful for behavioral analysis and legal risk assessment.
[0490] Simultaneously, the server utilizes computer vision technology for drawing analysis, identifying design elements from patent drawings. OpenCV and TensorFlow are used in this process. The server applies machine learning algorithms to identify key design structures and features, and compares them with other similar designs.
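The comparison with similar designs can be illustrated with cosine similarity over feature vectors. This sketch assumes that some upstream extractor (for instance one built with OpenCV or TensorFlow, not shown here) has already turned each drawing into a fixed-length numeric vector; the function names and the catalog structure are hypothetical, and cosine similarity is our choice of metric rather than one stated in the disclosure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_designs(
    query: list[float],
    catalog: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k catalog entries most similar to the query drawing."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The ranking only orders candidates; deciding whether a score indicates a genuinely competing design would still require a threshold chosen for the specific feature extractor.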
[0491] Once the analysis is complete, the server generates a report in a user-friendly format. This report often includes identified hazards, design considerations, and recommended countermeasures. The report is typically provided in PDF format or accessible through a web interface.
[0492] Furthermore, users can interact with the server using prompts related to the generated report. This helps them extract additional information and deepen their understanding of risk management. Examples of specific prompts include inquiries such as, "Please list the risk items in this contract," or "What are the distinctive design elements in the patent drawings?"
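A prompt for this kind of follow-up question might be assembled as in the sketch below. The function `build_report_prompt`, the delimiter lines, and the instruction wording are hypothetical; the call to the generative AI model itself is intentionally left out because the disclosure does not fix a particular API.

```python
def build_report_prompt(report_text: str, question: str, language: str = "en") -> str:
    """Combine the generated report and a user question into one prompt string."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return (
        "You are assisting a legal and intellectual property team.\n"
        f"Answer in language: {language}\n"
        "Use only the analysis report below; say so if the report does not "
        "cover the question.\n"
        "--- REPORT ---\n"
        f"{report_text.strip()}\n"
        "--- QUESTION ---\n"
        f"{question.strip()}\n"
    )
```

Placing the report ahead of the question and restricting the answer to the report are design choices meant to keep responses tied to the analysis results.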
[0493] This system streamlines document analysis in legal and intellectual property departments and can handle multilingual international risk management tasks. A key feature of this invention is its flexible analysis capabilities across diverse document formats and information languages.
[0494] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0495] Step 1:
[0496] The user selects the documents to be analyzed on their terminal and uploads them to the system. Input files include contracts and patent drawings. The files are then sent to the server.
[0497] Step 2:
[0498] The server uses a discrimination mechanism to verify the format of the received document. The received document is the input, and the format determination result is the output. At this stage, it supports various file formats such as PDF, DOCX, and JPEG. Depending on the format, it sets a flag to apply OCR technology.
[0499] Step 3:
[0500] The server extracts character data using OCR technology depending on the format. The input is the document that has been identified and the OCR application flag, and the output is the extracted character data. Specifically, it performs optical character recognition to obtain text from image-based files.
[0501] Step 4:
[0502] The server analyzes the extracted text data using a generative AI model and natural language processing technology. The input is text data, and the output is a list of risk elements. This process identifies risk items such as "payment terms" and "contract termination" from contract documents.
[0503] Step 5:
[0504] The server analyzes patent drawings using image analysis technology. The input is the data of the patent drawing, and the output is identified design elements. Computer vision algorithms are used to identify design elements within the image and to check for similarity.
[0505] Step 6:
[0506] The server compiles the results of text and image analysis and generates a report using a report generation system. The input is the analysis results, and the output is the final report. The report includes extracted risk information and key design considerations.
[0507] Step 7:
[0508] The user receives the generated report and interacts with the server using prompts as needed. Input consists of user inquiries, and output consists of additional information and advice. Through this interaction, the user gains further insights into risk management.
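The server-side portion of the flow above (Steps 2 to 6) can be sketched as a single orchestration function whose stages are passed in as callables. Everything here is hypothetical scaffolding: `PipelineResult`, `run_pipeline`, and the parameter names are illustrative, and real OCR, language, and vision components would be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    ocr_applied: bool
    text: str
    risk_elements: list[str]
    design_elements: list[str]
    report: str


def run_pipeline(
    document: bytes,
    is_image_based: Callable[[bytes], bool],
    ocr: Callable[[bytes], str],
    read_text: Callable[[bytes], str],
    find_risks: Callable[[str], list[str]],
    find_design_elements: Callable[[bytes], list[str]],
) -> PipelineResult:
    # Step 2: determine the format and set the OCR flag.
    ocr_flag = is_image_based(document)
    # Step 3: extract character data, using OCR only when flagged.
    text = ocr(document) if ocr_flag else read_text(document)
    # Step 4: text data in, list of risk elements out.
    risks = find_risks(text)
    # Step 5: drawing data in, identified design elements out.
    designs = find_design_elements(document)
    # Step 6: compile both results into the final report.
    report = "\n".join(["Risk elements:", *risks, "Design elements:", *designs])
    return PipelineResult(ocr_flag, text, risks, designs, report)
```

Injecting each stage as a function keeps the step order explicit and lets individual stages be replaced or tested in isolation.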
[0509] (Application Example 1)
[0510] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0511] In today's business environment, quickly and accurately analyzing documents such as contracts and patent drawings, and assessing risks, is crucial for smooth business operations. However, manually analyzing vast amounts of documents is time-consuming, labor-intensive, and prone to errors. Furthermore, international transactions require multilingual support, adding further complexity. Therefore, there is a need for technologies that can efficiently solve these challenges on portable devices using automated systems.
[0512] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0513] In this invention, the server includes means for receiving data, means for extracting information from the received data, and means for identifying risk elements by analyzing the extracted information. This allows users to easily upload contracts and patent drawings from portable devices and quickly understand their risks. Furthermore, by providing multilingual report generation and interactive dialogue functions, it can be used in international business environments.
[0514] "Means of receiving data" refers to devices or programs that acquire information such as contracts and patent drawings provided by the user.
[0515] "Information extraction means" refers to a technology or device that extracts necessary text or image information from received data.
[0516] "Element identification means" refers to a technique or process for analyzing the extracted information to identify risks and critical elements.
[0517] "Visual analysis means" refers to a technology that analyzes images within a document and extracts important features and information related to patent examination.
[0518] "Report generation means" refers to a device or program that summarizes analysis results in a format that is easy for the user to understand.
[0519] "Dialogue means" refers to a function or device that allows a user to communicate interactively with the system and obtain additional information.
[0520] "Means for operating on portable devices" refers to designs and programs that run on portable devices such as smartphones and tablets.
[0521] "Optical recognition processing" is a technology that extracts text from documents and images as digital data.
[0522] "Language support means" refers to a technology or program that makes analysis results available in multiple languages.
[0523] In the system implementing this invention, a user takes a photograph of a contract or patent drawing using a portable device such as a smartphone or tablet, and the data is received. The received data is then processed on a server as follows.
[0524] The server uses software written in Python to determine the type of data received and, if necessary, performs optical recognition processing using Tesseract OCR. This extracts text information from the image. The extracted text is then analyzed by natural language processing with spaCy to identify risks and important elements.
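A minimal sketch of the Tesseract step with multilingual input is given below. It assumes the third-party packages `pytesseract` and Pillow plus the matching Tesseract language data are installed; the mapping table `TESSERACT_LANGS` and the helper names are hypothetical, although the language codes themselves (`eng`, `jpn`, `deu`, `fra`) are standard Tesseract identifiers.

```python
# Illustrative mapping from ISO 639-1 codes to Tesseract language codes.
TESSERACT_LANGS = {"en": "eng", "ja": "jpn", "de": "deu", "fr": "fra"}


def tesseract_lang_arg(iso_codes: list[str]) -> str:
    """Build Tesseract's `lang` argument, e.g. ["en", "ja"] -> "eng+jpn"."""
    try:
        return "+".join(TESSERACT_LANGS[code] for code in iso_codes)
    except KeyError as exc:
        raise ValueError(
            f"no Tesseract language configured for {exc.args[0]!r}"
        ) from None


def ocr_image(path: str, iso_codes: list[str]) -> str:
    """Run OCR on an image file and return the recognized text."""
    # Imported lazily so the helper above works without OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=tesseract_lang_arg(iso_codes))
```

Joining codes with `+` asks Tesseract to consider several languages at once, which suits contracts that mix, for example, English and Japanese text.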
[0525] Subsequently, visual analysis is performed using OpenCV to extract important design elements from the image information of the patent data. The analysis results are generated in a user-friendly report format and sent to the user's device.
[0526] Users can also obtain additional information using a dialogue system powered by a generative AI model. This interactive dialogue feature allows users to ask questions about the analysis and obtain further information.
[0527] As a concrete example, a businessperson handling international transactions can photograph a new contract with a smartphone while on a business trip, run a risk assessment, and take appropriate action on the spot. In this case, the following prompt can be used:
[0528] "Please analyze the new contract and identify the risk factors. Specifically, I'd like to know about important clauses regarding payment terms and contract termination. Please display the results in a report on my smartphone."
[0529] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0530] Step 1:
[0531] Users use a portable device to photograph the contracts and patent drawings they wish to analyze and upload them to the server via the terminal. The input is an image file of the contract or patent drawing, and the output is the unprocessed document data transferred to the server.
[0532] Step 2:
[0533] The server checks the format of the received data and performs optical recognition processing using Tesseract OCR as needed. The input is image data sent to the server, and the output is information converted into text by OCR. At this stage, data processing involves extracting character information from the image and generating text data.
[0534] Step 3:
[0535] The server uses spaCy to perform natural language processing on the extracted text data and identify risk elements in the contract. The input is text data obtained by OCR, and the output is the identified risk items. At this stage, keywords and risk-related context within the text are examined, and important elements are extracted.
[0536] Step 4:
[0537] The server uses OpenCV to perform image analysis on patent drawings and extract design elements and important information. The input is image data of the patent drawing, and the output is information on design elements related to patent examination. Here, the features of the image are analyzed, and useful information is organized as data.
[0538] Step 5:
[0539] The server combines the results of text and image analysis to generate a user-friendly report. The input consists of data on risk and design elements, while the output is a detailed analysis report. At this stage, the analysis results are integrated, organized, and displayed in report form.
[0540] Step 6:
[0541] Users can use a dialogue function powered by a generative AI model to communicate interactively with the server and obtain additional information. Input is a prompt or question from the user, and output is the AI's answer or additional information. Through the dialogue, users can obtain even more detailed information.
[0542] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0543] This invention is a system that, in addition to analyzing documents, can recognize user emotions in real time and adjust the presentation of results and dialogue based on that emotional information. A specific embodiment is shown below.
[0544] Users upload contracts and patent drawings they wish to have analyzed via their terminal. The uploaded documents are sent to the server, which receives them using a receiving device.
[0545] The server first checks the format of the received document and applies OCR processing as needed. This extracts all the text information from the document.
[0546] After text extraction is complete, the server analyzes the text data to identify key risk items. Risk identification tools are then used to identify potential risks and deficiencies within the contract.
[0547] Furthermore, the server uses image analysis tools to analyze the design information contained in the patent drawings. It utilizes computer vision technology to identify important design elements and similarities with competing patents within the drawings.
[0548] These analysis results are integrated and organized by a report generation system and provided to the user in an easy-to-understand format. The report includes risk items, important design information, and related recommendations.
[0549] Furthermore, this system incorporates an emotion engine on the server that recognizes the user's emotions in real time as they review reports. This emotion engine analyzes the user's facial expressions and tone of voice to infer their current emotional state.
[0550] Based on the user's perceived emotions, the server dynamically adjusts the presentation of reports and the tone of dialogue. For example, if the user is stressed, the results will be presented more concisely and positively to provide reassurance. Furthermore, emotional feedback allows for flexible modification of responses through dialogue channels, improving the user experience.
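Adjusting the presentation to the estimated emotion could be sketched as a lookup from emotion label to presentation style. The labels, the style fields, the confidence threshold, and the names `PresentationStyle`, `choose_style`, and `present` are all assumptions made for illustration; the disclosure does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationStyle:
    max_items: int  # how many findings to show up front
    tone: str       # hint passed on to the dialogue generation step


DEFAULT_STYLE = PresentationStyle(max_items=10, tone="neutral")

# Hypothetical emotion labels; a real emotion model may use a different set.
STYLE_BY_EMOTION = {
    "stressed": PresentationStyle(max_items=3, tone="reassuring"),
    "anxious": PresentationStyle(max_items=3, tone="reassuring"),
    "confused": PresentationStyle(max_items=5, tone="explanatory"),
}


def choose_style(emotion: str, confidence: float, threshold: float = 0.6) -> PresentationStyle:
    """Fall back to the default style when the estimate is uncertain."""
    if confidence < threshold:
        return DEFAULT_STYLE
    return STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)


def present(risk_items: list[str], style: PresentationStyle) -> list[str]:
    """Trim the list for a concise view and note how many items were held back."""
    shown = risk_items[: style.max_items]
    hidden = len(risk_items) - len(shown)
    if hidden > 0:
        shown.append(f"({hidden} more items available on request)")
    return shown
```

Held-back items are only summarized, not dropped, so a concise view for a stressed user does not hide risk information permanently.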
[0551] In this way, by realizing sophisticated interactions that incorporate emotions, it is possible to promote user understanding and maximize the utilization of analysis results. For example, when using the system in the context of international contract negotiations, it becomes possible to quickly grasp legal risks while simultaneously responding in a way that takes into account the emotions of the person in charge in the other country.
[0552] The following describes the processing flow.
[0553] Step 1:
[0554] The user uses a terminal to select the contracts and patent drawings to be analyzed and uploads them to the system. The user then specifies the relevant file from the operation interface and presses the send button, at which point the document is transferred to the server.
[0555] Step 2:
[0556] The server receives the uploaded document via a receiving device. It determines the format of the received document (PDF, JPEG, etc.) and, if necessary, extracts the text from the document using OCR technology. This allows text information to be obtained from the image.
[0557] Step 3:
[0558] The server analyzes the extracted text data using text analysis means. Natural language processing techniques are used to identify risk items within the contract. In this process, risk clauses and inappropriate conditions are identified through machine learning algorithms.
[0559] Step 4:
[0560] The server uses image analysis techniques to analyze patent drawings. Computer vision is used to identify design features and evaluate the novelty and similarity of the patent. This clarifies the relevant technical elements.
[0561] Step 5:
[0562] The server generates a comprehensive report based on the analysis results using a report generation system. The report includes extracted risk clauses, image analysis results, and proposed action plans. This report is designed to support user decision-making.
[0563] Step 6:
[0564] The user receives the generated report on their device and reviews it. At the same time, the emotion engine recognizes the user's emotions. Using the device's camera and microphone, it analyzes the user's facial expressions and tone of voice to evaluate their emotional state in real time.
[0565] Step 7:
[0566] The server adjusts the presentation of reports based on information from the emotion engine, according to the user's emotions. For example, if anxiety is detected, the content is modified to emphasize positive language and provide a sense of reassurance. Interactive dialogue content is also changed based on the emotional state.
[0567] Step 8:
[0568] Based on reports tailored to their emotions, users can consider necessary actions and engage in further dialogue with the AI through the server. This allows users to make the most of the analysis results and supports quick and effective decision-making.
[0569] (Example 2)
[0570] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0571] Traditional systems only analyze information within documents and do not provide information based on the user's emotions. As a result, users may experience stress and anxiety. Furthermore, providing analysis results in different languages is difficult, limiting global use.
[0572] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0573] In this invention, the server includes means for receiving information, means for recognizing the user's emotional state and adjusting the analysis results, and means for providing the analysis results in multiple languages. This enables information provision that takes the user's emotions into consideration and allows for global use.
[0574] "Means of receiving information" refers to the function of allowing a server to receive input information from an external source.
[0575] "Means for extracting character information" refers to a function that processes information to obtain string data from received information.
[0576] "Means for identifying hazards" refers to a function that analyzes extracted textual information to identify potential risks.
[0577] "Means for analyzing visual information" refers to functions for detecting and analyzing important elements from image data contained in information.
[0578] "Means of generating and providing reports" refers to a function that organizes analysis results and outputs them in a format that is easy for users to understand.
[0579] "Means for recognizing the user's emotional state" refers to a function that analyzes the user's emotions in real time and provides information appropriate to that situation.
[0580] "Means for interacting with users, providing additional information, and coordinating feedback" refers to functions that interact with users, provide additional information as needed, and adjust the content.
[0581] "Means for performing character recognition processing" refers to a function that applies OCR technology to extract characters according to the format of the received document.
[0582] "Means of providing analysis results in multiple languages" refers to a function that translates the analyzed information into multiple languages and presents it to the user.
[0583] This invention provides a system that allows users to easily analyze documents and receive emotion-based feedback. Specifically, users upload the documents they wish to have analyzed to the system using a terminal. These documents can be in the format of contracts or patent drawings.
[0584] First, the server receives the document using a receiving device and determines its format. If necessary, it extracts text information using OCR software (e.g., Tesseract). This process allows text information to be obtained even from documents uploaded in image format.
[0585] Next, the server performs analysis on the extracted text data using natural language processing. This makes it possible to identify risk items hidden in the contract and identify potential dangers. In addition, computer vision technology (e.g., OpenCV) is used to analyze design elements in patent drawings and their similarities to competing technologies.
[0586] The server generates a report based on the analysis results and provides it to users in an easy-to-understand manner through its multilingual support function. This ensures that users from different language regions can fully understand the analysis results.
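Multilingual report output can be approximated with a per-language label catalog, as in the sketch below. The catalog `REPORT_LABELS`, the two languages shown, and the function `render_localized` are hypothetical; only the fixed labels are localized here, and translating the extracted items themselves would require a separate translation step that is not shown.

```python
# Hypothetical label catalog; more languages can be added the same way.
REPORT_LABELS = {
    "en": {
        "title": "Analysis report",
        "risks": "Risk items",
        "none": "No risk items found",
    },
    "ja": {
        "title": "分析レポート",
        "risks": "リスク項目",
        "none": "リスク項目は見つかりませんでした",
    },
}


def render_localized(risk_items: list[str], language: str, fallback: str = "en") -> str:
    """Render a short report with labels in the requested language."""
    labels = REPORT_LABELS.get(language, REPORT_LABELS[fallback])
    lines = [labels["title"], f"{labels['risks']}:"]
    if risk_items:
        lines.extend(f"- {item}" for item in risk_items)
    else:
        lines.append(labels["none"])
    return "\n".join(lines)
```

Falling back to a default language keeps the report readable when a user's language has no catalog entry yet.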
[0587] Furthermore, the server uses the user's device camera and microphone to recognize the user's emotions in real time. By analyzing facial expressions and voice tone, it infers how the user is receiving the information. Based on this information, the server dynamically adjusts the presentation of reports and the dialogue style to ensure the user is comfortable receiving the information.
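Combining facial-expression and voice-tone estimates into one emotional state might be done with a weighted average, as sketched below. This is an assumed fusion rule chosen for simplicity: the function `fuse_emotions`, the score dictionaries, and the weighting are hypothetical, and the upstream models that produce the per-modality scores are not shown.

```python
def fuse_emotions(
    face: dict[str, float],
    voice: dict[str, float],
    face_weight: float = 0.5,
) -> tuple[str, float]:
    """Blend per-label scores from two modalities and return the top label."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face) | set(voice)
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    # Iterate in sorted order so ties are broken deterministically.
    best = max(sorted(fused), key=lambda label: fused[label])
    return best, fused[best]
```

A label missing from one modality is treated as a score of zero there, which lets the two models use partly different label sets.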
[0588] For example, using this system in international contract negotiations allows for the rapid identification of legal risks and facilitates smoother negotiations while taking into account the feelings of the other party's representatives.
[0589] An example of a prompt message would be, "Please identify the main risk items in this contract and suggest improvements if there are any deficiencies. Also, please summarize the key points included in the patent drawings." This allows the user to communicate specific analysis requests to the system through prompt messages.
[0590] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0591] Step 1:
[0592] The user uses a terminal to select the documents they wish to analyze and upload them to the system. The input files are often in PDF or image format, and this information is sent to the server. The server receives the documents and prepares for the next processing step.
[0593] Step 2:
[0594] The server determines the format of the received document. Based on this determination, it uses OCR software to extract text information from images or PDFs if necessary. The input is an image or PDF, and the output is text data. This process extracts character data from visual information.
[0595] Step 3:
[0596] The server analyzes the extracted text data and uses natural language processing techniques to identify key risk items. The input is text data, and the output is analytical information including identified risk items and potential hazards. Processing includes keyword extraction and pattern recognition.
[0597] Step 4:
[0598] The server analyzes images contained in received documents and uses computer vision technology to find important design elements and similarities. The input is image data, and the output is the analysis results of design information and similarity with competing technologies. Feature points are extracted and analyzed using an image processing library.
[0599] Step 5:
[0600] The server generates a report based on the analyzed data and provides it to the user. At this time, it utilizes multilingual support to create reports translated into different languages. The input is analyzed information, and the output is a report in a format that is easy for the user to understand. The report generation system systematically organizes the information.
[0601] Step 6:
[0602] The server uses the device's camera and microphone to analyze facial expressions and voice in real time to recognize the user's emotional state. The input is the user's voice and video data, and the output is an estimate of the user's emotional state. This allows the information provided to be tailored based on the user's current emotions.
[0603] Step 7:
[0604] The server appropriately adjusts the presentation of materials and the content of dialogue based on the user's emotional state. The input is the result of an analysis of the emotional state, and the output is the adjusted information presentation and dialogue content. This optimizes the user experience and provides situation-appropriate feedback.
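The adjustment in Step 7 could be sketched as a lookup from the estimated emotion to presentation settings. The emotion labels, the `Presentation` fields, and the confidence threshold below are illustrative assumptions, not values given in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    detail: str       # "summary" or "full"
    tone: str         # wording style used in the dialogue
    offer_help: bool  # whether to proactively offer an explanation


# Hypothetical mapping from an estimated emotion label to presentation settings.
STYLE_BY_EMOTION = {
    "confused": Presentation(detail="summary", tone="plain", offer_help=True),
    "anxious": Presentation(detail="summary", tone="reassuring", offer_help=True),
    "engaged": Presentation(detail="full", tone="neutral", offer_help=False),
}
DEFAULT_STYLE = Presentation(detail="full", tone="neutral", offer_help=False)


def choose_presentation(
    emotion: str, confidence: float, threshold: float = 0.6
) -> Presentation:
    """Adapt only when the estimate is confident enough; otherwise keep the default."""
    if confidence < threshold:
        return DEFAULT_STYLE
    return STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)
```

Gating on confidence avoids changing the dialogue style on the basis of a weak or noisy emotion estimate.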
[0605] (Application Example 2)
[0606] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0607] This invention relates to an interactive system that combines document analysis and user emotion recognition. Conventional systems fail to adequately optimize the user experience because they do not consider user emotions when providing document analysis results. Therefore, there is a challenge in realizing flexible dialogue and information provision based on user emotions and improving convenience for users.
[0608] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0609] In this invention, the server includes means for receiving information, means for extracting textual information from the received information, means for analyzing the extracted textual information and identifying risk elements, means for analyzing diagrams and extracting important elements, means for generating and providing the analysis results as a report, means for providing additional information through dialogue with the user, and emotion recognition means for estimating the user's emotional state and adjusting the content of the dialogue accordingly. This enables the provision of information and dynamic adjustment of dialogue content in accordance with the user's emotions.
[0610] "Means of receiving information" refers to devices or software that have the function of acquiring data or documents transmitted electronically from an external source and enabling processing within the system.
[0611] "Means for extracting textual information" refers to a device or program that has the function of recognizing textual characters from received documents or images and extracting them as digital data.
[0612] "Means for identifying risk factors" refers to devices or software that have the function of analyzing extracted textual information and identifying risks or defects hidden within a document.
[0613] "Means for analyzing diagrams and extracting important elements" refers to a device or software that has the function of extracting important data such as design information and similarity by analyzing the graphic information contained in a document.
[0614] "Means of generating and providing a report" refers to a device or program that has the function of organizing and integrating analysis results and presenting them as information in a format that is easy for users to understand.
[0615] "Means of providing additional information through interaction with users" refers to devices or software that have the function of enabling a system to communicate interactively with users and appropriately provide necessary information and advice.
[0616] "Emotion recognition means" refers to a device or program that analyzes the user's facial expressions and tone of voice, infers their emotional state, and appropriately adjusts the content of the dialogue and the presentation of information based on that.
[0617] The system that realizes this invention relies primarily on three elements: a server, a terminal, and a user. The server plays a central role, handling various processes such as receiving information, extracting text information, identifying risk elements, analyzing diagrams, extracting important elements, and generating and providing the results as a report.
[0618] The server processes images from received documents using OpenCV and performs OCR on text using the Google Cloud Vision API. Furthermore, it has the capability to recognize user emotions in real time from facial expressions and voice using TensorFlow. This enables flexible dialogue and information provision based on the user's emotional state.
[0619] The terminal functions as a user input device, uploading and receiving information, and collecting data using the camera and microphone. This data is sent to a server, where it is analyzed and information is provided. Users can interact with the system through the terminal and obtain any additional information they need.
[0620] As a concrete example, if a user is interested in a particular product, the server will display a detailed description of that product and related products, adjusting the way information is presented based on the user's response. For instance, if it is determined that the user is hesitant about purchasing, the server may highlight the product's advantages and reviews from other users.
[0621] The generative AI model is used to further optimize information delivery using prompt messages. The following is an example of such a prompt message.
[0622] "Please enter the name of the product that the customer is considering purchasing. Considering the product's characteristics, please generate a reassuring description."
[0623] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0624] Step 1:
[0625] The terminal receives input from the user and uploads the documents and information to the server. The input may include contracts and product information, and the output is the data file sent to the server.
[0626] Step 2:
[0627] The server determines the format of the received document and performs OCR processing using the Google Cloud Vision API as needed. The input is the unprocessed data file, and the output is the extracted text data. The server converts the data into digital text and prepares it for analysis.
[0628] Step 3:
[0629] The server uses OpenCV to analyze images within a document and extract important elements. The input is a document containing image data, and the output is information about the important elements obtained from the image analysis. The server identifies the image information and performs further analysis based on it.
[0630] Step 4:
[0631] The server uses TensorFlow to analyze face and voice data sent from the device and estimate the user's emotional state in real time. The input is face and voice data sent from the device, and the output is the estimated emotional state. Based on the emotion recognition, the server prepares to adjust the dialogue.
[0632] Step 5:
[0633] The server uses natural language processing tools such as TextBlob to analyze the text information and identify risk factors and recommendations. The input consists of the text information together with the data obtained from the image analysis and the emotion estimate. Based on this, the server generates and outputs a report.
[0634] Step 6:
[0635] The server utilizes a generative AI model to provide optimal information using prompt messages. For example, it might create prompts such as, "Please enter the name of a product the customer is considering purchasing. Considering the product's characteristics, generate a reassuring description." The output is user-optimized information.
[0636] Step 7:
[0637] The user reviews the report through their device and asks additional questions or engages in dialogue as needed. The server receives real-time emotional feedback from the user and dynamically adjusts the dialogue to achieve the optimal user experience. The input is the user's feedback, and the output is the optimized dialogue.
[0638] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and the display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0639] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, and inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data in accordance with the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0640] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0641] [Fourth Embodiment]
[0642] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0643] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0644] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0645] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0646] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio in accordance with instructions from the processor 46.
[0647] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0648] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0649] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0650] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0651] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0652] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0653] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0654] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0655] This invention is a system for efficiently analyzing contracts and patent drawings handled by legal and intellectual property departments, and for promoting risk management. Specifically, the present invention can be implemented as follows.
[0656] Users using a terminal upload contracts or patent drawings they wish to have analyzed to the system. This sends the document files to the server.
[0657] The server begins processing the received document for analysis. It checks the document format and, if necessary, applies OCR (Optical Character Recognition) to extract text. This text extraction method ensures that all text, including text information within images, is converted into text without omission.
[0658] Next, the server analyzes the text extracted using text analysis tools with natural language processing technology. Here, important risk information is automatically extracted by identifying risk clauses within the contract. For example, important clauses such as "payment terms" and "contract termination" are identified.
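As a rough illustration of identifying clauses such as "payment terms" and "contract termination", the sketch below swaps in plain keyword patterns in place of the natural language processing tools the text refers to. The pattern table, the clause-splitting rule, and the function names are assumptions; a production system would likely rely on a trained model.

```python
import re
from typing import Dict, List

# Hypothetical keyword patterns for the two clause types named in the text.
RISK_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "payment terms": re.compile(
        r"\b(payment|invoice|late fee|interest)\b", re.IGNORECASE
    ),
    "contract termination": re.compile(
        r"\b(terminat\w*|cancel\w*)\b", re.IGNORECASE
    ),
}


def split_clauses(text: str) -> List[str]:
    """Split contract text on blank lines or after sentence-ending periods."""
    parts = re.split(r"\n\s*\n|(?<=\.)\s+", text)
    return [p.strip() for p in parts if p.strip()]


def identify_risk_clauses(text: str) -> Dict[str, List[str]]:
    """Group clauses under each risk category whose pattern they match."""
    found: Dict[str, List[str]] = {name: [] for name in RISK_PATTERNS}
    for clause in split_clauses(text):
        for name, pattern in RISK_PATTERNS.items():
            if pattern.search(clause):
                found[name].append(clause)
    return found
```

A clause that matches several patterns is listed under every matching category, so nothing is dropped when a clause covers more than one risk.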
[0659] In parallel, the server runs image analysis tools to analyze the patent drawings. Using computer vision technology, it identifies design elements within the images and extracts drawing features that may be relevant to patent examination. This analysis helps to resolve open questions about a design and to check for similarity with competing patents.
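Running the text analysis and the image analysis side by side could be sketched with Python's standard thread pool. The function name `analyze_in_parallel` and the two analyzer callables are placeholders for whatever tools the server actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def analyze_in_parallel(
    text: str,
    image: bytes,
    analyze_text: Callable[[str], Any],
    analyze_image: Callable[[bytes], Any],
) -> Dict[str, Any]:
    """Run the two analyses concurrently and return both results together."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(analyze_text, text)
        image_future = pool.submit(analyze_image, image)
        # result() re-raises any exception raised in a worker,
        # so a failed analysis is not silently lost.
        return {"text": text_future.result(), "image": image_future.result()}
```

Threads suit analyzers that wait on I/O or call into native libraries; for CPU-bound pure-Python analyzers, `ProcessPoolExecutor` from the same module would be the more appropriate choice.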
[0660] The server aggregates the results of text and image analysis and generates user-friendly reports. This report generation method allows users to efficiently grasp important information. The reports include extracted risk information, design points of interest, and recommended countermeasures as needed.
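The aggregation described above might be sketched as a function that renders the three kinds of content into one plain-text report. The section titles, the input shapes, and the function name `build_report` are assumptions chosen for illustration.

```python
from typing import Dict, List


def build_report(
    risks: Dict[str, List[str]],
    design_elements: List[str],
    countermeasures: List[str],
) -> str:
    """Render analysis results as a plain-text report."""
    lines = ["Risk information"]
    for category, clauses in risks.items():
        lines.append(f"- {category}: {len(clauses)} clause(s)")
        lines.extend(f"    * {clause}" for clause in clauses)
    lines.append("Design points of interest")
    lines.extend(f"- {element}" for element in design_elements)
    # Countermeasures are included only when there is something to recommend.
    if countermeasures:
        lines.append("Recommended countermeasures")
        lines.extend(f"- {item}" for item in countermeasures)
    return "\n".join(lines)
```

Keeping the renderer separate from the analyzers means the same results could later be emitted as PDF or HTML without touching the analysis code.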
[0661] Furthermore, users can interactively engage with the AI to ask questions about the analysis results and obtain additional information. This interactive method can deepen the user's understanding and support their decision-making.
[0662] Because this system provides multilingual support on a single platform and is configured for international use, it can be used in the risk management operations of many companies, both domestic and overseas. For example, it is effective when a company concludes international trade agreements and needs to analyze contract documents written in various languages and manage the associated risks.
[0663] The following describes the processing flow.
[0664] Step 1:
[0665] The user operates the terminal, specifies the contracts or patent drawings they want to analyze, and uploads them to the server. The user selects files through the system interface and transfers the data to the server by clicking the send button.
[0666] Step 2:
[0667] The server receives the uploaded document. After receiving it, it automatically determines the document format (PDF, PNG, etc.) and, if necessary, extracts text from the image using OCR technology. This process makes all text information within the document available for analysis.
[0668] Step 3:
[0669] The server uses text analysis tools to perform a detailed analysis of the text data extracted by OCR processing. It employs natural language processing techniques to identify and pinpoint the risk-related clauses in the contract.
[0670] Step 4:
[0671] The server simultaneously drives image analysis tools to analyze patent drawings. Using computer vision technology, it analyzes structures and design elements within the drawings, identifying key points. This process makes it possible to detect design flaws and potentially competing patents.
[0672] Step 5:
[0673] The server integrates the results of text and image analysis and generates a user-friendly report. The report generation mechanism organizes the analysis results in an easy-to-understand format and provides them to the user.
[0674] Step 6:
[0675] The user receives the generated report on their device and reviews it. If necessary, the user can interact with the AI through the interface to request a detailed explanation of the analysis results or to view additional risk information.
[0676] Step 7:
[0677] The server stores the analysis results and the user's interaction history, and extends multilingual support so that it can meet users' diverse language needs. At this stage, the analysis data is managed so that it can be reused later.
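The storage described in Step 7 could be sketched with SQLite from the Python standard library. The table layout, column names, and helper functions below are hypothetical; the disclosure does not specify a storage engine or schema.

```python
import json
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the two hypothetical tables if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analyses ("
        "id INTEGER PRIMARY KEY, document TEXT NOT NULL, "
        "language TEXT NOT NULL, result TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY, "
        "analysis_id INTEGER NOT NULL REFERENCES analyses(id), "
        "question TEXT NOT NULL, answer TEXT NOT NULL)"
    )
    return conn


def save_analysis(
    conn: sqlite3.Connection, document: str, language: str, result: dict
) -> int:
    """Store one analysis result as JSON and return its row id."""
    cur = conn.execute(
        "INSERT INTO analyses (document, language, result) VALUES (?, ?, ?)",
        (document, language, json.dumps(result, ensure_ascii=False)),
    )
    conn.commit()
    return cur.lastrowid


def save_interaction(
    conn: sqlite3.Connection, analysis_id: int, question: str, answer: str
) -> None:
    """Append one question/answer pair to the interaction history."""
    conn.execute(
        "INSERT INTO interactions (analysis_id, question, answer) VALUES (?, ?, ?)",
        (analysis_id, question, answer),
    )
    conn.commit()


def history(conn: sqlite3.Connection, analysis_id: int) -> List[Tuple[str, str]]:
    """Return the question/answer pairs for one analysis, oldest first."""
    rows = conn.execute(
        "SELECT question, answer FROM interactions "
        "WHERE analysis_id = ? ORDER BY id",
        (analysis_id,),
    )
    return list(rows)
```

Recording the report language with each analysis is one way to keep stored results reusable when the same document is later requested in another language.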
[0678] (Example 1)
[0679] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0680] Conventional document analysis systems performed text and image analysis separately, resulting in the loss of some information when integrating the results. Furthermore, analyzing international documents requiring multilingual support and optical character recognition presented challenges in balancing processing complexity and accuracy. Additionally, the lack of interactive information provision tools for efficiently utilizing analysis results made it difficult to effectively support user decision-making.
[0681] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0682] In this invention, the server includes receiving means for receiving information, discrimination means for determining the format of the information and applying optical character recognition as necessary, and multilingual support means using natural language processing technology to process the extracted character data. This enables highly accurate analysis regardless of the format of the information and can be used internationally. Furthermore, by integrating the analysis results and providing them interactively, it becomes possible to effectively support the user's decision-making.
[0683] "Receiving means" refers to a function for acquiring information from an external source and incorporating it into the system.
[0684] A "character data extraction means" is a function that identifies a string of characters from received information and extracts it as electronic data.
[0685] "Risk factor identification means" refers to a function that analyzes extracted text data and automatically identifies potential risks and important clauses.
[0686] "Image analysis means" refers to a function that processes visual information and extracts important information and features contained within an image.
[0687] "Report generation means" refers to a function that integrates analysis results and creates a report in an easy-to-understand format.
[0688] A "means of dialogue" refers to a function that provides information interactively through interaction with the user, thereby deepening the user's understanding.
[0689] A "discrimination means" is a function that recognizes the format of received information and selects the appropriate processing method.
[0690] Optical Character Recognition (OCR) is a technology that analyzes character information contained in images and other data, and converts it into text data.
[0691] "Multilingual support means" refers to a function that converts analysis results into multiple languages to accommodate users who use different languages.
[0692] "Element identification means" refers to a function that identifies specific design elements from drawings and other information, and evaluates their similarities and relationships.
[0693] A "generative artificial intelligence model" is an advanced computer algorithm that generates and provides information based on user requests.
[0694] This invention uses an information system to efficiently analyze contracts and patent drawings, thereby supporting risk management. Specifically, the server, terminal, and user elements work together in coordination.
[0695] Users upload the documents they wish to analyze to the system via their terminal. This process sends the documents to the server. The server first uses recognition tools to verify the format of the received documents. If necessary, it extracts character data from the file using Optical Character Recognition (OCR) technology. Tesseract OCR is commonly used as the specific software for this purpose.
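Format verification could, for example, inspect the leading bytes of the uploaded file rather than trusting its name. The sketch below assumes that approach; the `FormatResult` type, the `detect_format` function, and the rule for setting the OCR flag are illustrative and not taken from the disclosure.

```python
from dataclasses import dataclass

# Leading bytes ("magic numbers") of some common document formats.
PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
ZIP_MAGIC = b"PK\x03\x04"  # DOCX files are ZIP containers


@dataclass(frozen=True)
class FormatResult:
    kind: str        # "pdf", "png", "jpeg", "docx" or "unknown"
    needs_ocr: bool  # flag consumed by a later OCR step


def detect_format(data: bytes) -> FormatResult:
    """Classify an upload by its leading bytes and decide whether to flag OCR."""
    if data.startswith(PDF_MAGIC):
        # Assumption: a PDF may be a scan, so it is flagged for OCR. A real
        # system would first check whether the PDF already has a text layer.
        return FormatResult("pdf", True)
    if data.startswith(PNG_MAGIC):
        return FormatResult("png", True)
    if data.startswith(JPEG_MAGIC):
        return FormatResult("jpeg", True)
    if data.startswith(ZIP_MAGIC):
        # Assumption: any ZIP container is treated as DOCX here; in practice
        # the archive should be checked for a word/document.xml entry.
        return FormatResult("docx", False)
    return FormatResult("unknown", False)
```

Checking content rather than the file extension guards against mislabeled uploads, such as a photograph saved with a `.pdf` name.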
[0696] The server applies generative AI models and natural language processing (NLP) techniques to the extracted text data. This allows risk elements to be identified even in complex contract documents. Tools such as the spaCy library and BERT models are useful for behavioral analysis and legal risk assessment.
[0697] Simultaneously, the server utilizes computer vision technology for drawing analysis, identifying design elements from patent drawings. OpenCV and TensorFlow are used in this process. The server applies machine learning algorithms to identify key design structures and features, and compares them with other similar designs.
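To make the idea of comparing a drawing with similar designs concrete, the sketch below uses an average hash, a simple perceptual-hash technique swapped in here for the OpenCV and TensorFlow processing the text mentions. It assumes both drawings have already been converted to grayscale and resized to the same small grid; the function names and the input type are hypothetical.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale pixel values


def average_hash(pixels: Gray) -> List[int]:
    """One bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("image must contain at least one pixel")
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(a: Gray, b: Gray) -> float:
    """Fraction of matching hash bits (1.0 means identical hashes)."""
    hash_a, hash_b = average_hash(a), average_hash(b)
    if len(hash_a) != len(hash_b):
        raise ValueError("images must have the same dimensions")
    matches = sum(x == y for x, y in zip(hash_a, hash_b))
    return matches / len(hash_a)
```

This measure tolerates small brightness differences but not rotation or cropping, so it is only a coarse first filter ahead of feature-based comparison.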
[0698] Once the analysis is complete, the server generates a report in a user-friendly format. This report often includes identified hazards, design considerations, and recommended countermeasures. The report is typically provided in PDF format or accessible through a web interface.
[0699] Furthermore, users can interact with the server using prompts related to the generated report. This helps them extract additional information and deepen their understanding of risk management. Examples of specific prompts include inquiries such as, "Please list the risk items in this contract," or "What are the distinctive design elements in the patent drawings?"
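A follow-up question could be combined with the generated report into a single prompt for the generative AI model, as in the sketch below. The prompt wording, the `build_prompt` function, and the character limit are assumptions for illustration; the disclosure gives only the example user inquiries quoted above.

```python
def build_prompt(
    report: str,
    question: str,
    language: str = "en",
    max_report_chars: int = 4000,
) -> str:
    """Combine a report excerpt and a user question into one prompt string."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # Truncate the report so the prompt stays within a bounded size.
    excerpt = report[:max_report_chars]
    return (
        "You are assisting with contract and patent-drawing risk analysis.\n"
        f"Answer in language: {language}\n"
        "Use only the report below as your source.\n"
        "--- REPORT ---\n"
        f"{excerpt}\n"
        "--- QUESTION ---\n"
        f"{question}"
    )
```

Including the report text in the prompt keeps the answer tied to the analysis results that the user has actually received.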
[0700] This system streamlines document analysis in legal and intellectual property departments and can handle multilingual international risk management tasks. A key feature of this invention is its flexible analysis capabilities across diverse document formats and information languages.
[0701] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0702] Step 1:
[0703] The user selects the documents to be analyzed on their terminal and uploads them to the system. Input files include contracts and patent drawings. The files are then sent to the server.
[0704] Step 2:
[0705] The server uses a discrimination mechanism to verify the format of the received document. The received document is the input, and the format determination result is the output. At this stage, it supports various file formats such as PDF, DOCX, and JPEG. Depending on the format, it sets a flag to apply OCR technology.
[0706] Step 3:
[0707] The server extracts character data using OCR technology depending on the format. The input is the document that has been identified and the OCR application flag, and the output is the extracted character data. Specifically, it performs optical character recognition to obtain text from image-based files.
[0708] Step 4:
[0709] The server analyzes the extracted text data using a generative AI model and natural language processing technology. The input is text data, and the output is a list of risk elements. This process identifies risk items such as "payment terms" and "contract termination" from contract documents.
[0710] Step 5:
[0711] The server analyzes patent drawings using image analysis technology. The input is the data of the patent drawing, and the output is identified design elements. Computer vision algorithms are used to identify design elements within the image and to check for similarity.
[0712] Step 6:
[0713] The server compiles the results of text and image analysis and generates a report using a report generation system. The input is the analysis results, and the output is the final report. The report includes extracted risk information and key design considerations.
[0714] Step 7:
[0715] The user receives the generated report and interacts with the server using prompts as needed. Input consists of user inquiries, and output consists of additional information and advice. Through this interaction, the user gains further insights into risk management.
[0716] (Application Example 1)
[0717] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0718] In today's business environment, quickly and accurately analyzing documents such as contracts and patent drawings, and assessing risks, is crucial for smooth business operations. However, manually analyzing vast amounts of documents is time-consuming, labor-intensive, and prone to errors. Furthermore, international transactions require multilingual support, adding further complexity. Therefore, there is a need for technologies that can efficiently solve these challenges on portable devices using automated systems.
[0719] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0720] In this invention, the server includes means for receiving data, means for extracting information from the received data, and means for identifying risk elements by analyzing the extracted information. This allows users to easily upload contracts and patent drawings from portable devices and quickly understand their risks. Furthermore, by providing multilingual report generation and interactive dialogue functions, it can be used in international business environments.
[0721] "Means of receiving data" refers to devices or programs that acquire information such as contracts and patent drawings provided by the user.
[0722] "Information extraction means" refers to a technology or device that extracts necessary text or image information from received data.
[0723] "Methods for identifying elements" refer to techniques or processes for analyzing extracted information to identify risks and critical elements.
[0724] "Visual analysis means" refers to a technology that analyzes images within a document and extracts important features and information related to patent examination.
[0725] A "report generation method" refers to a device or program that summarizes analysis results in a format that is easy for the user to understand.
[0726] A "means of dialogue" refers to a function or device that allows a user to communicate interactively with a system and obtain additional information.
[0727] "Means for operating on portable devices" refers to designs and programs that run on portable devices such as smartphones and tablets.
[0728] "Optical recognition processing" is a technology that extracts text from documents and images as digital data.
[0729] "Language support means" refers to a technology or program that makes analysis results available in multiple languages.
[0730] In the system implementing this invention, a user takes a photograph of a contract or patent drawing using a portable device such as a smartphone or tablet, and the data is received. The received data is then processed on a server as follows.
[0731] The server uses software written in Python to determine the type of data received and, if necessary, performs optical recognition processing using Tesseract OCR. This extracts text information from the image. The extracted text is then analyzed using natural language processing with spaCy to identify risks and important elements.
[0732] Subsequently, visual analysis is performed using OpenCV to extract important design elements from the image information of the patent data. The analysis results are generated in a user-friendly report format and sent to the user's device.
[0733] Users can also obtain additional information using a dialogue system powered by a generative AI model. This interactive dialogue feature allows users to ask questions about the analysis and obtain further information.
[0734] As a concrete example, a businessperson handling international transactions can photograph a new contract with a smartphone while on a business trip, have its risks assessed, and take appropriate action on the spot. In this case, the following prompt can be used:
[0735] "Please analyze the new contract and identify the risk factors. Specifically, I'd like to know about important clauses regarding payment terms and contract termination. Please display the results in a report on my smartphone."
[0736] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0737] Step 1:
[0738] Users use a portable device to photograph the contracts and patent drawings they wish to analyze and upload them to the server via the terminal. The input is an image file of the contract or patent drawing, and the output is the unprocessed document data transferred to the server.
[0739] Step 2:
[0740] The server checks the format of the received data and performs optical recognition processing using Tesseract OCR as needed. The input is image data sent to the server, and the output is information converted into text by OCR. At this stage, data processing involves extracting character information from the image and generating text data.
[0741] Step 3:
[0742] The server analyzes the text data extracted using spaCy through natural language processing to identify risk elements in the contract. The input is text data obtained by OCR, and the output is the identified risk items. At this stage, keywords and risk-related context within the text are examined, and important elements are extracted.
[0743] Step 4:
[0744] The server uses OpenCV to perform image analysis on patent drawings and extract design elements and important information. The input is image data of the patent drawing, and the output is information on design elements related to patent examination. Here, the features of the image are analyzed, and useful information is organized as data.
[0745] Step 5:
[0746] The server combines the results of text and image analysis to generate a user-friendly report. The input consists of data on risk and design elements, while the output is a detailed analysis report. At this stage, the analysis results are integrated, organized, and displayed in report form.
[0747] Step 6:
[0748] Users can use a dialogue function powered by a generative AI model to communicate interactively with the server and obtain additional information. Input is a prompt or question from the user, and output is the AI's answer or additional information. Through the dialogue, users can obtain even more detailed information.
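Steps 2 to 5 above can be sketched as a small pipeline. This is a minimal illustration, not the disclosed implementation: the names `AnalysisResult` and `run_pipeline` are hypothetical, and the OCR, risk identification, and image analysis stages are passed in as plain functions so that real components (for example Tesseract OCR, spaCy, or OpenCV, as named in the description) could be substituted for the stand-ins used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnalysisResult:
    """Hypothetical container for the outputs of Steps 2 to 5."""
    text: str = ""
    risks: List[str] = field(default_factory=list)
    design_elements: List[str] = field(default_factory=list)
    report: str = ""


def run_pipeline(
    data: bytes,
    is_image: Callable[[bytes], bool],
    ocr: Callable[[bytes], str],
    find_risks: Callable[[str], List[str]],
    find_design_elements: Callable[[bytes], List[str]],
) -> AnalysisResult:
    """Run the stages in order; every stage is injected and replaceable."""
    result = AnalysisResult()
    image = is_image(data)
    # Step 2: apply OCR only when the upload is an image.
    result.text = ocr(data) if image else data.decode("utf-8", errors="replace")
    # Step 3: identify risk items in the extracted text.
    result.risks = find_risks(result.text)
    # Step 4: image analysis applies only to image uploads.
    result.design_elements = find_design_elements(data) if image else []
    # Step 5: combine both analyses into one plain-text report.
    lines = ["Analysis report", "Risk items:"]
    lines += [f"- {r}" for r in result.risks] or ["- none found"]
    lines.append("Design elements:")
    lines += [f"- {d}" for d in result.design_elements] or ["- none found"]
    result.report = "\n".join(lines)
    return result


# Usage with stand-in stages (assumptions for illustration only).
result = run_pipeline(
    b"Payment is due in 30 days. Either party may terminate without notice.",
    is_image=lambda d: d.startswith(b"\xff\xd8\xff"),  # JPEG signature
    ocr=lambda d: "",
    find_risks=lambda t: [s.strip() for s in t.split(".") if "terminate" in s],
    find_design_elements=lambda d: [],
)
print(result.report)
```

Injecting the stages keeps the control flow of the steps separate from the choice of OCR or analysis library.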
[0749] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0750] This invention is a system that, in addition to analyzing documents, can recognize user emotions in real time and adjust the presentation of results and dialogue based on that emotional information. A specific embodiment is shown below.
[0751] Users upload contracts and patent drawings they wish to have analyzed via their terminal. The uploaded documents are sent to the server, which receives them using a receiving device.
[0752] The server first checks the format of the received document and applies OCR processing as needed. This extracts all the text information from the document.
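As an illustration of the format check, the following sketch inspects the leading bytes of an upload. The function names `detect_format` and `needs_ocr` are hypothetical, and only three formats are covered. Whether a PDF needs OCR depends on whether it carries a text layer, which this sketch does not examine.

```python
def detect_format(data: bytes) -> str:
    """Guess the file format from its leading signature bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def needs_ocr(data: bytes) -> bool:
    """Raster images always need OCR; other formats are left to later checks."""
    return detect_format(data) in {"jpeg", "png"}


print(detect_format(b"%PDF-1.7\n"), needs_ocr(b"\xff\xd8\xff\xe0"))
```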
[0753] After text extraction is complete, the server analyzes the text data to identify key risk items. Risk identification tools are then used to identify potential risks and deficiencies within the contract.
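One simple way to picture the risk identification is a rule-based pass over sentences. The sketch below is an assumption made for illustration: the categories and regular expressions in `RISK_PATTERNS` are hypothetical, and the description itself does not specify the rules or model that the risk identification tools use.

```python
import re
from typing import Dict, List

# Hypothetical rule set: category name -> pattern suggesting that category.
RISK_PATTERNS = {
    "termination": re.compile(r"\bterminat(e|es|ed|ion)\b", re.IGNORECASE),
    "payment": re.compile(r"\b(payment|late fee|interest)\b", re.IGNORECASE),
    "liability": re.compile(r"\b(liabilit(y|ies)|indemnif(y|ies|ication))\b", re.IGNORECASE),
}


def identify_risks(text: str) -> List[Dict[str, str]]:
    """Return one finding per (sentence, matching category) pair."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    findings = []
    for sentence in sentences:
        for category, pattern in RISK_PATTERNS.items():
            if pattern.search(sentence):
                findings.append({"category": category, "sentence": sentence})
    return findings


sample = (
    "Payment is due within 30 days. "
    "Either party may terminate this agreement without notice. "
    "The parties met in Tokyo."
)
for finding in identify_risks(sample):
    print(finding["category"], "->", finding["sentence"])
```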
[0754] Furthermore, the server uses image analysis tools to analyze the design information contained in the patent drawings. It utilizes computer vision technology to identify important design elements and similarities with competing patents within the drawings.
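The similarity comparison can be illustrated with cosine similarity between feature vectors. This is a sketch under stated assumptions: the description does not say which similarity measure is used, the vectors below are made-up numbers, and extracting such vectors from drawings (for example with a computer vision library) is outside the sketch. `cosine_similarity` and `rank_similar` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate drawings by similarity to the query, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_similar(
    [1.0, 0.0, 1.0],
    {"patent_A": [1.0, 0.0, 1.0], "patent_B": [0.0, 1.0, 0.0]},
)
print(ranking[0][0])
```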
[0755] These analysis results are integrated and organized by a report generation system and provided to the user in an easy-to-understand format. The report includes risk items, important design information, and related recommendations.
[0756] Furthermore, this system incorporates an emotion engine on the server that recognizes the user's emotions in real time as they review reports. This emotion engine analyzes the user's facial expressions and tone of voice to infer their current emotional state.
[0757] Based on the user's perceived emotions, the server dynamically adjusts the presentation of reports and the tone of dialogue. For example, if the user is stressed, the results will be presented more concisely and positively to provide reassurance. Furthermore, emotional feedback allows for flexible modification of responses through dialogue channels, improving the user experience.
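The adjustment described above might look like the following sketch, in which a stressed or anxious user receives a shorter list with a reassuring header. The emotion labels, the wording, and the three-item limit are assumptions made for illustration, and `present_report` is a hypothetical name.

```python
from typing import List

# Hypothetical labels for which the presentation is shortened and softened.
CALMING_EMOTIONS = {"stressed", "anxious"}


def present_report(risks: List[str], emotion: str, max_items: int = 3) -> str:
    """Format the risk list according to the estimated emotion."""
    if emotion in CALMING_EMOTIONS:
        shown = risks[:max_items]
        lines = ["Summary: the main points are listed below, and each can be addressed."]
        lines += [f"- {r}" for r in shown]
        hidden = len(risks) - len(shown)
        if hidden > 0:
            lines.append(f"({hidden} further item(s) available on request.)")
    else:
        lines = ["Full risk list:"]
        lines += [f"- {r}" for r in risks]
    return "\n".join(lines)


items = ["late fee clause", "unilateral termination", "unlimited liability", "auto-renewal"]
print(present_report(items, "stressed"))
```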
[0758] In this way, by realizing sophisticated interactions that incorporate emotions, it is possible to promote user understanding and maximize the utilization of analysis results. For example, when using the system in the context of international contract negotiations, it becomes possible to quickly grasp legal risks while simultaneously responding in a way that takes into account the emotions of the person in charge in the other country.
[0759] The following describes the processing flow.
[0760] Step 1:
[0761] The user uses a terminal to select the contracts and patent drawings to be analyzed and uploads them to the system. The user then specifies the relevant file from the operation interface and presses the send button, at which point the document is transferred to the server.
[0762] Step 2:
[0763] The server receives the uploaded document via a receiving device. It determines the format of the received document (PDF, JPEG, etc.) and, if necessary, extracts the text from the document using OCR technology. This allows text information to be obtained from the image.
[0764] Step 3:
[0765] The server uses text extraction methods to analyze the extracted text data. Natural language processing techniques are used to identify risk items within the contract. In this process, risk clauses and inappropriate conditions are identified through machine learning algorithms.
[0766] Step 4:
[0767] The server uses image analysis techniques to analyze patent drawings. Computer vision is used to identify design features and evaluate the novelty and similarity of the patent. This clarifies the relevant technical elements.
[0768] Step 5:
[0769] The server generates a comprehensive report based on the analysis results using a report generation system. The report includes extracted risk clauses, image analysis results, and proposed action plans. This report is designed to support user decision-making.
[0770] Step 6:
[0771] The user receives the generated report on their device and reviews it. Simultaneously, the emotion engine recognizes the user's emotions. Using the device's camera and microphone, it analyzes the user's facial expressions and tone of voice to evaluate their emotional state in real time.
[0772] Step 7:
[0773] The server adjusts the presentation of reports based on information from the emotion engine, according to the user's emotions. For example, if anxiety is detected, the content is modified to emphasize positive language and provide a sense of reassurance. Interactive dialogue content is also changed based on the emotional state.
[0774] Step 8:
[0775] Based on reports tailored to their emotions, users can consider necessary actions and engage in further dialogue with the AI through the server. This allows users to make the most of the analysis results and supports quick and effective decision-making.
[0776] (Example 2)
[0777] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0778] Traditional systems only analyze information within documents and do not provide information based on the user's emotions. As a result, users may experience stress and anxiety. Furthermore, providing analysis results in different languages is difficult, limiting global use.
[0779] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0780] In this invention, the server includes means for receiving information, means for recognizing the user's emotional state and adjusting the analysis results, and means for providing the analysis results in multiple languages. This enables information provision that takes the user's emotions into consideration and allows for global use.
[0781] "Means of receiving information" refers to the function of allowing a server to receive input information from an external source.
[0782] "Means for extracting character information" refers to a function that processes information to obtain string data from received information.
[0783] "Means for identifying hazards" refers to a function that analyzes extracted textual information to identify potential risks.
[0784] "Means for analyzing visual information" refers to functions for detecting and analyzing important elements from image data contained in information.
[0785] "Means of generating and providing reports" refers to a function that organizes analysis results and outputs them in a format that is easy for users to understand.
[0786] "Means for recognizing the user's emotional state" refers to a function that analyzes the user's emotions in real time and provides information appropriate to that situation.
[0787] "Means for interacting with users, providing additional information, and coordinating feedback" refers to functions that interact with users, provide additional information as needed, and adjust the content.
[0788] "Means for performing character recognition processing" refers to a function that applies OCR technology to extract characters according to the format of the received document.
[0789] "Means of providing analysis results in multiple languages" refers to a function that translates the analyzed information into multiple languages and presents it to the user.
[0790] This invention provides a system that allows users to easily analyze documents and receive emotion-based feedback. Specifically, users upload the documents they wish to have analyzed to the system using a terminal. These documents can be in the format of contracts or patent drawings.
[0791] First, the server receives the document using a receiving device and determines its format. If necessary, it extracts text information using OCR software (e.g., Tesseract). This process allows text information to be obtained even from documents uploaded in image format.
[0792] Next, the server performs analysis on the extracted text data using natural language processing. This makes it possible to identify risk items hidden in the contract and identify potential dangers. In addition, computer vision technology (e.g., OpenCV) is used to analyze design elements in patent drawings and their similarities to competing technologies.
[0793] The server generates a report based on the analysis results and provides it to users in an easy-to-understand manner through its multilingual support function. This ensures that users from different language regions can fully understand the analysis results.
[0794] Furthermore, the server uses the user's device camera and microphone to recognize the user's emotions in real time. By analyzing facial expressions and voice tone, it infers how the user is receiving the information. Based on this information, the server dynamically adjusts the presentation of reports and the dialogue style to ensure the user is comfortable receiving the information.
[0795] For example, using this system in international contract negotiations allows for the rapid identification of legal risks and facilitates smoother negotiations while taking into account the feelings of the other party's representatives.
[0796] An example of a prompt message would be, "Please identify the main risk items in this contract and suggest improvements if there are any deficiencies. Also, please summarize the key points included in the patent drawings." This allows the user to communicate specific analysis requests to the system through prompt messages.
[0797] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0798] Step 1:
[0799] The user uses a terminal to select the documents they wish to analyze and upload them to the system. The input files are often in PDF or image format, and this information is sent to the server. The server receives the documents and prepares for the next processing step.
[0800] Step 2:
[0801] The server determines the format of the received document. Based on this determination, it uses OCR software to extract text information from images or PDFs if necessary. The input is an image or PDF, and the output is text data. This process extracts character data from visual information.
[0802] Step 3:
[0803] The server analyzes the extracted text data and uses natural language processing techniques to identify key risk items. The input is text data, and the output is analytical information including identified risk items and potential hazards. Processing includes keyword extraction and pattern recognition.
[0804] Step 4:
[0805] The server analyzes images contained in received documents and uses computer vision technology to find important design elements and similarities. The input is image data, and the output is the analysis results of design information and similarity with competing technologies. Feature points are extracted and analyzed using an image processing library.
[0806] Step 5:
[0807] The server generates a report based on the analyzed data and provides it to the user. At this time, it utilizes multilingual support to create reports translated into different languages. The input is analyzed information, and the output is a report in a format that is easy for the user to understand. The report generation system systematically organizes the information.
[0808] Step 6:
[0809] The server uses the device's camera and microphone to analyze facial expressions and voice in real time to recognize the user's emotional state. The input is the user's voice and video data, and the output is an estimate of the user's emotional state. This allows the information provided to be tailored based on the user's current emotions.
[0810] Step 7:
[0811] The server appropriately adjusts the presentation of materials and the content of dialogue based on the user's emotional state. The input is the result of an analysis of the emotional state, and the output is the adjusted information presentation and dialogue content. This optimizes the user experience and provides situation-appropriate feedback.
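The multilingual report of Step 5 can be sketched with per-language label tables. This is an assumption made for illustration: only the fixed labels are localized here, while translating the risk items themselves would require a translation component that the sketch leaves out. `LABELS` and `render_report` are hypothetical names.

```python
from typing import List

# Hypothetical label tables; unknown languages fall back to English.
LABELS = {
    "en": {"title": "Analysis report", "risks": "Risk items", "none": "None found"},
    "ja": {"title": "分析レポート", "risks": "リスク項目", "none": "該当なし"},
}


def render_report(risks: List[str], lang: str = "en") -> str:
    """Render the risk list with headings in the requested language."""
    labels = LABELS.get(lang, LABELS["en"])
    lines = [labels["title"], f"{labels['risks']}:"]
    lines += [f"- {r}" for r in risks] if risks else [f"- {labels['none']}"]
    return "\n".join(lines)


print(render_report(["late fee clause"], "ja"))
```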
[0812] (Application Example 2)
[0813] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0814] This invention relates to an interactive system that combines document analysis and user emotion recognition. Conventional systems fail to adequately optimize the user experience because they do not consider user emotions when providing document analysis results. Therefore, there is a challenge in realizing flexible dialogue and information provision based on user emotions and improving convenience for users.
[0815] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0816] In this invention, the server includes means for receiving information, means for extracting textual information from the received information, means for analyzing the extracted textual information and identifying risk elements, means for analyzing diagrams and extracting important elements, means for generating and providing the analysis results as a report, means for providing additional information through dialogue with the user, and emotion recognition means for estimating the user's emotional state and adjusting the content of the dialogue accordingly. This enables the provision of information and dynamic adjustment of dialogue content in accordance with the user's emotions.
[0817] "Means of receiving information" refers to devices or software that have the function of acquiring data or documents transmitted electronically from an external source and enabling processing within the system.
[0818] "Means for extracting textual information" refers to a device or program that has the function of recognizing textual characters from received documents or images and extracting them as digital data.
[0819] "Means for identifying risk factors" refers to devices or software that have the function of analyzing extracted textual information and identifying risks or defects hidden within a document.
[0820] "Means for analyzing diagrams and extracting important elements" refers to a device or software that has the function of extracting important data such as design information and similarity by analyzing the graphic information contained in a document.
[0821] "Means of generating and providing a report" refers to a device or program that has the function of organizing and integrating analysis results and presenting them as information in a format that is easy for users to understand.
[0822] "Means of providing additional information through interaction with users" refers to devices or software that have the function of enabling a system to communicate interactively with users and appropriately provide necessary information and advice.
[0823] "Emotion recognition means" refers to a device or program that analyzes the user's facial expressions and tone of voice, infers their emotional state, and appropriately adjusts the content of the dialogue and the presentation of information based on that.
[0824] The system that realizes this invention relies primarily on three elements: a server, a terminal, and a user. The server plays a central role, handling various processes such as receiving information, extracting text information, identifying risk elements, analyzing diagrams, extracting important elements, and generating and providing the results as a report.
[0825] The server processes images from received documents using OpenCV and performs OCR on text using the Google Cloud Vision API. Furthermore, it has the capability to recognize user emotions in real time from facial expressions and voice using TensorFlow. This enables flexible dialogue and information provision based on the user's emotional state.
[0826] The terminal functions as a user input device, uploading and receiving information, and collecting data using the camera and microphone. This data is sent to a server, where it is analyzed and information is provided. Users can interact with the system through the terminal and obtain any additional information they need.
[0827] As a concrete example, if a user is interested in a particular product, the server will display a detailed description of that product and related products, adjusting the way information is presented based on the user's response. For instance, if it is determined that the user is hesitant about purchasing, the server may highlight the product's advantages and reviews from other users.
[0828] The generative AI model is used to further optimize information delivery using prompt messages. The following is an example of such a prompt message.
[0829] "Please enter the name of the product that the customer is considering purchasing. Considering the product's characteristics, please generate a reassuring description."
[0830] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0831] Step 1:
[0832] The terminal receives input from the user and uploads documents and information to the server. Input may include contracts and product information. The terminal receives this data and sends it to the server. The output is the data file sent to the server.
[0833] Step 2:
[0834] The server determines the format of the received document and performs OCR processing using the Google Cloud Vision API as needed. The input is an unformatted data file, and the output is data with the text extracted. The server converts this data into digital text and prepares it for analysis.
[0835] Step 3:
[0836] The server uses OpenCV to analyze images within a document and extract important elements. The input is a document containing image data, and the output is information about the important elements obtained from the image analysis. The server identifies the image information and performs further analysis based on it.
[0837] Step 4:
[0838] The server uses TensorFlow to analyze face and voice data sent from the device and estimate the user's emotional state in real time. The input is face and voice data sent from the device, and the output is the estimated emotional state. Based on the emotion recognition, the server prepares to adjust the dialogue.
[0839] Step 5:
[0840] The server uses a natural language processing library such as TextBlob to analyze the textual information and identify risk factors and recommendations. The input consists of the textual information together with the results of the image analysis and the emotion estimation. Based on this, the server generates and outputs a report.
[0841] Step 6:
[0842] The server utilizes a generative AI model to provide optimal information using prompt messages. For example, it might create prompts such as, "Please enter the name of a product the customer is considering purchasing. Considering the product's characteristics, generate a reassuring description." The output is user-optimized information.
[0843] Step 7:
[0844] The user reviews the report through their device and asks additional questions or engages in dialogue as needed. The server receives real-time emotional feedback from the user and dynamically adjusts the dialogue to achieve the optimal user experience. The input is the user's feedback, and the output is the optimized dialogue.
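The dialogue adjustment in Steps 6 and 7 can be pictured as assembling a prompt that carries the estimated emotion. In this sketch the generative AI model is represented by an injected function `generate`, so no particular model API is assumed. The tone instructions and the names `build_prompt` and `answer` are hypothetical.

```python
from typing import Callable

# Hypothetical tone instructions keyed by estimated emotion.
TONE_BY_EMOTION = {
    "anxious": "Answer briefly and reassuringly.",
    "stressed": "Answer briefly and reassuringly.",
}
DEFAULT_TONE = "Answer in full detail."


def build_prompt(question: str, report: str, emotion: str) -> str:
    """Combine the report, the user's question, and a tone instruction."""
    tone = TONE_BY_EMOTION.get(emotion, DEFAULT_TONE)
    return (
        f"Report:\n{report}\n\n"
        f"Estimated user emotion: {emotion}\n"
        f"Instruction: {tone}\n"
        f"Question: {question}"
    )


def answer(question: str, report: str, emotion: str,
           generate: Callable[[str], str]) -> str:
    """Send the assembled prompt to the injected model function."""
    return generate(build_prompt(question, report, emotion))


# Stand-in model that echoes the prompt, for illustration only.
reply = answer("Which clause is most urgent?", "- late fee clause", "anxious",
               generate=lambda prompt: prompt)
print(reply)
```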
[0845] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0846] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0847] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0848] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0849] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0850] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0851] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible (expressed in actions) it becomes.
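The layout of the emotion map 400 can be illustrated with a small coordinate model. This is a sketch under stated assumptions: the disclosure gives no numeric coordinates, so the x/y convention, the ring width, and the names `MapPosition` and `describe` are hypothetical. Left of center is treated as the reaction side, right as the situation side, upper as pleasure, lower as displeasure, and a larger ring index as more outwardly expressed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    x: float  # negative = left half, positive = right half
    y: float  # positive = upper side, negative = lower side

    @property
    def radius(self) -> float:
        """Distance from the center of the concentric circles."""
        return math.hypot(self.x, self.y)


def describe(pos: MapPosition, ring_width: float = 1.0) -> dict:
    """Classify a position by side, valence, and concentric ring."""
    if pos.x < 0:
        side = "reaction"
    elif pos.x > 0:
        side = "situation"
    else:
        side = "both"  # directly above or below the center
    if pos.y > 0:
        valence = "pleasure"
    elif pos.y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"
    # Ring 0 is the innermost (most primitive) ring.
    return {"side": side, "valence": valence, "ring": int(pos.radius // ring_width)}


# A point at the 3 o'clock position, two and a half ring widths from the center.
print(describe(MapPosition(2.5, 0.0)))
```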
[0852] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map ("Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion", Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
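The relationship between balance and pleasure or discomfort described above can be written as a comparison of distances from the ideal value. The following is one possible reading made for illustration, using a single scalar balance such as battery level. The function name `feeling_from_balance` and the "neutral" case are assumptions.

```python
def feeling_from_balance(previous: float, current: float, ideal: float) -> str:
    """Compare how far the balance was, and now is, from its ideal value."""
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"    # the balance moved toward the ideal
    if after > before:
        return "discomfort"  # the balance moved away from the ideal
    return "neutral"         # no change in distance (assumed case)


# Battery level as a fraction of full charge, with 0.8 assumed ideal.
print(feeling_from_balance(previous=0.4, current=0.7, ideal=0.8))
print(feeling_from_balance(previous=0.7, current=0.3, ideal=0.8))
```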
[0853] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0854] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
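The final step, turning emotion values into a determined emotion, can be sketched as selecting the highest value and noting which other emotions score close to it. The pre-trained neural network is not reproduced here; the emotion values below are made-up numbers standing in for its output, and the margin and the name `determine_emotion` are assumptions.

```python
from typing import Dict, List, Tuple


def determine_emotion(
    emotion_values: Dict[str, float], margin: float = 0.05
) -> Tuple[str, List[str]]:
    """Return the top-scoring emotion and the others within `margin` of it."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    top = max(emotion_values, key=emotion_values.get)
    top_value = emotion_values[top]
    close = sorted(
        name
        for name, value in emotion_values.items()
        if name != top and top_value - value <= margin
    )
    return top, close


# Made-up values in which neighboring emotions score similarly.
values = {"reassured": 0.82, "calm": 0.80, "confident": 0.79, "anxious": 0.10}
print(determine_emotion(values))
```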
[0855] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0856] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0857] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0858] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0859] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0860] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, that is, processors such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), whose circuit configurations are specifically designed to perform the specific processing. Each of these processors has built-in or connected memory and performs the specific processing by using that memory.
[0861] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0862] Examples of configurations using a single processor include the following. First, one or more CPUs and software may be combined to form a single processor, and this processor functions as the hardware resource that executes the specific processing. Second, a processor may be used that implements, on a single IC chip, the functions of an entire system including multiple hardware resources that execute the specific processing, as exemplified by an SoC (System-on-a-Chip). In this way, the specific processing is realized using one or more of the above types of processors as hardware resources.
[0863] More specifically, an electric circuit combining circuit elements such as semiconductor elements can be used as the hardware structure of these various processors. The specific processing described above is merely an example. It therefore goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged without departing from the gist.
[0864] The descriptions and illustrations presented above are detailed explanations of the parts relating to the technology of this disclosure and are merely examples of that technology. For example, the above descriptions of the structure, function, operation, and effect are descriptions of examples of the structure, function, operation, and effect of the parts relating to the technology of this disclosure. It therefore goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above without departing from the gist of the technology of this disclosure. Furthermore, to avoid confusion and to facilitate understanding of the parts relating to the technology of this disclosure, explanations of common technical knowledge that require no special explanation to enable implementation of the technology have been omitted from the descriptions and illustrations presented above.
[0865] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0866] The following is further disclosed regarding the embodiments described above.
[0867] (Claim 1)
[0868] Receiving means for receiving a document,
[0869] text extraction means for extracting text from the received document,
[0870] risk identification means for analyzing the extracted text and identifying risk items,
[0871] image analysis means for analyzing images contained in the document and extracting important information,
[0872] report generation means for generating analysis results and providing them as a report, and
[0873] interaction means for interacting with the user and providing additional information;
[0874] a system comprising the means listed above.
[0875] (Claim 2)
[0876] The system according to claim 1, further comprising determination means for determining the format of a received document and applying OCR processing as necessary.
[0877] (Claim 3)
[0878] The system according to claim 1, comprising a multilingual support means capable of providing analysis results in multiple languages.
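The flow recited in claims 1 and 2 above (receive a document, determine its format, apply OCR only where necessary, extract text, identify risk items, and generate a report) can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the implementation of this disclosure: the magic-byte table `MAGIC`, the term list `RISK_TERMS`, all function names, and the injected `ocr` callable are hypothetical, and image analysis, interaction, and multilingual output are omitted.

```python
from typing import Callable, List

# Hypothetical magic-byte table; a real system would cover more formats.
MAGIC = {
    b"\x89PNG": "png",
    b"\xff\xd8\xff": "jpeg",
}

# Hypothetical risk vocabulary, for illustration only.
RISK_TERMS = ("indemnify", "penalty", "terminate", "liability")


def determine_format(data: bytes) -> str:
    """Classify a received document by its leading bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "text"


def needs_ocr(fmt: str) -> bool:
    """Image formats carry no text layer, so OCR is applied to them."""
    return fmt in {"png", "jpeg"}


def extract_text(data: bytes, ocr: Callable[[bytes], str]) -> str:
    """Extract text, calling the injected OCR function only when necessary."""
    if needs_ocr(determine_format(data)):
        return ocr(data)
    return data.decode("utf-8", errors="replace")


def identify_risks(text: str) -> List[str]:
    """Return the sentences that mention any term in RISK_TERMS."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if any(t in s.lower() for t in RISK_TERMS)]


def generate_report(risks: List[str]) -> str:
    """Render the identified risk items as a plain-text report."""
    lines = [f"Risk items found: {len(risks)}"]
    lines += [f"- {r}" for r in risks]
    return "\n".join(lines)
```

Passing the OCR step in as a callable keeps the sketch independent of any particular OCR engine; the keyword match stands in for whatever analysis the risk identification means actually performs.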
[0879] "Example 1"
[0880] (Claim 1)
[0881] Receiving means for receiving information,
[0882] character data extraction means for extracting character data from the received information,
[0883] means for identifying hazardous elements by processing the extracted character data,
[0884] image analysis means for analyzing images contained in the information and extracting important information,
[0885] report generation means for generating analysis results and providing them as a report,
[0886] dialogue means for two-way interaction with users and for providing additional information,
[0887] determination means for determining the format of the information and applying optical character recognition processing as necessary, and
[0888] multilingual support means that uses natural language processing technology to provide information in multiple languages during processing;
[0889] a system comprising the means listed above.
[0890] (Claim 2)
[0891] The system according to claim 1, comprising element identification means for analyzing drawings contained in information, identifying design elements, and confirming similarity.
[0892] (Claim 3)
[0893] The system according to claim 1, comprising a generative artificial intelligence model that processes input sentences for a user to select information and generate information.
[0894] "Application Example 1"
[0895] (Claim 1)
[0896] Means for receiving data,
[0897] information extraction means for extracting information from the received data,
[0898] means for identifying risk factors by analyzing the extracted information,
[0899] visual analysis means for analyzing visual information contained in the data and extracting important information,
[0900] report generation means for generating analysis results and providing them as a report,
[0901] dialogue means for interacting with users and providing additional information, and
[0902] means for operating on a portable device;
[0903] a system comprising the means listed above.
[0904] (Claim 2)
[0905] The system according to claim 1, further comprising determination means for determining the type of received data and applying optical recognition processing as necessary.
[0906] (Claim 3)
[0907] The system according to claim 1, further comprising language support means capable of providing analysis results in multiple languages.
[0908] "Example 2 of combining an emotion engine"
[0909] (Claim 1)
[0910] Means for receiving information,
[0911] means for extracting textual information from the received information,
[0912] means for analyzing the extracted textual information to identify potential hazards,
[0913] means for analyzing visual information contained in the information and extracting important elements,
[0914] means for generating analysis results and providing them as a report,
[0915] means for recognizing the user's emotional state and adjusting how the analysis results are provided accordingly, and
[0916] means for interacting with users, providing additional information, and adjusting feedback;
[0917] a system comprising the means listed above.
[0918] (Claim 2)
[0919] The system according to claim 1, comprising means for determining the format of received information and, if necessary, performing character recognition processing.
[0920] (Claim 3)
[0921] The system according to claim 1, comprising means for providing analysis results in multiple languages.
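The emotion-dependent adjustment recited in claim 1 above (recognizing the user's emotional state and adjusting how the analysis results are provided) can be pictured with the following minimal sketch. Everything in it is an assumption made for illustration: the cue list `STRESS_CUES`, the two-state output of `estimate_emotion`, and the formatting rules in `present_results` are hypothetical, and a simple keyword check stands in for the emotion engine, whose internals are not specified here.

```python
from typing import List

# Hypothetical cue words; a real emotion engine would likely use a trained model.
STRESS_CUES = ("worried", "confused", "urgent", "frustrated")


def estimate_emotion(utterance: str) -> str:
    """Return "stressed" if the utterance contains a cue word, else "neutral"."""
    lowered = utterance.lower()
    return "stressed" if any(cue in lowered for cue in STRESS_CUES) else "neutral"


def present_results(risks: List[str], emotion: str) -> str:
    """Format the same analysis results differently per emotional state."""
    if not risks:
        return "No risk items found."
    if emotion == "stressed":
        # Lead with one item and defer the rest to reduce load on the user.
        rest = len(risks) - 1
        return f"Main point: {risks[0]}. {rest} further item(s) available on request."
    return "Risk items: " + "; ".join(risks)
```

The analysis results themselves are unchanged in both branches; only the presentation differs, which matches the wording "providing the analysis results in an adjusted manner".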
[0922] "Application Example 2 of combining an emotion engine"
[0923] (Claim 1)
[0924] Means for receiving information,
[0925] means for extracting textual information from the received information,
[0926] means for analyzing the extracted textual information and identifying risk elements,
[0927] means for analyzing diagrams contained in the information and extracting important elements,
[0928] means for generating analysis results and providing them as a report,
[0929] means for providing additional information through dialogue with users, and
[0930] emotion recognition means for estimating the user's emotional state and adjusting the content of the dialogue based on that estimate;
[0931] a system comprising the means listed above.
[0932] (Claim 2)
[0933] The system according to claim 1, further comprising means for determining the format of received information and applying optical character recognition processing as necessary.
[0934] (Claim 3)
[0935] The system according to claim 1, comprising means for providing analysis results in multiple languages.
[Explanation of Symbols]
[0936]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising:
means for receiving data;
information extraction means for extracting information from the received data;
means for identifying risk factors by analyzing the extracted information;
visual analysis means for analyzing visual information contained in the data and extracting important information;
report generation means for generating analysis results and providing them as a report;
dialogue means for interacting with users and providing additional information; and
means for operating on a portable device.
2. The system according to claim 1, further comprising determination means for determining the type of received data and applying optical recognition processing as necessary.
3. The system according to claim 1, further comprising language support means capable of providing analysis results in multiple languages.