system
A system automatically generates manga from image and text data, addressing the challenge of converting personal memories into visually impressive formats without requiring specialized skills, enabling easy creation and sharing of professional-quality comics.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies lack an easy and effective way for the general public to convert personal memories into visually impressive forms, such as comics, requiring advanced technology and specialized knowledge.
A system that acquires image and natural language data to automatically generate manga by analyzing recognition information and story elements using a generative model, allowing users to preserve and enjoy their memories without specialized skills.
Enables users to easily create and share professional-quality comics that visually represent their memories, improving user experience and reducing the need for manual editing.
Figure 2026105477000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In modern times, there are various means to save individual memories as digital data, but it is not easy to convert them into a visual and impressive form. In particular, expressing memories in a form such as a comic requires advanced technology and specialized knowledge, and it is difficult for the general public to use easily. The purpose of the present invention is to solve this problem and provide a means for a user to easily save and display their memories as comics.
Means for Solving the Problems
[0005] This invention provides a system that acquires image data and natural language data and automatically generates data in manga format. Specifically, it comprises means for analyzing the image data to extract recognition information and means for analyzing the natural language data to extract story elements. It further comprises means for automatically generating manga, using a generative model, based on the extracted recognition information and story elements. In this way, users can visually preserve and enjoy their memories without requiring any special skills.
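As an illustration only, the three means of [0005] can be sketched in Python as interchangeable components. The class and field names (RecognitionInformation, StoryElements, MangaPipeline) are hypothetical; the specification does not define any programming interface, and the stubs in the usage lines stand in for real analysis engines and a real generative model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecognitionInformation:
    """People, objects, and background identified in the image data (hypothetical fields)."""
    people: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    background: str = ""


@dataclass
class StoryElements:
    """Characters, events, and emotional expressions found in the text (hypothetical fields)."""
    characters: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)
    emotions: List[str] = field(default_factory=list)


@dataclass
class MangaPipeline:
    # Each "means" is modelled as an injectable function (an assumption of this sketch).
    image_analyzer: Callable[[bytes], RecognitionInformation]
    text_analyzer: Callable[[str], StoryElements]
    generator: Callable[[RecognitionInformation, StoryElements], bytes]

    def run(self, image_data: bytes, text: str) -> bytes:
        """Analyze both inputs, then hand the results to the generative model."""
        recognition = self.image_analyzer(image_data)
        story = self.text_analyzer(text)
        return self.generator(recognition, story)


# Usage with stub components in place of real engines and a real model.
pipeline = MangaPipeline(
    image_analyzer=lambda img: RecognitionInformation(people=["person"], background="beach"),
    text_analyzer=lambda txt: StoryElements(events=[txt]),
    generator=lambda rec, story: f"{rec.background}:{story.events[0]}".encode(),
)
print(pipeline.run(b"", "swimming"))  # b'beach:swimming'
```

Keeping the analyzers and the generator separate mirrors the separation of means in [0005], so any engine could replace a stub without changing the rest.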
[0006] "Image data" refers to visual information stored in digital format.
[0007] "Natural language data" refers to text information based on the language that humans use on a daily basis.
[0008] "Recognition information" refers to information extracted through image data analysis that includes identifiable elements such as people, objects, and backgrounds.
[0009] "Story elements" refer to information extracted through natural language data analysis that includes elements such as characters, events, and emotional expressions necessary to construct a narrative.
[0010] A "generative model" is an algorithm and method that uses machine learning to generate new data based on input information.
[0011] "Manga data" refers to digital information that is visually and narratively structured and presented as a whole in the form of a manga.
[0012] "User terminal" refers to electronic devices such as computers, smartphones, and tablets that are directly operated by the user.
[0013] "Automatic generation" is a process in which a system itself creates new content based on information, without requiring human intervention.
Brief Description of the Drawings
[0014]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Modes for Carrying Out the Invention
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, the labeled processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0018] In the following embodiments, the labeled RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0019] In the following embodiments, the labeled storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0020] In the following embodiments, the labeled communication interface (I/F) is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] This invention is a system that uses image data and natural language data provided by the user as input to generate story-based comic data from this data. This system operates between the user's terminal and a central server.
[0036] Users first access the system via a device such as a smartphone or computer. The user's device displays an interface for inputting memorable episodes in natural language. Users can also upload related image data. At this stage, users provide specific information that constitutes their memories.
[0037] The terminal sends the input text and image data to the server. The server receives this data and uses a text analysis engine to extract story elements from the natural language data. Meanwhile, the image analysis engine scans the image data to obtain recognition information. This provides the basic elements for generating a manga from the input data.
[0038] The server integrates the extracted story elements and recognition information and uses a generative model to create comic data. This generative model uses machine learning techniques to automatically determine drawing styles, panel layouts, character placement, and more.
[0039] The generated manga data is sent to the user's device. The user can view the completed manga on their device and request revisions as needed. Finally, the user can save the manga data to their device and share it with friends and family.
[0040] For example, if a user wants to save memories of a family trip as a comic, they select photos taken during the trip and enter the story of the trip in text. The system then automatically generates a comic that reflects the atmosphere and episodes of the trip and delivers it to the user. This allows users to easily create professional-quality comics without having to worry about the production process.
[0041] The following describes the processing flow.
[0042] Step 1:
[0043] Users use a terminal to access the system interface and enter episodes related to their memories into a text field. They also select relevant photo data and upload it to the system.
[0044] Step 2:
[0045] The terminal converts the input text and image data into a predetermined format (e.g., JSON format) and prepares it for transmission to the server over the network.
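A minimal Python sketch of this packaging step follows. It assumes JSON as the predetermined format and Base64 for the image bytes, since JSON has no binary type; the function name and the field names text, images, filename, and data are hypothetical.

```python
import base64
import json
from typing import Dict


def build_upload_payload(episode_text: str, images: Dict[str, bytes]) -> str:
    """Serialize the episode text and image files into a single JSON string.

    The field names "text", "images", "filename", and "data" are hypothetical.
    """
    payload = {
        "text": episode_text,
        # JSON has no binary type, so each image is Base64-encoded as ASCII text.
        "images": [
            {"filename": name, "data": base64.b64encode(data).decode("ascii")}
            for name, data in images.items()
        ],
    }
    return json.dumps(payload, ensure_ascii=False)


# Usage: a one-photo upload (the bytes are just a JPEG header, for illustration).
body = build_upload_payload("Family trip to the sea", {"beach.jpg": b"\xff\xd8\xff"})
```

Base64 inflates binary data by about a third; a real terminal might instead send images as multipart form data, which would be an equally valid reading of "a predetermined format".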
[0046] Step 3:
[0047] The server receives the data sent from the terminal, parses its format, and separates it into text data and image data. During this process, the server verifies that the received data has been parsed correctly.
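The server-side counterpart can be sketched in Python as a parser that separates text from images and rejects malformed input. It assumes the same hypothetical JSON layout with Base64-encoded images; the specification does not fix the layout or the error handling.

```python
import base64
import json
from typing import List, Tuple


def parse_upload_payload(body: str) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Separate a received payload into text data and image data.

    Assumes the hypothetical layout {"text": str, "images": [{"filename": str, "data": str}]}
    with Base64-encoded image bytes. Raises ValueError if anything fails to parse.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"payload is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")

    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text field is missing or empty")

    raw_images = payload.get("images", [])
    if not isinstance(raw_images, list):
        raise ValueError("images field must be a list")

    images = []
    for item in raw_images:
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            data = base64.b64decode(item["data"], validate=True)
            name = str(item.get("filename", ""))
        except (KeyError, TypeError, AttributeError, ValueError) as exc:
            raise ValueError(f"invalid image entry: {item!r}") from exc
        images.append((name, data))
    return text, images
```

Converting every failure into a single ValueError gives the server one place to report "data not parsed correctly" back to the terminal.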
[0048] Step 4:
[0049] The server's text analysis engine uses natural language processing techniques to analyze text data and extract narrative elements such as characters, events, and emotional expressions.
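The following toy Python sketch shows the shape of this step's output. A real engine would rely on trained natural language processing models; here small hand-written word lists, which are purely illustrative assumptions, stand in for them.

```python
import re
from typing import Dict, Iterable, List

# Illustrative word lists standing in for trained NLP models (assumptions).
CHARACTER_WORDS = {"i", "we", "mother", "father", "sister", "brother", "family", "friend"}
EVENT_WORDS = {"visited", "swam", "ate", "played", "walked", "arrived"}
EMOTION_WORDS = {"happy": "joy", "fun": "joy", "great": "joy",
                 "sad": "sadness", "surprised": "surprise", "amazing": "surprise"}


def _unique(items: Iterable[str]) -> List[str]:
    """Remove duplicates while keeping first-occurrence order."""
    return list(dict.fromkeys(items))


def extract_story_elements(text: str) -> Dict[str, List[str]]:
    """Return the characters, events, and emotions mentioned in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "characters": _unique(t for t in tokens if t in CHARACTER_WORDS),
        "events": _unique(t for t in tokens if t in EVENT_WORDS),
        "emotions": _unique(EMOTION_WORDS[t] for t in tokens if t in EMOTION_WORDS),
    }
```

For "We visited the beach and swam. It was fun and my sister was happy." this returns the characters we and sister, the events visited and swam, and the single emotion joy.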
[0050] Step 5:
[0051] The server's image analysis engine processes image data and uses deep learning techniques to extract recognition information about people, backgrounds, and objects within the image. In this step, the image features are represented as numerical vectors.
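The specification uses deep learning to represent image features as numerical vectors. As a simpler, explicitly swapped-in stand-in, the Python sketch below represents an image as a normalized color histogram and compares two images by cosine similarity; a production system would instead take an embedding from a trained network. Pixels are assumed to be RGB tuples with values 0 to 255.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Quantize each RGB channel into `bins` levels and return a normalized count vector.

    The vector has bins**3 entries and sums to 1 for a non-empty image.
    """
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```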
[0052] Step 6:
[0053] The server integrates story elements from the text analysis engine and recognition information from the image analysis engine. Based on this integrated information, the generative model automatically generates manga data.
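One simple way to picture this integration is to turn the two analysis results into one specification per panel before generation. The Python sketch below assumes one panel per extracted event and takes the setting from the first recognized background; these rules and the PanelSpec type are hypothetical, since the specification leaves integration to the generative model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PanelSpec:
    event: str
    characters: List[str]
    setting: str
    mood: str


def plan_panels(story: Dict[str, List[str]], recognition: Dict[str, List[str]]) -> List[PanelSpec]:
    """Merge story elements (from text) with recognition information (from images)."""
    # Prefer characters named in the text; fall back to people detected in the photos.
    characters = story.get("characters") or recognition.get("people") or []
    setting = (recognition.get("backgrounds") or ["unspecified"])[0]
    mood = (story.get("emotions") or ["neutral"])[0]
    return [PanelSpec(event, list(characters), setting, mood) for event in story.get("events", [])]
```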
[0054] Step 7:
[0055] The generated comic data is laid out into multiple comic panels, visually representing the flow of the story. The server converts this into a digital data format (e.g., PDF or JPEG) and prepares it for transmission to the user's device.
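The panel layout can be illustrated with a fixed grid, shown below as a Python sketch. The page size, grid shape, and margins are assumptions, as is the right-to-left ordering, which follows the usual reading direction of Japanese manga; in the described system the generative model determines the layout.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def layout_panels(num_panels: int, page_w: int = 800, page_h: int = 1200,
                  rows: int = 3, cols: int = 2, margin: int = 20,
                  right_to_left: bool = True) -> List[List[Box]]:
    """Place panels on a rows x cols grid and return one list of boxes per page."""
    per_page = rows * cols
    cell_w = (page_w - margin * (cols + 1)) // cols
    cell_h = (page_h - margin * (rows + 1)) // rows
    pages: List[List[Box]] = []
    for start in range(0, num_panels, per_page):
        boxes = []
        for i in range(min(per_page, num_panels - start)):
            row, col = divmod(i, cols)
            if right_to_left:
                col = cols - 1 - col  # first panel sits at the top right
            boxes.append((margin + col * (cell_w + margin),
                          margin + row * (cell_h + margin), cell_w, cell_h))
        pages.append(boxes)
    return pages
```

With the defaults, seven panels fill one page of six and spill one panel onto a second page; each box could then be rendered and the pages exported as PDF or JPEG.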
[0056] Step 8:
[0057] The user's device receives the manga data sent from the server and displays it on the interface. The user reviews the manga, sends correction requests back to the server as needed, and saves the final version.
[0058] (Example 1)
[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0060] Traditional manga generation systems require users to manually edit image data and story elements in detail, resulting in a cumbersome and time-consuming process. Furthermore, there are challenges in guaranteeing the quality of the automatically generated manga and whether it aligns with the user's intentions.
[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0062] In this invention, the server includes means for acquiring digital visual information using an information recording device, means for acquiring descriptive data using an information recording device, and means for analyzing the digital visual information and extracting recognition information. This enables the automatic generation of high-quality, consistent comics simply by the user intuitively providing data.
[0063] An "information recording device" is an electronic device used to acquire digital visual information and descriptive data; it serves to record the data input by the user.
[0064] "Digital visual information" refers to data that stores visual representations such as images and videos in digital format, and is information that has been converted into a format that can be processed by a computer.
[0065] "Descriptive data" refers to text information written in natural language, and is digital information that includes the user's intent and story elements.
[0066] "Recognition information" refers to data extracted as a result of analyzing digital visual information and identifying specific objects or scenes.
[0067] "Abstract concepts" refer to story elements and main themes extracted from descriptive data through natural language processing, and are fundamental information for manga generation.
[0068] "Image information" refers to a series of manga data automatically generated using a generative AI model based on the aforementioned recognition information and abstract concepts, and is information that is visually represented.
[0069] This system generates image data that visually represents the user's experiences and intentions by linking the user's digital device with a central data processing unit. The system includes an information recording device as a digital device and a central server with powerful data processing capabilities.
[0070] Users access this system using information recording devices such as smartphones and personal computers. Users can input text data related to specific memories, such as trips or events, through the interface. They can also provide digital visual information by uploading photographs they have taken or existing visual data.
[0071] The central server analyzes the received descriptive data and digital visual information. First, the server processes the descriptive data using natural language processing software to extract key themes and story elements as "abstract concepts." Simultaneously, it uses image processing software to obtain "recognition information" from the digital visual information.
[0072] In the next stage of this process, the generative AI model is invoked on the server. The generative AI model uses deep learning techniques to integrate the extracted abstract concepts and recognition information, automatically generating image data that reflects the user's intent. The generated manga data is then promptly delivered to the user's device.
[0073] As an example, consider a user who wants to save memories of a family trip as a comic strip. The user selects photos of the beach taken during the trip and inputs the story of the trip along with a prompt such as "Please turn the memories of our fun family trip into a comic strip." The system can then provide a professional-quality comic strip based on the memories.
[0074] Thus, this system greatly improves the user experience by providing users with an easy way to obtain high-quality visual representations.
[0075] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0076] Step 1:
[0077] Users access the system using devices such as smartphones and personal computers. Through the user interface, they input prompts and memorable episodes in text format, and also select and upload related image data. The input here consists of text data and image files, which are temporarily stored on the device.
[0078] Step 2:
[0079] The terminal sends the input text and image data to the server. The input here consists of text and image data provided by the user, and the output is the data passed to the server. Encryption protocols are used throughout this process for security.
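Assuming TLS is the encryption protocol (the specification does not name one), a terminal written in Python could prepare its connection settings as below. The resulting context can be passed as the context argument of urllib.request.urlopen; the function name is hypothetical.

```python
import ssl


def make_upload_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings for uploading episode data to the server."""
    # create_default_context() enables certificate verification and hostname checking.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```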
[0080] Step 3:
[0081] The server processes the received data. First, it uses a natural language processing engine to analyze the text data and extract story elements. Specifically, it extracts keywords and analyzes the structure of the sentences to extract the main elements of the narrative. This becomes the output as an "abstract concept" based on the input text.
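Keyword extraction of this kind can be approximated by counting content words. The Python sketch below is a frequency-based stand-in, not the engine described in the specification, and its stopword list is an illustrative assumption.

```python
import re
from collections import Counter
from typing import List

# Illustrative stopword list (an assumption); real engines use language-specific resources.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "on", "at", "for",
             "was", "were", "is", "it", "we", "i", "our", "my", "with"}


def extract_keywords(text: str, top_k: int = 5) -> List[str]:
    """Return the top_k most frequent content words, ties kept in order of first appearance."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS and len(w) > 2]
    return [word for word, _ in Counter(words).most_common(top_k)]
```

For "The beach was sunny. We swam at the beach and built a sandcastle on the beach." with top_k=2, beach ranks first and sunny wins the tie among single occurrences because it appears first.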
[0082] Step 4:
[0083] In parallel, the server scans the image data using an image analysis engine to identify objects and landscapes within the image. Image recognition algorithms based on computer vision techniques identify key features, and the resulting recognition information is the output of this step for the image data.
[0084] Step 5:
[0085] The server uses a generative AI model to integrate the previously obtained abstract concepts and recognition information, automatically generating manga data. This process uses deep learning models pre-trained on a dataset to automatically adjust drawing styles, panel layouts, and character placement. The output here is the completed manga data.
[0086] Step 6:
[0087] Finally, the server sends the generated manga data to the user's terminal. In this step, the manga data is converted to an appropriate format so that the user can review the results, and then transferred to the terminal via a communication protocol. The user can then review the manga on their terminal and request corrections if necessary.
[0088] (Application Example 1)
[0089] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0090] In recent years, advancements in information technology have made it commonplace to easily record personal information digitally. However, there is still a lack of convenient means to visualize everyday episodes and events as stories and save them in a way that can be shared with many people. Furthermore, creating and sharing high-quality graphic content without requiring specialized skills still requires considerable time and effort from users. Solving this problem is essential.
[0091] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0092] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating graphic data based on recognition elements and narrative elements using a data generation model. This makes it possible for users to easily generate everyday episodes as visual stories without requiring special skills and easily share them with others through an information distribution device.
[0093] "Image information" refers to visual data recorded in digital format, which forms the basis for system analysis.
[0094] "Natural language information" refers to data written in the language that humans use orally or in writing, and is used to extract story elements.
[0095] "Recognition elements" refer to data about specific features or objects that are extracted from image information through analysis.
[0096] "Narrative elements" refer to data about the constituent elements and themes of a story, extracted through analysis of natural language information.
[0097] A "data generation model" is an algorithm that uses machine learning techniques to generate graphic data based on the results of image and natural language analysis.
[0098] "Graphic data" refers to a visual representation format generated based on information entered by the user, and is intended to visually communicate its content.
[0099] An "information distribution device" is an electronic device used to share or distribute generated graphic data to other users.
[0100] This invention is a system for generating digital graphic content based on specific information and sharing it with others. A specific embodiment of this system will be described here.
[0101] First, the user inputs image information and natural language information into a device. This device is expected to be a smartphone or computer, and the user can easily record the collected image data and related episodes in natural language. This information is then sent to the server via an application installed on the device.
[0102] The server uses image analysis software (e.g., image_lib) to extract necessary recognition elements from images in order to handle image information. At the same time, it uses a natural language processing engine (e.g., nlp_lib) to process natural language information and extract narrative elements that form the basis of the story.
[0103] Next, a data generation model (e.g., comic_generation_model) is used to integrate the recognition elements and narrative elements to generate graphic data. Based on these elements, the generative AI model automatically creates visually appealing content in comic and story formats.
[0104] The generated graphic data is sent to the user's device. Users can then share the generated content with others via social media, email, or content distribution devices. This makes it possible to create and widely distribute professional-quality visual stories even without advanced design skills.
[0105] A concrete example is a scenario where a user wants to generate graphic data based on memories of a holiday trip. The user imports photos taken during the trip into their device and inputs natural language information such as, "It was a wonderful trip. The sea was beautiful, and I had a great time with my family." The server generates a visually rich comic from this information and assists the user in sharing it with friends.
[0106] Examples of input prompts for a generative AI model:
[0107] "Based on the text and image information provided by the user, please generate a vivid and emotionally resonant comic. The style should be bright and colorful, conveying the joy of family."
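A prompt like the one above could be assembled from the analysis results. In the Python sketch below, the instruction wording follows [0107], while the function name and the added lines listing recognized objects and the user's episode are assumptions.

```python
from typing import Sequence


def build_generation_prompt(user_text: str, objects: Sequence[str],
                            style: str = "bright and colorful, conveying the joy of family") -> str:
    """Combine the instruction, recognized objects, and the user's episode into one prompt."""
    object_list = ", ".join(objects) if objects else "none detected"
    return (
        "Based on the text and image information provided by the user, "
        "please generate a vivid and emotionally resonant comic. "
        f"The style should be {style}.\n"
        f"Objects recognized in the photos: {object_list}\n"
        f"User's episode: {user_text}"
    )
```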
[0108] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0109] Step 1:
[0110] The user inputs image information and natural language information into the device. The user selects photos taken during trips or events through a dedicated application on the device and describes these images and related episodes in natural language. This information is collected by the application and prepared for transmission to the system. The input data consists of selected image files and text information written by the user.
[0111] Step 2:
[0112] Data is sent from the terminal to the server. The application sends collected image and natural language information to the server via the internet. The input data consists of image files (e.g., JPEG format) and text data. The server receives these and prepares them for the next analysis step.
[0113] Step 3:
[0114] The server performs image analysis. Using image analysis software (e.g., image_lib), the server extracts recognition elements from the received image information. Specifically, it performs object recognition, scene analysis, and color analysis within the image, generating information for linking with text information. The output includes feature quantities for each part of the image and a list of recognized objects.
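As an illustration of per-region feature quantities, the Python sketch below splits an image into a grid and reports each region's mean color. The grid approach and the function name are assumptions chosen for brevity; object recognition and scene analysis would need trained models and are not shown.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def region_features(image: Sequence[Sequence[Pixel]], grid: int = 2) -> List[Tuple[float, ...]]:
    """Split an image (rows of RGB pixels) into grid x grid regions and return each region's mean color.

    Regions are listed row by row from the top left.
    """
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            # Integer bounds so that the regions tile the whole image.
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            region = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(region) or 1  # avoid division by zero for images smaller than the grid
            features.append(tuple(sum(p[c] for p in region) / n for c in range(3)))
    return features
```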
[0115] Step 4:
[0116] The server performs text analysis. Using a natural language processing engine (e.g., nlp_lib), the server extracts narrative elements from the received natural language information. Specifically, it performs semantic analysis, keyword extraction, and contextual understanding of the text. The output is the analysis result, including the story's themes and key points.
[0117] Step 5:
[0118] The server generates graphic data. Using a data generation model (e.g., comic_generation_model), it generates visual graphic data based on the analyzed recognition and narrative elements. Specifically, it performs image style conversion, layout determination, and automatic illustration generation. The output is graphic data in a completed story format (e.g., a comic-style image file).
[0119] Step 6:
[0120] The server sends the generated graphic data back to the user's terminal for the user to review. The input is the generated graphic data, and the output is the transfer of this data to the user's terminal.
[0121] Step 7:
[0122] The user reviews the generated graphic data and requests corrections as needed. The user can view the generated graphic content on their terminal and, if necessary, send correction requests to the server via the application. The input is the user's correction feedback, and the output is that feedback delivered to the server, which reprocesses the graphic data accordingly.
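The correction cycle of Steps 6 and 7 can be sketched as a loop that regenerates until the user approves. In this Python sketch, approval is signalled by returning None as feedback, and max_rounds caps the number of correction requests; both conventions are assumptions.

```python
from typing import Callable, Optional


def review_loop(generate: Callable[[Optional[str]], str],
                get_feedback: Callable[[str], Optional[str]],
                max_rounds: int = 3) -> str:
    """Generate, show the result to the user, and regenerate on each correction request.

    get_feedback returns None when the user approves. At most max_rounds
    correction requests are honoured; the last result is returned either way.
    """
    result = generate(None)  # first draft, no feedback yet
    for _ in range(max_rounds):
        feedback = get_feedback(result)
        if feedback is None:
            break  # user approved this version
        result = generate(feedback)
    return result
```

Capping the rounds keeps a server from regenerating indefinitely; a deployed system might instead let the user decide when to stop.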
[0123] Step 8:
[0124] Users share the final version of the graphic data. Users can share the reviewed graphic data with others through social media or content distribution devices. The input consists of the final approved graphic data and the designated recipients, and the output is the electronic distribution of the data to those recipients.
[0125] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0126] This invention is a system that acquires user image data and natural language data, generates story-based comic data from them, and further combines this with emotion recognition functionality to realize comic expressions that reflect the user's emotions. This system consists of the user's terminal and a central server.
[0127] Users access the system via a terminal, inputting episodes related to memories and uploading associated image data. In addition, the user's natural language data may contain emotional information. At this stage, an emotion engine is activated to recognize the user's emotions. This engine analyzes the tone and emotion of the user's statements from the text data. In some cases, voice data is also used to improve the accuracy of emotion recognition.
[0128] The server analyzes the received text data with a text analysis engine to extract the characters, events, and emotions expressed in the story. Similarly, the image data is analyzed with an image analysis engine to obtain recognition information. These results are correlated, and a generative model is used to generate manga data.
[0129] The generative model uses the story elements and recognition information to determine appropriate character expressions and tones that reflect the user's emotions. The generated comic data is automatically adjusted so that the story as a whole aligns with the user's intentions and emotions.
[0130] The generated comic data is sent to the user's device, where they can review its contents on the interface. Users can also request revisions to the comic's story and expression, and ultimately, the comic is saved or shared by the user. For example, if a user creates a comic about a fun family trip, the emotions of joy and surprise inferred from the audio and text will be reflected in the characters' lively expressions and the tone of the story. This allows users to easily create emotionally rich, professional-quality comics.
[0131] The following describes the processing flow.
[0132] Step 1:
[0133] Users access the system via their device, input episodes related to their memories in text format, and select and upload relevant image data. Audio data can also be recorded and provided as needed.
[0134] Step 2:
[0135] The terminal converts the input text data, image data, and audio data into a predetermined format and prepares them for transmission to the server.
[0136] Step 3:
[0137] The server receives data sent from the terminal and first passes the text data to a natural language processing engine for analysis. This analysis extracts story elements, characters, events, and emotional information.
[0138] Step 4:
[0139] The emotion engine processes text and audio data to analyze the user's emotions (e.g., joy, sadness, surprise) and adds this emotional information to story elements.
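The disclosure does not say how the emotion engine combines text and audio. As a minimal sketch, assuming each modality already yields one score per emotion (for example from the emotion identification model 59), a weighted late fusion could look like this; the weights, labels, and function names are hypothetical.

```python
from typing import Dict

# Hypothetical emotion labels, taken from the examples given in Step 4.
EMOTIONS = ("joy", "sadness", "surprise")


def fuse_emotions(text_scores: Dict[str, float],
                  audio_scores: Dict[str, float],
                  text_weight: float = 0.6) -> Dict[str, float]:
    """Weighted late fusion of per-emotion scores from text and audio.

    The 0.6/0.4 split is an illustrative assumption, not a value from the
    disclosure. The result is renormalized so the scores sum to 1.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    audio_weight = 1.0 - text_weight
    fused = {
        e: text_weight * text_scores.get(e, 0.0) + audio_weight * audio_scores.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0.0:
        # No signal from either modality: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: v / total for e, v in fused.items()}


def tag_story_elements(story: Dict[str, object],
                       emotions: Dict[str, float]) -> Dict[str, object]:
    """Attach the dominant emotion and the full distribution to the story elements."""
    tagged = dict(story)
    tagged["emotion"] = max(emotions, key=emotions.get)
    tagged["emotion_scores"] = emotions
    return tagged
```

Late fusion keeps the text and audio analyzers independent, so audio can simply be omitted (all-zero scores) when the user records none.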
[0140] Step 5:
[0141] The server uses an image analysis engine to scan image data and identify people, backgrounds, and objects within the image. This information is acquired as recognition data.
[0142] Step 6:
[0143] The server integrates story elements extracted from text, emotional information, and recognition information from images, and inputs them into a generative model. This model automatically generates the comic's storyline and visual elements.
[0144] Step 7:
[0145] The generative model meticulously adjusts character expressions and story tone to reflect user emotions, creating completed manga data.
[0146] Step 8:
[0147] The server sends the generated manga data to the user's device. The user can view the received manga on the interface and save or share it in their preferred format. They can also request further adjustments from the system as needed.
[0148] (Example 2)
[0149] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0150] Modern information processing devices lack the means to easily generate content that expresses users' memories and experiences in a unique and emotionally rich way. Furthermore, existing technologies suffer from low accuracy in emotion recognition, leading to frequent discrepancies between the generated content and the user's intentions. Additionally, there is the challenge of efficiently incorporating user feedback on the generated content.
[0151] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0152] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating content data, using a generative model that takes emotional information into account, based on feature information extracted from the image information and story elements extracted from the natural language information. This enables the generation of unique content that reflects the user's emotions and intentions, and flexible content modification based on user feedback.
[0153] "Image information" refers to data perceived through vision, and includes digital images such as photographs, illustrations, and diagrams.
[0154] "Natural language information" refers to information that includes the words and sentences that humans normally use, and includes audio data and text data.
[0155] "Feature information" refers to specific attributes or patterns extracted from image information, including shape, color, and various structural features.
[0156] "Story elements" are factors and components that form the framework of a narrative, obtained by analyzing natural language information, and include characters, events, and settings.
[0157] "Emotional information" refers to information about the emotions or psychological state that a user is trying to express, and is identified through the analysis of natural language information or voice data.
[0158] A "generative model" is an algorithm or system that learns from large amounts of data to generate new data, and is particularly used for the automatic generation of content.
[0159] "Content data" refers to digitized stories and works generated based on user input, and includes visual or textual expressions.
[0160] "Output device" refers to a device or system for displaying generated content data so that a user can view it, and includes computer screens and mobile device displays.
[0161] An "edit request" is an instruction from a user to correct or change generated content data, including revisions to the story, facial expressions, and tone.
[0162] This invention is an information processing system for generating emotionally rich comics based on users' memories and experiences. The system operates primarily through the cooperation of a server and a user terminal.
[0163] Users first access the system using their device and upload text and audio as natural language information, together with image information. The information users input includes stories about memories and experiences, as well as photos taken during those times. The information entered on the device is immediately sent to the server.
[0164] The server uses a natural language processing engine to analyze the input natural language information. It also incorporates emotion recognition capabilities, identifying the user's emotions from the input text and audio. This emotion information is used to adjust character portrayals and narrative tone in the generated content. Furthermore, the server utilizes an image analysis engine to extract character expressions and situations as feature information from image data.
[0165] Next, the server uses a generative AI model to automatically generate professional-quality comic data based on this feature information and these story elements. This generative AI model is trained on a large amount of sample data, enabling it to generate content that matches the user's emotions and intentions. The generated comic data tells a story that reflects the user's intentions and includes lively character portrayals. For example, a user who wants to turn a fun family trip into a comic can enter a prompt such as, "Create a comic that reflects the fun of our family trip." The comic generated from this prompt will frequently feature smiling characters and bright colors.
[0166] The server then sends the generated comic data to the user's device. The user can review the generated content on their device and, if necessary, request revisions to the expression or storyline. This feedback is sent back to the server, and the content data is re-edited. Finally, the user can save or share the edited comic data. This makes it easy to generate and revise emotionally rich, professional-quality comics.
[0167] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0168] Step 1:
[0169] The user accesses the system using a terminal. The user inputs natural language information (text and audio data) and image information. This input is then prepared for transmission to the server. Specifically, the user selects a file on the terminal and presses the send button. The input data is then sent to the server via a communication protocol.
[0170] Step 2:
[0171] The server analyzes the received natural language information. Here, a natural language processing engine runs, analyzing sentence structure and vocabulary from the text data. An emotion recognition module is also used to identify the user's emotions from their mood and tone. The output consists of story elements and emotional information. Specifically, it calculates the emotional value of each word and phrase, and then combines these values to form the basic elements of the story.
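Step 2 calculates an emotional value for each word and phrase and combines those values into basic story elements. One simple way to do this, assumed here purely for illustration, is a valence lexicon with basic negation handling. The lexicon entries, negator list, and averaging rule below are hypothetical; the disclosure's emotion recognition module would more likely be a trained model.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical valence lexicon: positive values are pleasant, negative unpleasant.
VALENCE: Dict[str, float] = {
    "fun": 0.8, "happy": 0.9, "smiled": 0.7, "laughed": 0.8,
    "tired": -0.4, "sad": -0.8, "lost": -0.6,
}
NEGATORS = {"not", "never", "no"}


def word_scores(sentence: str) -> List[Tuple[str, float]]:
    """Emotional value of each lexicon word; the sign flips right after a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    scored: List[Tuple[str, float]] = []
    for i, token in enumerate(tokens):
        if token in VALENCE:
            value = VALENCE[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            scored.append((token, value))
    return scored


def sentence_score(sentence: str) -> float:
    """Combine word values by averaging; 0.0 means no emotional words were found."""
    scored = word_scores(sentence)
    return sum(v for _, v in scored) / len(scored) if scored else 0.0


def story_beats(text: str) -> List[Dict[str, object]]:
    """Split text into sentences and attach a valence to each as a basic story element."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [{"sentence": s, "valence": sentence_score(s)} for s in sentences]
```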
[0172] Step 3:
[0173] The server analyzes the received image information. The image analysis engine runs and extracts key features from the image data. This includes face recognition and background information processing. The output is character expressions and key visual components. Specifically, it quantifies things like the smiles of people in the image and the brightness of the scenery, generating data to be used in the story.
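As a sketch of the quantification in Step 3, assuming decoded 8-bit RGB pixels and smile probabilities supplied by some upstream face detector (both hypothetical inputs), scene brightness can be computed as mean luma with the ITU-R BT.601 weights and the smiles averaged:

```python
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def mean_brightness(pixels: Iterable[Pixel]) -> float:
    """Average luma in [0, 1], using the ITU-R BT.601 weights 0.299/0.587/0.114."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("image contains no pixels")
    return total / (255.0 * count)


def scene_features(pixels: List[Pixel], smile_probs: List[float]) -> Dict[str, float]:
    """Combine brightness with smile probabilities from an assumed face detector."""
    smile = sum(smile_probs) / len(smile_probs) if smile_probs else 0.0
    return {
        "brightness": mean_brightness(pixels),
        "smile": smile,
        "faces": float(len(smile_probs)),
    }
```

These numbers are the kind of quantified feature the story step could consume, for example treating a bright scene with high smile scores as a cheerful panel.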
[0174] Step 4:
[0175] The server automatically generates manga data using a generative AI model. Here, story elements and feature information are taken as input, and content based on the user's emotional information is generated. The output is manga data of professional quality. Specifically, the generative model reflects the emotional information and adjusts the characters' expressions and color tones. For example, it might use bright colors to express joy.
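The disclosure leaves open how emotional information changes color tones. A minimal sketch, assuming a hypothetical per-emotion rule that scales saturation and brightness in HSV space using Python's standard `colorsys` module, might be:

```python
import colorsys
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical tone rules: (saturation multiplier, brightness multiplier).
TONE_RULES: Dict[str, Tuple[float, float]] = {
    "joy": (1.3, 1.2),      # more vivid and brighter
    "sadness": (0.6, 0.8),  # muted and darker
    "neutral": (1.0, 1.0),  # unchanged
}


def adjust_tone(color: RGB, emotion: str) -> RGB:
    """Shift one RGB color toward the tone associated with an emotion."""
    sat_scale, val_scale = TONE_RULES.get(emotion, TONE_RULES["neutral"])
    h, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in color))
    # Scale saturation and value, clamping to the valid [0, 1] range.
    s = min(1.0, s * sat_scale)
    v = min(1.0, v * val_scale)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return (round(r * 255), round(g * 255), round(b * 255))


def adjust_palette(palette: List[RGB], emotion: str) -> List[RGB]:
    """Apply the same emotional tone shift to every color in a panel palette."""
    return [adjust_tone(c, emotion) for c in palette]
```

Working in HSV keeps the hue fixed, so a sky stays blue and only becomes brighter for joy or more muted for sadness.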
[0176] Step 5:
[0177] The server sends the completed manga data to the user's device. The user then reviews the data on their device. The generated manga appears on the display screen as output. Specifically, the user can view the manga through the device's interface and request revisions as needed. If further editing is required, the data is sent back to the server.
[0178] Step 6:
[0179] The user performs a final check and saves or shares the manga data. The device provides an interface to assist these user operations. As output, the final content is saved in a file format or sent to the selected sharing platform. Specifically, the user presses the "Save" button to store the data in a selected folder or uploads it to social media, etc.
[0180] (Application Example 2)
[0181] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0182] Traditionally, reflecting on family memories and everyday events has been limited to formats such as photographs, videos, and text, making it difficult to deeply express the nuances of emotions and events. Families with young children, in particular, need to record and share memories in a way that reflects their children's expressions and emotions. Furthermore, there is a lack of means to strengthen communication among family members and share memories in a more enjoyable way.
[0183] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0184] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for analyzing emotion data to identify emotional elements. This makes it possible to analyze various everyday events together with their emotional elements, and to generate and share family memories as expressive content.
[0185] "Means for acquiring image information" refers to a device or function for taking in image data from an external source and converting it into a format that can be processed within the system.
[0186] "Means for acquiring natural language information" refers to a device or function that collects text data written in human language and incorporates it into the system in a form the system can interpret.
[0187] "Means for extracting recognition data" refers to a device or function that analyzes image information, identifies the characteristics of objects, people, and the like, and acquires them as data.
[0188] "Means for extracting narrative elements" refers to a device or function that analyzes natural language information and identifies elements that constitute a narrative, such as context and character settings.
[0189] "Means for analyzing emotional data and identifying emotional elements" refers to a device or function that analyzes text or audio data, determines the speaker's emotions, and extracts information based on that determination.
[0190] "Means for automatically generating content data using a generation algorithm" refers to a device or function that automatically creates new content using an algorithm based on existing data.
[0191] "Means for displaying on a display device" refers to a device or function that provides generated content data to the user visually.
[0192] "Means for transmitting to an information terminal" refers to a device or function that sends generated data to a terminal via a network so that the terminal can receive it.
[0193] "Means for receiving change requests and editing content data" refers to a device or function that reflects user requests for modification and changes the generated data.
[0194] This system consists of home information terminals and a central server. First, users access the system through their home information terminals and input image data related to their memories and descriptions in natural language. The terminals are equipped with high-performance cameras and microphones, enabling them to acquire image and audio information with high accuracy.
[0195] The device sends the acquired image information to the server, where an image analysis engine operates to extract recognition data from the image. In addition, the device sends the user's speech content to the server as text data, where a natural language processing engine operates to analyze narrative elements and emotional data. For emotional analysis, an advanced AI engine is used to precisely identify the user's emotions. In this case, software such as Amazon Web Services' Rekognition may be used.
[0196] Once recognition data, narrative elements, and emotional elements are extracted, the server invokes a generation algorithm to generate comic data with a specific storyline based on these elements. This generation process utilizes either TENSORFLOW® or PyTorch, both open-source AI frameworks. The generated comic data is automatically adjusted to align with the story's flow and the user's emotions.
[0197] The completed content data is sent back to the user's home information terminal and displayed on the terminal's screen. The user can review the displayed content and, if necessary, send a correction request to the server. If a correction request is received, the server runs the algorithm again and automatically modifies the content data.
[0198] As a concrete example, a user who wants to create a comic strip about their child's sports day would input a prompt into the AI model such as, "Please generate a heroic and moving comic strip based on the moment when the user felt proud of their child's performance at the sports day," and appropriate content would be generated. In this way, family memories are recorded with rich emotion for the whole family to enjoy.
[0199] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0200] Step 1:
[0201] Users access the system through a home information terminal and input image data and text in natural language related to specific memories. This input image data and text are temporarily stored on the terminal. The terminal uses a camera and microphone to acquire high-resolution data.
[0202] Step 2:
[0203] The terminal sends the acquired image data to the server. During this process, the image data is compressed and securely transferred over the network. Once the image data arrives at the server, the server's image analysis engine begins operation.
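A minimal sketch of the compression step, assuming Python's standard `zlib` and `hashlib` modules: the terminal compresses the image and records a SHA-256 digest, and the server rejects data whose digest does not match. The message layout is hypothetical, and "securely transferred" would in practice also require an encrypted channel such as TLS, which this sketch does not show. Already-compressed formats such as JPEG may shrink very little.

```python
import hashlib
import zlib
from typing import Dict, Union

Message = Dict[str, Union[bytes, str]]  # hypothetical upload message layout


def prepare_upload(image_bytes: bytes, level: int = 6) -> Message:
    """Terminal side: compress the image and record a digest of the original bytes."""
    return {
        "body": zlib.compress(image_bytes, level),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def receive_upload(message: Message) -> bytes:
    """Server side: decompress, then reject the payload if the digest does not match."""
    body = message["body"]
    if not isinstance(body, bytes):
        raise TypeError("body must be bytes")
    data = zlib.decompress(body)
    if hashlib.sha256(data).hexdigest() != message["sha256"]:
        raise ValueError("integrity check failed")
    return data
```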
[0204] Step 3:
[0205] The server uses an image analysis engine to analyze image data and extract recognition data. Specifically, it recognizes people and objects within the image and organizes that information as metadata. As a result of the analysis, results with specific tags are output.
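The "metadata with specific tags" in Step 3 could be organized as follows. This sketch assumes a hypothetical detector that returns a label, a confidence, and a bounding box per object; the 0.5 confidence cutoff is also an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Detection:
    label: str                          # e.g. "person", "ball" (hypothetical detector output)
    confidence: float                   # 0.0 to 1.0
    box: Tuple[int, int, int, int]      # (x, y, width, height) in pixels


def to_metadata(image_id: str, detections: List[Detection],
                min_conf: float = 0.5) -> Dict[str, Any]:
    """Keep confident detections and summarize them as tags plus per-object records."""
    kept = [d for d in detections if d.confidence >= min_conf]
    return {
        "image_id": image_id,
        "tags": sorted({d.label for d in kept}),  # deduplicated, stable order
        "people": sum(1 for d in kept if d.label == "person"),
        "objects": [asdict(d) for d in kept],
    }
```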
[0206] Step 4:
[0207] The device sends the natural language text entered by the user to the server. The server's natural language processing engine receives and analyzes the text data.
[0208] Step 5:
[0209] The server uses a natural language processing engine to extract narrative and emotional elements from the input text. This process analyzes emotion-related words and context within the text to understand the user's emotions. The extracted data is output as narrative elements and emotional elements.
[0210] Step 6:
[0211] Using the recognition data obtained from image analysis and the narrative and emotional elements obtained from natural language processing, the server invokes a generation algorithm to generate manga data with a specific storyline. A generative AI model carries out this operation and outputs carefully designed text and prompts.
[0212] Step 7:
[0213] The generated manga data is sent from the server to the user's home information terminal. It is displayed on the terminal's screen and becomes available for the user to view.
[0214] Step 8:
[0215] The user reviews the generated content and, if necessary, sends a correction request from their device to the server. Based on this request, the server regenerates the content data to reflect the changes. The corrected content is then sent back to the device and displayed.
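One way to represent the correction loop, assuming generation parameters are kept as a dictionary and a correction request names only the aspects listed in the definition of an edit request (story, facial expression, tone), is sketched below. The field names and revision counter are hypothetical, and the regeneration call itself is not shown.

```python
from typing import Any, Dict

# Editable aspects, following the definition of an "edit request": story, expression, tone.
EDITABLE_FIELDS = {"storyline", "expression", "tone"}


def apply_edit_request(params: Dict[str, Any], request: Dict[str, Any]) -> Dict[str, Any]:
    """Return new generation parameters with the user's changes merged in."""
    unknown = set(request) - EDITABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported edit fields: {sorted(unknown)}")
    updated = dict(params)  # copy, so the previous version stays unchanged
    updated.update(request)
    updated["revision"] = int(params.get("revision", 0)) + 1
    return updated
```

Returning a new dictionary rather than mutating the old one keeps earlier versions available if the user wants to compare or roll back a revision.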
[0216] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0217] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0218] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0219] [Second Embodiment]
[0220] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0221] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0222] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0223] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0224] The microphone 238 receives voice signals from the user 20, including instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0225] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0226] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0227] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0228] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0229] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0230] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0231] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".
[0232] This invention is a system that uses image data and natural language data provided by the user as input to generate story-based comic data from this data. This system operates between the user's terminal and a central server.
[0233] Users first access the system via a device such as a smartphone or computer. The user's device displays an interface for inputting memorable episodes in natural language. Users can also upload related image data. At this stage, users provide specific information that constitutes their memories.
[0234] The terminal sends the input text and image data to the server. The server receives this data and uses a text analysis engine to extract story elements from the natural language data. Meanwhile, the image analysis engine scans the image data to obtain recognition information. This provides the basic elements for generating a manga from the input data.
[0235] The server integrates the extracted story elements and recognition information, and uses a generative model to create comic data. This generative model utilizes machine learning techniques to automatically determine drawing styles, panel layouts, character placement, and more.
[0236] The generated manga data is sent to the user's device. The user can view the completed manga on their device and request revisions as needed. Finally, the user can save the manga data to their device and share it with friends and family.
[0237] For example, if a user wants to save memories of a family trip as a comic, they select photos taken during the trip and enter the story of the trip in text. The system then automatically generates a comic that reflects the atmosphere and episodes of the trip and delivers it to the user. This allows users to easily create professional-quality comics without having to worry about the production process.
[0238] The following describes the processing flow.
[0239] Step 1:
[0240] Users use a terminal to access the system interface and enter episodes related to their memories into a text field. They also select relevant photo data and upload it to the system.
[0241] Step 2:
[0242] The terminal converts the input text and image data into a predetermined format (e.g., JSON format) and prepares it for transmission to the server over the network.
[0243] Step 3:
[0244] The server receives the data sent from the terminal, parses the data format, and separates it into text data and image data. During this process, it verifies that the received data has been parsed correctly.
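Steps 2 and 3 mention JSON as one possible predetermined format. A minimal sketch of that round trip, assuming images are embedded as base64 strings (a common choice, but an assumption here) and using hypothetical field names `text`, `images`, `name`, and `data`:

```python
import base64
import json
from typing import Dict, Tuple


def pack_request(text: str, images: Dict[str, bytes]) -> str:
    """Terminal side: bundle the episode text and image files into one JSON string."""
    payload = {
        "text": text,
        "images": [
            {"name": name, "data": base64.b64encode(data).decode("ascii")}
            for name, data in images.items()
        ],
    }
    return json.dumps(payload)


def unpack_request(raw: str) -> Tuple[str, Dict[str, bytes]]:
    """Server side: parse the JSON, validate it, and separate text from image data."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError("missing or invalid 'text' field")
    items = payload.get("images", [])
    if not isinstance(items, list):
        raise ValueError("'images' must be a list")
    images: Dict[str, bytes] = {}
    for item in items:
        # validate=True rejects characters outside the base64 alphabet.
        images[item["name"]] = base64.b64decode(item["data"], validate=True)
    return payload["text"], images
```

Base64 inflates image size by roughly a third; a multipart upload would avoid that, at the cost of a less uniform format.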
[0245] Step 4:
[0246] The server's text analysis engine uses natural language processing techniques to analyze text data and extract narrative elements such as characters, events, and emotional expressions.
[0247] Step 5:
[0248] The server's image analysis engine processes image data and uses deep learning techniques to extract recognition information about people, backgrounds, and objects within the image. In this step, the image features are represented as numerical vectors.
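Step 5 represents image features as numerical vectors produced by deep learning. As a deliberately simpler stand-in (a swapped-in technique, not the disclosure's method), a quantized color histogram also yields a fixed-length vector, and cosine similarity can compare two such vectors. A learned embedding from a trained network would replace `color_histogram` in a real system.

```python
import math
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize RGB into bins_per_channel**3 cells and return normalized counts."""
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be between 1 and 256")
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Integer arithmetic maps 0..255 onto 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count == 0:
        raise ValueError("image contains no pixels")
    return [h / count for h in hist]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```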
[0249] Step 6:
[0250] The server integrates story elements from the text analysis engine and recognition information from the image analysis engine. Based on this integrated information, the generative model automatically generates manga data.
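How the integrated information reaches the generative model is not specified. Since the data generation model 58 accepts prompts containing instructions, one plausible sketch merges story elements and per-image recognition tags into a single text prompt. The dictionary keys and prompt wording are hypothetical.

```python
from typing import Any, Dict, List


def build_generation_prompt(story: Dict[str, Any],
                            recognition: List[Dict[str, Any]],
                            panels: int = 4) -> str:
    """Merge story elements and recognition information into one text prompt."""
    if panels < 1:
        raise ValueError("panels must be at least 1")
    lines = [
        f"Create a {panels}-panel comic.",
        "Characters: " + ", ".join(story.get("characters", [])),
        "Events: " + "; ".join(story.get("events", [])),
        "Overall mood: " + str(story.get("emotion", "neutral")),
    ]
    # One line per uploaded image, so panels can be grounded in the real scenes.
    for item in recognition:
        lines.append(f"Reference image {item['image_id']}: " + ", ".join(item.get("tags", [])))
    return "\n".join(lines)
```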
[0251] Step 7:
[0252] The generated comic data is laid out into multiple comic panels, visually representing the flow of the story. The server converts this into a digital data format (e.g., PDF or JPEG) and prepares it for transmission to the user's device.
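Laying the story out into multiple panels can be sketched as paging story beats onto a fixed grid. The four-panel pages and two-column grid below are assumptions for illustration; the disclosure says the generative model decides the layout.

```python
from typing import Dict, List


def layout_pages(beats: List[str], panels_per_page: int = 4,
                 columns: int = 2) -> List[List[Dict[str, object]]]:
    """Split story beats into pages and assign each panel a grid position."""
    if panels_per_page < 1 or columns < 1:
        raise ValueError("panels_per_page and columns must be positive")
    pages: List[List[Dict[str, object]]] = []
    for start in range(0, len(beats), panels_per_page):
        chunk = beats[start:start + panels_per_page]
        pages.append([
            # Panels are numbered from 1 on each page and filled row by row.
            {"panel": i + 1, "row": i // columns, "col": i % columns, "caption": caption}
            for i, caption in enumerate(chunk)
        ])
    return pages
```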
[0253] Step 8:
[0254] The user's device receives the manga data sent from the server and displays it on the interface. The user can review the manga, send correction requests back to the server as needed, and save the final version.
[0255] (Example 1)
[0256] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0257] Traditional manga generation systems require users to manually edit image data and story elements in detail, resulting in a cumbersome and time-consuming process. Furthermore, there are challenges in guaranteeing the quality of the automatically generated manga and whether it aligns with the user's intentions.
[0258] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0259] In this invention, the server includes means for acquiring digital visual information using an information recording device, means for acquiring descriptive data using an information recording device, and means for analyzing the digital visual information and extracting recognition information. This enables the automatic generation of high-quality, consistent comics simply by the user intuitively providing data.
[0260] An "information recording device" is an electronic device used to acquire digital visual information and descriptive data, and it is a device that plays a role in recording user-input data.
[0261] "Digital visual information" refers to data that stores visual representations such as images and videos in digital format, and is information that has been converted into a format that can be processed by a computer.
[0262] "Descriptive data" refers to text information written in natural language, and is digital information that includes the user's intent and story elements.
[0263] "Recognition information" refers to data extracted as a result of analyzing digital visual information and identifying specific objects or scenes.
[0264] "Abstract concepts" refer to story elements and main themes extracted from descriptive data through natural language processing, and are fundamental information for manga generation.
[0265] "Image information" refers to a series of manga data automatically generated using a generative AI model based on the aforementioned recognition information and abstract concepts, and is information that is visually represented.
[0266] This system generates image data that visually represents the user's experiences and intentions by linking the user's digital device with a central data processing unit. The system includes an information recording device as a digital device and a central server with powerful data processing capabilities.
[0267] Users access this system using information recording devices such as smartphones and personal computers. Users can input text data related to specific memories, such as trips or events, through the interface. They can also provide digital visual information by uploading photographs they have taken or existing visual data.
[0268] The central server analyzes the received descriptive data and digital visual information. First, the server processes the descriptive data using natural language processing software to extract key themes and story elements as "abstract concepts." Simultaneously, it uses image processing software to obtain "recognition information" from the digital visual information.
[0269] In the next stage of this process, the generative AI model is launched on the server. The generative AI model uses deep learning techniques to integrate the extracted abstract concepts and recognition information, automatically generating image data that reflects the user's intent. The generated manga data is then promptly delivered to the user's device.
[0270] As an example, consider a user who wants to save memories of a family trip as a comic strip. The user selects photos of the beach taken during the trip and inputs the story of the trip along with a prompt such as "Please turn the memories of our fun family trip into a comic strip." The system can then provide a professional-quality comic strip based on the memories.
[0271] Thus, this system greatly improves the user experience by providing users with an easy way to obtain high-quality visual representations.
[0272] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0273] Step 1:
[0274] Users access the system using devices such as smartphones and personal computers. Through the user interface, they input prompts and memorable episodes in text format, and also select and upload related image data. The input here consists of text data and image files, which are temporarily stored on the device.
[0275] Step 2:
[0276] The terminal sends the input text and image data to the server. The input here consists of text and image data provided by the user, and the output is the data passed to the server. Encryption protocols are used throughout this process for security.
[0277] Step 3:
[0278] The server processes the received data. First, it analyzes the text data using a natural language processing engine to extract story elements. Specifically, it extracts keywords, analyzes the sentence structure, and extracts the main elements of the story. This becomes the output as an "abstract concept" based on the input text.
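Keyword extraction, the first part of Step 3, can be sketched with simple term frequency after removing stopwords. The stopword list and `top_k` default are assumptions, and a production engine would likely add lemmatization or TF-IDF weighting.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "our", "the", "to", "was", "we", "with"}


def extract_keywords(text: str, top_k: int = 3) -> List[str]:
    """Return the top_k most frequent content words, ties broken alphabetically."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS and len(t) > 2]
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```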
[0279] Step 4:
[0280] In parallel, the server scans the image data using an image analysis engine. It identifies objects and landscapes within the image and generates recognition information. This identification is performed by an image recognition algorithm based on computer vision technology, and the extracted recognition information is output as the result derived from the image data.
[0281] Step 5:
[0282] The server uses a generative AI model to integrate the previously obtained abstract concept and recognition information and automatically generates comic data. In this process, a deep learning model pre-trained on a dataset automatically adjusts the drawing style, frame division, and character placement. The output is the completed comic data.
[0283] Step 6:
[0284] Finally, the server sends the generated comic data to the user's terminal. In this step, the comic data is converted into an appropriate format so that the user can view the results and transferred to the terminal via a communication protocol. The user can view the comic on the terminal and request corrections if necessary.
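Step 5 mentions automatic frame division. As one hedged illustration, the sketch below groups scene descriptions into pages and computes panel rectangles for a vertical four-panel (yonkoma-style) layout. The fixed four-panel grid, the gutter width, and the function names are assumptions made for the example; the disclosure leaves the layout to the deep learning model.

```python
def divide_into_pages(scenes: list[str], panels_per_page: int = 4) -> list[list[str]]:
    """Group consecutive scenes into pages of at most panels_per_page panels."""
    if panels_per_page < 1:
        raise ValueError("panels_per_page must be at least 1")
    return [scenes[i:i + panels_per_page] for i in range(0, len(scenes), panels_per_page)]


def panel_boxes(n_panels: int, page_width: float, page_height: float,
                gutter: float) -> list[tuple[float, float, float, float]]:
    """Stack n_panels equal-height panels vertically as (x, y, width, height) boxes.

    A gutter of blank space separates the panels and borders the page edges.
    """
    if n_panels < 1:
        raise ValueError("n_panels must be at least 1")
    panel_h = (page_height - gutter * (n_panels + 1)) / n_panels
    if panel_h <= 0:
        raise ValueError("page too small for the requested panels and gutter")
    return [
        (gutter, gutter + i * (panel_h + gutter), page_width - 2 * gutter, panel_h)
        for i in range(n_panels)
    ]
```

With a 100 by 450 page and a gutter of 10, each of the four panels is 80 wide and 100 tall.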
[0285] (Application Example 1)
[0286] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0287] In recent years, advances in information technology have made it commonplace to record personal information digitally. However, there is still a lack of convenient means to visualize everyday episodes and events as stories and save them in a form that can be shared with many people. Furthermore, users without specialized skills still need considerable time and effort to create and share high-quality graphic content. Solving this problem is essential.
[0288] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0289] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating graphic data based on recognition elements and narrative elements using a data generation model. This makes it possible for users to easily generate everyday episodes as visual stories without requiring special skills and easily share them with others through an information distribution device.
[0290] "Image information" refers to visual data recorded in digital format, which forms the basis for system analysis.
[0291] "Natural language information" refers to data written in the language that humans use orally or in writing, and is used to extract story elements.
[0292] "Recognition elements" refer to data about specific features or objects that are extracted from image information through analysis.
[0293] "Narrative elements" refer to data about the constituent elements and themes of a story, extracted through analysis of natural language information.
[0294] A "data generation model" is an algorithm that uses machine learning techniques to generate graphic data based on the results of image and natural language analysis.
[0295] "Graphic data" refers to a visual representation format generated based on information entered by the user, and is intended to visually communicate its content.
[0296] An "information distribution device" is an electronic device used to share or distribute generated graphic data to other users.
[0297] This invention is a system for generating digital graphic content based on specific information and sharing it with others. A specific embodiment of this system will be described here.
[0298] First, the user inputs image information and natural language information into a device. This device is expected to be a smartphone or computer, and the user can easily record the collected image data and related episodes in natural language. This information is then sent to the server via an application installed on the device.
[0299] To handle image information, the server uses image analysis software (e.g., image_lib) to extract the necessary recognition elements from the images. At the same time, it uses a natural language processing engine (e.g., nlp_lib) to process natural language information and extract the narrative elements that form the basis of the story.
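The text names image_lib only as an example and does not describe its interface, so the sketch below does not call it. Instead, it assumes that some object detector returns (label, confidence) pairs and shows one plausible way to condense them into recognition elements: keep detections above a confidence threshold and count them per label. The `Detection` type, the threshold value, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One result from a hypothetical object detector."""
    label: str         # e.g. "person" or "sea"
    confidence: float  # detector score in [0, 1]


def build_recognition_elements(detections: list[Detection],
                               threshold: float = 0.5) -> dict[str, int]:
    """Count detections per label, ignoring those below the confidence threshold."""
    elements: dict[str, int] = {}
    for det in detections:
        if det.confidence >= threshold:
            elements[det.label] = elements.get(det.label, 0) + 1
    return elements
```

Counting per label lets the later generation step know, for instance, that two people appear in a seaside scene.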
[0300] Next, a data generation model (e.g., comic_generation_model) integrates the recognition elements and narrative elements to generate graphic data. Based on these elements, the generative AI model automatically creates visually appealing content in comic and story formats.
[0301] The generated graphic data is sent to the user's device. Users can then share the generated content with others via social media, email, or content distribution devices. This makes it possible to create and widely distribute professional-quality visual stories even without advanced design skills.
[0302] As a specific example, consider a user who wants to generate graphic data based on memories of a holiday trip. The user imports the photos taken during the trip into the terminal and inputs natural language information such as "It was a wonderful trip. The sea was beautiful and I had a great time with my family." The server generates a visually rich comic from this information and helps the user share it with friends.
[0303] Example of an input prompt for the generative AI model:
[0304] "Please generate a comic rich in scenarios and evoking emotions based on the text and image information provided by the user. The style should be bright and colorful, and convey the joy of the family."
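One way to feed the analysis results into a prompt like the one above is to append them to the fixed instruction text. The sketch assumes the narrative and recognition elements are already available as lists of strings; the function name, the field labels, and this templating approach are illustrative assumptions rather than part of the disclosure.

```python
def build_prompt(narrative_elements: list[str], recognition_elements: list[str],
                 style: str = "bright and colorful") -> str:
    """Combine a fixed instruction with the extracted elements into one prompt string."""
    instruction = ("Please generate a comic rich in scenarios and evoking emotions based on "
                   "the text and image information provided by the user.")
    lines = [
        instruction,
        # Narrative elements come from text analysis (e.g. themes, events).
        "Story elements: " + ", ".join(narrative_elements) + ".",
        # Recognition elements come from image analysis (e.g. detected objects).
        "Objects in the photos: " + ", ".join(recognition_elements) + ".",
        f"The style should be {style}, and convey the joy of the family.",
    ]
    return "\n".join(lines)
```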
[0305] The flow of the specific process in Application Example 1 will be described using FIG. 12.
[0306] Step 1:
[0307] The user inputs image information and natural language information into the terminal. The user selects the photos taken during the trip or event through a dedicated application on the terminal and describes the episodes related to these images in natural language. This information is collected by the application and prepared for transmission to the system. The data to be input is the selected image file and the text information described by the user.
[0308] Step 2:
[0309] Data is transmitted from the terminal to the server. The application transmits the collected image information and natural language information to the server via the Internet. The data to be input is an image file (e.g., in JPEG format) and text data. The server receives these and prepares them for the next analysis step.
[0310] Step 3:
[0311] The server performs image analysis. Using image analysis software (e.g., image_lib), the server extracts recognition elements from the received image information. Specifically, it performs object recognition, scene analysis, and color analysis within the image, generating information for linking with text information. The output includes feature quantities for each part of the image and a list of recognized objects.
[0312] Step 4:
[0313] The server performs text analysis. Using a natural language processing engine (e.g., nlp_lib), the server extracts narrative elements from the received natural language information. Specifically, it performs semantic analysis, keyword extraction, and contextual understanding of the text. The output is the analysis result, including the story's themes and key points.
[0314] Step 5:
[0315] The server generates graphic data. Using a data generation model (e.g., comic_generation_model), it generates visual graphic data based on the analyzed recognition and narrative elements. Specifically, it performs image style conversion, layout determination, and automatic illustration generation. The output is graphic data in a completed story format (e.g., a comic-style image file).
[0316] Step 6:
[0317] The server sends the generated graphic data back to the user's terminal for the user to review. The input is the generated graphic data, and the output is the transfer of this data to the user's terminal.
[0318] Step 7:
[0319] The user reviews the generated graphic data and requests corrections as needed. The user can view the generated graphic content on their terminal and, if necessary, send correction requests to the server via the application. The input is the user's correction feedback, and the output is that feedback delivered to the server for reprocessing.
[0320] Step 8:
[0321] The user shares the final version of the graphic data. The user can share the reviewed graphic data with others through social media or content distribution devices. The input consists of the final approved graphic data and the recipients designated for sharing, and the output is the electronic distribution of the data to those recipients.
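Step 3 of this flow includes color analysis of the image. A minimal sketch of one such analysis, assuming the image has already been decoded into a list of (R, G, B) tuples with 8-bit channels, is to quantize each channel into a few bins and report the most populated bin. The bin count and function name are hypothetical, and image_lib may work quite differently.

```python
from collections import Counter


def dominant_color(pixels: list[tuple[int, int, int]], levels: int = 4) -> tuple[int, int, int]:
    """Return the center of the most populated RGB bin.

    Each 8-bit channel is split into `levels` equal bins, so similar colors
    fall into the same bin before counting.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    step = 256 // levels
    bins = Counter(
        # min() keeps the top bin in range when levels does not divide 256 evenly
        tuple(min(channel // step, levels - 1) for channel in px)
        for px in pixels
    )
    (r_bin, g_bin, b_bin), _ = bins.most_common(1)[0]
    return (r_bin * step + step // 2, g_bin * step + step // 2, b_bin * step + step // 2)
```

In a photo dominated by bluish sea pixels, the result is the center of a blue bin, which a later step could use to choose a palette.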
[0322] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0323] This invention is a system that acquires user image data and natural language data, generates story-based comic data from them, and further combines this with emotion recognition functionality to realize comic expressions that reflect the user's emotions. This system consists of the user's terminal and a central server.
[0324] Users access the system via a terminal, inputting episodes related to memories and uploading associated image data. In addition, the user's natural language data may contain emotional information. At this stage, an emotion engine is activated to recognize the user's emotions. This engine analyzes the tone and emotion of the user's statements from the text data. In some cases, voice data is also used to improve the accuracy of emotion recognition.
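Voice data may be used alongside text to improve emotion recognition. A simple way to combine the two, assuming each analyzer already yields a score per emotion, is a weighted average (late fusion). The emotion labels follow the examples given in this description (joy, sadness, surprise), but the weight value and function name are assumptions; the disclosure does not specify how the emotion engine combines its inputs.

```python
from typing import Optional

EMOTIONS = ("joy", "sadness", "surprise")


def fuse_emotions(text_scores: dict[str, float],
                  audio_scores: Optional[dict[str, float]] = None,
                  text_weight: float = 0.6) -> tuple[str, dict[str, float]]:
    """Weighted late fusion of per-emotion scores from text and (optionally) audio.

    Returns the dominant emotion and the fused score for every emotion.
    """
    if audio_scores is None:
        # Without audio, fall back to the text scores alone.
        fused = {e: text_scores.get(e, 0.0) for e in EMOTIONS}
    else:
        fused = {
            e: text_weight * text_scores.get(e, 0.0)
            + (1.0 - text_weight) * audio_scores.get(e, 0.0)
            for e in EMOTIONS
        }
    dominant = max(fused, key=fused.get)
    return dominant, fused
```

If the text mildly suggests joy but the voice sounds strongly surprised, the fused result can tip toward surprise.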
[0325] The server analyzes the received text data with a text analysis engine to extract the characters, events, and emotions expressed in the story. Similarly, it analyzes the image data with an image analysis engine to obtain recognition information. These results are correlated, and a generative model is used to generate manga data.
[0326] The generative model uses the story elements and recognition information to determine character expressions and tones that reflect the user's emotions. The generated comic data is adjusted automatically so that the story as a whole aligns with the user's intentions and emotions.
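Determining character expressions and tones that reflect the user's emotions could, in the simplest case, start from a lookup table that the generative model is then conditioned on. The table entries, the saturation factor, and the intensity blending below are hypothetical choices made for illustration; only the idea of bright colors for joy comes from this description.

```python
# Hypothetical mapping from a dominant emotion to drawing parameters.
STYLE_BY_EMOTION = {
    "joy": {"palette": "bright", "saturation": 1.2, "expression": "smile"},
    "sadness": {"palette": "muted", "saturation": 0.7, "expression": "downcast"},
    "surprise": {"palette": "vivid", "saturation": 1.1, "expression": "wide-eyed"},
}
DEFAULT_STYLE = {"palette": "neutral", "saturation": 1.0, "expression": "neutral"}


def style_for(emotion: str, intensity: float) -> dict:
    """Look up drawing parameters and scale saturation by intensity in [0, 1].

    Intensity 0 gives neutral saturation (1.0); intensity 1 gives the full
    emotion-specific value. Unknown emotions fall back to the neutral style.
    """
    base = STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)
    intensity = max(0.0, min(1.0, intensity))
    saturation = 1.0 + (base["saturation"] - 1.0) * intensity
    return {**base, "saturation": saturation}
```

Scaling by intensity keeps mild emotions from producing exaggerated colors.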
[0327] The generated comic data is sent to the user's device, where they can review its contents on the interface. Users can also request revisions to the comic's story and expression, and ultimately, the comic is saved or shared by the user. For example, if a user creates a comic about a fun family trip, the emotions of joy and surprise inferred from the audio and text will be reflected in the characters' lively expressions and the tone of the story. This allows users to easily create emotionally rich, professional-quality comics.
[0328] The following describes the processing flow.
[0329] Step 1:
[0330] Users access the system via their device, input episodes related to their memories in text format, and select and upload relevant image data. Audio data can also be recorded and provided as needed.
[0331] Step 2:
[0332] The terminal converts the input text data, image data, and audio data into a predetermined format and prepares them for transmission to the server.
[0333] Step 3:
[0334] The server receives data sent from the terminal and first passes the text data to a natural language processing engine for analysis. This analysis extracts story elements, characters, events, and emotional information.
[0335] Step 4:
[0336] The emotion engine processes text and audio data to analyze the user's emotions (e.g., joy, sadness, surprise) and adds this emotional information to story elements.
[0337] Step 5:
[0338] The server uses an image analysis engine to scan image data and identify people, backgrounds, and objects within the image. This information is acquired as recognition data.
[0339] Step 6:
[0340] The server integrates story elements extracted from text, emotional information, and recognition information from images, and inputs them into a generative model. This model automatically generates the comic's storyline and visual elements.
[0341] Step 7:
[0342] The generative model meticulously adjusts character expressions and story tone to reflect user emotions, creating completed manga data.
[0343] Step 8:
[0344] The server sends the generated manga data to the user's device. The user can view the received manga on the interface and save or share it in their preferred format. They can also request further adjustments from the system as needed.
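Step 2 of this flow leaves the "predetermined format" unspecified. As an assumption for illustration, the sketch below packs the text together with base64-encoded image bytes and optional audio bytes into a single JSON message, and shows the matching unpacking on the server side. The field names and function names are hypothetical.

```python
import base64
import json
from typing import Optional


def package_upload(text: str, image: bytes, audio: Optional[bytes] = None) -> str:
    """Serialize the user's inputs into one JSON string (binary data as base64)."""
    payload = {"text": text, "image": base64.b64encode(image).decode("ascii")}
    if audio is not None:
        payload["audio"] = base64.b64encode(audio).decode("ascii")
    return json.dumps(payload)


def unpack_upload(message: str) -> dict:
    """Server-side inverse of package_upload."""
    payload = json.loads(message)
    result = {"text": payload["text"], "image": base64.b64decode(payload["image"])}
    if "audio" in payload:
        result["audio"] = base64.b64decode(payload["audio"])
    return result
```

Base64 inflates binary data by about a third, so a real system might prefer a multipart upload; JSON is used here only to keep the sketch self-contained.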
[0345] (Example 2)
[0346] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0347] Modern information processing devices lack the means to easily generate content that expresses users' memories and experiences in a unique and emotionally rich way. Furthermore, existing technologies suffer from low accuracy in emotion recognition, leading to frequent discrepancies between the generated content and the user's intentions. Additionally, there is the challenge of efficiently incorporating user feedback on the generated content.
[0348] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0349] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating content data based on the feature information and story elements using a generative model that takes the emotional information into consideration. This enables the generation of unique content that reflects the user's emotions and intentions, and flexible content modification based on user feedback.
[0350] "Image information" refers to data perceived through vision, and includes digital images such as photographs, illustrations, and diagrams.
[0351] "Natural language information" refers to information that includes the words and sentences that humans normally use, and includes audio data and text data.
[0352] "Feature information" refers to specific attributes or patterns extracted from image information, including shape, color, and various structural features.
[0353] "Story elements" are factors and components that form the framework of a narrative, obtained by analyzing natural language information, and include characters, events, and settings.
[0354] "Emotional information" refers to information about the emotions or psychological state that a user is trying to express, and is identified through the analysis of natural language information or voice data.
[0355] A "generative model" is an algorithm or system that learns from large amounts of data to generate new data, and is particularly used for the automatic generation of content.
[0356] "Content data" refers to digitized stories and works generated based on user input, and includes visual or textual expressions.
[0357] "Output device" refers to a device or system for displaying generated content data so that a user can view it, and includes computer screens and mobile device displays.
[0358] An "edit request" is an instruction from a user to correct or change generated content data, including revisions to the story, facial expressions, and tone.
[0359] This invention is an information processing system for generating emotionally rich comics based on users' memories and experiences. The system operates primarily through the cooperation of a server and a user terminal.
[0360] Users first access the system using their device and upload natural language information (text and audio data) together with image information. The information users input includes stories about memories and experiences, as well as photos taken at the time. The information entered on the device is immediately sent to the server.
[0361] The server uses a natural language processing engine to analyze the input natural language information. It also incorporates emotion recognition capabilities, identifying the user's emotions from the input text and audio. This emotion information is used to adjust character portrayals and narrative tone in the generated content. Furthermore, the server utilizes an image analysis engine to extract character expressions and situations as feature information from image data.
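Extracting character expressions and situations as feature information could include quantities such as the share of faces that are smiling and how bright the scene is. A hedged sketch of those two features follows. It assumes the image is available as (R, G, B) tuples and that a face detector (not shown, hypothetical) has already flagged each face as smiling or not. The brightness uses the ITU-R BT.601 luma weights, a common convention rather than something the disclosure specifies.

```python
def scene_brightness(pixels: list[tuple[int, int, int]]) -> float:
    """Mean BT.601 luma of 8-bit RGB pixels, normalized to the range [0, 1]."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return total / (255 * len(pixels))


def smile_ratio(faces: list[dict]) -> float:
    """Fraction of detected faces flagged as smiling; 0.0 when no faces were found."""
    if not faces:
        return 0.0
    return sum(1 for face in faces if face["smiling"]) / len(faces)
```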
[0362] Next, the server uses a generative AI model to automatically generate professional-quality comic data based on this feature information and the story elements. This generative AI model is trained on a large amount of sample data, enabling it to generate content that matches the user's emotions and intentions. The generated comic data tells a story that reflects the user's intentions and includes lively character portrayals. For example, a user who wants to turn a fun family trip into a comic can use a prompt such as, "Create a comic that reflects the fun of our family trip." The comic generated from this prompt will frequently feature smiling characters and bright colors.
[0363] Finally, the server sends the generated comic data to the user's device. The user can review the generated content on their device and, if necessary, request revisions to the expression or storyline. This feedback is sent back to the server, and the content data is re-edited. Finally, the user can save or share the edited comic data. This makes it easy to generate and revise emotionally rich, professional-quality comics.
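The review-and-revise cycle can be sketched as a loop that merges each edit request into the generation parameters and regenerates until the user approves. The `generate` callable stands in for the generative AI model. The parameter names, the convention that `None` means approval, and the round limit are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def apply_feedback(params: dict, feedback: dict) -> dict:
    """Return a copy of params with the user's requested changes applied."""
    updated = {**params, **feedback}
    updated["revision"] = params.get("revision", 0) + 1
    return updated


def revise_until_approved(params: dict,
                          feedback_stream: Iterable[Optional[dict]],
                          generate: Callable[[dict], str],
                          max_rounds: int = 3) -> tuple[dict, str]:
    """Regenerate after each edit request; stop on approval (None) or after max_rounds."""
    result = generate(params)
    for rounds, feedback in enumerate(feedback_stream):
        if feedback is None or rounds >= max_rounds:
            break
        params = apply_feedback(params, feedback)
        result = generate(params)
    return params, result
```

Capping the number of rounds bounds the server-side regeneration cost per comic.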
[0364] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0365] Step 1:
[0366] The user accesses the system using a terminal. The user inputs natural language information (text and audio data) and image information. This input is then prepared for transmission to the server. Specifically, the user selects a file on the terminal and presses the send button. The input data is then sent to the server via a communication protocol.
[0367] Step 2:
[0368] The server analyzes the received natural language information. Here, a natural language processing engine runs, analyzing sentence structure and vocabulary from the text data. An emotion recognition module is also used to identify the user's emotions from their mood and tone. The output consists of story elements and emotional information. Specifically, it calculates the emotional value of each word and phrase, and then combines these values to form the basic elements of the story.
[0369] Step 3:
[0370] The server analyzes the received image information. The image analysis engine runs and extracts key features from the image data. This includes face recognition and background information processing. The output is character expressions and key visual components. Specifically, it quantifies things like the smiles of people in the image and the brightness of the scenery, generating data to be used in the story.
[0371] Step 4:
[0372] The server automatically generates manga data using a generative AI model. Here, story elements and feature information are taken as input, and content based on the user's emotional information is generated. The output is manga data of professional quality. Specifically, the generative model reflects the emotional information and adjusts the characters' expressions and color tones. For example, it might use bright colors to express joy.
[0373] Step 5:
[0374] The server sends the completed manga data to the user's device. The user then reviews the data on their device. The generated manga appears on the display screen as output. Specifically, the user can view the manga through the device's interface and request revisions as needed. If further editing is required, the data is sent back to the server.
[0375] Step 6:
[0376] The user performs a final check and saves or shares the manga data. The device provides an interface to assist these user operations. As output, the final content is saved in a file format or sent to the selected sharing platform. Specifically, the user presses the "Save" button to store the data in a selected folder or uploads it to social media, etc.
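Step 2 of Example 2 states that the emotional value of each word and phrase is calculated and the values are then combined. A minimal lexicon-based version of that idea is sketched below, assuming a small hand-made valence dictionary and a one-word negation rule. Both the lexicon values and the averaging rule are illustrative assumptions, and the emotion recognition module described above would also use mood and tone, which this sketch ignores.

```python
import re

# Hypothetical valence lexicon: positive values mean positive emotion.
LEXICON = {"fun": 0.8, "happy": 0.9, "wonderful": 0.9,
           "tired": -0.4, "sad": -0.8, "lost": -0.5}
NEGATORS = {"not", "never", "no"}


def emotional_value(text: str) -> float:
    """Average the valence of lexicon words, flipping a word preceded by a negator."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score  # "not happy" counts as negative
            total += score
            hits += 1
    return total / hits if hits else 0.0
```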
[0377] (Application Example 2)
[0378] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0379] Traditionally, reflecting on family memories and everyday events has been limited to formats such as photographs, videos, and text, making it difficult to deeply express the nuances of emotions and events. Families with young children, in particular, need to record and share memories in a way that reflects their children's expressions and emotions. Furthermore, there is a lack of means to strengthen communication among family members and share memories in a more enjoyable way.
[0380] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0381] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for analyzing emotion data to identify emotional elements. This makes it possible to analyze various events in daily life together with their emotional elements, and to generate and share family memories as expressive content.
[0382] "Means for acquiring image information" refers to a device or function for taking in image data from an external source and converting it into a format that can be processed within the system.
[0383] "Means for acquiring natural language information" refers to a device or function that collects text data written in human language and brings it into the system in a form the system can process.
[0384] "Means for extracting recognition data" refers to a device or function that analyzes image information, grasps the characteristics of objects, people, etc., and acquires them as data.
[0385] "Means for extracting narrative elements" refers to a device or function that analyzes natural language information and identifies elements that constitute a narrative, such as context and character settings.
[0386] "Means for analyzing emotional data and identifying emotional elements" refers to a device or function that analyzes text or audio data, determines the speaker's emotions, and extracts information based on that determination.
[0387] "Means for automatically generating content data using a generation algorithm" refers to a device or function that automatically creates new content using an algorithm based on existing data.
[0388] "Means for displaying on a display device" refers to a device or function that provides generated content data to the user visually.
[0389] "Means for transmitting to an information terminal" refers to a device or function that sends generated data to a terminal via a network and makes it receivable.
[0390] "Means for receiving change requests and editing content data" refers to a device or function that reflects user requests for modification and changes the generated data.
[0391] This system consists of home information terminals and a central server. First, users access the system through their home information terminals and input image data related to their memories and descriptions in natural language. The terminals are equipped with high-performance cameras and microphones, enabling them to acquire image and audio information with high accuracy.
[0392] The device sends the acquired image information to the server, where an image analysis engine operates to extract recognition data from the image. In addition, the device sends the user's speech content to the server as text data, where a natural language processing engine operates to analyze narrative elements and emotional data. For emotional analysis, an advanced AI engine is used to precisely identify the user's emotions. In this case, software such as Amazon Web Services' Rekognition may be used.
[0393] Once recognition data, narrative elements, and emotional elements are extracted, the server invokes a generation algorithm to generate comic data with a specific story based on these elements. This generation process utilizes open-source AI frameworks such as TensorFlow or PyTorch. The generated comic data is automatically adjusted to align with the story's flow and the user's emotions.
[0394] The completed content data is sent back to the user's home information terminal and displayed on the terminal's screen. The user can review the displayed content and, if necessary, send a correction request to the server. If a correction request is received, the server runs the algorithm again and automatically modifies the content data.
[0395] As a concrete example, if you want to create a comic strip about your child's sports day memories, you would input a prompt into the AI model saying, "Please generate a heroic and moving comic strip based on the moment when the user felt proud of their child's performance at the sports day," and appropriate content would be generated. In this way, family memories are recorded with rich emotion, and the whole family can enjoy it.
[0396] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0397] Step 1:
[0398] Users access the system through a home information terminal and input image data and text in natural language related to specific memories. This input image data and text are temporarily stored on the terminal. The terminal uses a camera and microphone to acquire high-resolution data.
[0399] Step 2:
[0400] The terminal sends the acquired image data to the server. During this process, the image data is compressed and securely transferred over the network. Once the image data arrives at the server, the server's image analysis engine begins operation.
[0401] Step 3:
[0402] The server uses an image analysis engine to analyze the image data and extract recognition data. Specifically, it recognizes people and objects within the image and organizes that information as metadata. The analysis outputs results labeled with specific tags.
[0403] Step 4:
[0404] The device sends the natural language text entered by the user to the server, where the natural language processing engine receives and analyzes the text data.
[0405] Step 5:
[0406] The server uses the natural language processing engine to extract narrative and emotional elements from the input text. This process analyzes emotion-related words and their context within the text to understand the user's emotions. The extracted data is output as narrative and emotional elements.
[0407] Step 6:
[0408] The server invokes a generation algorithm that takes the recognition data obtained from image analysis and the narrative and emotional elements obtained from natural language processing, and generates manga data with a specific storyline. A generative AI model performs this operation and outputs carefully designed text and prompts.
[0409] Step 7:
[0410] The generated manga data is sent from the server to the user's home information terminal. It is displayed on the terminal's screen and becomes available for the user to view.
[0411] Step 8:
[0412] The user reviews the generated content and, if necessary, sends a correction request from their device to the server. Based on this request, the server regenerates the content data to reflect the changes. The corrected content is then sent back to the device and displayed.
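Step 2 of Application Example 2 says the image data is compressed and transferred securely. The sketch below shows only the compression and an integrity check, using `zlib` and a SHA-256 digest from the Python standard library. Encryption in transit is assumed to be handled separately (for example by TLS) and is not shown. JPEG files are already compressed, so a real terminal would more likely resize or re-encode images; the function names are hypothetical.

```python
import hashlib
import zlib


def prepare_for_transfer(image_bytes: bytes, level: int = 6) -> tuple[bytes, str]:
    """Compress the image and compute a SHA-256 digest of the original bytes."""
    compressed = zlib.compress(image_bytes, level)
    digest = hashlib.sha256(image_bytes).hexdigest()
    return compressed, digest


def receive(compressed: bytes, digest: str) -> bytes:
    """Decompress on the server and verify the data arrived intact."""
    data = zlib.decompress(compressed)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("checksum mismatch: image data corrupted in transfer")
    return data
```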
[0413] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio representing the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data representing the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0414] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0415] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0416] [Third Embodiment]
[0417] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0418] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0419] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0420] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0421] The microphone 238 receives voice signals from the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0422] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0423] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0424] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0425] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0426] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0427] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0428] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0429] This invention is a system that takes image data and natural language data provided by the user as input and generates story-based comic data from them. The system operates between the user's terminal and a central server.
[0430] Users first access the system via a device such as a smartphone or computer. The user's device displays an interface for inputting memorable episodes in natural language. Users can also upload related image data. At this stage, users provide specific information that constitutes their memories.
[0431] The terminal sends the input text and image data to the server. The server receives this data and uses a text analysis engine to extract story elements from the natural language data. Meanwhile, the image analysis engine scans the image data to obtain recognition information. This provides the basic elements for generating a manga from the input data.
[0432] The server integrates the generated story elements and recognition information, and uses a generative model to create comic data. This generative model utilizes machine learning techniques to automatically determine drawing styles, panel layouts, character placement, and more.
[0433] The generated manga data is sent to the user's device. The user can view the completed manga on their device and request revisions as needed. Finally, the user can save the manga data to their device and share it with friends and family.
[0434] For example, if a user wants to save memories of a family trip as a comic, they select photos taken during the trip and enter the story of the trip in text. The system then automatically generates a comic that reflects the atmosphere and episodes of the trip and delivers it to the user. This allows users to easily create professional-quality comics without having to worry about the production process.
[0435] The following describes the processing flow.
[0436] Step 1:
[0437] Users use a terminal to access the system interface and enter episodes related to their memories into a text field. They also select relevant photo data and upload it to the system.
[0438] Step 2:
[0439] The terminal converts the input text and image data into a predetermined format (e.g., JSON format) and prepares it for transmission to the server over the network.
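As a minimal sketch of Step 2, the Python code below assumes the terminal packs the episode text and the raw image bytes into a single JSON object, with each image Base64-encoded because JSON can only carry text. The function name `build_payload` and the field names `text`, `images`, `filename`, and `data` are hypothetical; the disclosure only states that a predetermined format such as JSON is used.

```python
import base64
import json


# Hypothetical helper: the field names below are illustrative, not from the disclosure.
def build_payload(text: str, images: dict[str, bytes]) -> str:
    """Pack the episode text and raw image bytes into one JSON string.

    Image bytes are Base64-encoded because JSON can only carry text.
    """
    payload = {
        "text": text,
        "images": [
            {"filename": name, "data": base64.b64encode(raw).decode("ascii")}
            for name, raw in images.items()
        ],
    }
    # ensure_ascii=False keeps non-ASCII text (e.g., Japanese) readable in the body.
    return json.dumps(payload, ensure_ascii=False)


body = build_payload("We went to the beach.", {"beach.jpg": b"\xff\xd8\xff"})
```

Base64 inflates binary data by roughly one third, so a real terminal might instead send images as multipart uploads and keep JSON for the text and metadata.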
[0440] Step 3:
[0441] The server receives the data sent from the terminal, parses its format, and separates it into text data and image data. During this process, the server verifies that the received data has been parsed correctly.
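Assuming a hypothetical JSON schema in which `text` holds the episode and `images` holds a list of Base64-encoded files, the server side of Step 3 could parse the body, separate the text from the decoded images, and reject anything that does not match, which is one simple way to verify that the data was parsed correctly. `split_payload` and its error messages are hypothetical illustrations, not the disclosed implementation.

```python
import base64
import binascii
import json


# Hypothetical helper assuming a {"text": ..., "images": [{"filename", "data"}]} schema.
def split_payload(body: str) -> tuple[str, list[tuple[str, bytes]]]:
    """Parse a JSON body and separate it into text data and decoded image data.

    Raises ValueError when the body does not match the assumed schema.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("top-level JSON value must be an object")

    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("missing or empty 'text' field")

    images = []
    for item in payload.get("images", []):
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            raw = base64.b64decode(item["data"], validate=True)
        except (KeyError, TypeError, binascii.Error) as exc:
            raise ValueError(f"invalid image entry: {item!r}") from exc
        images.append((item.get("filename", "unnamed"), raw))
    return text, images
```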
[0442] Step 4:
[0443] The server's text analysis engine uses natural language processing techniques to analyze text data and extract narrative elements such as characters, events, and emotional expressions.
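The text analysis of Step 4 would in practice rely on trained natural language processing models. The following sketch swaps in a much simpler rule-based approach purely to show the shape of the output: each sentence is treated as an event, and characters and emotional expressions are looked up in small hand-written lexicons. The lexicons, the `StoryElements` structure, and `extract_story_elements` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicons standing in for trained NLP models.
CHARACTER_WORDS = {"i", "we", "mom", "dad", "mother", "father", "son", "daughter", "friend"}
EMOTION_WORDS = {
    "fun": "joy", "happy": "joy", "great": "joy", "wonderful": "joy",
    "surprised": "surprise", "amazing": "surprise",
    "sad": "sadness", "lonely": "sadness",
}


@dataclass
class StoryElements:
    characters: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    emotions: list[str] = field(default_factory=list)


def extract_story_elements(text: str) -> StoryElements:
    """Rule-based stand-in: sentences become events, and lexicon hits become
    characters and emotions (each listed once, in order of appearance)."""
    elements = StoryElements()
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        elements.events.append(sentence)
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in CHARACTER_WORDS and word not in elements.characters:
                elements.characters.append(word)
            emotion = EMOTION_WORDS.get(word)
            if emotion and emotion not in elements.emotions:
                elements.emotions.append(emotion)
    return elements
```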
[0444] Step 5:
[0445] The server's image analysis engine processes image data and uses deep learning techniques to extract recognition information about people, backgrounds, and objects within the image. In this step, the image features are represented as numerical vectors.
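Step 5 states only that image features are represented as numerical vectors. As a stand-in for a deep learning embedding, the sketch below turns an image, given as a list of RGB pixels, into a normalized color histogram and compares two such vectors with cosine similarity. The binning scheme and the names `color_histogram` and `cosine_similarity` are assumptions chosen for illustration; a real engine would use a trained network's feature vector.

```python
import math

Pixel = tuple[int, int, int]


# Hypothetical stand-in for a deep-learning image embedding.
def color_histogram(pixels: list[Pixel], bins: int = 4) -> list[float]:
    """Return a normalized RGB histogram with bins**3 entries that sum to 1."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins).
        index = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[index] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```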
[0446] Step 6:
[0447] The server integrates story elements from the text analysis engine and recognition information from the image analysis engine. Based on this integrated information, the generative model automatically generates manga data.
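One plausible way to integrate the two analyses in Step 6, assumed here for illustration, is to pair each story event with the image whose recognized labels it mentions most often, so that the generative model receives each scene together with its supporting photo. The matching rule and `link_events_to_images` are hypothetical.

```python
import re
from typing import Optional


# Hypothetical integration step: label overlap is an illustrative matching rule.
def link_events_to_images(
    events: list[str], image_labels: dict[str, set[str]]
) -> dict[str, Optional[str]]:
    """Pair each story event with the image whose recognized labels it mentions most.

    `image_labels` maps an image name to the object and scene labels produced by
    the image analysis engine. Ties go to the image listed first; events that
    share no label with any image are mapped to None.
    """
    links: dict[str, Optional[str]] = {}
    for event in events:
        words = set(re.findall(r"[a-z]+", event.lower()))
        best_image, best_score = None, 0
        for image, labels in image_labels.items():
            score = len(words & labels)
            if score > best_score:
                best_image, best_score = image, score
        links[event] = best_image
    return links
```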
[0448] Step 7:
[0449] The generated comic data is laid out into multiple comic panels, visually representing the flow of the story. The server converts this into a digital data format (e.g., PDF or JPEG) and prepares it for transmission to the user's device.
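Step 7 leaves the layout method to the system. A minimal sketch, assuming fixed page dimensions in pixels and a simple layout in which each page holds up to `per_page` full-width tiers stacked from top to bottom, could compute panel rectangles as follows. `Panel`, `layout_panels`, and every default value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    page: int
    scene: str
    x: int
    y: int
    width: int
    height: int


# Hypothetical layout helper; sizes are in pixels and purely illustrative.
def layout_panels(
    scenes: list[str],
    per_page: int = 4,
    page_w: int = 1200,
    page_h: int = 1700,
    margin: int = 40,
) -> list[Panel]:
    """Assign scenes to pages in story order, stacking full-width tiers top to bottom."""
    panels = []
    for start in range(0, len(scenes), per_page):
        page_scenes = scenes[start:start + per_page]
        page = start // per_page + 1
        # Split the height left after margins evenly among this page's panels.
        tier_h = (page_h - margin * (len(page_scenes) + 1)) // len(page_scenes)
        for i, scene in enumerate(page_scenes):
            y = margin + i * (tier_h + margin)
            panels.append(Panel(page, scene, margin, y, page_w - 2 * margin, tier_h))
    return panels
```

Stacking full-width tiers keeps the reading order unambiguous whether a page is read left to right or right to left; a generative layout model would instead vary panel sizes according to the importance of each scene.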
[0450] Step 8:
[0451] The user's device receives the manga data sent from the server and displays it on the interface. The user can review the manga, send correction requests back to the server as needed, and save the final version.
[0452] (Example 1)
[0453] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0454] Traditional manga generation systems require users to manually edit image data and story elements in detail, which makes the process cumbersome and time-consuming. Furthermore, it is difficult to guarantee both the quality of automatically generated manga and its alignment with the user's intentions.
[0455] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0456] In this invention, the server includes means for acquiring digital visual information using an information recording device, means for acquiring descriptive data using an information recording device, and means for analyzing the digital visual information and extracting recognition information. This enables high-quality, consistent comics to be generated automatically from data that the user provides intuitively.
[0457] An "information recording device" is an electronic device used to acquire digital visual information and descriptive data, and it is a device that plays a role in recording user-input data.
[0458] "Digital visual information" refers to data that stores visual representations such as images and videos in digital format, and is information that has been converted into a format that can be processed by a computer.
[0459] "Descriptive data" refers to text information written in natural language, and is digital information that includes the user's intent and story elements.
[0460] "Recognition information" refers to data extracted as a result of analyzing digital visual information and identifying specific objects or scenes.
[0461] "Abstract concepts" refer to story elements and main themes extracted from descriptive data through natural language processing, and are fundamental information for manga generation.
[0462] "Image information" refers to a series of manga data automatically generated using a generative AI model based on the aforementioned recognition information and abstract concepts, and is information that is visually represented.
[0463] This system generates image data that visually represents the user's experiences and intentions by linking the user's digital device with a central data processing unit. The system includes an information recording device as a digital device and a central server with powerful data processing capabilities.
[0464] Users access this system using information recording devices such as smartphones and personal computers. Users can input text data related to specific memories, such as trips or events, through the interface. They can also provide digital visual information by uploading photographs they have taken or existing visual data.
[0465] The central server analyzes the received descriptive data and digital visual information. First, the server processes the descriptive data using natural language processing software to extract key themes and story elements as "abstract concepts." Simultaneously, it uses image processing software to obtain "recognition information" from the digital visual information.
[0466] In the next stage of this process, the generative AI model is launched on the server. The generative AI model uses deep learning techniques to integrate the extracted abstract concepts and recognition information, automatically generating image data that reflects the user's intent. The generated manga data is then promptly delivered to the user's device.
[0467] As an example, consider a user who wants to save memories of a family trip as a comic strip. The user selects photos of the beach taken during the trip and inputs the story of the trip along with a prompt such as "Please turn the memories of our fun family trip into a comic strip." The system can then provide a professional-quality comic strip based on the memories.
[0468] Thus, this system greatly improves the user experience by providing users with an easy way to obtain high-quality visual representations.
[0469] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0470] Step 1:
[0471] Users access the system using devices such as smartphones and personal computers. Through the user interface, they input prompts and memorable episodes in text format, and also select and upload related image data. The input here consists of text data and image files, which are temporarily stored on the device.
[0472] Step 2:
[0473] The terminal sends the input text and image data to the server. The input here consists of text and image data provided by the user, and the output is the data passed to the server. Encryption protocols are used throughout this process for security.
[0474] Step 3:
[0475] The server processes the received data. First, it uses a natural language processing engine to analyze the text data and extract story elements. Specifically, it extracts keywords and analyzes the structure of the sentences to extract the main elements of the narrative. This becomes the output as an "abstract concept" based on the input text.
[0476] Step 4:
[0477] In parallel, the server scans the image data using an image analysis engine. It identifies objects and landscapes within the image and generates recognition information. Here, image recognition algorithms identify key features using computer vision techniques, and the extracted recognition information is output as the result for the image data.
[0478] Step 5:
[0479] The server uses a generative AI model to integrate the previously obtained abstract concepts and recognition information, automatically generating manga data. This process leverages deep learning models pre-trained on a dataset to automatically adjust drawing styles, panel layouts, and character placement. The output here is the completed manga data.
[0480] Step 6:
[0481] Finally, the server sends the generated manga data to the user's terminal. In this step, the manga data is converted to an appropriate format so that the user can review the results, and then transferred to the terminal via a communication protocol. The user can then review the manga on their terminal and request corrections if necessary.
[0482] (Application Example 1)
[0483] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0484] In recent years, advancements in information technology have made it commonplace to easily record personal information digitally. However, there is still a lack of convenient means to visualize everyday episodes and events as stories and save them in a way that can be shared with many people. Furthermore, creating and sharing high-quality graphic content without requiring specialized skills still requires considerable time and effort from users. Solving this problem is essential.
[0485] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0486] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating graphic data based on recognition elements and narrative elements using a data generation model. This makes it possible for users to easily generate everyday episodes as visual stories without requiring special skills and easily share them with others through an information distribution device.
[0487] "Image information" refers to visual data recorded in digital format, which forms the basis for system analysis.
[0488] "Natural language information" refers to data written in the language that humans use orally or in writing, and is used to extract story elements.
[0489] "Recognition elements" refer to data about specific features or objects that are extracted from image information through analysis.
[0490] "Narrative elements" refer to data about the constituent elements and themes of a story, extracted through analysis of natural language information.
[0491] A "data generation model" is an algorithm that uses machine learning techniques to generate graphic data based on the results of image and natural language analysis.
[0492] "Graphic data" refers to a visual representation format generated based on information entered by the user, and is intended to visually communicate its content.
[0493] An "information distribution device" is an electronic device used to share or distribute generated graphic data to other users.
[0494] This invention is a system for generating digital graphic content based on specific information and sharing it with others. A specific embodiment of this system will be described here.
[0495] First, the user inputs image information and natural language information into a device. This device is expected to be a smartphone or computer, and the user can easily record the collected image data and related episodes in natural language. This information is then sent to the server via an application installed on the device.
[0496] The server uses image analysis software (e.g., image_lib) to extract necessary recognition elements from images in order to handle image information. At the same time, it uses a natural language processing engine (e.g., nlp_lib) to process natural language information and extract narrative elements that form the basis of the story.
[0497] Next, a data generation model (e.g., comic_generation_model) is used to integrate the recognition elements and narrative elements to generate graphic data. Based on these elements, the generative AI model automatically creates visually appealing content in comic and story formats.
[0498] The generated graphic data is sent to the user's device. Users can then share the generated content with others via social media, email, or content distribution devices. This makes it possible to create and widely distribute professional-quality visual stories even without advanced design skills.
[0499] A concrete example is a scenario where a user wants to generate graphic data based on memories of a holiday trip. The user imports photos taken during the trip into their device and inputs natural language information such as, "It was a wonderful trip. The sea was beautiful, and I had a great time with my family." The server generates a visually rich comic from this information and assists the user in sharing it with friends.
[0500] Examples of input prompts for a generative AI model:
[0501] "Based on the text and image information provided by the user, please generate a vivid and emotionally resonant comic. The style should be bright and colorful, conveying the joy of family."
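A prompt like the one above could also be assembled programmatically from the extracted elements. The sketch assumes the themes and recognized objects are already available as lists of strings; how they are phrased inside the prompt, and the function `build_prompt`, are hypothetical.

```python
# Hypothetical prompt builder modeled on the example prompt in the text.
def build_prompt(
    user_text: str,
    objects: list[str],
    themes: list[str],
    style: str = "bright and colorful",
) -> str:
    """Combine the fixed instruction with extracted themes, objects, and the user's text."""
    lines = [
        "Based on the text and image information provided by the user, "
        "please generate a vivid and emotionally resonant comic.",
        f"The style should be {style}.",
        f"Story themes: {', '.join(themes) if themes else 'none specified'}.",
        f"Objects seen in the photos: {', '.join(objects) if objects else 'none'}.",
        f"User's description: {user_text.strip()}",
    ]
    return "\n".join(lines)
```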
[0502] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0503] Step 1:
[0504] The user inputs image information and natural language information into the device. The user selects photos taken during trips or events through a dedicated application on the device and describes these images and related episodes in natural language. This information is collected by the application and prepared for transmission to the system. The input data consists of selected image files and text information written by the user.
[0505] Step 2:
[0506] Data is sent from the terminal to the server. The application sends collected image and natural language information to the server via the internet. The input data consists of image files (e.g., JPEG format) and text data. The server receives these and prepares them for the next analysis step.
[0507] Step 3:
[0508] The server performs image analysis. Using image analysis software (e.g., image_lib), the server extracts recognition elements from the received image information. Specifically, it performs object recognition, scene analysis, and color analysis within the image, generating information for linking with text information. The output includes feature quantities for each part of the image and a list of recognized objects.
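For the color analysis mentioned in Step 3, a minimal sketch (assuming the image is available as a list of RGB tuples, and not reflecting the API of the placeholder `image_lib`) is to quantize each channel into a few bands and count the most frequent resulting colors. `quantize_channel` and `dominant_colors` are hypothetical helpers.

```python
from collections import Counter

Pixel = tuple[int, int, int]


# Hypothetical helpers; not the API of any real image library.
def quantize_channel(value: int, levels: int) -> int:
    """Snap a 0-255 channel value to the centre of one of `levels` equal bands."""
    band = 256 / levels
    return int((value // band + 0.5) * band)


def dominant_colors(pixels: list[Pixel], levels: int = 4, top: int = 3) -> list[tuple[Pixel, float]]:
    """Return the most frequent quantized colors with their share of all pixels."""
    counts = Counter(
        tuple(quantize_channel(c, levels) for c in pixel) for pixel in pixels
    )
    total = len(pixels)
    return [(color, n / total) for color, n in counts.most_common(top)]
```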
[0509] Step 4:
[0510] The server performs text analysis. Using a natural language processing engine (e.g., nlp_lib), the server extracts narrative elements from the received natural language information. Specifically, it performs semantic analysis, keyword extraction, and contextual understanding of the text. The output is the analysis result, including the story's themes and key points.
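Keyword extraction in Step 4 can be approximated, for illustration only, by counting content words after removing common function words. The stopword list and `extract_keywords` are hypothetical; the placeholder `nlp_lib` named in the text would presumably provide more sophisticated semantic analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "and", "the", "it", "is", "was", "were", "i", "we",
             "with", "my", "our", "had", "to", "of", "in", "on", "at"}


def extract_keywords(text: str, top: int = 3) -> list[str]:
    """Return the `top` most frequent content words; ties keep first-seen order."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top)]
```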
[0511] Step 5:
[0512] The server generates graphic data. Using a data generation model (e.g., comic_generation_model), it generates visual graphic data based on the analyzed recognition and narrative elements. Specifically, it performs image style conversion, layout determination, and automatic illustration generation. The output is graphic data in a completed story format (e.g., a comic-style image file).
[0513] Step 6:
[0514] The server sends the generated graphic data back to the user's terminal so that the user can review it. The input is the generated graphic data, and the output is this data transferred to the user's terminal.
[0515] Step 7:
[0516] The user reviews the generated graphic data and requests corrections as needed. The user can view the generated graphic content on their terminal and, if necessary, send correction requests to the server via the application. The input is the user's correction feedback, and the output is that feedback received by the server for reprocessing.
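The correction loop in Step 7 could be modeled, as a hypothetical sketch, by a draft that records one set of attributes per panel and a function that applies the user's correction requests, keeps the previous values, and increments a revision counter before the server regenerates the affected panels. The request schema (`panel`, `field`, `value`), the editable fields, and all names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical set of fields a user may ask to correct.
EDITABLE_FIELDS = {"dialogue", "expression", "tone"}


@dataclass
class ComicDraft:
    """Hypothetical in-memory draft: one dict of attributes per panel."""
    panels: list[dict[str, str]]
    revision: int = 0
    history: list[dict] = field(default_factory=list)


def apply_corrections(draft: ComicDraft, requests: list[dict]) -> ComicDraft:
    """Validate all correction requests, then apply them and bump the revision.

    Validating everything first means a bad request leaves the draft untouched.
    """
    for req in requests:
        if req["field"] not in EDITABLE_FIELDS or not 0 <= req["panel"] < len(draft.panels):
            raise ValueError(f"unsupported correction request: {req!r}")
    for req in requests:
        panel = draft.panels[req["panel"]]
        # Keep the previous value so a revision can be undone or audited.
        draft.history.append({"panel": req["panel"], "field": req["field"], "old": panel.get(req["field"])})
        panel[req["field"]] = req["value"]
    draft.revision += 1
    return draft
```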
[0517] Step 8:
[0518] Users share the final version of the graphic data. Users can share the reviewed graphic data with others through social media or content distribution devices. The input consists of the final approved graphic data and the designated recipients for sharing, and the output is an electronic distribution process based on this.
[0519] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0520] This invention is a system that acquires user image data and natural language data, generates story-based comic data from them, and further combines this with emotion recognition functionality to realize comic expressions that reflect the user's emotions. This system consists of the user's terminal and a central server.
[0521] Users access the system via a terminal, inputting episodes related to memories and uploading associated image data. In addition, the user's natural language data may contain emotional information. At this stage, an emotion engine is activated to recognize the user's emotions. This engine analyzes the tone and emotion of the user's statements from the text data. In some cases, voice data is also used to improve the accuracy of emotion recognition.
[0522] The server analyzes the received text data through a text analysis engine to extract the characters, events, and emotions expressed in the story. Similarly, the image data is analyzed through an image analysis engine to obtain recognition information. These data are correlated, and a generative model is used to generate manga data.
[0523] The generative model uses the story elements and recognition information to determine appropriate character expressions and tones that reflect the user's emotions. The generated comic data is automatically adjusted so that the story as a whole aligns with the user's intentions and emotions.
[0524] The generated comic data is sent to the user's device, where they can review its contents on the interface. Users can also request revisions to the comic's story and expression, and ultimately, the comic is saved or shared by the user. For example, if a user creates a comic about a fun family trip, the emotions of joy and surprise inferred from the audio and text will be reflected in the characters' lively expressions and the tone of the story. This allows users to easily create emotionally rich, professional-quality comics.
[0525] The following describes the processing flow.
[0526] Step 1:
[0527] Users access the system via their device, input episodes related to their memories in text format, and select and upload relevant image data. Audio data can also be recorded and provided as needed.
[0528] Step 2:
[0529] The terminal converts the input text data, image data, and audio data into a predetermined format and prepares them for transmission to the server.
[0530] Step 3:
[0531] The server receives data sent from the terminal and first passes the text data to a natural language processing engine for analysis. This analysis extracts story elements, characters, events, and emotional information.
[0532] Step 4:
[0533] The emotion engine processes text and audio data to analyze the user's emotions (e.g., joy, sadness, surprise) and adds this emotional information to story elements.
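Step 4 does not specify how text and audio are combined. One common option, assumed here, is late fusion: each modality produces a score per emotion, the scores are averaged with a fixed weight, and the dominant emotion is attached to the story elements. The audio weight of 0.4 and the names `fuse_emotions` and `attach_emotion` are hypothetical.

```python
from typing import Optional


# Hypothetical late-fusion helper; the default audio weight is illustrative.
def fuse_emotions(
    text_scores: dict[str, float],
    audio_scores: Optional[dict[str, float]] = None,
    audio_weight: float = 0.4,
) -> dict[str, float]:
    """Weighted average of per-emotion scores from text and audio, renormalized to sum to 1.

    When no audio was recorded, the text scores are used alone.
    """
    audio = audio_scores or {}
    w_audio = audio_weight if audio else 0.0
    labels = set(text_scores) | set(audio)
    fused = {
        label: (1 - w_audio) * text_scores.get(label, 0.0) + w_audio * audio.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    return {k: v / total for k, v in fused.items()} if total else fused


def attach_emotion(story: dict, scores: dict[str, float]) -> dict:
    """Add the dominant emotion and the full score map to the story elements."""
    return {**story, "emotion": max(scores, key=scores.get), "emotion_scores": scores}
```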
[0534] Step 5:
[0535] The server uses an image analysis engine to scan image data and identify people, backgrounds, and objects within the image. This information is acquired as recognition data.
[0536] Step 6:
[0537] The server integrates story elements extracted from text, emotional information, and recognition information from images, and inputs them into a generative model. This model automatically generates the comic's storyline and visual elements.
[0538] Step 7:
[0539] The generative model meticulously adjusts character expressions and story tone to reflect user emotions, creating completed manga data.
[0540] Step 8:
[0541] The server sends the generated manga data to the user's device. The user can view the received manga on the interface and save or share it in their preferred format. They can also request further adjustments from the system as needed.
[0542] (Example 2)
[0543] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0544] Modern information processing devices lack the means to easily generate content that expresses users' memories and experiences in a unique and emotionally rich way. Furthermore, existing technologies suffer from low accuracy in emotion recognition, leading to frequent discrepancies between the generated content and the user's intentions. Additionally, there is the challenge of efficiently incorporating user feedback on the generated content.
[0545] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0546] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating content data based on the feature information and story elements using a generative model that takes the emotional information into consideration. This enables the generation of unique content that reflects the user's emotions and intentions, and flexible content modification based on user feedback.
[0547] "Image information" refers to data perceived through vision, and includes digital images such as photographs, illustrations, and diagrams.
[0548] "Natural language information" refers to information that includes the words and sentences that humans normally use, and includes audio data and text data.
[0549] "Feature information" refers to specific attributes or patterns extracted from image information, including shape, color, and various structural features.
[0550] "Story elements" are factors and components that form the framework of a narrative, obtained by analyzing natural language information, and include characters, events, and settings.
[0551] "Emotional information" refers to information about the emotions or psychological state that a user is trying to express, and is identified through the analysis of natural language information or voice data.
[0552] A "generative model" is an algorithm or system that learns from large amounts of data to generate new data, and is particularly used for the automatic generation of content.
[0553] "Content data" refers to digitized stories and works generated based on user input, and includes visual or textual expressions.
[0554] "Output device" refers to a device or system for displaying generated content data so that a user can view it, and includes computer screens and mobile device displays.
[0555] An "edit request" is an instruction from a user to correct or change generated content data, including revisions to the story, facial expressions, and tone.
[0556] This invention is an information processing system for generating emotionally rich comics based on users' memories and experiences. The system operates primarily through the cooperation of a server and a user terminal.
[0557] Users first access the system using their device and upload natural language data (text and audio) together with image information. The information users input includes stories about memories and experiences, as well as photos taken at those times. The information entered by the user on their device is immediately sent to the server.
[0558] The server uses a natural language processing engine to analyze the input natural language information. It also incorporates emotion recognition capabilities, identifying the user's emotions from the input text and audio. This emotion information is used to adjust character portrayals and narrative tone in the generated content. Furthermore, the server utilizes an image analysis engine to extract character expressions and situations as feature information from image data.
[0559] Next, the server uses a generative AI model to automatically generate professional-quality comic data based on the feature information and story elements. This generative AI model is trained on a large amount of sample data, enabling it to generate content that matches the user's emotions and intentions. The generated comic data is a story that reflects the user's intentions and includes lively character portrayals. For example, if a user wants to turn a fun family trip into a comic, the user can enter a prompt such as, "Create a comic that reflects the fun of our family trip." A comic generated from this prompt will frequently feature smiling characters and bright colors.
[0560] Finally, the server sends the generated comic data to the user's device. The user can review the generated content on their device and, if necessary, request revisions to the expression or storyline. This feedback is sent back to the server, and the content data is re-edited. The user can then save or share the edited comic data. This makes it easy to generate and revise emotionally rich, professional-quality comics.
[0561] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0562] Step 1:
[0563] The user accesses the system using a terminal. The user inputs natural language information (text and audio data) and image information. This input is then prepared for transmission to the server. Specifically, the user selects a file on the terminal and presses the send button. The input data is then sent to the server via a communication protocol.
[0564] Step 2:
[0565] The server analyzes the received natural language information. Here, a natural language processing engine runs, analyzing sentence structure and vocabulary from the text data. An emotion recognition module is also used to identify the user's emotions from their mood and tone. The output consists of story elements and emotional information. Specifically, it calculates the emotional value of each word and phrase, and then combines these values to form the basic elements of the story.
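The per-word scoring in Step 2 can be illustrated with a small valence lexicon: each word contributes a value, a preceding negator such as "not" flips its sign, and the values are summed per sentence so that the emotional arc of the story is preserved. The lexicon, the negation rule, and the function names are hypothetical simplifications, not the disclosed emotion recognition module.

```python
import re

# Hypothetical valence lexicon (positive = pleasant, negative = unpleasant).
VALENCE = {"fun": 2.0, "wonderful": 3.0, "beautiful": 2.0, "great": 2.0,
           "tired": -1.0, "sad": -2.0, "lost": -1.5}
NEGATORS = {"not", "never", "no"}


def sentence_valence(sentence: str) -> float:
    """Sum word valences, flipping the sign of a word that follows a negator."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for i, word in enumerate(words):
        value = VALENCE.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def story_emotion(text: str) -> list[tuple[str, float]]:
    """Attach a valence score to each sentence, keeping the story's emotional arc."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, sentence_valence(s)) for s in sentences]
```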
[0566] Step 3:
[0567] The server analyzes the received image information. The image analysis engine runs and extracts key features from the image data. This includes face recognition and background information processing. The output is character expressions and key visual components. Specifically, it quantifies things like the smiles of people in the image and the brightness of the scenery, generating data to be used in the story.
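The quantities mentioned in Step 3 might be computed as follows, assuming the pixels are available as RGB tuples and that a separate face-analysis model (not specified here) has already produced one smile probability per detected face. Brightness is taken as mean luma with the ITU-R BT.601 weights, normalized to the range 0 to 1; `scene_brightness`, `smile_ratio`, and the 0.5 threshold are hypothetical.

```python
Pixel = tuple[int, int, int]


# Hypothetical helpers for quantifying scene brightness and smiles.
def scene_brightness(pixels: list[Pixel]) -> float:
    """Mean luma in [0, 1] using ITU-R BT.601 weights."""
    if not pixels:
        return 0.0
    luma = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return luma / (255 * len(pixels))


def smile_ratio(smile_probs: list[float], threshold: float = 0.5) -> float:
    """Fraction of detected faces whose smile probability reaches the threshold.

    `smile_probs` is assumed to come from a face-analysis model that outputs
    one smile probability per detected face.
    """
    if not smile_probs:
        return 0.0
    return sum(p >= threshold for p in smile_probs) / len(smile_probs)
```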
[0568] Step 4:
[0569] The server automatically generates manga data using a generative AI model. Here, story elements and feature information are taken as input, and content based on the user's emotional information is generated. The output is manga data of professional quality. Specifically, the generative model reflects the emotional information and adjusts the characters' expressions and color tones. For example, it might use bright colors to express joy.
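As a hypothetical sketch of the color-tone adjustment in Step 4, each emotion can be mapped to factors that scale a color's saturation and brightness in HSV space, using Python's standard `colorsys` module. The factor table and `adjust_tone` are assumptions; the disclosure only gives the example of bright colors for joy.

```python
import colorsys

# Hypothetical emotion-to-tone table: (saturation factor, brightness factor).
TONE = {"joy": (1.3, 1.1), "surprise": (1.2, 1.05), "sadness": (0.6, 0.8)}


def adjust_tone(rgb: tuple[int, int, int], emotion: str) -> tuple[int, int, int]:
    """Scale a color's saturation and brightness according to the emotion.

    Unknown emotions leave the color unchanged; results are clamped to valid HSV.
    """
    s_factor, v_factor = TONE.get(emotion, (1.0, 1.0))
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    s = min(1.0, s * s_factor)
    v = min(1.0, v * v_factor)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))
```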
[0570] Step 5:
[0571] The server sends the completed manga data to the user's device. The user then reviews the data on their device. The generated manga appears on the display screen as output. Specifically, the user can view the manga through the device's interface and request revisions as needed. If further editing is required, the data is sent back to the server.
[0572] Step 6:
[0573] The user performs a final check and saves or shares the manga data. The device provides an interface to assist these user operations. As output, the final content is saved in a file format or sent to the selected sharing platform. Specifically, the user presses the "Save" button to store the data in a selected folder or uploads it to social media, etc.
[0574] (Application Example 2)
[0575] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0576] Traditionally, reflecting on family memories and everyday events has been limited to formats such as photographs, videos, and text, making it difficult to deeply express the nuances of emotions and events. Families with young children, in particular, need to record and share memories in a way that reflects their children's expressions and emotions. Furthermore, there is a lack of means to strengthen communication among family members and share memories in a more enjoyable way.
[0577] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0578] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for analyzing emotion data to identify emotional elements. This makes it possible to analyze everyday events together with their emotional elements, and to generate and share family memories as expressive content.
[0579] "Means for acquiring image information" refers to a device or function for taking in image data from an external source and converting it into a format that can be processed within the system.
[0580] "Means for acquiring natural language information" refers to a device or function that collects text data written in human language and takes it into the system in a form the system can interpret.
[0581] "Means for extracting recognition data" refers to a device or function that analyzes image information, grasps the characteristics of objects, people, etc., and acquires them as data.
[0582] "Means for extracting narrative elements" refers to a device or function that analyzes natural language information and identifies elements that constitute a narrative, such as context and character settings.
[0583] "Means for analyzing emotional data and identifying emotional elements" refers to a device or function that analyzes text or audio data, determines the speaker's emotions, and extracts information based on that determination.
[0584] "Means for automatically generating content data using a generation algorithm" refers to a device or function that automatically creates new content using an algorithm based on existing data.
[0585] "Means for displaying on a display device" refers to a device or function that provides generated content data to the user visually.
[0586] "Means for transmitting to an information terminal" refers to a device or function that sends generated data to a terminal via a network and makes it receivable.
[0587] "Means for receiving change requests and editing content data" refers to a device or function that reflects user requests for modification and changes the generated data.
[0588] This system consists of home information terminals and a central server. First, users access the system through their home information terminals and input image data related to their memories and descriptions in natural language. The terminals are equipped with high-performance cameras and microphones, enabling them to acquire image and audio information with high accuracy.
[0589] The device sends the acquired image information to the server, where an image analysis engine operates to extract recognition data from the image. In addition, the device sends the user's speech content to the server as text data, where a natural language processing engine operates to analyze narrative elements and emotional data. For emotional analysis, an advanced AI engine is used to precisely identify the user's emotions. In this case, software such as Amazon Web Services' Rekognition may be used.
[0590] Once recognition data, narrative elements, and emotional elements are extracted, the server invokes a generation algorithm to generate comic data with a specific story based on these elements. This generation process utilizes open-source AI frameworks such as TensorFlow or PyTorch. The generated comic data is automatically adjusted to align with the story's flow and the user's emotions.
[0591] The completed content data is sent back to the user's home information terminal and displayed on the terminal's screen. The user can review the displayed content and, if necessary, send a correction request to the server. If a correction request is received, the server runs the algorithm again and automatically modifies the content data.
[0592] As a concrete example, to create a comic strip about a child's sports day, the user inputs a prompt into the AI model such as "Please generate a heroic and moving comic strip based on the moment when the user felt proud of their child's performance at the sports day," and appropriate content is generated. In this way, family memories are recorded with rich emotion for the whole family to enjoy.
[0593] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0594] Step 1:
[0595] Users access the system through a home information terminal and input image data and text in natural language related to specific memories. This input image data and text are temporarily stored on the terminal. The terminal uses a camera and microphone to acquire high-resolution data.
[0596] Step 2:
[0597] The terminal sends the acquired image data to the server. During this process, the image data is compressed and securely transferred over the network. Once the image data arrives at the server, the server's image analysis engine begins operation.
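Step 2 says only that the image data is compressed and transferred securely. A minimal sketch follows. It assumes zlib compression, a SHA-256 digest for integrity checking, and Base64 encoding so that the bytes fit in a text payload. Transport security (for example, TLS on the connection) is assumed to be handled separately and is not shown. The payload field names are hypothetical. JPEG files are already compressed, so zlib typically saves little on them; it is used here only to make the step concrete.

```python
import base64
import hashlib
import zlib


def pack_image(raw: bytes) -> dict:
    """Terminal side: compress image bytes and attach a digest of the original."""
    compressed = zlib.compress(raw, level=9)
    return {
        "encoding": "zlib+base64",
        "sha256": hashlib.sha256(raw).hexdigest(),  # digest of the uncompressed bytes
        "data": base64.b64encode(compressed).decode("ascii"),
    }


def unpack_image(payload: dict) -> bytes:
    """Server side: reverse pack_image and reject data whose digest does not match."""
    raw = zlib.decompress(base64.b64decode(payload["data"]))
    if hashlib.sha256(raw).hexdigest() != payload["sha256"]:
        raise ValueError("image data corrupted in transit")
    return raw
```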
[0598] Step 3:
[0599] The server uses an image analysis engine to analyze image data and extract recognition data. Specifically, it recognizes people and objects within the image and organizes that information as metadata. As a result of the analysis, results with specific tags are output.
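Step 3 organizes recognized people and objects as tagged metadata. A minimal sketch of that organization step follows. It assumes a hypothetical detector that returns (label, category, confidence, bounding box) records. The source names no detector or metadata schema, and the 0.5 confidence threshold is an arbitrary illustrative value.

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    label: str                      # e.g. "child", "ball"
    category: str                   # e.g. "person" or "object"
    confidence: float               # 0.0 to 1.0
    box: tuple[int, int, int, int]  # x, y, width, height in pixels


def to_metadata(image_id: str, detections: list[Detection],
                min_confidence: float = 0.5) -> dict:
    """Keep confident detections and group their labels as tags by category."""
    tags: dict[str, list[str]] = defaultdict(list)
    regions = []
    for d in detections:
        if d.confidence < min_confidence:
            continue  # drop low-confidence detections
        if d.label not in tags[d.category]:
            tags[d.category].append(d.label)  # each tag listed once per category
        regions.append({"label": d.label, "box": d.box})
    return {"image_id": image_id, "tags": dict(tags), "regions": regions}
```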
[0600] Step 4:
[0601] The device sends the natural language text entered by the user to the server. The server's natural language processing engine receives the text and analyzes it.
[0602] Step 5:
[0603] The server uses a natural language processing engine to extract narrative and emotional elements from the input text. This process analyzes emotion-related words and context within the text to understand the user's emotions. The extracted data is output as narrative elements and emotional elements.
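As a simplified stand-in for the natural language processing engine, the sketch below scores emotions by counting words from a small hand-written lexicon. It also handles one piece of context: it ignores an emotion word when a negation word precedes it. The lexicon, the negation rule, and the function name `extract_emotions` are assumptions made for illustration. A production engine would more likely use a trained model.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping words to emotion labels.
LEXICON = {
    "proud": "joy", "happy": "joy", "fun": "joy", "wonderful": "joy",
    "sad": "sadness", "lonely": "sadness", "cried": "sadness",
    "surprised": "surprise", "amazed": "surprise", "suddenly": "surprise",
}
NEGATIONS = {"not", "never", "no"}


def extract_emotions(text: str) -> dict[str, float]:
    """Return each emotion's share of the emotion words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts: Counter = Counter()
    for i, word in enumerate(words):
        emotion = LEXICON.get(word)
        if emotion is None:
            continue
        if i > 0 and words[i - 1] in NEGATIONS:
            continue  # e.g. "not happy" is not counted as joy
        counts[emotion] += 1
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}
```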
[0604] Step 6:
[0605] The server invokes a generation algorithm that uses the recognition data, narrative elements, and emotional elements obtained from image analysis and natural language processing to generate manga data with a specific storyline. A generative AI model performs this operation and outputs carefully designed text and prompts.
[0606] Step 7:
[0607] The generated manga data is sent from the server to the user's home information terminal. It is displayed on the terminal's screen and becomes available for the user to view.
[0608] Step 8:
[0609] The user reviews the generated content and, if necessary, sends a correction request from their device to the server. Based on this request, the server regenerates the content data to reflect the changes. The corrected content is then sent back to the device and displayed.
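The review-and-correct cycle in Steps 7 and 8 amounts to a loop that regenerates content until the user accepts it. A minimal sketch follows. It assumes hypothetical `generate` and `review` callables that stand in for the server's generation algorithm and for the user's response on the terminal. The round limit is a safeguard added for illustration and is not mentioned in the source.

```python
from typing import Callable, Optional


def review_loop(generate: Callable[[Optional[str]], str],
                review: Callable[[str], Optional[str]],
                max_rounds: int = 5) -> str:
    """Generate content, then regenerate with each correction request until accepted.

    review() returns None when the user accepts the content,
    or a correction request string otherwise.
    """
    content = generate(None)          # initial generation, no corrections yet
    for _ in range(max_rounds):
        request = review(content)
        if request is None:
            return content            # the user accepted this version
        content = generate(request)   # regenerate to reflect the request
    return content                    # stop after max_rounds corrections
```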
[0610] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0611] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0612] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0613] [Fourth Embodiment]
[0614] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0615] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0616] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0617] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0618] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0619] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0620] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0621] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0622] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0623] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0624] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0625] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0626] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0627] This invention is a system that uses image data and natural language data provided by the user as input to generate story-based comic data from this data. This system operates between the user's terminal and a central server.
[0628] Users first access the system via a device such as a smartphone or computer. The user's device displays an interface for inputting memorable episodes in natural language. Users can also upload related image data. At this stage, users provide specific information that constitutes their memories.
[0629] The terminal sends the input text and image data to the server. The server receives this data and uses a text analysis engine to extract story elements from the natural language data. Meanwhile, the image analysis engine scans the image data to obtain recognition information. This provides the basic elements for generating a manga from the input data.
[0630] The server integrates the extracted story elements and recognition information and uses a generative model to create comic data. This generative model uses machine learning techniques to automatically determine drawing styles, panel layouts, character placement, and more.
[0631] The generated manga data is sent to the user's device. The user can view the completed manga on their device and request revisions as needed. Finally, the user can save the manga data to their device and share it with friends and family.
[0632] For example, if a user wants to save memories of a family trip as a comic, they select photos taken during the trip and enter the story of the trip in text. The system then automatically generates a comic that reflects the atmosphere and episodes of the trip and delivers it to the user. This allows users to easily create professional-quality comics without having to worry about the production process.
[0633] The following describes the processing flow.
[0634] Step 1:
[0635] Users use a terminal to access the system interface and enter episodes related to their memories into a text field. They also select relevant photo data and upload it to the system.
[0636] Step 2:
[0637] The terminal converts the input text and image data into a predetermined format (e.g., JSON format) for transfer to the server over the network.
[0638] Step 3:
[0639] The server receives the data sent from the terminal, parses its format, and separates it into text data and image data. During this process, it verifies that the received data has been parsed correctly.
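Steps 2 and 3 describe packaging the input in a predetermined format such as JSON, then separating and verifying it on the server. A minimal sketch of that round trip follows. It assumes a hypothetical schema with `text` and `images` fields and Base64-encoded image bytes. The source does not define the schema.

```python
import base64
import json


def build_payload(text: str, images: dict[str, bytes]) -> str:
    """Terminal side: serialize the episode text and images into one JSON string."""
    return json.dumps({
        "text": text,
        "images": [
            {"name": name, "data": base64.b64encode(data).decode("ascii")}
            for name, data in images.items()
        ],
    })


def parse_payload(payload: str) -> tuple[str, dict[str, bytes]]:
    """Server side: split the payload into text and image data, checking its shape."""
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("payload must be a JSON object")
    if not isinstance(obj.get("text"), str) or not isinstance(obj.get("images"), list):
        raise ValueError("payload must contain 'text' (str) and 'images' (list)")
    images = {}
    for item in obj["images"]:
        # validate=True rejects characters outside the Base64 alphabet
        images[item["name"]] = base64.b64decode(item["data"], validate=True)
    return obj["text"], images
```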
[0640] Step 4:
[0641] The server's text analysis engine uses natural language processing techniques to analyze text data and extract narrative elements such as characters, events, and emotional expressions.
[0642] Step 5:
[0643] The server's image analysis engine processes image data and uses deep learning techniques to extract recognition information about people, backgrounds, and objects within the image. In this step, the image features are represented as numerical vectors.
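Step 5 states that image features are represented as numerical vectors. The source uses deep learning for this. As a dependency-free illustration only, the sketch below substitutes a coarse color histogram as the feature vector and compares two images with cosine similarity. The pixel-list input format and the bin count are assumptions.

```python
import math


def color_histogram(pixels: list[tuple[int, int, int]], bins: int = 4) -> list[float]:
    """Quantize each RGB channel (0-255) into `bins` levels; return a normalized histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A beach photo and a sky photo with similar blue and pale tones produce more similar vectors than a beach photo and a forest photo. A learned embedding would capture semantic content rather than color alone.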
[0644] Step 6:
[0645] The server integrates story elements from the text analysis engine and recognition information from the image analysis engine. Based on this integrated information, the generative model automatically generates manga data.
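Step 6 merges the text-derived story elements with the image-derived recognition information before generation. A minimal sketch follows. It assumes that story elements arrive as a list of events naming characters, and that recognition information arrives as a per-image list of labels. Each event is paired with the image whose labels overlap most with the event's characters. The matching rule and all names are hypothetical.

```python
def integrate(events: list[dict], recognition: dict[str, list[str]]) -> list[dict]:
    """Attach to each story event the image whose recognized labels best match it.

    events: [{"text": ..., "characters": [...]}, ...]
    recognition: {image_id: [label, ...], ...}
    """
    scenes = []
    for event in events:
        wanted = set(event["characters"])
        best_image, best_overlap = None, 0
        for image_id, labels in recognition.items():
            overlap = len(wanted & set(labels))
            if overlap > best_overlap:
                best_image, best_overlap = image_id, overlap
        scenes.append({**event, "image": best_image})  # None if no image matched
    return scenes
```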
[0646] Step 7:
[0647] The generated comic data is laid out into multiple comic panels, visually representing the flow of the story. The server converts this into a digital data format (e.g., PDF or JPEG) and prepares it for transmission to the user's device.
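Step 7 lays the generated scenes out as multiple panels. One simple way to do this, shown here as an assumption rather than the system's actual method, is to fill each page with a grid of equal-sized panels and start a new page when the grid is full. The page size, gutter, and grid dimensions are illustrative values. The left-to-right order is also an assumption: Japanese manga is conventionally read right to left, so a layout for that audience would mirror the column order.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    scene: int   # index of the scene drawn in this panel
    page: int
    x: int
    y: int
    width: int
    height: int


def layout_panels(n_scenes: int, page_w: int = 800, page_h: int = 1200,
                  per_row: int = 2, rows_per_page: int = 3,
                  gutter: int = 10) -> list[Panel]:
    """Place scenes left to right, top to bottom in a grid, starting new pages as needed."""
    per_page = per_row * rows_per_page
    cell_w = (page_w - gutter * (per_row + 1)) // per_row
    cell_h = (page_h - gutter * (rows_per_page + 1)) // rows_per_page
    panels = []
    for i in range(n_scenes):
        page, slot = divmod(i, per_page)
        row, col = divmod(slot, per_row)
        panels.append(Panel(
            scene=i, page=page,
            x=gutter + col * (cell_w + gutter),
            y=gutter + row * (cell_h + gutter),
            width=cell_w, height=cell_h,
        ))
    return panels
```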
[0648] Step 8:
[0649] The user's device receives the manga data sent from the server and displays it on the interface. The user can review the manga, send correction requests back to the server as needed, and save the final version.
[0650] (Example 1)
[0651] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0652] Traditional manga generation systems require users to manually edit image data and story elements in detail, resulting in a cumbersome and time-consuming process. Furthermore, there are challenges in guaranteeing the quality of the automatically generated manga and whether it aligns with the user's intentions.
[0653] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0654] In this invention, the server includes means for acquiring digital visual information using an information recording device, means for acquiring descriptive data using an information recording device, and means for analyzing the digital visual information and extracting recognition information. This enables the automatic generation of high-quality, consistent comics simply by the user intuitively providing data.
[0655] An "information recording device" is an electronic device used to acquire digital visual information and descriptive data, and it is a device that plays a role in recording user-input data.
[0656] "Digital visual information" refers to data that stores visual representations such as images and videos in digital format, and is information that has been converted into a format that can be processed by a computer.
[0657] "Descriptive data" refers to text information written in natural language, and is digital information that includes the user's intent and story elements.
[0658] "Recognition information" refers to data extracted as a result of analyzing digital visual information and identifying specific objects or scenes.
[0659] "Abstract concepts" refer to story elements and main themes extracted from descriptive data through natural language processing, and are fundamental information for manga generation.
[0660] "Image information" refers to a series of manga data automatically generated using a generative AI model based on the aforementioned recognition information and abstract concepts, and is information that is visually represented.
[0661] This system generates image data that visually represents the user's experiences and intentions by linking the user's digital device with a central data processing unit. The system includes an information recording device as a digital device and a central server with powerful data processing capabilities.
[0662] Users access this system using information recording devices such as smartphones and personal computers. Users can input text data related to specific memories, such as trips or events, through the interface. They can also provide digital visual information by uploading photographs they have taken or existing visual data.
[0663] The central server analyzes the received descriptive data and digital visual information. First, the server processes the descriptive data using natural language processing software to extract key themes and story elements as "abstract concepts." Simultaneously, it uses image processing software to obtain "recognition information" from the digital visual information.
[0664] In the next stage of this process, the generative AI model is launched on the server. The generative AI model uses deep learning techniques to integrate the extracted abstract concepts and recognition information, automatically generating image data that reflects the user's intent. The generated manga data is then promptly delivered to the user's device.
[0665] As an example, consider a user who wants to save memories of a family trip as a comic strip. The user selects photos of the beach taken during the trip and inputs the story of the trip along with a prompt such as "Please turn the memories of our fun family trip into a comic strip." The system can then provide a professional-quality comic strip based on the memories.
[0666] Thus, this system greatly improves the user experience by providing users with an easy way to obtain high-quality visual representations.
[0667] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0668] Step 1:
[0669] Users access the system using devices such as smartphones and personal computers. Through the user interface, they input prompts and memorable episodes in text format, and also select and upload related image data. The input here consists of text data and image files, which are temporarily stored on the device.
[0670] Step 2:
[0671] The terminal sends the input text and image data to the server. The input here consists of text and image data provided by the user, and the output is the data passed to the server. Encryption protocols are used throughout this process for security.
[0672] Step 3:
[0673] The server processes the received data. First, it uses a natural language processing engine to analyze the text data and extract story elements. Specifically, it extracts keywords and analyzes the structure of the sentences to extract the main elements of the narrative. This becomes the output as an "abstract concept" based on the input text.
[0674] Step 4:
[0675] In parallel, the server scans the image data using an image analysis engine. It identifies objects and landscapes within the image and generates recognition information. Here, image recognition algorithms identify key features using computer vision technology. The output of this step is the recognition information extracted from the image data.
[0676] Step 5:
[0677] The server uses a generative AI model to integrate the previously obtained abstract concepts and recognition information and automatically generate manga data. This process uses deep learning models pre-trained on a dataset to automatically adjust drawing styles, panel layouts, and character placement. The output here is the completed manga data.
[0678] Step 6:
[0679] Finally, the server sends the generated manga data to the user's terminal. In this step, the manga data is converted to an appropriate format so that the user can review the results, and then transferred to the terminal via a communication protocol. The user can then review the manga on their terminal and request corrections if necessary.
[0680] (Application Example 1)
[0681] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0682] In recent years, advancements in information technology have made it commonplace to easily record personal information digitally. However, there is still a lack of convenient means to visualize everyday episodes and events as stories and save them in a way that can be shared with many people. Furthermore, creating and sharing high-quality graphic content without requiring specialized skills still requires considerable time and effort from users. Solving this problem is essential.
[0683] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0684] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, and means for automatically generating graphic data based on recognition elements and narrative elements using a data generation model. This makes it possible for users to easily generate everyday episodes as visual stories without requiring special skills and easily share them with others through an information distribution device.
[0685] "Image information" refers to visual data recorded in digital format, which forms the basis for system analysis.
[0686] "Natural language information" refers to data written in the language that humans use orally or in writing, and is used to extract story elements.
[0687] "Recognition elements" refer to data about specific features or objects that are extracted from image information through analysis.
[0688] "Narrative elements" refer to data about the constituent elements and themes of a story, extracted through analysis of natural language information.
[0689] A "data generation model" is an algorithm that uses machine learning techniques to generate graphic data based on the results of image and natural language analysis.
[0690] "Graphic data" refers to a visual representation format generated based on information entered by the user, and is intended to visually communicate its content.
[0691] An "information distribution device" is an electronic device used to share or distribute generated graphic data to other users.
[0692] This invention is a system for generating digital graphic content based on specific information and sharing it with others. A specific embodiment of this system will be described here.
[0693] First, the user inputs image information and natural language information into a device. This device is expected to be a smartphone or computer, and the user can easily record the collected image data and related episodes in natural language. This information is then sent to the server via an application installed on the device.
[0694] The server uses image analysis software (e.g., image_lib) to extract necessary recognition elements from images in order to handle image information. At the same time, it uses a natural language processing engine (e.g., nlp_lib) to process natural language information and extract narrative elements that form the basis of the story.
[0695] Next, a data generation model (e.g., comic_generation_model) is used to integrate recognition elements and narrative elements to generate graphic data. Based on these elements, the generative AI model automatically creates visually appealing comic and story-format content.
[0696] The generated graphic data is sent to the user's device. Users can then share the generated content with others via social media, email, or content distribution devices. This makes it possible to create and widely distribute professional-quality visual stories even without advanced design skills.
[0697] A concrete example is a scenario where a user wants to generate graphic data based on memories of a holiday trip. The user imports photos taken during the trip into their device and inputs natural language information such as, "It was a wonderful trip. The sea was beautiful, and I had a great time with my family." The server generates a visually rich comic from this information and assists the user in sharing it with friends.
[0698] Examples of input prompts for a generative AI model:
[0699] "Based on the text and image information provided by the user, please generate a vivid and emotionally resonant comic. The style should be bright and colorful, conveying the joy of family."
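A prompt like the one above could be assembled from the analysis results rather than written by hand. The sketch below fills a fixed instruction template. The function name `build_prompt` and the fields `themes`, `objects`, and `style` are assumptions for illustration; the source gives only the finished prompt text.

```python
def build_prompt(themes: list[str], objects: list[str], style: str) -> str:
    """Fill a fixed instruction template with narrative and recognition elements."""
    lines = [
        "Based on the text and image information provided by the user, "
        "please generate a vivid and emotionally resonant comic.",
        f"The style should be {style}.",
    ]
    if themes:
        lines.append("Story themes: " + ", ".join(themes) + ".")
    if objects:
        lines.append("Include these elements from the photos: " + ", ".join(objects) + ".")
    return "\n".join(lines)
```

For example, `build_prompt(["family", "beach trip"], ["sea", "sand"], "bright and colorful, conveying the joy of family")` reproduces the two sentences of the example prompt and appends the extracted themes and objects.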
[0700] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0701] Step 1:
[0702] The user inputs image information and natural language information into the device. The user selects photos taken during trips or events through a dedicated application on the device and describes these images and related episodes in natural language. This information is collected by the application and prepared for transmission to the system. The input data consists of selected image files and text information written by the user.
[0703] Step 2:
[0704] Data is sent from the terminal to the server. The application sends collected image and natural language information to the server via the internet. The input data consists of image files (e.g., JPEG format) and text data. The server receives these and prepares them for the next analysis step.
[0705] Step 3:
[0706] The server performs image analysis. Using image analysis software (e.g., image_lib), the server extracts recognition elements from the received image information. Specifically, it performs object recognition, scene analysis, and color analysis within the image, generating information for linking with text information. The output includes feature quantities for each part of the image and a list of recognized objects.
[0707] Step 4:
[0708] The server performs text analysis. Using a natural language processing engine (e.g., nlp_lib), the server extracts narrative elements from the received natural language information. Specifically, it performs semantic analysis, keyword extraction, and contextual understanding of the text. The output is the analysis result, including the story's themes and key points.
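Step 4 mentions keyword extraction. As one simple possibility, not necessarily what `nlp_lib` does (that name is itself a placeholder in the source), keywords can be ranked by frequency after removing common function words. The stopword list is a small illustrative assumption.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "i", "it", "we", "our",
             "with", "my", "of", "to", "in", "had", "is"}


def top_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stopword words; ties keep first-seen order."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]
```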
[0709] Step 5:
[0710] The server generates graphic data. Using a data generation model (e.g., comic_generation_model), it generates visual graphic data based on the analyzed recognition and narrative elements. Specifically, it performs image style conversion, layout determination, and automatic illustration generation. The output is graphic data in a completed story format (e.g., a comic-style image file).
[0711] Step 6:
[0712] The server sends the generated graphic data back to the user's terminal for the user to review. The input is the generated graphic data, and the output is the transfer of this data to the user's terminal.
[0713] Step 7:
[0714] The user reviews the generated graphic data and requests corrections as needed. The user can view the generated graphic content on their terminal and, if necessary, send correction requests to the server via the application. The input is the user's correction feedback, and the output is that feedback as received by the server for reprocessing.
[0715] Step 8:
[0716] Users share the final version of the graphic data. Users can share the reviewed graphic data with others through social media or content distribution devices. The input consists of the final approved graphic data and the designated recipients for sharing, and the output is an electronic distribution process based on this.
[0717] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using those emotions.
[0718] This invention is a system that acquires user image data and natural language data, generates story-based comic data from them, and further combines this with emotion recognition functionality to realize comic expressions that reflect the user's emotions. This system consists of the user's terminal and a central server.
[0719] Users access the system via a terminal, inputting episodes related to memories and uploading associated image data. In addition, the user's natural language data may contain emotional information. At this stage, an emotion engine is activated to recognize the user's emotions. This engine analyzes the tone and emotion of the user's statements from the text data. In some cases, voice data is also used to improve the accuracy of emotion recognition.
[0720] The server analyzes the received text data through a text analysis engine to extract the characters, events, and emotions in the story. Similarly, image data is analyzed through an image analysis engine to obtain recognition information. These data are correlated, and a generative model is used to generate manga data.
[0721] The generative model uses story elements and cognitive information to determine appropriate character expressions and tones that reflect the user's emotions. The generated comic data is automatically adjusted to create a story that aligns with the user's intentions and emotions as a whole.
[0722] The generated comic data is sent to the user's device, where they can review its contents on the interface. Users can also request revisions to the comic's story and expression, and ultimately, the comic is saved or shared by the user. For example, if a user creates a comic about a fun family trip, the emotions of joy and surprise inferred from the audio and text will be reflected in the characters' lively expressions and the tone of the story. This allows users to easily create emotionally rich, professional-quality comics.
[0723] The following describes the processing flow.
[0724] Step 1:
[0725] Users access the system via their device, input episodes related to their memories in text format, and select and upload relevant image data. Audio data can also be recorded and provided as needed.
[0726] Step 2:
[0727] The terminal converts the input text data, image data, and audio data into a predetermined format and prepares them for transmission to the server.
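The conversion in Step 2 can be sketched as follows. The disclosure does not name the predetermined format; this sketch assumes, purely for illustration, a JSON document with base64-encoded binary fields. The functions `build_payload` and `parse_payload` and the field names are hypothetical.

```python
import base64
import json
from typing import Optional


def build_payload(text: str, image_bytes: bytes, audio_bytes: Optional[bytes] = None) -> str:
    """Terminal side (hypothetical format): package text, image, and optional audio as JSON."""
    payload = {
        "text": text,
        # Binary data is base64-encoded so that it can be carried inside JSON.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    if audio_bytes is not None:
        payload["audio"] = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps(payload, ensure_ascii=False)


def parse_payload(raw: str) -> dict:
    """Server side: decode the JSON string back into text and raw bytes."""
    data = json.loads(raw)
    result = {"text": data["text"], "image": base64.b64decode(data["image"])}
    if "audio" in data:
        result["audio"] = base64.b64decode(data["audio"])
    return result
```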
[0728] Step 3:
[0729] The server receives data sent from the terminal and first passes the text data to a natural language processing engine for analysis. This analysis extracts story elements, characters, events, and emotional information.
[0730] Step 4:
[0731] The emotion engine processes text and audio data to analyze the user's emotions (e.g., joy, sadness, surprise) and adds this emotional information to story elements.
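One way to picture Step 4 is a weighted fusion of per-emotion scores from text and audio, with the result attached to the story elements. This is a minimal sketch under stated assumptions: the score dictionaries stand in for the emotion engine's outputs, and the 0.6/0.4 weighting and the helper names are hypothetical, not taken from the disclosure.

```python
from typing import Dict, Optional


def fuse_emotions(text_scores: Dict[str, float],
                  audio_scores: Optional[Dict[str, float]] = None,
                  text_weight: float = 0.6) -> Dict[str, float]:
    """Weighted average of per-emotion scores from text and, if available, audio."""
    if audio_scores is None:
        return dict(text_scores)
    labels = set(text_scores) | set(audio_scores)
    return {
        label: text_weight * text_scores.get(label, 0.0)
        + (1.0 - text_weight) * audio_scores.get(label, 0.0)
        for label in labels
    }


def attach_emotion(story_elements: dict, fused: Dict[str, float]) -> dict:
    """Return a copy of the story elements with the scores and the dominant emotion added."""
    dominant = max(fused, key=fused.get)
    return {**story_elements, "emotions": fused, "dominant_emotion": dominant}
```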
[0732] Step 5:
[0733] The server uses an image analysis engine to scan image data and identify people, backgrounds, and objects within the image. This information is acquired as recognition data.
[0734] Step 6:
[0735] The server integrates story elements extracted from text, emotional information, and recognition information from images, and inputs them into a generative model. This model automatically generates the comic's storyline and visual elements.
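The integration in Step 6 could, for example, be expressed as a text prompt assembled from the three kinds of information. The disclosure does not specify how the inputs are combined; the template wording, the dictionary keys, and `build_prompt` below are hypothetical, and the call to the generative model itself is omitted.

```python
from typing import Dict, List


def build_prompt(story: Dict[str, List[str]],
                 emotions: Dict[str, float],
                 recognition: Dict[str, List[str]]) -> str:
    """Combine story elements, emotion scores, and image recognition results into one prompt."""
    dominant = max(emotions, key=emotions.get)
    lines = [
        "Create a comic based on the following memory.",
        f"Characters: {', '.join(story['characters'])}",
        f"Events: {'; '.join(story['events'])}",
        f"Setting (from photos): {', '.join(recognition['backgrounds'])}",
        f"People and objects in photos: {', '.join(recognition['objects'])}",
        # The emotion steers character expressions and story tone (cf. Step 7).
        f"Dominant emotion: {dominant}. Reflect it in the characters' expressions and the story's tone.",
    ]
    return "\n".join(lines)
```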
[0736] Step 7:
[0737] The generative model meticulously adjusts character expressions and story tone to reflect user emotions, creating completed manga data.
[0738] Step 8:
[0739] The server sends the generated manga data to the user's device. The user can view the received manga on the interface and save or share it in their preferred format. They can also request further adjustments from the system as needed.
[0740] (Example 2)
[0741] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0742] Modern information processing devices lack the means to easily generate content that expresses users' memories and experiences in a unique and emotionally rich way. Furthermore, existing technologies suffer from low accuracy in emotion recognition, leading to frequent discrepancies between the generated content and the user's intentions. Additionally, there is the challenge of efficiently incorporating user feedback on the generated content.
[0743] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0744] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, means for analyzing the image information to extract feature information, means for analyzing the natural language information to extract story elements and emotional information, and means for automatically generating content data based on the feature information and story elements using a generative model that takes the emotional information into consideration. This enables the generation of unique content that reflects the user's emotions and intentions, as well as flexible modification of that content based on user feedback.
[0745] "Image information" refers to data perceived through vision, and includes digital images such as photographs, illustrations, and diagrams.
[0746] "Natural language information" refers to information that includes the words and sentences that humans normally use, and includes audio data and text data.
[0747] "Feature information" refers to specific attributes or patterns extracted from image information, including shape, color, and various structural features.
[0748] "Story elements" are factors and components that form the framework of a narrative, obtained by analyzing natural language information, and include characters, events, and settings.
[0749] "Emotional information" refers to information about the emotions or psychological state that a user is trying to express, and is identified through the analysis of natural language information or voice data.
[0750] A "generative model" is an algorithm or system that learns from large amounts of data to generate new data, and is particularly used for the automatic generation of content.
[0751] "Content data" refers to digitized stories and works generated based on user input, and includes visual or textual expressions.
[0752] "Output device" refers to a device or system for displaying generated content data so that a user can view it, and includes computer screens and mobile device displays.
[0753] An "edit request" is an instruction from a user to correct or change generated content data, including revisions to the story, facial expressions, and tone.
[0754] This invention is an information processing system for generating emotionally rich comics based on users' memories and experiences. The system operates primarily through the cooperation of a server and a user terminal.
[0755] Users first access the system from their device and upload text and audio as natural language information, together with image information. The input includes stories about memories and experiences as well as photos taken at the time. The information entered on the device is sent to the server immediately.
[0756] The server uses a natural language processing engine to analyze the input natural language information. It also incorporates emotion recognition capabilities, identifying the user's emotions from the input text and audio. This emotion information is used to adjust character portrayals and narrative tone in the generated content. Furthermore, the server utilizes an image analysis engine to extract character expressions and situations as feature information from image data.
[0757] Next, the server uses a generative AI model to automatically generate professional-quality comic data based on the feature information and story elements. The generative AI model is trained on a large amount of sample data, which enables it to generate content that matches the user's emotions and intentions. The generated comic data tells a story that reflects the user's intentions and includes lively character portrayals. For example, to turn a fun family trip into a comic, a prompt such as "Create a comic that reflects the fun of our family trip." can be used. The comic generated from this prompt will frequently feature smiling characters and bright colors.
[0758] The server then sends the generated comic data to the user's device. The user can review the generated content on the device and, if necessary, request revisions to the expression or storyline. This feedback is sent back to the server, and the content data is re-edited. Finally, the user can save or share the edited comic data. This makes it easy to generate and revise emotionally rich, professional-quality comics.
[0759] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0760] Step 1:
[0761] The user accesses the system using a terminal. The user inputs natural language information (text and audio data) and image information. This input is then prepared for transmission to the server. Specifically, the user selects a file on the terminal and presses the send button. The input data is then sent to the server via a communication protocol.
[0762] Step 2:
[0763] The server analyzes the received natural language information. Here, a natural language processing engine runs, analyzing sentence structure and vocabulary from the text data. An emotion recognition module is also used to identify the user's emotions from their mood and tone. The output consists of story elements and emotional information. Specifically, it calculates the emotional value of each word and phrase, and then combines these values to form the basic elements of the story.
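The per-word scoring described in Step 2 can be illustrated with a simple lexicon-based approach. This is a sketch, not the disclosed emotion recognition module: the tiny `LEXICON`, its values, and the normalization of totals to proportions are assumptions chosen for readability.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical toy lexicon: word -> (emotion label, emotional value).
LEXICON = {
    "fun": ("joy", 1.0),
    "happy": ("joy", 1.0),
    "laughed": ("joy", 0.8),
    "surprised": ("surprise", 1.0),
    "sad": ("sadness", 1.0),
    "missed": ("sadness", 0.6),
}


def score_text(text: str) -> Dict[str, float]:
    """Sum the emotional value of each known word, then normalize the totals to proportions."""
    totals: Counter = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in LEXICON:
            label, value = LEXICON[word]
            totals[label] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no emotion words found
    return {label: value / grand_total for label, value in totals.items()}
```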
[0764] Step 3:
[0765] The server analyzes the received image information. The image analysis engine runs and extracts key features from the image data. This includes face recognition and background information processing. The output is character expressions and key visual components. Specifically, it quantifies things like the smiles of people in the image and the brightness of the scenery, generating data to be used in the story.
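As an illustration of quantifying the brightness of the scenery, the sketch below computes a mean luma-style value from RGB pixels using the Rec. 709 weights (0.2126, 0.7152, 0.0722). Applying these weights directly to 8-bit, gamma-encoded values is a simplification, the choice of metric is an assumption, and smile quantification (which would need a face detector) is not shown.

```python
from typing import Iterable, Tuple


def mean_luminance(pixels: Iterable[Tuple[int, int, int]]) -> float:
    """Average brightness in [0, 1] from 8-bit RGB pixels (Rec. 709 weights, no gamma linearization)."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count
```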
[0766] Step 4:
[0767] The server automatically generates manga data using a generative AI model. Here, story elements and feature information are taken as input, and content based on the user's emotional information is generated. The output is manga data of professional quality. Specifically, the generative model reflects the emotional information and adjusts the characters' expressions and color tones. For example, it might use bright colors to express joy.
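The color-tone adjustment in Step 4 might be sketched as an HSV shift whose direction depends on the emotion, for example raising brightness and saturation for joy. The shift table `TONE_SHIFTS` and the function names are hypothetical values for illustration only; the disclosure leaves the actual adjustment to the generative model.

```python
import colorsys
from typing import Tuple

# Hypothetical emotion -> (saturation shift, brightness shift) table.
TONE_SHIFTS = {
    "joy": (0.2, 0.2),
    "surprise": (0.1, 0.1),
    "sadness": (-0.3, -0.2),
}


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def adjust_color(rgb: Tuple[int, int, int], emotion: str, intensity: float = 1.0) -> Tuple[int, ...]:
    """Shift an 8-bit RGB color in HSV space according to the emotion; unknown emotions leave it unchanged."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    ds, dv = TONE_SHIFTS.get(emotion, (0.0, 0.0))
    s = _clamp(s + ds * intensity)
    v = _clamp(v + dv * intensity)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))
```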
[0768] Step 5:
[0769] The server sends the completed manga data to the user's device. The user then reviews the data on their device. The generated manga appears on the display screen as output. Specifically, the user can view the manga through the device's interface and request revisions as needed. If further editing is required, the data is sent back to the server.
[0770] Step 6:
[0771] The user performs a final check and saves or shares the manga data. The device provides an interface to assist these user operations. As output, the final content is saved in a file format or sent to the selected sharing platform. Specifically, the user presses the "Save" button to store the data in a selected folder or uploads it to social media, etc.
[0772] (Application Example 2)
[0773] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0774] Traditionally, reflecting on family memories and everyday events has been limited to formats such as photographs, videos, and text, making it difficult to deeply express the nuances of emotions and events. Families with young children, in particular, need to record and share memories in a way that reflects their children's expressions and emotions. Furthermore, there is a lack of means to strengthen communication among family members and share memories in a more enjoyable way.
[0775] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
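[0776] In this invention, the server includes means for acquiring image information, means for acquiring natural language information, means for analyzing the image information to extract recognition data, means for analyzing the natural language information to extract narrative elements, means for analyzing emotion data to identify emotional elements, and means for automatically generating content data based on the recognition data, narrative elements, and emotional elements using a generation algorithm. This makes it possible to analyze everyday events together with their emotional elements and to generate and share family memories as expressive content.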
[0777] "Means for acquiring image information" refers to a device or function for taking in image data from an external source and converting it into a format that can be processed within the system.
[0778] "Means for acquiring natural language information" refers to a device or function that collects text data written in human language and incorporates it into the system in a form the system can interpret.
[0779] "Means for extracting recognition data" refers to a device or function that analyzes image information, grasps the characteristics of objects, people, etc., and acquires them as data.
[0780] "Means for extracting narrative elements" refers to a device or function that analyzes natural language information and identifies elements that constitute a narrative, such as context and character settings.
[0781] "Means for analyzing emotional data and identifying emotional elements" refers to a device or function that analyzes text or audio data, determines the speaker's emotions, and extracts information based on that determination.
[0782] "Means for automatically generating content data using a generation algorithm" refers to a device or function that automatically creates new content using an algorithm based on existing data.
[0783] "Means for displaying on a display device" refers to a device or function that provides generated content data to the user visually.
[0784] "Means of transmitting to an information terminal" refers to a device or function that sends generated data to a terminal via a network and makes it receivable.
[0785] "Means for receiving change requests and editing content data" refers to a device or function that reflects user requests for modification and changes the generated data.
[0786] This system consists of home information terminals and a central server. First, users access the system through their home information terminals and input image data related to their memories and descriptions in natural language. The terminals are equipped with high-performance cameras and microphones, enabling them to acquire image and audio information with high accuracy.
[0787] The device sends the acquired image information to the server, where an image analysis engine operates to extract recognition data from the image. In addition, the device sends the user's speech content to the server as text data, where a natural language processing engine operates to analyze narrative elements and emotional data. For emotional analysis, an advanced AI engine is used to precisely identify the user's emotions. In this case, software such as Amazon Web Services' Rekognition may be used.
[0788] Once recognition data, narrative elements, and emotional elements are extracted, the server invokes a generation algorithm to generate comic data with a specific story based on these elements. This generation process utilizes open-source AI frameworks such as TensorFlow or PyTorch. The generated comic data is automatically adjusted to align with the story's flow and the user's emotions.
[0789] The completed content data is sent back to the user's home information terminal and displayed on the terminal's screen. The user can review the displayed content and, if necessary, send a correction request to the server. If a correction request is received, the server runs the algorithm again and automatically modifies the content data.
[0790] As a concrete example, to create a comic strip about a child's sports day, a prompt such as "Please generate a heroic and moving comic strip based on the moment when the user felt proud of their child's performance at the sports day" is input into the AI model, and appropriate content is generated. In this way, family memories are recorded with rich emotion for the whole family to enjoy.
[0791] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0792] Step 1:
[0793] Users access the system through a home information terminal and input image data and text in natural language related to specific memories. This input image data and text are temporarily stored on the terminal. The terminal uses a camera and microphone to acquire high-resolution data.
[0794] Step 2:
[0795] The terminal sends the acquired image data to the server. During this process, the image data is compressed and securely transferred over the network. Once the image data arrives at the server, the server's image analysis engine begins operation.
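The compression and transfer in Step 2 could be sketched with the standard `zlib` and `hashlib` modules: compress the image and attach a SHA-256 digest so the server can detect corrupted data. This covers compression and integrity only; confidentiality is assumed to come from an encrypted transport such as TLS, which is not shown. Already-compressed formats such as JPEG will shrink little under zlib.

```python
import hashlib
import zlib
from typing import Tuple


def pack_image(image_bytes: bytes) -> Tuple[bytes, str]:
    """Terminal side: compress the image and compute a digest of the original bytes."""
    compressed = zlib.compress(image_bytes, level=9)
    digest = hashlib.sha256(image_bytes).hexdigest()
    return compressed, digest


def unpack_image(compressed: bytes, digest: str) -> bytes:
    """Server side: decompress and verify the digest before handing data to the image analysis engine."""
    data = zlib.decompress(compressed)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("image data corrupted in transit")
    return data
```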
[0796] Step 3:
[0797] The server uses an image analysis engine to analyze the image data and extract recognition data. Specifically, it recognizes people and objects in the image and organizes that information as metadata. The analysis outputs results annotated with specific tags.
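Organizing recognition results as tagged metadata (Step 3) might look like the following. The detection format (label, confidence, bounding box), the 0.5 threshold, and `to_metadata` are assumptions for illustration; the disclosure does not specify the image analysis engine's output.

```python
from typing import Dict, List


def to_metadata(detections: List[Dict], threshold: float = 0.5) -> Dict:
    """Keep confident detections and summarize them as tags plus per-object records."""
    kept = [d for d in detections if d["confidence"] >= threshold]
    return {
        "tags": sorted({d["label"] for d in kept}),
        "person_count": sum(1 for d in kept if d["label"] == "person"),
        "objects": [{"label": d["label"], "box": d["box"]} for d in kept],
    }
```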
[0798] Step 4:
[0799] The device sends the natural language text entered by the user to the server. The natural language processing engine on the server receives the text and analyzes it.
[0800] Step 5:
[0801] The server uses a natural language processing engine to extract narrative and emotional elements from the input text. This process analyzes emotion-related words and their context within the text to understand the user's emotions. The extracted data is output as narrative elements and emotional elements.
[0802] Step 6:
[0803] The server invokes a generation algorithm that uses the recognition data obtained from image analysis and the narrative and emotional elements obtained from natural language processing to generate manga data with a specific storyline. A generative AI model carries out this operation, outputting carefully designed text and prompts.
[0804] Step 7:
[0805] The generated manga data is sent from the server to the user's home information terminal. It is displayed on the terminal's screen and becomes available for the user to view.
[0806] Step 8:
[0807] The user reviews the generated content and, if necessary, sends a correction request from their device to the server. Based on this request, the server regenerates the content data to reflect the changes. The corrected content is then sent back to the device and displayed.
[0808] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0809] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0810] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0811] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0812] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the concentric circles, the more primitive it is. Farther from the center are emotions representing states and actions arising from mental states. Here, emotion is a concept that includes both feelings and mental states. On the left side of the concentric circles are emotions that are generally generated by reactions occurring in the brain. On the right side are emotions that are generally induced by situational judgment. Above and below the concentric circles are emotions that are generally both generated by reactions occurring in the brain and induced by situational judgment. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure by which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
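The geometry described for the emotion map 400 can be sketched by giving each emotion a ring (distance from the center) and a clock position (12 o'clock for pleasure, 6 o'clock for displeasure, left for reaction, right for situation) and measuring Euclidean distance, so that emotions placed close together are treated as likely to co-occur. The placements in `EMOTION_MAP` are hypothetical and are not read from Figure 9.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MapPosition:
    ring: int     # 0 = center (most primitive); larger = farther out (closer to action)
    clock: float  # 12 = top (pleasure), 3 = right (situation), 6 = bottom (displeasure), 9 = left (reaction)

    def xy(self) -> Tuple[float, float]:
        # Convert the clock position to an angle: 12 o'clock -> +y axis, 3 o'clock -> +x axis.
        angle = math.radians(90.0 - self.clock * 30.0)
        return self.ring * math.cos(angle), self.ring * math.sin(angle)


def distance(a: MapPosition, b: MapPosition) -> float:
    (x1, y1), (x2, y2) = a.xy(), b.xy()
    return math.hypot(x1 - x2, y1 - y2)


# Hypothetical placements for illustration only (not taken from Figure 9).
EMOTION_MAP = {
    "joy": MapPosition(1, 12),
    "reassured": MapPosition(2, 3),
    "calm": MapPosition(2, 3.5),
    "anxiety": MapPosition(2, 4.5),
    "anger": MapPosition(1, 8),
}


def nearest(emotion: str, k: int = 2) -> List[str]:
    """Return the k emotions mapped closest to the given one."""
    origin = EMOTION_MAP[emotion]
    others = [name for name in EMOTION_MAP if name != emotion]
    return sorted(others, key=lambda name: distance(origin, EMOTION_MAP[name]))[:k]
```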
[0813] These emotions are distributed at the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0814] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the farther toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).
[0815] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
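The balance-based account of pleasure and displeasure can be illustrated with a simple valence function for a robot: valence is highest when a reading such as battery level equals its ideal and falls as the reading deviates. The linear form, the tolerance parameter, and averaging across readings are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def balance_valence(value: float, ideal: float, tolerance: float) -> float:
    """+1 at the ideal, 0 at one tolerance away, clamped to -1 at two or more tolerances away."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = abs(value - ideal) / tolerance
    return max(-1.0, 1.0 - deviation)


def overall_valence(readings: Iterable[Tuple[float, float, float]]) -> float:
    """Average valence over several (value, ideal, tolerance) readings, e.g. posture and battery."""
    values = [balance_valence(v, ideal, tol) for v, ideal, tol in readings]
    if not values:
        raise ValueError("at least one reading is required")
    return sum(values) / len(values)
```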
[0816] The emotion map defines two emotions that promote learning. One is the emotion around the midpoint between the negative emotions "repentance" and "reflection" on the situation side, that is, when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive emotion "desire" on the reaction side, that is, when the robot has positive feelings such as "I want more" or "I want to know more."
[0817] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
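The final step of the emotion identification model 59, turning per-emotion values into a determined emotion, could be sketched as a softmax followed by ranking. The neural network itself is not shown; the raw scores stand in for its output, and the softmax normalization is an assumption rather than something stated in the disclosure. The example scores are made up so that "reassured," "calm," and "confident" are close, mirroring the situation described for Figure 10.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw scores into values summing to 1 (the max is subtracted for numerical stability)."""
    peak = max(scores.values())
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def determine_emotion(scores: Dict[str, float], top_k: int = 3) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the dominant emotion and the top_k (emotion, value) pairs."""
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked[:top_k]
```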
[0818] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0819] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0820] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12, and the processor 28 executes the specific processing according to the specific processing program 56.
[0821] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0822] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.
[0823] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0824] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0825] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0826] Furthermore, as the hardware structure of these various processors, more specifically, an electrical circuit combining circuit elements such as semiconductor devices can be used. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps added, or the processing order rearranged without departing from the gist.
[0827] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0828] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0829] The following is further disclosed regarding the embodiments described above.
[0830] (Claim 1)
[0831] Means for acquiring image data,
[0832] Means for acquiring natural language data,
[0833] A means for analyzing the aforementioned image data and extracting recognition information,
[0834] A means for analyzing the aforementioned natural language data and extracting story elements,
[0835] A means for automatically generating manga data based on the recognition information and story elements using a generative model,
[0836] A system comprising the above means.
[0837] (Claim 2)
[0838] The system according to claim 1, further comprising means for transmitting the generated manga data to a user terminal.
[0839] (Claim 3)
[0840] The system according to claim 1, further comprising means for receiving modification requests from the user terminal and editing the manga data.
[0841] "Example 1"
[0842] (Claim 1)
[0843] A means for acquiring digital visual information using an information recording device,
[0844] A means for acquiring descriptive data using an information recording device,
[0845] A means for analyzing the aforementioned digital visual information and extracting recognition information,
[0846] A means for analyzing the aforementioned descriptive data and extracting abstract concepts,
[0847] A means for automatically generating image information based on the recognition information and the abstract concept using machine learning,
[0848] A system comprising the above means.
[0849] (Claim 2)
[0850] The system according to claim 1, further comprising means for transmitting the generated image information to a user's information processing device.
[0851] (Claim 3)
[0852] The system according to claim 1, further comprising means for receiving a modification request from the user's information processing device and editing the image information.
[0853] "Application Example 1"
[0854] (Claim 1)
[0855] Means for acquiring image information,
[0856] Means for obtaining natural language information,
[0857] Means for analyzing the aforementioned image information and extracting recognition elements,
[0858] A means for analyzing the aforementioned natural language information and extracting narrative elements,
[0859] A means for automatically generating graphic data based on the recognition elements and narrative elements using a data generation model,
[0860] A means for transmitting the generated graphic data to an information distribution device via a communication network and for users to share it with others,
[0861] A system comprising the above means.
[0862] (Claim 2)
[0863] The system according to claim 1, further comprising means for transmitting the generated graphic data to a user's terminal.
[0864] (Claim 3)
[0865] The system according to claim 1, further comprising means for receiving a modification request from the user terminal and modifying the graphic data.
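Application Example 1 adds a sharing step in which the graphic data is sent to an information distribution device so that other users can view it. The sketch below models that device in memory. It assumes a hypothetical `DistributionServer` with `upload`, `share`, and `fetch` operations and a simple owner-grants-access rule. The patent does not describe the distribution device's interface or its access policy. The share ID is a SHA-256 digest truncated to 12 hex characters for readability, so a real service would want a longer or collision-checked identifier.

```python
import hashlib
from typing import Dict, Optional, Set


class DistributionServer:
    """Hypothetical in-memory stand-in for the information distribution device."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self._owner: Dict[str, str] = {}
        self._allowed: Dict[str, Set[str]] = {}

    def upload(self, owner: str, graphic_data: bytes) -> str:
        # Derive a share ID from the owner and content, so re-uploading the
        # same data by the same owner returns the same ID.
        digest = hashlib.sha256(owner.encode("utf-8") + b"\0" + graphic_data)
        share_id = digest.hexdigest()[:12]
        self._store[share_id] = graphic_data
        self._owner[share_id] = owner
        self._allowed.setdefault(share_id, set()).add(owner)
        return share_id

    def share(self, share_id: str, owner: str, recipient: str) -> None:
        # Only the uploader may grant access to other users.
        if self._owner.get(share_id) != owner:
            raise PermissionError("only the owner can share this item")
        self._allowed[share_id].add(recipient)

    def fetch(self, share_id: str, user: str) -> Optional[bytes]:
        # Return the data only to users it has been shared with.
        if user not in self._allowed.get(share_id, set()):
            return None
        return self._store[share_id]


server = DistributionServer()
sid = server.upload("alice", b"<manga page bytes>")
server.share(sid, "alice", "bob")
print(server.fetch(sid, "bob"))    # b'<manga page bytes>'
print(server.fetch(sid, "carol"))  # None
```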
[0866] "Example 2 (combined with an emotion engine)"
[0867] (Claim 1)
[0868] Means for acquiring image information,
[0869] Means for obtaining natural language information,
[0870] A means for analyzing the aforementioned image information and extracting feature information,
[0871] A means for analyzing the aforementioned natural language information to extract story elements and emotional information,
[0872] A means for automatically generating content data based on the feature information and story elements, using a generative model that takes the aforementioned emotional information into consideration,
[0873] An information processing apparatus comprising the above means.
[0874] (Claim 2)
[0875] The information processing apparatus according to claim 1, further comprising means for transmitting the generated content data to an output device.
[0876] (Claim 3)
[0877] The information processing apparatus according to claim 1, further comprising means for receiving an editing request from the output device and modifying the content data.
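In this example, the natural language information yields both story elements and emotional information, and the generative model takes the emotion into account. The sketch below shows one simple way to wire that together. A keyword lexicon (`EMOTION_LEXICON`) stands in for the emotion engine, each sentence is treated as one scene, and the dominant emotion selects a drawing style from a hypothetical `STYLE_BY_EMOTION` table. The resulting `GenerationRequest` objects would then be passed to a generative model. All of these names and mappings are assumptions. The patent does not say how emotion conditions the generation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lexicon standing in for the emotion engine.
EMOTION_LEXICON: Dict[str, str] = {
    "happy": "joy", "laughed": "joy", "fun": "joy",
    "sad": "sadness", "cried": "sadness", "miss": "sadness",
    "scared": "fear", "nervous": "fear",
}

# Hypothetical mapping from an emotion to manga drawing conventions.
STYLE_BY_EMOTION: Dict[str, str] = {
    "joy": "bright screentone, sparkle effects",
    "sadness": "muted tones, rain lines",
    "fear": "heavy shadows, jagged speed lines",
    "neutral": "standard screentone",
}


def extract_emotion(text: str) -> Tuple[str, Dict[str, int]]:
    """Count lexicon hits and return (dominant emotion, counts per emotion)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return "neutral", {}
    dominant, _ = counts.most_common(1)[0]
    return dominant, dict(counts)


def extract_story_elements(text: str) -> List[str]:
    # Naive stand-in for story analysis: one sentence becomes one scene.
    normalized = text.replace("!", ".").replace("?", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]


@dataclass
class GenerationRequest:
    # Input for a hypothetical generative model, one per scene.
    scene: str
    features: List[str]
    style: str


def build_requests(features: List[str], text: str) -> List[GenerationRequest]:
    # The emotion picked from the whole text sets the style of every scene.
    emotion, _ = extract_emotion(text)
    style = STYLE_BY_EMOTION[emotion]
    return [GenerationRequest(scene, features, style) for scene in extract_story_elements(text)]


text = "We laughed all day at the beach. It was so much fun!"
for req in build_requests(["beach", "two people"], text):
    print(req.scene, "|", req.style)
```

A single emotion per text keeps the example short. A finer-grained variant could score each scene separately so that the visual tone changes across panels.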
[0878] "Application Example 2 (combined with an emotion engine)"
[0879] (Claim 1)
[0880] Means for acquiring image information,
[0881] Means for obtaining natural language information,
[0882] A means for analyzing the aforementioned image information and extracting recognition data,
[0883] A means for analyzing the aforementioned natural language information and extracting narrative elements,
[0884] A means for analyzing emotional data and identifying emotional elements,
[0885] A means for automatically generating content data based on the recognition data, the narrative elements, and the emotional elements using a generation algorithm,
[0886] Means for displaying the generated content data on a display device,
[0887] A system comprising the above means.
[0888] (Claim 2)
[0889] The system according to claim 1, further comprising means for transmitting the generated content data to an information terminal.
[0890] (Claim 3)
[0891] The system according to claim 1, further comprising means for receiving change requests from the information terminal and editing the content data.
Explanation of Symbols
[0892]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. Means for acquiring image information,
Means for obtaining natural language information,
Means for analyzing the aforementioned image information and extracting recognition elements,
Means for analyzing the aforementioned natural language information and extracting narrative elements,
Means for automatically generating graphic data based on the recognition elements and narrative elements using a data generation model,
Means for transmitting the generated graphic data to an information distribution device via a communication network so that the user can share it with others,
A system comprising the above means.
2. The system according to claim 1, further comprising means for transmitting the generated graphic data to the user's terminal.
3. The system according to claim 1, further comprising means for receiving a modification request from the user terminal and modifying the graphic data.