system

A system using natural language processing and computer vision to generate comics from user input data automatically addresses the challenge of creating memory comics, allowing users to easily and accurately visualize their experiences.

JP2026100688A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies require specialized skills and time to accurately reflect user-intended scenes and emotions in comics, making it difficult for ordinary people to easily save their memories in comic form.

Method used

A system that allows users to input text and image data, which is analyzed using natural language processing and computer vision to generate a storyboard, and then uses AI to automatically create a comic that faithfully reproduces their memories.

Benefits of technology

Enables users to easily and accurately visualize their memories in comic format without technical expertise, ensuring the comic reflects the intended scenes and emotions.

✦ Generated by Eureka AI based on patent content.

Figure 2026100688000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system including: means for receiving text data and image data provided by a user; means for analyzing the text data and extracting important information; means for analyzing the image data and identifying visual elements; means for generating a storyboard based on the important information and the visual elements; and means for automatically generating a comic based on the storyboard.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern times, there is an increasing demand to save memories effectively and visually, but ordinary people need specialized skills and time to turn their own memories into comics. Therefore, there is a need for a method that allows anyone to easily save their memories as comics. However, it is difficult for existing technologies to accurately reflect the scenes and emotions intended by the user in the comics. It is an object of the present invention to solve this technical problem and provide a memory comic generation system that can be easily used by anyone.

Means for Solving the Problems

[0005] This invention provides a system that allows anyone to easily and automatically generate comics from text data and image data provided by the user. The system includes means for analyzing the received text data and extracting important information, and means for analyzing the image data and identifying visual elements. It further includes means for generating a storyboard based on these analysis results and means for automatically generating a comic based on that storyboard. This makes it possible to faithfully reproduce the user's memories and easily save them in comic format.

[0006] "User-provided text data" refers to text information in which users describe memories and events.

[0007] "Means for receiving image data" refers to the system's function for receiving image files such as photos and illustrations uploaded by users.

[0008] "Means for analyzing text data and extracting important information" refers to a function that analyzes text data using natural language processing technology to extract important elements and emotional expressions from a narrative.

[0009] "Means for analyzing image data to identify visual elements" refers to a function that identifies features such as people, objects, and backgrounds within an image using computer vision technology.

[0010] "Means for generating a storyboard" refers to a function that uses the analyzed text and image data to plan the composition of each panel of a comic.

[0011] "Means for automatically generating a comic" refers to a function that uses AI technology to draw and complete the visual content of a comic based on a storyboard.

Brief Description of the Drawings

[0012]

[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] An emotion map on which multiple emotions are mapped.

[Figure 10] An emotion map on which multiple emotions are mapped.

[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.

[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0013] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0014] First, the terms used in the following description will be explained.

[0015] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0016] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0017] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0018] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0020] [First Embodiment]

[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0025] The reception device 38 includes a touch panel 38A, a microphone 38B, and the like, and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A, described later, transmits data indicating the user input received by the touch panel 38A and the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290, also described later, acquires the data indicating the user input.

[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0033] The system according to the present invention consists of an application executed via the user's terminal and a server responsible for data processing and analysis. The purpose of this system is to enable users to easily save their memories in comic book format.

[0034] Users input and upload text and image data related to their memories through an interface on their device. The text data includes specific events and emotions, such as "memories of a family trip." The data specified by the user is sent to the server via the internet.

[0035] The server applies natural language processing algorithms to the received text data to extract key events, characters, and emotional expressions from the story. This provides crucial information for narrating the user's memories. Meanwhile, image data is analyzed using computer vision technology. This identifies visual elements within the image and extracts features of people, scenes, and emotions.

[0036] Based on these analysis results, the server generates a storyboard for the manga. This storyboard plans how each scene will be represented as a manga panel. Once the storyboard is established, the manga is automatically created using a generative AI. During this process, an art style reflecting the analyzed visual elements and extracted emotions is applied, resulting in a manga that faithfully recreates the user's memories.

[0037] As a concrete example, when a user inputs memories of a mountain hike with friends into the system, the server extracts the hiking route and anecdotes about the friends from the text, and analyzes the scenery and people in group photos from the pictures. As a result, the user can receive a comic strip focusing on key points such as "a group photo at the summit" and "interesting conversations along the way." This comic strip is provided in digital format, and the user can download and share it from their device.

[0038] As described above, the present invention facilitates the visual preservation of memories and realizes a form that is easy for general users without technical expertise to use.

[0039] The following describes the processing flow.

[0040] Step 1:

[0041] The user uses their device to input memorable episodes as text data and selects and uploads related image data. The user then clicks a button to send this data, preparing it to be sent to the server.

[0042] Step 2:

[0043] The terminal packages the text data entered by the user and the image data uploaded, and sends it to the server via the internet. The data is encrypted and transmitted to ensure security.
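
As a non-limiting illustration, the packaging performed by the terminal in Step 2 could be sketched as follows. The function names `package_memory` and `unpack_memory`, the JSON layout, and the use of Base64 are assumptions made for this sketch and are not specified in this disclosure. Encryption is assumed to be provided by the transport layer (for example, HTTPS) and is not shown.

```python
import base64
import json
from typing import Dict, Tuple


def package_memory(text: str, images: Dict[str, bytes]) -> bytes:
    """Bundle one text entry and its images into a single JSON payload.

    Hypothetical helper: the payload layout is an assumption of this sketch.
    """
    payload = {
        "text": text,
        "images": [
            {"filename": name, "data_b64": base64.b64encode(data).decode("ascii")}
            for name, data in images.items()
        ],
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def unpack_memory(raw: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Server-side inverse of package_memory."""
    payload = json.loads(raw.decode("utf-8"))
    images = {
        item["filename"]: base64.b64decode(item["data_b64"])
        for item in payload["images"]
    }
    return payload["text"], images
```

Base64 is used here only because JSON cannot carry raw bytes; a multipart upload would serve the same purpose.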

[0044] Step 3:

[0045] The server decrypts the received data and stores it. It verifies that the data format and size are correct and returns an error message to the user if there are any problems. If the verification succeeds, it starts the analysis process.
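
The format and size check in Step 3 could look like the following minimal sketch. The limits `MAX_TEXT_CHARS` and `MAX_IMAGE_BYTES`, the accepted formats, the error messages, and the rule that at least one image is required are all assumptions; this disclosure does not specify them.

```python
from typing import List, Optional

# Assumed limits; the disclosure does not specify concrete values.
MAX_TEXT_CHARS = 5_000
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# Leading signature bytes of two common image formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def detect_image_format(data: bytes) -> Optional[str]:
    """Return the format name if the data starts with a known signature."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def validate_submission(text: str, images: List[bytes]) -> List[str]:
    """Return a list of error messages; an empty list means the data passed."""
    errors = []
    if not text.strip():
        errors.append("Text data is empty.")
    elif len(text) > MAX_TEXT_CHARS:
        errors.append("Text data is too long.")
    if not images:
        errors.append("No image data was uploaded.")
    for index, data in enumerate(images, start=1):
        if len(data) > MAX_IMAGE_BYTES:
            errors.append(f"Image {index} is too large.")
        elif detect_image_format(data) is None:
            errors.append(f"Image {index} has an unsupported format.")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the terminal show the user a complete error message in one round trip.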

[0046] Step 4:

[0047] The server uses natural language processing algorithms to analyze text data. While understanding the context, it extracts important information such as key events, relevant characters, and emotional expressions. This information is then used to generate the story.

[0048] Step 5:

[0049] The server applies computer vision technology to analyze image data. It recognizes features of people, objects, and backgrounds within the image, and identifies visual elements. The obtained information is integrated with text data to assist in the generation of storyboards.

[0050] Step 6:

[0051] The server generates a storyboard based on the analysis results. It plans how to illustrate each scene in comic form, taking into account the flow of the episode and its climax. It creates a structure that effectively conveys the user's memories, considering panel layouts and visual effects.
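
One possible in-memory representation of the storyboard from Step 6 is sketched below. The `Panel` fields, the `intensity` score attached to each event, and the rule that the climax receives a larger panel are illustrative assumptions, not details taken from this disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    order: int
    description: str
    characters: List[str] = field(default_factory=list)
    size: str = "normal"  # "large" marks the climax panel


def build_storyboard(events: List[dict]) -> List[Panel]:
    """Turn ordered events into panels and enlarge the highest-intensity one."""
    if not events:
        return []
    # The climax is assumed to be the event with the highest intensity score.
    climax_index = max(range(len(events)), key=lambda i: events[i]["intensity"])
    panels = []
    for i, event in enumerate(events):
        panels.append(
            Panel(
                order=i + 1,
                description=event["description"],
                characters=list(event.get("characters", [])),
                size="large" if i == climax_index else "normal",
            )
        )
    return panels
```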

[0052] Step 7:

[0053] The server initiates the process of drawing the comic using a generative AI, following the storyboard. Visual elements derived from images and the text storyline are integrated, and an art style is applied. This results in a comic that faithfully reflects the memories.

[0054] Step 8:

[0055] The server converts the completed manga into a file and prepares it for distribution to the user. It makes the file available for download, for example via a link, and sends a notification to the user when the generation is complete.

[0056] Step 9:

[0057] The user receives the notification and downloads or views the generated comic on their device. This allows the user to enjoy content that effectively visualizes their memories.

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] There is a need for a way for users to visually preserve their memories and experiences and easily share them with others. However, there is a lack of means for ordinary users to generate professional-quality visual works without requiring advanced technology or a great deal of time. Furthermore, extracting the flow of a story and emotions from digital data containing a wealth of information and accurately reproducing them as a visual work is a difficult challenge.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes means for receiving digital data, means for analyzing the digital data and extracting key information, and means for generating a narrative structure. This makes it possible for users to easily create and share works that visually enrich their memories, even without technical knowledge.

[0063] "Digital data" refers to electronically stored information that a user provides to a system, and includes data in various formats, such as text and images.

[0064] "Key information" refers to information extracted from digital data that is central to the story, such as events, characters, and emotional expressions.

[0065] "Narrative structure" refers to the framework of the flow and development of a story, designed based on key information.

[0066] "Visual works" are visual content in the form of comics or illustrations that express the user's memories, automatically generated based on a narrative structure.

[0067] A "server" refers to a computing device that receives digital data and performs analysis and generation processes.

[0068] This invention is a system that generates visual works based on information provided by the user. The user inputs their memories as digital data using a terminal. Specifically, this digital data consists of text and images, and could include text data in the form of "memories of a family trip" or image data taken during the trip.

[0069] The user's device transmits the input digital data to the server via the internet. The server applies natural language processing (NLP) algorithms to the text data. Specifically, it uses NLP libraries and models (e.g., spaCy and BERT) to extract key events, characters, and emotional expressions from the text. The server also analyzes image data using computer vision technology, identifying visual elements within the image using libraries such as OpenCV and TensorFlow (registered trademark).

[0070] The server generates a narrative structure based on the key information obtained from these analyses, and then generates a comic as a visual work. In this process, a generative AI model is used, which takes prompt sentences as input and applies art styles to automatically generate the work. For example, prompt sentences such as "highlight scenes from a family trip" or "memories of hiking in the mountains with friends" are used. As a result, users can easily obtain visually rich works of art without any technical knowledge.
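
How the narrative structure might be turned into a prompt sentence is sketched below. The function name `build_prompt`, its parameters, and the prompt wording are assumptions for illustration; this disclosure gives only example prompts and does not define a template.

```python
from typing import List


def build_prompt(theme: str, events: List[str], emotions: List[str], art_style: str) -> str:
    """Compose a text prompt from the narrative structure (hypothetical template)."""
    lines = [f"Create a comic about: {theme}."]
    if events:
        lines.append("Scenes, in order:")
        lines.extend(f"{n}. {event}" for n, event in enumerate(events, start=1))
    if emotions:
        lines.append("Emotional tone: " + ", ".join(emotions) + ".")
    lines.append(f"Art style: {art_style}.")
    return "\n".join(lines)
```

For instance, `build_prompt("highlight scenes from a family trip", ["Arriving at the beach", "Dinner together"], ["joy"], "soft watercolor")` yields a prompt that lists the scenes in order and ends with the requested art style.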

[0071] The completed visual works are provided to users in digital format, which can be downloaded to their devices and easily shared with others. This system aims to improve the user experience by automating everything from receiving and analyzing digital data to generating visual works.

[0072] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0073] Step 1:

[0074] The user inputs digital data using a device. Specifically, the user inputs text data about their memories (e.g., "Memories of a family trip") and uploads corresponding image data using the file selection function. This digital data becomes the input data sent to the next step.

[0075] Step 2:

[0076] The user's device transmits the entered digital data to the server. Here, the data is encrypted over the internet and securely uploaded to the server. The device's role is to properly prepare the data and ensure its reliable transfer to the server.

[0077] Step 3:

[0078] The server applies a natural language processing algorithm to the received text data. In this process, the server uses an NLP library (e.g., spaCy or BERT) to analyze the text data and extract key information that forms the core of the story. The output of this process is information about the events, characters, and emotional expressions of the story.

[0079] Step 4:

[0080] The server analyzes image data using computer vision technology. During this process, the server utilizes libraries such as OpenCV and TensorFlow to identify visual elements within the image and extract features such as people and scenes. The output obtained from this process represents the primary visual information expressed in the image.

[0081] Step 5:

[0082] The server combines information extracted from text and images to generate a narrative structure. Based on the key information obtained, the server designs the narrative flow and scene layout of the visual work, and this plan is output.
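
A deliberately simple way to combine the two analysis results is to pair each extracted event with the image whose identified labels overlap most with the words of the event. The sketch below assumes that events arrive as sentences and that image analysis yields a set of labels per file; the function name `match_images_to_events` and the word-overlap heuristic are illustrative and are not described in this disclosure.

```python
from typing import Dict, List, Set


def match_images_to_events(events: List[str], image_labels: Dict[str, Set[str]]) -> List[dict]:
    """Pair each event with the image sharing the most words with it."""
    structure = []
    for order, event in enumerate(events, start=1):
        words = {w.strip(".,!?").lower() for w in event.split()}
        best, best_overlap = None, 0
        for filename, labels in image_labels.items():
            overlap = len(words & {label.lower() for label in labels})
            if overlap > best_overlap:
                best, best_overlap = filename, overlap
        # "image" stays None when no label matches the event at all.
        structure.append({"order": order, "event": event, "image": best})
    return structure
```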

[0083] Step 6:

[0084] The server automatically generates visual works using a generative AI model. In this process, the AI applies an art style based on a prompt (e.g., "highlight scenes from a family trip") to generate a visual work that reflects the user's memories. The output of this process is a completed visual work in digital format.

[0085] Step 7:

[0086] The server sends the generated visual artwork to the user's device. The user can then view, download, and share the visual artwork on their device. This final output visually preserves the user's memories.

[0087] (Application Example 1)

[0088] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0089] In modern society, many individuals want to preserve and share their daily memories, but they lack visual means to do so. Furthermore, there is a need for a method that even general users who are not tech-savvy can use easily and that can automatically visualize memories. In this context, the challenge is to provide a way for individual users to more intuitively record their daily lives and visually enjoy their personal memories.

[0090] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0091] In this invention, the server includes a device for receiving text and image information provided by the user, a device for analyzing the text information and extracting important information, and a device for analyzing the image information and identifying visual elements. This makes it possible for users without specialized skills to easily record memories of their daily lives using a device in their home and to automatically generate comics based on that data.

[0092] A "user" refers to an individual who uses the system to record and visually preserve memories.

[0093] "Textual information" refers to text or data in text format provided by the user.

[0094] "Image information" refers to data in the form of photographs or graphics provided by the user.

[0095] "Device" refers to a configuration of hardware and software used to perform processes such as receiving, analyzing, and generating information.

[0096] "Important information" refers to the core data and concepts of a story, extracted by analyzing textual information.

[0097] "Visual elements" refer to the constituent elements and features within an image that are identified through the analysis of image information.

[0098] "Scene composition" refers to the structure of each scene in a manga, planned based on important information and visual elements.

[0099] "Manga" refers to a type of artwork that uses automatically generated images to visually represent a user's memories and develop them into a story.

[0100] A "household automated device" refers to an autonomously operating device that records a user's daily life and provides services tailored to the user's needs based on the recorded data.

[0101] One embodiment of this invention provides a novel method for users to save memories as comics. First, the user uses a data input terminal to input and transmit textual and image information related to the memories. This data is then sent to a server in the cloud via the internet.

[0102] The server uses a natural language processing program to analyze textual information. This program can utilize natural language processing libraries such as Hugging Face Transformers to extract important storylines, characters, and emotional expressions from the provided textual information. In parallel, image information is analyzed using computer vision technologies such as OpenCV to identify visual elements. This analysis makes it possible to extract features such as people, scenes, and backgrounds.

[0103] Based on the extracted textual information and visual elements, the server generates a scene composition and plans how each scene is to be represented in the comic. A generative AI model uses this information to automatically create the comic, resulting in a visually engaging and emotionally resonant story that faithfully reflects the user's memories. Finally, the generated comic is provided to the user in digital format, which they can download to their device or share.

[0104] As a concrete example, a user might input a request into the device such as, "I'd like you to turn my memories of last week's family mountain climbing trip into a comic strip." An example of a prompt to the generative AI model in this case might be, "Please create a touching and fun story based on the user's memories of mountain climbing." This allows the user to visually revisit their memories and preserve them in an emotionally rich way.

[0105] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0106] Step 1:

[0107] The terminal accepts text and image information from the user as input. This allows the user to select and send text and images related to specific memories. The output from the terminal is the text and image files selected by the user.

[0108] Step 2:

[0109] The server performs natural language processing on text information received from the terminal. This processing uses the Hugging Face Transformers' NER (Named Entity Recognition) model to extract important storylines and emotional expressions from the text. The output is a data structure containing this important information.

[0110] Step 3:

[0111] The server performs computer vision analysis using the received image information as input. Using the OpenCV library, it identifies visual features such as people, backgrounds, and scenes within the image. The analysis results are output as visual elements of the image information.

[0112] Step 4:

[0113] The server uses the key information extracted in step 2 and the visual elements from step 3 as input to generate a scene composition. This is a plan for turning the user's memories into a story and determines how to materialize each scene of the comic. The output is a storyboard that describes the scenes in detail.

[0114] Step 5:

[0115] The server uses the generated storyboard as input to run an AI model that automatically generates the final comic. The AI model used here reflects the specified art style and the user's emotions. The output is a comic in digital format.

[0116] Step 6:

[0117] Users download the manga generated by the server to their devices and then view or share it. This final output is provided as an electronic data file, allowing users to visually enjoy their memories at any time.

[0118] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0119] The system according to the present invention aims to visualize user-inputted memories in a comic book format and generate content that is more faithful to the user's emotions by using an emotion engine. The system consists of an application running on the user's terminal and a server.

[0120] Users input and upload text-based memory stories and related photos through their devices. Once user input is complete, the device sends this data to the server. At this point, the emotion engine receives the text and image data and begins analyzing them.

[0121] The emotion engine on the server analyzes the user's text using natural language processing to extract emotional elements. Simultaneously, it performs visual emotion analysis on image data to identify emotional elements in scenes and facial expressions. This allows the system to clearly understand the emotions the user intended.

[0122] The server uses the extracted emotional information to generate storyboards. In the storyboards, scenes are selected and structured according to the emotional elements, and the flow of the episode is expressed with rich emotion. Once the storyboards are complete, the generative AI draws the comic based on them. Because the results of the emotion engine are reflected, the characters' expressions and the atmosphere of the scenes match the user's experience.

[0123] For example, if a user enters memories of a fun trip with friends, the emotion engine extracts emotions such as "fun" and "friendship" from the text and images. The server creates an emotion-infused storyboard and generates a comic that vividly portrays the highlights of the trip and intimate moments with friends. Finally, the user can download the comic and enjoy a work that vividly visualizes their memories.

[0124] Thus, the system of the present invention, through emotion recognition, realizes a more faithful and emotionally resonant comic adaptation of memories, providing users with a richer experience.

[0125] The following describes the processing flow.

[0126] Step 1:

[0127] Users use their devices to input text and image data related to their memories and prepare to send them. First, they enter a story about their memory in text, and then they select and upload related photos.

[0128] Step 2:

[0129] The terminal bundles the data entered by the user and sends it to the server via the internet. During transmission, the data's integrity is checked, and encryption is applied to ensure security.

[0130] Step 3:

[0131] The server first passes the received data to the emotion engine to begin analysis. The text data is analyzed using natural language processing techniques to identify emotional keywords and phrases within the text and determine the associated emotions.

[0132] Step 4:

[0133] The server uses an emotion engine to analyze image data, identifying facial expressions and the emotions of the scene from the images. Based on visual features, it determines what kind of emotion is being expressed.

[0134] Step 5:

[0135] The server generates a storyboard based on the emotional information extracted by the emotion engine. It selects and arranges the scenes that make up the episode's narrative and designs them so that they are expressed with rich emotion.

[0136] Step 6:

[0137] The server uses a generative AI to automatically generate the comic based on the storyboard. In this process, visual elements such as facial expressions and backgrounds are drawn so that the characters and scenes faithfully reproduce the extracted emotions.

[0138] Step 7:

[0139] The server saves the completed comic as a digital file and prepares it for user access. It also notifies the user that the comic is complete.

[0140] Step 8:

[0141] Users receive the notification and download or view the generated comic on their devices. Users can then enjoy their own memories as a comic that expresses them with rich emotion.
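
As an illustration only, the text analysis of Step 3 above could be sketched as a simple keyword lookup. The lexicon, the function name `extract_emotions`, and the sample story are assumptions made for this sketch and do not appear in the disclosure; an actual emotion engine would more likely rely on a trained model than on a fixed word list.

```python
import re
from collections import Counter

# Hypothetical keyword-to-emotion lexicon (an assumption for this sketch).
EMOTION_LEXICON = {
    "fun": "fun",
    "laughed": "fun",
    "enjoyed": "fun",
    "friends": "friendship",
    "together": "friendship",
    "missed": "sadness",
    "cried": "sadness",
}


def extract_emotions(text: str) -> Counter:
    """Count the emotion labels triggered by keywords found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)


story = "We had so much fun at the beach. My friends and I laughed together all day."
emotions = extract_emotions(story)
print(emotions.most_common())
```

The resulting counts could then be passed to the storyboard generation of Step 5 as the emotional information for the episode.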

[0142] (Example 2)

[0143] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0144] Traditional content generation systems have struggled to emotionally recreate user experiences, often resulting in visualized content that fails to meet user expectations. Such systems merely visualize data without faithfully reflecting emotional nuances or complex feelings. Consequently, users often lack content that accurately visualizes their memories, making it difficult to enjoy a comfortable experience.

[0145] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0146] In this invention, the server includes means for receiving data in a format through which a user provides information, means for performing natural language processing on that data to obtain information including emotions, and means for analyzing visual information to identify special elements associated with emotions. This enables the automatic generation of emotionally rich content that is faithful to the user's experience.

[0147] "Data formats for users to provide information" refers to the format of digital data, such as text and images, that users use to record their own experiences and memories.

[0148] "Natural language processing" refers to the technology that enables computers to understand and process human language, particularly the process of extracting information from text and analyzing its meaning.

[0149] "Information containing emotions" refers to information that indicates elements related to emotions such as joy and sadness, extracted from data provided by the user.

[0150] "Visual information" refers to digital data that is presented visually, such as images and videos provided by the user.

[0151] "Special elements related to emotions" are elements extracted from visual information that represent emotional characteristics related to the scene or facial expression.

[0152] "Methods for generating scenarios" refer to techniques for constructing the flow and arrangement of a story based on extracted emotional information and special elements.

[0153] "Methods for automatically generating illustrations" refer to technologies that automatically create visual content such as images and comics based on a generated scenario.

[0154] The system according to this invention is a platform for visualizing the user's experience in an emotionally rich way using the user's digital data. The system mainly consists of an application that runs on the user's terminal and a server located remotely.

[0155] Users launch a web browser or dedicated application on their device, enter a memorable episode as text, and select related photos. JPEG images and plain text files are commonly used as data formats for collecting this information. Once the user has finished entering the information, the device sends this data to the server via the internet.

[0156] The server uses natural language processing (NLP) techniques to analyze the received text data. Specifically, open-source NLP libraries such as spaCy and NLTK are used to extract information about emotions and storylines from the text. Simultaneously, the server uses OpenCV and TensorFlow to perform visual analysis of image data, identifying visual features and emotional elements.

[0157] By integrating these two analysis results, the system generates a storyboard that reflects the emotions intended by the user. Based on this storyboard, a generative AI model automatically draws the comic. The generation process uses existing AI frameworks, such as PyTorch and Keras, to maximize the quality of the illustrations and the accuracy of emotional expression.

[0158] As a concrete example, if a user wants to visualize "memories of a summer trip with friends," they might enter a prompt like this: "I want to turn my memories of a summer beach trip with friends into a comic strip. The photo is a group shot taken on the beach. Please visualize this fun moment, emphasizing friendship and enjoyment." Based on this prompt, the system can provide emotionally rich visual content that meets the user's expectations.

[0159] Ultimately, users can download the generated comics in digital format and visually enjoy the memories and experiences recreated through the system. This makes it possible to deliver works that vividly and emotionally reconstruct memories to users.

[0160] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0161] Step 1:

[0162] The user uses an application on their device to input their memories as text data and upload related photos. The input in this step consists of text and image data provided by the user. The device formats this data and prepares it for transmission to the server. Specifically, it checks the encoding of the text file and converts the image data to JPEG or PNG format so that it can be sent to the server in the correct format.

[0163] Step 2:

[0164] The terminal sends the data entered by the user to the server. The input consists of text and image data formatted in step 1. The server receives this data and verifies its integrity. It also manages the data by storing it in a database with a user-specific ID. The output is data stored in a parseable format.

[0165] Step 3:

[0166] The server analyzes the received text data using natural language processing techniques. The input is text data stored in a database. Specifically, libraries such as spaCy and NLTK are used to extract keywords and phrases that indicate emotion. Through this analysis, emotion categories and important events within the text are identified, and an emotion profile is generated based on these. The output is the emotion profile.

[0167] Step 4:

[0168] The server visually analyzes image data. The input is image data from a database. Using OpenCV and TensorFlow, it analyzes facial expressions of people and scene features within the images, identifying visual emotional elements. During this process, it recognizes emotional changes and location from frame to frame. The output is the result of the visual emotion analysis.

[0169] Step 5:

[0170] The server integrates sentiment information extracted from text and images. The input is the output of steps 3 and 4. This is the process of combining each dataset to generate a storyboard that reflects the user's intended emotions. This storyboard includes the flow of emotions and key points of the narrative. The output is the generated storyboard.

[0171] Step 6:

[0172] The generative AI model automatically generates comics based on storyboards. The input is a completed storyboard. Specifically, it uses PyTorch and Keras to depict characters and backgrounds in detail for each scene. The AI makes contextually appropriate creative decisions to improve the quality of the illustrations. The output is visually rich comic-style content that reproduces the extracted emotions.

[0173] Step 7:

[0174] The user downloads the final comic and checks its contents. The input is the completed comic file generated on the server. The user can save this to their device and enjoy the visualized memories. The output is the digital comic saved on the user's device.
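
The integration in Step 5 above could, for example, be sketched as a weighted average of the emotion scores obtained from the text (Step 3) and from the images (Step 4). The `Panel` class, the function names, the equal weighting, and the sample scores are all assumptions made for illustration; the disclosure does not specify how the two results are combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Panel:
    index: int
    event: str
    emotion: str


def merge_emotions(text_scores: Dict[str, float],
                   image_scores: Dict[str, float],
                   text_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of two emotion score dicts (missing labels count as 0)."""
    labels = set(text_scores) | set(image_scores)
    return {
        label: text_weight * text_scores.get(label, 0.0)
        + (1.0 - text_weight) * image_scores.get(label, 0.0)
        for label in labels
    }


def build_storyboard(events: List[Tuple[str, Dict[str, float]]],
                     image_scores: Dict[str, float]) -> List[Panel]:
    """Create one panel per event, tagged with its dominant merged emotion."""
    panels = []
    for i, (description, text_scores) in enumerate(events, start=1):
        merged = merge_emotions(text_scores, image_scores)
        # sorted() keeps the choice deterministic when two scores tie.
        dominant = max(sorted(merged), key=merged.get)
        panels.append(Panel(i, description, dominant))
    return panels


# Hypothetical outputs of Step 3 (per-event text scores) and Step 4 (image scores).
events = [
    ("Arriving at the beach", {"excitement": 0.9}),
    ("Group photo at sunset", {"joy": 0.9, "friendship": 0.7}),
]
image_scores = {"joy": 0.8, "friendship": 0.6}
panels = build_storyboard(events, image_scores)
for panel in panels:
    print(panel)
```

A different weighting, or per-image rather than global image scores, would fit the same structure.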

[0175] (Application Example 2)

[0176] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0177] Traditional technologies have had limitations in expressing users' memories and experiences in an emotionally rich and visually stimulating way. In particular, visualizing everyday episodes with genuine emotion is difficult, and the need for advanced skills on the part of the user to express them is a problem. Furthermore, the lack of mechanisms for easily sharing such content has prevented many users from creatively recording and sharing their experiences.

[0178] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0179] In this invention, the server includes means for receiving text data and still image data provided by the user, means for analyzing the text data to extract important information, and means for analyzing the still image data to identify visual elements. This makes it possible for users to generate comics that visualize their memories in an emotionally rich and faithful way, and to easily view and distribute them through information terminals.

[0180] A "user" is an entity that uses the system to input memories and generate emotionally rich, visualized content.

[0181] "Text data" refers to information in text format that users provide to the system, such as memories and experiences expressed in written form.

[0182] "Still image data" refers to image-format information provided by the user to the system, including visual content associated with text data.

[0183] "Visual elements" refer to features and emotions identified from still image data, and are elements that influence the composition and expression of visual content.

[0184] A "storyboard" is a blueprint for a visual story, generated based on important information and visual elements, and provides a framework for the comic.

[0185] "Manga" is a type of graphical content automatically generated based on storyboards, which expresses the user's memories and experiences in a rich and emotional way.

[0186] An "information terminal" refers to a device used by users to view and distribute generated content, and is the hardware on which applications run.

[0187] This invention is a system for users to visualize their memories in an emotionally rich way. The system is primarily implemented via a user terminal and a server.

[0188] The user first inputs text and still image data using an information terminal. This data is then transmitted to the server via a communication interface such as Bluetooth or Wi-Fi.

[0189] The server uses a natural language processing library (e.g., spaCy) to analyze the received text data. Based on the analysis results, it extracts emotions and important information contained in the text. Meanwhile, for still image data, an image analysis module (e.g., OpenCV) is used to extract visual elements. At this time, facial expressions and the atmosphere of the scene in the image are identified.

[0190] Next, the server generates a storyboard based on the extracted information. This storyboard serves as a framework for automatically generating comics using a generative AI model (for example, the DALL-E model).
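
One possible shape for such a storyboard is a list of pages, each holding a small number of panels, serialized so that it can be handed to the generation stage. The sketch below is an assumption for illustration: the field names, the four-panels-per-page default, and the use of JSON are not taken from the disclosure.

```python
import json
from typing import Dict, List


def paginate(scenes: List[Dict[str, str]], panels_per_page: int = 4) -> List[Dict]:
    """Group scenes into pages and number the panels within each page."""
    pages = []
    for start in range(0, len(scenes), panels_per_page):
        chunk = scenes[start:start + panels_per_page]
        pages.append({
            "page": len(pages) + 1,
            "panels": [dict(scene, panel=i) for i, scene in enumerate(chunk, start=1)],
        })
    return pages


# Hypothetical scenes produced by the text and image analysis.
scenes = [
    {"description": "Packing the car", "emotion": "anticipation"},
    {"description": "Arriving at the coast", "emotion": "joy"},
    {"description": "Lunch with the family", "emotion": "joy"},
    {"description": "Watching the sunset", "emotion": "calm"},
    {"description": "Driving home", "emotion": "contentment"},
]
pages = paginate(scenes)
storyboard_json = json.dumps(pages, indent=2)
print(storyboard_json)
```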

[0191] The generated comics can be viewed by users on their information terminals and shared with others. Furthermore, emotion recognition technology enables rich emotional expression that faithfully reflects the user's experience.

[0192] For example, if a user enters memories of a holiday with their family, a comic strip will be generated that includes scenes of enjoyable conversations and beautiful scenery. An example of a prompt message would be, "I want to turn my fun family holiday memories into a comic strip. I want you to capture the moments of smiles and the beautiful scenery."

[0193] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0194] Step 1:

[0195] The user uses an information terminal to input text and still image data. The input data is temporarily stored on the terminal and waits until it is ready to be sent.

[0196] Step 2:

[0197] The terminal transmits the entered text data and still image data to the server via a communication interface. The server then receives the data and stores it in data storage for initial processing.

[0198] Step 3:

[0199] The server uses a natural language processing library to analyze the received text data and extract important information. The input is text data, and the output is a set of sentiment tags and important keywords. The analysis calculates the sentiment value of each word and phrase to grasp the overall tone of the text.

[0200] Step 4:

[0201] The server uses an image analysis module to analyze still image data and identify visual elements. The input is image data, and the output includes emotional features and object information within the image. The analysis process visually evaluates facial expressions and the atmosphere of a scene, and converts them into visualized data.

[0202] Step 5:

[0203] The server generates storyboards based on the extracted key information and visual elements. In this step, it connects the emotions in the text with the visual elements of the images to create the scene composition. The output is storyboard data showing the composition of each page of the manga.

[0204] Step 6:

[0205] The server automatically generates comics based on storyboards using a generative AI model. Storyboard data is provided as input, and the completed comic images are generated as output. The AI model automatically designs the visual content to reflect the expressed emotions.

[0206] Step 7:

[0207] Finally, the server sends the generated comic to the information terminal, making it available for users to view and share. This allows users to relive their memories in an emotionally rich, visualized form and share them with others.
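
The sentiment calculation described in Step 3 above could be sketched as follows: each known word is assigned a sentiment value, and the average gives the overall tone. The word values, the thresholds of plus or minus 0.1, and the function name `analyze_tone` are assumptions for this sketch and are not specified in the disclosure.

```python
import re
from typing import Dict

# Hypothetical word-level sentiment values in the range [-1, 1].
SENTIMENT_VALUES = {
    "fun": 0.8,
    "beautiful": 0.7,
    "smiles": 0.6,
    "tired": -0.4,
    "lost": -0.6,
}


def analyze_tone(text: str) -> Dict[str, object]:
    """Return sentiment keywords, an overall tone tag, and the mean score."""
    words = re.findall(r"[a-z]+", text.lower())
    # A dict keeps each matched word once, even if it is repeated in the text.
    scored = {w: SENTIMENT_VALUES[w] for w in words if w in SENTIMENT_VALUES}
    if not scored:
        return {"keywords": [], "tone": "neutral", "score": 0.0}
    score = sum(scored.values()) / len(scored)
    tone = "positive" if score > 0.1 else "negative" if score < -0.1 else "neutral"
    # Strongest words first, regardless of sign.
    keywords = sorted(scored, key=lambda w: abs(scored[w]), reverse=True)
    return {"keywords": keywords, "tone": tone, "score": score}


result = analyze_tone(
    "Our family holiday was fun. We were tired after the hike, "
    "but the scenery was beautiful and full of smiles."
)
print(result)
```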

[0208] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0209] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0210] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0211] [Second Embodiment]

[0212] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0213] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0214] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0215] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0216] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0217] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0218] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0219] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0220] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0221] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0222] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0223] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0224] The system according to the present invention consists of an application executed via the user's terminal and a server responsible for data processing and analysis. The purpose of this system is to enable users to easily save their memories in comic book format.

[0225] Users input and upload text and image data related to their memories through an interface on their device. The text data includes specific events and emotions, such as "memories of a family trip." The data specified by the user is sent to the server via the internet.

[0226] The server applies natural language processing algorithms to the received text data to extract key events, characters, and emotional expressions from the story. This provides crucial information for narrating the user's memories. Meanwhile, image data is analyzed using computer vision technology. This identifies visual elements within the image and extracts features of people, scenes, and emotions.

[0227] Based on these analysis results, the server generates a storyboard for the manga. This storyboard plans how each scene will be represented as a manga panel. Once the storyboard is established, the manga is automatically created using a generative AI. During this process, an art style reflecting the analyzed visual elements and extracted emotions is applied, resulting in a manga that faithfully recreates the user's memories.

[0228] As a concrete example, when a user inputs memories of a mountain hike with friends into the system, the server extracts the hiking route and anecdotes about the friends from the text, and analyzes the scenery and people in group photos from the pictures. As a result, the user can receive a comic strip focusing on key points such as "a group photo at the summit" and "interesting conversations along the way." This comic strip is provided in digital format, and the user can download and share it from their device.

[0229] As described above, the present invention facilitates the visual preservation of memories and realizes a form that is easy for general users without technical expertise to use.

[0230] The following describes the processing flow.

[0231] Step 1:

[0232] The user uses their device to input memorable episodes as text data and selects and uploads related image data. The user then clicks a button to send this data, preparing it to be sent to the server.

[0233] Step 2:

[0234] The terminal packages the text data entered by the user and the image data uploaded, and sends it to the server via the internet. The data is encrypted and transmitted to ensure security.

[0235] Step 3:

[0236] The server decodes the received data and stores it appropriately. It verifies that the data format and size are correct and returns an error message to the user if there are any problems. If successful, it starts the analysis process.

[0237] Step 4:

[0238] The server uses natural language processing algorithms to analyze text data. While understanding the context, it extracts important information such as key events, relevant characters, and emotional expressions. This information is then used to generate the story.

[0239] Step 5:

[0240] The server applies computer vision technology to analyze image data. It recognizes features of people, objects, and backgrounds within the image, and identifies visual elements. The obtained information is integrated with text data to assist in the generation of storyboards.

[0241] Step 6:

[0242] The server generates a storyboard based on the analysis results. It plans how to illustrate each scene in comic form, taking into account the flow of the episode and its climax. It creates a structure that effectively conveys the user's memories, considering panel layouts and visual effects.

[0243] Step 7:

[0244] The server initiates the process of drawing the comic using a generative AI, following the storyboard. Visual elements derived from images and the text storyline are integrated, and an art style is applied. This results in a comic that faithfully reflects the memories.

[0245] Step 8:

[0246] The server converts the completed manga into a file and prepares it for distribution to the user. It makes the manga available for download via a link or as a file, and sends a notification to the user when generation is complete.

[0247] Step 9:

[0248] Users receive the notification and download or view the generated comic on their devices. This allows users to enjoy content that effectively visualizes their memories.
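
The format and size check of Step 3 above could be sketched as shown below. The size limits, the restriction to JPEG and PNG, the error messages, and the function name `validate_upload` are assumptions made for this sketch; the disclosure only states that the format and size are verified and that an error message is returned when there is a problem.

```python
from typing import List

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed upper limit for one image
MAX_TEXT_CHARS = 5000               # assumed upper limit for the story text

# Leading bytes ("magic numbers") that identify JPEG and PNG files.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def validate_upload(text: str, image: bytes) -> List[str]:
    """Return a list of error messages; an empty list means the data is acceptable."""
    errors = []
    if not text.strip():
        errors.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        errors.append("text is too long")
    if len(image) > MAX_IMAGE_BYTES:
        errors.append("image is too large")
    if not (image.startswith(JPEG_MAGIC) or image.startswith(PNG_MAGIC)):
        errors.append("image is not JPEG or PNG")
    return errors


ok_errors = validate_upload("Memories of a family trip", PNG_MAGIC + b"\x00" * 16)
bad_errors = validate_upload("", b"GIF89a")
print(ok_errors, bad_errors)
```

In this sketch the server would start the analysis only when the returned list is empty, and otherwise send the messages back to the terminal.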

[0249] (Example 1)

[0250] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0251] There is a need for a way for users to visually preserve their memories and experiences and easily share them with others. However, there is a lack of means for ordinary users to generate professional-quality visual works without requiring advanced technology or a great deal of time. Furthermore, extracting the flow of a story and emotions from digital data containing a wealth of information and accurately reproducing them as a visual work is a difficult challenge.

[0252] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0253] In this invention, the server includes means for receiving digital data, means for analyzing the digital data and extracting key information, and means for generating a narrative structure. This makes it possible for users to easily create and share works that visually enrich their memories, even without technical knowledge.

[0254] "Digital data" refers to electronically stored information that a user provides to a system, and includes data in various formats, such as text and images.

[0255] "Key information" refers to information extracted from digital data that is central to the story, such as events, characters, and emotional expressions.

[0256] "Narrative structure" refers to the framework of the flow and development of a story, designed based on key information.

[0257] "Visual works" are visual content in the form of comics or illustrations that express the user's memories, automatically generated based on a narrative structure.

[0258] A "server" refers to a computing device that receives digital data and performs analysis and generation processes.

[0259] This invention is a system that generates visual works based on information provided by the user. The user inputs their memories as digital data using a terminal. Specifically, this digital data consists of text and images, and could include text data in the form of "memories of a family trip" or image data taken during the trip.

[0260] The user's device transmits the input digital data to the server via the internet. The server applies natural language processing (NLP) algorithms to the text data. Specifically, it uses NLP libraries and models (e.g., spaCy and BERT) to extract key events, characters, and emotional expressions from the text. The server also analyzes image data using computer vision techniques, identifying visual elements within the image using libraries such as OpenCV and TensorFlow.

[0261] The server generates a narrative structure based on the key information obtained from these analyses, and then generates a comic as a visual work. In this process, a generative AI model is used, which takes prompt sentences as input and applies art styles to automatically generate the work. For example, prompt sentences such as "highlight scenes from a family trip" or "memories of hiking in the mountains with friends" are used. As a result, users can easily obtain visually rich works of art without any technical knowledge.

[0262] The completed visual works are provided to users in digital format, which can be downloaded to their devices and easily shared with others. This system aims to improve the user experience by automating everything from receiving and analyzing digital data to generating visual works.

[0263] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0264] Step 1:

[0265] The user inputs digital data using a device. Specifically, the user inputs text data about their memories (e.g., "Memories of a family trip") and uploads corresponding image data using the file selection function. This digital data becomes the input data sent to the next step.

[0266] Step 2:

[0267] The user's device transmits the entered digital data to the server. Here, the data is encrypted over the internet and securely uploaded to the server. The device's role is to properly prepare the data and ensure its reliable transfer to the server.

[0268] Step 3:

[0269] The server applies a natural language processing algorithm to the received text data. In this process, the server uses an NLP library or model (e.g., spaCy or BERT) to analyze the text data and extract key information that forms the core of the story. The output of this process is information about the events, characters, and emotional expressions of the story.

[0270] Step 4:

[0271] The server analyzes image data using computer vision technology. During this process, the server utilizes libraries such as OpenCV and TensorFlow to identify visual elements within the image and extract features such as people and scenes. The output obtained from this process represents the primary visual information expressed in the image.

[0272] Step 5:

[0273] The server combines information extracted from text and images to generate a narrative structure. Based on the key information obtained, the server designs the narrative flow and scene layout of the visual work, and this plan is output.

[0274] Step 6:

[0275] The server automatically generates visual works using a generative AI model. In this process, the AI applies an art style based on a prompt (e.g., "highlight scenes from a family trip") to generate a visual work that reflects the user's memories. The output of this process is a completed visual work in digital format.

[0276] Step 7:

[0277] The server sends the generated visual work to the user's terminal. The user can view the visual work on the terminal and perform downloads and sharing. Through this final output, the user's memories are visually preserved.
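
To illustrate how the narrative structure of Step 5 might feed the prompt used in Step 6, the sketch below assembles a text prompt from the extracted events, characters, and emotions. The `NarrativeStructure` class, the prompt wording, and the default art style are assumptions for illustration; the disclosure gives only short example prompts and does not define a prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NarrativeStructure:
    theme: str
    events: List[str]
    characters: List[str]
    emotions: List[str]


def build_prompt(structure: NarrativeStructure,
                 art_style: str = "four-panel comic") -> str:
    """Assemble a text prompt for a generative model from the narrative structure."""
    lines = [
        f"Create a {art_style} about: {structure.theme}.",
        "Characters: " + ", ".join(structure.characters) + ".",
        "Emotions to convey: " + ", ".join(structure.emotions) + ".",
        "Scenes in order:",
    ]
    lines += [f"{i}. {event}" for i, event in enumerate(structure.events, start=1)]
    return "\n".join(lines)


# Hypothetical key information extracted in Steps 3 and 4.
structure = NarrativeStructure(
    theme="highlight scenes from a family trip",
    events=["Arriving at the cabin", "Dinner by the lake"],
    characters=["the user", "family members"],
    emotions=["joy", "closeness"],
)
prompt = build_prompt(structure)
print(prompt)
```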

[0278] (Application Example 1)

[0279] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0280] In modern society, many individuals want to preserve and share their daily memories, but lack a visual means of doing so. There is also a need for a means that ordinary users unfamiliar with technology can use easily and that visualizes memories automatically. Under these circumstances, the challenge is to provide a method by which individual users can record their daily lives more intuitively and enjoy their memories visually.

[0281] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following respective means.

[0282] In this invention, the server includes a device that receives character information and image information provided by the user, a device that analyzes the character information to extract important information, and a device that analyzes the image information to identify visual elements. As a result, even users without specialized skills can easily record their daily life memories using in-home devices and automatically generate comics based on that data.

[0283] "User" refers to an individual who uses the system to record memories and visually preserve them.

[0284] "Character information" (that is, text information) refers to data in the form of text or articles provided by the user.

[0285] "Image information" refers to data in the form of photos or graphics provided by the user.

[0286] "Device" refers to the hardware and software configuration for executing processes such as information reception, analysis, and generation.

[0287] "Important information" refers to the data and concepts that are the core of the story, extracted by analyzing character information.

[0288] "Visual element" refers to the components and features within an image identified by analyzing image information.

[0289] "Scene composition" refers to the composition of each scene in a comic planned based on important information and visual elements.

[0290] "Comic" refers to a pictorial work automatically generated to visually represent the user's memories and unfold as a story.

[0291] "Household automatic device" refers to a device that operates autonomously to record the user's daily life and provide services according to the user's desires based on the recorded data.

[0292] The embodiments for implementing this invention provide a new method for the user to save memories as comics. First, the user uses a terminal for data input to input and transmit character information and image information related to the memories. These data are sent to a server on the cloud via the Internet.

[0293] The server uses a natural language processing program to analyze textual information. This program can utilize natural language processing libraries such as Hugging Face Transformers to extract important storylines, characters, and emotional expressions from the provided textual information. In parallel, image information is analyzed using computer vision technologies such as OpenCV to identify visual elements. This analysis makes it possible to extract features such as people, scenes, and backgrounds.

[0294] Based on the extracted textual information and visual elements, the server generates a scene structure and plans how to represent the comic in each scene. A generative AI model uses this information to automatically create the comic, resulting in a visually engaging and emotionally resonant story that faithfully reflects the user's memories. Finally, the generated comic is provided to the user in digital format, which they can download from their device or share.

[0295] As a concrete example, a user might input a request into the device such as, "I'd like you to turn my memories of last week's family mountain climbing trip into a comic strip." An example of a prompt to the generative AI model in this case might be, "Please create a touching and fun story based on the user's memories of mountain climbing." This allows the user to visually revisit their memories and preserve them in an emotionally rich way.
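As an illustration of how such a prompt could be assembled from extracted information, the following sketch builds the example prompt quoted above from a theme and a list of tones. The function name `build_generation_prompt` and its parameters are hypothetical, and the fixed sentence template is an assumption made for this example only.

```python
def build_generation_prompt(theme: str, tones: list) -> str:
    """Compose an instruction for the generative model (hypothetical helper)."""
    if not tones:
        raise ValueError("at least one tone is required")
    tone_text = " and ".join(tones)
    return f"Please create a {tone_text} story based on the user's memories of {theme}."


# Reproduces the example prompt from the paragraph above.
prompt = build_generation_prompt("mountain climbing", ["touching", "fun"])
```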

[0296] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0297] Step 1:

[0298] The terminal accepts text and image information from the user as input. This allows the user to select and send text and images related to specific memories. The output from the terminal is the text and image files selected by the user.

[0299] Step 2:

[0300] The server performs natural language processing using the character information received from the terminal as input. For this processing, the NER (Named Entity Recognition) model of Hugging Face Transformers is used to extract important storylines and emotional expressions from the text. The output is a data structure containing this important information.

[0301] Step 3:

[0302] The server performs computer vision analysis using the received image information as input. The OpenCV library is utilized to identify visual features such as people, backgrounds, and scenes within the image. The analysis results are output as visual elements of the image information.

[0303] Step 4:

[0304] The server generates a scene composition using the important information extracted in Step 2 and the visual elements from Step 3 as input. The scene composition is a plan for telling the user's memories as a story and determines how each scene of the comic is realized. The output is a storyboard that describes the scenes in detail.

[0305] Step 5:

[0306] The server executes a generative AI model using the generated storyboard as input to automatically generate the final comic. The AI model used here reflects the specified art style and the user's emotions. The output is a digital format comic.

[0307] Step 6:

[0308] The user downloads the comic generated by the server to the terminal and browses or shares it. This final output is provided as an electronic data file, and the user can visually enjoy their memories at any time.
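The scene composition of Step 4 can be illustrated with a small data structure that pairs each extracted event with the image whose visual elements match it best. This is a sketch under stated assumptions: the classes `ImageElements` and `Scene`, the function `compose_scenes`, and the word-overlap matching rule are all hypothetical and stand in for whatever the actual system uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageElements:
    image_id: str
    labels: set  # visual elements from Step 3, e.g. {"summit", "sky"}


@dataclass
class Scene:
    order: int
    description: str
    image_id: Optional[str]  # None when no image matches the event


def compose_scenes(events: list, images: list) -> list:
    """Pair each event (Step 2 output) with the best-matching image (Step 3 output)."""
    scenes = []
    for order, event in enumerate(events, start=1):
        words = {w.strip(".,!?").lower() for w in event.split()}
        best, best_overlap = None, 0
        for img in images:
            overlap = len(words & img.labels)  # count shared words and labels
            if overlap > best_overlap:
                best, best_overlap = img, overlap
        scenes.append(Scene(order, event, best.image_id if best else None))
    return scenes


storyboard = compose_scenes(
    ["We reached the summit at noon.", "Dinner at the cabin.", "The drive home."],
    [ImageElements("img1", {"summit", "sky"}), ImageElements("img2", {"cabin", "table"})],
)
```

An event with no matching image keeps `image_id` as `None`, so the generation step could still draw it from the text alone.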

[0309] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0310] The system according to the present invention aims to visualize user-inputted memories in a comic book format and generate content that is more faithful to the user's emotions by using an emotion engine. The system consists of an application running on the user's terminal and a server.

[0311] Users input and upload text-based memory stories and related photos through their devices. Once user input is complete, the device sends this data to the server. At this point, the emotion engine receives the text and image data and begins analyzing them.

[0312] The emotion engine on the server analyzes the user's text using natural language processing to extract emotional elements. Simultaneously, it performs visual emotion analysis on image data to identify emotional elements in scenes and facial expressions. This allows the system to clearly understand the emotions the user intended.

[0313] The server uses the extracted emotional information to generate storyboards. In the storyboards, scenes are selected and structured according to the emotional elements, and the flow of the episode is expressed with rich emotion. Once the storyboards are complete, the generative AI draws the comic based on them. The results of the emotion engine are reflected, so the characters' expressions and the atmosphere of the scenes match the user's experience.

[0314] For example, if a user enters memories of a fun trip with friends, the emotion engine extracts emotions such as "fun" and "friendship" from the text and images. The server creates an emotion-infused storyboard and generates a comic that vividly portrays the highlights of the trip and intimate moments with friends. Finally, the user can download the comic and enjoy a work that vividly visualizes their memories.
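A very small keyword-based version of the text side of this extraction can be sketched as follows. It is only an illustration: the lexicon `EMOTION_LEXICON` and the function `extract_emotions` are hypothetical, and an actual emotion engine would presumably rely on a trained model rather than a fixed word list.

```python
import re
from collections import Counter

# Hypothetical word-to-emotion lexicon, for illustration only.
EMOTION_LEXICON = {
    "fun": "fun",
    "laughed": "fun",
    "enjoyed": "fun",
    "friends": "friendship",
    "together": "friendship",
    "missed": "sadness",
    "cried": "sadness",
}


def extract_emotions(text: str) -> Counter:
    """Count how often each emotion label is triggered by a keyword in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)


emotions = extract_emotions(
    "We had so much fun. My friends and I laughed together all day."
)
# emotions == Counter({"fun": 2, "friendship": 2})
```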

[0315] Thus, the system of the present invention, through emotion recognition, realizes a more faithful and emotionally resonant comic adaptation of memories, providing users with a richer experience.

[0316] The following describes the processing flow.

[0317] Step 1:

[0318] Users use their devices to input text and image data related to their memories and prepare to send them. First, they enter a story about their memory in text, and then they select and upload related photos.

[0319] Step 2:

[0320] The terminal bundles the data entered by the user and sends it to the server via the internet. During transmission, the data's integrity is checked, and encryption is applied to ensure security.

[0321] Step 3:

[0322] The server first passes the received data to the emotion engine to begin analysis. The text data is analyzed using natural language processing techniques to identify emotional keywords and phrases within the text and determine the associated emotions.

[0323] Step 4:

[0324] The server uses an emotion engine to analyze image data, identifying facial expressions and the emotions of the scene from the images. Based on visual features, it determines what kind of emotion is being expressed.

[0325] Step 5:

[0326] The server generates a storyboard based on the emotional information extracted by the emotion engine. It selects and arranges the scenes that make up the episode's narrative and designs each scene so that it is emotionally expressive.

[0327] Step 6:

[0328] The server uses generative AI to automatically generate the comic based on the storyboard. In this process, visual elements such as facial expressions and backgrounds are drawn so that they faithfully reproduce the emotions extracted for the characters and scenes.

[0329] Step 7:

[0330] The server creates the completed comic as a digital file and prepares it for user access. It also notifies the user that the comic is complete.

[0331] Step 8:

[0332] Users receive the notification and download or view the generated comic via their devices. At this point, users can enjoy their own memories as an emotionally expressive comic.
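The selection and arrangement of scenes in Step 5 can be sketched as picking the most emotionally intense candidate scenes and then restoring their chronological order. The class `CandidateScene`, the function `select_scenes`, and the 0.0 to 1.0 intensity score are assumptions for this example; the disclosure does not specify how the emotion engine scores scenes.

```python
from dataclasses import dataclass


@dataclass
class CandidateScene:
    position: int      # chronological position in the user's story
    summary: str
    emotion: str
    intensity: float   # assumed emotion-engine score in the range 0.0 to 1.0


def select_scenes(candidates: list, max_scenes: int) -> list:
    """Keep the most intense scenes, then put them back in story order."""
    if max_scenes < 1:
        raise ValueError("max_scenes must be at least 1")
    strongest = sorted(candidates, key=lambda s: s.intensity, reverse=True)[:max_scenes]
    return sorted(strongest, key=lambda s: s.position)


candidates = [
    CandidateScene(1, "Packing the car", "anticipation", 0.2),
    CandidateScene(2, "Arriving at the beach", "joy", 0.9),
    CandidateScene(3, "Lunch break", "calm", 0.5),
    CandidateScene(4, "Watching the sunset together", "friendship", 0.7),
]
selected = select_scenes(candidates, max_scenes=2)
```

Sorting twice keeps the narrative flow intact: the first sort decides which scenes survive, the second decides the order in which they appear.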

[0333] (Example 2)

[0334] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0335] Traditional content generation systems have struggled to emotionally recreate user experiences, often resulting in visualized content that fails to meet user expectations. Such systems merely visualize data without faithfully reflecting emotional nuances or complex feelings. Consequently, users often lack content that accurately visualizes their memories, making it difficult to enjoy a comfortable experience.

[0336] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0337] In this invention, the server includes means for receiving a data format for a user to provide information, means for performing natural language processing on the data format to obtain information including emotions, and means for analyzing visual information and identifying special elements associated with emotions. This enables the automatic generation of emotionally rich content that is faithful to the user's experience.

[0338] "Data formats for users to provide information" refers to the format of digital data, such as text and images, that users use to record their own experiences and memories.

[0339] "Natural language processing" refers to the technology that enables computers to understand and process human language, particularly the process of extracting information from text and analyzing its meaning.

[0340] "Information containing emotions" refers to information that indicates elements related to emotions such as joy and sadness, extracted from data provided by the user.

[0341] "Visual information" refers to digital data that is presented visually, such as images and videos provided by the user.

[0342] "Special elements related to emotions" are elements extracted from visual information that represent emotional characteristics related to the scene or facial expression.

[0343] "Methods for generating scenarios" refer to techniques for constructing the flow and arrangement of a story based on extracted emotional information and special elements.

[0344] "Methods for automatically generating illustrations" refer to technologies that automatically create visual content such as images and comics based on a generated scenario.

[0345] The system according to this invention is a platform for visualizing the user's experience in an emotionally rich way using the user's digital data. The system mainly consists of an application that runs on the user's terminal and a server located remotely.

[0346] Users launch a web browser or dedicated application on their device, enter a memorable episode as text, and select related photos. JPEG images and plain text files are commonly used as data formats for collecting this information. Once the user has finished entering the information, the device sends this data to the server via the internet.
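A terminal-side check of these data formats might look like the sketch below, which tests whether a text file decodes as UTF-8 and recognizes JPEG and PNG images by their leading signature bytes. The function names are hypothetical, and the choice of UTF-8 as the expected text encoding is an assumption; the source only says that plain text and JPEG (and, later, PNG) are used.

```python
from typing import Optional

JPEG_SIGNATURE = b"\xff\xd8\xff"        # start-of-image marker plus next marker prefix
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"    # fixed 8-byte PNG file signature


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "jpeg" or "png" based on the leading bytes, or None if unrecognized."""
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return None


def is_utf8_text(data: bytes) -> bool:
    """Check that the bytes decode cleanly as UTF-8 (assumed text encoding)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Checking signature bytes rather than file extensions avoids trusting a file name that the user may have changed.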

[0347] The server utilizes natural language processing (NLP) techniques to analyze the received text data. Specifically, open-source NLP libraries such as spaCy and NLTK are used to extract information about emotions and storylines from the text. Simultaneously, the server uses OpenCV and TensorFlow to perform visual analysis of image data, identifying visual features and emotional elements.

[0348] By integrating these two analysis results, the system generates a storyboard that reflects the emotions intended by the user. Based on this storyboard, a generative AI model automatically draws the comic. The generation process utilizes existing AI frameworks, such as PyTorch and Keras, to maximize the quality of the illustrations and the accuracy of emotional expression.
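One simple way to integrate the two analysis results is a weighted average of per-emotion scores from the text and from the images, normalized so the merged scores sum to one. This is an assumed technique chosen for illustration, not the method stated in the disclosure; `merge_emotion_profiles` and `text_weight` are hypothetical names.

```python
def merge_emotion_profiles(
    text_profile: dict, image_profile: dict, text_weight: float = 0.5
) -> dict:
    """Blend two emotion-score dictionaries and normalize the result."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    merged = {}
    for emotion in set(text_profile) | set(image_profile):
        merged[emotion] = (
            text_weight * text_profile.get(emotion, 0.0)
            + (1.0 - text_weight) * image_profile.get(emotion, 0.0)
        )
    total = sum(merged.values())
    if total == 0:
        return merged  # nothing detected in either source
    return {emotion: score / total for emotion, score in merged.items()}


profile = merge_emotion_profiles(
    {"joy": 0.8, "nostalgia": 0.2},  # hypothetical text analysis result
    {"joy": 1.0},                    # hypothetical image analysis result
)
```

An emotion found in only one source still appears in the merged profile, with its score reduced by the weight of the source that missed it.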

[0349] As a concrete example, if a user wants to visualize "memories of a summer trip with friends," they might enter a prompt like this: "I want to turn my memories of a summer beach trip with friends into a comic strip. The photo is a group shot taken on the beach. Please visualize this fun moment, emphasizing friendship and enjoyment." Based on this prompt, the system can provide emotionally rich visual content that meets the user's expectations.

[0350] Ultimately, users can download the generated comics in digital format and visually enjoy the memories and experiences recreated through the system. This makes it possible to deliver works that vividly and emotionally reconstruct memories to users.

[0351] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0352] Step 1:

[0353] The user uses an application on their device to input their memories as text data and upload related photos. The input in this step consists of text and image data provided by the user. The device formats this data and prepares it for transmission to the server. Specifically, it checks the encoding of the text file and converts the image data to JPEG or PNG format so that it can be sent to the server in the correct format.

[0354] Step 2:

[0355] The terminal sends the data entered by the user to the server. The input consists of text and image data formatted in step 1. The server receives this data and verifies its integrity. It also manages the data by storing it in a database with a user-specific ID. The output is data stored in a parseable format.

[0356] Step 3:

[0357] The server analyzes the received text data using natural language processing techniques. The input is text data stored in a database. Specifically, libraries such as spaCy and NLTK are used to extract keywords and phrases that indicate emotion. Through this analysis, emotion categories and important events within the text are identified, and an emotion profile is generated based on these. The output is the emotion profile.

[0358] Step 4:

[0359] The server visually analyzes image data. The input is image data from a database. Using OpenCV and TensorFlow, it analyzes facial expressions of people and scene features within the images, identifying visual emotional elements. During this process, it recognizes emotional changes and location from frame to frame. The output is the result of the visual emotion analysis.

[0360] Step 5:

[0361] The server integrates sentiment information extracted from text and images. The input is the output of steps 3 and 4. This is the process of combining each dataset to generate a storyboard that reflects the user's intended emotions. This storyboard includes the flow of emotions and key points of the narrative. The output is the generated storyboard.

[0362] Step 6:

[0363] The generative AI model automatically generates comics based on storyboards. The input is a completed storyboard. Specifically, it utilizes PyTorch and Keras to depict characters and backgrounds in detail for each scene. The AI makes contextually appropriate creative decisions to improve the quality of the illustrations. The output is visually rich comic-style content that reproduces the intended emotions.

[0364] Step 7:

[0365] The user downloads the final comic and checks its contents. The input is the completed comic file generated on the server. The user can save this to their device and enjoy the visualized memories. The output is the digital comic saved on the user's device.
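The integrity check and storage under a user-specific ID described in Step 2 can be sketched with the Python standard library. The table layout, the function names, and the use of a SHA-256 digest sent alongside the data are assumptions for this example; the source does not say how integrity is verified or which database is used.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical uploads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"      # "text" or "image"
        " sha256 TEXT NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return conn


def store_upload(
    conn: sqlite3.Connection, user_id: str, kind: str, payload: bytes, expected_sha256: str
) -> int:
    """Verify the digest sent by the terminal, then store the data under the user ID."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256:
        raise ValueError("integrity check failed")
    cursor = conn.execute(
        "INSERT INTO uploads (user_id, kind, sha256, payload) VALUES (?, ?, ?, ?)",
        (user_id, kind, actual, payload),
    )
    conn.commit()
    return cursor.lastrowid


def load_uploads(conn: sqlite3.Connection, user_id: str, kind: str) -> list:
    """Return stored payloads for one user, in upload order, for Steps 3 and 4."""
    rows = conn.execute(
        "SELECT payload FROM uploads WHERE user_id = ? AND kind = ? ORDER BY id",
        (user_id, kind),
    )
    return [bytes(row[0]) for row in rows]
```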

[0366] (Application Example 2)

[0367] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0368] Traditional technologies have had limitations in expressing users' memories and experiences in an emotionally rich and visually stimulating way. In particular, visualizing everyday episodes with genuine emotion is difficult, and the need for advanced skills on the part of the user to express them is a problem. Furthermore, the lack of mechanisms for easily sharing such content has prevented many users from creatively recording and sharing their experiences.

[0369] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0370] In this invention, the server includes means for receiving text data and still image data provided by the user, means for analyzing the text data to extract important information, and means for analyzing the still image data to identify visual elements. This makes it possible for users to generate comics that visualize their memories in an emotionally rich and faithful way, and to easily view and distribute them through information terminals.

[0371] A "user" is an entity that uses the system to input memories and generate emotionally rich, visualized content.

[0372] "Text data" refers to information in text format that users provide to the system, such as memories and experiences expressed in written form.

[0373] "Still image data" refers to image-format information provided by the user to the system, including visual content associated with text data.

[0374] "Visual elements" refer to features and emotions identified from still image data, and are elements that influence the composition and expression of visual content.

[0375] A "storyboard" is a blueprint for a visual story, generated based on important information and visual elements, and provides a framework for the comic.

[0376] "Comic" is a type of graphical content automatically generated based on storyboards, which expresses the user's memories and experiences in a rich and emotional way.

[0377] An "information terminal" refers to a device used by users to view and distribute generated content, and is the hardware on which applications run.

[0378] This invention is a system for users to visualize their memories in an emotionally rich way. The system is primarily implemented via a user terminal and a server.

[0379] The user first inputs text and still image data using an information terminal. This data is then transmitted to the server via a communication interface such as Bluetooth or Wi-Fi.

[0380] The server uses a natural language processing library (e.g., spaCy) to analyze the received text data. Based on the analysis results, it extracts emotions and important information contained in the text. Meanwhile, for still image data, an image analysis module (e.g., OpenCV) is used to extract visual elements. At this time, facial expressions and the atmosphere of the scene in the image are identified.

[0381] Next, the server generates a storyboard based on the extracted information. This storyboard serves as a framework for automatically generating comics using a generative AI model (for example, the DALL-E model).

[0382] The generated comics can be viewed by users on their information terminals and shared with others. Furthermore, emotion recognition technology enables rich emotional expression that faithfully reflects the user's experience.

[0383] For example, if a user enters memories of a holiday with their family, a comic strip will be generated that includes scenes of enjoyable conversations and beautiful scenery. An example of a prompt message would be, "I want to turn my fun family holiday memories into a comic strip. I want you to capture the moments of smiles and the beautiful scenery."

[0384] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0385] Step 1:

[0386] The user uses an information terminal to input text and still image data. The input data is temporarily stored on the terminal and waits until it is ready to be sent.

[0387] Step 2:

[0388] The terminal transmits the entered text data and still image data to the server via a communication interface. The server then receives the data and stores it in data storage for initial processing.

[0389] Step 3:

[0390] The server uses a natural language processing library to analyze the received text data and extract important information. The input is text data, and the output consists of sentiment tags and important keywords. The analysis process calculates the sentiment value of each word and phrase to grasp the overall tone of the text.

[0391] Step 4:

[0392] The server uses an image analysis module to analyze still image data and identify visual elements. The input is image data, and the output includes emotional features and object information within the image. The analysis process visually evaluates facial expressions and the atmosphere of a scene, and converts them into visualized data.

[0393] Step 5:

[0394] The server generates storyboards based on the extracted key information and visual elements. In this step, it connects the emotions in the text with the visual elements of the images to create the scene composition. The output is storyboard data showing the composition of each page of the comic.

[0395] Step 6:

[0396] The server automatically generates comics based on storyboards using a generative AI model. Storyboard data is provided as input, and the completed comic images are generated as output. The AI model automatically designs the visual content to reflect the expressed emotions.

[0397] Step 7:

[0398] Finally, the server sends the generated comic to the information terminal, making it available for users to view and share. This allows users to relive their memories in an emotionally rich, visualized form and share them with others.
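The storyboard data of Step 5, which shows the composition of each page, can be illustrated by grouping an ordered list of scenes into pages with a fixed number of panels. The function `paginate_panels` and the fixed panels-per-page layout are assumptions for this sketch; the source does not state how panels are distributed across pages.

```python
def paginate_panels(scenes: list, panels_per_page: int = 4) -> list:
    """Split ordered scenes into pages, each holding at most panels_per_page panels."""
    if panels_per_page < 1:
        raise ValueError("panels_per_page must be at least 1")
    return [
        scenes[start:start + panels_per_page]
        for start in range(0, len(scenes), panels_per_page)
    ]


pages = paginate_panels(
    ["Arrival", "Picnic", "Conversation", "Scenery", "Goodbye"],
    panels_per_page=2,
)
```

The last page may hold fewer panels than the others, which a layout step could use for a larger closing panel.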

[0399] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0400] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0401] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0402] [Third Embodiment]

[0403] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0404] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0405] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0406] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0407] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0408] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0409] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0410] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0411] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0412] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0413] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0414] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0415] The system according to the present invention consists of an application executed via the user's terminal and a server responsible for data processing and analysis. The purpose of this system is to enable users to easily save their memories in comic book format.

[0416] Users input and upload text and image data related to their memories through an interface on their device. The text data includes specific events and emotions, such as "memories of a family trip." The data specified by the user is sent to the server via the internet.

[0417] The server applies natural language processing algorithms to the received text data to extract key events, characters, and emotional expressions from the story. This provides crucial information for narrating the user's memories. Meanwhile, image data is analyzed using computer vision technology. This identifies visual elements within the image and extracts features of people, scenes, and emotions.

[0418] Based on these analysis results, the server generates a storyboard for the comic. This storyboard plans how each scene will be represented as a comic panel. Once the storyboard is established, the comic is automatically created using generative AI. During this process, an art style reflecting the analyzed visual elements and extracted emotions is applied, resulting in a comic that faithfully recreates the user's memories.

[0419] As a concrete example, when a user inputs memories of a mountain hike with friends into the system, the server extracts the hiking route and anecdotes about the friends from the text, and analyzes the scenery and people in group photos from the pictures. As a result, the user can receive a comic strip focusing on key points such as "a group photo at the summit" and "interesting conversations along the way." This comic strip is provided in digital format, and the user can download and share it from their device.

[0420] As described above, the present invention facilitates the visual preservation of memories and realizes a form that is easy for general users without technical expertise to use.

[0421] The following describes the processing flow.

[0422] Step 1:

[0423] The user uses their device to input memorable episodes as text data and selects and uploads related image data. The user then clicks a button to send this data, preparing it to be sent to the server.

[0424] Step 2:

[0425] The terminal packages the text data entered by the user and the image data uploaded, and sends it to the server via the internet. The data is encrypted and transmitted to ensure security.

[0426] Step 3:

[0427] The server decrypts the received data and stores it appropriately. It verifies that the data format and size are correct and returns an error message to the user if there are any problems. If the verification succeeds, it starts the analysis process.

[0428] Step 4:

[0429] The server uses natural language processing algorithms to analyze text data. While understanding the context, it extracts important information such as key events, relevant characters, and emotional expressions. This information is then used to generate the story.

[0430] Step 5:

[0431] The server applies computer vision technology to analyze image data. It recognizes features of people, objects, and backgrounds within the image, and identifies visual elements. The obtained information is integrated with text data to assist in the generation of storyboards.

[0432] Step 6:

[0433] The server generates a storyboard based on the analysis results. It plans how to illustrate each scene in comic form, taking into account the flow of the episode and its climax. It creates a structure that effectively conveys the user's memories, considering panel layouts and visual effects.
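One way to picture the storyboard of Step 6 is as an ordered list of panels in which the climax receives a larger panel. The sketch below is illustrative only: the `Scene` and `Panel` types, the intensity score, and the rule that the most intense scene becomes a full-width panel are assumptions, not details taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    description: str
    intensity: float  # assumed emotional-intensity score, 0.0 to 1.0


@dataclass
class Panel:
    order: int
    description: str
    width: str  # "full" for the climax, "half" otherwise


def build_storyboard(scenes: list[Scene]) -> list[Panel]:
    """Keep scenes in narrative order and enlarge the climax panel."""
    if not scenes:
        return []
    # The first scene with the highest intensity is treated as the climax.
    climax = max(range(len(scenes)), key=lambda i: scenes[i].intensity)
    return [
        Panel(order=i + 1, description=s.description,
              width="full" if i == climax else "half")
        for i, s in enumerate(scenes)
    ]
```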

[0434] Step 7:

[0435] The server initiates the process of drawing the comic using a generative AI, following the storyboard. Visual elements derived from images and the text storyline are integrated, and an art style is applied. This results in a comic that faithfully reflects the memories.

[0436] Step 8:

[0437] The server converts the completed manga into a distributable file format and prepares it for delivery to the user. It makes the file available for download, for example via a link, and sends a notification to the user when generation is complete.

[0438] Step 9:

[0439] Users receive notifications and download or view comics generated on their devices. This allows users to enjoy content that effectively visualizes their memories.

[0440] (Example 1)

[0441] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0442] There is a need for a way for users to visually preserve their memories and experiences and easily share them with others. However, there is a lack of means for ordinary users to generate professional-quality visual works without requiring advanced technology or a great deal of time. Furthermore, extracting the flow of a story and emotions from digital data containing a wealth of information and accurately reproducing them as a visual work is a difficult challenge.

[0443] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0444] In this invention, the server includes means for receiving digital data, means for analyzing the digital data and extracting key information, and means for generating a narrative structure. This makes it possible for users to easily create and share works that visually enrich their memories, even without technical knowledge.

[0445] "Digital data" refers to electronically stored information that a user provides to a system, and includes data in various formats, such as text and images.

[0446] "Key information" refers to information extracted from digital data that is central to the story, such as events, characters, and emotional expressions.

[0447] "Narrative structure" refers to the framework of the flow and development of a story, designed based on key information.

[0448] "Visual works" are visual content in the form of comics or illustrations that express the user's memories, automatically generated based on a narrative structure.

[0449] A "server" refers to a computing device that receives digital data and performs analysis and generation processes.

[0450] This invention is a system that generates visual works based on information provided by the user. The user inputs their memories as digital data using a terminal. Specifically, this digital data consists of text and images, and could include text data in the form of "memories of a family trip" or image data taken during the trip.

[0451] The user's device transmits the input digital data to the server via the internet. The server applies natural language processing (NLP) algorithms to the text data. Specifically, it uses NLP libraries and models (e.g., spaCy and BERT) to extract key events, characters, and emotional expressions from the text. The server also analyzes image data using computer vision techniques, identifying visual elements within the image using libraries such as OpenCV and TensorFlow.

[0452] The server generates a narrative structure based on the key information obtained from these analyses, and then generates a comic as a visual work. In this process, a generative AI model is used, which takes prompt sentences as input and applies art styles to automatically generate the work. For example, prompt sentences such as "highlight scenes from a family trip" or "memories of hiking in the mountains with friends" are used. As a result, users can easily obtain visually rich works of art without any technical knowledge.

[0453] The completed visual works are provided to users in digital format, which can be downloaded to their devices and easily shared with others. This system aims to improve the user experience by automating everything from receiving and analyzing digital data to generating visual works.

[0454] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0455] Step 1:

[0456] The user inputs digital data using a device. Specifically, the user inputs text data about their memories (e.g., "Memories of a family trip") and uploads corresponding image data using the file selection function. This digital data becomes the input data sent to the next step.

[0457] Step 2:

[0458] The user's device transmits the entered digital data to the server. Here, the data is encrypted over the internet and securely uploaded to the server. The device's role is to properly prepare the data and ensure its reliable transfer to the server.

[0459] Step 3:

[0460] The server applies a natural language processing algorithm to the received text data. In this process, the server uses an NLP library or model (e.g., spaCy or BERT) to analyze the text data and extract key information that forms the core of the story. The output of this process is information about the events, characters, and emotional expressions of the story.
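The shape of the output of this step (events, characters, and emotional expressions) can be illustrated with a small rule-based stand-in. The sketch deliberately does not call spaCy or BERT; the hand-written word lists, the `KeyInfo` type, and the heuristic that a capitalized word inside a sentence is a character name are assumptions for illustration and would be replaced by a trained model in practice.

```python
import re
from dataclasses import dataclass, field

# Tiny hand-written lexicons standing in for a trained NLP model (assumption).
EMOTION_WORDS = {"happy": "joy", "fun": "joy", "laughed": "joy",
                 "sad": "sadness", "scared": "fear"}
EVENT_WORDS = {"hiked", "climbed", "arrived", "ate", "visited"}


@dataclass
class KeyInfo:
    events: list[str] = field(default_factory=list)
    characters: list[str] = field(default_factory=list)
    emotions: list[str] = field(default_factory=list)


def extract_key_info(text: str) -> KeyInfo:
    info = KeyInfo()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+", sentence)
        lowered = [w.lower() for w in words]
        if any(w in EVENT_WORDS for w in lowered):
            info.events.append(sentence)
        # Naive heuristic: capitalized words after the first word are names.
        for w in words[1:]:
            if w[0].isupper() and w not in info.characters:
                info.characters.append(w)
        for w in lowered:
            emotion = EMOTION_WORDS.get(w)
            if emotion and emotion not in info.emotions:
                info.emotions.append(emotion)
    return info
```

The name heuristic is intentionally crude (it would also pick up the pronoun "I"), which is one reason a named entity recognition model is the more plausible choice in a real system.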

[0461] Step 4:

[0462] The server analyzes image data using computer vision technology. During this process, the server utilizes libraries such as OpenCV and TensorFlow to identify visual elements within the image and extract features such as people and scenes. The output obtained from this process represents the primary visual information expressed in the image.

[0463] Step 5:

[0464] The server combines information extracted from text and images to generate a narrative structure. Based on the key information obtained, the server designs the narrative flow and scene layout of the visual work, and this plan is output.

[0465] Step 6:

[0466] The server automatically generates visual works using a generative AI model. In this process, the AI applies an art style based on a prompt (e.g., "highlight scenes from a family trip") to generate a visual work that reflects the user's memories. The output of this process is a completed visual work in digital format.
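How a prompt for the generative AI model might be assembled from the narrative structure can be sketched as below. The prompt layout, the `build_prompt` function, and its parameters are hypothetical; the disclosure gives only example prompt sentences and does not define a prompt format.

```python
def build_prompt(theme: str, scenes: list[str], emotions: list[str],
                 art_style: str) -> str:
    """Assemble a text prompt for an image-generation model (format is assumed)."""
    lines = [f"Create a comic about: {theme}.", f"Art style: {art_style}."]
    if emotions:
        lines.append("Overall mood: " + ", ".join(emotions) + ".")
    # One line per planned panel, in narrative order.
    for number, scene in enumerate(scenes, start=1):
        lines.append(f"Panel {number}: {scene}")
    return "\n".join(lines)
```

For instance, `build_prompt("highlight scenes from a family trip", ["arrival at the beach", "sunset dinner"], ["joy"], "soft watercolor")` yields a five-line prompt whose last line is `Panel 2: sunset dinner`.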

[0467] Step 7:

[0468] The server sends the generated visual artwork to the user's device. The user can then view, download, and share the visual artwork on their device. This final output visually preserves the user's memories.

[0469] (Application Example 1)

[0470] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0471] In modern society, many individuals want to preserve and share their daily memories, but there is a lack of visual means to do so. Furthermore, there is a need for a method that is easy to use even for general users who are not tech-savvy and that can automatically visualize memories. In this context, the challenge is to provide a way for individual users to more intuitively record their daily lives and visually enjoy their personal memories.

[0472] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0473] In this invention, the server includes a device for receiving text and image information provided by the user, a device for analyzing the text information and extracting important information, and a device for analyzing the image information and identifying visual elements. This makes it possible for users without specialized skills to easily record memories of their daily lives using a device in their home and to automatically generate comics based on that data.

[0474] A "user" refers to an individual who uses the system to record and visually preserve memories.

[0475] "Textual information" refers to text or data in text format provided by the user.

[0476] "Image information" refers to data in the form of photographs or graphics provided by the user.

[0477] "Device" refers to a configuration of hardware and software used to perform processes such as receiving, analyzing, and generating information.

[0478] "Important information" refers to the core data and concepts of a story, extracted by analyzing textual information.

[0479] "Visual elements" refer to the constituent elements and features within an image that are identified through the analysis of image information.

[0480] "Scene composition" refers to the structure of each scene in a manga, planned based on important information and visual elements.

[0481] "Manga" refers to a type of artwork that uses automatically generated images to visually represent a user's memories and develop them into a story.

[0482] A "household automated device" refers to an autonomously operating device that records a user's daily life and provides services tailored to the user's needs based on the recorded data.

[0483] One embodiment of this invention provides a novel method for users to save memories as comics. First, the user uses a data input terminal to input and transmit textual and image information related to the memories. This data is then sent to a server in the cloud via the internet.

[0484] The server uses a natural language processing program to analyze textual information. This program can utilize natural language processing libraries such as Hugging Face Transformers to extract important storylines, characters, and emotional expressions from the provided textual information. In parallel, image information is analyzed using computer vision technologies such as OpenCV to identify visual elements. This analysis makes it possible to extract features such as people, scenes, and backgrounds.

[0485] Based on the extracted textual information and visual elements, the server generates a scene structure and plans how to represent the comic in each scene. A generative AI model uses this information to automatically create the comic, resulting in a visually engaging and emotionally resonant story that faithfully reflects the user's memories. Finally, the generated comic is provided to the user in digital format, which they can download from their device or share.

[0486] As a concrete example, a user might input a request into the device such as, "I'd like you to turn my memories of last week's family mountain climbing trip into a comic strip." An example of a prompt to the generative AI model in this case might be, "Please create a touching and fun story based on the user's memories of mountain climbing." This allows the user to visually revisit their memories and preserve them in an emotionally rich way.

[0487] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0488] Step 1:

[0489] The terminal accepts text and image information from the user as input. This allows the user to select and send text and images related to specific memories. The output from the terminal is the text and image files selected by the user.

[0490] Step 2:

[0491] The server performs natural language processing on text information received from the terminal. This processing uses the Hugging Face Transformers' NER (Named Entity Recognition) model to extract important storylines and emotional expressions from the text. The output is a data structure containing this important information.

[0492] Step 3:

[0493] The server performs computer vision analysis using the received image information as input. Using the OpenCV library, it identifies visual features such as people, backgrounds, and scenes within the image. The analysis results are output as visual elements of the image information.

[0494] Step 4:

[0495] The server uses the key information extracted in step 2 and the visual elements from step 3 as input to generate a scene composition. This is a plan for turning the user's memories into a story and determines how to materialize each scene of the comic. The output is a storyboard that describes the scenes in detail.

[0496] Step 5:

[0497] The server uses the generated storyboard as input to run an AI model that automatically generates the final comic. The AI model used here reflects the specified art style and the user's emotions. The output is a comic in digital format.

[0498] Step 6:

[0499] Users download the manga generated by the server to their devices and then view or share it. This final output is provided as an electronic data file, allowing users to visually enjoy their memories at any time.

[0500] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0501] The system according to the present invention aims to visualize user-inputted memories in a comic book format and generate content that is more faithful to the user's emotions by using an emotion engine. The system consists of an application running on the user's terminal and a server.

[0502] Users input and upload text-based memory stories and related photos through their devices. Once user input is complete, the device sends this data to the server. At this point, the emotion engine receives the text and image data and begins analyzing them.

[0503] The emotion engine on the server analyzes the user's text using natural language processing to extract emotional elements. Simultaneously, it performs visual emotion analysis on image data to identify emotional elements in scenes and facial expressions. This allows the system to clearly understand the emotions the user intended.

[0504] The server uses the extracted emotional information to generate storyboards. In the storyboards, scenes are selected and structured according to the emotional elements, and the flow of the episode is expressed with rich emotion. Once the storyboards are complete, the generative AI draws the comic based on them. Because the results of the emotion engine are reflected, the characters' expressions and the atmosphere of the scenes match the user's experience.

[0505] For example, if a user enters memories of a fun trip with friends, the emotion engine extracts emotions such as "fun" and "friendship" from the text and images. The server creates an emotion-infused storyboard and generates a comic that vividly portrays the highlights of the trip and intimate moments with friends. Finally, the user can download the comic and enjoy a work that vividly visualizes their memories.

[0506] Thus, the system of the present invention, through emotion recognition, realizes a more faithful and emotionally resonant comic adaptation of memories, providing users with a richer experience.

[0507] The following describes the processing flow.

[0508] Step 1:

[0509] Users use their devices to input text and image data related to their memories and prepare to send them. First, they enter a story about their memory in text, and then they select and upload related photos.

[0510] Step 2:

[0511] The terminal bundles the data entered by the user and sends it to the server via the internet. During transmission, the data's integrity is checked, and encryption is applied to ensure security.
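The bundling and integrity check in Step 2 could look like the following sketch. It assumes a JSON envelope with a SHA-256 digest, which is an illustrative choice and not stated in the source; encryption is assumed to be provided by the transport layer (for example TLS) and is not shown. The function names are hypothetical.

```python
import base64
import hashlib
import json


def package_upload(text: str, images: list[bytes]) -> bytes:
    """Bundle text and images into one JSON payload with a SHA-256 digest."""
    body = {
        "text": text,
        "images": [base64.b64encode(img).decode("ascii") for img in images],
    }
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    envelope = {"body": body, "sha256": hashlib.sha256(canonical).hexdigest()}
    return json.dumps(envelope).encode("utf-8")


def unpack_upload(payload: bytes) -> tuple[str, list[bytes]]:
    """Verify the digest and return the original text and images."""
    envelope = json.loads(payload.decode("utf-8"))
    canonical = json.dumps(envelope["body"], sort_keys=True).encode("utf-8")
    if hashlib.sha256(canonical).hexdigest() != envelope["sha256"]:
        raise ValueError("payload failed integrity check")
    body = envelope["body"]
    return body["text"], [base64.b64decode(img) for img in body["images"]]
```

A plain digest of this kind detects accidental corruption only; protection against deliberate tampering would additionally require a keyed construction such as an HMAC or reliance on the encrypted channel.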

[0512] Step 3:

[0513] The server first passes the received data to the emotion engine to begin analysis. The text data is analyzed using natural language processing techniques to identify emotional keywords and phrases within the text and determine the associated emotions.

[0514] Step 4:

[0515] The server uses an emotion engine to analyze image data, identifying facial expressions and the emotions of the scene from the images. Based on visual features, it determines what kind of emotion is being expressed.

[0516] Step 5:

[0517] The server generates a storyboard based on emotional information extracted by the emotion engine. It selects and arranges the scenes that make up the episode's narrative and designs how each scene is presented so that it conveys rich emotion.

[0518] Step 6:

[0519] The server uses a generative AI to automatically generate the comic from the storyboard. In this process, visual elements such as facial expressions and backgrounds are rendered so that the characters and scenes faithfully reproduce the extracted emotions.

[0520] Step 7:

[0521] The server creates the completed manga as a digital file and prepares it for user access. It also notifies the user that the manga is complete.

[0522] Step 8:

[0523] Users receive notifications and download or view the generated comics via their devices. Users can then enjoy their own memories as comics expressed with rich emotion.

[0524] (Example 2)

[0525] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0526] Traditional content generation systems have struggled to emotionally recreate user experiences, often resulting in visualized content that fails to meet user expectations. Such systems merely visualize data without faithfully reflecting emotional nuances or complex feelings. Consequently, users often lack content that accurately visualizes their memories, making it difficult to enjoy a comfortable experience.

[0527] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0528] In this invention, the server includes means for receiving a data format for a user to provide information, means for performing natural language processing on the data format and obtaining information including emotions, and means for performing analysis on visual information and identifying special elements associated with emotions. This enables the automatic generation of emotionally rich content that is faithful to the user's experience.

[0529] "Data formats for users to provide information" refers to the format of digital data, such as text and images, that users use to record their own experiences and memories.

[0530] "Natural language processing" refers to the technology that enables computers to understand and process human language, particularly the process of extracting information from text and analyzing its meaning.

[0531] "Information containing emotions" refers to information that indicates elements related to emotions such as joy and sadness, extracted from data provided by the user.

[0532] "Visual information" refers to digital data that is presented visually, such as images and videos provided by the user.

[0533] "Special elements related to emotions" are elements extracted from visual information that represent emotional characteristics related to the scene or facial expression.

[0534] "Methods for generating scenarios" refer to techniques for constructing the flow and arrangement of a story based on extracted emotional information and special elements.

[0535] "Methods for automatically generating illustrations" refer to technologies that automatically create visual content such as images and comics based on a generated scenario.

[0536] The system according to this invention is a platform for visualizing the user's experience in an emotionally rich way using the user's digital data. The system mainly consists of an application that runs on the user's terminal and a server located remotely.

[0537] Users launch a web browser or dedicated application on their device, enter a memorable episode as text, and select related photos. JPEG images and plain text files are commonly used as data formats for collecting this information. Once the user has finished entering the information, the device sends this data to the server via the internet.

[0538] The server utilizes natural language processing (NLP) techniques to analyze the received text data. Specifically, open-source NLP libraries such as spaCy and NLTK are used to extract information about emotions and storylines from the text. Simultaneously, the server uses OpenCV and TensorFlow to perform visual analysis of image data, identifying visual features and emotional elements.

[0539] By integrating these two analysis results, the system generates a storyboard that reflects the emotions intended by the user. Based on this storyboard, a generative AI model automatically draws the comic. The generation process utilizes existing AI frameworks, such as PyTorch and Keras, to maximize the quality of the illustrations and the accuracy of emotional expression.

[0540] As a concrete example, if a user wants to visualize "memories of a summer trip with friends," they might enter a prompt like this: "I want to turn my memories of a summer beach trip with friends into a comic strip. The photo is a group shot taken on the beach. Please visualize this fun moment, emphasizing friendship and enjoyment." Based on this prompt, the system can provide emotionally rich visual content that meets the user's expectations.

[0541] Ultimately, users can download the generated comics in digital format and visually enjoy the memories and experiences recreated through the system. This makes it possible to deliver works that vividly and emotionally reconstruct memories to users.

[0542] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0543] Step 1:

[0544] The user uses an application on their device to input their memories as text data and upload related photos. The input in this step consists of text and image data provided by the user. The device formats this data and prepares it for transmission to the server. Specifically, it checks the encoding of the text file and converts the image data to JPEG or PNG format so that it can be sent to the server in the correct format.
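The text-encoding check mentioned in this step might be sketched as follows. The fallback order (UTF-8 with or without a byte order mark, then Shift_JIS) and the function name `decode_text` are assumptions for illustration; the source says only that the encoding is checked. Image conversion is not shown because it would require an imaging library.

```python
def decode_text(raw: bytes) -> str:
    """Decode uploaded text as UTF-8, falling back to Shift_JIS (assumed order)."""
    # "utf-8-sig" accepts UTF-8 with or without a leading byte order mark.
    for encoding in ("utf-8-sig", "shift_jis"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("unsupported text encoding")
```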

[0545] Step 2:

[0546] The terminal sends the data entered by the user to the server. The input consists of text and image data formatted in step 1. The server receives this data and verifies its integrity. It also manages the data by storing it in a database with a user-specific ID. The output is data stored in a parseable format.
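Storing the received data under a user-specific ID could be sketched with an in-memory SQLite database as below. The table layout, column names, and the use of a random upload identifier are assumptions made for the example; the source does not describe the database schema.

```python
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads (upload_id TEXT PRIMARY KEY, "
        "user_id TEXT NOT NULL, memory_text TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE images (upload_id TEXT NOT NULL, "
        "position INTEGER NOT NULL, data BLOB NOT NULL)"
    )
    return conn


def store_upload(conn: sqlite3.Connection, user_id: str, text: str,
                 images: list[bytes]) -> str:
    """Store one upload and return its generated identifier."""
    upload_id = uuid.uuid4().hex
    with conn:  # commit all inserts together, or roll back on error
        conn.execute("INSERT INTO uploads VALUES (?, ?, ?)",
                     (upload_id, user_id, text))
        conn.executemany(
            "INSERT INTO images VALUES (?, ?, ?)",
            [(upload_id, i, img) for i, img in enumerate(images)],
        )
    return upload_id
```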

[0547] Step 3:

[0548] The server analyzes the received text data using natural language processing techniques. The input is text data stored in a database. Specifically, libraries such as spaCy and NLTK are used to extract keywords and phrases that indicate emotion. Through this analysis, emotion categories and important events within the text are identified, and an emotion profile is generated based on these. The output is the emotion profile.
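An emotion profile of the kind described here can be pictured as the share of each emotion category among the emotion-bearing words in the text. The sketch below uses a small hand-written lexicon in place of spaCy or NLTK; the lexicon entries, the category names, and the `emotion_profile` function are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative lexicon; a real system would use a trained model
# or a curated resource instead of this hand-written mapping.
EMOTION_LEXICON = {
    "fun": "joy", "happy": "joy", "smiled": "joy",
    "missed": "sadness", "cried": "sadness",
    "nervous": "fear", "scared": "fear",
}


def emotion_profile(text: str) -> dict[str, float]:
    """Return the share of each emotion category among matched words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}
```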

[0549] Step 4:

[0550] The server visually analyzes image data. The input is image data from a database. Using OpenCV and TensorFlow, it analyzes facial expressions of people and scene features within the images, identifying visual emotional elements. During this process, it recognizes emotional changes and location from frame to frame. The output is the result of the visual emotion analysis.

[0551] Step 5:

[0552] The server integrates sentiment information extracted from text and images. The input is the output of steps 3 and 4. This is the process of combining each dataset to generate a storyboard that reflects the user's intended emotions. This storyboard includes the flow of emotions and key points of the narrative. The output is the generated storyboard.
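The integration of text-derived and image-derived emotion information could, for example, be a weighted average of per-emotion scores. This is one possible approach rather than the method of the disclosure; the weighting scheme, the default weight of 0.5, and the function names are assumptions.

```python
from typing import Optional


def fuse_emotions(text_scores: dict[str, float],
                  image_scores: dict[str, float],
                  text_weight: float = 0.5) -> dict[str, float]:
    """Weighted average of per-emotion scores; a missing emotion counts as 0."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    emotions = sorted(set(text_scores) | set(image_scores))
    return {
        e: text_weight * text_scores.get(e, 0.0)
        + (1.0 - text_weight) * image_scores.get(e, 0.0)
        for e in emotions
    }


def dominant_emotion(scores: dict[str, float]) -> Optional[str]:
    """Return the emotion with the highest fused score, if any."""
    return max(scores, key=scores.get) if scores else None
```

The dominant emotion could then guide the tone of the storyboard, while the full score dictionary preserves secondary emotions for individual scenes.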

[0553] Step 6:

[0554] The generative AI model automatically generates comics based on storyboards. The input is a completed storyboard. Specifically, it utilizes PyTorch and Keras to depict characters and backgrounds in detail for each scene. The AI makes contextually appropriate creative decisions to improve the quality of the illustrations. The output is visually rich comic-style content that reproduces the user's emotions.

[0555] Step 7:

[0556] The user downloads the final comic and checks its contents. The input is the completed comic file generated on the server. The user can save this to their device and enjoy the visualized memories. The output is the digital comic saved on the user's device.

[0557] (Application Example 2)

[0558] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0559] Traditional technologies have had limitations in expressing users' memories and experiences in an emotionally rich and visually stimulating way. In particular, visualizing everyday episodes with genuine emotion is difficult, and the need for advanced skills on the part of the user to express them is a problem. Furthermore, the lack of mechanisms for easily sharing such content has prevented many users from creatively recording and sharing their experiences.

[0560] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0561] In this invention, the server includes means for receiving text data and still image data provided by the user, means for analyzing the text data to extract important information, and means for analyzing the still image data to identify visual elements. This makes it possible for users to generate comics that visualize their memories in an emotionally rich and faithful way, and to easily view and distribute them through information terminals.

[0562] A "user" is an entity that uses the system to input memories and generate emotionally rich, visualized content.

[0563] "Text data" refers to information in text format that users provide to the system, such as memories and experiences expressed in written form.

[0564] "Still image data" refers to image-format information provided by the user to the system, including visual content associated with text data.

[0565] "Visual elements" refer to features and emotions identified from still image data, and are elements that influence the composition and expression of visual content.

[0566] A "storyboard" is a blueprint for a visual story, generated based on important information and visual elements, and provides a framework for the comic.

[0567] "Manga" is a type of graphical content automatically generated based on storyboards, which expresses the user's memories and experiences in a rich and emotional way.

[0568] An "information terminal" refers to a device used by users to view and distribute generated content, and is the hardware on which applications run.

[0569] This invention is a system for users to visualize their memories in an emotionally rich way. The system is primarily implemented via a user terminal and a server.

[0570] The user first inputs text and still image data using an information terminal. This data is then transmitted to the server via a communication interface such as Bluetooth or Wi-Fi.

[0571] The server uses a natural language processing library (e.g., spaCy) to analyze the received text data. Based on the analysis results, it extracts emotions and important information contained in the text. Meanwhile, for still image data, an image analysis module (e.g., OpenCV) is used to extract visual elements. At this time, facial expressions and the atmosphere of the scene in the image are identified.

[0572] Next, the server generates a storyboard based on the extracted information. This storyboard serves as a framework for automatically generating comics using a generative AI model (for example, the DALL-E model).

[0573] The generated comics can be viewed by users on their information terminals and shared with others. Furthermore, emotion recognition technology enables rich emotional expression that faithfully reflects the user's experience.

[0574] For example, if a user enters memories of a holiday with their family, a comic strip will be generated that includes scenes of enjoyable conversations and beautiful scenery. An example of a prompt message would be, "I want to turn my fun family holiday memories into a comic strip. I want you to capture the moments of smiles and the beautiful scenery."

[0575] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0576] Step 1:

[0577] The user uses an information terminal to input text and still image data. The input data is temporarily stored on the terminal and waits until it is ready to be sent.

[0578] Step 2:

[0579] The terminal transmits the entered text data and still image data to the server via a communication interface. The server then receives the data and stores it in data storage for initial processing.

[0580] Step 3:

[0581] The server uses a natural language processing library to analyze received text data and extract important information. The input is text data, and the output consists of sentiment tags and important keywords. The analysis process calculates the sentiment value of each word and phrase to grasp the overall tone of the text.
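The calculation of per-word sentiment values and an overall tone can be sketched as follows. The word scores, the thresholds of plus or minus 0.1, and the output field names are hypothetical values chosen for the example and do not come from the source or from any published lexicon.

```python
import re

# Illustrative word scores in [-1, 1]; not taken from any published lexicon.
WORD_SENTIMENT = {"fun": 0.8, "beautiful": 0.7, "smile": 0.6,
                  "tired": -0.3, "lost": -0.6}


def analyze_tone(text: str) -> dict:
    """Score known words, then summarize the overall tone of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scored = [(w, WORD_SENTIMENT[w]) for w in words if w in WORD_SENTIMENT]
    mean = sum(score for _, score in scored) / len(scored) if scored else 0.0
    if mean > 0.1:
        tag = "positive"
    elif mean < -0.1:
        tag = "negative"
    else:
        tag = "neutral"
    return {"sentiment_tag": tag,
            "keywords": [w for w, _ in scored],
            "mean_score": mean}
```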

[0582] Step 4:

[0583] The server uses an image analysis module to analyze still image data and identify visual elements. The input is image data, and the output includes emotional features and object information within the image. The analysis process visually evaluates facial expressions and the atmosphere of a scene, and converts them into visualized data.

[0584] Step 5:

[0585] The server generates storyboards based on the extracted key information and visual elements. In this step, it connects the emotions in the text with the visual elements of the images to create the scene composition. The output is storyboard data showing the composition of each page of the manga.

[0586] Step 6:

[0587] The server automatically generates comics based on storyboards using a generative AI model. Storyboard data is provided as input, and the completed comic images are generated as output. The AI model automatically designs the visual content to reflect the expressed emotions.

[0588] Step 7:

[0589] Finally, the server sends the generated comic to the information terminal, making it available for users to view and share. This allows users to relive their memories in an emotionally rich, visualized form and share them with others.

[0590] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0591] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0592] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0593] [Fourth Embodiment]

[0594] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0595] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0596] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0597] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0598] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0599] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0600] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0601] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0602] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0603] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0604] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0605] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0606] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0607] The system according to the present invention consists of an application executed via the user's terminal and a server responsible for data processing and analysis. The purpose of this system is to enable users to easily save their memories in comic book format.

[0608] Users input and upload text and image data related to their memories through an interface on their device. The text data includes specific events and emotions, such as "memories of a family trip." The data specified by the user is sent to the server via the internet.

[0609] The server applies natural language processing algorithms to the received text data to extract key events, characters, and emotional expressions from the story. This provides crucial information for narrating the user's memories. Meanwhile, image data is analyzed using computer vision technology. This identifies visual elements within the image and extracts features of people, scenes, and emotions.

[0610] Based on these analysis results, the server generates a storyboard for the manga. This storyboard plans how each scene will be represented as a manga panel. Once the storyboard is established, the manga is automatically created using a generative AI. During this process, an art style reflecting the analyzed visual elements and extracted emotions is applied, resulting in a manga that faithfully recreates the user's memories.

[0611] As a concrete example, when a user inputs memories of a mountain hike with friends into the system, the server extracts the hiking route and anecdotes about the friends from the text, and analyzes the scenery and people in group photos from the pictures. As a result, the user can receive a comic strip focusing on key points such as "a group photo at the summit" and "interesting conversations along the way." This comic strip is provided in digital format, and the user can download and share it from their device.

[0612] As described above, the present invention facilitates the visual preservation of memories and realizes a form that is easy for general users without technical expertise to use.

[0613] The following describes the processing flow.

[0614] Step 1:

[0615] The user uses their device to input memorable episodes as text data and selects and uploads related image data. The user then clicks a button to send this data, preparing it to be sent to the server.

[0616] Step 2:

[0617] The terminal packages the text data entered by the user and the image data uploaded, and sends it to the server via the internet. The data is encrypted and transmitted to ensure security.

[0618] Step 3:

[0619] The server decrypts the received data and stores it. It verifies that the data format and size are acceptable and returns an error message to the user if there is a problem. If verification succeeds, it starts the analysis process.
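
The verification in Step 3 could be sketched as follows. This is a minimal illustration only: the `Upload` type, the accepted MIME types, and the size limits are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass

# Hypothetical limits; the disclosure does not state formats or sizes.
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024
MAX_TEXT_CHARS = 20_000


@dataclass
class Upload:
    text: str
    images: list[tuple[str, bytes]]  # (MIME type, raw bytes)


def validate_upload(upload: Upload) -> list[str]:
    """Return error messages; an empty list means the upload passed."""
    errors = []
    if not upload.text.strip():
        errors.append("text is empty")
    elif len(upload.text) > MAX_TEXT_CHARS:
        errors.append("text is too long")
    for index, (mime, data) in enumerate(upload.images):
        if mime not in ALLOWED_IMAGE_TYPES:
            errors.append(f"image {index}: unsupported type {mime}")
        if len(data) > MAX_IMAGE_BYTES:
            errors.append(f"image {index}: file too large")
    return errors
```

If the returned list is empty, the server would proceed to analysis; otherwise the messages could be returned to the terminal as the error response.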

[0620] Step 4:

[0621] The server uses natural language processing algorithms to analyze text data. While understanding the context, it extracts important information such as key events, relevant characters, and emotional expressions. This information is then used to generate the story.
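
As a rough illustration of the shape of the Step 4 output, the sketch below substitutes a simple keyword lookup for a real natural language processing model. The lexicon, the `KeyInfo` type, and the `known_names` parameter are assumptions made for this example and are not part of this disclosure.

```python
import re
from dataclasses import dataclass, field

# Hypothetical emotion lexicon; a real system would use a trained NLP model.
EMOTION_WORDS = {
    "happy": "joy", "fun": "joy", "laughed": "joy",
    "sad": "sadness", "cried": "sadness",
    "scared": "fear", "nervous": "fear",
}


@dataclass
class KeyInfo:
    events: list[str] = field(default_factory=list)
    characters: set[str] = field(default_factory=set)
    emotions: set[str] = field(default_factory=set)


def extract_key_info(text: str, known_names: set[str]) -> KeyInfo:
    """Treat each sentence as one event; tag names and emotions by lookup."""
    info = KeyInfo()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        info.events.append(sentence)
        for word in re.findall(r"[A-Za-z']+", sentence):
            if word in known_names:
                info.characters.add(word)
            emotion = EMOTION_WORDS.get(word.lower())
            if emotion:
                info.emotions.add(emotion)
    return info
```

The lookup captures none of the contextual understanding described in Step 4; it only shows how events, characters, and emotional expressions might be carried forward as one structure.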

[0622] Step 5:

[0623] The server applies computer vision technology to analyze image data. It recognizes features of people, objects, and backgrounds within the image, and identifies visual elements. The obtained information is integrated with text data to assist in the generation of storyboards.

[0624] Step 6:

[0625] The server generates a storyboard based on the analysis results. It plans how to illustrate each scene in comic form, taking into account the flow of the episode and its climax. It creates a structure that effectively conveys the user's memories, considering panel layouts and visual effects.
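
One possible in-memory form for such a storyboard is sketched below. The `Panel` fields and the idea of keying visual elements by event index are assumptions for illustration; the disclosure does not define a storyboard data format.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    index: int
    caption: str              # event text the panel should depict
    visual_hints: list[str]   # visual elements from image analysis
    is_climax: bool = False   # marks the panel to emphasize


def build_storyboard(
    events: list[str],
    visuals_by_event: dict[int, list[str]],
    climax_index: int,
) -> list[Panel]:
    """Create one panel per event, in order, and flag the climax panel."""
    return [
        Panel(
            index=i,
            caption=event,
            visual_hints=visuals_by_event.get(i, []),
            is_climax=(i == climax_index),
        )
        for i, event in enumerate(events)
    ]
```

For example, `build_storyboard(["Start hike", "Reach summit"], {1: ["group photo"]}, 1)` would yield two panels with the second flagged as the climax.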

[0626] Step 7:

[0627] The server initiates the process of drawing the comic using a generative AI, following the storyboard. Visual elements derived from images and the text storyline are integrated, and an art style is applied. This results in a comic that faithfully reflects the memories.

[0628] Step 8:

[0629] The server converts the completed manga into a file and prepares it for distribution to the user. It makes the manga available for download via a link or as a file, and sends a notification to the user when the generation is complete.

[0630] Step 9:

[0631] Users receive the notification and download or view the generated comic on their devices. This allows users to enjoy content that effectively visualizes their memories.

[0632] (Example 1)

[0633] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0634] There is a need for a way for users to visually preserve their memories and experiences and easily share them with others. However, there is a lack of means for ordinary users to generate professional-quality visual works without requiring advanced technology or a great deal of time. Furthermore, extracting the flow of a story and emotions from digital data containing a wealth of information and accurately reproducing them as a visual work is a difficult challenge.

[0635] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0636] In this invention, the server includes means for receiving digital data, means for analyzing the digital data and extracting key information, and means for generating a narrative structure. This makes it possible for users to easily create and share works that visually enrich their memories, even without technical knowledge.

[0637] "Digital data" refers to electronically stored information that a user provides to a system, and includes data in various formats, such as text and images.

[0638] "Key information" refers to information extracted from digital data that is central to the story, such as events, characters, and emotional expressions.

[0639] "Narrative structure" refers to the framework of the flow and development of a story, designed based on key information.

[0640] "Visual works" are visual content in the form of comics or illustrations that express the user's memories, automatically generated based on a narrative structure.

[0641] A "server" refers to a computing device that receives digital data and performs analysis and generation processes.

[0642] This invention is a system that generates visual works based on information provided by the user. The user inputs their memories as digital data using a terminal. Specifically, this digital data consists of text and images, and could include text data in the form of "memories of a family trip" or image data taken during the trip.

[0643] The user's device transmits the input digital data to the server via the internet. The server uses natural language processing (NLP) algorithms on the text data. Specifically, it uses NLP libraries (e.g., spaCy and BERT) to extract key events, characters, and emotional expressions from the text. The server also analyzes image data using computer vision techniques, identifying visual elements within the image using libraries such as OpenCV and TensorFlow.

[0644] The server generates a narrative structure based on the key information obtained from these analyses, and then generates a comic as a visual work. In this process, a generative AI model is used, which takes prompt sentences as input and applies art styles to automatically generate the work. For example, prompt sentences such as "highlight scenes from a family trip" or "memories of hiking in the mountains with friends" are used. As a result, users can easily obtain visually rich works of art without any technical knowledge.
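
The prompt sentences passed to the generative AI model might be assembled from the narrative structure along the following lines. The function name, its parameters, and the prompt wording are hypothetical; the disclosure gives only the example theme sentences quoted above.

```python
def build_comic_prompt(
    theme: str,
    events: list[str],
    emotions: list[str],
    art_style: str,
) -> str:
    """Assemble a text prompt: theme, style, emotions, then one line per panel."""
    lines = [f"Create a comic about: {theme}.", f"Art style: {art_style}."]
    if emotions:
        lines.append("Emotions to convey: " + ", ".join(emotions) + ".")
    for number, event in enumerate(events, start=1):
        lines.append(f"Panel {number}: {event}")
    return "\n".join(lines)
```

Keeping one line per panel makes the narrative order explicit in the prompt, which is one simple way to carry the narrative structure into the generation step.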

[0645] The completed visual works are provided to users in digital format, which can be downloaded to their devices and easily shared with others. This system aims to improve the user experience by automating everything from receiving and analyzing digital data to generating visual works.

[0646] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0647] Step 1:

[0648] The user inputs digital data using a device. Specifically, the user inputs text data about their memories (e.g., "Memories of a family trip") and uploads corresponding image data using the file selection function. This digital data becomes the input data sent to the next step.

[0649] Step 2:

[0650] The user's device transmits the entered digital data to the server. Here, the data is encrypted over the internet and securely uploaded to the server. The device's role is to properly prepare the data and ensure its reliable transfer to the server.

[0651] Step 3:

[0652] The server applies a natural language processing algorithm to the received text data. In this process, the server uses an NLP library (e.g., spaCy or BERT) to analyze the text data and extract key information that forms the core of the story. The output of this process is information about the events, characters, and emotional expressions of the story.

[0653] Step 4:

[0654] The server analyzes image data using computer vision technology. During this process, the server utilizes libraries such as OpenCV and TensorFlow to identify visual elements within the image and extract features such as people and scenes. The output obtained from this process represents the primary visual information expressed in the image.

[0655] Step 5:

[0656] The server combines information extracted from text and images to generate a narrative structure. Based on the key information obtained, the server designs the narrative flow and scene layout of the visual work, and this plan is output.

[0657] Step 6:

[0658] The server automatically generates visual works using a generative AI model. In this process, the AI applies an art style based on a prompt (e.g., "highlight scenes from a family trip") to generate a visual work that reflects the user's memories. The output of this process is a completed visual work in digital format.

[0659] Step 7:

[0660] The server sends the generated visual artwork to the user's device. The user can then view, download, and share the visual artwork on their device. This final output visually preserves the user's memories.

[0661] (Application Example 1)

[0662] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0663] In modern society, many individuals want to preserve and share their daily memories, but there is a lack of visual means to do so. Furthermore, there is a need for a method that even general users who are not tech-savvy can use easily and that automatically visualizes memories. In this context, the challenge is to provide a way for individual users to more intuitively record their daily lives and visually enjoy their personal memories.

[0664] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0665] In this invention, the server includes a device for receiving text and image information provided by the user, a device for analyzing the text information and extracting important information, and a device for analyzing the image information and identifying visual elements. This makes it possible for users without specialized skills to easily record memories of their daily lives using a device in their home and to automatically generate comics based on that data.

[0666] A "user" refers to an individual who uses the system to record and visually preserve memories.

[0667] "Textual information" refers to text or data in text format provided by the user.

[0668] "Image information" refers to data in the form of photographs or graphics provided by the user.

[0669] "Device" refers to a configuration of hardware and software used to perform processes such as receiving, analyzing, and generating information.

[0670] "Important information" refers to the core data and concepts of a story, extracted by analyzing textual information.

[0671] "Visual elements" refer to the constituent elements and features within an image that are identified through the analysis of image information.

[0672] "Scene composition" refers to the structure of each scene in a manga, planned based on important information and visual elements.

[0673] "Manga" refers to a type of artwork that uses automatically generated images to visually represent a user's memories and develop them into a story.

[0674] A "household automated device" refers to an autonomously operating device that records a user's daily life and provides services tailored to the user's needs based on the recorded data.

[0675] One embodiment of this invention provides a novel method for users to save memories as comics. First, the user uses a data input terminal to input and transmit textual and image information related to the memories. This data is then sent to a server in the cloud via the internet.

[0676] The server uses a natural language processing program to analyze textual information. This program can utilize natural language processing libraries such as Hugging Face Transformers to extract important storylines, characters, and emotional expressions from the provided textual information. In parallel, image information is analyzed using computer vision technologies such as OpenCV to identify visual elements. This analysis makes it possible to extract features such as people, scenes, and backgrounds.

[0677] Based on the extracted textual information and visual elements, the server generates a scene structure and plans how to represent the comic in each scene. A generative AI model uses this information to automatically create the comic, resulting in a visually engaging and emotionally resonant story that faithfully reflects the user's memories. Finally, the generated comic is provided to the user in digital format, which they can download from their device or share.

[0678] As a concrete example, a user might input a request into the device such as, "I'd like you to turn my memories of last week's family mountain climbing trip into a comic strip." An example of a prompt to the generative AI model in this case might be, "Please create a touching and fun story based on the user's memories of mountain climbing." This allows the user to visually revisit their memories and preserve them in an emotionally rich way.

[0679] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0680] Step 1:

[0681] The terminal accepts text and image information from the user as input. This allows the user to select and send text and images related to specific memories. The output from the terminal is the text and image files selected by the user.

[0682] Step 2:

[0683] The server performs natural language processing on text information received from the terminal. This processing uses the Hugging Face Transformers' NER (Named Entity Recognition) model to extract important storylines and emotional expressions from the text. The output is a data structure containing this important information.

[0684] Step 3:

[0685] The server performs computer vision analysis using the received image information as input. Using the OpenCV library, it identifies visual features such as people, backgrounds, and scenes within the image. The analysis results are output as visual elements of the image information.

[0686] Step 4:

[0687] The server uses the key information extracted in step 2 and the visual elements from step 3 as input to generate a scene composition. This is a plan for turning the user's memories into a story and determines how to materialize each scene of the comic. The output is a storyboard that describes the scenes in detail.

[0688] Step 5:

[0689] The server uses the generated storyboard as input to run an AI model that automatically generates the final comic. The AI model used here reflects the specified art style and the user's emotions. The output is a comic in digital format.

[0690] Step 6:

[0691] Users download the manga generated by the server to their devices and then view or share it. This final output is provided as an electronic data file, allowing users to visually enjoy their memories at any time.

[0692] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0693] The system according to the present invention aims to visualize user-inputted memories in a comic book format and generate content that is more faithful to the user's emotions by using an emotion engine. The system consists of an application running on the user's terminal and a server.

[0694] Users input and upload text-based memory stories and related photos through their devices. Once user input is complete, the device sends this data to the server. At this point, the emotion engine receives the text and image data and begins analyzing them.

[0695] The emotion engine on the server analyzes the user's text using natural language processing to extract emotional elements. Simultaneously, it performs visual emotion analysis on image data to identify emotional elements in scenes and facial expressions. This allows the system to clearly understand the emotions the user intended.

[0696] The server uses the extracted emotional information to generate storyboards. In the storyboards, scenes are selected and structured according to the emotional elements, and the flow of the episode is expressed with rich emotion. Once the storyboards are complete, the generative AI draws the comic based on them. The results of the emotion engine are reflected, so the characters' expressions and the atmosphere of the scenes match the user's experience.

[0697] For example, if a user enters memories of a fun trip with friends, the emotion engine extracts emotions such as "fun" and "friendship" from the text and images. The server creates an emotion-infused storyboard and generates a comic that vividly portrays the highlights of the trip and intimate moments with friends. Finally, the user can download the comic and enjoy a work that vividly visualizes their memories.

[0698] Thus, the system of the present invention, through emotion recognition, realizes a more faithful and emotionally resonant comic adaptation of memories, providing users with a richer experience.

[0699] The following describes the processing flow.

[0700] Step 1:

[0701] Users use their devices to input text and image data related to their memories and prepare to send them. First, they enter a story about their memory in text, and then they select and upload related photos.

[0702] Step 2:

[0703] The terminal bundles the data entered by the user and sends it to the server via the internet. During transmission, the data's integrity is checked, and encryption is applied to ensure security.
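
A minimal sketch of the bundling and integrity check is shown below, assuming a JSON envelope with a SHA-256 digest. The envelope layout and function names are hypothetical, and encryption is assumed to be handled separately (for example by the transport layer), so it is not shown.

```python
import base64
import hashlib
import json


def bundle(text: str, images: list[bytes]) -> bytes:
    """Pack text and images into one envelope carrying a SHA-256 digest."""
    body = {
        "text": text,
        "images": [base64.b64encode(img).decode("ascii") for img in images],
    }
    payload = json.dumps(body, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "payload": payload}).encode("utf-8")


def unbundle(data: bytes) -> tuple[str, list[bytes]]:
    """Verify the digest, then unpack; raise ValueError if the data changed."""
    envelope = json.loads(data.decode("utf-8"))
    payload = envelope["payload"]
    if hashlib.sha256(payload.encode("utf-8")).hexdigest() != envelope["sha256"]:
        raise ValueError("integrity check failed")
    body = json.loads(payload)
    return body["text"], [base64.b64decode(img) for img in body["images"]]
```

A plain digest detects accidental corruption only; detecting deliberate tampering would need a keyed mechanism, which this sketch does not attempt.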

[0704] Step 3:

[0705] The server first passes the received data to the emotion engine to begin analysis. The text data is analyzed using natural language processing techniques to identify emotional keywords and phrases within the text and determine the associated emotions.

[0706] Step 4:

[0707] The server uses an emotion engine to analyze image data, identifying facial expressions and the emotions of the scene from the images. Based on visual features, it determines what kind of emotion is being expressed.

[0708] Step 5:

[0709] The server generates a storyboard based on the emotional information extracted by the emotion engine. It selects and arranges the scenes that make up the episode's narrative and designs how each scene is presented so that its emotions come across.
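
One simple way to select and arrange scenes by emotion is sketched below, assuming each candidate scene already carries an emotion intensity score between 0 and 1. The scoring and the fixed panel budget are assumptions for illustration and are not described in this disclosure.

```python
def select_scenes(scenes: list[tuple[str, float]], max_panels: int) -> list[str]:
    """Keep the most emotionally intense scenes, in their original order.

    `scenes` holds (description, intensity) pairs in chronological order.
    """
    by_intensity = sorted(
        range(len(scenes)), key=lambda i: scenes[i][1], reverse=True
    )
    kept = sorted(by_intensity[:max_panels])  # restore chronological order
    return [scenes[i][0] for i in kept]
```

Restoring chronological order after ranking keeps the flow of the episode intact while dropping low-intensity scenes.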

[0710] Step 6:

[0711] The server uses a generative AI to automatically generate the comic based on the storyboard. In this process, visual elements such as facial expressions and backgrounds are rendered so that they faithfully reproduce the emotions extracted for the characters and scenes.

[0712] Step 7:

[0713] The server creates the completed manga as a digital file and prepares it for user access. It also notifies the user that the manga is complete.

[0714] Step 8:

[0715] Users receive notifications and download or view the generated comics via their devices. At this time, users can enjoy their own memories as emotionally richly expressed comic works.

[0716] (Example 2)

[0717] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0718] Traditional content generation systems have struggled to emotionally recreate user experiences, often resulting in visualized content that fails to meet user expectations. Such systems merely visualize data without faithfully reflecting emotional nuances or complex feelings. Consequently, users often lack content that accurately visualizes their memories, making it difficult to enjoy a comfortable experience.

[0719] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0720] In this invention, the server includes means for receiving a data format for a user to provide information, means for performing natural language processing on the data format and obtaining information including emotions, and means for performing analysis on visual information and identifying special elements associated with emotions. This enables the automatic generation of emotionally rich content that is faithful to the user's experience.

[0721] "Data formats for users to provide information" refers to the format of digital data, such as text and images, that users use to record their own experiences and memories.

[0722] "Natural language processing" refers to the technology that enables computers to understand and process human language, particularly the process of extracting information from text and analyzing its meaning.

[0723] "Information containing emotions" refers to information that indicates elements related to emotions such as joy and sadness, extracted from data provided by the user.

[0724] "Visual information" refers to digital data that is presented visually, such as images and videos provided by the user.

[0725] "Special elements related to emotions" are elements extracted from visual information that represent emotional characteristics related to the scene or facial expression.

[0726] "Methods for generating scenarios" refer to techniques for constructing the flow and arrangement of a story based on extracted emotional information and special elements.

[0727] "Methods for automatically generating illustrations" refer to technologies that automatically create visual content such as images and comics based on a generated scenario.

[0728] The system according to this invention is a platform for visualizing the user's experience in an emotionally rich way using the user's digital data. The system mainly consists of an application that runs on the user's terminal and a server located remotely.

[0729] Users launch a web browser or dedicated application on their device, enter a memorable episode as text, and select related photos. JPEG images and plain text files are commonly used as data formats for collecting this information. Once the user has finished entering the information, the device sends this data to the server via the internet.

[0730] The server utilizes natural language processing (NLP) techniques to analyze the received text data. Specifically, open-source NLP libraries such as spaCy and NLTK are used to extract information about emotions and storylines from the text. Simultaneously, the server uses OpenCV and TensorFlow to perform visual analysis of image data, identifying visual features and emotional elements.

[0731] By integrating these two analysis results, the system generates a storyboard that reflects the emotions intended by the user. Based on this storyboard, a generative AI model automatically draws the comic. The generation process utilizes existing AI frameworks, such as PyTorch and Keras, to maximize the quality of the illustrations and the accuracy of emotional expression.

[0732] As a concrete example, if a user wants to visualize "memories of a summer trip with friends," they might enter a prompt like this: "I want to turn my memories of a summer beach trip with friends into a comic strip. The photo is a group shot taken on the beach. Please visualize this fun moment, emphasizing friendship and enjoyment." Based on this prompt, the system can provide emotionally rich visual content that meets the user's expectations.

[0733] Ultimately, users can download the generated comics in digital format and visually enjoy the memories and experiences recreated through the system. This makes it possible to deliver works that vividly and emotionally reconstruct memories to users.

[0734] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0735] Step 1:

[0736] The user uses an application on their device to input their memories as text data and upload related photos. The input in this step consists of text and image data provided by the user. The device formats this data and prepares it for transmission to the server. Specifically, it checks the encoding of the text file and converts the image data to JPEG or PNG format so that it can be sent to the server in the correct format.
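
The format checks in Step 1 could look like the sketch below, which tests whether text bytes are valid UTF-8 and identifies JPEG or PNG data by its leading signature bytes. UTF-8 as the expected encoding is an assumption, and actual image conversion (which would need an imaging library) is not shown.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # JPEG start-of-image marker prefix


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on the file signature, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None
```

An image for which `detect_image_format` returns `None` would be a candidate for conversion before transmission.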

[0737] Step 2:

[0738] The terminal sends the data entered by the user to the server. The input consists of text and image data formatted in step 1. The server receives this data and verifies its integrity. It also manages the data by storing it in a database with a user-specific ID. The output is data stored in a parseable format.
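
Storing each submission under a user-specific ID might be sketched as follows, using an in-memory SQLite database for illustration. The table name, columns, and function names are hypothetical; the disclosure does not specify a schema or database product.

```python
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical submissions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions ("
        "id TEXT PRIMARY KEY, user_id TEXT NOT NULL, "
        "story TEXT NOT NULL, image BLOB)"
    )
    return conn


def save_submission(
    conn: sqlite3.Connection, user_id: str, story: str, image: bytes
) -> str:
    """Insert one submission and return its generated ID."""
    submission_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO submissions VALUES (?, ?, ?, ?)",
            (submission_id, user_id, story, image),
        )
    return submission_id


def load_submissions(
    conn: sqlite3.Connection, user_id: str
) -> list[tuple[str, bytes]]:
    """Return (story, image) rows for one user in insertion order."""
    rows = conn.execute(
        "SELECT story, image FROM submissions WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    )
    return rows.fetchall()
```

The later analysis steps would then read their input through a lookup such as `load_submissions(conn, user_id)`.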

[0739] Step 3:

[0740] The server analyzes the received text data using natural language processing techniques. The input is text data stored in a database. Specifically, libraries such as spaCy and NLTK are used to extract keywords and phrases that indicate emotion. Through this analysis, emotion categories and important events within the text are identified, and an emotion profile is generated based on these. The output is the emotion profile.
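
To make the notion of an emotion profile concrete, the sketch below counts emotion keywords and normalizes the counts into shares per emotion category. It uses a small hypothetical lexicon in place of the libraries named above, and representing the profile as a dictionary is an assumption of this example.

```python
import re
from collections import Counter

# Hypothetical keyword-to-category lexicon for illustration only.
EMOTION_LEXICON = {
    "laughed": "joy", "fun": "joy", "happy": "joy",
    "sad": "sadness", "cried": "sadness",
    "scared": "fear",
}


def emotion_profile(text: str) -> dict[str, float]:
    """Return each emotion's share of all emotion keywords found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}
```

For the sentence "We laughed and had fun, but I was sad to leave.", this yields two thirds "joy" and one third "sadness".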

[0741] Step 4:

[0742] The server visually analyzes image data. The input is image data from a database. Using OpenCV and TensorFlow, it analyzes facial expressions of people and scene features within the images, identifying visual emotional elements. During this process, it recognizes emotional changes and location from frame to frame. The output is the result of the visual emotion analysis.

[0743] Step 5:

[0744] The server integrates the emotion information extracted from text and images. The input is the output of steps 3 and 4. The two sets of results are combined to generate a storyboard that reflects the user's intended emotions. This storyboard includes the flow of emotions and key points of the narrative. The output is the generated storyboard.
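
One way the two analysis results could be combined is a weighted average of per-emotion scores, sketched below. The dictionary representation, the function name, and the default equal weighting are assumptions; the disclosure does not say how the integration is computed.

```python
def merge_profiles(
    text_profile: dict[str, float],
    image_profile: dict[str, float],
    text_weight: float = 0.5,
) -> dict[str, float]:
    """Blend two emotion profiles; emotions missing from one side count as 0."""
    emotions = set(text_profile) | set(image_profile)
    merged = {
        e: text_weight * text_profile.get(e, 0.0)
        + (1.0 - text_weight) * image_profile.get(e, 0.0)
        for e in emotions
    }
    # Strongest emotion first, so the storyboard can lead with it.
    return dict(sorted(merged.items(), key=lambda kv: kv[1], reverse=True))
```

With `text_profile={"joy": 1.0}` and `image_profile={"joy": 0.5, "surprise": 0.5}`, the merged result ranks "joy" (0.75) ahead of "surprise" (0.25).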

[0745] Step 6:

[0746] The generative AI model automatically generates comics based on storyboards. The input is a completed storyboard. Specifically, it utilizes PyTorch and Keras to depict characters and backgrounds in detail for each scene. The AI makes contextually appropriate creative decisions to improve the quality of the illustrations. The output is visually rich, comic-style content that reproduces the user's emotions.

[0747] Step 7:

[0748] The user downloads the final comic and checks its contents. The input is the completed comic file generated on the server. The user can save this to their device and enjoy the visualized memories. The output is the digital comic saved on the user's device.

[0749] (Application Example 2)

[0750] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0751] Traditional technologies have had limitations in expressing users' memories and experiences in an emotionally rich and visually stimulating way. In particular, visualizing everyday episodes with genuine emotion is difficult, and the need for advanced skills on the part of the user to express them is a problem. Furthermore, the lack of mechanisms for easily sharing such content has prevented many users from creatively recording and sharing their experiences.

[0752] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0753] In this invention, the server includes means for receiving text data and still image data provided by the user, means for analyzing the text data to extract important information, and means for analyzing the still image data to identify visual elements. This makes it possible for users to generate comics that visualize their memories in an emotionally rich and faithful way, and to easily view and distribute them through information terminals.

[0754] A "user" is an entity that uses the system to input memories and generate emotionally rich, visualized content.

[0755] "Text data" refers to information in text format that users provide to the system, such as memories and experiences expressed in written form.

[0756] "Still image data" refers to image-format information provided by the user to the system, including visual content associated with text data.

[0757] "Visual elements" refer to features and emotions identified from still image data, and are elements that influence the composition and expression of visual content.

[0758] A "storyboard" is a blueprint for a visual story, generated based on important information and visual elements, and provides a framework for the comic.

[0759] "Manga" is a type of graphical content automatically generated based on storyboards, which expresses the user's memories and experiences in a rich and emotional way.

[0760] An "information terminal" refers to a device used by users to view and distribute generated content, and is the hardware on which applications run.

[0761] This invention is a system for users to visualize their memories in an emotionally rich way. The system is primarily implemented via a user terminal and a server.

[0762] The user first inputs text and still image data using an information terminal. This data is then transmitted to the server via a communication interface such as Bluetooth or Wi-Fi.

[0763] The server uses a natural language processing library (e.g., spaCy) to analyze the received text data. Based on the analysis results, it extracts emotions and important information contained in the text. Meanwhile, for still image data, an image analysis module (e.g., OpenCV) is used to extract visual elements. At this time, facial expressions and the atmosphere of the scene in the image are identified.

[0764] Next, the server generates a storyboard based on the extracted information. This storyboard serves as a framework for automatically generating comics using a generative AI model (for example, the DALL-E model).

[0765] The generated comics can be viewed by users on their information terminals and shared with others. Furthermore, emotion recognition technology enables rich emotional expression that faithfully reflects the user's experience.

[0766] For example, if a user enters memories of a holiday with their family, a comic strip will be generated that includes scenes of enjoyable conversations and beautiful scenery. An example of a prompt message would be, "I want to turn my fun family holiday memories into a comic strip. I want you to capture the moments of smiles and the beautiful scenery."

[0767] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0768] Step 1:

[0769] The user uses an information terminal to input text and still image data. The input data is temporarily stored on the terminal and waits until it is ready to be sent.

[0770] Step 2:

[0771] The terminal transmits the entered text data and still image data to the server via a communication interface. The server then receives the data and stores it in data storage for initial processing.

[0772] Step 3:

[0773] The server uses a natural language processing library to analyze the received text data and extract important information. The input is text data, and the output consists of sentiment tags and important keywords. The analysis process calculates the sentiment value of each word and phrase to grasp the overall tone of the text.
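The per-word sentiment calculation can be illustrated with the short sketch below. The valence table, the averaging rule, and the thresholds that turn the average into a tag are hypothetical values picked for the example, not figures from the disclosure.

```python
import re

# Hypothetical valence scores in the range [-1, 1].
WORD_VALENCE = {"fun": 0.8, "beautiful": 0.7, "smile": 0.6, "tired": -0.4, "lost": -0.6}


def score_text(text: str, threshold: float = 0.1) -> dict:
    """Score each known word, then average the scores to estimate the overall tone."""
    tokens = re.findall(r"[a-z']+", text.lower())
    word_scores = [(t, WORD_VALENCE[t]) for t in tokens if t in WORD_VALENCE]
    tone = sum(v for _, v in word_scores) / len(word_scores) if word_scores else 0.0
    if tone > threshold:
        tag = "positive"
    elif tone < -threshold:
        tag = "negative"
    else:
        tag = "neutral"
    return {"word_scores": word_scores, "tone": tone, "tag": tag}
```

Here the scored words double as the important keywords, and the tag plays the role of the sentiment tag in the step's output.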

[0774] Step 4:

[0775] The server uses an image analysis module to analyze still image data and identify visual elements. The input is image data, and the output includes emotional features and object information within the image. The analysis process visually evaluates facial expressions and the atmosphere of a scene, and converts them into visualized data.

[0776] Step 5:

[0777] The server generates storyboards based on the extracted key information and visual elements. In this step, it connects the emotions in the text with the visual elements of the images to create the scene composition. The output is storyboard data showing the composition of each page of the manga.

[0778] Step 6:

[0779] The server automatically generates comics based on storyboards using a generative AI model. Storyboard data is provided as input, and the completed comic images are generated as output. The AI model automatically designs the visual content to reflect the expressed emotions.
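Before the storyboard reaches an image generation model, it has to be turned into some form of instruction. The sketch below shows one possible way to build a text prompt per panel. The storyboard dictionary keys and the prompt wording are assumptions for illustration, and no call to any actual model API is made.

```python
from typing import Dict, List


def panel_prompt(index: int, scene: str, emotion: str, visual_elements: List[str]) -> str:
    """Build a text prompt for one panel from hypothetical storyboard fields."""
    elements = ", ".join(visual_elements) if visual_elements else "none specified"
    return (
        f"Panel {index}: {scene}. "
        f"Dominant emotion: {emotion}. "
        f"Visual elements: {elements}."
    )


def storyboard_prompts(storyboard: List[Dict]) -> List[str]:
    """Return one prompt per storyboard entry, numbered from 1."""
    return [
        panel_prompt(i, p["scene"], p["emotion"], p.get("visual_elements", []))
        for i, p in enumerate(storyboard, start=1)
    ]
```

Each returned string would then be passed to whichever generative model the server uses.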

[0780] Step 7:

[0781] Finally, the server sends the generated comic to the information terminal, making it available for users to view and share. This allows users to relive their memories in an emotionally rich, visualized form and share them with others.

[0782] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0783] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0784] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0785] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0786] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
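The layout rules of the emotion map can be expressed as a small lookup on a point's coordinates. The sketch below assumes a normalized coordinate system with the center of the concentric circles at the origin, and a hypothetical radius threshold separating primitive emotions from outer ones. Neither the coordinates nor the threshold come from the disclosure.

```python
import math


def describe_position(x: float, y: float, outer_radius: float = 0.5) -> dict:
    """Classify a point on the emotion map by side, valence, and ring.

    Left (x < 0) is the reaction side, right (x > 0) the situation side.
    Up (y > 0) is pleasure, down (y < 0) is displeasure.
    outer_radius is an assumed boundary between primitive and outer emotions.
    """
    radius = math.hypot(x, y)
    if x < 0:
        side = "reaction"
    elif x > 0:
        side = "situation"
    else:
        side = "both"
    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"
    layer = "outer" if radius >= outer_radius else "primitive"
    return {"side": side, "valence": valence, "layer": layer, "radius": radius}
```

For instance, a point in the upper right far from the center would be read as a pleasant, situation-driven emotion that shows up as a state or action.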

[0787] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0788] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0789] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
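The balance rule described above, where moving toward the ideal produces pleasure and moving away produces discomfort, can be written as a simple comparison. The function name and the use of a single scalar balance (for example a battery level between 0 and 1) are assumptions made for the sketch.

```python
def balance_feeling(previous: float, current: float, ideal: float) -> str:
    """Compare how far a balance value is from its ideal before and after a change."""
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"    # the balance approached the ideal
    if after > before:
        return "discomfort"  # the balance moved away from the ideal
    return "neutral"         # no change in distance from the ideal
```

With an assumed ideal battery level of 0.8, charging from 0.4 to 0.7 would register as pleasure, while draining from 0.7 to 0.2 would register as discomfort.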

[0790] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0791] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
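The disclosure does not say how the network is encouraged to give neighboring emotions similar values. One common way to express such a constraint is a distance-weighted penalty added to the training loss, sketched below. The exponential weighting, the scale parameter, and the map coordinates used in any call are assumptions for illustration, not the method of the disclosure.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def proximity_penalty(positions: Dict[str, Tuple[float, float]],
                      values: Dict[str, float],
                      scale: float = 1.0) -> float:
    """Penalize value differences between emotions, weighted by map closeness.

    Pairs that sit close together on the map get a weight near 1, distant pairs
    a weight near 0, so the penalty is small when neighbors have similar values.
    """
    penalty = 0.0
    for a, b in combinations(sorted(values), 2):
        distance = math.dist(positions[a], positions[b])
        weight = math.exp(-distance / scale)
        penalty += weight * (values[a] - values[b]) ** 2
    return penalty
```

Minimizing this term alongside the ordinary prediction error would push emotions such as "reassured" and "calm" toward similar values whenever they are placed near each other.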

[0792] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0793] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0794] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0795] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0796] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0797] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0798] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0799] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0800] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0801] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0802] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0803] The following is further disclosed regarding the embodiments described above.

[0804] (Claim 1)

[0805] Means for receiving text data and image data provided by the user,

[0806] A means for analyzing the aforementioned text data and extracting important information,

[0807] A means for analyzing the aforementioned image data to identify visual elements,

[0808] Means for generating a storyboard based on the aforementioned important information and visual elements,

[0809] A means for automatically generating a comic based on the aforementioned storyboard,

[0810] A system that includes this.

[0811] (Claim 2)

[0812] The system according to claim 1, which associates the content of received image data with characters and scenes extracted from text data.

[0813] (Claim 3)

[0814] The system according to claim 1, which faithfully reflects the extracted storyline and emotional expression in the generated comic.

[0815] "Example 1"

[0816] (Claim 1)

[0817] A means of receiving digital data provided by the user,

[0818] A means for analyzing the aforementioned digital data and extracting key information,

[0819] A means for generating a narrative structure based on the aforementioned key information,

[0820] A means for automatically generating a visual work based on the aforementioned narrative structure,

[0821] A means of outputting the generated visual work,

[0822] A system that includes this.

[0823] (Claim 2)

[0824] The system according to claim 1, which associates the content of received visual data with elements extracted from key information.

[0825] (Claim 3)

[0826] The system according to claim 1, which faithfully reflects the extracted narrative flow and emotional expression in the generated visual work.

[0827] "Application Example 1"

[0828] (Claim 1)

[0829] A device that receives text information and image information provided by the user,

[0830] A device for analyzing the aforementioned textual information and extracting important information,

[0831] A device that analyzes the aforementioned image information to identify visual elements,

[0832] A device for generating a scene configuration based on the aforementioned important information and visual elements,

[0833] A device for automatically generating comics based on the aforementioned scene composition,

[0834] A home-use automated device that has a function to record the user's daily life and generates comics based on the recorded data,

[0835] A system that includes this.

[0836] (Claim 2)

[0837] The system according to claim 1, which associates the content of received image information with characters and scenes extracted from text information.

[0838] (Claim 3)

[0839] The system according to claim 1, which faithfully reflects the extracted storyline and emotional expressions in the generated manga.

[0840] "Example 2 of combining an emotion engine"

[0841] (Claim 1)

[0842] A means of receiving data in a format for users to provide information,

[0843] A means for performing natural language manipulation to process the aforementioned data format and obtaining information including emotions,

[0844] A means of analyzing visual information and identifying special elements related to emotion,

[0845] A means for generating a content scenario by combining the acquired emotional information and special elements,

[0846] A means for automatically generating illustrations based on the aforementioned scenario,

[0847] A system that includes this.

[0848] (Claim 2)

[0849] The system according to claim 1, which associates the content of received visual information with the social groups and scenes in which the information is obtained.

[0850] (Claim 3)

[0851] The system according to claim 1, which accurately reflects the acquired development and emotional expression in the generated illustration.

[0852] "Application example 2 of combining emotional engines"

[0853] (Claim 1)

[0854] A means for receiving text data and still image data provided by the user,

[0855] A means for analyzing the aforementioned text data and extracting important information,

[0856] A means for analyzing the aforementioned still image data to identify visual elements,

[0857] Means for generating a storyboard based on the aforementioned important information and visual elements,

[0858] A means for automatically generating a manga based on the aforementioned storyboard,

[0859] A means of displaying and distributing the generated manga on an information terminal,

[0860] A system that includes this.

[0861] (Claim 2)

[0862] The system according to claim 1, which associates the content of received still image data with people and scenes extracted from text data, and makes the generated comic viewable on an information terminal.

[0863] (Claim 3)

[0864] The system according to claim 1, which faithfully reflects the extracted storyline and emotional expression in the generated manga, and provides an emotionally rich user experience on an information terminal.

[Explanation of Symbols]

[0865] 10, 210, 310, 410 Data processing system; 12 Data processing device; 14 Smart device; 214 Smart glasses; 314 Headset-type terminal; 414 Robot

Claims

1. Means for receiving text data and image data provided by the user, A means for analyzing the aforementioned text data and extracting important information, A means for analyzing the aforementioned image data to identify visual elements, Means for generating a storyboard based on the aforementioned important information and visual elements, A means for automatically generating a comic based on the aforementioned storyboard, A system that includes this.

2. The system according to claim 1, which associates the content of received image data with characters and scenes extracted from text data.

3. The system according to claim 1, which faithfully reflects the extracted storyline and emotional expression in the generated manga.