The system addresses the challenge of managing digital image data by using AI to select and integrate user comments and news, automating the creation of personalized physical albums, enhancing the user experience.

JP2026096429A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

Application Information

Patent Timeline

03 Dec 2024: Application
15 Jun 2026: Publication (JP2026096429A)

IPC: G06Q50/10; G06Q30/015; G06Q30/0207; G06Q30/0601

AI Tagging

Application Domain

Commerce


AI Technical Summary

Technical Problem

Users face challenges in efficiently managing large volumes of digital image data, particularly in creating physical albums that reflect personal memories without the hassle of manual selection and editing.

Method used

A system that utilizes AI to select high-quality images, incorporates user comments and relevant news data, and automatically generates and delivers physical albums by integrating with a server, terminal, and printing company.

Benefits of technology

Enables seamless preservation of memories in physical form, reducing user effort by automating the selection, integration, and printing process, ensuring high-quality and emotionally resonant album creation.

✦ Generated by Eureka AI based on patent content.


Abstract

[Problem] To provide a system. [Solution] A system including: storage means for receiving captured image data; selection means for selecting the optimal image from the image data; information addition means for receiving text information based on user input; data collection means for collecting news data related to a specific date; generation means for generating an album by combining the selected image, the text information, and the news data; print instruction means for instructing printing of the generated album; and delivery means for physically delivering the printed album.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] With the spread of digital cameras and smartphones, users take a large number of photos every day, but this data is mainly stored in digital form and is at risk of being lost. Creating physical albums by hand also takes time and effort. There is therefore a demand for a system that can efficiently select the photos taken and preserve them as memories.

Means for Solving the Problems

[0005] This invention provides a system that receives captured image data and uses AI to select the best image from it. The user inputs text information about the day's events, and the system automatically collects and incorporates relevant news data. This data is combined to generate an album, and a means is provided to print and deliver the physical album through instructions to a printing company. This allows users to preserve their precious memories in a physical form without any hassle.

[0006] "Image data" refers to photographic and video information recorded in digital format.

[0007] "Storage means" refers to devices and technologies that store digital information and make it accessible later.

[0008] "Selection means" refers to a process or device for selecting the optimal option from a set of options based on specific conditions.

[0009] "Information addition means" refers to a function that receives text or other data entered by the user and incorporates it into existing data within the system.

[0010] "Data collection means" refers to methods and devices for searching for and acquiring necessary data from external information sources.

[0011] "Generation means" refers to processes and devices for creating new data structures or documents by combining various input data.

[0012] "Print instruction means" refers to the function that issues printing instructions in order to convert digital data into a physical form.

[0013] "Delivery means" refers to the method or equipment used to deliver physically manufactured goods to a designated location.

Brief Description of the Drawings

[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Mode for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is memory in which information is temporarily stored and which the processor uses as work memory.

[0019] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. In this specification, the same reading applies when three or more items are linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention relates to a system for automatically managing image data captured by a user and saving it as a physical album. This system operates in cooperation with a server, a terminal, and the user.

[0036] First, users routinely take photos using their devices. These devices have a function to automatically upload the captured image data to a server. The server receives the image data via an internet connection and analyzes its contents.

[0037] The server uses an AI algorithm to select the best-quality photo from among multiple uploaded images. This process evaluates the photos based on several criteria, including composition, smiles, and focus.

[0038] Next, the user enters comments on their device about memories and events related to the photos taken that day. These comments are sent to the server, which in parallel automatically collects relevant news from the internet for that date. Based on this information, the server generates the layout for the album page.

[0039] The generated album data is sent to a partner printing company for printing and binding. Finally, the server arranges for the completed album to be shipped to the address specified by the user.

[0040] As a concrete example, suppose a user takes various photos at their child's birthday party. The photos taken using the device are automatically uploaded to a server, which then selects the most memorable moments. The user then enters short comments about their impressions of the party and the fun moments they experienced. In addition, the server collects information about "major local events" as news for the day. Combining this information, the server creates an album documenting the user's special day. This entire process is technically seamless, allowing the user to preserve memories without any hassle.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The user takes a photo using the device. The device automatically uploads the captured image data to the management server.

[0044] Step 2:

[0045] The server analyzes the received image data. Using AI algorithms, it evaluates the images based on criteria such as composition, resolution, and facial expressions.

[0046] Step 3:

[0047] The server selects the most suitable image based on the evaluation results. The selection criteria can be customized according to the user's settings.

[0048] Step 4:

[0049] Users enter information about the day's events and comments from their devices. The entered comments are sent to the server.

[0050] Step 5:

[0051] The server automatically collects date-related news data from the internet. This news data, along with comments, is incorporated into the album.

[0052] Step 6:

[0053] The server automatically generates album pages by combining selected photos, user comments, and collected news.

[0054] Step 7:

[0055] The server sends the completed album data to the printing company. The printing company prints and binds the album based on the received data.

[0056] Step 8:

[0057] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.
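
The eight steps above can be sketched as a minimal in-memory pipeline. This is an illustrative sketch only, not the implementation disclosed in the application: the names `Photo`, `Album`, `build_album`, and `print_order`, the pre-computed `score` field, and the shape of the order payload are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Photo:
    filename: str
    taken_on: date
    score: float  # hypothetical quality score produced by the AI evaluation (Step 2)


@dataclass
class Album:
    day: date
    photo: Photo
    comment: str
    news: list[str] = field(default_factory=list)


def build_album(photos: list[Photo], comment: str, news: list[str]) -> Album:
    """Steps 3 to 6: pick the best photo and combine it with the comment and news."""
    if not photos:
        raise ValueError("at least one uploaded photo is required")
    best = max(photos, key=lambda p: p.score)  # Step 3: select the most suitable image
    return Album(day=best.taken_on, photo=best, comment=comment, news=list(news))


def print_order(album: Album, address: str) -> dict:
    """Steps 7 and 8: payload a server might hand to a printing company (assumed shape)."""
    return {
        "date": album.day.isoformat(),
        "image": album.photo.filename,
        "comment": album.comment,
        "news": album.news,
        "ship_to": address,
    }
```

Keeping the album as plain data until the final step means the printing and delivery stages only need the serialized order, which matches the hand-off to an external printing company described above.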

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] In the digital age, vast amounts of image data are generated daily, but there is a lack of efficient ways to manage this data and to physically record special moments. Furthermore, manual management and editing by users are time-consuming and laborious, and it is difficult to maintain the desired level of perfection in a printed album that reflects individual sensibilities.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes a storage means for receiving and storing image information from a shooting device, a selection means for selecting high-quality images based on multiple criteria, and an information addition means for adding user text data. This allows users to automatically generate and save special moments as high-quality albums without having to perform complicated editing tasks.

[0063] The "storage means" is a function for receiving image information from a shooting device via a network and storing it as needed.

[0064] The "selection means" is a function that automatically selects the highest quality image from the received image information based on multiple criteria such as composition, facial expression, and focus.

[0065] The "information addition means" is a function that receives text data entered by the user and incorporates it into the album along with image data.

[0066] The "data collection means" is a function that automatically collects news information related to a specific date and time from the internet.

[0067] The "generation means" is a function that integrates selected image information, text data, and collected news information and automatically constructs them as an album.

[0068] The "print instruction means" is a function that sends instructions to an external printing device to output the generated album onto paper.

[0069] The "delivery means" is a function for physically transporting printed albums to a location specified by the user.

[0070] This system automatically generates physical albums using image information, with the server, terminal, and user working in conjunction with each other. Users routinely take images using their terminals, and this image information is automatically uploaded to the server via a dedicated application. This process utilizes a network, ensuring smooth data transmission without requiring any special actions from the user.

[0071] The server analyzes the received image information using AI algorithms and selects high-quality images based on criteria such as composition, facial expression, and focus. AI algorithms include, for example, machine learning models and image recognition technologies. The server leverages these technologies to evaluate multiple images quickly and select the necessary data.
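
As a rough illustration of selecting by multiple criteria, the sketch below combines per-criterion scores with a weighted average. It assumes a hypothetical upstream model has already produced scores in the range 0 to 1 for composition, facial expression, and focus; the weights and the function names `overall_score` and `select_best` are illustrative assumptions, not values from the disclosure.

```python
# Illustrative default weights (assumed, not taken from the disclosure).
DEFAULT_WEIGHTS = {"composition": 0.4, "expression": 0.3, "focus": 0.3}


def overall_score(scores: dict[str, float],
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the per-criterion scores (missing criteria count as 0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total


def select_best(images: dict[str, dict[str, float]], top_n: int = 1,
                weights: dict[str, float] = DEFAULT_WEIGHTS) -> list[str]:
    """Return the names of the top_n images, highest overall score first."""
    ranked = sorted(images, key=lambda name: overall_score(images[name], weights),
                    reverse=True)
    return ranked[:top_n]
```

Passing a different `weights` dictionary is one simple way to model user-adjustable selection criteria.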

[0072] Next, users can add comments via their device about events and memories related to the images they have taken. This user-entered text information is sent to the server and stored as part of the album content. The server also automatically collects news information related to that date via the internet and uses it as content to add value to the album.

[0073] Finally, the server combines the selected image information, user text information, and collected news information to generate an album. A template is used for album generation, providing a consistent and visually appealing layout. The generated album data is then sent to a partner printing facility and printed on paper. The printed album is delivered to the user's specified address via a delivery service.
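
One minimal way to picture template-based generation is plain text substitution. The sketch uses Python's standard `string.Template`; the template text and the `render_page` helper are assumptions made for illustration, and a real layout would target a print format rather than plain text.

```python
from string import Template

# Hypothetical page template; field names are illustrative.
PAGE_TEMPLATE = Template(
    "== $day ==\n"
    "Photo: $photo\n"
    "Comment: $comment\n"
    "News of the day:\n"
    "$news"
)


def render_page(day: str, photo: str, comment: str, headlines: list[str]) -> str:
    """Fill the page template with the selected photo, user comment, and news."""
    news = "\n".join(f"- {h}" for h in headlines) or "- (none collected)"
    return PAGE_TEMPLATE.substitute(day=day, photo=photo, comment=comment, news=news)
```

Because every page goes through the same template, the layout stays consistent regardless of how many headlines were collected for a given day.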

[0074] As a concrete example, consider a scenario where a user manages photos taken in a nature park during a holiday on their device. The user can easily upload these images to a server through an application and add comments about the enjoyable moments. Based on these comments, the server collects relevant news and event information from news sources. The resulting album then becomes a physical record that vividly preserves the user's memories.

[0075] An example of a prompt would be, "Please provide an overview of the AI-powered automated photo album generation system. In particular, please explain the image selection criteria and layout generation in detail." Given this prompt, a generative AI model can explain the system in detail.

[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0077] Step 1:

[0078] The user uses the device to capture image information. The captured image information is configured to be automatically uploaded to a server by a dedicated application on the device. The input is the captured image file, which is sent to the server via the internet. The output is the image data stored on the server. The device's operation includes transmitting the image data in the appropriate format over the network connection.

[0079] Step 2:

[0080] The server analyzes the received image data using an AI algorithm. Specifically, it evaluates the images based on multiple criteria, such as composition, the subject's smile, and focus accuracy. The input consists of multiple uploaded image data, which are then evaluated by the AI algorithm. The output consists of images deemed to be of high quality. The server's specific operations include calling an AI model for image analysis and performing scoring on each image.

[0081] Step 3:

[0082] Users enter comments about events and memories related to images taken via their device. Input consists of text information entered by the user into the application, which is then sent to the server. Output is text data stored on the server. The device's operation includes the ability to input comments through the user interface and send that data to the server.

[0083] Step 4:

[0084] The server automatically collects news information related to a specific date and time via the internet. The input is date information, and based on this, it retrieves relevant news from online resources. The output is news data to be added to an album. The server's operation involves a process of gathering relevant information for that day using news APIs and web scraping techniques.

[0085] Step 5:

[0086] The server generates an album layout by combining selected image data, user comments, and collected news information. Inputs include selected images, text information, and news information, which are arranged in a consistent layout using templates. The output is digital album data for printing. The server's operation involves arranging content using a template engine and generating the completed album data.

[0087] Step 6:

[0088] The server sends the generated album data to a partner printing facility. The input is the completed album data, which is sent in the optimal format for printing. The output is print-ready data from which the facility produces the physical album. The server's operation includes sending the data in the appropriate format according to the printing partner's instructions.

[0089] Step 7:

[0090] The server arranges physical delivery of the printed album to the address specified by the user. The input consists of the printed album and the user's delivery address information; the output is the album delivered to the user's specified location. The server's operations include issuing delivery instructions to the shipping company and managing tracking information.
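
Step 4 above (date in, news out) can be pictured as a date filter over news items. To stay self-contained, the sketch does not call any real news API or scraper; it assumes the items were already fetched by some hypothetical client and arrive as `NewsItem` records. The names `NewsItem` and `news_for_day` and the `limit` parameter are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class NewsItem:
    headline: str
    published: datetime


def news_for_day(items: list[NewsItem], day: date, limit: int = 3) -> list[str]:
    """Return up to `limit` headlines published on `day`, earliest first."""
    same_day = [item for item in items if item.published.date() == day]
    same_day.sort(key=lambda item: item.published)
    return [item.headline for item in same_day[:limit]]
```

Capping the number of headlines keeps the news portion of a page from crowding out the photo and the user's comment.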

[0091] (Application Example 1)

[0092] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0093] Managing the large amount of visual data captured in daily life, and using that data to create memories, is a time-consuming and laborious task for individual users. Integrating detailed information and events related to those memories and saving them in physical form is even more laborious. This application example aims to provide a method that performs these tasks seamlessly by integrating with home-use cameras and audio input devices.

[0094] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0095] In this invention, the server includes a storage means for receiving visual data recorded by a camera, a voice conversion means for converting voice input from a user into text information, and a data collection means for collecting information related to a specific date. This allows the user to automatically save the captured data in an optimized form and generate an integrated album using individual events and related information.

[0096] A "recording device" is a device that has the function of recording visual data, and includes cameras and the like.

[0097] "Visual data" refers to data such as images and videos recorded by a camera or video recording device.

[0098] A "memory device" is a device for storing and retaining received visual data.

[0099] A "selection method" is a function that provides a process for selecting the most suitable data from the received visual data according to specific criteria.

[0100] An "information addition mechanism" is a function that receives user input and incorporates that information into visual data.

[0101] "Data collection means" refers to the function of collecting information related to a specific date or event using the internet or other information sources.

[0102] A "generation means" is a device that provides a process for creating an integrated album using selected visual data and collected information.

[0103] The "instruction means" refers to a function that executes the printing instructions for the generated album.

[0104] "Delivery method" refers to the process or equipment used to deliver printed physical albums to a designated location.

[0105] A "voice conversion device" is a device that has the function of converting the user's voice input into text information and digitizing it.

[0106] The system for realizing this invention provides a process for efficiently managing and saving visual data captured by users in their daily lives as an album. Specifically, users routinely capture visual data with a camera device such as a smartphone, and this data is automatically uploaded to a server. This process is realized through an application built into the camera device and a Wi-Fi connection function.

[0107] The server stores the received visual data in a memory device. Furthermore, it is equipped with a selection mechanism that uses an AI algorithm to evaluate the visual data and select the best image. This evaluation is performed using an AI framework such as TensorFlow (registered trademark) and is based on multiple criteria such as focus, composition, and the subject's facial expression.

[0108] The user sends comments related to the captured data to the server using a voice input device. The server converts this into text information using a voice conversion means and collects information related to that date from the internet via a data collection means.

[0109] The server automatically generates the album layout using a generation mechanism, utilizing selected visual data, text information, and collected related information. The generated album data is sent to a partner printing company via a print instruction mechanism. The printed album is then delivered to the user's specified address using a delivery mechanism.

[0110] As a concrete example, if a user records their family's daily life or special events as visual data and adds comments via voice input, this data will be integrated with local news from the same day and compiled into an album of memories of that special day. An example of a prompt is as follows: "Please select a beautiful landscape photo taken by your home robot that is in focus and shows everyone in the family smiling, and combine it with your comments and today's news to create an album of this special day."

[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0112] Step 1:

[0113] The user captures visual data using their smartphone's camera. After capture, an application on the smartphone automatically uploads the visual data to the server. The input is visual data, and the output is storage on the server's storage device. Wi-Fi is used for stable and efficient data transfer during this process.

[0114] Step 2:

[0115] The server stores the received visual data in a storage device. The storage device holds the data and stores it in a format suitable for subsequent AI processing. The input is the uploaded visual data, and the output is the stored visual data.

[0116] Step 3:

[0117] The server analyzes stored visual data using a selection mechanism. An AI algorithm is used to evaluate the visual data and select the best image. Factors such as focus, composition, and subject expression are considered during this process. The input is the stored visual data, and the output is the selected, optimal image.

[0118] Step 4:

[0119] Users provide feedback and comments related to visual data using their smartphone's voice input function. This voice data is sent to a server and converted into text information using a speech-to-text conversion tool. The input is voice data, and the output is text information.

[0120] Step 5:

[0121] The server collects information related to a specific date from the internet using data collection methods. It retrieves relevant news and event information to enrich the album's content. The input is date information, and the output is the collected related information.

[0122] Step 6:

[0123] The server combines selected visual data, text information, and collected related information to create album layouts using a generation mechanism. This process is automated, providing efficient and consistent layouts. The inputs are selected visual data, text information, and related information, and the output is album layout data.

[0124] Step 7:

[0125] The server sends the generated album data to the printing company via a print instruction system. It instructs the physical printing of the album and prepares it for delivery after completion. The input is the album layout data, and the output is the instruction to the printing company.

[0126] Step 8:

[0127] The server delivers the printed albums to the user's specified address using a delivery service. It manages the delivery process and ensures the albums reach the user. The input is the physical albums, and the output is the completion of delivery to the user.
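
Steps 4 to 6 above can be sketched with the speech-to-text stage as a pluggable function. No real speech recognition library is called here: `Transcriber` is a hypothetical callable type, and `fake_transcriber` is a stand-in used only so that the example runs. The `build_layout` name and the dictionary layout are likewise assumptions.

```python
from typing import Callable

# Hypothetical interface: takes raw audio bytes, returns the recognized text.
Transcriber = Callable[[bytes], str]


def fake_transcriber(audio: bytes) -> str:
    """Stand-in for a real speech-to-text engine; decodes the bytes as UTF-8 text."""
    return audio.decode("utf-8").strip()


def build_layout(image: str, audio: bytes, related_info: list[str],
                 transcribe: Transcriber) -> dict:
    """Convert the voice comment to text and combine it with the image and related info."""
    comment = transcribe(audio)
    if not comment:
        comment = "(no comment)"
    return {"image": image, "comment": comment, "related": list(related_info)}
```

Injecting the transcriber as a parameter keeps the layout step independent of whichever speech-to-text tool the server actually uses.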

[0128] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0129] This invention relates to a system that optimizes the selection of captured image data and album generation by utilizing an emotion engine that recognizes user emotions. Users routinely take photos using their devices, and this data is uploaded to a server. The server analyzes the received image data using a combination of an AI algorithm and the emotion engine.

[0130] In analyzing image data, the server considers not only technical aspects such as composition and resolution, but also the user's emotions. The emotion engine detects the user's emotions based on factors such as the tone of voice when the photo was taken, comments entered after the photo was taken, and biometric data. Based on this emotion data, the server selects the image that best reflects the user's feelings.

[0131] Furthermore, the emotion engine analyzes the comments entered by users to detect which emotions are being expressed. This allows the server to customize the album page layout according to those emotions. For example, if happy emotions predominate, a bright design is chosen, and if calm emotions predominate, a simple and quiet design is adopted.
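
As a non-limiting illustration, the layout customization described above can be sketched as a lookup from the most frequent emotion to a design theme. The emotion labels, the theme names, the fallback theme, and the function name `choose_theme` are hypothetical. Only the two pairings (happy with bright, calm with simple and quiet) come from the description.

```python
from collections import Counter

# Emotion-to-theme table based on the two examples in the description.
THEME_BY_EMOTION = {"happy": "bright", "calm": "simple_quiet"}
DEFAULT_THEME = "neutral"  # assumed fallback; not specified in the source


def choose_theme(comment_emotions: list[str]) -> str:
    """Pick the theme for the most frequent emotion among the comments."""
    if not comment_emotions:
        return DEFAULT_THEME
    dominant, _count = Counter(comment_emotions).most_common(1)[0]
    return THEME_BY_EMOTION.get(dominant, DEFAULT_THEME)


theme = choose_theme(["happy", "calm", "happy"])
```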

[0132] The emotion engine is also useful for collecting news data. It can take into account the user's emotional state, allowing for customization such as prioritizing the collection of positive news when the user is feeling emotionally uplifted.

[0133] As a concrete example, consider a scenario where a user takes photos to record how their family spends their holidays. The device simultaneously records a voice memo when taking the photo, and the server uses this to analyze the user's emotions. Based on this emotional data, the server selects the most impactful photo and generates an album page that expresses positive emotions. Furthermore, by collecting and incorporating relevant news, such as feature articles and local event information, into the album, the user's memories are preserved in a richer way.

[0134] This system aims to enhance users' memories by pursuing perfection in both technical and emotional dimensions.

[0135] The following describes the processing flow.

[0136] Step 1:

[0137] The user takes a photo using the device. The device collects the photo data along with voice memos and biometric information, and uploads these to the management server.

[0138] Step 2:

[0139] The server receives image data, voice memos, and biometric information, and analyzes the user's emotions based on this data. The emotion engine analyzes voice tone, content, and biometric data to identify emotions.
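
As a non-limiting illustration, combining voice tone, spoken content, and biometric data in Step 2 can be sketched as a weighted fusion of per-modality emotion scores. The disclosure does not state how the emotion engine combines these signals, so the weighted-sum approach, the modality names, and the functions `fuse_emotions` and `dominant_emotion` are assumptions made for this sketch.

```python
def fuse_emotions(
    modality_scores: dict[str, dict[str, float]],
    modality_weights: dict[str, float],
) -> dict[str, float]:
    """Combine per-modality emotion scores into one normalized distribution.

    modality_scores maps a modality name (e.g. "voice_tone") to a mapping of
    emotion label -> score. Modalities without a weight contribute nothing.
    """
    fused: dict[str, float] = {}
    for modality, scores in modality_scores.items():
        weight = modality_weights.get(modality, 0.0)
        for emotion, score in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weight * score
    total = sum(fused.values())
    if total == 0:
        return fused
    return {emotion: value / total for emotion, value in fused.items()}


def dominant_emotion(fused: dict[str, float]) -> str:
    """Return the label with the highest fused score (input must be non-empty)."""
    return max(fused, key=fused.get)


scores = {
    "voice_tone": {"joy": 0.8, "calm": 0.2},
    "voice_content": {"joy": 0.6, "calm": 0.4},
    "biometric": {"joy": 0.3, "calm": 0.7},
}
weights = {"voice_tone": 0.4, "voice_content": 0.4, "biometric": 0.2}
fused = fuse_emotions(scores, weights)
```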

[0140] Step 3:

[0141] Based on the results of the emotion analysis, the server uses an AI algorithm to select the photo that best reflects the user's emotions from among the uploaded photos.

[0142] Step 4:

[0143] The user enters comments on their device about memories and events related to the photos. The entered comments are sent to the server.

[0144] Step 5:

[0145] The server receives comments, and the emotion engine analyzes their content to extract emotions from them. Based on this information, the album page design is customized with a style that matches the emotion.

[0146] Step 6:

[0147] The server automatically collects news data related to a specific date from the internet. During this process, it takes into account the user's emotional state and prioritizes collecting news that is highly relevant.
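
As a non-limiting illustration, the emotion-aware prioritization in Step 6 can be sketched as a ranking in which positive news is boosted when the user's emotion is positive. The sketch assumes each news item already carries a relevance score and a positivity score from some upstream analysis. The `NewsItem` fields, the `POSITIVE_EMOTIONS` label set, and the function `prioritize_news` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    headline: str
    relevance: float   # assumed relevance to the date, in [0, 1]
    positivity: float  # assumed tone score in [0, 1]; 1 is most positive


POSITIVE_EMOTIONS = {"joy", "excitement"}  # hypothetical label set


def prioritize_news(items: list[NewsItem], user_emotion: str, limit: int = 3) -> list[NewsItem]:
    """Rank news by relevance, boosting positive items for positive emotions."""
    boost = 1.0 if user_emotion in POSITIVE_EMOTIONS else 0.0
    ranked = sorted(
        items,
        key=lambda item: item.relevance + boost * item.positivity,
        reverse=True,
    )
    return ranked[:limit]


items = [
    NewsItem("Storm closes roads", relevance=0.9, positivity=0.1),
    NewsItem("Local festival opens", relevance=0.7, positivity=0.9),
    NewsItem("Market update", relevance=0.5, positivity=0.5),
]
top_for_joy = prioritize_news(items, "joy")
```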

[0148] Step 7:

[0149] The server automatically generates album pages by combining selected photos, emotion-based layouts, and collected news data.

[0150] Step 8:

[0151] The server sends the generated album data to the printing company. The printing company prints and binds the album based on the received data.

[0152] Step 9:

[0153] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.

[0154] (Example 2)

[0155] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0156] Currently, organizing and saving captured image data in a way that suits the user is not easy, and creating albums that take emotional elements into account is particularly difficult. Furthermore, the process of users selecting the best photo from a large number of images is time-consuming, so an efficient method is needed. In addition, collecting and integrating date-related news data based on personal emotions is also a challenge.

[0157] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0158] In this invention, the server includes a storage means for receiving and storing captured image data, an emotion analysis means for analyzing audio data related to the image data and extracting emotion data, a selection means for selecting the optimal image based on the emotion data, an information addition means for receiving text information based on user input and analyzing the emotion, a data collection means for collecting news data related to a specific date and emotion data, and a provision means for providing the generated album in digital format. This makes it possible to automatically select the optimal image in a way that resonates with the user's emotions and efficiently generate and provide personalized albums and related information.

[0159] "Storage means" refers to a device or function for receiving and storing information such as captured image data.

[0160] "Emotion analysis means" refers to a device or function for analyzing voice data or text information to extract the user's emotion data.

[0161] "Selection means" refers to a device or function for automatically selecting the optimal image based on analyzed emotion data.

[0162] "Information addition means" refers to a device or function that receives text information provided by a user, analyzes that information, and detects emotional characteristics.

[0163] "Data collection means" refers to a device or function for collecting relevant news data based on a specific date or analyzed emotion data.

[0164] "Generation means" refers to a device or function for generating an album by combining selected images, text information, and news data.

[0165] "Provision means" refers to a device or function for providing the generated album to the user in digital format.

[0166] This invention relates to a system for efficiently managing captured image data and generating albums tailored to the user's needs. Users routinely take photos and save the data to a device. This device also includes a function for recording voice memos at the time of shooting. The photos taken by the user are uploaded to a server via the device, and the server performs analysis based on this information.

[0167] The server first has a storage mechanism for saving received image data. Next, it uses emotion analysis to detect the user's emotions from the audio data. This can be achieved with general-purpose speech analysis software, for example a general-purpose speech recognition API. Furthermore, the server uses an AI algorithm to evaluate the technical quality of the images. Image processing is performed quickly and efficiently by utilizing a GPU.

[0168] Based on the emotion analysis and image evaluation results, the server selects the most suitable image. At this stage, a generative AI model can be used to customize the selection criteria in detail. Album generation is then performed based on the selected image and the text information entered by the user. The generative AI model analyzes prompt text containing the text information and is used to design an album layout that meets the user's preferences. For example, when a bright design theme is chosen, design software suited to visually striking layouts is used.

[0169] Furthermore, the server collects news data according to the user's emotional state. When the user's emotions are positive, news with positive content is prioritized and incorporated as relevant information in the album. This enriches the user's experience.

[0170] For example, if a user wants to record memories from a family trip, the server can automatically handle all these steps and generate an album that aggregates photos and information that contain many positive emotions. An example of a prompt would be, "Take photos of your family holiday and create an album that reflects the fun and joy it contained."

[0171] This allows users to easily save and relive special memories in a way that resonates with their emotions, without having to go through cumbersome selection and editing processes.

[0172] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0173] Step 1:

[0174] The device acquires image data captured by the user. Simultaneously with the capture, it records a voice memo and uploads this data to the server. The input is the captured image and voice data, and the output is that data as received by the server. Specifically, the device detects a shutter click through a camera application and simultaneously records and transmits the image and audio.

[0175] Step 2:

[0176] The server analyzes the received image data. It receives image data as input, uses an AI algorithm to evaluate composition, resolution, and other factors, and analyzes the technical quality of the image. The output consists of evaluation results and metadata. Specifically, the server uses a GPU-based image processing engine to analyze the data and calculate results at high speed.

[0177] Step 3:

[0178] The server analyzes the audio data to extract the user's emotional data. This is done by using the audio data as input and analyzing emotions from voice tone and language patterns using emotion analysis tools. The output is the identified emotional data. Specifically, the server uses a speech recognition API to convert the audio data into text, and then uses an emotion AI model to classify the emotions.

[0179] Step 4:

[0180] The server integrates the image evaluation results and emotion data to select the optimal image. The selection algorithm takes the evaluation results and emotion data as input and uses them as selection criteria. The output is the selected optimal image. Specifically, a generative AI model is used to create selection prompts based on the analysis results, and the selection algorithm is then executed.
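
As a non-limiting illustration, creating a selection prompt from the analysis results in Step 4 can be sketched as plain string construction. The prompt wording, the assumed shape of each evaluation record, and the function name `build_selection_prompt` are hypothetical. The call to the generative AI model itself is omitted because the disclosure does not specify its interface.

```python
def build_selection_prompt(evaluations: list[dict], emotion: str) -> str:
    """Compose a text prompt asking a generative model to pick one image.

    Each evaluation is assumed to look like
    {"image_id": "a.jpg", "composition": 0.8, "resolution": 0.9}.
    """
    lines = [
        f"The user's detected emotion is '{emotion}'.",
        "Choose the single image that best reflects this emotion,",
        "taking the technical scores below into account.",
        "",
    ]
    for ev in evaluations:
        lines.append(
            f"- {ev['image_id']}: composition={ev['composition']:.2f}, "
            f"resolution={ev['resolution']:.2f}"
        )
    lines.append("")
    lines.append("Answer with the image_id only.")
    return "\n".join(lines)


prompt = build_selection_prompt(
    [{"image_id": "a.jpg", "composition": 0.8, "resolution": 0.9}],
    "joy",
)
```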

[0181] Step 5:

[0182] The server receives text information entered by the user and analyzes their emotions based on it. It receives text information as input and outputs the emotion analysis results. Specifically, it has a process of analyzing the user's text data using natural language processing techniques and aggregating it as emotion data.
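
As a non-limiting illustration, aggregating emotion data from user text in Step 5 can be sketched with a keyword lexicon. This is a deliberately simple stand-in for the natural language processing mentioned above, not the disclosed technique. The lexicon entries, the emotion labels, and the function name `aggregate_emotions` are hypothetical.

```python
import re
from collections import Counter

# Toy lexicon for illustration only; a real system would use a trained model.
EMOTION_LEXICON = {
    "fun": "joy", "happy": "joy", "great": "joy",
    "relaxing": "calm", "quiet": "calm",
    "sad": "sadness", "tired": "sadness",
}


def aggregate_emotions(comments: list[str]) -> Counter:
    """Count emotion labels triggered by lexicon words across all comments."""
    counts: Counter = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            label = EMOTION_LEXICON.get(word)
            if label is not None:
                counts[label] += 1
    return counts


counts = aggregate_emotions(["This is fun!", "Such a happy, relaxing day."])
```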

[0183] Step 6:

[0184] The server generates personalized albums based on the selected images and analyzed emotion data. Here, selected images, text information, and news data are used as input, and a customized album is generated as output. Specifically, a generative AI model is used to select a template, and the album is constructed using design software.

[0185] Step 7:

[0186] The server provides the generated album to the user. The user can view the album and share their experience. The input is the generated album data, and the output is the digital album displayed on the user's device. The specific operation involves a process of delivering the album in real time via a web browser or mobile application.

[0187] (Application Example 2)

[0188] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0189] Selecting the most suitable photos from user-submitted images to match specific emotions and atmospheres, and generating a record based on those selections, is not easy. Furthermore, creating albums that enhance emotional value based on images and text has limitations with conventional technologies. There is a need to consider the user's emotions and the atmosphere of the moment, and to collect even more relevant information to create records that deepen individual experiences.

[0190] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0191] In this invention, the server includes data storage means for receiving captured image information, extraction means for selecting the optimal image from the image information, and emotion analysis means for collecting emotional information based on the user's voice and comments. This makes it possible to select the optimal image that matches the user's emotions and record that emotion.

[0192] A "data storage means" is a component that has the function of receiving captured image information and storing it for subsequent processing.

[0193] An "extraction means" is a technical device that performs processing to select the optimal image from the received image information.

[0194] "Emotion analysis means" refers to a component that incorporates technology to analyze user voices and comments and collect emotional information.

[0195] A "layout generation means" is a device or system that has the function of customizing the layout of an album or record based on emotional information.

[0196] "Information gathering means" refers to components used to collect information related to a specific date and incorporate it into data.

[0197] "Record generation means" refers to a device or program that generates a record combining extracted images, emotional information, and collected information.

[0198] An "instruction means" is a component that issues commands to output the generated recorded data.

[0199] The system for realizing this invention comprises various modules. First, when a user takes an image using the terminal's camera, the image data is uploaded to the server by a data storage means. At the same time, the terminal also records the user's voice comments and sends them to the server. The server converts the voice data into text using speech recognition technology such as Google's Speech-to-Text API.

[0200] Next, the server analyzes the user's emotions using emotion analysis libraries such as IBM Watson® Tone Analyzer. This emotion data is combined with image data that has undergone technical evaluation through image analysis tools such as OpenCV and TensorFlow. Based on the analysis results, the server uses a layout generation means to customize the album page design according to the emotions. Furthermore, it uses an information gathering means to collect news and event information related to a specific date and integrates it as part of the record that reflects the user's emotions.

[0201] As a concrete example, suppose a user takes photos during a family trip and exclaims, "This is fun!" If this voice is analyzed and recognized as an emotion of "joy," the system will select that photo as the best one and generate a bright and vibrant album page. Furthermore, news and event information from the travel destination can be incorporated into the album as added value.

[0202] Examples of prompts using a generative AI model include "Please suggest the optimal album generation method based on sentiment analysis of family photos" and "Please show the steps for designing a system that recommends news based on user sentiment." In this way, it becomes possible to provide memorable records that deeply consider the user's emotions.

[0203] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0204] Step 1:

[0205] The user takes an image using their device and inputs voice comments. The image data and voice data are uploaded from the device to the server. The input is the captured image file and voice file, and the output is the saving of those files on the server.

[0206] Step 2:

[0207] The server uses the Google Speech-to-Text API to convert audio data into text data. The input is an audio file, and the output is a comment in text format. In this process, speech recognition technology is used to analyze the audio signal and convert it into the appropriate text format.

[0208] Step 3:

[0209] The server analyzes the converted text data using IBM Watson Tone Analyzer to identify the user's emotions. The input is a text comment, and the output is emotional information (e.g., joy, excitement). The process involves analyzing the context of the text and the emotional tone.

[0210] Step 4:

[0211] The server uses OpenCV and TensorFlow to perform technical evaluations of image data. The input is an image file, and the output is technical metrics (including resolution and composition). Image analysis evaluates clarity and compositional balance.

[0212] Step 5:

[0213] The server extracts the best image based on emotional information and technical evaluation. This selection process involves scoring based on emotional and technical quality, choosing the image with the highest score. The input is emotional information and technical metrics, and the output is the selected image.
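
As a non-limiting illustration, the scoring in Step 5 can be sketched as a blend of a technical score and an emotion-match score, with the highest blended score selected. Both scores are assumed to be normalized to the range 0 to 1. The linear blend, the `emotion_weight` parameter, and the function names `combined_score` and `extract_best` are hypothetical.

```python
def combined_score(technical: float, emotion_match: float, emotion_weight: float = 0.5) -> float:
    """Blend technical quality and emotional fit, both assumed in [0, 1]."""
    if not 0.0 <= emotion_weight <= 1.0:
        raise ValueError("emotion_weight must be between 0 and 1")
    return (1.0 - emotion_weight) * technical + emotion_weight * emotion_match


def extract_best(candidates: dict[str, tuple[float, float]], emotion_weight: float = 0.5) -> str:
    """Return the image id with the highest combined score.

    candidates maps image id -> (technical score, emotion match score).
    """
    if not candidates:
        raise ValueError("no candidates")
    return max(
        candidates,
        key=lambda image_id: combined_score(*candidates[image_id], emotion_weight),
    )


candidates = {"sharp_but_flat.jpg": (0.9, 0.2), "joyful.jpg": (0.7, 0.9)}
chosen = extract_best(candidates)
```

Setting `emotion_weight` to 0 reduces the selection to technical quality alone, which makes the influence of the emotional information easy to inspect.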

[0214] Step 6:

[0215] Based on the selected images, the album layout is customized to reflect the emotions expressed. The input consists of the selected images and emotional information, while the output is a personalized album layout. The customization process includes selecting color schemes and design elements that match the emotions.

[0216] Step 7:

[0217] This system uses information gathering tools to collect news and event information related to a specific date and incorporates it into an album. The input is date information, and the output is a list of related information. The server automatically scans external information sources and extracts relevant data.

[0218] Step 8:

[0219] The completed album data is displayed to the user and, if necessary, saved to a storage medium or printed. The input is the album layout and related information, and the output is the final album provided to the user. The server renders the generated data for viewing and processes it according to the user's preferences and printing requests.

[0220] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0221] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0222] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0223] [Second Embodiment]

[0224] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0225] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0226] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0227] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0228] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0229] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0230] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0231] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0232] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0233] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0234] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0235] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0236] This invention relates to a system for automatically managing image data captured by a user and saving it as a physical album. This system operates in cooperation with a server, a terminal, and the user.

[0237] First, users routinely take photos using their devices. These devices have a function to automatically upload the captured image data to a server. The server receives the image data via an internet connection and analyzes its contents.

[0238] The server uses an AI algorithm to select the best-quality photo from among multiple uploaded images. This process evaluates the photos based on several criteria, including composition, smiles, and focus.

[0239] Next, the user enters comments on their device about memories and events related to the photos taken that day. This comment information is sent to the server, which also automatically collects relevant news for that date from the internet. Based on this information, the server generates the layout for the album page.

[0240] The generated album data is sent to a partner printing company for printing and binding. Finally, the server arranges for the completed album to be shipped to the address specified by the user.

[0241] As a concrete example, suppose a user takes various photos at their child's birthday party. The photos taken using the device are automatically uploaded to a server, which then selects the most memorable moments. The user then enters short comments about their impressions of the party and the fun moments they experienced. In addition, the server collects information about "major local events" as news for the day. Combining this information, the server creates an album documenting the user's special day. This entire process is technically seamless, allowing the user to preserve memories without any hassle.

[0242] The following describes the processing flow.

[0243] Step 1:

[0244] The user takes a photo using the device. The device automatically uploads the captured image data to the management server.

[0245] Step 2:

[0246] The server analyzes the received image data. Using AI algorithms, it evaluates the images based on criteria such as composition, resolution, and facial expressions.

[0247] Step 3:

[0248] The server selects the most suitable photo based on the evaluation results. The selection criteria can be customized according to the user's settings.

[0249] Step 4:

[0250] Users enter information about the day's events and comments from their devices. The entered comments are sent to the server.

[0251] Step 5:

[0252] The server automatically collects date-related news data from the internet. This news data, along with comments, is incorporated into the album.

[0253] Step 6:

[0254] The server automatically generates album pages by combining selected photos, user comments, and collected news.

[0255] Step 7:

[0256] The server sends the completed album data to the printing company. The printing company prints and binds the album based on the received data.

[0257] Step 8:

[0258] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.

[0259] (Example 1)

[0260] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0261] In the digital age, vast amounts of image data are generated daily, but there is a lack of efficient ways to manage this data and to physically record special moments. Furthermore, manual management and editing by users are time-consuming and laborious, and it is difficult to maintain the desired level of perfection in a printed album that reflects individual sensibilities.

[0262] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0263] In this invention, the server includes a storage means for receiving and storing image information from a shooting device, a selection means for selecting high-quality images based on multiple criteria, and an information addition means for adding user text data. This allows users to automatically generate and save special moments as high-quality albums without having to perform complicated editing tasks.

[0264] A "storage means" is a function for receiving image information from a shooting device and storing it via a network as needed.

[0265] The "selection means" is a function that automatically selects the highest quality image from the received image information based on multiple criteria such as composition, facial expression, and focus.

[0266] The "information addition means" is a function that receives text data entered by the user and incorporates it into the album along with image data.

[0267] A "data collection means" is a function that automatically collects news information related to a specific date and time from the internet.

[0268] The "generation means" is a function that integrates selected image information, text data, and collected news information and automatically constructs them as an album.

[0269] The "print instruction means" is a function that sends instructions to an external printing device to output the generated album onto paper.

[0270] "Delivery means" refers to the function of physically transporting printed albums to a location specified by the user.

[0271] This system automatically generates physical albums using image information, with the server, terminal, and user working in conjunction with each other. Users routinely take images using their terminals, and this image information is automatically uploaded to the server via a dedicated application. This process utilizes a network, ensuring smooth data transmission without requiring any special actions from the user.

[0272] The server analyzes the received image information using AI algorithms and selects high-quality images based on criteria such as composition, facial expression, and focus. AI algorithms include, for example, machine learning models and image recognition technologies. The server leverages these technologies to evaluate multiple images quickly and select the necessary data.

[0273] Next, users can add comments via their device about events and memories related to the images they have taken. This user-entered text information is sent to the server and stored as part of the album content. The server also automatically collects news information related to that date via the internet and uses it as content to add value to the album.

[0274] Finally, the server combines the selected image information, user text information, and collected news information to generate an album. A template is used for album generation, providing a consistent and visually appealing layout. The generated album data is then sent to a partner printing facility and printed on paper. The printed album is delivered to the user's specified address via a delivery service.

[0275] As a concrete example, consider a scenario where a user manages photos taken in a nature park during a holiday on their device. The user can easily upload these images to a server through an application and add comments about the enjoyable moments. Based on these comments, the server collects relevant news and event information from news sources. The resulting album then becomes a physical record that vividly preserves the user's memories.

[0276] An example of a prompt would be, "Please provide an overview of the AI-powered automated photo album generation system. In particular, please explain the image selection criteria and layout generation in detail." Using this prompt, the generative AI model can explain the system's details.

[0277] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0278] Step 1:

[0279] The user uses the device to capture image information. The captured image information is configured to be automatically uploaded to a server by a dedicated application on the device. The input is the captured image file, which is sent to the server via the internet. The output is the image data stored on the server. The device's operation includes transmitting the image data in the appropriate format over the network connection.

[0280] Step 2:

[0281] The server analyzes the received image data using an AI algorithm. Specifically, it performs an evaluation based on multiple criteria such as the composition of the image, the subject's smile, and the accuracy of the focus. The inputs are multiple uploaded image data, which are evaluated by the AI algorithm. The output is the image determined to be of high quality. The specific operations of the server include calling an AI model for image analysis and performing scoring on each image.
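
As a non-limiting illustration, the scoring in Step 2 can be pictured as a weighted sum over per-criterion scores. The following is a minimal sketch and not the implementation of this disclosure: the per-criterion scores are assumed to have already been produced by an image analysis model, and the names `ImageScores`, `WEIGHTS`, `quality_score`, and `select_high_quality`, the file names, the weights, and the threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights for the three criteria named in Step 2.
WEIGHTS = {"composition": 0.4, "smile": 0.3, "focus": 0.3}


@dataclass
class ImageScores:
    """Per-criterion scores in [0, 1], assumed to come from an AI model."""
    name: str
    composition: float
    smile: float
    focus: float


def quality_score(img: ImageScores) -> float:
    """Combine the criteria into a single weighted score."""
    return (WEIGHTS["composition"] * img.composition
            + WEIGHTS["smile"] * img.smile
            + WEIGHTS["focus"] * img.focus)


def select_high_quality(images, threshold=0.7):
    """Return images scoring at or above the threshold, best first."""
    kept = [img for img in images if quality_score(img) >= threshold]
    return sorted(kept, key=quality_score, reverse=True)


uploads = [
    ImageScores("park_01.jpg", composition=0.9, smile=0.8, focus=0.9),
    ImageScores("park_02.jpg", composition=0.5, smile=0.2, focus=0.4),
    ImageScores("park_03.jpg", composition=0.7, smile=0.9, focus=0.8),
]
selected = select_high_quality(uploads)
print([img.name for img in selected])
```

Keeping the weights in one table makes it possible to adjust the relative importance of each criterion without touching the selection logic.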

[0282] Step 3:

[0283] The user inputs comments related to the events and memories associated with the images taken via the terminal. The inputs are the text information entered by the user into the application, which is sent to the server. The output is the text data stored in the server. The operations of the terminal include the function of inputting comments through the user interface and sending the data to the server.

[0284] Step 4:

[0285] The server automatically collects news information related to a specific date and time through the Internet. The input is the date information, and based on this, relevant news is retrieved from online resources. The output is the news data to be added to the album. The operations of the server include the process of collecting relevant information of that day using news APIs and web scraping technologies.
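
As a non-limiting illustration, the date-based retrieval in Step 4 may be sketched as a filter over items that have already been fetched. The sketch below assumes the news API or scraper has returned a list of items; the `NewsItem` type, the `news_for_date` function, the `limit` parameter, and the sample headlines are hypothetical placeholders and do not come from this disclosure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NewsItem:
    published: date
    headline: str


def news_for_date(items, target: date, limit: int = 3):
    """Return up to `limit` items published on the target date."""
    matches = [item for item in items if item.published == target]
    return matches[:limit]


# Placeholder items standing in for the result of an upstream fetch.
fetched = [
    NewsItem(date(2024, 5, 3), "Local park opens new trail"),
    NewsItem(date(2024, 5, 4), "Weekend weather outlook"),
    NewsItem(date(2024, 5, 3), "Spring festival draws crowds"),
]
for item in news_for_date(fetched, date(2024, 5, 3)):
    print(item.headline)
```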

[0286] Step 5:

[0287] The server generates an album layout by combining selected image data, user comments, and collected news information. Inputs include selected images, text information, and news information, which are arranged in a consistent layout using templates. The output is digital album data for printing. The server's operation involves arranging content using a template engine and generating the completed album data.
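
As a non-limiting illustration, the template-based arrangement in Step 5 may be sketched with the `string.Template` class of the Python standard library. The template text, the `render_page` function, and the sample content are hypothetical; an actual system would more likely target a print-ready format than plain text.

```python
from string import Template

# Hypothetical plain-text page template used only for illustration.
PAGE_TEMPLATE = Template(
    "== $title ==\n"
    "Photos: $photos\n"
    "Comment: $comment\n"
    "News of the day: $news\n"
)


def render_page(title, photos, comment, news):
    """Fill the template with the selected images, comment, and news."""
    return PAGE_TEMPLATE.substitute(
        title=title,
        photos=", ".join(photos),
        comment=comment,
        news="; ".join(news),
    )


page = render_page(
    title="Holiday at the nature park",
    photos=["park_01.jpg", "park_03.jpg"],
    comment="We had a wonderful picnic.",
    news=["Local park opens new trail"],
)
print(page)
```

Because every page is produced from the same template, the layout stays consistent regardless of how many images or news items are supplied.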

[0288] Step 6:

[0289] The server sends the generated album data to a partner printing facility. The input is the completed album data, which is sent in a format suited to printing. The output is print-ready album data from which the printing facility produces the physical album. The server's operation includes sending the data in the appropriate format according to the printing partner's instructions.

[0290] Step 7:

[0291] The server arranges for the printed album to be physically delivered to the address specified by the user. The input consists of the printed album and the user's delivery address information; the output is the album delivered to the user's specified location. The server's operations include issuing delivery instructions to the shipping company and managing tracking information.

[0292] (Application Example 1)

[0293] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0294] Managing the large amount of visual data captured in daily life, and using that data to create memories, is a time-consuming and laborious task for individual users. Furthermore, integrating detailed information and events related to those memories and saving them in physical form is an even more laborious activity. This invention aims to provide a method that can perform these tasks seamlessly by integrating them with home-use cameras and audio input devices.

[0295] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0296] In this invention, the server includes a storage means for receiving visual data recorded by a camera, a voice conversion means for converting voice input from a user into text information, and a data collection means for collecting information related to a specific date. This allows the user to automatically save the captured data in an optimized form and generate an integrated album using individual events and related information.

[0297] A "recording device" is a device that has the function of recording visual data, and includes cameras and the like.

[0298] "Visual data" refers to data such as images and videos recorded by a camera or video camera.

[0299] A "storage means" is a device for storing and retaining received visual data.

[0300] A "selection means" is a function that provides a process for selecting the most suitable data from the received visual data according to specific criteria.

[0301] An "information addition means" is a function that receives user input and incorporates that information into visual data.

[0302] "Data collection means" refers to the function of collecting information related to a specific date or event using the internet or other information sources.

[0303] The "generation means" is a device that provides a process for creating an integrated album using selected visual data and collected information.

[0304] The "instruction means" is something that provides a function to execute a print instruction for the generated album.

[0305] The "delivery means" is a process or device for delivering the printed physical album to a designated location.

[0306] The "voice conversion means" is a device that has the function of converting the user's voice input into text information and digitizing it.

[0307] The system for realizing this invention provides a process for efficiently managing the visual data captured by the user in daily life and storing it as an album. Specifically, the user captures visual data with a photographing device such as a smartphone on a daily basis, and automatically uploads the data to the server. This process is realized through the application built into the photographing device and the Wi-Fi connection function.

[0308] The server accumulates the received visual data in the storage means. Furthermore, it has a selection means for evaluating the visual data using an AI algorithm and selecting the optimal single image. This evaluation is carried out using an AI framework such as TensorFlow and is based on a plurality of criteria such as focus, composition, and the expression of the subject.

[0309] The user uses a voice input device to send comments related to the photographed data to the server. The server converts this into text information with the voice conversion means and collects information related to that date from the Internet via the data collection means.

[0310] The server automatically generates the album layout using the generation means, utilizing the selected visual data, text information, and collected related information. The generated album data is sent to a partner printing company via the instruction means. The printed album is then delivered to the user's specified address via the delivery means.

[0311] As a concrete example, if a user records their family's daily life or special events as visual data and adds comments via voice input, this data will be integrated with local news from the same day and compiled into an album of memories of that special day. An example of a prompt is as follows: "Please select a beautiful landscape photo taken by your home robot that is in focus and shows everyone in the family smiling, and combine it with your comments and today's news to create an album of this special day."

[0312] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0313] Step 1:

[0314] The user captures visual data using their smartphone's camera. After capture, an application on the smartphone automatically uploads the visual data to the server. The input is visual data, and the output is storage on the server's storage device. Wi-Fi is used for stable and efficient data transfer during this process.

[0315] Step 2:

[0316] The server stores the received visual data in a storage device. The storage device holds the data and stores it in a format suitable for subsequent AI processing. The input is the uploaded visual data, and the output is the stored visual data.

[0317] Step 3:

[0318] The server analyzes stored visual data using a selection mechanism. An AI algorithm is used to evaluate the visual data and select the best image. Factors such as focus, composition, and subject expression are considered during this process. The input is the stored visual data, and the output is the selected, optimal image.

[0319] Step 4:

[0320] Users provide feedback and comments related to visual data using their smartphone's voice input function. This voice data is sent to a server and converted into text information using a speech-to-text conversion tool. The input is voice data, and the output is text information.

[0321] Step 5:

[0322] The server collects information related to a specific date from the internet using data collection methods. It retrieves relevant news and event information to enrich the album's content. The input is date information, and the output is the collected related information.

[0323] Step 6:

[0324] The server combines selected visual data, text information, and collected related information to create album layouts using a generation mechanism. This process is automated, providing efficient and consistent layouts. The inputs are selected visual data, text information, and related information, and the output is album layout data.

[0325] Step 7:

[0326] The server sends the generated album data to the printing company via a print instruction system. It instructs the physical printing of the album and prepares it for delivery after completion. The input is the album layout data, and the output is the instruction to the printing company.

[0327] Step 8:

[0328] The server delivers the printed albums to the user's specified address using a delivery service. It manages the delivery process and ensures the albums reach the user. The input is the physical albums, and the output is the completion of delivery to the user.

[0329] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0330] This invention relates to a system that optimizes the selection of captured image data and album generation by utilizing an emotion engine that recognizes user emotions. Users routinely take photos using their devices, and this data is uploaded to a server. The server analyzes the received image data using a combination of an AI algorithm and the emotion engine.

[0331] In analyzing image data, the server considers not only technical aspects such as composition and resolution, but also the user's emotions. The emotion engine detects the user's emotions based on factors such as the tone of voice when the photo was taken, comments entered after the photo was taken, and other biometric data. Based on this emotion data, it selects the image that best reflects the user's feelings.

[0332] Furthermore, the emotion engine analyzes the comments entered by users to detect which emotions are being expressed. This allows the server to customize the album page layout according to those emotions. For example, if happy emotions predominate, a bright design is chosen, and if calm emotions predominate, a simple and quiet design is adopted.
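
As a non-limiting illustration, the emotion-dependent design choice described above may be sketched as a lookup keyed on the most frequent detected emotion. The emotion labels are assumed to be supplied by the emotion engine; the `THEMES` table, the `DEFAULT_THEME` fallback, and the `choose_theme` function are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from a dominant emotion to a page design.
THEMES = {
    "happy": "bright",
    "calm": "simple and quiet",
}
DEFAULT_THEME = "neutral"


def choose_theme(emotions):
    """Pick the design for the most frequent detected emotion."""
    if not emotions:
        return DEFAULT_THEME
    dominant, _count = Counter(emotions).most_common(1)[0]
    return THEMES.get(dominant, DEFAULT_THEME)


print(choose_theme(["happy", "calm", "happy"]))  # bright
```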

[0333] The emotion engine is also useful for collecting news data. It can take into account the user's emotional state, allowing for customization such as prioritizing the collection of positive news when the user is feeling emotionally uplifted.
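
As a non-limiting illustration, prioritizing positive news for an emotionally uplifted user may be sketched as a stable sort. The tone labels are assumed to come from an upstream classifier; the `prioritize_news` function, the label values, and the sample headlines are hypothetical.

```python
def prioritize_news(items, user_emotion):
    """Order news so that positive items come first for an uplifted user.

    `items` is a list of (headline, tone) pairs; the tone labels are
    assumed to be supplied by an upstream classifier.
    """
    if user_emotion != "positive":
        return list(items)
    # sorted() is stable, so items with the same tone keep their order.
    return sorted(items, key=lambda item: item[1] != "positive")


news = [
    ("Road closure downtown", "negative"),
    ("Spring festival draws crowds", "positive"),
    ("Council meeting scheduled", "neutral"),
    ("Local team wins final", "positive"),
]
for headline, tone in prioritize_news(news, "positive"):
    print(tone, headline)
```

The sketch reorders rather than discards items, so that non-positive news remains available if too few positive items exist for the date in question.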

[0334] As a concrete example, consider a scenario where a user takes photos to record how their family spends their holidays. The device simultaneously records a voice memo when taking the photo, and the server uses this to analyze the user's emotions. Based on this emotional data, the server selects the most impactful photo and generates an album page that expresses positive emotions. Furthermore, by collecting and incorporating relevant news, such as feature articles and local event information, into the album, the user's memories are preserved in a richer way.

[0335] This system aims to enhance users' memories by pursuing perfection in both technical and emotional dimensions.

[0336] The following describes the processing flow.

[0337] Step 1:

[0338] The user takes a photo using the device. The device collects the photo data along with voice memos and biometric information, and uploads these to the server.

[0339] Step 2:

[0340] The server receives image data, voice memos, and biometric information, and analyzes the user's emotions based on this data. The emotion engine analyzes voice tone, content, and biometric data to identify emotions.

[0341] Step 3:

[0342] Based on the results of the emotion analysis, the server uses an AI algorithm to select the photo that best reflects the user's emotions from among the uploaded photos.

[0343] Step 4:

[0344] The user enters comments on their device about memories and events related to the photos. The entered comments are sent to the server.

[0345] Step 5:

[0346] The server receives comments, and the emotion engine analyzes their content to extract emotions from them. Based on this information, the album page design is customized with a style that matches the emotion.

[0347] Step 6:

[0348] The server automatically collects news data related to a specific date from the internet. During this process, it takes into account the user's emotional state and prioritizes collecting news that is highly relevant.

[0349] Step 7:

[0350] The server automatically generates album pages by combining selected photos, emotion-based layouts, and collected news data.

[0351] Step 8:

[0352] The server sends the generated album data to the printing company. The printing company prints and binds the album based on the received data.

[0353] Step 9:

[0354] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.

[0355] (Example 2)

[0356] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0357] Currently, organizing and saving captured image data in a way that suits the user is not easy, and creating albums that take emotional elements into account is particularly difficult. Furthermore, the process of users selecting the best photo from a large number of images is time-consuming, so an efficient method is needed. In addition, collecting and integrating date-related news data based on personal emotions is also a challenge.

[0358] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0359] In this invention, the server includes a storage means for receiving and storing captured image data, an emotion analysis means for analyzing audio data related to the image data and extracting emotion data, a selection means for selecting the optimal image based on the emotion data, an information addition means for receiving text information based on user input and analyzing the emotion, a data collection means for collecting news data related to a specific date and emotion data, and a provision means for providing the generated album in digital format. This makes it possible to automatically select the optimal image in a way that resonates with the user's emotions and efficiently generate and provide personalized albums and related information.

[0360] "Storage means" refers to a device or function for receiving and storing information such as captured image data.

[0361] "Emotion analysis means" refers to a device or function for analyzing voice data or text information to extract the user's emotion data.

[0362] "Selection means" refers to a device or function for automatically selecting the optimal image based on analyzed emotion data.

[0363] "Information addition means" refers to a device or function that receives text information provided by a user, analyzes that information, and detects emotional characteristics.

[0364] "Data collection means" refers to a device or function for collecting relevant news data based on a specific date or analyzed emotion data.

[0365] "Generation means" refers to a device or function for generating an album by combining selected images, text information, and news data.

[0366] "Provision means" refers to a device or function for providing the generated album to the user in digital format.

[0367] This invention relates to a system for efficiently managing captured image data and generating albums tailored to the user's needs. Users routinely take photos and save the data to a device. This device also includes a function for recording voice memos at the time of shooting. The photos taken by the user are uploaded to a server via the device, and the server performs analysis based on this information.

[0368] The server first stores the received image data in the storage means. Next, it uses the emotion analysis means to detect the user's emotions from the audio data. This can be achieved using general-purpose speech analysis software, for example a general-purpose speech recognition API. Furthermore, the server uses an AI algorithm to evaluate the technical quality of the images. Image processing is performed quickly and efficiently by utilizing a GPU.

[0369] Based on the emotion analysis and image evaluation results, the server selects the most suitable image. At this stage, a generative AI model can be used to customize the selection criteria in detail. Album generation is then performed based on the selected image and the text information entered by the user. The generative AI model analyzes prompt text containing the text information and is used to design an album layout that meets the user's preferences. For example, when a bright design theme is chosen, design software suited to visually striking layouts is employed.

[0370] Furthermore, the server collects news data according to the user's emotional state. When the user's emotions are positive, news with positive content is prioritized and incorporated as relevant information in the album. This enriches the user's experience.

[0371] For example, if a user wants to record memories from a family trip, the server can automatically handle all these steps and generate an album that aggregates photos and information that contain many positive emotions. An example of a prompt would be, "Take photos of your family holiday and create an album that reflects the fun and joy it contained."

[0372] This allows users to easily save and relive special memories in a way that resonates with their emotions, without having to go through cumbersome selection and editing processes.

[0373] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0374] Step 1:

[0375] The device acquires image data captured by the user. Simultaneously with the capture, it records a voice memo and uploads this data to the server. The input is the image and voice data, and the output is that data as received by the server. Specifically, the device detects the shutter click through a camera application and simultaneously records and transmits images and audio.

[0376] Step 2:

[0377] The server analyzes the received image data. It receives image data as input, uses an AI algorithm to evaluate composition, resolution, and other factors, and analyzes the technical quality of the image. The output consists of evaluation results and metadata. Specifically, the server uses a GPU-based image processing engine to analyze the data and calculate results at high speed.

[0378] Step 3:

[0379] The server analyzes the audio data to extract the user's emotional data. This is done by using the audio data as input and analyzing emotions from voice tone and language patterns using emotion analysis tools. The output is the identified emotional data. Specifically, the server uses a speech recognition API to convert the audio data into text, and then uses an emotion AI model to classify the emotions.

[0380] Step 4:

[0381] The server integrates the image evaluation results and the emotion data to select the optimal image. The selection algorithm takes the evaluation results and the emotion data as input and uses them as selection criteria. The output is the selected optimal image. The specific operation includes using a generative AI model to create selection prompts based on the analysis results and then executing the algorithm.
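
As a non-limiting illustration, a selection prompt assembled from the analysis results may look as follows. This sketch covers only the construction of the prompt text and does not call any generative AI model; the `build_selection_prompt` function, the prompt wording, the file names, and the scores are hypothetical.

```python
def build_selection_prompt(emotion, evaluations):
    """Create a text prompt asking a generative model to pick one image.

    `evaluations` maps a file name to its technical quality score.
    """
    lines = [
        f"The user's detected emotion is '{emotion}'.",
        "Technical evaluation results:",
    ]
    # Sort by file name so the prompt is reproducible.
    for name, score in sorted(evaluations.items()):
        lines.append(f"- {name}: quality {score:.2f}")
    lines.append(
        "Select the single image that best reflects this emotion "
        "while keeping technical quality high."
    )
    return "\n".join(lines)


prompt = build_selection_prompt(
    "joy", {"trip_02.jpg": 0.91, "trip_01.jpg": 0.78}
)
print(prompt)
```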

[0382] Step 5:

[0383] The server receives text information entered by the user and analyzes their emotions based on it. It receives text information as input and outputs the emotion analysis results. Specifically, it has a process of analyzing the user's text data using natural language processing techniques and aggregating it as emotion data.
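
As a non-limiting illustration, the aggregation of emotion data from a text comment may be sketched with a small word lexicon. This keyword lookup is a deliberately simple stand-in for the natural language processing referred to above, not the technique of this disclosure; the `LEXICON` table, its labels, and the `aggregate_emotions` function are hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon; a real system would use a trained model.
LEXICON = {
    "fun": "joy", "happy": "joy", "wonderful": "joy",
    "quiet": "calm", "relaxing": "calm",
    "tired": "sadness", "sad": "sadness",
}


def aggregate_emotions(comment):
    """Count emotion labels for the lexicon words found in the comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return Counter(LEXICON[w] for w in words if w in LEXICON)


counts = aggregate_emotions("Such a fun day, wonderful and relaxing!")
print(counts.most_common())
```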

[0384] Step 6:

[0385] The server generates personalized albums based on the selected images and the analyzed emotion data. Here, selected images, text information, and news data are used as input, and a customized album is generated as output. Specifically, a generative AI model is used to select a template, and the album is constructed using design software.

[0386] Step 7:

[0387] The server provides the generated album to the user. The user can view the album and share their experience. The input is the generated album data, and the output is the digital album displayed on the user's device. The specific operation involves a process of delivering the album in real time via a web browser or mobile application.

[0388] (Application Example 2)

[0389] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0390] Selecting the most suitable photos from user-submitted images to match specific emotions and atmospheres, and generating a record based on those selections, is not easy. Furthermore, creating albums that enhance emotional value based on images and text has limitations with conventional technologies. There is a need to consider the user's emotions and the atmosphere of the moment, and to collect even more relevant information to create records that deepen individual experiences.

[0391] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0392] In this invention, the server includes data storage means for receiving captured image information, extraction means for selecting the optimal image from the image information, and emotion analysis means for collecting emotional information based on the user's voice and comments. This makes it possible to select the optimal image that matches the user's emotions and record that emotion.

[0393] A "data storage means" is a component that has the function of receiving captured image information and storing it for subsequent processing.

[0394] An "extraction means" is a technical device that performs processing to select the optimal image from the received image information.

[0395] "Emotion analysis means" refers to a component that incorporates technology to analyze user voices and comments and collect emotional information.

[0396] A "layout generation means" is a device or system that has the function of customizing the layout of an album or record based on emotional information.

[0397] "Information gathering means" refers to components used to collect information related to a specific date and incorporate it into data.

[0398] "Record generation means" refers to a device or program that generates a record combining extracted images, emotional information, and collected information.

[0399] An "instruction means" is a component that issues commands to output the generated recorded data.

[0400] The system for realizing this invention comprises various modules. First, when a user takes an image using the terminal's camera, the image data is uploaded to the server by a data storage means. At the same time, the terminal also records the user's voice comments and sends them to the server. The server converts the voice data into text using speech recognition technology such as the Google Speech-to-Text API.

[0401] Next, the server analyzes the user's emotions using emotion analysis libraries such as IBM Watson Tone Analyzer. This emotion data is combined with image data that has undergone technical evaluation through image analysis tools such as OpenCV and TensorFlow. Based on the analysis results, the server uses a layout generation mechanism to customize the album page design according to the emotions. Furthermore, it uses an information gathering mechanism to collect news and event information related to a specific date and integrates it as part of the record that reflects the user's emotions.

[0402] As a concrete example, suppose a user takes photos during a family trip and exclaims, "This is fun!" If this voice is analyzed and recognized as an emotion of "joy," the system will select that photo as the best one and generate a bright and vibrant album page. Furthermore, news and event information from the travel destination can be incorporated into the album as added value.

[0403] Examples of prompts using a generative AI model include "Please suggest the optimal album generation method based on sentiment analysis of family photos" and "Please show the steps for designing a system that recommends news based on user sentiment." In this way, it becomes possible to provide memorable records that deeply consider the user's emotions.

[0404] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0405] Step 1:

[0406] The user takes an image using their device and inputs voice comments. The image data and voice data are uploaded from the device to the server. The input is the captured image file and voice file, and the output is the saving of those files on the server.

[0407] Step 2:

[0408] The server uses the Google Speech-to-Text API to convert audio data into text data. The input is an audio file, and the output is a comment in text format. In this process, speech recognition technology is used to analyze the audio signal and convert it into the appropriate text format.

[0409] Step 3:

[0410] The server analyzes the converted text data using IBM Watson Tone Analyzer to identify the user's emotions. The input is a text comment, and the output is emotional information (e.g., joy, excitement). The process involves analyzing the context of the text and the emotional tone.

[0411] Step 4:

[0412] The server uses OpenCV and TensorFlow to perform technical evaluations of image data. The input is an image file, and the output is technical metrics (including resolution and composition). Image analysis evaluates clarity and compositional balance.
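
As a non-limiting illustration, one commonly used clarity heuristic is the variance of the Laplacian of the grayscale image, where a higher variance suggests a sharper image. This particular measure is an assumption and is not named in this disclosure. The sketch below is written in plain Python on a 2D list so that it is self-contained; with OpenCV the same quantity is typically obtained as `cv2.Laplacian(gray, cv2.CV_64F).var()`. The `laplacian_variance` function and the sample patches are hypothetical.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D list of grayscale intensities, at least 3x3 in size;
    larger values suggest more edge detail, that is, a sharper image.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


flat = [[100] * 5 for _ in range(5)]  # uniform patch, no detail
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
print(laplacian_variance(flat), laplacian_variance(checker))
```

The uniform patch yields a variance of zero, while the checkerboard patch, which consists entirely of edges, yields a much larger value.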

[0413] Step 5:

[0414] The server extracts the best image based on emotional information and technical evaluation. This selection process involves scoring based on emotional and technical quality, choosing the image with the highest score. The input is emotional information and technical metrics, and the output is the selected image.
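
As a non-limiting illustration, the scoring in Step 5 may be sketched as a blend of a technical score and an emotion-match score, followed by selection of the highest-scoring image. Both scores are assumed to lie in the range 0 to 1 and to have been computed in Steps 3 and 4; the `combined_score` and `select_best` functions, the equal weighting, and the sample values are hypothetical.

```python
def combined_score(technical, emotion_match, emotion_weight=0.5):
    """Blend a technical score and an emotion-match score, both in [0, 1]."""
    return (1 - emotion_weight) * technical + emotion_weight * emotion_match


def select_best(candidates):
    """Return the name of the candidate with the highest combined score.

    `candidates` maps a file name to (technical, emotion_match).
    """
    return max(candidates, key=lambda name: combined_score(*candidates[name]))


candidates = {
    "trip_01.jpg": (0.9, 0.4),   # sharp, but a neutral moment
    "trip_02.jpg": (0.8, 0.9),   # slightly softer, clearly joyful
    "trip_03.jpg": (0.5, 0.7),
}
print(select_best(candidates))  # trip_02.jpg
```

With equal weighting, a clearly joyful image can be preferred over a technically sharper but emotionally neutral one, which matches the intent of selecting the image that reflects the user's emotions.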

[0415] Step 6:

[0416] Based on the selected images, the album layout is customized to reflect the emotions expressed. The input consists of the selected images and emotional information, while the output is a personalized album layout. The customization process includes selecting color schemes and design elements that match the emotions.

[0417] Step 7:

[0418] This system uses information gathering tools to collect news and event information related to a specific date and incorporates it into an album. The input is date information, and the output is a list of related information. The server automatically scans external information sources and extracts relevant data.

[0419] Step 8:

[0420] The completed album data is displayed to the user and, if necessary, saved to a storage medium or printed. The input is the album layout and related information, and the output is the final album provided to the user. The server renders the generated data for viewing and processes it according to the user's preferences and printing requests.

[0421] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0422] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0423] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0424] [Third Embodiment]

[0425] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0426] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0427] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0428] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0429] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0430] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0431] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0432] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0433] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0434] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0435] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0436] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0437] This invention relates to a system for automatically managing image data captured by a user and saving it as a physical album. This system operates in cooperation with a server, a terminal, and the user.

[0438] First, users routinely take photos using their devices. These devices have a function to automatically upload the captured image data to a server. The server receives the image data via an internet connection and analyzes its contents.

[0439] The server uses an AI algorithm to select the best-quality photo from among multiple uploaded images. This process evaluates the photos based on several criteria, including composition, smiles, and focus.
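As a non-limiting illustration, the selection described above can be sketched as a weighted sum over per-criterion scores. The weights, the `PhotoScores` type, and the assumption that each criterion has already been scored in the range 0 to 1 by an upstream model are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass

# Hypothetical weights: the disclosure names the criteria but not how they are combined.
WEIGHTS = {"composition": 0.4, "smile": 0.3, "focus": 0.3}


@dataclass(frozen=True)
class PhotoScores:
    photo_id: str
    composition: float  # each score is assumed to lie in [0, 1]
    smile: float
    focus: float


def overall_score(p: PhotoScores) -> float:
    """Weighted sum of the per-criterion scores."""
    return (
        WEIGHTS["composition"] * p.composition
        + WEIGHTS["smile"] * p.smile
        + WEIGHTS["focus"] * p.focus
    )


def select_best(photos: list[PhotoScores]) -> PhotoScores:
    """Return the photo with the highest overall score."""
    if not photos:
        raise ValueError("no photos to select from")
    return max(photos, key=overall_score)


candidates = [
    PhotoScores("a.jpg", composition=0.9, smile=0.2, focus=0.8),
    PhotoScores("b.jpg", composition=0.7, smile=0.9, focus=0.9),
]
best = select_best(candidates)
```

With these illustrative weights, a well-composed photo with no smile can lose to a slightly less well-composed photo in which the subject is smiling and in focus.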

[0440] Next, the user enters comments on their device about memories and events related to the photos taken that day. This comment information is sent to the server, which simultaneously automatically collects relevant news from the internet based on that date. Based on this information, the server generates the layout for the album page.

[0441] The generated album data is sent to a partner printing company for printing and binding. Finally, the server arranges for the completed album to be shipped to the address specified by the user.

[0442] As a concrete example, suppose a user takes various photos at their child's birthday party. The photos taken using the device are automatically uploaded to a server, which then selects the most memorable moments. The user then enters short comments about their impressions of the party and the fun moments they experienced. In addition, the server collects information about "major local events" as news for the day. Combining this information, the server creates an album documenting the user's special day. This entire process is technically seamless, allowing the user to preserve memories without any hassle.

[0443] The following describes the processing flow.

[0444] Step 1:

[0445] The user takes a photo using the device. The device automatically uploads the captured image data to the management server.

[0446] Step 2:

[0447] The server analyzes the received image data. Using AI algorithms, it evaluates the images based on criteria such as composition, resolution, and facial expressions.

[0448] Step 3:

[0449] The server selects the most suitable photo based on the evaluation results. The selection criteria can be customized according to the user's settings.

[0450] Step 4:

[0451] Users enter information about the day's events and comments from their devices. The entered comments are sent to the server.

[0452] Step 5:

[0453] The server automatically collects date-related news data from the internet. This news data, along with comments, is incorporated into the album.

[0454] Step 6:

[0455] The server automatically generates album pages by combining selected photos, user comments, and collected news.

[0456] Step 7:

[0457] The server sends the completed album data to the printing company. The printing company prints and binds the album based on the received data.

[0458] Step 8:

[0459] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.
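As a non-limiting illustration, Steps 2 through 6 above can be sketched as a single function that receives the image evaluator and the news collector as interchangeable callables. The names `AlbumPage`, `build_album_page`, `score_photo`, and `fetch_news` are hypothetical, and the stub evaluator and stub news source merely stand in for the AI algorithm and the Internet news collection, which this disclosure does not specify in detail.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class AlbumPage:
    day: date
    photo: str
    comment: str
    news: list[str] = field(default_factory=list)


def build_album_page(
    day: date,
    photos: list[str],
    comment: str,
    score_photo: Callable[[str], float],       # Steps 2-3: image evaluation
    fetch_news: Callable[[date], list[str]],   # Step 5: date-based news collection
) -> AlbumPage:
    """Combine the best photo, the user comment, and the day's news (Step 6)."""
    if not photos:
        raise ValueError("at least one uploaded photo is required")
    best_photo = max(photos, key=score_photo)
    headlines = fetch_news(day)
    return AlbumPage(day=day, photo=best_photo, comment=comment, news=headlines)


# Stub scores and a stub news source, used only to exercise the flow.
scores = {"cake.jpg": 0.9, "blurry.jpg": 0.3}
page = build_album_page(
    day=date(2024, 12, 3),
    photos=list(scores),
    comment="Birthday party at home",
    score_photo=scores.__getitem__,
    fetch_news=lambda d: [f"Local event held on {d.isoformat()}"],
)
```

Passing the evaluator and the news collector as parameters keeps the flow itself independent of any particular AI model or news source.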

[0460] (Example 1)

[0461] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0462] In the digital age, vast amounts of image data are generated daily, but there is a lack of efficient ways to manage this data and to physically record special moments. Furthermore, manual management and editing by users are time-consuming and laborious, and it is difficult to maintain the desired level of perfection in a printed album that reflects individual sensibilities.

[0463] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0464] In this invention, the server includes a storage means for receiving and storing image information from a shooting device, a selection means for selecting high-quality images based on multiple criteria, and an information addition means for adding user text data. This allows users to automatically generate and save special moments as high-quality albums without having to perform complicated editing tasks.

[0465] A "storage means" is a function for receiving image information from a shooting device and storing it via a network as needed.

[0466] The "selection means" is a function that automatically selects the highest quality image from the received image information based on multiple criteria such as composition, facial expression, and focus.

[0467] The "information addition means" is a function that receives text data entered by the user and incorporates it into the album along with image data.

[0468] The "data collection means" is a function that automatically collects news information related to a specific date and time from the internet.

[0469] The "generation means" is a function that integrates selected image information, text data, and collected news information and automatically constructs them as an album.

[0470] The "print instruction means" is a function that sends instructions to an external printing device to output the generated album onto paper.

[0471] The "delivery means" is a function for physically transporting printed albums to a location specified by the user.

[0472] This system automatically generates physical albums using image information, with the server, terminal, and user working in conjunction with each other. Users routinely take images using their terminals, and this image information is automatically uploaded to the server via a dedicated application. This process utilizes a network, ensuring smooth data transmission without requiring any special actions from the user.

[0473] The server analyzes the received image information using AI algorithms and selects high-quality images based on criteria such as composition, facial expression, and focus. AI algorithms include, for example, machine learning models and image recognition technologies. The server leverages these technologies to evaluate multiple images quickly and select the necessary data.

[0474] Next, users can add comments via their device about events and memories related to the images they have taken. This user-entered text information is sent to the server and stored as part of the album content. The server also automatically collects news information related to that date via the internet and uses it as content to add value to the album.

[0475] Finally, the server combines the selected image information, user text information, and collected news information to generate an album. A template is used for album generation, providing a consistent and visually appealing layout. The generated album data is then sent to a partner printing facility and printed on paper. The printed album is delivered to the user's specified address via a delivery service.
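As a non-limiting illustration, template-based page generation can be sketched with the Python standard library's `string.Template`. The template text, the placeholder names, and the `render_page` function are hypothetical; an actual implementation would produce print-ready layout data rather than plain text.

```python
from string import Template

# Hypothetical page template; the placeholders are filled in for each album page.
PAGE_TEMPLATE = Template(
    "== $date ==\n"
    "[photo] $photo\n"
    "Comment: $comment\n"
    "News of the day:\n"
    "$news"
)


def render_page(date_label: str, photo: str, comment: str, headlines: list[str]) -> str:
    """Fill the template with the selected photo, user comment, and news."""
    news = "\n".join(f"  - {h}" for h in headlines) or "  (none)"
    return PAGE_TEMPLATE.substitute(
        date=date_label, photo=photo, comment=comment, news=news
    )


text = render_page(
    "2024-12-03", "park.jpg", "A sunny walk in the nature park", ["Local festival opens"]
)
```

Because every page is rendered from the same template, the layout stays consistent regardless of which photo, comment, or news items are supplied.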

[0476] As a concrete example, consider a scenario where a user manages photos taken in a nature park during a holiday on their device. The user can easily upload these images to a server through an application and add comments about the enjoyable moments. Based on these comments, the server collects relevant news and event information from news sources. The resulting album then becomes a physical record that vividly preserves the user's memories.

[0477] An example of a prompt would be, "Please provide an overview of the AI-powered automated photo album generation system. In particular, please explain the image selection criteria and layout generation in detail." Using this prompt, the generative AI model can explain the system's details.

[0478] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0479] Step 1:

[0480] The user uses the device to capture image information. The captured image information is configured to be automatically uploaded to a server by a dedicated application on the device. The input is the captured image file, which is sent to the server via the internet. The output is the image data stored on the server. The device's operation includes transmitting the image data in the appropriate format over the network connection.

[0481] Step 2:

[0482] The server analyzes the received image data using an AI algorithm. Specifically, it evaluates the images based on multiple criteria, such as composition, the subject's smile, and focus accuracy. The input consists of multiple uploaded image data, which are then evaluated by the AI algorithm. The output consists of images deemed to be of high quality. The server's specific operations include calling an AI model for image analysis and performing scoring on each image.

[0483] Step 3:

[0484] Users enter comments about events and memories related to images taken via their device. Input consists of text information entered by the user into the application, which is then sent to the server. Output is text data stored on the server. The device's operation includes the ability to input comments through the user interface and send that data to the server.

[0485] Step 4:

[0486] The server automatically collects news information related to a specific date and time via the internet. The input is date information, and based on this, it retrieves relevant news from online resources. The output is news data to be added to an album. The server's operation involves a process of gathering relevant information for that day using news APIs and web scraping techniques.

[0487] Step 5:

[0488] The server generates an album layout by combining selected image data, user comments, and collected news information. Inputs include selected images, text information, and news information, which are arranged in a consistent layout using templates. The output is digital album data for printing. The server's operation involves arranging content using a template engine and generating the completed album data.

[0489] Step 6:

[0490] The server sends the generated album data to a partner printing facility. The input is the completed album data, which is sent in a format suitable for printing. The output is print-ready data from which the printing facility produces the physical album. The server's operation includes sending the data in the appropriate format according to the printing partner's instructions.

[0491] Step 7:

[0492] The server arranges physical delivery of the printed album to the address specified by the user. The input consists of the printed album and the user's delivery address information; the output is the album delivered to the user's specified location. The server's operations include issuing delivery instructions to the shipping company and managing tracking information.
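As a non-limiting illustration, the tracking of an album through printing and delivery in Steps 6 and 7 above can be sketched as a simple linear state machine. The status names and the `AlbumOrder` class are hypothetical; this disclosure states only that the server sends data to the printing facility, issues delivery instructions, and manages tracking information.

```python
from enum import Enum


class OrderStatus(Enum):
    ALBUM_GENERATED = "album_generated"
    SENT_TO_PRINTER = "sent_to_printer"
    PRINTED = "printed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


# Hypothetical linear progression of an album order.
_NEXT = {
    OrderStatus.ALBUM_GENERATED: OrderStatus.SENT_TO_PRINTER,
    OrderStatus.SENT_TO_PRINTER: OrderStatus.PRINTED,
    OrderStatus.PRINTED: OrderStatus.SHIPPED,
    OrderStatus.SHIPPED: OrderStatus.DELIVERED,
}


class AlbumOrder:
    def __init__(self, order_id: str, address: str) -> None:
        self.order_id = order_id
        self.address = address
        self.status = OrderStatus.ALBUM_GENERATED
        self.history = [self.status]  # tracking information kept by the server

    def advance(self) -> OrderStatus:
        """Move the order to its next status and record it in the history."""
        nxt = _NEXT.get(self.status)
        if nxt is None:
            raise ValueError("order has already been delivered")
        self.status = nxt
        self.history.append(nxt)
        return nxt


order = AlbumOrder("order-001", "address specified by the user")
order.advance()  # album data sent to the printing facility
order.advance()  # printing and binding completed
```

Keeping the full status history allows the server to report where an album is at any point between generation and delivery.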

[0493] (Application Example 1)

[0494] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0495] Managing the large amount of visual data captured in daily life, and using that data to create memories, is a time-consuming and laborious task for individual users. Furthermore, integrating detailed information and events related to those memories and saving them in physical form is an even more laborious activity. This project aims to provide a method that can perform these tasks seamlessly by integrating them with home-use cameras and audio input devices.

[0496] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0497] In this invention, the server includes a storage means for receiving visual data recorded by a camera, a voice conversion means for converting voice input from a user into text information, and a data collection means for collecting information related to a specific date. This allows the user to automatically save the captured data in an optimized form and generate an integrated album using individual events and related information.

[0498] A "recording device" is a device that has the function of recording visual data, and includes cameras and the like.

[0499] "Visual data" refers to data such as images and videos recorded by a camera or video camera.

[0500] A "storage means" (memory device) is a device for storing and retaining received visual data.

[0501] A "selection means" is a function that provides a process for selecting the most suitable data from the received visual data according to specific criteria.

[0502] An "information addition mechanism" is a function that receives user input and incorporates that information into visual data.

[0503] "Data collection means" refers to the function of collecting information related to a specific date or event using the internet or other information sources.

[0504] A "generation means" is a device that provides a process for creating an integrated album using selected visual data and collected information.

[0505] The "instruction means" refers to a function that executes the printing instructions for the generated album.

[0506] "Delivery means" refers to the process or equipment used to deliver printed physical albums to a designated location.

[0507] A "voice conversion means" is a device that has the function of converting the user's voice input into text information and digitizing it.

[0508] The system for realizing this invention provides a process for efficiently managing and saving visual data captured by users in their daily lives as an album. Specifically, users routinely capture visual data with a camera device such as a smartphone, and this data is automatically uploaded to a server. This process is realized through an application built into the camera device and a Wi-Fi connection function.

[0509] The server stores the received visual data in a memory device. Furthermore, it has a selection mechanism that uses an AI algorithm to evaluate the visual data and select the best image. This evaluation is performed using an AI framework such as TensorFlow and is based on multiple criteria such as focus, composition, and the subject's facial expression.
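As a non-limiting illustration of the focus criterion, the following sketch uses the variance of a Laplacian filter response, a classical sharpness measure, in place of the AI framework mentioned above. This is a substituted technique chosen only because it runs without a trained model; the function name and the tiny grayscale test images are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Sharp images contain strong local intensity changes and give a large
    variance; flat or blurred images give a small one.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


# A high-contrast checkerboard versus a uniformly gray image.
sharp = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
flat = [[128] * 4 for _ in range(4)]
sharp_score = laplacian_variance(sharp)
flat_score = laplacian_variance(flat)
```

A score of this kind could serve as one input among several, alongside composition and facial-expression scores produced by a learned model.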

[0510] The user sends comments related to the captured data to the server using a voice input device. The server converts this into text information using a voice conversion means and collects information related to that date from the internet via a data collection means.

[0511] The server automatically generates the album layout using a generation mechanism, utilizing selected visual data, text information, and collected related information. The generated album data is sent to a partner printing company via a print instruction mechanism. The printed album is then delivered to the user's specified address using a delivery mechanism.

[0512] As a concrete example, if a user records their family's daily life or special events as visual data and adds comments via voice input, this data will be integrated with local news from the same day and compiled into an album of memories of that special day. An example of a prompt is as follows: "Please select a beautiful landscape photo taken by your home robot that is in focus and shows everyone in the family smiling, and combine it with your comments and today's news to create an album of this special day."

[0513] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0514] Step 1:

[0515] The user captures visual data using their smartphone's camera. After capture, an application on the smartphone automatically uploads the visual data to the server. The input is visual data, and the output is storage on the server's storage device. Wi-Fi is used for stable and efficient data transfer during this process.

[0516] Step 2:

[0517] The server stores the received visual data in a storage device. The storage device holds the data and stores it in a format suitable for subsequent AI processing. The input is the uploaded visual data, and the output is the stored visual data.

[0518] Step 3:

[0519] The server analyzes stored visual data using a selection mechanism. An AI algorithm is used to evaluate the visual data and select the best image. Factors such as focus, composition, and subject expression are considered during this process. The input is the stored visual data, and the output is the selected, optimal image.

[0520] Step 4:

[0521] Users provide feedback and comments related to visual data using their smartphone's voice input function. This voice data is sent to a server and converted into text information using a speech-to-text conversion tool. The input is voice data, and the output is text information.

[0522] Step 5:

[0523] The server collects information related to a specific date from the internet using data collection methods. It retrieves relevant news and event information to enrich the album's content. The input is date information, and the output is the collected related information.

[0524] Step 6:

[0525] The server combines selected visual data, text information, and collected related information to create album layouts using a generation mechanism. This process is automated, providing efficient and consistent layouts. The inputs are selected visual data, text information, and related information, and the output is album layout data.

[0526] Step 7:

[0527] The server sends the generated album data to the printing company via a print instruction system. It instructs the physical printing of the album and prepares it for delivery after completion. The input is the album layout data, and the output is the instruction to the printing company.

[0528] Step 8:

[0529] The server delivers the printed albums to the user's specified address using a delivery service. It manages the delivery process and ensures the albums reach the user. The input is the physical albums, and the output is the completion of delivery to the user.

[0530] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0531] This invention relates to a system that optimizes the selection of captured image data and album generation by utilizing an emotion engine that recognizes user emotions. Users routinely take photos using their devices, and this data is uploaded to a server. The server analyzes the received image data using a combination of an AI algorithm and the emotion engine.

[0532] In analyzing image data, the server considers not only technical aspects such as composition and resolution, but also the user's emotions. The emotion engine detects the user's emotions based on factors such as the tone of voice when the photo was taken, comments entered after the photo was taken, and other biometric data. Based on this emotion data, it selects the image that best reflects the user's feelings.

[0533] Furthermore, the emotion engine analyzes the comments entered by users to detect which emotions are being expressed. This allows the server to customize the album page layout according to those emotions. For example, if happy emotions predominate, a bright design is chosen, and if calm emotions predominate, a simple and quiet design is adopted.
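As a non-limiting illustration, the choice of design according to the detected emotions can be sketched as a lookup keyed by the most frequent emotion label. The label strings, the theme names, and the neutral fallback are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from the dominant emotion to a page design.
THEMES = {"happy": "bright", "calm": "simple_quiet"}
DEFAULT_THEME = "neutral"


def choose_theme(emotions: list[str]) -> str:
    """Pick the design that matches the most frequent detected emotion."""
    if not emotions:
        return DEFAULT_THEME
    dominant, _count = Counter(emotions).most_common(1)[0]
    return THEMES.get(dominant, DEFAULT_THEME)


theme = choose_theme(["happy", "calm", "happy"])
```

When two emotions occur equally often, `Counter.most_common` returns the one encountered first, so an implementation may wish to define an explicit tie-breaking rule.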

[0534] The emotion engine is also useful for collecting news data. It can take into account the user's emotional state, allowing for customization such as prioritizing the collection of positive news when the user is feeling emotionally uplifted.
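As a non-limiting illustration, the prioritization of positive news can be sketched as a sort over news items that carry a tone score. The `NewsItem` type, the score range of -1 to 1, and the rule of leaving the order unchanged when the user is not feeling uplifted are hypothetical; this disclosure describes only the case in which positive news is prioritized for an uplifted user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    headline: str
    tone: float  # hypothetical tone score in [-1, 1]; positive means good news


def prioritize(news: list[NewsItem], user_mood: float) -> list[NewsItem]:
    """Place positive news first when the user's mood is positive."""
    if user_mood > 0:
        return sorted(news, key=lambda item: item.tone, reverse=True)
    return list(news)  # otherwise keep the collected order


collected = [
    NewsItem("Storm damages harbor", -0.6),
    NewsItem("Local team wins final", 0.8),
    NewsItem("Council holds regular meeting", 0.0),
]
ordered = prioritize(collected, user_mood=0.7)
```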

[0535] As a concrete example, consider a scenario where a user takes photos to record how their family spends their holidays. The device simultaneously records a voice memo when taking the photo, and the server uses this to analyze the user's emotions. Based on this emotional data, the server selects the most impactful photo and generates an album page that expresses positive emotions. Furthermore, by collecting and incorporating relevant news, such as feature articles and local event information, into the album, the user's memories are preserved in a richer way.

[0536] This system aims to enhance users' memories by pursuing perfection in both technical and emotional dimensions.

[0537] The following describes the processing flow.

[0538] Step 1:

[0539] The user takes a photo using the device. The device collects the photo data along with voice memos and biometric information, and uploads these to the management server.

[0540] Step 2:

[0541] The server receives image data, voice memos, and biometric information, and analyzes the user's emotions based on this data. The emotion engine analyzes voice tone, content, and biometric data to identify emotions.

[0542] Step 3:

[0543] Based on the results of the emotion analysis, the server uses an AI algorithm to select the photo that best reflects the user's emotions from among the uploaded photos.

[0544] Step 4:

[0545] The user enters comments on their device about memories and events related to the photos. The entered comments are sent to the server.

[0546] Step 5:

[0547] The server receives comments, and the emotion engine analyzes their content to extract emotions from them. Based on this information, the album page design is customized with a style that matches the emotion.

[0548] Step 6:

[0549] The server automatically collects news data related to a specific date from the internet. During this process, it takes into account the user's emotional state and prioritizes collecting news that is highly relevant.

[0550] Step 7:

[0551] The server automatically generates album pages by combining selected photos, emotion-based layouts, and collected news data.

[0552] Step 8:

[0553] The server sends the generated album data to the printing company. The printing company prints and binds the album based on the received data.

[0554] Step 9:

[0555] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.

[0556] (Example 2)

[0557] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0558] Currently, organizing and saving captured image data in a way that suits the user is not easy, and creating albums that take emotional elements into account is particularly difficult. Furthermore, the process of users selecting the best photo from a large number of images is time-consuming, so an efficient method is needed. In addition, collecting and integrating date-related news data based on personal emotions is also a challenge.

[0559] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0560] In this invention, the server includes a storage means for receiving and storing captured image data, an emotion analysis means for analyzing audio data related to the image data and extracting emotion data, a selection means for selecting the optimal image based on the emotion data, an information addition means for receiving text information based on user input and analyzing the emotion, a data collection means for collecting news data related to a specific date and emotion data, and a provision means for providing the generated album in digital format. This makes it possible to automatically select the optimal image in a way that resonates with the user's emotions and efficiently generate and provide personalized albums and related information.

[0561] A "storage means" is a device or function for receiving and storing information such as captured image data.

[0562] "Emotion analysis means" refers to a device or function for analyzing voice data or text information to extract the user's emotion data.

[0563] "Selection means" refers to a device or function for automatically selecting the optimal image based on analyzed emotion data.

[0564] "Information addition means" refers to a device or function that receives text information provided by a user, analyzes that information, and detects emotional characteristics.

[0565] "Data collection means" refers to a device or function for collecting relevant news data based on a specific date or analyzed sentiment data.

[0566] "Generation means" refers to a device or function for generating an album by combining selected images, text information, and news data.

[0567] "Provision means" refers to a device or function for providing the generated album to the user in digital format.

[0568] This invention relates to a system for efficiently managing captured image data and generating albums tailored to the user's needs. Users routinely take photos and save the data to a device. This device also includes a function for recording voice memos at the time of shooting. The photos taken by the user are uploaded to a server via the device, and the server performs analysis based on this information.

[0569] The server first has a storage mechanism for saving received image data. Next, it uses emotion analysis to detect the user's emotions from the audio data. This is achieved using general-purpose speech analysis software. A general-purpose speech recognition API can be used for speech analysis. Furthermore, the server uses an AI algorithm to evaluate the technical quality of the images. Image processing is performed quickly and efficiently by utilizing a GPU.

[0570] Based on the emotion analysis and image evaluation results, the server selects the most suitable image. At this stage, a generative AI model can be used to finely customize the selection criteria. Album generation is then performed based on the selected image and the text information entered by the user. The generative AI model analyzes prompt text containing the text information and is used to design an album layout that meets the user's preferences. For example, when a bright design theme is chosen, design software capable of producing visually striking layouts is used.
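As a non-limiting illustration, the combination of the image evaluation result and the emotion analysis result can be sketched as a blend of two scores. The `Candidate` type, the assumption that both scores lie in the range 0 to 1, and the `emotion_weight` parameter are hypothetical; this disclosure does not specify how the two results are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    photo_id: str
    quality: float        # technical quality score, assumed to lie in [0, 1]
    emotion_match: float  # agreement with the detected emotion, assumed in [0, 1]


def combined_score(c: Candidate, emotion_weight: float) -> float:
    """Blend technical quality and emotional fit into one score."""
    return (1.0 - emotion_weight) * c.quality + emotion_weight * c.emotion_match


def select_image(candidates: list[Candidate], emotion_weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest blended score."""
    if not 0.0 <= emotion_weight <= 1.0:
        raise ValueError("emotion_weight must be between 0 and 1")
    if not candidates:
        raise ValueError("no candidate images")
    return max(candidates, key=lambda c: combined_score(c, emotion_weight))


photos = [
    Candidate("sharp.jpg", quality=0.9, emotion_match=0.2),
    Candidate("joyful.jpg", quality=0.6, emotion_match=0.9),
]
chosen = select_image(photos)
```

Setting `emotion_weight` to 0 reduces the selection to technical quality alone, which is one way a user setting could customize the selection criteria.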

[0571] Furthermore, the server collects news data according to the user's emotional state. When the user's emotions are positive, news with positive content is prioritized and incorporated as relevant information in the album. This enriches the user's experience.

[0572] For example, if a user wants to record memories from a family trip, the server can automatically handle all these steps and generate an album that aggregates photos and information that contain many positive emotions. An example of a prompt would be, "Take photos of your family holiday and create an album that reflects the fun and joy it contained."

[0573] This allows users to easily save and relive special memories in a way that resonates with their emotions, without having to go through cumbersome selection and editing processes.

[0574] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0575] Step 1:

[0576] The device acquires image data captured by the user. Simultaneously with the capture, it records a voice memo and uploads both to the server. The input is the image and voice data, and the output is that data as received by the server. Specifically, the device detects a shutter operation through a camera application, records the image and audio simultaneously, and transmits them.

[0577] Step 2:

[0578] The server analyzes the received image data. It receives image data as input, uses an AI algorithm to evaluate composition, resolution, and other factors, and analyzes the technical quality of the image. The output consists of evaluation results and metadata. Specifically, the server uses a GPU-based image processing engine to analyze the data and calculate results at high speed.

[0579] Step 3:

[0580] The server analyzes the audio data to extract the user's emotional data. This is done by using the audio data as input and analyzing emotions from voice tone and language patterns using emotion analysis tools. The output is the identified emotional data. Specifically, the server uses a speech recognition API to convert the audio data into text, and then uses an emotion AI model to classify the emotions.

[0581] Step 4:

[0582] The server integrates image evaluation and sentiment data to select the optimal image. The selection algorithm operates based on the input evaluation results and sentiment data, which serve as selection criteria. The output is the selected optimal image. The specific operation includes using a generative AI model to create selection prompts based on the analysis results and then executing the algorithm.

[0583] Step 5:

[0584] The server receives text information entered by the user and analyzes their emotions based on it. It receives text information as input and outputs the emotion analysis results. Specifically, it has a process of analyzing the user's text data using natural language processing techniques and aggregating it as emotion data.

[0585] Step 6:

[0586] The server generates personalized albums based on selected images and analyzed sentiment data. Here, selected images, text information, and news data are used as input, and a customized album is generated as output. Specifically, a generative AI model is used to select a template, and the album is constructed using design software.

[0587] Step 7:

[0588] The server provides the generated album to the user. The user can view the album and share their experience. The input is the generated album data, and the output is the digital album displayed on the user's device. The specific operation involves a process of delivering the album in real time via a web browser or mobile application.

[0589] (Application Example 2)

[0590] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0591] Selecting the most suitable photos from user-submitted images to match specific emotions and atmospheres, and generating a record based on those selections, is not easy. Furthermore, creating albums that enhance emotional value based on images and text has limitations with conventional technologies. There is a need to consider the user's emotions and the atmosphere of the moment, and to collect even more relevant information to create records that deepen individual experiences.

[0592] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0593] In this invention, the server includes data storage means for receiving captured image information, extraction means for selecting the optimal image from the image information, and sentiment analysis means for collecting sentiment information based on the user's voice and comments. This makes it possible to select the optimal image that matches the user's emotions and record that emotion.

[0594] A "data storage means" is a component that has the function of receiving captured image information and storing it for subsequent processing.

[0595] An "extraction means" is a technical device that performs processing to select the optimal image from the received image information.

[0596] "Emotional analysis means" refers to a component that incorporates technology to analyze user voices and comments and collect emotional information.

[0597] A "layout generation means" is a device or system that has the function of customizing the layout of an album or record based on emotional information.

[0598] "Information gathering means" refers to components used to collect information related to a specific date and incorporate it into data.

[0599] "Record generation means" refers to a device or program that generates a record combining extracted images, emotional information, and collected information.

[0600] An "instruction means" is a component that issues commands to output the generated recorded data.

[0601] The system for realizing this invention comprises various modules. First, when a user takes an image using the terminal's camera, the image data is uploaded to the server by a data storage means. At the same time, the terminal also records the user's voice comments and sends them to the server. The server converts the voice data into text using speech recognition technology such as the Google Speech-to-Text API.

[0602] Next, the server analyzes the user's emotions using emotion analysis libraries such as IBM Watson Tone Analyzer. This emotion data is combined with image data that has undergone technical evaluation through image analysis tools such as OpenCV and TensorFlow. Based on the analysis results, the server uses a layout generation mechanism to customize the album page design according to the emotions. Furthermore, it uses an information gathering mechanism to collect news and event information related to a specific date and integrates it as part of the record that reflects the user's emotions.

[0603] As a concrete example, suppose a user takes photos during a family trip and exclaims, "This is fun!" If this voice is analyzed and recognized as an emotion of "joy," the system will select that photo as the best one and generate a bright and vibrant album page. Furthermore, news and event information from the travel destination can be incorporated into the album as added value.

[0604] Examples of prompts using a generative AI model include "Please suggest the optimal album generation method based on sentiment analysis of family photos" and "Please show the steps for designing a system that recommends news based on user sentiment." In this way, it becomes possible to provide memorable records that deeply consider the user's emotions.

[0605] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0606] Step 1:

[0607] The user takes an image using their device and inputs voice comments. The image data and voice data are uploaded from the device to the server. The input is the captured image file and voice file, and the output is those files stored on the server.

[0608] Step 2:

[0609] The server uses the Google Speech-to-Text API to convert audio data into text data. The input is an audio file, and the output is a comment in text format. In this process, speech recognition technology is used to analyze the audio signal and convert it into the appropriate text format.

[0610] Step 3:

[0611] The server analyzes the converted text data using IBM Watson Tone Analyzer to identify the user's emotions. The input is a text comment, and the output is emotional information (e.g., joy, excitement). The process involves analyzing the context of the text and the emotional tone.
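
The mapping from a text comment to an emotion label in this step can be illustrated with a deliberately simplified stand-in. The sketch below is not the IBM Watson Tone Analyzer and does not call any external service; it assumes a small hand-written keyword lexicon (the `LEXICON` table, its words, and the label names are all hypothetical) and returns the most frequent matching label.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon; a real system would use a trained model or service.
LEXICON = {
    "fun": "joy",
    "happy": "joy",
    "great": "joy",
    "wow": "excitement",
    "amazing": "excitement",
    "quiet": "calm",
    "peaceful": "calm",
}


def classify_emotion(comment):
    """Return the most frequent emotion label found in the comment, or "neutral"."""
    words = re.findall(r"[a-z']+", comment.lower())
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]


print(classify_emotion("This is fun!"))
```

With this lexicon, the comment "This is fun!" from the family trip example is labeled "joy", and a comment containing none of the listed words falls back to "neutral".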

[0612] Step 4:

[0613] The server uses OpenCV and TensorFlow to perform technical evaluations of image data. The input is an image file, and the output is technical metrics (including resolution and composition). Image analysis evaluates clarity and compositional balance.
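
The specification does not state which clarity measure is used. One common choice, assumed here purely for illustration, is the variance of the Laplacian response: sharp images contain strong edges and therefore a high variance. With OpenCV this is often written as `cv2.Laplacian(gray, cv2.CV_64F).var()`; the sketch below computes the same kind of measure in plain Python on a small grayscale grid so that it runs without any image library. The function name `laplacian_variance` is hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior of a 2-D grayscale grid."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("grid must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# A uniform patch has no edges; a checkerboard patch is all edges.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
print(laplacian_variance(flat), laplacian_variance(checker) > 0)
```

A higher value suggests a sharper image. Any threshold for "sharp enough" would have to be tuned for the camera and resolution in use.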

[0614] Step 5:

[0615] The server extracts the best image based on emotional information and technical evaluation. This selection process involves scoring based on emotional and technical quality, choosing the image with the highest score. The input is emotional information and technical metrics, and the output is the selected image.
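
The scoring described in this step can be sketched as a weighted blend of the two inputs. The specification does not give a formula, so the linear blend, the 0.0 to 1.0 score ranges, and the names `Candidate`, `select_best`, and `emotion_weight` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    technical: float  # assumed 0.0-1.0, from the technical evaluation in Step 4
    emotion: float    # assumed 0.0-1.0, strength of the emotion found in Step 3


def select_best(candidates, emotion_weight=0.5):
    """Return the candidate with the highest blended score."""
    if not candidates:
        raise ValueError("no candidate images")

    def score(c):
        return emotion_weight * c.emotion + (1.0 - emotion_weight) * c.technical

    return max(candidates, key=score)


photos = [
    Candidate("beach.jpg", technical=0.9, emotion=0.2),
    Candidate("laugh.jpg", technical=0.7, emotion=0.9),
    Candidate("blurry.jpg", technical=0.3, emotion=0.8),
]
print(select_best(photos).name)
```

With equal weights, "laugh.jpg" is selected because it is strong on both axes. Setting `emotion_weight=0.0` reduces the selection to technical quality alone and selects "beach.jpg".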

[0616] Step 6:

[0617] Based on the selected images, the album layout is customized to reflect the emotions expressed. The input consists of the selected images and emotional information, while the output is a personalized album layout. The customization process includes selecting color schemes and design elements that match the emotions.

[0618] Step 7:

[0619] This system uses information gathering tools to collect news and event information related to a specific date and incorporates it into an album. The input is date information, and the output is a list of related information. The server automatically scans external information sources and extracts relevant data.

[0620] Step 8:

[0621] The completed album data is displayed to the user and, if necessary, saved to a storage medium or printed. The input is the album layout and related information, and the output is the final album provided to the user. The server renders the generated data and processes it according to the user's preferences and printing requests.

[0622] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0623] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0624] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0625] [Fourth Embodiment]

[0626] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0627] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0628] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0629] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0630] The microphone 238 accepts instructions from the user 20 by receiving the user's speech. The microphone 238 captures the speech of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0631] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0632] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0633] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0634] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0635] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0636] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0637] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0638] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0639] This invention relates to a system for automatically managing image data captured by a user and saving it as a physical album. This system operates in cooperation with a server, a terminal, and the user.

[0640] First, users routinely take photos using their devices. These devices have a function to automatically upload the captured image data to a server. The server receives the image data via an internet connection and analyzes its contents.

[0641] The server uses an AI algorithm to select the best-quality photo from among multiple uploaded images. This process evaluates the photos based on several criteria, including composition, smiles, and focus.

[0642] Next, the user enters comments on their device about memories and events related to the photos taken that day. This comment information is sent to the server, which simultaneously automatically collects relevant news from the internet based on that date. Based on this information, the server generates the layout for the album page.

[0643] The generated album data is sent to a partner printing company for printing and binding. Finally, the server arranges for the completed album to be shipped to the address specified by the user.

[0644] As a concrete example, suppose a user takes various photos at their child's birthday party. The photos taken using the device are automatically uploaded to a server, which then selects the most memorable moments. The user then enters short comments about their impressions of the party and the fun moments they experienced. In addition, the server collects information about "major local events" as news for the day. Combining this information, the server creates an album documenting the user's special day. This entire process is technically seamless, allowing the user to preserve memories without any hassle.

[0645] The following describes the processing flow.

[0646] Step 1:

[0647] The user takes a photo using the device. The device automatically uploads the captured image data to the management server.

[0648] Step 2:

[0649] The server analyzes the received image data. Using AI algorithms, it evaluates the images based on criteria such as composition, resolution, and facial expressions.

[0650] Step 3:

[0651] The server selects the most suitable photo based on the evaluation results. The selection criteria can be customized according to the user's settings.

[0652] Step 4:

[0653] Users enter information about the day's events and comments from their devices. The entered comments are sent to the server.

[0654] Step 5:

[0655] The server automatically collects date-related news data from the internet. This news data, along with comments, is incorporated into the album.

[0656] Step 6:

[0657] The server automatically generates album pages by combining selected photos, user comments, and collected news.

[0658] Step 7:

[0659] The server sends the completed album data to the printing company. The printing company prints and binds the album based on the received data.

[0660] Step 8:

[0661] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.

[0662] (Example 1)

[0663] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0664] In the digital age, vast amounts of image data are generated daily, but there is a lack of efficient ways to manage this data and to physically record special moments. Furthermore, manual management and editing by users are time-consuming and laborious, and it is difficult to maintain the desired level of perfection in a printed album that reflects individual sensibilities.

[0665] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0666] In this invention, the server includes a storage means for receiving and storing image information from a shooting device, a selection means for selecting high-quality images based on multiple criteria, and an information addition means for adding user text data. This allows users to automatically generate and save special moments as high-quality albums without having to perform complicated editing tasks.

[0667] A "storage means" is a function for receiving image information from a shooting device and storing it via a network as needed.

[0668] A "selection means" is a function that automatically selects the highest-quality image from the received image information based on multiple criteria such as composition, facial expression, and focus.

[0669] An "information addition means" is a function that receives text data entered by the user and incorporates it into the album along with image data.

[0670] A "data collection means" is a function that automatically collects news information related to a specific date and time from the internet.

[0671] A "generation means" is a function that integrates the selected image information, text data, and collected news information and automatically constructs them as an album.

[0672] A "print instruction means" is a function that sends instructions to an external printing device to output the generated album onto paper.

[0673] A "delivery means" is a function for physically transporting printed albums to a location specified by the user.

[0674] This system automatically generates physical albums using image information, with the server, terminal, and user working in conjunction with each other. Users routinely take images using their terminals, and this image information is automatically uploaded to the server via a dedicated application. This process utilizes a network, ensuring smooth data transmission without requiring any special actions from the user.

[0675] The server analyzes the received image information using AI algorithms and selects high-quality images based on criteria such as composition, facial expression, and focus. AI algorithms include, for example, machine learning models and image recognition technologies. The server leverages these technologies to evaluate multiple images quickly and select the necessary data.

[0676] Next, users can add comments via their device about events and memories related to the images they have taken. This user-entered text information is sent to the server and stored as part of the album content. The server also automatically collects news information related to that date via the internet and uses it as content to add value to the album.

[0677] Finally, the server combines the selected image information, user text information, and collected news information to generate an album. A template is used for album generation, providing a consistent and visually appealing layout. The generated album data is then sent to a partner printing facility and printed on paper. The printed album is delivered to the user's specified address via a delivery service.

[0678] As a concrete example, consider a scenario where a user manages photos taken in a nature park during a holiday on their device. The user can easily upload these images to a server through an application and add comments about the enjoyable moments. Based on these comments, the server collects relevant news and event information from news sources. The resulting album then becomes a physical record that vividly preserves the user's memories.

[0679] An example of a prompt would be, "Please provide an overview of the AI-powered automated photo album generation system. In particular, please explain the image selection criteria and layout generation in detail." Using this prompt, the generative AI model can explain the system's details.

[0680] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0681] Step 1:

[0682] The user uses the device to capture image information. The captured image information is configured to be automatically uploaded to a server by a dedicated application on the device. The input is the captured image file, which is sent to the server via the internet. The output is the image data stored on the server. The device's operation includes transmitting the image data in the appropriate format over the network connection.

[0683] Step 2:

[0684] The server analyzes the received image data using an AI algorithm. Specifically, it evaluates the images based on multiple criteria, such as composition, the subject's smile, and focus accuracy. The input consists of multiple uploaded image data, which are then evaluated by the AI algorithm. The output consists of images deemed to be of high quality. The server's specific operations include calling an AI model for image analysis and performing scoring on each image.
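
The per-image scoring in this step can be sketched as a weighted average over the named criteria, followed by a threshold that keeps the images deemed to be of high quality. The per-criterion values would come from the image analysis AI model; here they are supplied as plain numbers. The weights, the 0.6 threshold, and the names `quality_score` and `high_quality` are hypothetical.

```python
DEFAULT_WEIGHTS = {"composition": 1.0, "smile": 1.0, "focus": 1.0}


def quality_score(metrics, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-criterion scores (each assumed to lie in 0.0-1.0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items()) / total


def high_quality(images, threshold=0.6, weights=DEFAULT_WEIGHTS):
    """Return the names of images whose score reaches the threshold."""
    return [name for name, metrics in images.items()
            if quality_score(metrics, weights) >= threshold]


images = {
    "cake.jpg": {"composition": 0.8, "smile": 0.9, "focus": 0.7},
    "floor.jpg": {"composition": 0.2, "smile": 0.0, "focus": 0.9},
}
print(high_quality(images))
```

Passing a different `weights` dictionary is one simple way to let the selection criteria follow a user's settings, for example by weighting focus more heavily than smiles for landscape photos.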

[0685] Step 3:

[0686] Users enter comments about events and memories related to images taken via their device. Input consists of text information entered by the user into the application, which is then sent to the server. Output is text data stored on the server. The device's operation includes the ability to input comments through the user interface and send that data to the server.

[0687] Step 4:

[0688] The server automatically collects news information related to a specific date and time via the internet. The input is date information, and based on this, it retrieves relevant news from online resources. The output is news data to be added to an album. The server's operation involves a process of gathering relevant information for that day using news APIs and web scraping techniques.
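
The date-based lookup in this step can be illustrated without committing to a particular news API, since the specification names none. The sketch below assumes the news items have already been fetched into a list of dictionaries with `title` and `published` keys (a hypothetical shape) and simply filters them by date.

```python
from datetime import date


def news_for_date(items, target, limit=3):
    """Return up to `limit` headlines whose publication date equals `target`."""
    matches = [item["title"] for item in items if item["published"] == target]
    return matches[:limit]


# Hypothetical, already-fetched feed; a real system would obtain this from a news source.
feed = [
    {"title": "Local festival opens", "published": date(2024, 12, 3)},
    {"title": "First snow of the season", "published": date(2024, 12, 4)},
    {"title": "New park trail completed", "published": date(2024, 12, 3)},
]
print(news_for_date(feed, date(2024, 12, 3)))
```

Capping the result with `limit` keeps the news section from crowding out the photos on an album page.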

[0689] Step 5:

[0690] The server generates an album layout by combining selected image data, user comments, and collected news information. Inputs include selected images, text information, and news information, which are arranged in a consistent layout using templates. The output is digital album data for printing. The server's operation involves arranging content using a template engine and generating the completed album data.
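
As a minimal sketch of template-based arrangement, the example below fills a fixed text template with a selected image, a user comment, and collected headlines. It uses `string.Template` from the Python standard library as a stand-in for whatever template engine an actual implementation would use; the page format and the `render_page` function are hypothetical.

```python
from string import Template

# Hypothetical plain-text page template; a real layout would target a print format.
PAGE = Template("== $day ==\nPhoto: $photo\nComment: $comment\nNews: $news\n")


def render_page(photo, comment, headlines, day):
    """Fill the page template with one photo, one comment, and a list of headlines."""
    news = "; ".join(headlines) if headlines else "(none)"
    return PAGE.substitute(day=day, photo=photo, comment=comment, news=news)


print(render_page("cake.jpg", "Best party ever", ["Local festival opens"], "2024-12-03"))
```

Because every page is produced from the same template, the layout stays consistent across the album regardless of how many photos or headlines a given day has.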

[0691] Step 6:

[0692] The server sends the generated album data to a partner printing facility. The input is the completed album data, which is sent in the optimal format for printing. The output is print-ready album data received by the printing facility. The server's operation includes sending the data in the appropriate format according to the printing partner's instructions.

[0693] Step 7:

[0694] The server arranges physical delivery of the printed album to the address specified by the user. The input consists of the printed album and the user's delivery address information; the output is the album delivered to the user's specified location. The server's operations include issuing delivery instructions to the shipping company and managing tracking information.
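
Managing tracking information can be pictured as a small state machine that only allows forward transitions. The states, the transition table, and the `Shipment` class below are hypothetical; the specification does not define a tracking model.

```python
from enum import Enum


class Status(Enum):
    PRINTED = "printed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


# Allowed forward transitions (an assumption made for this sketch).
ALLOWED = {
    Status.PRINTED: {Status.SHIPPED},
    Status.SHIPPED: {Status.DELIVERED},
    Status.DELIVERED: set(),
}


class Shipment:
    def __init__(self, address):
        self.address = address
        self.status = Status.PRINTED
        self.history = [Status.PRINTED]

    def advance(self, new_status):
        """Move to new_status if the transition is allowed, recording it in the history."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        self.history.append(new_status)


shipment = Shipment("1-2-3 Example Street")
shipment.advance(Status.SHIPPED)
shipment.advance(Status.DELIVERED)
print([s.value for s in shipment.history])
```

Rejecting invalid transitions keeps the recorded history trustworthy, for example when status updates from a shipping company arrive out of order.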

[0695] (Application Example 1)

[0696] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0697] Managing the large amount of visual data captured in daily life, and using that data to create memories, is a time-consuming and laborious task for individual users. Furthermore, integrating detailed information and events related to those memories and saving them in physical form is an even more laborious activity. This project aims to provide a method that can perform these tasks seamlessly by integrating them with home-use cameras and audio input devices.

[0698] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0699] In this invention, the server includes a storage means for receiving visual data recorded by a camera, a voice conversion means for converting voice input from a user into text information, and a data collection means for collecting information related to a specific date. This allows the user to automatically save the captured data in an optimized form and generate an integrated album using individual events and related information.

[0700] A "recording device" is a device that has the function of recording visual data, and includes cameras and the like.

[0701] "Visual data" refers to data such as images and videos recorded by a camera or video camera.

[0702] A "memory device" is a device for storing and retaining received visual data.

[0703] A "selection method" is a function that provides a process for selecting the most suitable data from the received visual data according to specific criteria.

[0704] An "information addition mechanism" is a function that receives user input and incorporates that information into visual data.

[0705] "Data collection means" refers to the function of collecting information related to a specific date or event using the internet or other information sources.

[0706] A "generation means" is a device that provides a process for creating an integrated album using selected visual data and collected information.

[0707] The "instruction means" refers to a function that executes the printing instructions for the generated album.

[0708] "Delivery method" refers to the process or equipment used to deliver printed physical albums to a designated location.

[0709] A "voice conversion device" is a device that has the function of converting the user's voice input into text information and digitizing it.

[0710] The system for realizing this invention provides a process for efficiently managing and saving visual data captured by users in their daily lives as an album. Specifically, users routinely capture visual data with a camera device such as a smartphone, and this data is automatically uploaded to a server. This process is realized through an application built into the camera device and a Wi-Fi connection function.

[0711] The server stores the received visual data in a memory device. Furthermore, it has a selection mechanism that uses an AI algorithm to evaluate the visual data and select the best image. This evaluation is performed using an AI framework such as TensorFlow and is based on multiple criteria such as focus, composition, and the subject's facial expression.

[0712] The user sends comments related to the captured data to the server using a voice input device. The server converts this into text information using a voice conversion means and collects information related to that date from the internet via a data collection means.

[0713] The server automatically generates the album layout using a generation mechanism, utilizing selected visual data, text information, and collected related information. The generated album data is sent to a partner printing company via a print instruction mechanism. The printed album is then delivered to the user's specified address using a delivery mechanism.

[0714] As a concrete example, if a user records their family's daily life or special events as visual data and adds comments via voice input, this data will be integrated with local news from the same day and compiled into an album of memories of that special day. An example of a prompt is as follows: "Please select a beautiful landscape photo taken by your home robot that is in focus and shows everyone in the family smiling, and combine it with your comments and today's news to create an album of this special day."

[0715] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0716] Step 1:

[0717] The user captures visual data using their smartphone's camera. After capture, an application on the smartphone automatically uploads the visual data to the server. The input is visual data, and the output is storage on the server's storage device. Wi-Fi is used for stable and efficient data transfer during this process.

[0718] Step 2:

[0719] The server stores the received visual data in a storage device. The storage device holds the data and stores it in a format suitable for subsequent AI processing. The input is the uploaded visual data, and the output is the stored visual data.

[0720] Step 3:

[0721] The server analyzes stored visual data using a selection mechanism. An AI algorithm is used to evaluate the visual data and select the best image. Factors such as focus, composition, and subject expression are considered during this process. The input is the stored visual data, and the output is the selected, optimal image.

[0722] Step 4:

[0723] Users provide feedback and comments related to visual data using their smartphone's voice input function. This voice data is sent to a server and converted into text information using a speech-to-text conversion tool. The input is voice data, and the output is text information.

[0724] Step 5:

[0725] The server collects information related to a specific date from the internet using data collection methods. It retrieves relevant news and event information to enrich the album's content. The input is date information, and the output is the collected related information.

[0726] Step 6:

[0727] The server combines selected visual data, text information, and collected related information to create album layouts using a generation mechanism. This process is automated, providing efficient and consistent layouts. The inputs are selected visual data, text information, and related information, and the output is album layout data.

[0728] Step 7:

[0729] The server sends the generated album data to the printing company via a print instruction system. It instructs the physical printing of the album and prepares it for delivery after completion. The input is the album layout data, and the output is the instruction to the printing company.

[0730] Step 8:

[0731] The server delivers the printed albums to the user's specified address using a delivery service. It manages the delivery process and ensures the albums reach the user. The input is the physical albums, and the output is the completion of delivery to the user.

[0732] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0733] This invention relates to a system that optimizes the selection of captured image data and album generation by utilizing an emotion engine that recognizes user emotions. Users routinely take photos using their devices, and this data is uploaded to a server. The server analyzes the received image data using a combination of an AI algorithm and the emotion engine.

[0734] In analyzing image data, the server considers not only technical aspects such as composition and resolution, but also the user's emotions. The emotion engine detects the user's emotions based on factors such as the tone of voice when the photo was taken, comments entered after the photo was taken, and other biometric data. Based on this emotion data, it selects the image that best reflects the user's feelings.

[0735] Furthermore, the emotion engine analyzes the comments entered by users to detect which emotions are being expressed. This allows the server to customize the album page layout according to those emotions. For example, if happy emotions predominate, a bright design is chosen, and if calm emotions predominate, a simple and quiet design is adopted.
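The mapping from detected emotions to a page design can be sketched as follows. This is a minimal illustration that assumes each comment has already been labeled with one emotion; the label strings, the `choose_theme` function, and the fallback theme are hypothetical.

```python
from collections import Counter

# The two pairs below follow the examples in the text; the rest is assumed.
THEMES = {"happy": "bright", "calm": "simple and quiet"}
DEFAULT_THEME = "neutral"  # assumption: fallback when no rule applies


def choose_theme(comment_emotions: list[str]) -> str:
    """Pick the design theme for the most frequent emotion label."""
    if not comment_emotions:
        return DEFAULT_THEME
    dominant, _count = Counter(comment_emotions).most_common(1)[0]
    return THEMES.get(dominant, DEFAULT_THEME)
```

For instance, `choose_theme(["happy", "calm", "happy"])` yields the bright theme because "happy" is the most frequent label.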

[0736] The emotion engine is also useful for collecting news data. It can take into account the user's emotional state, allowing for customization such as prioritizing the collection of positive news when the user is feeling emotionally uplifted.

[0737] As a concrete example, consider a scenario where a user takes photos to record how their family spends their holidays. The device simultaneously records a voice memo when taking the photo, and the server uses this to analyze the user's emotions. Based on this emotional data, the server selects the most impactful photo and generates an album page that expresses positive emotions. Furthermore, by collecting and incorporating relevant news, such as feature articles and local event information, into the album, the user's memories are preserved in a richer way.

[0738] This system aims to enhance users' memories by pursuing perfection in both technical and emotional dimensions.

[0739] The following describes the processing flow.

[0740] Step 1:

[0741] The user takes a photo using the device. The device collects the photo data along with voice memos and biometric information, and uploads these to the management server.

[0742] Step 2:

[0743] The server receives image data, voice memos, and biometric information, and analyzes the user's emotions based on this data. The emotion engine analyzes voice tone, content, and biometric data to identify emotions.

[0744] Step 3:

[0745] Based on the results of sentiment analysis, the server uses an AI algorithm to select the photo that best reflects the user's emotions from among the uploaded photos.

[0746] Step 4:

[0747] The user enters comments on their device about memories and events related to the photos. The entered comments are sent to the server.

[0748] Step 5:

[0749] The server receives comments, and the emotion engine analyzes their content to extract emotions from them. Based on this information, the album page design is customized with a style that matches the emotion.

[0750] Step 6:

[0751] The server automatically collects news data related to a specific date from the internet. During this process, it takes into account the user's emotional state and prioritizes collecting news that is highly relevant.

[0752] Step 7:

[0753] The server automatically generates album pages by combining selected photos, emotion-based layouts, and collected news data.

[0754] Step 8:

[0755] The server sends the generated album data to the printing company. The printing company prints and binds the album based on the received data.

[0756] Step 9:

[0757] The server handles the shipping arrangements and prepares the completed album to be delivered to the address specified by the user.
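The emotion-aware news collection in Step 6 can be sketched as a filter followed by an optional sort. The sketch assumes each news item already carries a tone score; the `NewsItem` type, the score range, and the sample headlines are hypothetical. The behavior for a user who is not emotionally uplifted is not specified in the text, so the sketch simply keeps the original order in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    headline: str
    date: str    # ISO date, e.g. "2024-08-11"
    tone: float  # hypothetical tone score in [-1, 1]; higher is more positive


def prioritize_news(items: list[NewsItem], target_date: str,
                    user_is_uplifted: bool, limit: int = 3) -> list[NewsItem]:
    """Keep items for the target date; positive news first for an uplifted user."""
    same_day = [item for item in items if item.date == target_date]
    if user_is_uplifted:
        # sorted() is stable, so items with equal tone keep their original order.
        same_day = sorted(same_day, key=lambda item: item.tone, reverse=True)
    return same_day[:limit]


# Made-up sample data for illustration only.
news = [
    NewsItem("Heavy rain delays trains", "2024-08-11", -0.6),
    NewsItem("Local summer festival opens", "2024-08-11", 0.8),
    NewsItem("New park opens downtown", "2024-08-10", 0.7),
    NewsItem("City marathon results announced", "2024-08-11", 0.3),
]
picked = prioritize_news(news, "2024-08-11", user_is_uplifted=True, limit=2)
```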

[0758] (Example 2)

[0759] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0760] Currently, organizing and saving captured image data in a way that suits the user is not easy, and creating albums that take emotional elements into account is particularly difficult. Furthermore, the process of users selecting the best photo from a large number of images is time-consuming, so an efficient method is needed. In addition, collecting and integrating date-related news data based on personal emotions is also a challenge.

[0761] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0762] In this invention, the server includes a storage means for receiving and storing captured image data, an emotion analysis means for analyzing audio data related to the image data and extracting emotion data, a selection means for selecting the optimal image based on the emotion data, an information addition means for receiving text information based on user input and analyzing the emotion, a data collection means for collecting news data related to a specific date and emotion data, and a provision means for providing the generated album in digital format. This makes it possible to automatically select the optimal image in a way that resonates with the user's emotions and efficiently generate and provide personalized albums and related information.

[0763] "Storage means" refers to a device or function for receiving and storing information such as captured image data.

[0764] "Emotion analysis means" refers to a device or function for analyzing voice data or text information to extract the user's emotion data.

[0765] "Selection means" refers to a device or function for automatically selecting the optimal image based on analyzed emotion data.

[0766] "Information addition means" refers to a device or function that receives text information provided by a user, analyzes that information, and detects emotional characteristics.

[0767] "Data collection means" refers to a device or function for collecting relevant news data based on a specific date or analyzed sentiment data.

[0768] "Generation means" refers to a device or function for generating an album by combining selected images, text information, and news data.

[0769] "Provision means" refers to a device or function for providing the generated album to the user in digital format.

[0770] This invention relates to a system for efficiently managing captured image data and generating albums tailored to the user's needs. Users routinely take photos and save the data to a device. This device also includes a function for recording voice memos at the time of shooting. The photos taken by the user are uploaded to a server via the device, and the server performs analysis based on this information.

[0771] The server first has a storage mechanism for saving received image data. Next, it uses emotion analysis to detect the user's emotions from the audio data. This is achieved using general-purpose speech analysis software. A general-purpose speech recognition API can be used for speech analysis. Furthermore, the server uses an AI algorithm to evaluate the technical quality of the images. Image processing is performed quickly and efficiently by utilizing a GPU.

[0772] Based on sentiment analysis and image evaluation results, the server selects the most suitable image. At this stage, a generative AI model can be used to highly customize the selection criteria. Album generation is then performed based on the selected image and the text information entered by the user. The generative AI model analyzes prompt text containing the text information and is used to design an album layout that meets the user's preferences. For example, if a bright design theme is used, visually impactful design software is employed.

[0773] Furthermore, the server collects news data according to the user's emotional state. When the user's emotions are positive, news with positive content is prioritized and incorporated as relevant information in the album. This enriches the user's experience.

[0774] For example, if a user wants to record memories from a family trip, the server can automatically handle all these steps and generate an album that aggregates photos and information that contain many positive emotions. An example of a prompt would be, "Take photos of your family holiday and create an album that reflects the fun and joy it contained."

[0775] This allows users to easily save and relive special memories in a way that resonates with their emotions, without having to go through cumbersome selection and editing processes.
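The prompt text passed to the generative AI model can be assembled as in the following sketch. The field layout, the wording of the instruction, and the `build_album_prompt` function are assumptions made for illustration; the disclosure only states that the prompt contains the user's text information.

```python
def build_album_prompt(user_text: str, dominant_emotion: str, theme: str) -> str:
    """Assemble prompt text for a generative model (illustrative wording only)."""
    comment = user_text.strip()
    if not comment:
        raise ValueError("user_text must not be empty")
    return (
        "Design an album layout that meets the user's preferences.\n"
        f"User comment: {comment}\n"
        f"Detected emotion: {dominant_emotion}\n"
        f"Design theme: {theme}"
    )


prompt = build_album_prompt(
    "  Our family holiday was full of fun and joy.  ",
    dominant_emotion="positive",
    theme="bright",
)
```

Keeping prompt assembly in one function makes it straightforward to log or review exactly what text is sent to the model.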

[0776] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0777] Step 1:

[0778] The device acquires image data captured by the user. Simultaneously with the capture, it records a voice memo and uploads both to the server. The input is the image and voice data on the device, and the output is the same data as received by the server. Specifically, the device detects a shutter click in the camera application, then records and transmits the image and audio together.

[0779] Step 2:

[0780] The server analyzes the received image data. It receives image data as input, uses an AI algorithm to evaluate composition, resolution, and other factors, and analyzes the technical quality of the image. The output consists of evaluation results and metadata. Specifically, the server uses a GPU-based image processing engine to analyze the data and calculate results at high speed.

[0781] Step 3:

[0782] The server analyzes the audio data to extract the user's emotional data. This is done by using the audio data as input and analyzing emotions from voice tone and language patterns using emotion analysis tools. The output is the identified emotional data. Specifically, the server uses a speech recognition API to convert the audio data into text, and then uses an emotion AI model to classify the emotions.

[0783] Step 4:

[0784] The server integrates image evaluation and sentiment data to select the optimal image. The selection algorithm operates based on the input evaluation results and sentiment data, which serve as selection criteria. The output is the selected optimal image. The specific operation includes using a generative AI model to create selection prompts based on the analysis results and then executing the algorithm.

[0785] Step 5:

[0786] The server receives text information entered by the user and analyzes their emotions based on it. It receives text information as input and outputs the emotion analysis results. Specifically, it has a process of analyzing the user's text data using natural language processing techniques and aggregating it as emotion data.

[0787] Step 6:

[0788] The server generates personalized albums based on selected images and analyzed sentiment data. Here, selected images, text information, and news data are used as input, and a customized album is generated as output. Specifically, a generative AI model is used to select a template, and the album is constructed using design software.

[0789] Step 7:

[0790] The server provides the generated album to the user. The user can view the album and share their experience. The input is the generated album data, and the output is the digital album displayed on the user's device. The specific operation involves a process of delivering the album in real time via a web browser or mobile application.
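The text-based emotion classification used in Steps 3 and 5 can be illustrated with a deliberately simple stand-in. The sketch below uses a keyword lexicon rather than the emotion AI model or natural language processing techniques mentioned in the text, and it assumes the audio has already been converted to a transcript. The lexicon entries, labels, and `classify_emotion` function are hypothetical.

```python
import re
from collections import Counter

# Toy lexicon: the words and emotion labels are illustrative assumptions.
LEXICON = {
    "fun": "joy", "happy": "joy", "great": "joy",
    "relaxing": "calm", "quiet": "calm",
    "tired": "sadness", "miss": "sadness",
}


def classify_emotion(transcript: str) -> str:
    """Return the most frequent lexicon emotion, or "neutral" if none match."""
    words = re.findall(r"[a-z']+", transcript.lower())
    hits = Counter(LEXICON[word] for word in words if word in LEXICON)
    if not hits:
        return "neutral"
    return hits.most_common(1)[0][0]
```

A production system would replace the lexicon with a trained classifier, but the interface stays the same: text in, emotion label out.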

[0791] (Application Example 2)

[0792] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0793] Selecting the most suitable photos from user-submitted images to match specific emotions and atmospheres, and generating a record based on those selections, is not easy. Furthermore, creating albums that enhance emotional value based on images and text has limitations with conventional technologies. There is a need to consider the user's emotions and the atmosphere of the moment, and to collect even more relevant information to create records that deepen individual experiences.

[0794] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0795] In this invention, the server includes data storage means for receiving captured image information, extraction means for selecting the optimal image from the image information, and sentiment analysis means for collecting sentiment information based on the user's voice and comments. This makes it possible to select the optimal image that matches the user's emotions and record that emotion.

[0796] A "data storage means" is a component that has the function of receiving captured image information and storing it for subsequent processing.

[0797] An "extraction means" is a technical device that performs processing to select the optimal image from the received image information.

[0798] "Sentiment analysis means" refers to a component that incorporates technology to analyze user voices and comments and collect emotional information.

[0799] A "layout generation means" is a device or system that has the function of customizing the layout of an album or record based on emotional information.

[0800] "Information gathering means" refers to components used to collect information related to a specific date and incorporate it into data.

[0801] "Record generation means" refers to a device or program that generates a record combining extracted images, emotional information, and collected information.

[0802] An "instruction means" is a component that issues commands to output the generated recorded data.

[0803] The system for realizing this invention comprises various modules. First, when a user takes an image using the terminal's camera, the image data is uploaded to the server by a data storage means. At the same time, the terminal also records the user's voice comments and sends them to the server. The server converts the voice data into text using speech recognition technology such as the Google Speech-to-Text API.

[0804] Next, the server analyzes the user's emotions using emotion analysis libraries such as IBM Watson Tone Analyzer. This emotion data is combined with image data that has undergone technical evaluation through image analysis tools such as OpenCV and TensorFlow. Based on the analysis results, the server uses a layout generation mechanism to customize the album page design according to the emotions. Furthermore, it uses an information gathering mechanism to collect news and event information related to a specific date and integrates it as part of the record that reflects the user's emotions.

[0805] As a concrete example, suppose a user takes photos during a family trip and exclaims, "This is fun!" If this voice is analyzed and recognized as an emotion of "joy," the system will select that photo as the best one and generate a bright and vibrant album page. Furthermore, news and event information from the travel destination can be incorporated into the album as added value.

[0806] Examples of prompts using a generative AI model include "Please suggest the optimal album generation method based on sentiment analysis of family photos" and "Please show the steps for designing a system that recommends news based on user sentiment." In this way, it becomes possible to provide memorable records that deeply consider the user's emotions.

[0807] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0808] Step 1:

[0809] The user takes an image using their device and inputs voice comments. The image data and voice data are uploaded from the device to the server. The input is the captured image file and voice file, and the output is the saving of those files on the server.

[0810] Step 2:

[0811] The server uses the Google Speech-to-Text API to convert audio data into text data. The input is an audio file, and the output is a comment in text format. In this process, speech recognition technology is used to analyze the audio signal and convert it into the appropriate text format.

[0812] Step 3:

[0813] The server analyzes the converted text data using IBM Watson Tone Analyzer to identify the user's emotions. The input is a text comment, and the output is emotional information (e.g., joy, excitement). The process involves analyzing the context of the text and the emotional tone.

[0814] Step 4:

[0815] The server uses OpenCV and TensorFlow to perform technical evaluations of image data. The input is an image file, and the output is technical metrics (including resolution and composition). Image analysis evaluates clarity and compositional balance.

[0816] Step 5:

[0817] The server extracts the best image based on emotional information and technical evaluation. This selection process involves scoring based on emotional and technical quality, choosing the image with the highest score. The input is emotional information and technical metrics, and the output is the selected image.

[0818] Step 6:

[0819] Based on the selected images, the album layout is customized to reflect the emotions expressed. The input consists of the selected images and emotional information, while the output is a personalized album layout. The customization process includes selecting color schemes and design elements that match the emotions.

[0820] Step 7:

[0821] This system uses information gathering tools to collect news and event information related to a specific date and incorporates it into an album. The input is date information, and the output is a list of related information. The server automatically scans external information sources and extracts relevant data.

[0822] Step 8:

[0823] The completed album data is displayed to the user and, if necessary, saved to a storage medium or printed. The input is the album layout and related information, and the output is the final album provided to the user. The server renders the generated data and processes it according to the user's preferences and printing requests.
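The clarity evaluation in Step 4 can be illustrated with a common sharpness heuristic, the variance of the Laplacian. The disclosure does not say which metric is used, so this choice is an assumption. The sketch is written in pure Python on a grayscale image given as nested lists so that it is self-contained; the `laplacian_variance` function and the sample images are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values indicate more edge detail, which is often read as sharper.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


# Made-up 5x5 test images: a uniform patch and a checkerboard.
flat = [[128.0] * 5 for _ in range(5)]
checker = [[255.0 if (x + y) % 2 == 0 else 0.0 for x in range(5)]
           for y in range(5)]
```

The uniform patch has no edges and scores zero, while the checkerboard scores much higher. The score is only comparable between images of similar size and content.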

[0824] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0825] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0826] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0827] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0828] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0829] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0830] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400, the more visible (expressed in actions) the emotions become.

[0831] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
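The relationship between a balance and pleasure or discomfort can be sketched numerically. The linear form, the tolerance parameter, and the `comfort` function are assumptions introduced for illustration; the disclosure states only the direction of the effect (closer to the ideal is pleasure, further away is discomfort).

```python
def comfort(value: float, ideal: float, tolerance: float) -> float:
    """Score in [-1, 1]: 1 at the ideal, 0 one tolerance away, floor of -1."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = abs(value - ideal) / tolerance
    return max(-1.0, 1.0 - deviation)


# Hypothetical robot battery level with an assumed ideal of 0.8 (80 %).
at_ideal = comfort(0.8, ideal=0.8, tolerance=0.4)
nearly_empty = comfort(0.0, ideal=0.8, tolerance=0.4)
```

Positive scores correspond to the "pleasure" side of the emotion map and negative scores to the "displeasure" side.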

[0832] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0833] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
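The final step of determining the user's emotion from the emotion values can be sketched as follows. The sketch assumes the neural network has already produced one value per emotion and that the emotion with the largest value is chosen; the selection rule, the `determine_emotion` function, and the sample values are assumptions and are not stated in this disclosure.

```python
def determine_emotion(emotion_values: dict[str, float]) -> str:
    """Return the emotion with the largest value (first one wins on ties)."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.__getitem__)


# Hypothetical model output: emotions that sit close together on the map
# ("reassured", "calm", "confident") receive similar values.
values = {"reassured": 0.82, "calm": 0.78, "confident": 0.75, "anxious": 0.05}
dominant = determine_emotion(values)
```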

[0834] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0835] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0836] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0837] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0838] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0839] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0840] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0841] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0842] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0843] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0844] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0845] The following is further disclosed regarding the embodiments described above.

[0846] (Claim 1)

[0847] A storage means for receiving captured image data,

[0848] A selection means for selecting the optimal image from the aforementioned image data,

[0849] Information addition means for receiving text information based on user input,

[0850] A data collection means for collecting news data related to a specific date,

[0851] A generation means for generating an album by combining the selected images, text information, and news data,

[0852] A print instruction means for printing the generated album,

[0853] A delivery means for physically shipping the printed album,

[0854] A system comprising the above.

[0855] (Claim 2)

[0856] The system according to claim 1, which automatically selects the single optimal image based on various evaluation criteria.

[0857] (Claim 3)

[0858] The system according to claim 1, wherein a template is used in generating the aforementioned album.

[0859] "Example 1"

[0860] (Claim 1)

[0861] A storage means for receiving image information from a shooting device and storing the image information via a network,

[0862] A selection means for selecting a high-quality image from the aforementioned image information based on multiple criteria such as composition, facial expression, and focus,

[0863] A means for receiving text data input from a user and adding it to an album,

[0864] A data collection means for automatically collecting news information related to a specific date and time,

[0865] A generation means that automatically constructs an album by combining the selected image information, text data, and news information,

[0866] A printing instruction means for outputting the constructed album onto paper media,

[0867] A delivery means for physically transporting the printed album,

[0868] A system comprising the above means.

[0869] (Claim 2)

[0870] The system according to claim 1, wherein the selection means automatically selects one high-quality image based on multiple evaluation criteria.

[0871] (Claim 3)

[0872] The system according to claim 1, wherein a template is used in the construction of the aforementioned album.
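
Selection based on multiple criteria such as composition, facial expression, and focus could, for example, be realized as a weighted sum. The sketch below is an assumption for illustration only: the weights, the score range of 0 to 1, and the function names `quality_score` and `select_one` are hypothetical and do not appear in this application, which does not state how the criteria are combined.

```python
# Assumed weights per criterion; the application does not specify any values.
WEIGHTS = {"composition": 0.4, "expression": 0.3, "focus": 0.3}


def quality_score(criteria, weights=WEIGHTS):
    """Weighted sum of per-criterion scores; a missing criterion counts as 0."""
    return sum(weights[name] * criteria.get(name, 0.0) for name in weights)


def select_one(candidates, weights=WEIGHTS):
    """Return the id of the image with the highest combined score.

    candidates maps an image id to its per-criterion scores in [0, 1].
    """
    if not candidates:
        raise ValueError("no candidate images")
    return max(candidates, key=lambda image_id: quality_score(candidates[image_id], weights))
```

A weighted sum is only one option; a learned ranking model or a rule that first rejects out-of-focus images would fit the claim wording equally well.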

[0873] "Application Example 1"

[0874] (Claim 1)

[0875] A storage means for receiving visual data recorded by a camera,

[0876] A selection means for selecting the optimal image from the aforementioned visual data,

[0877] An information addition means for receiving text information based on input from the user,

[0878] A data collection means for collecting information related to a specific date,

[0879] A generation means for generating an album by combining the selected visual data, text information, and related information,

[0880] An instruction means for issuing an instruction to print the generated album,

[0881] A delivery means for physically shipping the printed album,

[0882] A voice conversion means for receiving voice input and converting it into text information,

[0883] A system comprising the above means.

[0884] (Claim 2)

[0885] The system according to claim 1, which automatically and optimally selects visual data based on various evaluation criteria.

[0886] (Claim 3)

[0887] The system according to claim 1, wherein a format is used in the generation of an album.

[0888] "Example 2: combination with an emotion engine"

[0889] (Claim 1)

[0890] A storage means for receiving captured image data,

[0891] An emotion analysis means for analyzing audio data related to the aforementioned image data and extracting emotion data,

[0892] A selection means for selecting the optimal image based on the aforementioned emotion data,

[0893] A means for receiving text information based on user input and analyzing emotions,

[0894] A data collection means for collecting news data related to a specific date and to the emotion data,

[0895] A generation means that combines the selected images, text information, and news data to generate an album that reflects emotions,

[0896] A means for providing the generated album in digital format,

[0897] A system comprising the above means.

[0898] (Claim 2)

[0899] The system according to claim 1, wherein the selection means automatically selects the optimal image based on multiple evaluation criteria and the user's emotion data.

[0900] (Claim 3)

[0901] The system according to claim 1, wherein a customized template based on emotional data is used in generating the aforementioned album.
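
One simple way to picture a template customized from emotion data is a lookup keyed on the dominant emotion. This is a hedged sketch only: the emotion labels, the template names in `TEMPLATES`, and the functions `dominant_emotion` and `choose_template` are hypothetical, and the application does not describe how the emotion engine's output is mapped to a template.

```python
from collections import Counter

# Hypothetical mapping from an emotion label to a template name.
TEMPLATES = {"joy": "bright", "sadness": "muted", "surprise": "bold"}


def dominant_emotion(labels):
    """Return the most frequent emotion label, or None for an empty list."""
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


def choose_template(labels, default="neutral"):
    """Pick a template for the dominant emotion, falling back to a default."""
    return TEMPLATES.get(dominant_emotion(labels), default)
```

Here `labels` stands for per-image or per-comment emotion labels that the emotion analysis means is assumed to have produced already.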

[0902] "Application Example 2: combination with an emotion engine"

[0903] (Claim 1)

[0904] A data storage means for receiving captured image information,

[0905] An extraction means for selecting the optimal image from the aforementioned image information,

[0906] An emotion analysis means for collecting emotional information based on the user's voice and comments,

[0907] A layout generation means for customizing the album layout based on the aforementioned emotional information,

[0908] An information gathering means for collecting information related to a specific date,

[0909] A record generation means that generates a record by combining the extracted images, emotional information, and collected information,

[0910] An instruction means for outputting the generated recorded data,

[0911] A system comprising the above means.

[0912] (Claim 2)

[0913] The system according to claim 1, wherein the extraction means automatically selects the optimal image based on multiple evaluation criteria and the emotional information.

[0914] (Claim 3)

[0915] The system according to claim 1, wherein in generating the record, an emotion-based design template is used.

[Explanation of symbols]

[0916]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A storage means for receiving captured image data,
A selection means for selecting the optimal image from the aforementioned image data,
An information addition means for receiving text information based on user input,
A data collection means for collecting news data related to a specific date,
A generation means for generating an album by combining the selected images, text information, and news data,
A printing instruction means for printing the generated album,
A delivery means for physically shipping the printed album,
A system comprising the above means.

2. The system according to claim 1, wherein the selection means automatically selects the single best image based on multiple evaluation criteria.

3. The system according to claim 1, wherein a template is used in generating the aforementioned album.