system

A system using generative AI to analyze and convert craftsmen's movements into digital format addresses the challenge of preserving traditional skills, ensuring their accurate transmission and standardization.

JP2026100728A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional technologies are difficult to verbalize and document, leading to the risk of extinction due to aging craftsmen and a lack of successors, necessitating a method to effectively record and transmit these skills to future generations.

Method used

A system utilizing motion image data from cameras to analyze craftsmen's movements with generative AI, converting technical information into a format that can be passed on, and incorporating feedback for system improvement.

Benefits of technology

Enables the accurate preservation and transmission of traditional skills, allowing for standardized training and replication by future generations.



Abstract

A system is provided. [Solution] The system includes: data processing means that receives motion image data acquired by a camera and analyzes the movement of an object based on the motion image data; information conversion means that extracts technical information based on the analyzed motion data and converts it into a format that can be passed down to future generations; and guide generation means that provides the converted technical information to the user and generates guidelines for practical application.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Traditional technologies have been maintained by skilled craftsmen, but due to aging and a shortage of successors, these precious technologies are on the verge of extinction. Furthermore, such technologies are difficult to verbalize and document, and are therefore difficult to pass on. There is thus a need for a new method to effectively record these technologies and pass them on to future generations.

Means for Solving the Problems

[0005] This invention provides a system that utilizes motion image data obtained from a camera, analyzes the movements of craftsmen using generative AI, extracts technical information, and converts it into a format that can be passed on to future generations. This system includes data processing means for analyzing motion data, information conversion means for converting it into technical information, and guide generation means for providing information to the user. Furthermore, by providing functions for matching against an online database and for receiving feedback from users, it facilitates the preservation and transmission of technology and addresses the challenge of a shortage of successors.

[0006] A "filming device" is a piece of equipment used to acquire still images or videos of a subject, and is used to record the technical movements of a craftsman.

[0007] "Motion data" refers to video and sequential still image data acquired by a camera, and includes information about the actions of the craftsman being analyzed.

[0008] "Data processing means" refers to hardware or software functions that analyze received video data and capture the movement of an object in detail.

[0009] "Information conversion means" refers to devices or programs that extract technical information based on analyzed operational data and convert it into a format that can be passed down to future generations.

[0010] The term "guide generation means" refers to a function that creates practical guidelines for users based on the converted technical information.

[0011] An "online database" is a collection of data accessible via the internet, enabling the matching of similar technologies and related information.

[0012] "Feedback processing means" refers to a function within a system that receives input and opinions from users and uses them to improve technical information and the guidelines provided.

Brief Description of the Drawings

[0013]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0014] An example of an embodiment of the system according to the technology of the present disclosure will be described below with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0017] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0018] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention provides a system that utilizes generative AI to pass on traditional techniques to future generations, and offers a method for analyzing the technical actions of craftsmen and saving them as digital data. This system performs a series of processes, including the acquisition of motion image data, its analysis, the conversion of technical information, the provision of information to the user, and the processing of feedback.

[0035] Acquisition of video data

[0036] The user uses a camera to record the actions of a craftsman as they perform their technique, and imports the high-resolution video data into their device. It is crucial that this video data accurately captures the movements of the craftsman's hands and tools.

[0037] Motion analysis and extraction of technical information

[0038] The server receives the video data transmitted from the terminal and analyzes the movements using the data processing means. Using a generative AI model, it analyzes the movements and their subtle characteristics and extracts the core elements of the relevant technology. This allows the gestures and movements of the traditional technology to be preserved as digital data.

[0039] Technical information conversion and guide generation

[0040] The server converts the extracted technical information into a format that can be passed down to future generations using the information conversion means, and generates guidelines to provide to users. In this process, guidelines are created that explain in detail the practical procedures and the selection of tools to be used, based on the analyzed data.

[0041] Feedback processing and system improvement

[0042] Users implement the technology based on the provided guidelines and send feedback about the process and results from their device to the server. The server uses the information collected through the feedback processing mechanism to improve the quality of the technical information and the overall system.
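One way the feedback processing described above could be sketched is shown below. This is a minimal illustration only: the `Feedback` record, the 1-to-5 rating scale, and the revision threshold are assumptions introduced here and do not appear in the specification, which does not state how feedback is structured or evaluated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical feedback record sent from the terminal to the server."""
    step: int          # guideline step the feedback refers to
    rating: int        # assumed scale: 1 (unclear) to 5 (clear)
    comment: str = ""  # optional free-text remark


def steps_needing_revision(feedback, threshold=3.0):
    """Return guideline steps whose average rating is below the threshold."""
    ratings_by_step = defaultdict(list)
    for fb in feedback:
        ratings_by_step[fb.step].append(fb.rating)
    return sorted(
        step
        for step, ratings in ratings_by_step.items()
        if sum(ratings) / len(ratings) < threshold
    )


received = [
    Feedback(step=1, rating=5),
    Feedback(step=2, rating=2, comment="Hand position is hard to follow"),
    Feedback(step=2, rating=3),
    Feedback(step=3, rating=4),
]
flagged = steps_needing_revision(received)
```

In this sketch, the steps returned in `flagged` would be the candidates for updating the technical information and guidelines.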

[0043] As a concrete example, consider the preservation of traditional pottery manufacturing procedures. Users meticulously photograph the craftsman's work process, and the server analyzes this data to record the specific movements of shaping and painting as digital data. The generated guidelines are used as teaching materials for new craftsmen, contributing to the standardization and preservation of skills. This makes it possible to pass on techniques that are in danger of disappearing to future generations.

[0044] The following describes the processing flow.

[0045] Step 1:

[0046] Users capture the technical movements of artisans using a high-quality camera and save the video data to their devices. This filming is done from multiple angles, and it is important to capture the details of the technique.

[0047] Step 2:

[0048] The device uploads the captured video data to the server. During this process, the data format and quality must be converted to meet the server's processing requirements.

[0049] Step 3:

[0050] The server receives the uploaded video data and performs preprocessing such as noise reduction and frame rate adjustment. This process prepares the data for analysis.

[0051] Step 4:

[0052] The server inputs the pre-processed data into a generative AI model to analyze the craftsman's movements. This analysis recognizes details such as hand movements, tool usage, and the timing of actions.

[0053] Step 5:

[0054] The server extracts the characteristics of the technology based on the analysis results and converts them into technical information in a form that can be passed down to future generations. Using information conversion means, the core elements of the technology are digitized and stored.

[0055] Step 6:

[0056] The server creates guidelines for providing users with the generated technical information. These guidelines include practical procedures, precautions, and information on necessary tools.

[0057] Step 7:

[0058] The device displays the generated guidelines to the user, supporting the practical application of the technology. Users can learn the technology by referring to the provided guidelines.

[0059] Step 8:

[0060] Users practice the technology according to the guidelines and send feedback on their results and areas for improvement to the server via their devices. This feedback is used to improve the technical information.

[0061] Step 9:

[0062] The server analyzes the feedback received and updates technical information and guidelines as needed. This update ensures continuous system improvement.
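The frame rate adjustment mentioned in Step 3 above can be illustrated with a short sketch. The specification does not say how the adjustment is performed; the function below assumes a simple nearest-earlier-frame selection, and the name `resample_frame_indices` is hypothetical.

```python
def resample_frame_indices(n_frames, src_fps, dst_fps):
    """Pick source frame indices so the sequence plays back at dst_fps.

    Assumption: each output frame k reuses the source frame at or just
    before time k / dst_fps. Frames are dropped when lowering the rate
    and repeated when raising it.
    """
    if n_frames < 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("n_frames must be >= 0 and frame rates must be > 0")
    step = src_fps / dst_fps  # source frames advanced per output frame
    indices = []
    k = 0
    while True:
        idx = int(k * step)
        if idx >= n_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# Ten frames recorded at 60 fps, reduced to 30 fps: every second frame is kept.
kept = resample_frame_indices(10, 60, 30)
```

Noise reduction is not covered by this sketch; in practice it would typically be handled by an image processing library before or after this selection.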

[0063] (Example 1)

[0064] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0065] The transmission of traditional techniques is an extremely difficult challenge, especially as the number of skilled artisans declines. In particular, there is a lack of appropriate means to accurately understand the technical movements of artisans and pass them on to the next generation. Traditional methods make it difficult to accurately record and transmit the characteristics and subtle movements of skills, raising concerns about the loss of these techniques. Therefore, a system is needed to analyze artisans' movements in detail and ensure the reliable transmission of skills to future generations.

[0066] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0067] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information using a generative AI model based on the analyzed motion data and converting it into a data format that can be passed on to future generations; and guide generation means for providing the converted technical information to the user and generating guidelines that explain detailed procedures for practice and the tools to be used. This makes it possible to accurately analyze the technical movements of craftsmen and to save and convert them as digital data. This contributes to the standardization and transmission of technology and protects traditional techniques for the future.

[0068] A "camera" is a device used to capture the technical movements of a craftsman in high resolution, and is used to record the craftsman's skills in detail.

[0069] "Motion image data" refers to video-format data that records the actions and techniques of craftsmen acquired by a camera, and it forms the basis for analysis.

[0070] A "server" is a computer system that manages a series of processes, including processing received video and image data, analyzing and converting technical information, and providing it to the user.

[0071] "Data processing means" refers to technology that analyzes video data within a server to understand the actions of the craftsman who is the subject of the analysis.

[0072] A "generative AI model" is a machine learning algorithm used to analyze information extracted from video and image data and derive detailed technical information.

[0073] "Information conversion means" are methods for converting analyzed technical information into a format that can be passed down to future generations, and they play a role in organizing technical information as digital data.

[0074] "Guide generation means" is a means of constructing guidelines that explain procedures and usage methods in a format that is easy for users to understand, based on the converted technical information.

[0075] "Feedback processing means" refers to devices and methods for improving systems and technical information based on feedback received from users.

[0076] This invention can be implemented by analyzing, preserving, and transmitting the technology according to the steps outlined below.

[0077] The user first records the craftsman demonstrating their skills using a high-resolution camera. A portable, high-definition camera like a GoPro is recommended. The recorded video and image data are transferred and saved to the user's device. Laptops and smartphones are used as devices, and file transfer is done via USB cable or Wi-Fi.

[0078] Next, the user uploads this video data to the server via a dedicated web application. The uploaded data is analyzed by the data processing means on the server. The server performs this analysis using a generative AI model built with a framework such as TensorFlow (registered trademark). This AI model analyzes the specific actions and tool usage of the craftsman frame by frame, extracting the core elements of the technique as data. For example, molding techniques and painting methods in pottery are analyzed.

[0079] The analyzed data is stored in a database such as MongoDB using an information transformation mechanism. There, it is formalized and organized as data that can be transmitted. Based on this data, the server uses a guide generation mechanism to create guidelines to provide to the user. These guidelines are generated, for example, in JSON format and are provided in a way that allows the user to visually understand the technology, including detailed procedures, tools to be used, and precautions.
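The JSON-format guideline described above could take a shape like the one sketched below. The specification names only the general contents (procedures, tools to be used, and precautions); the field names, the `Guideline` and `GuidelineStep` classes, and the sample pottery values are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GuidelineStep:
    """One practical step of a guideline (hypothetical schema)."""
    order: int
    instruction: str
    tools: list = field(default_factory=list)
    precautions: list = field(default_factory=list)


@dataclass
class Guideline:
    """A guideline for one technique (hypothetical schema)."""
    technique: str
    steps: list = field(default_factory=list)

    def to_json(self):
        # asdict() converts nested dataclasses into plain dicts and lists.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


def example_guideline():
    # Illustrative values only; not taken from any analyzed video.
    return Guideline(
        technique="pottery shaping",
        steps=[
            GuidelineStep(1, "Center the clay on the wheel.",
                          tools=["potter's wheel"],
                          precautions=["Keep hands wet."]),
            GuidelineStep(2, "Open the clay and raise the walls.",
                          tools=["potter's wheel", "sponge"]),
        ],
    )


document = example_guideline().to_json()
```

A document of this kind could be stored as-is in a document database such as MongoDB and rendered by the viewer application, although the specification does not prescribe a schema.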

[0080] Ultimately, users review the generated guidelines on a dedicated viewer application to aid in their technical learning. Any questions or suggestions for improvement during practical application are sent to the server as feedback, and the server uses this information to further refine the technical details and improve the system.

[0081] A concrete example is the preservation of traditional pottery manufacturing techniques. Users meticulously photograph the work procedures of artisans, and the server analyzes this data to record the specific actions involved in shaping and painting pottery as digital data, generating educational materials useful for training the next generation of artisans. An example of a prompt would be, "Describe the traditional pottery shaping procedure in three steps and list the names of the tools used."

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] The user collects video data using a recording device to document the craftsman's techniques in high resolution. This input data captures the specific movements of the craftsman's hands and tools. The user saves this video data to their device using a USB cable or Wi-Fi. The output here is a high-resolution video data file.

[0085] Step 2:

[0086] Users upload video and image data stored on their devices to the server. The input is video and image files stored on the user's device, and the output is data stored on the server. Users complete the upload by using a web browser and easily dragging and dropping files through a dedicated online portal.

[0087] Step 3:

[0088] The server passes the received video data to the data processing unit. Based on the video data as input, it launches a generative AI model (e.g., using TensorFlow) to perform analysis. At this stage, the server analyzes the artisan's movements in the video frame by frame and extracts motion vector data and features. The output is a set of digital data that shows the structure of the artisan's movements.

[0089] Step 4:

[0090] The server transforms the extracted motion feature information using the information conversion means and stores it in a database. The input is the technical feature data resulting from the analysis, and the output is data in a format that can be passed on to future generations. As part of this transformation process, the server uses a database such as MongoDB to structure and store the technical information.

[0091] Step 5:

[0092] The server creates detailed guidelines using guide generation tools based on structured technical information. It references a technical database as input and generates guidelines to be provided to the user as output. The server provides the guidelines in JSON or other formats, including practical procedures and tools to be used.

[0093] Step 6:

[0094] Users receive and review guidelines provided by the server through a viewer application. The input is guideline data from the server, and the output is guide information in a visually easy-to-understand format. Users can then use this information to implement the technology.

[0095] Step 7:

[0096] Users fill out a form with their practical results and feedback on the guidelines, and send it from their device to the server. The input is feedback information based on the user's practical experience, and the output is recorded as feedback data on the server. This allows for system improvements and increased accuracy of technical information.
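The motion vector data mentioned in Step 3 above can be illustrated as follows. The sketch assumes that a tracked point (for example, a fingertip or tool tip) has already been located in each frame by some upstream detector, which the specification does not describe; the function names are hypothetical.

```python
from math import hypot


def motion_vectors(track):
    """Frame-to-frame displacement (dx, dy) of one tracked point.

    `track` is a list of (x, y) positions, one per frame, assumed to
    come from an upstream detector that is not part of this sketch.
    """
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


def speeds(track, fps):
    """Speed between consecutive frames, in position units per second."""
    return [hypot(dx, dy) * fps for dx, dy in motion_vectors(track)]


# A point that moves once and then stays still, sampled at 2 frames per second.
fingertip = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
vectors = motion_vectors(fingertip)
fingertip_speeds = speeds(fingertip, fps=2)
```

Features such as these displacement and speed sequences are one plausible form of the "digital data that shows the structure of the artisan's movements"; the actual feature set is not specified in the source.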

[0097] (Application Example 1)

[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0099] Conventional technologies have made it difficult to effectively standardize the skills and techniques of experienced craftsmen and for robots to replicate them in production sites such as factories. As a result, the know-how of highly skilled workers has not been widely shared, leading to challenges in achieving sufficient production efficiency and quality stability.

[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0101] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information based on the analyzed motion data and converting it into a format that can be passed on to future generations; guide generation means for providing the converted technical information to a user and generating guidelines for practical application; and control means for supplying motion instructions to a robot control device so that the robot can reproduce human movements. This makes it possible to teach the skills of skilled craftsmen to robots in the factory, thereby improving production efficiency and product quality.

[0102] A "filming device" is a device used to capture the movements of a craftsman in high resolution and record them as video data.

[0103] "Motion image data" refers to digital data that meticulously records the technical movements of a craftsman, and serves as the basis for analysis.

[0104] "Data processing means" refers to methods and devices for analyzing acquired video data and capturing the movement of an object.

[0105] "Information conversion means" refers to methods and devices for converting analyzed technical information into a format that can be transmitted to future generations.

[0106] A "guide generation means" is a method and apparatus for creating guidelines based on converted technical information so that users can put the technology into practice.

[0107] "Control means" refers to methods and devices that supply operation instructions to a robot control device, enabling the robot to reproduce human movements.

[0108] To realize this invention, a recording device is first required. This recording device acquires the technical actions performed by the craftsman as high-resolution video and generates motion image data. This allows for detailed recording of the craftsman's hand movements and the operation of the tools they use.

[0109] Next, the server receives this video data and performs analysis using the data processing means. This analysis uses software libraries such as OpenCV to break down the video data into frames. Then, generative AI models built with frameworks such as TensorFlow and PyTorch are used to extract the actions and characteristics of the craftsman and convert them into technical information in a digital format.

[0110] The converted technical information is processed by information conversion means into a format that can be passed down to future generations. This information is then provided to users as guidelines by guide generation means. These guidelines, including specific procedures and the selection of tools to be used, are utilized as educational materials for learning new technologies.

[0111] Furthermore, based on the analysis results, the control system supplies operation instructions to the robot control device. This allows the robots in the factory to replicate the movements of skilled workers and perform standardized, high-quality work.

[0112] A concrete example is the precision welding process. The movements of a skilled welder are filmed by a camera, and these movements are then programmed into a robot to perform the work. As a result of this process, uniform welding quality is achieved, and production efficiency is improved.

[0113] A concrete example of a prompt for the generative AI model is: "Analyze the welding motion from the video data. Extract the motion characteristics and convert them into motion instructions for a factory robot."

[0114] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0115] Step 1:

[0116] The user uses a recording device to record the high-precision movements of a craftsman. In this process, the camera captures the craftsman's movements as video data, which is then saved to the device. The input is the craftsman's movements, and the output is high-resolution video data.

[0117] Step 2:

[0118] The terminal sends the captured video data to the server. The server supplies the received data to a data processing system, which uses libraries such as OpenCV to decompose the video data into individual frames. The input is video data, and the output is the video data decomposed into frames.

[0119] Step 3:

[0120] The server inputs the decomposed image data into a generative AI model built with a framework such as TensorFlow or PyTorch, which then analyzes the characteristics of the craftsman's movements. The goal of the analysis is to extract the core elements of the movements. The input is image data for each frame, and the output is the extracted technical information.

[0121] Step 4:

[0122] The server passes the extracted technical information to an information conversion system, which converts it into a format that can be passed down to future generations. This conversion process formats the technical information into text or guideline format. The input is the technical information, and the output is the converted guideline.

[0123] Step 5:

[0124] The server provides the converted guidelines to the user. The user can then use these guidelines to acquire the necessary skills. The input is the converted guidelines, and the output is the guide information presented to the user.

[0125] Step 6:

[0126] The server further generates the necessary motion instructions for the robot control device and provides them to the robot via the control means. This allows the robot to replicate the movements of a skilled worker. The input is technical information, and the output is control signals.

[0127] Step 7:

[0128] The user sends the results as feedback from their device to the server. The server collects this feedback and uses it to improve the accuracy of the generative AI model and the overall system. The input is user feedback, and the output is information for improving the system.
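As a non-limiting illustration of Step 2 above, the decomposition of video data into frames might be sketched as follows. The `iter_frames` function and the `FakeReader` class are hypothetical names introduced for this sketch; the only assumption about the reader is that it offers a `read()` method returning a success flag and a frame, which is the interface of OpenCV's `cv2.VideoCapture`.

```python
from __future__ import annotations

from typing import Any, Iterator, Protocol


class FrameReader(Protocol):
    """Anything with a read() -> (ok, frame) method, e.g. cv2.VideoCapture."""
    def read(self) -> tuple[bool, Any]: ...


def iter_frames(reader: FrameReader, every_nth: int = 1) -> Iterator[tuple[int, Any]]:
    """Yield (frame_index, frame) for every n-th frame until the stream ends."""
    if every_nth < 1:
        raise ValueError("every_nth must be >= 1")
    index = 0
    while True:
        ok, frame = reader.read()
        if not ok:
            break  # end of stream or read failure
        if index % every_nth == 0:
            yield index, frame
        index += 1


class FakeReader:
    """Stand-in reader used here so the sketch runs without a video file."""
    def __init__(self, frames):
        self._frames = iter(frames)

    def read(self):
        try:
            return True, next(self._frames)
        except StopIteration:
            return False, None


sampled = list(iter_frames(FakeReader(["f0", "f1", "f2", "f3", "f4"]), every_nth=2))
print(sampled)
```

Where OpenCV is available, a `cv2.VideoCapture` object opened on the received video file could be passed in place of `FakeReader`. Sampling every n-th frame is an assumed way of reducing the amount of image data supplied to the model in Step 3.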

[0129] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0130] This invention provides a system that, in addition to passing on traditional techniques, enables flexible learning support that takes into account the user's emotional state. This system consists of a generative AI that analyzes video data and an emotion engine that analyzes the user's emotional state.

[0131] Data acquisition and preprocessing

[0132] Users film the technical actions performed by craftsmen and send this video data to the server via their devices. It is recommended that this video data be filmed in high definition from multiple angles. After receiving this data, the server performs pre-processing such as noise reduction and image quality adjustment to prepare it for analysis.
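As a non-limiting illustration, the pre-processing described above might be sketched on a single grayscale frame as follows. The frame representation (a list of rows of 0 to 255 pixel values), the 3x3 mean filter standing in for noise reduction, and the linear gain standing in for image quality adjustment are all assumptions made for this sketch.

```python
def adjust_brightness(frame, gain=1.0, offset=0):
    """Scale and shift pixel values, clamping to the 0-255 range."""
    return [[max(0, min(255, round(p * gain + offset))) for p in row] for row in frame]


def mean_filter(frame):
    """3x3 box blur; border pixels average over the neighbours that exist."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                frame[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


# Assumed test frame: a single bright noise pixel on a dark background.
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smoothed = mean_filter(noisy)
brightened = adjust_brightness(smoothed, gain=2.0)
print(smoothed)
print(brightened)
```

A mean filter spreads an isolated spike over its neighbours instead of removing it; a median filter or a library routine may be preferable in practice.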

[0133] Motion analysis and conversion to technical information

[0134] The server uses data processing tools to analyze video data and accurately recognize the hand movements and technical actions of the craftsman. The analyzed data is converted into technical information by information conversion tools, and detailed guidelines are created outlining specific procedures, tools to be used, and points to note.

[0135] Analysis of user state using an emotion engine

[0136] The server utilizes audio data and additional video data from the user to analyze the user's emotional state in real time using an emotion engine. This engine identifies the levels of stress and motivation the user experiences during learning and enables appropriate responses.

[0137] Providing guidelines and optimizing the user experience

[0138] The device provides the user with generated guidelines, but their content and order are personalized based on insights from the emotion engine. For example, if the user is feeling anxious, detailed explanations and reminders can be added for steps that require particular attention. If motivation is low, suggestions to start with easier steps will be made.
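As a non-limiting illustration, the personalization of guideline content and order might be sketched as follows. The `GuidelineStep` type, the emotion labels `"anxious"` and `"low_motivation"`, and the difficulty scale are hypothetical and are not defined in this disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GuidelineStep:
    title: str
    difficulty: int              # 1 = easiest (hypothetical scale)
    needs_attention: bool = False
    note: str = ""


def personalize(steps, emotion):
    """Adjust content and order for a hypothetical emotion label."""
    if emotion == "anxious":
        # Add a detailed reminder to steps that require particular attention.
        return [
            replace(s, note="Take your time; review the detailed explanation first.")
            if s.needs_attention else s
            for s in steps
        ]
    if emotion == "low_motivation":
        # Suggest starting with easier steps (stable sort keeps ties in order).
        return sorted(steps, key=lambda s: s.difficulty)
    return list(steps)


steps = [
    GuidelineStep("Whisk the tea", difficulty=3, needs_attention=True),
    GuidelineStep("Fold the cloth", difficulty=1),
    GuidelineStep("Warm the bowl", difficulty=2),
]
for step in personalize(steps, "low_motivation"):
    print(step.title)
```

Returning a new list in each case leaves the original guideline unchanged, so the same guideline can be personalized differently for different users.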

[0139] Feedback processing and continuous system improvement

[0140] Users send learning results, impressions, and assignments as feedback from their devices to the server. Based on this feedback, the server accumulates information to improve technical information, guidelines, and the emotion engine algorithm, thereby enhancing the user experience in the future.
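As a non-limiting illustration, the accumulation of feedback for improving the guidelines might be sketched as follows. The feedback record layout (`step_id`, `rating`, `comment`), the 1 to 5 rating scale, and the thresholds are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def steps_needing_revision(feedback, threshold=3.0, min_reports=2):
    """Return step ids whose average rating (1-5) falls below the threshold."""
    ratings = defaultdict(list)
    for item in feedback:
        ratings[item["step_id"]].append(item["rating"])
    return sorted(
        step_id
        for step_id, values in ratings.items()
        if len(values) >= min_reports and mean(values) < threshold
    )


feedback = [
    {"step_id": "whisking", "rating": 2, "comment": "Wrist angle unclear"},
    {"step_id": "whisking", "rating": 3, "comment": "Needs a slower video"},
    {"step_id": "folding", "rating": 5, "comment": "Easy to follow"},
    {"step_id": "folding", "rating": 4, "comment": ""},
    {"step_id": "pouring", "rating": 1, "comment": "Only one report"},
]
print(steps_needing_revision(feedback))
```

Requiring a minimum number of reports is an assumed safeguard so that a single low rating does not trigger a revision.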

[0141] For example, when transmitting the techniques of the tea ceremony, the system can detect tension through the user's facial expressions and tone of voice, and then display advice for relaxation and guidance on specific breathing techniques. In this way, the present invention is groundbreaking not only in the transmission of techniques but also in providing a learning experience that takes the user's emotions into consideration.

[0142] The following describes the processing flow.

[0143] Step 1:

[0144] Users will use a high-quality camera to record videos of the craftsman's techniques and save the data to their device. It is recommended to film from multiple viewpoints to capture detailed hand movements.

[0145] Step 2:

[0146] The device uploads the captured video data to the server. During the upload process, the data format is converted to one suitable for analysis, enabling efficient data transfer.

[0147] Step 3:

[0148] The server receives the uploaded video data and performs pre-processing such as noise reduction and brightness adjustment. This optimizes the data for analysis.

[0149] Step 4:

[0150] The server analyzes pre-processed data using a generative AI model to recognize the hand movements and technical actions of the craftsman. This process identifies the flow of movement and key points of action.

[0151] Step 5:

[0152] The server extracts technical information based on recognized operational data and uses information conversion means to transform it into guidelines that can be passed down to future generations. The guidelines include detailed instructions for each step and the intent behind the actions.

[0153] Step 6:

[0154] The terminal displays guidelines provided by the server to the user. The user can then practice the technology while referring to these guidelines.

[0155] Step 7:

[0156] The server uses additional audio and video data provided by the user to activate the emotion engine and analyze the user's emotional state in real time during the learning process.

[0157] Step 8:

[0158] The user's emotional data, analyzed by the emotion engine, is used to adjust the learning content. For example, if the user is feeling stressed, simple steps are emphasized, and if motivation needs to be increased, encouraging messages are added.

[0159] Step 9:

[0160] After putting the technology into practice, users send feedback to the server via their device. This feedback includes the results of the practice, any problems encountered, and changes in their feelings.

[0161] Step 10:

[0162] The server analyzes the feedback and improves the technical information, the guidelines, and the emotion engine algorithm. These improvements are intended to enhance the user experience in the future.

[0163] (Example 2)

[0164] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0165] Conventional technology transfer systems have problems such as inaccurate motion analysis and a lack of personalized learning support that takes into account the user's emotional state. Furthermore, there are challenges in effectively utilizing user feedback and continuously improving the system.

[0166] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0167] In this invention, the server includes information processing means for receiving video data acquired by an imaging device and analyzing the target's actions based on the video data; information conversion means for extracting technical information based on the analyzed action data and converting it into a transferable format; and guide generation means for providing the converted technical information to the user and generating guidance guidelines for practical application. This enables accurate analysis of the technology and personalized learning support that takes into account emotional states.

[0168] An "imaging device" is a device used to acquire video data, and typically includes cameras or the camera function of a smartphone.

[0169] "Reception" refers to the acquisition of data sent from another device or instrument.

[0170] "Video data" refers to moving image data acquired by an imaging device, which is used for motion analysis.

[0171] An "information processing means" is a device that has the function of analyzing video data to recognize the actions of a subject and extracting information about those actions.

[0172] An "information conversion means" is a means for converting technical information into a specific format based on analyzed operational information.

[0173] A "guide generation means" is a means for generating practical guidance guidelines for users based on the converted technical information.

[0174] An "emotion analysis means" is a means for analyzing a user's emotional state in real time from the user's audio and video data.

[0175] "Network data storage" refers to a database that stores relevant technical information and makes it accessible online.

[0176] A "feedback processing means" is a means that receives opinions from users and processes them to help improve the system.

[0177] The embodiments for carrying out the present invention are described below.

[0178] The system of the present invention supports the transmission of traditional techniques and provides flexible learning support that takes the user's emotional state into account, through the cooperation of the user, the terminal, and the server. First, the user acquires video data of the technique to be learned using an imaging device, such as a smartphone or a dedicated camera. This video data is transmitted to the server via the user's terminal.

[0179] The server uses information processing tools to analyze the received video data in order to process it quickly and accurately. Processing of video image data includes pre-processing such as noise reduction and image quality correction. The generative AI model used here analyzes the technical operations from the video data in real time, and based on the results, extracts and formats specific technical information using information conversion tools.

[0180] Furthermore, the server uses emotion analysis to evaluate the user's emotional state in real time based on audio data and additional video data received from the user. Based on this evaluation, the generated technical information is personalized by a guide generation mechanism, providing more effective learning support. The terminal presents personalized guidance to the user at the appropriate time, improving the user's learning experience.

[0181] For example, if emotional analysis reveals that a user is feeling tense during the process of learning the tea ceremony, the server will add instructions on breathing techniques to promote relaxation to the instruction guidelines. This information will then be displayed on the user's device.

[0182] An example of a prompt message could be, "I would like to learn the basic movements of the tea ceremony. I sometimes feel nervous, so please also teach me how to relax." In this way, the present invention efficiently transfers technical knowledge and provides a learning experience that incorporates individualized considerations.

[0183] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0184] Step 1:

[0185] The user acquires video footage of the traditional technique to be studied using an imaging device. It is recommended to use a smartphone or dedicated camera to capture high-definition video from multiple angles. The input is the acquired video data, and the output is a video file stored on the user's device.

[0186] Step 2:

[0187] The user sends video data captured using their device to the server. The input is a video file stored on the user's device, and the output is the video data received by the server.

[0188] Step 3:

[0189] The server uses information processing tools to pre-process the received video data, performing tasks such as noise reduction and image quality correction. This prepares the data for high-quality analysis. The input is the video data received by the server, and the output is the pre-processed video data.

[0190] Step 4:

[0191] The server analyzes pre-processed video data using a generative AI model to identify and extract target actions. This yields the results of the technical action analysis. The input is pre-processed video data, and the output is the analyzed action data.

[0192] Step 5:

[0193] The server converts the analyzed operational data into technical information through an information conversion mechanism. This process formats the data into a format that includes specific procedures, tools used, and points to note. The input is the analyzed operational data, and the output is guideline data as technical information.

[0194] Step 6:

[0195] The server analyzes audio and additional video data from the user in real time and uses emotion analysis to estimate the user's emotional state. The input is audio and video data from the user, and the output is the analysis result of the user's emotional state.

[0196] Step 7:

[0197] The server personalizes the technical information based on the emotion analysis results and generates optimized guidance guidelines using the guide generation means. The input consists of guideline data as technical information and the results of the emotional state analysis, while the output is personalized guidance guidelines.

[0198] Step 8:

[0199] The terminal displays personalized guidance received from the server to the user. The input is the guidance received from the server, and the output is the guidance content displayed to the user.

[0200] Step 9:

[0201] Users learn based on the instructional guidelines and send feedback to the server via their device. The input is the user's learning results and feedback, while the output is the feedback information sent to the server.

[0202] Step 10:

[0203] The server uses user feedback to improve the technical information, the guidelines, and the emotion analysis algorithm, optimizing them for future use of the system. The input is user feedback, and the output is improved technical information and system functionality.
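As a non-limiting illustration, the way each step above takes the previous step's output as its input might be sketched as a chain of stages. Every function here is a hypothetical stand-in; in particular, `analyze_actions` returns fixed data in place of an actual generative AI model.

```python
from functools import reduce


def preprocess(video):
    """Step 3 stand-in: mark the video as pre-processed."""
    return {**video, "preprocessed": True}


def analyze_actions(video):
    """Step 4 stand-in for the generative AI model: return action data."""
    return {"actions": ["scoop", "whisk"], "source": video["name"]}


def to_guideline(action_data):
    """Step 5 stand-in: format action data as numbered procedure text."""
    return [f"{i}. {a.capitalize()}" for i, a in enumerate(action_data["actions"], 1)]


def run_pipeline(value, stages):
    """Feed each stage's output into the next stage, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


guideline = run_pipeline(
    {"name": "tea_ceremony.mp4"},
    [preprocess, analyze_actions, to_guideline],
)
print(guideline)
```

Keeping each stage as a separate function with one input and one output mirrors the input and output stated for each step, and allows a stage to be replaced without changing the others.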

[0204] (Application Example 2)

[0205] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0206] In the transmission of traditional techniques, there is a problem in providing instruction optimized for each individual learner. Furthermore, there is a need to provide flexible learning support that takes into account the emotional state of learners, but conventional systems have made it difficult to achieve this. In addition, there is a need for a reliable method to accurately analyze the movements of the techniques themselves and transmit them to future generations as technical information.

[0207] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0208] In this invention, the server includes data processing means for receiving video data acquired by a camera and analyzing the subject's actions based on the video data; information conversion means for extracting technical data based on the analyzed action data and converting it into a format that can be transmitted to future generations; guide generation means for providing the converted technical data to the user and generating guidelines for implementation; emotion analysis means for analyzing the user's emotional state in real time using audio data and additional video data; and guideline optimization means for individualizing the content and order of the guidelines based on the emotional state. This enables personalized instruction according to the learner's emotional state, making it possible to achieve more effective and user-friendly technology transfer.

[0209] A "camera" is a device used to acquire moving image data and to capture the movement of an object from multiple angles.

[0210] A "data processing means" is a device that has the function of analyzing the movement of an object based on video data received from a camera.

[0211] An "information conversion means" is a device that generates technical documents based on analyzed operational data and converts them into a format that can be transmitted to future generations.

[0212] A "guide generation means" is a device that provides converted technical documents to users and has the function of generating guidelines for implementation.

[0213] "Voice data" refers to data obtained from a user's speech and voice, and is used for analyzing their emotional state.

[0214] An "emotion analysis means" is a device that uses audio data and additional video data to analyze the user's emotional state in real time.

[0215] A "guideline optimization means" is a device that has the function of individualizing the content and order of guidelines based on the emotional state.

[0216] The system for realizing this invention is configured as follows: The server first receives video data acquired using a camera. This data is analyzed by a data processing means using OpenCV or the like to analyze the movement of the object. Based on the analysis results, an information conversion means generates technical data and converts it into a format that can be transmitted to future generations.

[0217] Furthermore, the server utilizes audio data and additional video data to analyze the user's emotional state in real time using frameworks such as TensorFlow and PyTorch. Based on the insights obtained from this emotion analysis, the guide generation means provides the user with the converted technical data via the terminal, personalizing the guidelines according to the user's emotional state. The guideline optimization means takes into account the user's tension and decreased motivation, appropriately adjusting the learning order and content.

[0218] For example, if the system analyzes a user's facial expressions and tone of voice while they are learning the procedures of the Japanese tea ceremony and detects that they are tense, it will display relaxation suggestions and guidance on specific breathing techniques on the screen. In this way, the system can optimize the learning experience according to the user's emotional state.

[0219] One possible example of a prompt message given to the generative AI model regarding the user's learning progress is: "Analyze how the user is learning the technology, and analyze their emotional state from their facial expressions and voice. Based on this, suggest the optimal learning method."

[0220] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0221] Step 1:

[0222] The server receives video data transmitted from the camera. At this stage, noise reduction and image quality adjustments are performed to prepare the acquired data for proper processing. The input is unprocessed video data, and the output is video data that can be analyzed.

[0223] Step 2:

[0224] The server analyzes the pre-processed video data using OpenCV. This analysis recognizes the movement of the object frame by frame and extracts specific movement information. The input is the analyzable video data obtained in step 1, and the output is detailed information about the movement.

[0225] Step 3:

[0226] The server generates technical documentation based on the analyzed operational information using an information conversion mechanism. This documentation includes text and diagrams and is converted into a format that can be transmitted to future generations. The input is the operational information obtained in step 2, and the output is the technical documentation.

[0227] Step 4:

[0228] The terminal provides the user with generated technical documents. In doing so, it sends audio data and additional video data to a server for real-time analysis of the user's emotional state. The input consists of the technical documents and audio data from the user, while the output is the user's emotional information.

[0229] Step 5:

[0230] The server uses TensorFlow or PyTorch to perform sentiment analysis and estimate the user's emotional state. Based on the insights gained, a guide generation system creates personalized guidelines. The input is the user's emotional information, and the output is personalized guidelines.

[0231] Step 6:

[0232] The terminal supports learning by displaying guidance adjusted to the user's emotional state. Specifically, it suggests relaxation when the user is feeling stressed and adds detailed guidance where attention is needed. The input is personalized guidance, and the output is an optimized learning experience.
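As a non-limiting illustration of Steps 5 and 6 above, the final stage of emotion estimation and the selection of guidance might be sketched as follows. The raw scores, the emotion labels, the confidence threshold, and the `GUIDANCE` table are hypothetical; the scores are assumed to come from a model built with a framework such as TensorFlow or PyTorch, which is not reproduced here.

```python
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def estimate_emotion(scores, min_confidence=0.5):
    """Return the most likely label, or 'neutral' when confidence is low."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return label if probs[label] >= min_confidence else "neutral"


# Hypothetical mapping from estimated state to the guidance that is displayed.
GUIDANCE = {
    "stressed": "Suggest a short relaxation break before continuing.",
    "neutral": "Continue with the standard guideline.",
}

raw_scores = {"stressed": 2.0, "calm": 0.0, "bored": 0.0}
state = estimate_emotion(raw_scores)
print(state, "->", GUIDANCE.get(state, GUIDANCE["neutral"]))
```

Falling back to a neutral state when no label is sufficiently likely is an assumed design choice that avoids changing the guidance on a weak estimate.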

[0233] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0234] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0235] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may be performed by the smart device 14.

[0236] [Second Embodiment]

[0237] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0238] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0239] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0240] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0241] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio in accordance with instructions from the processor 46.

[0242] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0243] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0244] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0245] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0246] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0247] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0248] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0249] This invention provides a system that utilizes generative AI to pass on traditional techniques to future generations, and offers a method for analyzing and saving the technical actions of craftsmen as digital data. This system performs a series of processes, including the acquisition of dynamic image data, its analysis, the conversion of technical information, the provision of information to the user, and the processing of feedback.

[0250] Acquisition of video data

[0251] The user uses a camera to record the actions of a craftsman as they perform their technique, and imports the high-resolution video data into their device. It is crucial that this video data accurately captures the movements of the craftsman's hands and tools.

[0252] Motion analysis and extraction of technical information

[0253] The server receives video data transmitted from the terminal and analyzes the movements using data processing equipment. Using a generative AI model, it analyzes the movements and their subtle characteristics, extracting the core elements of the relevant technology. This allows the traditional gestures and movements of the technology to be preserved as digital data.

[0254] Technical information conversion and guide generation

[0255] The server converts the extracted technical information into a format that can be passed down to future generations using information conversion means, and generates guidelines to provide to users. In this process, guidelines are created that explain in detail the practical procedures and the selection of tools to be used, based on the analyzed data.

[0256] Feedback processing and system improvement

[0257] Users implement the technology based on the provided guidelines and send feedback about the process and results from their device to the server. The server uses the information collected through the feedback processing mechanism to improve the quality of the technical information and the overall system.

[0258] As a concrete example, consider the preservation of traditional pottery manufacturing procedures. Users meticulously photograph the craftsman's work process, and the server analyzes this data to record the specific movements of shaping and painting as digital data. The generated guidelines are used as teaching materials for new craftsmen, contributing to the standardization and preservation of skills. This makes it possible to pass on techniques that are in danger of disappearing to future generations.

[0259] The following describes the processing flow.

[0260] Step 1:

[0261] Users capture the technical movements of artisans using a high-quality camera and save the video data to their devices. This filming is done from multiple angles, and it is important to capture the details of the technique.

[0262] Step 2:

[0263] The device uploads the captured video data to the server. During this process, the data format and quality must be converted to meet the server's processing requirements.

[0264] Step 3:

[0265] The server receives the uploaded video data and performs preprocessing such as noise reduction and frame rate adjustment. This process prepares the data for analysis.

[0266] Step 4:

[0267] The server inputs the pre-processed data into a generative AI model to analyze the craftsman's movements. This analysis recognizes details such as hand movements, tool usage, and the timing of actions.

[0268] Step 5:

[0269] The server extracts the characteristics of the technology based on the analysis results and converts them into technical information in a form that can be passed down to future generations. Using information conversion means, the core elements of the technology are digitized and stored.

[0270] Step 6:

[0271] The server creates guidelines for providing users with the generated technical information. These guidelines include practical procedures, precautions, and information on necessary tools.

[0272] Step 7:

[0273] The device displays the generated guidelines to the user, supporting the practical application of the technology. Users can learn the technology by referring to the provided guidelines.

[0274] Step 8:

[0275] Users practice the technology according to the guidelines and send feedback on their results and areas for improvement to the server via their devices. This feedback is used to improve the technical information.

[0276] Step 9:

[0277] The server analyzes the feedback received and updates technical information and guidelines as needed. This update ensures continuous system improvement.
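As a non-limiting illustration of Steps 5 and 6 above, storing a guideline as digital data might be sketched as follows. The `Guideline` type and its fields are hypothetical, and JSON is assumed here only as one example of a format that can be stored and read back later; this disclosure does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Guideline:
    technique: str
    procedure: list = field(default_factory=list)
    precautions: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    version: int = 1


def save_guideline(guideline):
    """Serialize to JSON text; ensure_ascii=False keeps non-ASCII terms readable."""
    return json.dumps(asdict(guideline), ensure_ascii=False, indent=2)


def load_guideline(text):
    """Rebuild a Guideline from JSON text produced by save_guideline."""
    return Guideline(**json.loads(text))


pottery = Guideline(
    technique="Wheel throwing",
    procedure=["Center the clay", "Open the form", "Pull up the walls"],
    precautions=["Keep hands wet"],
    tools=["Potter's wheel", "Sponge"],
)
restored = load_guideline(save_guideline(pottery))
print(restored == pottery)
```

The `version` field is an assumed way of tracking the updates made in Step 9, so that earlier guidelines remain distinguishable from revised ones.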

[0278] (Example 1)

[0279] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0280] The transmission of traditional techniques is an extremely difficult challenge, especially as the number of skilled artisans declines. In particular, there is a lack of appropriate means to accurately understand the technical movements of artisans and pass them on to the next generation. Traditional methods make it difficult to accurately record and transmit the characteristics and subtle movements of skills, raising concerns about the loss of these techniques. Therefore, a system is needed to analyze artisans' movements in detail and ensure the reliable transmission of skills to future generations.

[0281] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0282] In this invention, the server includes data processing means for receiving moving image data acquired by an imaging device and analyzing the actions of an object based on the moving image data, information conversion means for extracting technical information using a generative AI model based on the analyzed action data and converting it into a data format that can be passed on to future generations, and guide generation means for providing the converted technical information to a user and generating guidelines for explaining detailed procedures for practice and tools to be used. As a result, it becomes possible to accurately analyze the technical actions of craftsmen and to convert and save them as digital data. This contributes to the standardization and inheritance of technology and realizes the protection of traditional technologies for the future.

[0283] The "imaging device" is a device for acquiring the technical actions of a craftsman with high resolution and is used to record the craftsman's skills in detail.

[0284] The "moving image data" refers to video-formatted data in which the actions and skills of a craftsman acquired by an imaging device are recorded and serves as the basis for analysis.

[0285] The "server" is a computer system that manages a series of processes for processing the received moving image data, analyzing, converting, and providing technical information to a user.

[0286] The "data processing means" refers to the technology for analyzing moving image data within the server and understanding the actions of the craftsman, who is the object.

[0287] The "generative AI model" is a machine learning model used to analyze information extracted from moving image data and derive detailed technical information.

[0288] The "information conversion means" is a means for converting the analyzed technical information into a format that can be passed on to future generations and has the role of organizing the technical information as digital data.

[0289] The "guide generation means" is a means of constructing guidelines that explain procedures and usage methods in a format that is easy for users to understand, based on the converted technical information.

[0290] "Feedback processing means" refers to devices and methods for improving systems and technical information based on feedback received from users.

[0291] This invention can be implemented by analyzing, preserving, and transmitting a technique through the steps outlined below.

[0292] The user first records the craftsman demonstrating their skills using a high-resolution camera. A portable, high-definition camera like a GoPro is recommended. The recorded video and image data are transferred and saved to the user's device. Laptops and smartphones are used as devices, and file transfer is done via USB cable or Wi-Fi.

[0293] Next, the user uploads this video data to the server via a dedicated web application. The uploaded data is analyzed by a data processing unit on the server. The server performs this analysis using a generative AI model built with a framework such as TensorFlow. This AI model analyzes the specific actions and tool usage of the craftsman frame by frame, extracting the core elements of the technique as data. For example, shaping techniques and painting methods in pottery are analyzed.

[0294] The analyzed data is stored in a database such as MongoDB by the information conversion means. There, it is formalized and organized as data that can be passed on. Based on this data, the server uses the guide generation means to create guidelines to provide to the user. These guidelines are generated, for example, in JSON format and are presented so that the user can visually understand the technique, including detailed procedures, tools to be used, and precautions.

[0295] Ultimately, users review the generated guidelines on a dedicated viewer application to aid in their technical learning. Any questions or suggestions for improvement during practical application are sent to the server as feedback, and the server uses this information to further refine the technical details and improve the system.

[0296] A concrete example is the preservation of traditional pottery manufacturing techniques. Users meticulously photograph the work procedures of artisans, and the server analyzes this data to record the specific actions involved in shaping and painting pottery as digital data, generating educational materials useful for training the next generation of artisans. An example of a prompt would be, "Describe the traditional pottery shaping procedure in three steps and list the names of the tools used."

[0297] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0298] Step 1:

[0299] The user collects video data using a recording device to document the craftsman's techniques in high resolution. This input data captures the specific movements of the craftsman's hands and tools. The user saves this video data to their device using a USB cable or Wi-Fi. The output here is a high-resolution video data file.

[0300] Step 2:

[0301] Users upload video and image data stored on their devices to the server. The input is video and image files stored on the user's device, and the output is data stored on the server. Users complete the upload by using a web browser and easily dragging and dropping files through a dedicated online portal.

[0302] Step 3:

[0303] The server passes the received moving image data to the data processing unit. Taking the moving image data as input, the data processing unit runs a generative AI model (for example, one built with TensorFlow) to perform the analysis. At this stage, the server analyzes the actions of the craftsman in the video frame by frame and extracts motion vector data and feature quantities. The output is a set of digital data describing the structure of the craftsman's motion.
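As one illustration of Step 3, the extraction of motion vector data and feature quantities could be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: it assumes that an upstream model (not shown) has already produced one tracked (x, y) point per frame, such as a fingertip position, and the names `MotionFeatures` and `extract_motion_features` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

# Assumption: one tracked (x, y) pixel position per frame, produced by an
# upstream tracking or pose-estimation model that is not shown here.
Point = tuple[float, float]


@dataclass
class MotionFeatures:
    vectors: list[Point]   # per-frame displacement (dx, dy) in pixels
    path_length: float     # total distance travelled, in pixels
    mean_speed: float      # pixels per second
    peak_speed: float      # pixels per second


def extract_motion_features(points: list[Point], fps: float) -> MotionFeatures:
    """Compute frame-to-frame motion vectors and simple summary features."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Motion vector = displacement between consecutive frames.
    vectors = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    step_lengths = [hypot(dx, dy) for dx, dy in vectors]
    path_length = sum(step_lengths)
    duration = len(vectors) / fps
    mean_speed = path_length / duration if duration else 0.0
    peak_speed = max(step_lengths, default=0.0) * fps
    return MotionFeatures(vectors, path_length, mean_speed, peak_speed)
```

In practice the feature set would depend on the craft being recorded; path length and speed are only placeholders for the "feature quantities" mentioned in this step.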

[0304] Step 4:

[0305] The server converts the extracted motion feature information using the information conversion means and stores it in the database. The input is the feature data of the technique obtained from the analysis, and the output is data in a form that can be passed on to future generations. As part of the conversion process, the server structures the technical information and stores it in a database such as MongoDB.
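The structuring performed in Step 4 might look like the following sketch, which turns analysis results into a plain, JSON-serializable document of the kind a document database accepts. The schema (`TechniqueRecord`, `TechniqueStep`, and their field names) is an assumption made for illustration; the source does not define one.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TechniqueStep:
    order: int                                   # position within the procedure
    action: str                                  # what the craftsman does
    tools: list[str] = field(default_factory=list)
    cautions: list[str] = field(default_factory=list)


@dataclass
class TechniqueRecord:
    technique_id: str                            # hypothetical identifier
    craft: str
    steps: list[TechniqueStep]
    schema_version: int = 1                      # lets the format evolve later


def to_document(record: TechniqueRecord) -> dict:
    """Return a plain dict with steps sorted by order, ready to be stored."""
    doc = asdict(record)
    doc["steps"] = sorted(doc["steps"], key=lambda s: s["order"])
    return doc


record = TechniqueRecord(
    technique_id="pottery-001",
    craft="pottery shaping",
    steps=[
        TechniqueStep(2, "Raise the wall", ["wheel"]),
        TechniqueStep(1, "Center the clay", ["wheel"], ["Keep hands wet"]),
    ],
)
document = to_document(record)
print(json.dumps(document, indent=2))
```

If MongoDB is used, a dict of this shape could be handed to the driver's insert call (for example pymongo's `insert_one`); the database connection is deliberately left out of the sketch.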

[0306] Step 5:

[0307] Based on the structured technical information, the server uses the guide generation means to create detailed guidelines. It refers to the technical database as input and generates the guidelines provided to the user as output. The server arranges the guidelines including practical procedures and tools to be used in JSON or other formats.
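For illustration, the guideline generation in Step 5 could be sketched as a function that reads a structured technique document and emits a JSON guideline containing the procedure and the tools to be used. The input keys (`craft`, `steps`, `order`, `action`, `tools`, `cautions`) and the output layout are hypothetical; the source only states that JSON or other formats may be used.

```python
import json


def build_guideline(doc: dict) -> str:
    """Build a JSON guideline (procedure, tools, cautions) from a technique document."""
    steps = sorted(doc["steps"], key=lambda s: s["order"])
    # Collect tools in first-use order without duplicates.
    tools: list[str] = []
    for step in steps:
        for tool in step.get("tools", []):
            if tool not in tools:
                tools.append(tool)
    guideline = {
        "title": f"Guideline: {doc['craft']}",
        "tools": tools,
        "procedure": [
            {
                "step": number,
                "instruction": step["action"],
                "cautions": step.get("cautions", []),
            }
            for number, step in enumerate(steps, start=1)
        ],
    }
    return json.dumps(guideline, ensure_ascii=False, indent=2)
```

Keeping the tool list separate from the procedure lets a viewer application show a preparation checklist before the first step.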

[0308] Step 6:

[0309] The user receives and checks the guidelines provided by the server through the viewer application. The input is the guideline data from the server, and the output is guide information in a form that is easy for the user to visually understand. The user can practice the technology based on this information.

[0310] Step 7:

[0311] Users fill out a form with their practical results and feedback on the guidelines, and send it from their device to the server. The input is feedback information based on the user's practical experience, and the output is recorded as feedback data on the server. This allows for system improvements and increased accuracy of technical information.
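One possible way to turn the feedback recorded in Step 7 into an improvement signal is sketched below: feedback entries are grouped by guideline step, and steps that users frequently fail are flagged for revision. The `Feedback` fields and both thresholds are assumptions chosen for illustration and are not taken from the source.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Feedback:
    guideline_id: str
    step: int            # guideline step the report refers to
    succeeded: bool      # whether the user could carry out the step
    comment: str = ""


def steps_needing_revision(
    items: list[Feedback],
    min_reports: int = 3,            # assumed: ignore steps with too few reports
    max_success_rate: float = 0.5,   # assumed: flag steps at or below this rate
) -> list[int]:
    """Return step numbers whose reported success rate is low."""
    total = Counter(f.step for f in items)
    succeeded = Counter(f.step for f in items if f.succeeded)
    return sorted(
        step
        for step, count in total.items()
        if count >= min_reports and succeeded[step] / count <= max_success_rate
    )
```

The minimum report count guards against rewriting a step on the basis of a single unsuccessful attempt.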

[0312] (Application Example 1)

[0313] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0314] Conventional technologies have made it difficult to effectively standardize the skills and techniques of experienced craftsmen and for robots to replicate them in production sites such as factories. As a result, the know-how of highly skilled workers has not been widely shared, leading to challenges in achieving sufficient production efficiency and quality stability.

[0315] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0316] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information based on the analyzed motion data and converting it into a format that can be passed on to future generations; guide generation means for providing the converted technical information to a user and generating guidelines for practical application; and control means for supplying motion instructions to a robot control device so that the robot can reproduce human movements. This makes it possible to teach the skills of skilled craftsmen to robots in the factory, thereby improving production efficiency and product quality.

[0317] A "filming device" is a device used to capture the movements of a craftsman in high resolution and record them as video data.

[0318] "Motion image data" refers to digital data that meticulously records the technical movements of a craftsman, and serves as the basis for analysis.

[0319] "Data processing means" refers to methods and devices for analyzing acquired video data and capturing the movement of an object.

[0320] "Information conversion means" refers to methods and devices for converting analyzed technical information into a format that can be transmitted to future generations.

[0321] A "guide generation means" is a method and apparatus for creating guidelines based on converted technical information so that users can put the technology into practice.

[0322] "Control means" refers to methods and devices that supply operation instructions to a robot control device, enabling the robot to reproduce human movements.

[0323] To realize this invention, a recording device is first required. This recording device captures the technical actions performed by the craftsman as high-resolution video and generates motion image data. This allows for detailed recording of the craftsman's hand movements and the operation of the tools they use.

[0324] Next, the server receives this video data and performs analysis using the data processing means. This analysis uses software libraries such as OpenCV to break down the video data into frames. Then, generative AI models built with frameworks such as TensorFlow and PyTorch are used to extract the actions and characteristics of the craftsman and convert them into technical information in a digital format.

[0325] The converted technical information is processed by information conversion means into a format that can be passed down to future generations. This information is then provided to users as guidelines by guide generation means. These guidelines, including specific procedures and the selection of tools to be used, are utilized as educational materials for learning new technologies.

[0326] Furthermore, based on the analysis results, the control means supplies operation instructions to the robot control device. This allows the robots in the factory to replicate the movements of skilled workers and perform standardized, high-quality work.

[0327] A concrete example is the precision welding process. The movements of a skilled welder are filmed by a camera, and these movements are then programmed into a robot to perform the work. As a result of this process, uniform welding quality is achieved, and production efficiency is improved.

[0328] A concrete example of a prompt message for a generative AI model is: "Analyze the welding motion from the video data. Extract the motion characteristics and convert them into motion instructions for a factory robot."

[0329] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0330] Step 1:

[0331] The user uses a recording device to record the high-precision movements of a craftsman. In this process, the camera captures the craftsman's movements as video data, which is then saved to the device. The input is the craftsman's movements, and the output is high-resolution video data.

[0332] Step 2:

[0333] The terminal sends the captured video data to the server. The server supplies the received data to a data processing system, which uses libraries such as OpenCV to decompose the video data into individual frames. The input is video data, and the output is the video data decomposed into frames.
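The frame decomposition in Step 2 could be sketched as a small generator that pulls frames from any source exposing a `read()` method returning `(ok, frame)`, which is the convention used by OpenCV's `cv2.VideoCapture`. To keep the sketch self-contained, it does not import OpenCV and uses a hypothetical `FakeCapture` stand-in; the `stride` parameter for skipping frames is likewise an assumption, not something the source specifies.

```python
from typing import Any, Iterator, Protocol


class FrameSource(Protocol):
    """Anything with read() -> (ok, frame), e.g. an opened cv2.VideoCapture."""

    def read(self) -> tuple[bool, Any]: ...


def iter_frames(source: FrameSource, stride: int = 1) -> Iterator[tuple[int, Any]]:
    """Yield (frame_index, frame) for every stride-th frame until the source ends."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    index = 0
    while True:
        ok, frame = source.read()
        if not ok:          # no more frames (or a read error)
            break
        if index % stride == 0:
            yield index, frame
        index += 1


class FakeCapture:
    """Illustrative stand-in that serves a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


for index, frame in iter_frames(FakeCapture(["f0", "f1", "f2", "f3", "f4"]), stride=2):
    print(index, frame)
```

With OpenCV installed, an opened capture object could be passed in place of `FakeCapture`; sampling every few frames reduces the load on the later analysis step at the cost of temporal detail.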

[0334] Step 3:

[0335] The server inputs the decomposed image data into a generative AI model, built with a framework such as TensorFlow or PyTorch, which analyzes the characteristics of the craftsman's movements. The goal of the analysis is to extract the core elements of the movements. The input is image data for each frame, and the output is the extracted technical information.

[0336] Step 4:

[0337] The server passes the extracted technical information to an information conversion system, which converts it into a format that can be passed down to future generations. This conversion process formats the technical information into text or guideline format. The input is the technical information, and the output is the converted guideline.

[0338] Step 5:

[0339] The server provides the converted guidelines to the user. The user then uses these guidelines to acquire the necessary skills. The input is the converted guidelines, and the output is the guide information presented to the user.

[0340] Step 6:

[0341] The server further generates the necessary motion instructions for the robot control device and provides them to the robot via the control means. This allows the robot to replicate the movements of a skilled worker. The input is technical information, and the output is control signals.
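As a rough sketch of Step 6, a recorded tool trajectory could be converted into timed move instructions for a robot control device. The instruction format (`MoveInstruction`), the millimetre units, and the speed limit are all assumptions for illustration; real robot controllers define their own command sets, and the source does not name one.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class MoveInstruction:
    target_mm: tuple[float, ...]   # (x, y, z) target position in millimetres
    duration_s: float              # time allotted for the move, in seconds


def to_move_instructions(waypoints, timestamps, max_speed_mm_s):
    """Convert recorded waypoints into moves, slowing any segment that exceeds the speed limit."""
    if len(waypoints) != len(timestamps):
        raise ValueError("waypoints and timestamps must have the same length")
    if max_speed_mm_s <= 0:
        raise ValueError("max_speed_mm_s must be positive")
    instructions = []
    for i in range(1, len(waypoints)):
        recorded = timestamps[i] - timestamps[i - 1]
        if recorded <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Shortest duration that keeps the segment within the speed limit.
        shortest_allowed = dist(waypoints[i - 1], waypoints[i]) / max_speed_mm_s
        instructions.append(
            MoveInstruction(tuple(waypoints[i]), max(recorded, shortest_allowed))
        )
    return instructions
```

Here a segment keeps the craftsman's recorded timing unless that would exceed the assumed speed limit, in which case it is stretched; whether timing or speed should take priority depends on the process, for example on heat input in welding.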

[0342] Step 7:

[0343] The user sends the results as feedback from their device to the server. The server collects this feedback and uses it to improve the accuracy of the generative AI model and the overall system. The input is user feedback, and the output is information for improving the system.

[0344] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0345] This invention provides a system that, in addition to passing on traditional techniques, enables flexible learning support that takes into account the user's emotional state. This system consists of a generative AI that analyzes video data and an emotion engine that analyzes the user's emotional state.

[0346] Data acquisition and preprocessing

[0347] Users film the technical actions performed by craftsmen and send this video data to the server via their devices. It is recommended that this video data be filmed in high definition from multiple angles. After receiving this data, the server performs pre-processing such as noise reduction and image quality adjustment to prepare it for analysis.

[0348] Motion analysis and conversion to technical information

[0349] The server uses data processing tools to analyze video data and accurately recognize the hand movements and technical actions of the craftsman. The analyzed data is converted into technical information by information conversion tools, and detailed guidelines are created outlining specific procedures, tools to be used, and points to note.

[0350] Analysis of user state using an emotion engine

[0351] The server utilizes audio data and additional video data from the user to analyze the user's emotional state in real time using an emotion engine. This engine identifies the levels of stress and motivation the user experiences during learning and enables appropriate responses.

[0352] Providing guidelines and optimizing the user experience

[0353] The device provides the user with generated guidelines, but their content and order are personalized based on insights from the emotion engine. For example, if the user is feeling anxious, detailed explanations and reminders can be added for steps that require particular attention. If motivation is low, suggestions to start with easier steps will be made.

[0354] Feedback processing and continuous system improvement

[0355] Users send learning results, impressions, and assignments as feedback from their devices to the server. Based on this feedback, the server accumulates information to improve technical information, guidelines, and the emotion engine algorithm, thereby enhancing the user experience in the future.

[0356] For example, when transmitting the techniques of the tea ceremony, the system can detect tension through the user's facial expressions and tone of voice, and then display advice for relaxation and guidance on specific breathing techniques. In this way, the present invention is groundbreaking not only in the transmission of techniques but also in providing a learning experience that takes the user's emotions into consideration.

[0357] The following describes the processing flow.

[0358] Step 1:

[0359] Users will use a high-quality camera to record videos of the craftsman's techniques and save the data to their device. It is recommended to film from multiple viewpoints to capture detailed hand movements.

[0360] Step 2:

[0361] The device uploads the captured video data to the server. During the upload process, the data format is converted to one suitable for analysis, enabling efficient data transfer.

[0362] Step 3:

[0363] The server receives the uploaded video data and performs pre-processing such as noise reduction and brightness adjustment. This optimizes the data for analysis.
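The pre-processing in Step 3 can be illustrated on a single grayscale frame. The source does not say which noise reduction method is used; the sketch below substitutes a simple 3x3 mean (box) filter and a linear brightness adjustment, written in plain Python on nested lists so that it runs without an image library. The function names are hypothetical.

```python
# A grayscale frame as rows of pixel values in the range 0..255.
Image = list[list[int]]


def mean_filter_3x3(img: Image) -> Image:
    """Reduce noise by replacing each pixel with the mean of its 3x3 neighbourhood."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Near the border the window shrinks to the pixels that exist.
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def adjust_brightness(img: Image, gain: float = 1.0, offset: int = 0) -> Image:
    """Apply pixel * gain + offset, clamped to the valid 0..255 range."""
    return [[min(255, max(0, round(p * gain + offset))) for p in row] for row in img]
```

A production system would more likely use an optimized library routine; the point here is only the order of operations, denoising first and brightness normalization second.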

[0364] Step 4:

[0365] The server analyzes pre-processed data using a generative AI model to recognize the hand movements and technical actions of the craftsman. This process identifies the flow of movement and key points of action.

[0366] Step 5:

[0367] The server extracts technical information based on recognized operational data and uses information conversion means to transform it into guidelines that can be passed down to future generations. The guidelines include detailed instructions for each step and the intent behind the actions.

[0368] Step 6:

[0369] The terminal displays guidelines provided by the server to the user. The user can then practice the technology while referring to these guidelines.

[0370] Step 7:

[0371] The server uses additional audio and video data provided by the user to activate the emotion engine and analyze the user's emotional state in real time during the learning process.

[0372] Step 8:

[0373] The user's emotional data, analyzed by the emotion engine, is used to adjust the learning content. For example, if the user is feeling stressed, simple steps are emphasized, and if motivation needs to be increased, encouraging messages are added.
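The adjustment described in Step 8 could be sketched as a small rule-based function. It assumes the emotion engine reports stress and motivation as scores between 0 and 1 and that each practice step carries a difficulty rating; the thresholds, the `Step` fields, and the message wording are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    title: str
    difficulty: int   # assumed scale: 1 (easy) to 5 (hard)


@dataclass
class LearningPlan:
    steps: list[Step]
    messages: list[str]


def personalize(
    steps: list[Step],
    stress: float,                 # assumed 0..1 score from the emotion engine
    motivation: float,             # assumed 0..1 score from the emotion engine
    stress_high: float = 0.7,
    motivation_low: float = 0.3,
) -> LearningPlan:
    """Emphasize simple steps under stress; add encouragement when motivation is low."""
    plan_steps = list(steps)       # copy so the caller's list is left unchanged
    messages = []
    if stress >= stress_high:
        # Stable sort: easier steps first, original order kept among equals.
        plan_steps.sort(key=lambda s: s.difficulty)
        messages.append("Start with the simpler steps and take your time.")
    if motivation <= motivation_low:
        messages.append("You are making steady progress. Keep going one step at a time.")
    return LearningPlan(plan_steps, messages)
```

Reordering only makes sense for steps that can be practiced independently; for a strictly sequential procedure, the same rule could instead mark the simple steps for emphasis without changing their order.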

[0374] Step 9:

[0375] After putting the technology into practice, users send feedback to the server via their device. This feedback includes the results of the practice, any problems encountered, and changes in their feelings.

[0376] Step 10:

[0377] The server analyzes the feedback and improves technical information, guidelines, and the emotion engine algorithms. These improvements are intended to make the user experience better in the future.

[0378] (Example 2)

[0379] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0380] Conventional technology transfer systems have problems such as inaccurate motion analysis and a lack of personalized learning support that takes into account the user's emotional state. Furthermore, there are challenges in effectively utilizing user feedback and continuously improving the system.

[0381] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0382] In this invention, the server includes information processing means for receiving video data acquired by an imaging device and analyzing the target's actions based on the video data; information conversion means for extracting technical information based on the analyzed action data and converting it into a transferable format; and guide generation means for providing the converted technical information to the user and generating instructional guidelines for practical application. This enables accurate analysis of the technique and personalized learning support that takes into account emotional states.

[0383] An "imaging device" is a device used to acquire video data, and typically includes cameras or the camera function of a smartphone.

[0384] "Reception" refers to the acquisition of data sent from another device or instrument.

[0385] "Video data" refers to moving image data acquired by an imaging device, which is used for motion analysis.

[0386] An "information processing means" is a device that has the function of analyzing video data to recognize the actions of a subject and extracting information about those actions.

[0387] An "information conversion means" is a means for converting technical information into a specific format based on analyzed operational information.

[0388] A "guide generation means" is a means of generating practical instructional guidelines for users based on converted technical information.

[0389] "Emotion analysis means" refers to a means for analyzing a user's voice and video data to estimate their emotional state in real time.

[0390] "Network data storage" refers to a database that stores relevant technical information and makes it accessible online.

[0391] A "feedback processing means" is a means that receives opinions from users and processes them to help improve the system.

[0392] The embodiments for carrying out the present invention are described below.

[0393] Through the collaboration of the user, terminal, and server, the system of the present invention supports the transmission of traditional techniques and provides flexible learning support that takes into account the emotional state of the user. First, the user acquires video data of the technique to be learned using an imaging device, such as a smartphone or a dedicated camera. This video data is transmitted to the server via the user's terminal.

[0394] The server uses the information processing means to analyze the received video data quickly and accurately. Processing of the video data includes pre-processing such as noise reduction and image quality correction. The generative AI model used here analyzes the technical operations in the video data in real time, and based on the results, the information conversion means extracts and formats specific technical information.

[0395] Furthermore, the server uses the emotion analysis means to evaluate the user's emotional state in real time based on audio data and additional video data received from the user. Based on this evaluation, the generated technical information is personalized by the guide generation means, providing more effective learning support. The terminal presents personalized guidance to the user at the appropriate time, improving the user's learning experience.

[0396] For example, if emotion analysis reveals that a user is feeling tense during the process of learning the tea ceremony, the server will add instructions on breathing techniques that promote relaxation to the instructional guidelines. This information will then be displayed on the user's device.

[0397] An example of a prompt message could be, "I would like to learn the basic movements of the tea ceremony. I sometimes feel nervous, so please also teach me how to relax." In this way, the present invention efficiently transfers technical knowledge and provides a learning experience that incorporates individualized considerations.

[0398] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0399] Step 1:

[0400] The user acquires video footage of the traditional technique to be studied using an imaging device. It is recommended to use a smartphone or dedicated camera to capture high-definition video from multiple angles. The input is the acquired video data, and the output is a video file stored on the user's device.

[0401] Step 2:

[0402] The user sends video data captured using their device to the server. The input is a video file stored on the user's device, and the output is the video data received by the server.

[0403] Step 3:

[0404] The server uses information processing tools to pre-process the received video data, performing tasks such as noise reduction and image quality correction. This prepares the data for high-quality analysis. The input is the video data received by the server, and the output is the pre-processed video data.

[0405] Step 4:

[0406] The server analyzes pre-processed video data using a generative AI model to identify and extract target actions. This yields the results of the technical action analysis. The input is pre-processed video data, and the output is the analyzed action data.

[0407] Step 5:

[0408] The server converts the analyzed operational data into technical information through an information conversion mechanism. This process formats the data into a format that includes specific procedures, tools used, and points to note. The input is the analyzed operational data, and the output is guideline data as technical information.

[0409] Step 6:

[0410] The server analyzes audio and additional video data from the user in real time and uses the emotion analysis means to understand the user's emotional state. The input is audio and video data from the user, and the output is the analysis result of the user's emotional state.
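Real-time emotion estimates tend to fluctuate from sample to sample, so the analysis result of Step 6 may need stabilizing before it drives any change in guidance. The sketch below is one assumed approach, not part of the source: a per-sample tension score between 0 and 1 (from an emotion model that is not shown) is smoothed with an exponential moving average, and separate enter and leave thresholds provide hysteresis. All names and parameter values are hypothetical.

```python
class TensionMonitor:
    """Smooths per-sample tension scores and reports a stable tense/not-tense state."""

    def __init__(self, alpha: float = 0.5, enter: float = 0.7, leave: float = 0.4):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        if leave > enter:
            raise ValueError("leave threshold must not exceed enter threshold")
        self.alpha = alpha      # weight of the newest sample
        self.enter = enter      # smoothed level above which the user counts as tense
        self.leave = leave      # smoothed level below which the user counts as relaxed
        self.level = None       # current smoothed score (None until first sample)
        self.tense = False

    def update(self, score: float) -> bool:
        """Feed one tension score (assumed 0..1) and return the current state."""
        if self.level is None:
            self.level = score
        else:
            self.level = self.alpha * score + (1 - self.alpha) * self.level
        if self.tense:
            if self.level < self.leave:
                self.tense = False
        elif self.level > self.enter:
            self.tense = True
        return self.tense


monitor = TensionMonitor()
states = [monitor.update(s) for s in [0.2, 1.0, 1.0, 0.2, 0.2]]
print(states)
```

The gap between the two thresholds keeps the displayed guidance from flickering when the score hovers near a single cut-off.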

[0411] Step 7:

[0412] The server personalizes the technical information based on the emotion analysis results and generates optimized instructional guidelines using the guide generation means. The input consists of guideline data as technical information and the results of emotional state analysis, while the output is personalized instructional guidelines.

[0413] Step 8:

[0414] The terminal displays personalized guidance received from the server to the user. The input is the guidance received from the server, and the output is the guidance content displayed to the user.

[0415] Step 9:

[0416] Users learn based on the instructional guidelines and send feedback to the server via their device. The input is the user's learning results and feedback, while the output is the feedback information sent to the server.

[0417] Step 10:

[0418] The server uses user feedback to improve the technical information, guidelines, and emotion analysis algorithms, optimizing them for future system use. The input is user feedback, and the output is improved technical information and system functionality.

[0419] (Application Example 2)

[0420] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0421] In the transmission of traditional techniques, it is difficult to provide instruction optimized for each individual learner. Furthermore, there is a need to provide flexible learning support that takes into account the emotional state of learners, but conventional systems have made it difficult to achieve this. In addition, there is a need for a reliable method to accurately analyze the movements of the techniques themselves and transmit them to future generations as technical information.

[0422] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0423] In this invention, the server includes data processing means for receiving video data acquired by a camera and analyzing the subject's actions based on the video data; information conversion means for extracting technical data based on the analyzed action data and converting it into a format that can be transmitted to future generations; guide generation means for providing the converted technical data to the user and generating guidelines for implementation; emotion analysis means for analyzing the user's emotional state in real time using audio data and additional video data; and guideline optimization means for individualizing the content and order of the guidelines based on the emotional state. This enables personalized instruction according to the learner's emotional state, making it possible to achieve more effective and user-friendly technology transfer.

[0424] The "camera" refers to a device used to acquire moving image data and to capture the movement of an object from multiple angles.

[0425] A "data processing means" is a device that has the function of analyzing the movement of an object based on video data received from a camera.

[0426] An "information conversion means" is a device that generates technical documents based on analyzed operational data and converts them into a format that can be transmitted to future generations.

[0427] A "guide generation means" is a device that provides converted technical documents to users and has the function of generating guidelines for implementation.

[0428] "Voice data" refers to data obtained from a user's speech and voice, and is used for analyzing their emotional state.

[0429] "Emotion analysis means" is a device that utilizes audio data and additional video data to analyze the user's emotional state in real time.

[0430] A "guideline optimization means" is a device that has the function of individualizing the content and order of guidelines based on emotional state.

[0431] The system for realizing this invention is configured as follows. The server first receives video data acquired using a camera. The data processing means analyzes the movement of the object in this data using OpenCV or the like. Based on the analysis results, the information conversion means generates technical data and converts it into a format that can be transmitted to future generations.

[0432] Furthermore, the server utilizes audio data and additional video data to analyze the user's emotional state in real time using models built with frameworks such as TensorFlow and PyTorch. Based on the insights obtained from this emotion analysis, the guide generation means provides the user with the converted technical data via the terminal, personalizing the guidelines according to the user's emotional state. The guideline optimization means takes into account the user's tension and decreased motivation, appropriately adjusting the learning order and content.

[0433] For example, if the system analyzes a user's facial expressions and tone of voice while they are learning the procedures of the Japanese tea ceremony and detects that they are tense, it will display relaxation suggestions and guidance on specific breathing techniques on the screen. In this way, the system can optimize the learning experience according to the user's emotional state.

[0434] One possible example of a prompt message given to the generative AI model regarding the user's learning progress is: "Analyze how the user is learning the technology, and analyze their emotional state from their facial expressions and voice. Based on this, suggest the optimal learning method."

[0435] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0436] Step 1:

[0437] The server receives video data transmitted from the camera. At this stage, noise reduction and image quality adjustments are performed to prepare the acquired data for proper processing. The input is unprocessed video data, and the output is video data that can be analyzed.

[0438] Step 2:

[0439] The server analyzes the pre-processed video data using OpenCV. This analysis recognizes the movement of the object frame by frame and extracts specific movement information. The input is the analyzable video data obtained in step 1, and the output is detailed information about the movement.
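A very simple form of the frame-by-frame movement recognition in Step 2 is frame differencing: consecutive grayscale frames are compared pixel by pixel, and frames whose mean absolute difference exceeds a threshold are treated as containing movement. The sketch below implements this in plain Python on nested lists as a stand-in; OpenCV provides the same per-pixel operation as `cv2.absdiff`. The threshold value and function names are assumptions, and the source does not state which analysis method is actually used.

```python
# A grayscale frame as rows of pixel values in the range 0..255.
Frame = list[list[int]]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two same-sized grayscale frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def moving_frames(frames: list[Frame], threshold: float) -> list[int]:
    """Return indices i >= 1 where frame i differs from frame i-1 by more than threshold."""
    return [
        i
        for i in range(1, len(frames))
        if motion_score(frames[i - 1], frames[i]) > threshold
    ]
```

Frame differencing only indicates when something moved; identifying which movement it was would still require a model-based analysis of the flagged frames.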

[0440] Step 3:

[0441] The server generates technical documentation based on the analyzed operational information using an information conversion mechanism. This documentation includes text and diagrams and is converted into a format that can be transmitted to future generations. The input is the operational information obtained in step 2, and the output is the technical documentation.

[0442] Step 4:

[0443] The terminal provides the user with generated technical documents. In doing so, it sends audio data and additional video data to a server for real-time analysis of the user's emotional state. The input consists of the technical documents and audio data from the user, while the output is the user's emotional information.

[0444] Step 5:

[0445] The server uses TensorFlow or PyTorch to perform sentiment analysis and estimate the user's emotional state. Based on the insights gained, a guide generation system creates personalized guidelines. The input is the user's emotional information, and the output is personalized guidelines.

[0446] Step 6:

[0447] The device supports learning by displaying user-adjusted guidance based on their emotional state. Specifically, it suggests relaxation when the user is feeling stressed and adds detailed guidance where attention is needed. The input is personalized guidance, and the output is an optimized learning experience.
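The frame-by-frame movement recognition in Step 2 can be illustrated with simple frame differencing. The following is a minimal sketch in pure Python and is not the OpenCV-based processing the disclosure refers to. The function names `motion_score` and `moving_frames`, the threshold values, and the list-of-lists frame representation are all assumptions made for illustration.

```python
from typing import List

# Assumed representation: a grayscale frame as rows of pixel intensities (0-255).
Frame = List[List[int]]


def motion_score(prev: Frame, curr: Frame, threshold: int = 25) -> float:
    """Return the fraction of pixels whose intensity changed by more than `threshold`."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > threshold:
                changed += 1
    return changed / total if total else 0.0


def moving_frames(frames: List[Frame], min_score: float = 0.1) -> List[int]:
    """Return indices of frames that differ noticeably from the preceding frame."""
    return [
        i
        for i in range(1, len(frames))
        if motion_score(frames[i - 1], frames[i]) >= min_score
    ]


# Hypothetical 2x2 frames: one pixel changes between the second and third frame.
still = [[0, 0], [0, 0]]
moved = [[0, 200], [0, 0]]
print(moving_frames([still, still, moved, moved]))  # prints [2]
```

In a real deployment the frames would come from decoded video, and the indices returned here would mark the segments passed on for more detailed movement analysis.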

[0448] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0449] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0450] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0451] [Third Embodiment]

[0452] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0453] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0454] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0455] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0456] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0457] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0458] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0459] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0460] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0461] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0462] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0463] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0464] This invention provides a system that utilizes generative AI to pass on traditional techniques to future generations, and offers a method for analyzing and saving the technical actions of craftsmen as digital data. This system performs a series of processes, including the acquisition of motion image data, its analysis, the conversion of technical information, the provision of information to the user, and the processing of feedback.

[0465] Acquisition of video data

[0466] The user uses a camera to record the actions of a craftsman as they perform their technique, and imports the high-resolution video data into their device. It is crucial that this video data accurately captures the movements of the craftsman's hands and tools.

[0467] Motion analysis and extraction of technical information

[0468] The server receives video data transmitted from the terminal and analyzes the movements using data processing equipment. Using a generative AI model, it analyzes the movements and their subtle characteristics, extracting the core elements of the relevant technology. This allows the traditional gestures and movements of the technology to be preserved as digital data.

[0469] Technical information conversion and guide generation

[0470] The server converts the extracted technical information into a format that can be passed down to future generations using information conversion means, and generates guidelines to provide to users. Based on the analyzed data, these guidelines explain in detail the practical procedures and the selection of tools to be used.

[0471] Feedback processing and system improvement

[0472] Users implement the technology based on the provided guidelines and send feedback about the process and results from their device to the server. The server uses the information collected through the feedback processing mechanism to improve the quality of the technical information and the overall system.

[0473] As a concrete example, consider the preservation of traditional pottery manufacturing procedures. Users carefully film the craftsman's work process, and the server analyzes this data to record the specific movements of shaping and painting as digital data. The generated guidelines are used as teaching materials for new craftsmen, contributing to the standardization and preservation of skills. This makes it possible to pass on techniques that are in danger of disappearing to future generations.

[0474] The following describes the processing flow.

[0475] Step 1:

[0476] Users capture the technical movements of artisans using a high-quality camera and save the video data to their devices. This filming is done from multiple angles, and it is important to capture the details of the technique.

[0477] Step 2:

[0478] The device uploads the captured video data to the server. During this process, the data format and quality must be converted to meet the server's processing requirements.

[0479] Step 3:

[0480] The server receives the uploaded video data and performs preprocessing such as noise reduction and frame rate adjustment. This process prepares the data for analysis.

[0481] Step 4:

[0482] The server inputs the pre-processed data into a generative AI model to analyze the craftsman's movements. This analysis recognizes details such as hand movements, tool usage, and timing of actions.

[0483] Step 5:

[0484] The server extracts the characteristics of the technology based on the analysis results and converts them into technical information in a form that can be passed down to future generations. Using information conversion means, the core elements of the technology are digitized and stored.

[0485] Step 6:

[0486] The server creates guidelines for providing users with the generated technical information. These guidelines include practical procedures, precautions, and information on necessary tools.

[0487] Step 7:

[0488] The device displays the generated guidelines to the user, supporting the practical application of the technology. Users can learn the technology by referring to the provided guidelines.

[0489] Step 8:

[0490] Users practice the technology according to the guidelines and send feedback on their results and areas for improvement to the server via their devices. This feedback is used to improve the technical information.

[0491] Step 9:

[0492] The server analyzes the feedback received and updates technical information and guidelines as needed. This update ensures continuous system improvement.
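The feedback loop in Steps 8 and 9 can be sketched as a versioned guideline record that is updated only when feedback actually arrives. This is an illustrative sketch under stated assumptions: the `Guideline` class, its fields, and the `apply_feedback` function are hypothetical names and do not appear in the disclosure, which does not specify how updates are stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Guideline:
    """Hypothetical guideline record; the field names are assumptions."""
    technique: str
    steps: List[str]
    version: int = 1
    notes: List[str] = field(default_factory=list)


def apply_feedback(guideline: Guideline, feedback: List[str]) -> Guideline:
    """Return a new guideline version that records the non-empty feedback comments.

    If no usable feedback was received, the original guideline is returned unchanged.
    """
    comments = [f.strip() for f in feedback if f.strip()]
    if not comments:
        return guideline
    return Guideline(
        technique=guideline.technique,
        steps=list(guideline.steps),
        version=guideline.version + 1,
        notes=guideline.notes + comments,
    )


original = Guideline("pottery shaping", ["Center the clay on the wheel"])
updated = apply_feedback(original, ["  ", "Step 1 needs a close-up of the hands"])
print(updated.version, updated.notes)
```

Returning a new record rather than mutating the old one keeps earlier guideline versions available for comparison, which is one possible way to support the continuous improvement described in Step 9.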

[0493] (Example 1)

[0494] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0495] The transmission of traditional techniques is an extremely difficult challenge, especially as the number of skilled artisans declines. In particular, there is a lack of appropriate means to accurately understand the technical movements of artisans and pass them on to the next generation. Traditional methods make it difficult to accurately record and transmit the characteristics and subtle movements of skills, raising concerns about the loss of these techniques. Therefore, a system is needed to analyze artisans' movements in detail and ensure the reliable transmission of skills to future generations.

[0496] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0497] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information using a generative AI model based on the analyzed motion data and converting it into a data format that can be passed on to future generations; and guide generation means for providing the converted technical information to the user and generating guidelines that explain detailed procedures for practice and the tools to be used. This makes it possible to accurately analyze the technical movements of craftsmen and to convert and save them as digital data. This contributes to the standardization and transmission of technology and helps protect traditional techniques for the future.

[0498] A "camera" is a device used to capture the technical movements of a craftsman in high resolution, and is used to record the craftsman's skills in detail.

[0499] "Motion image data" refers to video-format data that records the actions and techniques of craftsmen acquired by a camera, and it forms the basis for analysis.

[0500] A "server" is a computer system that manages a series of processes, including processing received video and image data, analyzing and converting technical information, and providing it to the user.

[0501] "Data processing means" refers to technology that analyzes video data within a server to understand the actions of the craftsman who is the subject of the analysis.

[0502] A "generative AI model" is a machine learning algorithm used to analyze information extracted from video and image data and derive detailed technical information.

[0503] "Information conversion means" are methods for converting analyzed technical information into a format that can be passed down to future generations, and they play a role in organizing technical information as digital data.

[0504] A "guide generation means" is a means of constructing guidelines that explain procedures and usage methods in a format that is easy for users to understand, based on the converted technical information.

[0505] "Feedback processing means" refers to devices and methods for improving systems and technical information based on feedback received from users.

[0506] This invention can be implemented by analyzing, preserving, and transmitting the technology according to the steps outlined below.

[0507] The user first records the craftsman demonstrating their skills using a high-resolution camera. A portable, high-definition camera like a GoPro is recommended. The recorded video and image data are transferred and saved to the user's device. Laptops and smartphones are used as devices, and file transfer is done via USB cable or Wi-Fi.

[0508] Next, the user uploads this video data to the server via a dedicated web application. The uploaded data is analyzed by a data processing unit on the server. The server performs this analysis using a generative AI model built with a framework such as TensorFlow. This AI model analyzes the specific actions and tool usage of the craftsman frame by frame, extracting the core elements of the technique as data. For example, shaping techniques and painting methods in pottery are analyzed.

[0509] The analyzed data is stored in a database such as MongoDB using an information transformation mechanism. There, it is formalized and organized as data that can be transmitted. Based on this data, the server uses a guide generation mechanism to create guidelines to provide to the user. These guidelines are generated, for example, in JSON format and are provided in a way that allows the user to visually understand the technology, including detailed procedures, tools to be used, and precautions.
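A guideline in JSON format, as mentioned above, might be assembled as follows. This is a minimal sketch: the key names (`technique`, `procedures`, `tools`, `precautions`) and the function `build_guideline` are assumptions chosen for illustration, since the disclosure does not define a schema. No database call is made here; the same dictionary could be stored as a document in a database such as MongoDB.

```python
import json
from typing import List


def build_guideline(
    technique: str,
    procedures: List[str],
    tools: List[str],
    precautions: List[str],
) -> str:
    """Serialize a guideline as JSON text using an assumed, illustrative schema."""
    document = {
        "technique": technique,
        # Number the procedures so that a viewer can present them in order.
        "procedures": [
            {"order": i, "description": text}
            for i, text in enumerate(procedures, start=1)
        ],
        "tools": tools,
        "precautions": precautions,
    }
    return json.dumps(document, ensure_ascii=False, indent=2)


print(
    build_guideline(
        "pottery shaping",
        ["Center the clay", "Open the form", "Raise the walls"],
        ["potter's wheel", "wooden rib"],
        ["Keep hands wet while shaping"],
    )
)
```

Passing `ensure_ascii=False` keeps non-ASCII text, such as Japanese tool names, readable in the generated file.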

[0510] Ultimately, users review the generated guidelines on a dedicated viewer application to aid in their technical learning. Any questions or suggestions for improvement during practical application are sent to the server as feedback, and the server uses this information to further refine the technical details and improve the system.

[0511] A concrete example is the preservation of traditional pottery manufacturing techniques. Users carefully film the work procedures of artisans, and the server analyzes this data to record the specific actions involved in shaping and painting pottery as digital data, generating educational materials useful for training the next generation of artisans. An example of a prompt would be, "Describe the traditional pottery shaping procedure in three steps and list the names of the tools used."

[0512] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0513] Step 1:

[0514] The user collects video data using a recording device to document the craftsman's techniques in high resolution. This input data captures the specific movements of the craftsman's hands and tools. The user saves this video data to their device using a USB cable or Wi-Fi. The output here is a high-resolution video data file.

[0515] Step 2:

[0516] Users upload video and image data stored on their devices to the server. The input is video and image files stored on the user's device, and the output is data stored on the server. Users complete the upload by using a web browser and easily dragging and dropping files through a dedicated online portal.

[0517] Step 3:

[0518] The server passes the received video data to the data processing unit. Based on the video data as input, it launches a generative AI model (e.g., using TensorFlow) to perform analysis. At this stage, the server analyzes the artisan's movements in the video frame by frame and extracts motion vector data and features. The output is a set of digital data that shows the structure of the artisan's movements.

[0519] Step 4:

[0520] The server transforms the extracted behavioral feature information using an information transformation mechanism and stores it in a database. The input is the technical feature data resulting from the analysis, and the output is data in a format that can be passed on to future generations. As part of this transformation, the server structures the technical information and stores it in a database such as MongoDB.

[0521] Step 5:

[0522] The server creates detailed guidelines using guide generation tools based on structured technical information. It references a technical database as input and generates guidelines to be provided to the user as output. The server provides the guidelines in JSON or other formats, including practical procedures and tools to be used.

[0523] Step 6:

[0524] Users receive and review guidelines provided by the server through a viewer application. The input is guideline data from the server, and the output is guide information in a visually easy-to-understand format. Users can then use this information to implement the technology.

[0525] Step 7:

[0526] Users fill out a form with their practical results and feedback on the guidelines, and send it from their device to the server. The input is feedback information based on the user's practical experience, and the output is recorded as feedback data on the server. This allows for system improvements and increased accuracy of technical information.
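The motion vector data and features mentioned in Step 3 can be illustrated with a small calculation on tracked positions. The sketch below assumes that a hand or tool position has already been located in each frame as an (x, y) pixel coordinate; how that tracking is done is outside this example. The names `motion_vectors` and `summarize` and the chosen features are assumptions, not the disclosed method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed (x, y) position of a tracked hand or tool


def motion_vectors(track: List[Point]) -> List[Tuple[float, float]]:
    """Return the displacement (dx, dy) between each pair of consecutive positions."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def summarize(track: List[Point], fps: float) -> Dict[str, float]:
    """Compute simple movement features from a track sampled at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    distances = [math.hypot(dx, dy) for dx, dy in motion_vectors(track)]
    path_length = sum(distances)
    duration = len(distances) / fps  # seconds covered by the track
    return {
        "path_length": path_length,
        "mean_speed": path_length / duration if duration else 0.0,
        "peak_speed": max(distances, default=0.0) * fps,
    }


# Hypothetical track: the hand moves, pauses for one frame, then moves again.
track = [(0, 0), (3, 4), (3, 4), (6, 8)]
print(motion_vectors(track))
print(summarize(track, fps=1.0))
```

Features of this kind (path length, mean and peak speed, pauses) are one possible form for the "set of digital data that shows the structure of the artisan's movements" produced in Step 3.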

[0527] (Application Example 1)

[0528] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0529] Conventional technologies have made it difficult to effectively standardize the skills and techniques of experienced craftsmen and for robots to replicate them in production sites such as factories. As a result, the know-how of highly skilled workers has not been widely shared, leading to challenges in achieving sufficient production efficiency and quality stability.

[0530] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0531] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information based on the analyzed motion data and converting it into a format that can be passed on to future generations; guide generation means for providing the converted technical information to a user and generating guidelines for practical application; and control means for supplying motion instructions to a robot control device so that the robot can reproduce human movements. This makes it possible to teach the skills of skilled craftsmen to robots in the factory, thereby improving production efficiency and product quality.

[0532] A "filming device" is a device used to capture the movements of a craftsman in high resolution and record them as video data.

[0533] "Motion image data" refers to digital data that meticulously records the technical movements of a craftsman, and serves as the basis for analysis.

[0534] "Data processing means" refers to methods and devices for analyzing acquired video data and capturing the movement of an object.

[0535] "Information conversion means" refers to methods and devices for converting analyzed technical information into a format that can be transmitted to future generations.

[0536] A "guide generation means" is a method and apparatus for creating guidelines based on converted technical information so that users can put the technology into practice.

[0537] "Control means" refers to methods and devices that supply operation instructions to a robot control device, enabling the robot to reproduce human movements.

[0538] To realize this invention, a recording device is first required. This recording device acquires the technical actions performed by the craftsman as high-resolution video and generates motion image data. This allows for detailed recording of the craftsman's hand movements and the operation of the tools they use.

[0539] Next, the server receives this video data and performs analysis using data processing means. This analysis uses software libraries such as OpenCV to break down the video data into frames. Then, generative AI models implemented with frameworks such as TensorFlow or PyTorch are used to extract the actions and characteristics of the craftsman and convert them into technical information in a digital format.

[0540] The converted technical information is processed by information conversion means into a format that can be passed down to future generations. This information is then provided to users as guidelines by guide generation means. These guidelines, including specific procedures and the selection of tools to be used, are utilized as educational materials for learning new technologies.

[0541] Furthermore, based on the analysis results, the control system supplies operation instructions to the robot control device. This allows the robots in the factory to replicate the movements of skilled workers and perform standardized, high-quality work.

[0542] A concrete example is the precision welding process. The movements of a skilled welder are filmed by a camera, and these movements are then programmed into a robot to perform the work. As a result of this process, uniform welding quality is achieved, and production efficiency is improved.

[0543] A concrete example of a prompt message for a generative AI model is: "Analyze the welding motion from the video data. Extract the motion characteristics and convert them into motion instructions for a factory robot."

[0544] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0545] Step 1:

[0546] The user uses a recording device to record the high-precision movements of a craftsman. In this process, the camera captures the craftsman's movements as video data, which is then saved to the device. The input is the craftsman's movements, and the output is high-resolution video data.

[0547] Step 2:

[0548] The terminal sends the captured video data to the server. The server supplies the received data to a data processing system, which uses libraries such as OpenCV to decompose the video data into individual frames. The input is video data, and the output is the video data decomposed into frames.

[0549] Step 3:

[0550] The server inputs the decomposed image data into a generative AI model implemented with a framework such as TensorFlow or PyTorch, which then analyzes the characteristics of the craftsman's movements. The goal of the analysis is to extract the core elements of the movements. The input is image data for each frame, and the output is the extracted technical information.

[0551] Step 4:

[0552] The server passes the extracted technical information to an information conversion system, which converts it into a format that can be passed down to future generations. This conversion process formats the technical information into text or guideline format. The input is the technical information, and the output is the converted guideline.

[0553] Step 5:

[0554] The server provides the converted guidelines to the user. The user then uses these guidelines to acquire the necessary skills. The input is the converted guidelines, and the output is the guide information presented to the user.

[0555] Step 6:

[0556] The server further generates the necessary motion instructions for the robot control device and provides them to the robot via the control means. This allows the robot to replicate the movements of a skilled worker. The input is technical information, and the output is control signals.

[0557] Step 7:

[0558] The user sends the results as feedback from their device to the server. The server collects this feedback and uses it to improve the accuracy of the generative AI model and the overall system. The input is user feedback, and the output is information for improving the system.
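The generation of motion instructions in Step 6 can be sketched as a conversion from a recorded tool trajectory into a list of move commands. This is an illustration under stated assumptions: the `MoveInstruction` format, the `to_instructions` function, and the `min_step` filter are hypothetical, and an actual robot control device would define its own instruction format and units.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float, float]  # assumed (x, y, z) tool position


@dataclass(frozen=True)
class MoveInstruction:
    """Hypothetical instruction: move to (x, y, z) at `speed` units per second."""
    x: float
    y: float
    z: float
    speed: float


def to_instructions(
    waypoints: List[Waypoint], frame_interval: float, min_step: float = 1.0
) -> List[MoveInstruction]:
    """Convert a trajectory sampled every `frame_interval` seconds into move instructions.

    Waypoints closer than `min_step` to the last kept position are skipped, so that
    small jitter in the recorded movement does not produce extra instructions.
    """
    if frame_interval <= 0:
        raise ValueError("frame_interval must be positive")
    if not waypoints:
        return []
    instructions: List[MoveInstruction] = []
    last = waypoints[0]
    elapsed = 0.0  # time since the last kept position
    for point in waypoints[1:]:
        elapsed += frame_interval
        distance = math.dist(last, point)
        if distance < min_step:
            continue
        instructions.append(MoveInstruction(*point, speed=distance / elapsed))
        last = point
        elapsed = 0.0
    return instructions


# Hypothetical welding path sampled every 0.5 seconds.
path = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
for instruction in to_instructions(path, frame_interval=0.5):
    print(instruction)
```

The speed attached to each instruction preserves the timing of the skilled worker's movement, which matters in tasks such as welding where travel speed affects the result.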

[0559] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0560] This invention provides a system that, in addition to passing on traditional techniques, enables flexible learning support that takes into account the user's emotional state. This system consists of a generative AI that analyzes video data and an emotion engine that analyzes the user's emotional state.

[0561] Data acquisition and preprocessing

[0562] Users film the technical actions performed by craftsmen and send this video data to the server via their devices. It is recommended that this video data be filmed in high definition from multiple angles. After receiving this data, the server performs pre-processing such as noise reduction and image quality adjustment to prepare it for analysis.

[0563] Motion analysis and conversion to technical information

[0564] The server uses data processing tools to analyze video data and accurately recognize the hand movements and technical actions of the craftsman. The analyzed data is converted into technical information by information conversion tools, and detailed guidelines are created outlining specific procedures, tools to be used, and points to note.

[0565] Analysis of user state using an emotion engine

[0566] The server utilizes audio data and additional video data from the user to analyze the user's emotional state in real time using an emotion engine. This engine identifies the levels of stress and motivation the user experiences during learning and enables appropriate responses.

[0567] Providing guidelines and optimizing the user experience

[0568] The device provides the user with generated guidelines, but their content and order are personalized based on insights from the emotion engine. For example, if the user is feeling anxious, detailed explanations and reminders can be added for steps that require particular attention. If motivation is low, suggestions to start with easier steps will be made.

[0569] Feedback processing and continuous system improvement

[0570] Users send learning results, impressions, and assignments as feedback from their devices to the server. Based on this feedback, the server accumulates information to improve technical information, guidelines, and the emotion engine algorithm, thereby enhancing the user experience in the future.

[0571] For example, when transmitting the techniques of the tea ceremony, the system can detect tension through the user's facial expressions and tone of voice, and then display advice for relaxation and guidance on specific breathing techniques. In this way, the present invention is groundbreaking not only in the transmission of techniques but also in providing a learning experience that takes the user's emotions into consideration.

[0572] The following describes the processing flow.

[0573] Step 1:

[0574] Users will use a high-quality camera to record videos of the craftsman's techniques and save the data to their device. It is recommended to film from multiple viewpoints to capture detailed hand movements.

[0575] Step 2:

[0576] The device uploads the captured video data to the server. During the upload process, the data format is converted to one suitable for analysis, enabling efficient data transfer.

[0577] Step 3:

[0578] The server receives the uploaded video data and performs pre-processing such as noise reduction and brightness adjustment. This optimizes the data for analysis.

[0579] Step 4:

[0580] The server analyzes pre-processed data using a generative AI model to recognize the hand movements and technical actions of the craftsman. This process identifies the flow of movement and key points of action.

[0581] Step 5:

[0582] The server extracts technical information based on recognized operational data and uses information conversion means to transform it into guidelines that can be passed down to future generations. The guidelines include detailed instructions for each step and the intent behind the actions.

[0583] Step 6:

[0584] The terminal displays guidelines provided by the server to the user. The user can then practice the technology while referring to these guidelines.

[0585] Step 7:

[0586] The server uses additional audio and video data provided by the user to activate the emotion engine and analyze the user's emotional state in real time during the learning process.

[0587] Step 8:

[0588] The user's emotional data, analyzed by the emotion engine, is used to adjust the learning content. For example, if the user is feeling stressed, simple steps are emphasized, and if motivation needs to be increased, encouraging messages are added.

[0589] Step 9:

[0590] After putting the technology into practice, users send feedback to the server via their device. This feedback includes the results of the practice, any problems encountered, and changes in their feelings.

[0591] Step 10:

[0592] The server analyzes the feedback and improves the technical information, the guidelines, and the emotion engine algorithms. These improvements are intended to make the user experience better in the future.
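The adjustment of learning content in Step 8 can be sketched as a function that takes the guideline steps and an estimated emotion label and returns the lines to display. This is a minimal sketch: the emotion labels (`"stressed"`, `"anxious"`, `"low_motivation"`), the `Step` fields, and the wording of the added messages are assumptions for illustration, and the emotion estimation itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical guideline step; `difficulty` runs from 1 (easy) to 5 (hard)."""
    title: str
    difficulty: int
    needs_attention: bool = False


def personalize(steps: List[Step], emotion: str) -> List[str]:
    """Return the guidance lines to display for the given estimated emotion."""
    if emotion == "low_motivation":
        # Suggest easier steps first; sorted() is stable, so ties keep their order.
        ordered = sorted(steps, key=lambda s: s.difficulty)
    else:
        ordered = list(steps)

    lines: List[str] = []
    if emotion == "stressed":
        lines.append("Take a slow breath before you begin.")
    for step in ordered:
        line = step.title
        if emotion == "anxious" and step.needs_attention:
            line += " (take extra care: detailed explanation attached)"
        lines.append(line)
    return lines


steps = [Step("Whisk the tea", 3, needs_attention=True), Step("Fold the cloth", 1)]
print(personalize(steps, "low_motivation"))
print(personalize(steps, "anxious"))
```

An unrecognized emotion label leaves the guideline unchanged, so the learner still receives the standard guidance if the emotion engine produces no usable estimate.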

[0593] (Example 2)

[0594] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0595] Conventional technology transfer systems have problems such as inaccurate motion analysis and a lack of personalized learning support that takes into account the user's emotional state. Furthermore, there are challenges in effectively utilizing user feedback and continuously improving the system.

[0596] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0597] In this invention, the server includes information processing means for receiving video data acquired by an imaging device and analyzing the target's actions based on the video data; information conversion means for extracting technical information based on the analyzed action data and converting it into a transferable format; and guide generation means for providing the converted technical information to the user and generating guidance guidelines for practical application. This enables accurate analysis of the technology and personalized learning support that takes into account emotional states.

[0598] An "imaging device" is a device used to acquire video data, and typically includes cameras or the camera function of a smartphone.

[0599] "Reception" refers to the acquisition of data sent from another device or instrument.

[0600] "Video data" refers to moving image data acquired by an imaging device, which is used for motion analysis.

[0601] An "information processing means" is a device that has the function of analyzing video data to recognize the actions of a subject and extracting information about those actions.

[0602] An "information conversion means" is a means for converting technical information into a specific format based on analyzed operational information.

[0603] A "guide generation means" is a means for generating practical guidelines for users based on the converted technical information.

[0604] "Emotion analysis means" refers to a means for analyzing a user's audio and video data to estimate their emotional state in real time.

[0605] "Network data storage" refers to a database that stores relevant technical information and makes it accessible online.

[0606] A "feedback processing mechanism" is a means that receives opinions from users and processes them to help improve the system.

[0607] The embodiments for carrying out the present invention are described below.

[0608] The system of the present invention provides flexible learning support that takes into account the transmission of traditional techniques and the emotional state of the user, through the collaboration of the user, terminal, and server. First, the user acquires video data of the technique to be learned using an imaging device, such as a smartphone or a dedicated camera. This video data is transmitted to the server via the user's terminal.

[0609] The server uses the information processing means to analyze the received video data quickly and accurately. Processing of the video data includes pre-processing such as noise reduction and image quality correction. The generative AI model analyzes the technical movements in the video data in real time, and, based on the results, the information conversion means extracts specific technical information and formats it.

[0610] Furthermore, the server uses emotion analysis to evaluate the user's emotional state in real time based on audio data and additional video data received from the user. Based on this evaluation, the generated technical information is personalized by a guide generation mechanism, providing more effective learning support. The terminal presents personalized guidance to the user at the appropriate time, improving the user's learning experience.

[0611] For example, if emotional analysis reveals that a user is feeling tense during the process of learning the tea ceremony, the server will add instructions on breathing techniques to promote relaxation to the instruction guidelines. This information will then be displayed on the user's device.

[0612] An example of a prompt message could be, "I would like to learn the basic movements of the tea ceremony. I sometimes feel nervous, so please also teach me how to relax." In this way, the present invention efficiently transfers technical knowledge and provides a learning experience that incorporates individualized considerations.
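As a minimal sketch of how such a prompt might be assembled from the detected emotional state, consider the following. The function name `build_prompt` and the label `"tense"` are hypothetical; the specification gives only the example prompt text itself.

```python
def build_prompt(skill, emotion_label=None):
    """Compose a learning prompt, adding a relaxation request when tense.

    `emotion_label` is an assumed output of the emotion analysis step.
    """
    prompt = f"I would like to learn the basic movements of {skill}."
    if emotion_label == "tense":
        prompt += (
            " I sometimes feel nervous, so please also teach me how to relax."
        )
    return prompt


tea_prompt = build_prompt("the tea ceremony", "tense")
```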

[0613] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0614] Step 1:

[0615] The user acquires video footage of the traditional technique to be studied using an imaging device. It is recommended to use a smartphone or dedicated camera to capture high-definition video from multiple angles. The input is the acquired video data, and the output is a video file stored on the user's device.

[0616] Step 2:

[0617] The user sends video data captured using their device to the server. The input is a video file stored on the user's device, and the output is the video data received by the server.

[0618] Step 3:

[0619] The server uses information processing tools to pre-process the received video data, performing tasks such as noise reduction and image quality correction. This prepares the data for high-quality analysis. The input is the video data received by the server, and the output is the pre-processed video data.

[0620] Step 4:

[0621] The server analyzes pre-processed video data using a generative AI model to identify and extract target actions. This yields the results of the technical action analysis. The input is pre-processed video data, and the output is the analyzed action data.

[0622] Step 5:

[0623] The server converts the analyzed operational data into technical information through an information conversion mechanism. This process formats the data into a format that includes specific procedures, tools used, and points to note. The input is the analyzed operational data, and the output is guideline data as technical information.

[0624] Step 6:

[0625] The server analyzes audio and additional video data from the user in real time and uses the emotion analysis means to estimate the user's emotional state. The input is audio and video data from the user, and the output is the analysis result of the user's emotional state.

[0626] Step 7:

[0627] The server personalizes the technical information based on the emotion analysis results and generates optimized guidelines using the guide generation means. The input consists of the guideline data (the technical information) and the results of the emotional state analysis, and the output is the personalized guidelines.

[0628] Step 8:

[0629] The terminal displays personalized guidance received from the server to the user. The input is the guidance received from the server, and the output is the guidance content displayed to the user.

[0630] Step 9:

[0631] Users learn based on the instructional guidelines and send feedback to the server via their device. The input is the user's learning results and feedback, while the output is the feedback information sent to the server.

[0632] Step 10:

[0633] The server uses user feedback to improve the technical information, the guidelines, and the emotion analysis algorithms, optimizing them for future use of the system. The input is user feedback, and the output is improved technical information and system functionality.
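The server-side portion of this flow (Steps 3, 4, 5, and 7) can be sketched as a chain of stages in which each stage's output becomes the next stage's input. Every function below is a stand-in written for illustration: the real pre-processing, generative AI analysis, and emotion analysis are not specified at this level of detail, and the returned action names are placeholder values.

```python
def preprocess(data):
    # Step 3 stand-in: real noise reduction / image correction would go here.
    return {**data, "preprocessed": True}


def analyze_actions(data):
    # Step 4 stand-in for the generative AI model; placeholder output.
    return {**data, "actions": ["scoop", "whisk"]}


def to_guideline(data):
    # Step 5: format the analyzed actions as numbered procedure entries.
    entries = [f"Step {i}: {a}" for i, a in enumerate(data["actions"], 1)]
    return {**data, "guideline": entries}


def personalize(data):
    # Step 7: prepend a relaxation note when the user is assumed to be tense.
    guideline = list(data["guideline"])
    if data.get("emotion") == "tense":
        guideline.insert(0, "Take three slow breaths before starting.")
    return {**data, "guideline": guideline}


PIPELINE = [preprocess, analyze_actions, to_guideline, personalize]


def run_pipeline(video, emotion):
    """Run all stages in order and return the personalized guideline."""
    data = {"video": video, "emotion": emotion}
    for stage in PIPELINE:
        data = stage(data)
    return data["guideline"]
```

Keeping the stages in a list makes the input and output of each step explicit and lets an individual stage be replaced without touching the others.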

[0634] (Application Example 2)

[0635] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0636] In the transmission of traditional techniques, there is a problem in providing instruction optimized for each individual learner. Furthermore, there is a need to provide flexible learning support that takes into account the emotional state of learners, but conventional systems have made it difficult to achieve this. In addition, there is a need for a reliable method to accurately analyze the movements of the techniques themselves and transmit them to future generations as technical information.

[0637] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0638] In this invention, the server includes data processing means for receiving video data acquired by a camera and analyzing the subject's actions based on the video data; information conversion means for extracting technical data based on the analyzed action data and converting it into a format that can be transmitted to future generations; guide generation means for providing the converted technical data to the user and generating guidelines for implementation; emotion analysis means for analyzing the user's emotional state in real time using audio data and additional video data; and guideline optimization means for individualizing the content and order of the guidelines based on the emotional state. This enables personalized instruction according to the learner's emotional state, making it possible to achieve more effective and user-friendly technology transfer.

[0639] A "camera" is a device used to acquire video data and to capture the movement of an object from multiple angles.

[0640] A "data processing means" is a device that has the function of analyzing the movement of an object based on video data received from a camera.

[0641] An "information conversion means" is a device that generates technical documents based on analyzed operational data and converts them into a format that can be transmitted to future generations.

[0642] A "guide generation means" is a device that provides converted technical documents to users and has the function of generating guidelines for implementation.

[0643] "Voice data" refers to data obtained from a user's speech and voice, and is used for analyzing their emotional state.

[0644] An "emotion analysis means" is a device that uses audio data and additional video data to analyze the user's emotional state in real time.

[0645] A "guideline optimization means" is a device that has the function of individualizing the content and order of guidelines based on emotional state.

[0646] The system for realizing this invention is configured as follows: The server first receives video data acquired using a camera. This data is analyzed by a data processing means using OpenCV or the like to analyze the movement of the object. Based on the analysis results, an information conversion means generates technical data and converts it into a format that can be transmitted to future generations.

[0647] Furthermore, the server uses audio data and additional video data to analyze the user's emotional state in real time with a framework such as TensorFlow or PyTorch. Based on the insights obtained from this emotion analysis, the guide generation means provides the converted technical data to the user via the terminal, and the guidelines are personalized according to the user's emotional state. The guideline optimization means takes the user's tension or decreased motivation into account and adjusts the learning order and content accordingly.

[0648] For example, if the system analyzes a user's facial expressions and tone of voice while they are learning the procedures of the Japanese tea ceremony and detects that they are tense, it will display relaxation suggestions and guidance on specific breathing techniques on the screen. In this way, the system can optimize the learning experience according to the user's emotional state.

[0649] When a generative AI model is used, one possible prompt regarding the user's learning progress is: "Analyze how the user is learning the technology, and analyze their emotional state from their facial expressions and voice. Based on this, suggest the optimal learning method."

[0650] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0651] Step 1:

[0652] The server receives video data transmitted from the camera. At this stage, noise reduction and image quality adjustments are performed to prepare the acquired data for proper processing. The input is unprocessed video data, and the output is video data that can be analyzed.

[0653] Step 2:

[0654] The server analyzes the pre-processed video data using OpenCV. This analysis recognizes the movement of the object frame by frame and extracts specific movement information. The input is the analyzable video data obtained in step 1, and the output is detailed information about the movement.

[0655] Step 3:

[0656] The server generates technical documentation based on the analyzed operational information using an information conversion mechanism. This documentation includes text and diagrams and is converted into a format that can be transmitted to future generations. The input is the operational information obtained in step 2, and the output is the technical documentation.

[0657] Step 4:

[0658] The terminal provides the user with generated technical documents. In doing so, it sends audio data and additional video data to a server for real-time analysis of the user's emotional state. The input consists of the technical documents and audio data from the user, while the output is the user's emotional information.

[0659] Step 5:

[0660] The server uses TensorFlow or PyTorch to perform emotion analysis and estimate the user's emotional state. Based on the insights gained, the guide generation means creates personalized guidelines. The input is the user's emotional information, and the output is personalized guidelines.

[0661] Step 6:

[0662] The terminal supports learning by displaying guidance adjusted to the user's emotional state. Specifically, it suggests relaxation when the user is feeling stressed and adds detailed guidance where attention is needed. The input is personalized guidance, and the output is an optimized learning experience.
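The frame-by-frame movement recognition in Step 2 can be illustrated with simple frame differencing. This is only a sketch of one common technique, not the method the specification prescribes: frames are represented as plain nested lists of grayscale values so the example runs without dependencies, and the threshold of 25 is an assumed value. With OpenCV, the per-pixel difference would typically be computed with `cv2.absdiff`.

```python
def motion_score(prev, curr, threshold=25):
    """Fraction of pixels whose intensity changed by more than `threshold`.

    `prev` and `curr` are grayscale frames given as lists of rows (0-255).
    """
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > threshold:
                changed += 1
    return changed / total if total else 0.0


def analyze_motion(frames, threshold=25):
    """Score the motion between each pair of consecutive frames."""
    return [
        motion_score(a, b, threshold) for a, b in zip(frames, frames[1:])
    ]


# Three tiny 2x2 frames: one pixel lights up, then nothing changes.
frames = [
    [[0, 0], [0, 0]],
    [[0, 255], [0, 0]],
    [[0, 255], [0, 0]],
]
scores = analyze_motion(frames)
```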

[0663] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0664] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
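The input to the data generation model 58 described here (a prompt plus optional audio, text, and image inference data) could be represented with a small data structure. The class name `InferenceRequest` and its fields are hypothetical and are not an API of any actual generative AI service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceRequest:
    """Hypothetical container for one request to the data generation model."""

    prompt: str                    # instruction for the model
    audio: Optional[bytes] = None  # audio data representing speech
    text: Optional[str] = None     # text data
    image: Optional[bytes] = None  # image data

    def modalities(self):
        """List which kinds of inference data are attached."""
        return [
            name
            for name in ("audio", "text", "image")
            if getattr(self, name) is not None
        ]


request = InferenceRequest(
    prompt="Summarize the craftsman's procedure.",
    text="The craftsman centers the clay, then raises the wall.",
)
```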

[0665] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0666] [Fourth Embodiment]

[0667] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0668] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0669] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0670] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0671] The microphone 238 accepts instructions from the user 20 by receiving the voice of the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0672] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0673] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0674] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
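How an emotion might be turned into commands for the eye LEDs and arm motors can be sketched as a lookup table. The emotion labels, LED states, joint angles, and the command tuple format below are all assumptions for illustration; the specification does not define a command interface for the controlled object 443.

```python
# Hypothetical presets: emotion label -> actuator settings.
EXPRESSIONS = {
    "joy": {"eye_led": "bright", "arm_angle_deg": 45},
    "calm": {"eye_led": "dim", "arm_angle_deg": 0},
}


def expression_commands(emotion):
    """Return (device, target, value) commands for the given emotion.

    Unknown emotions fall back to the "calm" preset.
    """
    preset = EXPRESSIONS.get(emotion, EXPRESSIONS["calm"])
    return [
        ("led", "eyes", preset["eye_led"]),
        ("motor", "arms", preset["arm_angle_deg"]),
    ]


joy_commands = expression_commands("joy")
```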

[0675] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0676] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0677] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0678] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0679] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0680] This invention provides a system that uses generative AI to pass on traditional techniques to future generations, and offers a method for analyzing the technical movements of craftsmen and saving them as digital data. This system performs a series of processes: acquiring motion image data, analyzing it, converting technical information, providing information to the user, and processing feedback.

[0681] Acquisition of video data

[0682] The user uses a camera to record the actions of a craftsman as they perform their technique, and imports the high-resolution video data into their device. It is crucial that this video data accurately captures the movements of the craftsman's hands and tools.

[0683] Motion analysis and extraction of technical information

[0684] The server receives video data transmitted from the terminal and analyzes the movements using data processing equipment. Using a generative AI model, it analyzes the movements and their subtle characteristics, extracting the core elements of the relevant technology. This allows the traditional gestures and movements of the technology to be preserved as digital data.

[0685] Technical information conversion and guide generation

[0686] The server converts the extracted technical information into a format that can be passed down to future generations using the information conversion means, and generates guidelines to provide to users. In this process, guidelines are created that explain in detail the practical procedures and the selection of tools to be used, based on the analyzed data.

[0687] Feedback processing and system improvement

[0688] Users implement the technology based on the provided guidelines and send feedback about the process and results from their device to the server. The server uses the information collected through the feedback processing mechanism to improve the quality of the technical information and the overall system.

[0689] As a concrete example, consider the preservation of traditional pottery manufacturing procedures. Users meticulously photograph the craftsman's work process, and the server analyzes this data to record the specific movements of shaping and painting as digital data. The generated guidelines are used as teaching materials for new craftsmen, contributing to the standardization and preservation of skills. This makes it possible to pass on techniques that are in danger of disappearing to future generations.

[0690] The following describes the processing flow.

[0691] Step 1:

[0692] Users capture the technical movements of artisans using a high-quality camera and save the video data to their devices. This filming is done from multiple angles, and it is important to capture the details of the technique.

[0693] Step 2:

[0694] The device uploads the captured video data to the server. During this process, the data format and quality must be converted to meet the server's processing requirements.

[0695] Step 3:

[0696] The server receives the uploaded video data and performs preprocessing such as noise reduction and frame rate adjustment. This process prepares the data for analysis.

[0697] Step 4:

[0698] The server inputs the pre-processed data into a generative AI model to analyze the craftsman's movements. This analysis recognizes details such as hand movements, tool usage, and timing of actions.

[0699] Step 5:

[0700] The server extracts the characteristics of the technology based on the analysis results and converts them into technical information in a form that can be passed down to future generations. Using information conversion means, the core elements of the technology are digitized and stored.

[0701] Step 6:

[0702] The server creates guidelines for providing users with the generated technical information. These guidelines include practical procedures, precautions, and information on necessary tools.

[0703] Step 7:

[0704] The device displays the generated guidelines to the user, supporting the practical application of the technology. Users can learn the technology by referring to the provided guidelines.

[0705] Step 8:

[0706] Users practice the technology according to the guidelines and send feedback on their results and areas for improvement to the server via their devices. This feedback is used to improve the technical information.

[0707] Step 9:

[0708] The server analyzes the feedback received and updates technical information and guidelines as needed. This update ensures continuous system improvement.
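The frame rate adjustment mentioned in Step 3 can be sketched as choosing which source frames to keep (or repeat) for a target frame rate. This is one simple nearest-earlier-frame approach under the assumption of integer frame rates; the specification does not state which resampling method is used, and the function name `resample_indices` is hypothetical.

```python
def resample_indices(n_frames, src_fps, dst_fps):
    """Indices of source frames to use for each output frame.

    Uses integer arithmetic only, so results are exact for integer rates.
    Downsampling drops frames; upsampling repeats them.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    n_out = (n_frames * dst_fps) // src_fps
    return [(i * src_fps) // dst_fps for i in range(n_out)]


# One second of 60 fps video reduced to 30 fps keeps every second frame.
kept = resample_indices(60, 60, 30)
```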

[0709] (Example 1)

[0710] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0711] The transmission of traditional techniques is an extremely difficult challenge, especially as the number of skilled artisans declines. In particular, there is a lack of appropriate means to accurately understand the technical movements of artisans and pass them on to the next generation. Traditional methods make it difficult to accurately record and transmit the characteristics and subtle movements of skills, raising concerns about the loss of these techniques. Therefore, a system is needed to analyze artisans' movements in detail and ensure the reliable transmission of skills to future generations.

[0712] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0713] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information using a generative AI model based on the analyzed motion data and converting it into a data format that can be passed on to future generations; and guide generation means for providing the converted technical information to the user and generating guidelines that explain detailed procedures for practice and the tools to be used. This makes it possible to accurately analyze the technical movements of craftsmen and save and convert them as digital data. This contributes to the standardization and transmission of technology and realizes the protection of traditional techniques for the future.

[0714] A "camera" is a device used to capture the technical movements of a craftsman in high resolution, and is used to record the craftsman's skills in detail.

[0715] "Motion image data" refers to video-format data that records the actions and techniques of craftsmen acquired by a camera, and it forms the basis for analysis.

[0716] A "server" is a computer system that manages a series of processes, including processing received video and image data, analyzing and converting technical information, and providing it to the user.

[0717] "Data processing means" refers to technology that analyzes video data within a server to understand the actions of the craftsman who is the subject of the analysis.

[0718] A "generative AI model" is a machine learning algorithm used to analyze information extracted from video and image data and derive detailed technical information.

[0719] "Information conversion means" are methods for converting analyzed technical information into a format that can be passed down to future generations, and they play a role in organizing technical information as digital data.

[0720] A "guide generation means" is a means of constructing guidelines that explain procedures and usage methods in a format that is easy for users to understand, based on the converted technical information.

[0721] "Feedback processing means" refers to devices and methods for improving systems and technical information based on feedback received from users.

[0722] This invention can be implemented by analyzing, preserving, and transmitting the technique through the steps outlined below.

[0723] The user first records the craftsman demonstrating their skills using a high-resolution camera. A portable, high-definition camera like a GoPro is recommended. The recorded video and image data are transferred and saved to the user's device. Laptops and smartphones are used as devices, and file transfer is done via USB cable or Wi-Fi.

[0724] Next, the user uploads this video data to the server via a dedicated web application. The uploaded data is analyzed by the data processing means on the server. The server performs this analysis using a generative AI model built with a framework such as TensorFlow. This AI model analyzes the craftsman's specific actions and tool usage frame by frame, extracting the core elements of the technique as data. For example, shaping techniques and painting methods in pottery are analyzed.

[0725] The analyzed data is stored in a database such as MongoDB using an information transformation mechanism. There, it is formalized and organized as data that can be transmitted. Based on this data, the server uses a guide generation mechanism to create guidelines to provide to the user. These guidelines are generated, for example, in JSON format and are provided in a way that allows the user to visually understand the technology, including detailed procedures, tools to be used, and precautions.
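A guideline in JSON format containing procedures, tools, and precautions might look like the following sketch. The field names (`technique`, `steps`, `tools`, `precautions`) and the sample pottery content are assumptions; the specification states only that JSON is one possible format. A dictionary of this shape could also be stored as a MongoDB document, for example with PyMongo's `insert_one`, though that step is omitted here to keep the example dependency-free.

```python
import json


def build_guideline(technique, steps, tools, precautions):
    """Assemble a guideline document with explicitly ordered steps."""
    return {
        "technique": technique,
        "steps": [
            {"order": i, "instruction": s} for i, s in enumerate(steps, 1)
        ],
        "tools": tools,
        "precautions": precautions,
    }


guideline = build_guideline(
    technique="pottery shaping",
    steps=["Center the clay", "Open the form", "Raise the wall"],
    tools=["potter's wheel", "wooden rib"],
    precautions=["Keep hands wet while shaping"],
)

# Serialize for delivery to the viewer application.
encoded = json.dumps(guideline, ensure_ascii=False, indent=2)
```

Storing the step order explicitly, rather than relying on list position alone, keeps the sequence unambiguous if the steps are later filtered or reordered for a particular learner.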

[0726] Ultimately, users review the generated guidelines on a dedicated viewer application to aid in their technical learning. Any questions or suggestions for improvement during practical application are sent to the server as feedback, and the server uses this information to further refine the technical details and improve the system.

[0727] A concrete example is the preservation of traditional pottery manufacturing techniques. Users meticulously photograph the work procedures of artisans, and the server analyzes this data to record the specific actions involved in shaping and painting pottery as digital data, generating educational materials useful for training the next generation of artisans. An example of a prompt would be, "Describe the traditional pottery shaping procedure in three steps and list the names of the tools used."

[0728] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0729] Step 1:

[0730] The user collects video data using a recording device to document the craftsman's techniques in high resolution. This input data captures the specific movements of the craftsman's hands and tools. The user saves this video data to their device using a USB cable or Wi-Fi. The output here is a high-resolution video data file.

[0731] Step 2:

[0732] Users upload video and image data stored on their devices to the server. The input is video and image files stored on the user's device, and the output is data stored on the server. Users complete the upload by using a web browser and easily dragging and dropping files through a dedicated online portal.

[0733] Step 3:

[0734] The server passes the received video data to the data processing unit. Based on the video data as input, it launches a generative AI model (e.g., using TensorFlow) to perform analysis. At this stage, the server analyzes the artisan's movements in the video frame by frame and extracts motion vector data and features. The output is a set of digital data that shows the structure of the artisan's movements.

[0735] Step 4:

[0736] The server transforms the extracted behavioral feature information using an information transformation mechanism and stores it in a database. The input is the technical feature data resulting from the analysis, and the output is data in a format that can be passed on to future generations. Here, as part of the conversion process, the server uses a database such as MongoDB to structure and store the technical information.

[0737] Step 5:

[0738] The server creates detailed guidelines using guide generation tools based on structured technical information. It references a technical database as input and generates guidelines to be provided to the user as output. The server provides the guidelines in JSON or other formats, including practical procedures and tools to be used.

[0739] Step 6:

[0740] Users receive and review guidelines provided by the server through a viewer application. The input is guideline data from the server, and the output is guide information in a visually easy-to-understand format. Users can then use this information to implement the technology.

[0741] Step 7:

[0742] Users fill out a form with their practical results and feedback on the guidelines, and send it from their device to the server. The input is feedback information based on the user's practical experience, and the output is recorded as feedback data on the server. This allows for system improvements and increased accuracy of technical information.

[0743] (Application Example 1)

[0744] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0745] Conventional technologies have made it difficult to effectively standardize the skills and techniques of experienced craftsmen and for robots to replicate them in production sites such as factories. As a result, the know-how of highly skilled workers has not been widely shared, leading to challenges in achieving sufficient production efficiency and quality stability.

[0746] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0747] In this invention, the server includes data processing means for receiving motion image data acquired by a camera and analyzing the movement of an object based on the motion image data; information conversion means for extracting technical information based on the analyzed motion data and converting it into a format that can be passed on to future generations; guide generation means for providing the converted technical information to a user and generating guidelines for practical application; and control means for supplying motion instructions to a robot control device so that the robot can reproduce human movements. This makes it possible to teach the skills of skilled craftsmen to robots in the factory, thereby improving production efficiency and product quality.

[0748] A "filming device" is a device used to capture the movements of a craftsman in high resolution and record them as video data.

[0749] "Motion image data" refers to digital data that meticulously records the technical movements of a craftsman, and serves as the basis for analysis.

[0750] "Data processing means" refers to methods and devices for analyzing acquired video data and capturing the movement of an object.

[0751] "Information conversion means" refers to methods and devices for converting analyzed technical information into a format that can be transmitted to future generations.

[0752] A "guide generation means" is a method and apparatus for creating guidelines based on converted technical information so that users can put the technology into practice.

[0753] "Control means" refers to methods and devices that supply operation instructions to a robot control device, enabling the robot to reproduce human movements.

[0754] To realize this invention, a recording device is first required. This recording device acquires the technical actions performed by the craftsman as high-resolution video and generates dynamic image data. This allows for detailed recording of the craftsman's hand movements and the operation of the tools they use.

[0755] Next, the server receives this video data and analyzes it using the data processing means. The analysis uses software libraries such as OpenCV to break the video data down into frames. Generative AI models built with frameworks such as TensorFlow or PyTorch are then used to extract the craftsman's actions and their characteristics and convert them into technical information in a digital format.

[0756] The converted technical information is processed by information conversion means into a format that can be passed down to future generations. This information is then provided to users as guidelines by guide generation means. These guidelines, including specific procedures and the selection of tools to be used, are utilized as educational materials for learning new technologies.

[0757] Furthermore, based on the analysis results, the control system supplies operation instructions to the robot control device. This allows the robots in the factory to replicate the movements of skilled workers and perform standardized, high-quality work.

[0758] A concrete example is the precision welding process. The movements of a skilled welder are filmed by a camera, and these movements are then programmed into a robot to perform the work. As a result of this process, uniform welding quality is achieved, and production efficiency is improved.

[0759] A concrete example of a prompt message for a generative AI model is: "Analyze the welding motion from the video data. Extract the motion characteristics and convert them into motion instructions for a factory robot."

[0760] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0761] Step 1:

[0762] The user uses a recording device to record the high-precision movements of a craftsman. In this process, the camera captures the craftsman's movements as video data, which is then saved to the device. The input is the craftsman's movements, and the output is high-resolution video data.

[0763] Step 2:

[0764] The terminal sends the captured video data to the server. The server supplies the received data to a data processing system, which uses libraries such as OpenCV to decompose the video data into individual frames. The input is video data, and the output is the video data decomposed into frames.
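
The frame decomposition can be sketched as a loop over a capture object. To keep the example runnable without OpenCV installed, it is written against any object whose `read()` method returns an `(ok, frame)` pair, which is the interface offered by `cv2.VideoCapture`. The `stride` parameter and the `FakeCapture` stand-in are assumptions added for illustration.

```python
def iter_frames(capture, stride=1):
    """Yield (index, frame) for every `stride`-th frame until the stream ends.

    `capture` is any object whose read() returns (ok, frame).
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break  # end of stream or read failure
        if index % stride == 0:
            yield index, frame
        index += 1


class FakeCapture:
    """Stand-in video source so the sketch runs without OpenCV or a video file."""

    def __init__(self, frames):
        self._frames = iter(frames)

    def read(self):
        frame = next(self._frames, None)
        return (frame is not None, frame)


# Keep every second frame of a five-frame stream.
sampled = list(iter_frames(FakeCapture(["f0", "f1", "f2", "f3", "f4"]), stride=2))
```

With OpenCV available, a `cv2.VideoCapture` opened on the uploaded file could be passed in place of `FakeCapture`, followed by a call to its `release()` method once iteration is finished.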

[0765] Step 3:

[0766] The server inputs the decomposed image data into a generative AI model, implemented with a framework such as TensorFlow or PyTorch, which analyzes the characteristics of the craftsman's movements. The goal of the analysis is to extract the core elements of the movements. The input is image data for each frame, and the output is the extracted technical information.

[0767] Step 4:

[0768] The server passes the extracted technical information to an information conversion system, which converts it into a format that can be passed down to future generations. This conversion process formats the technical information into text or guideline format. The input is the technical information, and the output is the converted guideline.

[0769] Step 5:

[0770] The server provides the converted guidelines to the user, who uses them to acquire the necessary skills. The input is the converted guidelines, and the output is the guide information presented to the user.

[0771] Step 6:

[0772] The server further generates the necessary motion instructions for the robot control device and provides them to the robot via the control means. This allows the robot to replicate the movements of a skilled worker. The input is technical information, and the output is control signals.
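
The conversion from technical information to motion instructions is not detailed in the specification. As one hedged possibility, the sketch below turns timed tool-tip waypoints into linear move commands. The waypoint format, the `move_linear` command name, and the speed clamp are all hypothetical; an actual robot control device would define its own instruction set and safety rules.

```python
import math


def to_motion_instructions(waypoints, max_speed):
    """Convert timed waypoints (t, x, y, z) into linear move commands.

    Speeds above `max_speed` are clamped, a simple assumed safety rule.
    """
    instructions = []
    for (t0, *p0), (t1, *p1) in zip(waypoints, waypoints[1:]):
        duration = t1 - t0
        if duration <= 0:
            raise ValueError("waypoint timestamps must be strictly increasing")
        speed = math.dist(p0, p1) / duration
        instructions.append({
            "command": "move_linear",  # hypothetical command name
            "target": tuple(p1),
            "speed": min(speed, max_speed),
        })
    return instructions


# Tool-tip positions taken from the analysis, as (seconds, x, y, z) in assumed units.
waypoints = [(0.0, 0.0, 0.0, 0.0), (1.0, 3.0, 4.0, 0.0), (1.5, 3.0, 4.0, 12.0)]
instructions = to_motion_instructions(waypoints, max_speed=10.0)
```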

[0773] Step 7:

[0774] The user sends the results as feedback from their device to the server. The server collects this feedback and uses it to improve the accuracy of the generative AI model and the overall system. The input is user feedback, and the output is information for improving the system.

[0775] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0776] This invention provides a system that, in addition to passing on traditional techniques, enables flexible learning support that takes into account the user's emotional state. This system consists of a generative AI that analyzes video data and an emotion engine that analyzes the user's emotional state.

[0777] Data acquisition and preprocessing

[0778] Users film the technical actions performed by craftsmen and send this video data to the server via their devices. It is recommended that this video data be filmed in high definition from multiple angles. After receiving this data, the server performs pre-processing such as noise reduction and image quality adjustment to prepare it for analysis.

[0779] Motion analysis and conversion to technical information

[0780] The server uses data processing tools to analyze video data and accurately recognize the hand movements and technical actions of the craftsman. The analyzed data is converted into technical information by information conversion tools, and detailed guidelines are created outlining specific procedures, tools to be used, and points to note.

[0781] Analysis of user state using an emotion engine

[0782] The server utilizes audio data and additional video data from the user to analyze the user's emotional state in real time using an emotion engine. This engine identifies the levels of stress and motivation the user experiences during learning and enables appropriate responses.

[0783] Providing guidelines and optimizing the user experience

[0784] The device provides the user with generated guidelines, but their content and order are personalized based on insights from the emotion engine. For example, if the user is feeling anxious, detailed explanations and reminders can be added for steps that require particular attention. If motivation is low, suggestions to start with easier steps will be made.

[0785] Feedback processing and continuous system improvement

[0786] Users send learning results, impressions, and assignments as feedback from their devices to the server. Based on this feedback, the server accumulates information to improve technical information, guidelines, and the emotion engine algorithm, thereby enhancing the user experience in the future.

[0787] For example, when transmitting the techniques of the tea ceremony, the system can detect tension through the user's facial expressions and tone of voice, and then display advice for relaxation and guidance on specific breathing techniques. In this way, the present invention is groundbreaking not only in the transmission of techniques but also in providing a learning experience that takes the user's emotions into consideration.

[0788] The following describes the processing flow.

[0789] Step 1:

[0790] Users will use a high-quality camera to record videos of the craftsman's techniques and save the data to their device. It is recommended to film from multiple viewpoints to capture detailed hand movements.

[0791] Step 2:

[0792] The device uploads the captured video data to the server. During the upload process, the data format is converted to one suitable for analysis, enabling efficient data transfer.

[0793] Step 3:

[0794] The server receives the uploaded video data and performs pre-processing such as noise reduction and brightness adjustment. This optimizes the data for analysis.
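
To make the two named pre-processing operations concrete, the sketch below applies a 3x3 box filter (a basic form of noise reduction) and a brightness offset to a small grayscale image held as nested lists. It is a teaching sketch under assumptions, not the method of the specification; a production system would more plausibly rely on an image library.

```python
def adjust_brightness(image, offset):
    """Add `offset` to every pixel, clamping to the 0-255 range."""
    return [[max(0, min(255, pixel + offset)) for pixel in row] for row in image]


def mean_filter(image):
    """Smooth with a 3x3 box filter; border pixels average the neighbours that exist.

    Uses integer (floor) division so the output stays in whole pixel values.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(window) // len(window))
        result.append(row)
    return result


# A single bright speck of noise in an otherwise uniform patch.
noisy = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
smoothed = mean_filter(noisy)
brightened = adjust_brightness(smoothed, 230)
```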

[0795] Step 4:

[0796] The server analyzes pre-processed data using a generative AI model to recognize the hand movements and technical actions of the craftsman. This process identifies the flow of movement and key points of action.

[0797] Step 5:

[0798] The server extracts technical information based on recognized operational data and uses information conversion means to transform it into guidelines that can be passed down to future generations. The guidelines include detailed instructions for each step and the intent behind the actions.

[0799] Step 6:

[0800] The terminal displays guidelines provided by the server to the user. The user can then practice the technology while referring to these guidelines.

[0801] Step 7:

[0802] The server uses additional audio and video data provided by the user to activate the emotion engine and analyze the user's emotional state in real time during the learning process.

[0803] Step 8:

[0804] The user's emotional data, analyzed by the emotion engine, is used to adjust the learning content. For example, if the user is feeling stressed, simple steps are emphasized, and if motivation needs to be increased, encouraging messages are added.
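
The adjustment rule in this step can be sketched as a small function over the guideline steps. The score names, the thresholds of 0.4 and 0.6, the `difficulty` and `needs_care` fields, and the wording of the added note are all assumptions chosen for illustration.

```python
def personalize(steps, emotion):
    """Adjust guideline steps to an estimated emotional state.

    `emotion` holds hypothetical scores in [0, 1] for "stress" and "motivation".
    The input list is left unchanged; adjusted copies are returned.
    """
    adjusted = [dict(step) for step in steps]
    if emotion.get("motivation", 1.0) < 0.4:
        # Low motivation: present easier steps first (the sort is stable).
        adjusted.sort(key=lambda step: step["difficulty"])
    if emotion.get("stress", 0.0) > 0.6:
        # High stress: attach a reminder to steps that need particular attention.
        for step in adjusted:
            if step["needs_care"]:
                step["note"] = "Take a slow breath and review this step before starting."
    return adjusted


steps = [
    {"name": "fold the cloth", "difficulty": 3, "needs_care": True},
    {"name": "warm the bowl", "difficulty": 1, "needs_care": False},
]
tense = personalize(steps, {"stress": 0.8, "motivation": 0.9})
unmotivated = personalize(steps, {"stress": 0.1, "motivation": 0.2})
```

Returning copies rather than editing the stored guideline keeps the canonical technical information intact, so that each learner's view is derived from it rather than overwriting it.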

[0805] Step 9:

[0806] After putting the technology into practice, users send feedback to the server via their device. This feedback includes the results of the practice, any problems encountered, and changes in their feelings.

[0807] Step 10:

[0808] The server analyzes the feedback and improves the technical information, the guidelines, and the emotion engine algorithms. These improvements are intended to make the user experience better in the future.

[0809] (Example 2)

[0810] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0811] Conventional technology transfer systems have problems such as inaccurate motion analysis and a lack of personalized learning support that takes into account the user's emotional state. Furthermore, there are challenges in effectively utilizing user feedback and continuously improving the system.

[0812] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0813] In this invention, the server includes information processing means for receiving video data acquired by an imaging device and analyzing the target's actions based on the video data; information conversion means for extracting technical information based on the analyzed action data and converting it into a transferable format; and guide generation means for providing the converted technical information to the user and generating guidance guidelines for practical application. This enables accurate analysis of the technology and personalized learning support that takes into account emotional states.

[0814] An "imaging device" is a device used to acquire video data, and typically includes cameras or the camera function of a smartphone.

[0815] "Reception" refers to the acquisition of data sent from another device or instrument.

[0816] "Video data" refers to moving image data acquired by an imaging device, which is used for motion analysis.

[0817] An "information processing means" is a device that has the function of analyzing video data to recognize the actions of a subject and extracting information about those actions.

[0818] An "information conversion means" is a means for converting technical information into a specific format based on analyzed operational information.

[0819] A "guide generation means" is a means of generating practical instructional guidelines for users based on converted technical information.

[0820] "Emotion analysis means" refers to a means for analyzing a user's voice and video data to assess their emotional state in real time.

[0821] "Network data storage" refers to a database that stores relevant technical information and makes it accessible online.

[0822] A "feedback processing mechanism" is a means that receives opinions from users and processes them to help improve the system.

[0823] The embodiments for carrying out the present invention are described below.

[0824] The system of the present invention supports the transmission of traditional techniques and provides flexible learning support that takes the user's emotional state into account, through the cooperation of the user, the terminal, and the server. First, the user acquires video data of the technique to be learned using an imaging device, such as a smartphone or a dedicated camera. This video data is transmitted to the server via the user's terminal.

[0825] The server uses the information processing means to analyze the received video data quickly and accurately. Processing of the video data includes pre-processing such as noise reduction and image quality correction. The generative AI model used here analyzes the technical actions in the video data in real time, and, based on the results, the information conversion means extracts and formats specific technical information.

[0826] Furthermore, the server uses emotion analysis to evaluate the user's emotional state in real time based on audio data and additional video data received from the user. Based on this evaluation, the generated technical information is personalized by a guide generation mechanism, providing more effective learning support. The terminal presents personalized guidance to the user at the appropriate time, improving the user's learning experience.

[0827] For example, if emotional analysis reveals that a user is feeling tense during the process of learning the tea ceremony, the server will add instructions on breathing techniques to promote relaxation to the instruction guidelines. This information will then be displayed on the user's device.

[0828] An example of a prompt message could be, "I would like to learn the basic movements of the tea ceremony. I sometimes feel nervous, so please also teach me how to relax." In this way, the present invention efficiently transfers technical knowledge and provides a learning experience that incorporates individualized considerations.

[0829] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0830] Step 1:

[0831] The user acquires video footage of the traditional technique to be studied using an imaging device. It is recommended to use a smartphone or dedicated camera to capture high-definition video from multiple angles. The input is the acquired video data, and the output is a video file stored on the user's device.

[0832] Step 2:

[0833] The user sends video data captured using their device to the server. The input is a video file stored on the user's device, and the output is the video data received by the server.

[0834] Step 3:

[0835] The server uses information processing tools to pre-process the received video data, performing tasks such as noise reduction and image quality correction. This prepares the data for high-quality analysis. The input is the video data received by the server, and the output is the pre-processed video data.

[0836] Step 4:

[0837] The server analyzes pre-processed video data using a generative AI model to identify and extract target actions. This yields the results of the technical action analysis. The input is pre-processed video data, and the output is the analyzed action data.

[0838] Step 5:

[0839] The server converts the analyzed operational data into technical information through an information conversion mechanism. This process formats the data into a format that includes specific procedures, tools used, and points to note. The input is the analyzed operational data, and the output is guideline data as technical information.

[0840] Step 6:

[0841] The server analyzes audio and additional video data from the user in real time and uses the emotion analysis means to assess the user's emotional state. The input is audio and video data from the user, and the output is the analysis result of the user's emotional state.

[0842] Step 7:

[0843] The server personalizes the technical information based on the emotion analysis results and generates optimized instructional guidelines using the guide generation means. The input consists of guideline data as technical information and the results of the emotional state analysis, while the output is personalized instructional guidelines.

[0844] Step 8:

[0845] The terminal displays personalized guidance received from the server to the user. The input is the guidance received from the server, and the output is the guidance content displayed to the user.

[0846] Step 9:

[0847] Users learn based on the instructional guidelines and send feedback to the server via their device. The input is the user's learning results and feedback, while the output is the feedback information sent to the server.

[0848] Step 10:

[0849] The server uses user feedback to improve the technical information, the guidelines, and the emotion analysis algorithms, optimizing them for future system use. The input is user feedback, and the output is improved technical information and system functionality.
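
How feedback feeds into improvement is left open in the specification. One simple possibility, sketched below, is to compute a per-step success rate from feedback entries and flag the steps whose guidance may need revising. The entry format and the 0.7 threshold are assumptions.

```python
from collections import defaultdict


def steps_needing_revision(feedback, min_success_rate=0.7):
    """Return the guideline steps whose reported success rate is below the threshold."""
    totals = defaultdict(lambda: [0, 0])  # step name -> [successes, attempts]
    for entry in feedback:
        counts = totals[entry["step"]]
        if entry["succeeded"]:
            counts[0] += 1
        counts[1] += 1
    return sorted(
        step for step, (successes, attempts) in totals.items()
        if successes / attempts < min_success_rate
    )


# Hypothetical feedback entries collected from learners.
feedback = [
    {"step": "warm the bowl", "succeeded": True},
    {"step": "warm the bowl", "succeeded": True},
    {"step": "whisk the tea", "succeeded": False},
    {"step": "whisk the tea", "succeeded": True},
]
flagged = steps_needing_revision(feedback)
```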

[0850] (Application Example 2)

[0851] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0852] In the transmission of traditional techniques, there is a problem in providing instruction optimized for each individual learner. Furthermore, there is a need to provide flexible learning support that takes into account the emotional state of learners, but conventional systems have made it difficult to achieve this. In addition, there is a need for a reliable method to accurately analyze the movements of the techniques themselves and transmit them to future generations as technical information.

[0853] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0854] In this invention, the server includes data processing means for receiving video data acquired by a camera and analyzing the subject's actions based on the video data; information conversion means for extracting technical data based on the analyzed action data and converting it into a format that can be transmitted to future generations; guide generation means for providing the converted technical data to the user and generating guidelines for implementation; emotion analysis means for analyzing the user's emotional state in real time using audio data and additional video data; and guideline optimization means for individualizing the content and order of the guidelines based on the emotional state. This enables personalized instruction according to the learner's emotional state, making it possible to achieve more effective and user-friendly technology transfer.

[0855] "Photography equipment" refers to devices used to acquire moving image data and to capture the movement of an object from multiple angles.

[0856] A "data processing means" is a device that has the function of analyzing the movement of an object based on video data received from a camera.

[0857] An "information conversion means" is a device that generates technical documents based on analyzed operational data and converts them into a format that can be transmitted to future generations.

[0858] A "guide generation means" is a device that provides converted technical documents to users and has the function of generating guidelines for implementation.

[0859] "Voice data" refers to data obtained from a user's speech and voice, and is used for analyzing their emotional state.

[0860] An "emotion analysis means" is a device that utilizes audio data and additional video data to analyze the user's emotional state in real time.

[0861] A "guideline optimization means" is a device that has the function of individualizing the content and order of guidelines based on emotional state.

[0862] The system for realizing this invention is configured as follows: The server first receives video data acquired using a camera. This data is analyzed by a data processing means using OpenCV or the like to analyze the movement of the object. Based on the analysis results, an information conversion means generates technical data and converts it into a format that can be transmitted to future generations.

[0863] Furthermore, the server utilizes audio data and additional video data to analyze the user's emotional state in real time using frameworks such as TensorFlow and PyTorch. Based on the insights obtained from this emotion analysis, the guide generation means provides the user with the converted technical data via the terminal, personalizing the guidelines according to the user's emotional state. The guideline optimization means takes into account the user's tension and decreased motivation, appropriately adjusting the learning order and content.

[0864] For example, if the system analyzes a user's facial expressions and tone of voice while they are learning the procedures of the Japanese tea ceremony and detects that they are tense, it will display relaxation suggestions and guidance on specific breathing techniques on the screen. In this way, the system can optimize the learning experience according to the user's emotional state.

[0865] One possible example of a prompt message given to the generative AI model regarding the user's learning progress is: "Analyze how the user is learning the technology, and analyze their emotional state from their facial expressions and voice. Based on this, suggest the optimal learning method."

[0866] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0867] Step 1:

[0868] The server receives video data transmitted from the camera. At this stage, noise reduction and image quality adjustments are performed to prepare the acquired data for proper processing. The input is unprocessed video data, and the output is video data that can be analyzed.

[0869] Step 2:

[0870] The server analyzes the pre-processed video data using OpenCV. This analysis recognizes the movement of the object frame by frame and extracts specific movement information. The input is the analyzable video data obtained in step 1, and the output is detailed information about the movement.

[0871] Step 3:

[0872] The server generates technical documentation based on the analyzed operational information using an information conversion mechanism. This documentation includes text and diagrams and is converted into a format that can be transmitted to future generations. The input is the operational information obtained in step 2, and the output is the technical documentation.

[0873] Step 4:

[0874] The terminal provides the user with generated technical documents. In doing so, it sends audio data and additional video data to a server for real-time analysis of the user's emotional state. The input consists of the technical documents and audio data from the user, while the output is the user's emotional information.

[0875] Step 5:

[0876] The server uses TensorFlow or PyTorch to perform emotion analysis and estimate the user's emotional state. Based on the insights gained, the guide generation means creates personalized guidelines. The input is the user's emotional information, and the output is personalized guidelines.

[0877] Step 6:

[0878] The device supports learning by displaying guidance adjusted to the user's emotional state. Specifically, it suggests relaxation when the user is feeling stressed and adds detailed guidance where attention is needed. The input is personalized guidance, and the output is an optimized learning experience.

[0879] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0880] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0881] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0882] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0883] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0884] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0885] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further outward you go on the emotion map 400, the more visible (expressed in actions) the emotions become.

[0886] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
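
The idea that pleasure and discomfort follow from how close various balances are to their ideal can be illustrated with a small scoring function. The linear scoring rule, the tolerance values, and the example readings are assumptions; the specification does not state how the balances are combined.

```python
def valence_from_balances(readings, ideals, tolerances):
    """Return a score in [-1, 1]: positive near the ideal (pleasure), negative far from it.

    Each balance scores 1 at its ideal, 0 at one tolerance away, and -1 at
    two or more tolerances away; the result is the mean over all balances.
    """
    scores = []
    for name, value in readings.items():
        deviation = abs(value - ideals[name]) / tolerances[name]
        scores.append(1.0 - min(deviation, 2.0))
    return sum(scores) / len(scores)


# Hypothetical robot balances: battery level (fraction) and body tilt (degrees).
ideals = {"battery": 0.8, "tilt": 0.0}
tolerances = {"battery": 0.4, "tilt": 5.0}
comfortable = valence_from_balances({"battery": 0.8, "tilt": 0.0}, ideals, tolerances)
mixed = valence_from_balances({"battery": 0.8, "tilt": 10.0}, ideals, tolerances)
```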

[0887] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0888] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
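The two ideas in this paragraph, namely training targets in which neighboring emotions receive similar values and selecting the user's emotion from the resulting emotion values, can be sketched in Python as follows. The map coordinates, the Gaussian weighting, and the function names are hypothetical; the neural network itself is not modeled here, and the disclosure does not state how the similarity between neighboring emotions is enforced.

```python
import math

# Hypothetical 2-D positions of a few emotions on the map (illustrative only).
EMOTION_POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.0, 0.0),
    "confident": (1.0, -0.2),
    "anxious": (-1.0, 0.0),
}


def soft_targets(label: str, width: float = 0.5) -> dict[str, float]:
    """Build training targets in which emotions near `label` on the map
    receive values close to the value of the labelled emotion."""
    cx, cy = EMOTION_POSITIONS[label]
    raw = {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))
        for name, (x, y) in EMOTION_POSITIONS.items()
    }
    total = sum(raw.values())
    # Normalize so the emotion values sum to 1.
    return {name: value / total for name, value in raw.items()}


def identify_emotion(emotion_values: dict[str, float]) -> str:
    """Pick the emotion with the largest value from a set of emotion values."""
    return max(emotion_values, key=emotion_values.get)
```

With these assumed coordinates, the targets for "calm" give "reassured" and "confident" values close to that of "calm", while "anxious" on the opposite side of the map receives a value near zero.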

[0889] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0890] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0891] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0892] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0893] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0894] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0895] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0896] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0897] Furthermore, as the hardware structure of these various processors, more specifically, electrical circuits that combine circuit elements such as semiconductor devices can be used. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as this does not deviate from the main purpose.

[0898] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0899] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0900] The following is further disclosed regarding the embodiments described above.

[0901] (Claim 1)

[0902] A data processing means that receives motion image data acquired by a camera and analyzes the movement of an object based on the motion image data,

[0903] An information conversion means that extracts technical information based on analyzed operational data and converts it into a format that can be passed down to future generations,

[0904] A guide generation means that provides converted technical information to the user and generates guidelines for practical application,

[0905] A system that includes this.

[0906] (Claim 2)

[0907] The system according to claim 1, further comprising a function to compare motion data analyzed from motion image data with an online database and to add relevant technical information.

[0908] (Claim 3)

[0909] The system according to claim 1, further comprising a feedback processing means for receiving user feedback and using it to improve technical information and guidelines.
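The flow recited in this claim set (motion analysis, conversion into a transmissible format, guide generation, and feedback) can be sketched as a chain of small Python functions. This is a minimal sketch under stated assumptions: per-frame action labels stand in for the output of the video analysis, which the disclosure assigns to generative AI, and every type and function name below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MotionSegment:
    """A hypothetical unit of analyzed motion: what was done and for how long."""
    action: str
    duration_s: float


@dataclass
class TechniqueRecord:
    """Technical information in a shareable form (structure is an assumption)."""
    steps: list[str]
    notes: list[str] = field(default_factory=list)


def analyze_motion(frame_labels: list[str], fps: float) -> list[MotionSegment]:
    """Stand-in for the data processing means: merge consecutive frames that
    carry the same action label into one segment."""
    segments: list[MotionSegment] = []
    for label in frame_labels:
        if segments and segments[-1].action == label:
            segments[-1].duration_s += 1.0 / fps
        else:
            segments.append(MotionSegment(label, 1.0 / fps))
    return segments


def convert_to_record(segments: list[MotionSegment]) -> TechniqueRecord:
    """Stand-in for the information conversion means."""
    return TechniqueRecord(
        [f"{s.action} for about {s.duration_s:.1f} s" for s in segments]
    )


def generate_guide(record: TechniqueRecord) -> str:
    """Stand-in for the guide generation means: render numbered steps and notes."""
    lines = [f"Step {i}: {step}" for i, step in enumerate(record.steps, 1)]
    lines += [f"Note: {note}" for note in record.notes]
    return "\n".join(lines)


def apply_feedback(record: TechniqueRecord, feedback: str) -> TechniqueRecord:
    """Stand-in for the feedback processing means: attach user feedback as a note."""
    record.notes.append(feedback)
    return record
```

A caller would pass labeled frames through `analyze_motion`, `convert_to_record`, and `generate_guide` in that order, then call `apply_feedback` and regenerate the guide so that the user's comment appears alongside the steps.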

[0910] "Example 1"

[0911] (Claim 1)

[0912] A data processing means that receives motion image data acquired by a camera and analyzes the movement of an object based on the motion image data,

[0913] An information conversion means that extracts technical information using a generative AI model based on analyzed motion data and converts it into a data format that can be passed down to future generations,

[0914] A guide generation means that provides converted technical information to the user and generates guidelines that explain detailed procedures and tools to be used for practical application,

[0915] A feedback processing means that receives user feedback and uses that feedback to improve technical information and the entire system,

[0916] A system that includes this.

[0917] (Claim 2)

[0918] The system according to claim 1, further comprising a function to compare the analyzed motion data with a database and to add relevant knowledge in order to improve the accuracy of the technical information.

[0919] (Claim 3)

[0920] The system according to claim 1, characterized in that a user uploads video data to a server via a terminal, and the server performs a procedure to analyze the data.

[0921] "Application Example 1"

[0922] (Claim 1)

[0923] A data processing means that receives motion image data acquired by a camera and analyzes the movement of an object based on the motion image data,

[0924] An information conversion means that extracts technical information based on analyzed operational data and converts it into a format that can be passed down to future generations,

[0925] A guide generation means that provides converted technical information to the user and generates guidelines for practical application,

[0926] A control means that supplies motion instructions to a robot control device, allowing the robot to reproduce human movements,

[0927] A system that includes this.

[0928] (Claim 2)

[0929] The system according to claim 1, further comprising a function to compare motion data analyzed from motion image data with an online database and to add relevant technical information.

[0930] (Claim 3)

[0931] The system according to claim 1, further comprising a feedback processing means for receiving user feedback and using it to improve technical information and guidelines.

[0932] "Example 2 of combining an emotion engine"

[0933] (Claim 1)

[0934] Information processing means that receives video data acquired by an imaging device and analyzes the movement of the target based on the video data,

[0935] An information conversion means that extracts technical information based on the analyzed operational data and converts it into a format that can be inherited,

[0936] A guide generation means that provides converted technical information to users and generates guidance guidelines for practical application,

[0937] An emotion analysis means that analyzes the emotional state in real time using the user's voice data and additional video data,

[0938] A means to personalize instructional guidelines based on emotion analysis results and optimize the user's learning experience,

[0939] A system that includes this.

[0940] (Claim 2)

[0941] The system according to claim 1, further comprising a function to compare motion data analyzed from video data with data storage on a network and to add relevant technical information.

[0942] (Claim 3)

[0943] The system according to claim 1, further comprising a feedback processing means for receiving opinions from users and using them to improve technical information and guidance guidelines.
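One way to picture the personalization recited in this claim set is to reorder practice steps according to the emotion reported by the emotion analysis means. The Python sketch below is illustrative only: the emotion labels, the difficulty scale, and the reordering policy are assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class GuideStep:
    title: str
    difficulty: int  # 1 (easy) to 5 (hard); the scale is an assumption


def personalize(steps: list[GuideStep], emotion: str) -> list[GuideStep]:
    """Reorder guide steps for the learner's current emotional state.

    The emotion labels and the policy are illustrative assumptions.
    """
    if emotion in {"anxious", "frustrated"}:
        # Start with easier steps to rebuild confidence.
        return sorted(steps, key=lambda s: s.difficulty)
    if emotion in {"confident", "curious"}:
        # Lead with the more demanding steps.
        return sorted(steps, key=lambda s: -s.difficulty)
    # Any other state keeps the original order.
    return list(steps)
```

In this sketch an anxious learner is shown the easiest steps first, a confident learner the hardest first, and the original list is never modified in place.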

[0944] "Application example 2 when combining with an emotional engine"

[0945] (Claim 1)

[0946] A data processing means that receives video data acquired by a camera and analyzes the movement of a target based on the video data,

[0947] An information conversion means that extracts technical data based on the analyzed operational data and converts it into a format that can be transmitted to future generations,

[0948] A guide generation means that provides converted technical documents to users and generates guidelines for implementation,

[0949] An emotion analysis method that utilizes audio data and additional video data to analyze the user's emotional state in real time,

[0950] A guideline optimization method that personalizes the content and order of guidelines based on emotional state,

[0951] A system that includes this.

[0952] (Claim 2)

[0953] The system according to claim 1, further comprising a function to compare motion data analyzed from motion image data with an online information base and to add relevant technical documentation.

[0954] (Claim 3)

[0955] The system according to claim 1, further comprising evaluation processing means for receiving evaluations from users and contributing to the improvement of technical documents and guidelines.

[Explanation of Symbols]

[0956]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A data processing means that receives motion image data acquired by a camera and analyzes the movement of an object based on the motion image data, An information conversion means that extracts technical information based on analyzed operational data and converts it into a format that can be passed down to future generations, A guide generation means that provides converted technical information to the user and generates guidelines for practical application, A system that includes this.

2. The system according to claim 1, further comprising a function to compare motion data analyzed from motion image data with an online database and to add relevant technical information.

3. The system according to claim 1, further comprising a feedback processing means for receiving user feedback and using it to improve technical information and guidelines.