Information processing system and information processing method
The information processing system facilitates the recognition of three-dimensional writing information through sensor-equipped input devices and machine learning, addressing the challenge of complex trajectories and enhancing user experience.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- ZEBRA
- Filing Date
- 2024-12-19
- Publication Date
- 2026-07-01
Figure 2026109334000001_ABST
Abstract
Description
Technical Field
[0001] The present disclosure relates to an information processing system and an information processing method.
Background Art
[0002] Patent Document 1 discloses a technique for recognizing characters written on a medium such as paper. In the technique described in Patent Document 1, whether a writing instrument is in contact with the medium is detected by a sensor, and characters separated from the trajectory of the movement of the writing instrument are recognized. In the technique described in Patent Document 1, characters are recognized by collating reference characters stored in a database with the extracted characters.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In the technique described in Patent Document 1, in order to recognize characters written on a medium such as paper, it is premised that a writing instrument is in contact with the medium. Therefore, with this technique, it is possible to recognize two-dimensional characters from the trajectory of the movement of the writing instrument while the writing instrument is in contact with the medium. On the other hand, when considering recognizing three-dimensional characters from the trajectory of the movement of the writing instrument in the air, in addition to the trajectory of the movement of the writing instrument in a two-dimensional plane, the trajectory of the movement of the writing instrument in a direction intersecting that plane is included. It is required to extract and recognize three-dimensional characters from a complex trajectory, so it is not easy to realize the recognition of three-dimensional characters.
[0005] An object of the present disclosure is to provide an information processing system and an information processing method capable of easily recognizing three-dimensional writing information generated by a user's writing operation.
Means for Solving the Problem
[0006] An information processing system according to one embodiment of the present disclosure comprises: an input device held by a user and movable in accordance with the user's writing operation; a display device that displays a virtual space corresponding to the user's position in the real space around the user; a sensor positioned on or around the input device that detects the trajectory of the input device accompanying the writing operation; and a processing unit that is communicatively connected to the display device and the sensor and processes writing information generated by the trajectory of the input device detected by the sensor. The processing unit includes: an input unit that acquires three-dimensional writing information generated by a three-dimensional trajectory when the sensor detects the three-dimensional trajectory of the input device accompanying a writing operation to the virtual space; a recognition unit that recognizes the writing information using a recognition model that has been trained to take the writing information as input and output the recognition result of the writing information; and an output unit that outputs recognition result information indicating the recognition result.
[0007] An information processing method according to one form of the present disclosure includes the steps of: displaying a virtual space associated with the position of the user in the real space around the user; detecting the trajectory of an input device accompanying a user's writing operation; and processing writing information generated by the trajectory of the input device, wherein the step of processing writing information includes, when a three-dimensional trajectory of an input device accompanying a writing operation to the virtual space is detected, acquiring three-dimensional writing information generated by the three-dimensional trajectory as writing information; recognizing the writing information using a recognition model that has been trained to take the writing information as input and output the recognition result of the writing information; and outputting recognition result information indicating the recognition result.
[0008] In the above configuration, three-dimensional writing information generated by the three-dimensional trajectory of the input device accompanying a writing operation to the virtual space is acquired as writing information, and the writing information is recognized using a machine learning recognition model that takes the writing information as input and outputs the recognition result of the writing information. By using such a machine learning recognition model, even if the trajectory of the input device accompanying a writing operation to the virtual space is a complex trajectory that changes in three dimensions, the three-dimensional writing information generated by that trajectory can be easily recognized. Therefore, according to the above configuration, it is possible to easily realize the recognition of three-dimensional writing information.
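The flow through the input unit, the recognition unit, and the output unit can be pictured with the following minimal sketch. It is an illustration under assumptions, not the disclosed implementation: the class and function names (`WritingInfo`, `ProcessingUnit`, `dummy_model`) are hypothetical, and the trained recognition model is replaced by a trivial placeholder function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class WritingInfo:
    """Writing information generated by a trajectory of the input device."""
    points: List[Point3D]


@dataclass
class RecognitionResult:
    """Recognition result information (reduced here to a text label)."""
    text: str


# Hypothetical stand-in for the trained recognition model: any callable
# that maps writing information to a recognition result.
RecognitionModel = Callable[[WritingInfo], RecognitionResult]


class ProcessingUnit:
    """Chains the three roles described in the text (names are assumptions)."""

    def __init__(self, model: RecognitionModel) -> None:
        self.model = model

    def acquire(self, trajectory: Iterable[Point3D]) -> WritingInfo:
        # Input unit: turn the detected trajectory into writing information.
        return WritingInfo(points=[tuple(map(float, p)) for p in trajectory])

    def recognize(self, info: WritingInfo) -> RecognitionResult:
        # Recognition unit: delegate to the recognition model.
        return self.model(info)

    def output(self, result: RecognitionResult) -> str:
        # Output unit: emit the recognition result information.
        return result.text

    def process(self, trajectory: Iterable[Point3D]) -> str:
        return self.output(self.recognize(self.acquire(trajectory)))


def dummy_model(info: WritingInfo) -> RecognitionResult:
    # Placeholder only: reports whether the trajectory leaves the z = 0 plane.
    is_3d = any(abs(z) > 1e-9 for _, _, z in info.points)
    return RecognitionResult(text="3D stroke" if is_3d else "2D stroke")
```

In practice `dummy_model` would be replaced by a trained model; the sketch only shows how the three units hand data to one another.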
[0009] The sensor may include a first trajectory detection sensor placed on the input device to detect the trajectory of the input device accompanying a writing operation to the virtual space, and a second trajectory detection sensor placed around the input device to detect the trajectory of the input device accompanying a writing operation on a writing surface in real space. The input device may be a pen-type device that includes the first trajectory detection sensor and is capable of writing on a writing surface in real space. The input unit may: acquire three-dimensional writing information as writing information when the first trajectory detection sensor detects a three-dimensional trajectory of the input device accompanying a writing operation to the virtual space; acquire, as writing information, two-dimensional writing information generated on a virtual writing surface by a two-dimensional trajectory when the first trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying a writing operation on the virtual writing surface set in the virtual space; and acquire, as writing information, two-dimensional writing information generated on a real writing surface by a two-dimensional trajectory when the second trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying a writing operation on the real writing surface existing in real space. In this configuration, the recognition model can easily recognize the writing information regardless of whether the input is three-dimensional writing information written to the virtual space, two-dimensional writing information written on a virtual writing surface in the virtual space, or two-dimensional writing information written on a real writing surface in real space. By increasing the options for user input to the recognition model in this way, user convenience can be improved.
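The three acquisition cases in this paragraph can be summarized as a small dispatch function. The sketch below is hypothetical: the enum names, the `surface_z` plane, and the tolerance `tol` are assumptions made for illustration, since the text does not state how a two-dimensional trajectory on a virtual writing surface is told apart from a three-dimensional one.

```python
from enum import Enum
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


class Sensor(Enum):
    FIRST = "trajectory detection sensor placed on the input device"
    SECOND = "trajectory detection sensor placed around the input device"


class WritingKind(Enum):
    VIRTUAL_3D = "three-dimensional writing information (virtual space)"
    VIRTUAL_SURFACE_2D = "two-dimensional writing information (virtual writing surface)"
    REAL_SURFACE_2D = "two-dimensional writing information (real writing surface)"


def classify_writing(
    sensor: Sensor,
    points: Sequence[Point3D],
    surface_z: float = 0.0,  # assumed height of the virtual writing surface
    tol: float = 0.01,       # assumed tolerance for "on the surface"
) -> WritingKind:
    """Pick which kind of writing information the input unit acquires."""
    if sensor is Sensor.SECOND:
        # The second sensor observes writing on a real writing surface.
        return WritingKind.REAL_SURFACE_2D
    # Assumption: a trajectory counts as two-dimensional when every point
    # stays within `tol` of the plane z = surface_z.
    on_surface = all(abs(z - surface_z) <= tol for _, _, z in points)
    return WritingKind.VIRTUAL_SURFACE_2D if on_surface else WritingKind.VIRTUAL_3D
```

Whatever the actual criterion, all three kinds end up as input to the same recognition model, which is the point the paragraph makes.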
[0010] The recognition result information may include at least one of text information, image information, and 3D model information reflecting the writing information. In this case, the recognition result information can be provided to the user in a format that matches the user's intent. This can improve user satisfaction.
[0011] The sensor may detect, as a physical quantity, at least one of the velocity and acceleration of the input device, which change in conjunction with the writing operation. The processing unit may further include an association unit that associates additional information indicating the physical quantity with the writing information acquired by the input unit. In this case, for example, by using the additional information associated with the writing information as input to the recognition model, it becomes possible to recognize the writing information with greater accuracy based on more information about the writing operation.
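A minimal sketch of such an association step is shown below. It assumes, purely for illustration, that velocity and acceleration are estimated from equally spaced position samples by forward differences rather than read directly from the sensor; the function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def finite_difference(samples: Sequence[Vec3], dt: float) -> List[Vec3]:
    """Forward differences of 3D vectors sampled every `dt` seconds."""
    return [
        tuple((b[i] - a[i]) / dt for i in range(3))
        for a, b in zip(samples, samples[1:])
    ]


def associate(points: Sequence[Vec3], dt: float) -> Dict[str, object]:
    """Attach velocity and acceleration estimates to writing information."""
    velocity = finite_difference(points, dt)
    acceleration = finite_difference(velocity, dt)
    return {
        "points": list(points),
        "additional": {"velocity": velocity, "acceleration": acceleration},
    }
```

A recognition model could then receive both `points` and `additional` as input, for example to distinguish a quick flick from a slow, deliberate stroke that follows the same path.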
[0012] The recognition model may take the writing information and the additional information as input and output, as the recognition result information, converted writing information obtained by converting the writing information into an output format that matches the user's intent identified from the writing information and the additional information. In this case, since the user is provided with recognition result information in an output format that matches the user's intent, user satisfaction can be improved.
[0013] The output unit may output the recognition result information to a display device, thereby displaying the recognition result information in the virtual space. In this case, providing the recognition result information to the user through the virtual space can give the user a high level of immersion in the virtual space.
Effects of the Invention
[0014] According to one form of the present disclosure, it becomes possible to easily recognize three-dimensional writing information generated by a user's writing operation.
Brief Description of the Drawings
[0015]
[Figure 1] Figure 1 is a diagram illustrating a concept of an information processing system according to one embodiment.
[Figure 2] Figure 2 is a schematic configuration diagram of an information processing system.
[Figure 3] Figure 3 is a diagram showing an example of the application of an information processing system.
[Figure 4] Figure 4 is a cross-sectional view showing an example of the configuration of a writing instrument included in the information processing system.
[Figure 5] Figure 5 is a diagram showing an example of a hardware configuration related to the information processing system.
[Figure 6] Figure 6 is a diagram showing an example of a functional configuration related to the information processing system.
[Figure 7] Figure 7 is a diagram showing in more detail the functional configuration related to the response generation unit shown in Figure 6.
[Figure 8] Figure 8 is a diagram showing an example of processing by a response generation model.
[Figure 9] Figure 9 is a diagram showing in more detail the functional configuration related to the recognition unit shown in Figure 6.
[Figure 10] Each of Figures 10(a), 10(b), and 10(c) is a diagram showing an example of writing information input to a recognition model.
[Figure 11] Figure 11 is a diagram showing an example of processing by a recognition model.
[Figure 12] Figure 12 is a diagram showing in more detail the functional configuration related to the analysis unit shown in Figure 6.
[Figure 13] Figure 13 is a diagram showing an example of processing by an analysis model.
[Figure 14] Figure 14 is a flowchart showing an example of the processing content of an information processing method implemented in the information processing system.
[Figure 15] Figure 15 is a flowchart showing an example of the generation process of response information performed by the response generation unit.
[Figure 16] Figure 16 is a flowchart showing an example of the recognition process of writing information performed by the recognition unit.
[Figure 17] Figure 17 is a flowchart showing an example of the recognition process of writing information performed by the recognition unit.
[Figure 18] Figure 18 is a flowchart showing the overall process of analyzing the written information.
[Figure 19] Figure 19 is a flowchart showing the process of recording written information.
[Figure 20] Figure 20 is a flowchart showing the process of retrieving recorded information.
[Figure 21] Figure 21 is a flowchart showing the metadata editing process.
[Figure 22] Figure 22 is a flowchart showing the projection process of written information.
Modes for Carrying Out the Invention
[0016] [Overview of the Information Processing System] The information processing system according to this embodiment is a computer system for communicating with users using virtual objects that operate in a virtual space realized by VR (Virtual Reality), AR (Augmented Reality), MR (Mixed Reality), etc.
[0017] A "user" is a person who uses the information processing system. "Virtual reality" is a technology that provides a virtual space to the user. "Augmented reality" is a technology that overlays a virtual space onto the real space. "Mixed reality" is a technology that, like augmented reality, overlays a virtual space onto the real space, but can provide users with more realistic information than augmented reality, such as 3D objects. A "virtual object" is an object that is displayed in a virtual space and does not exist in the real space. A "virtual object" is, for example, an AI character that can interact with the user. An "AI character" is, for example, modeled after a person, but may also be modeled after an animal or a fictional character.
[0018] Figure 1 is a diagram illustrating the concept of the information processing system 1 according to this embodiment. As shown in Figure 1, the information processing system 1 is capable of seamlessly transferring information to multiple virtual spaces W1 and the real space W2, and processes multiple modal information D10 received from each of the multiple virtual spaces W1. Each of the multiple virtual spaces W1 is a space realized by any one of virtual reality, augmented reality, and mixed reality. The real space W2 is the real space that exists around the user. The coordinate positions of the virtual spaces W1 are associated with the coordinate positions of the real space W2. The multiple modal information D10 is multimodal information that includes multiple different types of information, such as visual information, auditory information, tactile information, olfactory information, gustatory information, and text information.
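One simple way to picture the association between coordinate positions in the real space and in a virtual space is a rigid transform. The sketch below is only an assumption for illustration: the text does not state how the association is made, and the yaw-plus-offset model and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def real_to_virtual(p: Vec3, yaw_rad: float, offset: Vec3) -> Vec3:
    """Rotate a real-space point about the vertical axis, then translate it."""
    x, y, z = p
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x - s * y + offset[0], s * x + c * y + offset[1], z + offset[2])


def virtual_to_real(q: Vec3, yaw_rad: float, offset: Vec3) -> Vec3:
    """Inverse mapping: remove the translation, then undo the rotation."""
    x, y, z = (q[i] - offset[i] for i in range(3))
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x + s * y, -s * x + c * y, z)
```

With such a mapping, a point traced by the pen in the real space can be drawn at the corresponding position in the virtual space, and the inverse mapping recovers the real-space position.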
[0019] "Visual information" is information that includes a two-dimensional or three-dimensional image (still image or moving image, etc.) showing at least one of characters, symbols, and illustrations. "Auditory information" is information that includes voice during conversations with the user. "Tactile information" is information that includes tactile feedback stimuli in interaction with the user. Tactile feedback stimuli may be, for example, vibrations or temperature changes. "Olfactory information" is information that includes smells perceived by the user. "Gustatory information" is information that includes tastes perceived by the user. "Textual information" is information that includes sentences containing one or more strings of characters. "Textual information" may also include metadata in addition to sentences. Multiple modal information D10 may include sensor information and other types of information such as information on the internet. Multiple modal information D10 may be obtained from one virtual space W1 or from each of multiple virtual spaces W1.
[0020] Information processing system 1 has multiple AI models 2 (trained models), each responsible for a specific task or function. In Figure 1, each of the multiple AI models 2 is represented as "AI". Information processing system 1 generates communication information D20 for communicating with the user by processing multiple modal information D10 from the virtual space W1 using the multiple AI models 2. Communication information D20 includes at least one of multiple types of information, such as visual information, auditory information, tactile information, olfactory information, gustatory information, text information, and AI character control information. "AI character control information" is information for controlling the actions and voice of the AI character as a virtual object. Information processing system 1 provides the communication information D20 to the user, for example, through a character displayed in the virtual space W1. Communication information D20 may be multimodal information containing multiple types of information.
[0021] The information processing system 1 can generate appropriate communication information D20 for communicating with the user by simultaneously processing multiple modal information D10 from the virtual space W1 using multiple AI models 2. In other words, the information processing system 1 can accurately grasp the user's intentions by combining multiple input modal information D10 and provide the user with communication information D20 that reflects the user's intentions.
[0022] [Configuration of the Information Processing System] Figure 2 is a schematic diagram of the information processing system 1 according to this embodiment. Figure 3 is a diagram showing an example of the application of the information processing system 1. The information processing system 1 includes, for example, user terminals 10, 10A, 10B (processing units), a writing instrument 20 (input device), a database 30 (recording medium), and a server 40. The user terminals 10, 10A, 10B, the writing instrument 20, the database 30, and the server 40 are connected to each other via a communication network N. The communication network N may include, for example, the internet or an intranet.
[0023] The writing instrument 20 is a pen-type device capable of writing on a writing surface (e.g., a notebook) in real space W2. The writing instrument 20 is held by user U and is movable in conjunction with user U's writing actions. The writing instrument 20 is used for writing, for example, letters, symbols, and illustrations. The writing instrument 20 may be a pen that can write using ink or graphite, such as a ballpoint pen, fountain pen, marker, or mechanical pencil, or it may be a pointing device such as a stylus pen. User U may be a writer who writes using the writing instrument 20.
[0024] The writing instrument 20 is connected to the user terminal 10 by short-range wireless communication. The short-range wireless communication may be a communication method such as Bluetooth® or Wi-Fi®. The communication method between the writing instrument 20 and the user terminal 10 is not limited. In this embodiment, the case in which one writing instrument 20 communicates with the user terminal 10 is illustrated, but the number of writing instruments 20 is not limited. For example, two or more writing instruments 20 may communicate with one user terminal 10.
[0025] User terminal 10 is a computer used by user U. User terminal 10 has the function of presenting a screen representing the virtual space W1 to user U, the function of detecting user U's writing actions using the writing instrument 20, the function of detecting user U's gestures, and the function of detecting user U's spoken voice. The type and configuration of user terminal 10 are not limited. For example, user terminal 10 may be a mobile device such as a high-function mobile phone (smartphone), tablet device, wearable device (e.g., head-mounted display (HMD), smart glasses, or smartwatch), laptop personal computer, or mobile phone. User terminal 10 may also be a stationary terminal such as a desktop personal computer.
[0026] In this embodiment, the case where the user terminal 10 is an HMD is illustrated. In this case, as shown in Figure 3, the user terminal 10 is worn by the user U. Alternatively, the user terminal 10 may be held by the user U or placed around the user U. Here, "around the user U" means within a certain distance from the user U and within a range in which the functions of the user terminal 10 described above can be performed.
[0027] Each of user terminals 10A and 10B is a computer used by a user other than user U, and may have the same configuration as user terminal 10. Each of user terminals 10A and 10B may be connected to a writing instrument (input device) used by another user, similar to user terminal 10. Figure 1 illustrates three user terminals 10, 10A, and 10B, but the number of user terminals is not particularly limited and may be four or more.
[0028] The user terminal 10 may display the AI character as a virtual object in the virtual space W1. The user terminal 10 may also control the AI character so that it can move seamlessly between multiple virtual spaces W1.
[0029] The user terminal 10 may display the trajectory of the writing instrument 20 (for example, the trajectory indicated as "A" in Figure 3) in the virtual space W1 as a result of the user U's writing operation. This allows user U to see the characters, etc., written in the air in the virtual space W1. In other words, user U can write characters, etc., in the virtual space W1 displayed on the user terminal 10 by manipulating the writing instrument 20 in the air. The trajectory of the writing instrument 20 is detected based on observed values detected by the motion sensor 208 (first trajectory detection sensor), which will be described later.
[0030] Database 30 is a non-transitory recording device that records various types of data used by the information processing system 1. In this embodiment, database 30 records, for example, profile information D1, user status information D2, recording information D3, virtual space information D4, and model information D5 (all of which are shown in Figure 6). Database 30 may record these items for each of multiple users U. Database 30 may be constructed as a single database or as a collection of multiple databases. The location of database 30 is not limited. For example, database 30 may be located in a computer system separate from the information processing system 1, or it may be a component of the information processing system 1.
[0031] Server 40 is a virtual server (cloud server) located in a cloud environment. Server 40 relays communication between user terminals 10, 10A, and 10B. Therefore, user terminals 10, 10A, and 10B can synchronize data with Server 40.
[0032] Figure 4 is a schematic diagram showing an example of the configuration of the writing instrument 20. Figure 4 shows a cross-section of the writing instrument 20 when it is cut along a plane in the axial direction L. In this embodiment, the case in which the writing instrument 20 is a ballpoint pen is illustrated. As shown in Figure 4, the writing instrument 20 comprises, for example, a cylindrical portion 201, a refill 203, a substrate 206, a pressure sensor 207, and a motion sensor 208 (first trajectory detection sensor).
[0033] The cylindrical portion 201 is a substantially cylindrical member that extends along the axial direction L of the writing instrument 20. The cylindrical portion 201 has an opening 202 at its tip in the axial direction L. The refill 203 is a cylindrical member filled with ink. The refill 203 has an outer diameter smaller than the inner diameter of the cylindrical portion 201 and is housed inside the cylindrical portion 201. When the refill 203 is housed in the cylindrical portion 201, the tip of the refill 203 (hereinafter referred to as the "pen tip 204") is exposed through the opening 202 of the cylindrical portion 201. The substrate 206 is housed inside the cylindrical portion 201 behind the base end 205 of the refill 203 (i.e., the end located opposite to the pen tip 204 in the axial direction L). When the writer grasps the cylindrical portion 201 and presses the pen tip 204 against the medium, ink from inside the refill 203 seeps out from the pen tip 204. Therefore, the writer can write by moving the writing instrument 20 while pressing the pen tip 204 against the medium.
[0034] The pressure sensor 207 is provided, for example, inside the cylindrical portion 201 between the base end 205 of the refill 203 and the substrate 206. The pressure sensor 207 has a detection surface that detects the pressure that the pen tip 204 receives from the medium when writing with the writing instrument 20 as writing pressure. The pressure sensor 207 detects writing pressure by changing the position of the detection surface in response to the pressure received from the medium. Note that when writing is performed in the air (i.e., writing to a virtual space W1), no writing pressure is generated on the writing instrument 20, so the pressure sensor 207 does not detect writing pressure. When the pressure sensor 207 detects pressure applied to the pen tip 204 by a button or the like, detection by various sensors may be initiated. In other words, detection by the pressure sensor 207 may be treated as an input trigger for the writing instrument 20.
[0035] The motion sensor 208 is provided, for example, on the substrate 206 inside the cylindrical portion 201. The motion sensor 208 detects the movement of the writing instrument 20. For example, the motion sensor 208 is a 9-axis sensor that includes an acceleration sensor that detects acceleration in three mutually orthogonal axis directions, a gyroscope sensor that detects angular velocity in the same three axis directions, and a compass sensor that detects magnetism in the same three axis directions. In this case, when writing is performed with the writing instrument 20, the motion sensor 208 detects observed values of acceleration, angular velocity, and magnetism in each of the three axis directions. As will be described later, the user terminal 10 uses these observed values to detect the trajectory of the writing instrument 20 accompanying the writing operation in the air.
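As a rough illustration of how observed acceleration values could be turned into a trajectory, the sketch below integrates acceleration twice (dead reckoning). This is not presented as the method of the disclosure, which describes trajectory detection later. It assumes that the acceleration has already been rotated into a fixed world frame and gravity-compensated (a step that would typically rely on the gyroscope and magnetism readings), and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def integrate_trajectory(accels: Sequence[Vec3], dt: float) -> List[Vec3]:
    """Semi-implicit Euler integration of acceleration into positions.

    Assumes world-frame, gravity-compensated acceleration sampled every
    `dt` seconds, starting at rest from the origin.
    """
    velocity = [0.0, 0.0, 0.0]
    position = [0.0, 0.0, 0.0]
    trajectory: List[Vec3] = [tuple(position)]
    for a in accels:
        for i in range(3):
            velocity[i] += a[i] * dt         # update velocity first
            position[i] += velocity[i] * dt  # then advance the position
        trajectory.append(tuple(position))
    return trajectory
```

Double integration accumulates sensor noise and bias quickly, so a practical system would likely need some form of drift correction; the sketch omits it for brevity.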
[0036] Instead of the motion sensor 208, the information processing system 1 may be equipped with a camera capable of detecting the trajectory of the writing instrument 20 accompanying a writing operation in the air. In this case, the camera may be installed on the user terminal 10 worn by the user U, or it may be installed around the user U. Here, "around the user U" means within a certain distance from the user U and within a range in which the trajectory of the writing instrument 20 accompanying a writing operation in the air can be detected.
[0037] Figure 5 shows an example of a hardware configuration related to the information processing system 1. Figure 5 shows a terminal computer 100 that functions as a user terminal 10. The terminal computer 100 includes, as hardware components, a processor 101, a main memory unit 102, an auxiliary memory unit 103, a communication unit 104, an input interface 105, and an output interface 106. The processor 101 is an arithmetic unit that executes the operating system and application programs. The processor 101 may be, for example, a CPU or a GPU, but the type of processor 101 is not limited to these.
[0038] The main memory unit 102 is a device that stores programs for realizing the user terminal 10, and calculation results output from the processor 101. The main memory unit 102 is composed of, for example, at least one of ROM and RAM. The auxiliary memory unit 103 is a device that can generally store a larger amount of data than the main memory unit 102. The auxiliary memory unit 103 is composed of, for example, a non-volatile storage medium such as a hard disk or flash memory. The auxiliary memory unit 103 stores the client program P1 for making the terminal computer 100 function as a user terminal 10, and various types of data. The communication unit 104 is a device that performs data communication with other computers via a communication network. The communication unit 104 is composed of, for example, a network card or a wireless communication module.
[0039] The input interface 105 is a device that receives information based on the operation or actions of user U. For example, the input interface 105 includes at least one of a keyboard, operation buttons, a pointing device, a microphone, a sensor, and a camera. The keyboard and operation buttons may be displayed on a touch panel. The data input to the input interface 105 is not limited. For example, the input interface 105 may receive input or selected information by a keyboard, operation buttons, or a pointing device. Alternatively, the input interface 105 may receive audio input by a microphone. Alternatively, the input interface 105 may receive images (e.g., still images or moving images) captured by a camera.
[0040] The output interface 106 is a device that outputs information processed by the terminal computer 100. For example, the output interface 106 includes at least one of a monitor, a touch panel, and a speaker. The monitor and touch panel display the processed information on a display. The speaker outputs the processed audio.
[0041] Each functional configuration of the user terminal 10, as described later, is realized by loading a client program P1, which is an example of the information processing system 1, into the processor 101 or main memory unit 102, and having the processor 101 execute the program. The client program P1 includes code for realizing each functional element of the user terminal 10. The processor 101 operates the communication unit 104, the input interface 105, or the output interface 106 according to the client program P1, and reads and writes data to the main memory unit 102 or the auxiliary memory unit 103. Through this process, each functional element of the user terminal 10 is realized.
[0042] The client program P1 may be provided after being recorded non-transitorily on a tangible recording medium such as a CD-ROM, DVD-ROM, or semiconductor memory. Alternatively, the client program P1 may be provided via a communication network N as a data signal superimposed on a carrier wave.
[0043] As shown in Figure 3, the user terminal 10 is equipped with a camera 5 and a microphone 6 as input interfaces 105. The camera 5 and microphone 6 are mounted, for example, on the main body of the user terminal 10, which is worn by the user U. The camera 5 acquires an image of the real space W2. The camera 5 functions as a sensor (second trajectory detection sensor) that detects the trajectory of the writing instrument 20 accompanying the writing operation of the user U on the real writing surface SU2 (see Figure 10(c)) in the real space W2. In this case, the camera 5 acquires an image that includes the writing information written on the real writing surface SU2, such as a notebook, in the real space W2.
[0044] Furthermore, camera 5 also functions as a sensor that detects visual information, including user U's gestures. Visual information includes user U's body language, such as moving fingers, waving arms, or shaking head. Microphone 6 functions as a sensor that detects auditory information, including voice emitted by user U.
[0045] As described above, the camera 5 and microphone 6 are sensors (second sensors) capable of detecting user U's actions other than writing operations. The user terminal 10 may further include sensors that detect user U's actions other than writing operations, gestures, and speaking.
[0046] Furthermore, the user terminal 10 is equipped with a display 7 and a speaker 8 as output interfaces 106. The display 7 and speaker 8 are, for example, mounted on the main body of the user terminal 10, which is attached to the user U, similar to the camera 5 and microphone 6. The display 7 (display device) is a screen that displays a virtual space W1 and can display communication information D20 in a visual representation in the virtual space W1. The speaker 8 is an audio device that can provide auditory information to the user U and can output communication information D20 in an auditory representation. The user terminal 10 may include a sensor that can detect the vibrations of the user U and may include an actuator that can apply vibrations to the user U. In this case, the user terminal 10 can detect tactile information generated by the vibrations of the user U and can output communication information D20 in a tactile representation. The user terminal 10 may be equipped with a device that can output communication information D20 in an olfactory representation as an output interface 106, or a device that can output communication information D20 in a gustatory representation as an output interface 106. Furthermore, the user terminal 10 may detect changes in the user U's biometric information (i.e., the user U's reaction actions) that occur in conjunction with user U's actions other than writing operations. For example, the user terminal 10 may detect health information such as user U's brain waves, heart rate, skin temperature, or blood pressure as the user U's reaction actions. "Detecting user U's actions other than writing operations" includes not only directly detecting user U's actions other than writing operations, but also detecting changes in user U's biometric information that occur in conjunction with user U's actions other than writing operations as the user U's reaction actions.
[0047] The specific methods for visual, auditory, tactile, olfactory, and gustatory expression are not limited. Visual expression is an expression that can be perceived by human sight and may include, for example, at least one of text, images, videos, and the actions of an AI character. Auditory expression is an expression that can be perceived by human hearing and may include, for example, at least one of voice and sound effects. Tactile expression is an expression that can be perceived by human touch and may include, for example, at least one of vibration and temperature change. Olfactory expression is an expression that can be perceived by human smell and may be expressed by, for example, the type or intensity of a scent. Gustatory expression is an expression that can be perceived by human taste and may be expressed by, for example, a taste such as sweetness or sourness.
[0048] Figure 6 shows an example of a functional configuration related to the information processing system 1. The writing instrument 20 includes a processor 210. The writing instrument 20 also includes an acquisition unit 21 and a communication unit 22 as functional components. These functional components are realized by the processor 210 executing a predetermined program. The acquisition unit 21 acquires, either continuously or intermittently at a predetermined frequency, the observed values of acceleration, angular velocity, and magnetism detected by the motion sensor 208 in conjunction with the user U's writing operation. The communication unit 22 transmits the observed values of acceleration, angular velocity, and magnetism acquired by the acquisition unit 21 to the user terminal 10. As mentioned above, the communication unit 22 can communicate with the user terminal 10 by short-range wireless communication such as Bluetooth®. The acquisition unit 21 and the communication unit 22 may be incorporated into the writing instrument 20 as a simple computer device.
[0049] The user terminal 10 comprises, as a functional configuration, an input unit 11, a display control unit 12, a response generation unit 13, a recognition unit 14, an analysis unit 15, an output unit 16, a feedback unit 17, a recording unit 18, and a communication unit 19. The display control unit 12 displays the virtual space W1 contained in the virtual space information D4 on the display 7. The virtual space information D4 is electronic information that indicates multiple types of virtual spaces W1. Each of the multiple types of virtual spaces W1 includes information indicating the arrangement of objects that constitute the background. The display control unit 12 further displays the AI character set based on the model information D5 on the display 7.
[0050] Model information D5 is electronic information used to define the specifications of an AI character. AI character specifications refer to the arrangements or methods for controlling the AI character. For example, AI character specifications include at least one of the following: attributes, composition, behavior, and voice. AI character attributes are any information set to characterize the AI character, including, for example, personality (virtual personality) and voice quality. AI character composition includes, for example, shape and dimensions.
[0051] The communication unit 19 is communicatively connected to the pressure sensor 207 and motion sensor 208 of the writing instrument 20. Furthermore, the communication unit 19 is communicatively connected to the camera 5, microphone 6, display 7, and speaker 8. The communication unit 19 receives information transmitted from the communication unit 22 of the writing instrument 20, as well as information transmitted from the camera 5 and microphone 6. The information transmitted from the communication unit 22 of the writing instrument 20 includes, for example, observed values of acceleration, angular velocity, and magnetism acquired by the acquisition unit 21 of the writing instrument 20. The information transmitted from the camera 5 includes, for example, writing information including the trajectory of the writing instrument 20 accompanying the writing action of user U on the writing surface in real space W2, and visual information including gestures of user U detected by the camera 5. The information transmitted from the microphone 6 includes, for example, auditory information including sounds made by user U.
[0052] [Generating response information] In the examples shown in Figures 7 and 8, the user terminal 10 generates response information D21 for user U as communication information D20 using the response generation model M1. Details of the response generation model M1 and response information D21 will be described later. Figure 7 is a diagram showing the functional configuration related to the response generation unit 13 in more detail. Figure 8 is a diagram showing an example of processing by the response generation model M1.
[0053] As shown in Figure 7, the input unit 11 includes a calculation unit 111, a first acquisition unit 112, a second acquisition unit 113, an input format determination unit 114, an input format integration unit 115, a write information determination unit 116, and a relationship determination unit 117.
[0054] The calculation unit 111 performs calculations on the observed values of acceleration, angular velocity, and magnetism acquired by the acquisition unit 21 of the writing instrument 20. The calculation unit 111 determines, for example, the position (trajectory) of the pen tip 204. For example, the calculation unit 111 determines the position (trajectory) of the pen tip 204 as follows: First, the calculation unit 111 determines the position of the motion sensor 208 by integral calculation of at least one of the acceleration and angular velocity. Here, since the position of the pen tip 204 relative to the motion sensor 208 is constant, the calculation unit 111 estimates the position of the pen tip 204 from the position of the motion sensor 208.
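As a non-limiting illustration of the calculation in paragraph [0054], the following Python sketch integrates acceleration twice to obtain the position of the motion sensor 208 and then adds a constant offset to estimate the position of the pen tip 204. The function name, the fixed sampling interval `dt`, the assumption that the acceleration is already gravity-compensated and expressed in a fixed coordinate frame, and the assumption that the orientation of the writing instrument 20 does not change are simplifications introduced here and are not taken from the embodiment. An actual implementation would likely need to rotate the offset according to an orientation estimated from the angular velocity and magnetism.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def estimate_pen_tip_trajectory(
    accel: Sequence[Vec3], dt: float, tip_offset: Vec3
) -> List[Vec3]:
    """Estimate pen tip positions from acceleration samples (hypothetical helper).

    accel      : gravity-compensated acceleration samples in a fixed frame
    dt         : sampling interval in seconds (assumed constant)
    tip_offset : constant vector from the motion sensor to the pen tip
    """
    velocity = [0.0, 0.0, 0.0]  # assumed to start at rest
    position = [0.0, 0.0, 0.0]  # assumed to start at the origin
    trajectory: List[Vec3] = []
    for sample in accel:
        for axis in range(3):
            # First integration: acceleration -> velocity.
            velocity[axis] += sample[axis] * dt
            # Second integration: velocity -> sensor position.
            position[axis] += velocity[axis] * dt
        # The pen tip is at a constant offset from the sensor.
        tip = (
            position[0] + tip_offset[0],
            position[1] + tip_offset[1],
            position[2] + tip_offset[2],
        )
        trajectory.append(tip)
    return trajectory
```

Simple rectangular integration of this kind accumulates drift over time, so the sketch is only meant to show the order of the two steps (sensor position first, pen tip position second).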
[0055] The first acquisition unit 112 acquires the trajectory of the writing instrument 20 detected by the motion sensor 208 (i.e., the trajectory determined by the calculation unit 111) as written information D1A (first input information) written to the virtual space W1. The second acquisition unit 113 acquires the trajectory of the writing instrument 20 detected by the camera 5 as written information D1B (first input information) written to the writing surface in the real space W2. In other words, the input unit 11 can acquire both the written information D1A written to the virtual space W1 and the written information D1B written to the real writing surface SU2 in the real space W2. Hereafter, when the written information D1A and D1B are not distinguished, they will be referred to as written information D11.
[0056] In the information processing system 1, the motion sensor 208 detects the three-dimensional trajectory of the writing instrument 20 associated with the user U's writing operation on the virtual space W1. The user U's writing operation on the virtual space W1 is not limited to a three-dimensional writing operation, but may also be a writing operation on a virtual writing surface SU1 (see Figure 10(b)) set in the virtual space W1. The motion sensor 208 detects the two-dimensional trajectory of the writing instrument 20 associated with the user U's writing operation on the virtual writing surface SU1. Furthermore, the camera 5 detects the two-dimensional trajectory of the writing instrument 20 associated with the user U's writing operation on a real writing surface SU2 (see Figure 10(c)) existing in the real space W2. In this way, the information processing system 1 can detect the trajectory of the writing instrument 20 associated with various writing operations by the user U.
[0057] As shown in Figure 8, the input unit 11 acquires user operation information D12 (second input information) in addition to the written information D11 described above. User operation information D12 is generated by user U's actions other than writing operations, which are detected by at least one of the camera 5 and the microphone 6. User operation information D12 includes at least one of visual information including gestures of user U detected by the camera 5, auditory information including sounds emitted by user U, and tactile information including tactile feedback stimuli to user U. As shown in Figure 8, the written information D11 and user operation information D12 are input to the response generation model M1 as multiple pieces of modal information D10.
[0058] The input unit 11 further acquires profile information D1 and user status information D2 recorded in the database 30. Profile information D1 includes, for example, at least one of user U's personal information, behavioral history, and preference information. User status information D2 includes, for example, at least one of user U's cognitive ability, user U's learning style, user U's environment, and the spatial situation of the virtual space W1. Cognitive ability is user U's ability to recognize information, for example, "sleepy and low cognitive ability." Learning style indicates how user U is learning, for example, "studying at a cram school" or "studying in a school classroom." Environment is information indicating the space user U is in, for example, "at school" or "at home." Spatial situation of virtual space W1 is information indicating the virtual space W1 displayed on the user terminal 10 (the theme of the space user U is immersed in), for example, "English-speaking country" or "tropical country." User status information D2 may be directly input to the user terminal 10 by user U, or it may be acquired in real time using detection results from sensors such as the camera 5.
[0059] The input format determination unit 114 shown in Figure 7 determines the input format of the write information D11 and the user operation information D12. The input format determination unit 114 determines the input format of the write information D11 and the user operation information D12 by determining which sensor (in this embodiment, the camera 5, the microphone 6, or the motion sensor 208) the information was acquired from. For example, the input format determination unit 114 pre-records the correspondence between a certain sensor and the input format of the information from that sensor, and determines the input format based on that correspondence.
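The pre-recorded correspondence described in paragraph [0059] can be pictured as a simple lookup table. The following Python sketch is only an illustration: the sensor identifiers and the input format labels are hypothetical names introduced here, since the embodiment does not specify them.

```python
# Hypothetical correspondence between each sensor and the input format
# of the information it supplies (labels are assumptions for illustration).
SENSOR_TO_INPUT_FORMAT = {
    "motion_sensor_208": "trajectory",
    "camera_5": "image",
    "microphone_6": "audio",
}


def determine_input_format(sensor_id: str) -> str:
    """Return the input format recorded for the sensor the data came from."""
    try:
        return SENSOR_TO_INPUT_FORMAT[sensor_id]
    except KeyError:
        raise ValueError(f"no input format recorded for sensor {sensor_id!r}")
```

For example, `determine_input_format("camera_5")` returns `"image"`, and an unregistered sensor raises `ValueError` rather than being assigned a guessed format.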
[0060] The input format integration unit 115 shown in Figure 7 integrates the respective input formats of the write information D11 and user operation information D12, as determined by the input format determination unit 114, into a common format for input to the response generation model M1. For example, the input format integration unit 115 integrates the respective input formats of the write information D11 and user operation information D12 into an intermediate language format suitable for input to the response generation model M1. In other words, the input format integration unit 115 converts the write information D11 and user operation information D12 into an intermediate language having an input format that the response generation model M1 can process.
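The embodiment does not define the intermediate language of paragraph [0060]. Purely as an assumed example, the following Python sketch wraps each piece of information in a common record and serializes the records as JSON, so that information in different input formats reaches the model in one uniform representation. The record fields and the choice of JSON are assumptions made here for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List


@dataclass
class ModalRecord:
    """Hypothetical common record for one piece of modal information D10."""

    kind: str          # "written" (D11) or "user_operation" (D12)
    input_format: str  # format determined for the source sensor
    content: Any       # payload, assumed to be JSON-serializable


def integrate_input_formats(records: List[ModalRecord]) -> str:
    """Convert records with different input formats into one common format."""
    return json.dumps([asdict(record) for record in records], sort_keys=True)
```

A call such as `integrate_input_formats([ModalRecord("written", "trajectory", [[0, 0, 0], [1, 0, 0]])])` yields a single JSON string regardless of which sensor produced the content.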
[0061] The recording unit 18 shown in Figure 6 records the write information D11 and user operation information D12 acquired by the input unit 11 in the database 30. The response generation model M1, which will be described next, generates response information D21 using the current write information D11 and user operation information D12 input to the input unit 11 and the past write information D11 and user operation information D12 recorded in the database 30. In the following description, the current write information D11 and the past write information D11 will be collectively referred to as write information D11, and the current user operation information D12 and the past user operation information D12 will be collectively referred to as user operation information D12. The response generation model M1 does not necessarily have to use the past write information D11 and user operation information D12.
[0062] The response generation unit 13 shown in Figure 7 generates various types of information using the response generation model M1 shown in Figure 8. As shown in Figure 8, the response generation model M1 is an AI model that has been trained to take written information D11, user operation information D12, profile information D1, and user status information D2 as inputs and output response information D21. The response generation unit 13 comprises an analysis unit 131, an AI character processing unit 132, and a response information generation unit 133.
[0063] The analysis unit 131 uses the response generation model M1 to output the result of analyzing at least one of the following: the emotional state of user U, the context of user U's dialogue, and the intent of user U's dialogue. In this case, the response generation model M1 is also an AI model that has been trained to take written information D11 and user operation information D12 as input and output analysis results of the emotional state, the context of the dialogue, and the intent of the dialogue. The analysis unit 131 comprises a context analysis unit 134, an emotion analysis unit 135, and an intent recognition unit 136.
[0064] The context analysis unit 134 inputs the written information D11 and user operation information D12 to the response generation model M1 and analyzes the context of the dialogue content. The context analysis unit 134 outputs analysis results such as the genre of the dialogue content or the atmosphere of the dialogue.
[0065] The emotion analysis unit 135 inputs the written information D11 and user operation information D12 into the response generation model M1 to analyze the emotional state. The emotion analysis unit 135 outputs an analysis result indicating, for example, that user U is impatient.
[0066] The intent recognition unit 136 inputs the written information D11 and user operation information D12 to the response generation model M1 and analyzes the intent of the dialogue. The intent recognition unit 136 outputs an analysis result indicating, for example, what user U wants to do.
[0067] The context analysis unit 134, the emotion analysis unit 135, and the intent recognition unit 136 may use each other's analysis results as input to the response generation model M1, in addition to the written information D11 and the user operation information D12. For example, the emotion analysis unit 135 may use the analysis result of the context analysis unit 134 as input to the response generation model M1. The intent recognition unit 136 may use the analysis results of the context analysis unit 134 and the analysis results of the emotion analysis unit 135 as input to the response generation model M1.
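The chaining of analysis results in paragraph [0067] can be sketched as follows. In this Python illustration the response generation model M1 is replaced by a hypothetical callable that takes a task name and a dictionary of inputs. The interface, the dictionary keys, and the toy model are assumptions introduced here; they only show which results are passed to which analysis.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the response generation model M1.
Model = Callable[[str, Dict[str, str]], str]


def analyze(model: Model, written_d11: str, operation_d12: str) -> Dict[str, str]:
    """Run context, emotion, and intent analysis, reusing earlier results."""
    base = {"D11": written_d11, "D12": operation_d12}
    # Context analysis uses only D11 and D12.
    context = model("context", dict(base))
    # Emotion analysis additionally receives the context result.
    emotion = model("emotion", {**base, "context": context})
    # Intent recognition receives both earlier results.
    intent = model("intent", {**base, "context": context, "emotion": emotion})
    return {"context": context, "emotion": emotion, "intent": intent}


def toy_model(task: str, inputs: Dict[str, str]) -> str:
    """Toy model that only reports which inputs it was given."""
    return task + ":" + ",".join(sorted(inputs))
```

Calling `analyze(toy_model, "text", "gesture")` makes the data flow visible: the intent result lists `context` and `emotion` among its inputs, while the context result lists only `D11` and `D12`.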
[0068] The AI character processing unit 132 selects or adjusts the personality of the AI character (hereinafter referred to as "AI personality") using the response generation model M1. In this case, the response generation model M1 is also an AI model that has been trained to select or adjust the AI personality by taking profile information D1, written information D11, and user operation information D12 as input. The AI personality is set to enrich communication between the AI character and the user U, and may be, for example, friendly, kind, serious, sociable, or polite. The AI character processing unit 132 includes an AI personality selection unit 137 and an AI personality adjustment unit 138.
[0069] The AI personality selection unit 137 inputs profile information D1 into the response generation model M1 and selects (sets) an AI personality suitable for profile information D1. For example, the response generation model M1 selects, from among the multiple AI personalities included in the model information D5 recorded in the database 30, one or more AI personalities suitable for profile information D1 and sets them for the AI character. In this case, the recording unit 18 pre-records model information D5, which includes multiple AI personalities that can be set for an AI character, in the database 30.
[0070] The AI personality adjustment unit 138 inputs the written information D11 and user operation information D12 into the response generation model M1 to adjust the AI personality selected by the AI personality selection unit 137. The AI personality adjustment unit 138 may also input the analysis results generated by at least one of the context analysis unit 134, the emotion analysis unit 135, and the intent recognition unit 136 into the response generation model M1 to adjust (determine) the AI personality. The AI personality adjustment unit 138 may also input user status information D2 into the response generation model M1 to adjust the AI personality. For example, if the user status information D2 includes "English-speaking country" as the spatial situation of the virtual space W1, the AI personality adjustment unit 138 adjusts the AI personality to an English-speaking personality.
[0071] The response information generation unit 133 generates response information D21 using the response generation model M1. The response generation model M1 is also an AI model that has been trained to take written information D11, user operation information D12, and user status information D2 as inputs and output response information D21 that is in line with the AI personality of the AI character. In addition to written information D11, user operation information D12, and user status information D2, the response information generation unit 133 may also input the analysis results generated by at least one of the context analysis unit 134, the emotion analysis unit 135, and the intent recognition unit 136 into the response generation model M1 to generate response information D21.
[0072] Response information D21 is a response to user U based on the written information D11 and user operation information D12 from user U, and is expressed in a manner consistent with the AI character's personality. Response information D21 is provided to user U via the AI character. Response information D21 includes, for example, answers to inputs such as questions or requests from user U. For example, if user U asks "What will the weather be like tomorrow?" using the writing instrument 20, and the AI personality is polite, response information D21 will be expressed in a polite manner such as, "The weather tomorrow will be sunny. The temperature will be 25°C, and the probability of precipitation is 30%."
[0073] Response information D21 includes, for example, suggestions regarding the emotional state of user U, the intent of the dialogue, or the context of the dialogue, as analyzed by the analysis unit 131. For example, if user U's emotional state is anxiety, response information D21 may be various gestures, facial expressions, sentences, or voices that convey the message "Calm down" to user U, or it may be practical advice on how to calm down. For example, if the context of the dialogue is movies, response information D21 may be various pieces of information such as information about the latest movies, anecdotes about user U's favorite movies, information about actors or voice actors, or jokes that use that information.
[0074] Response information D21 may include learning support information that proposes a learning schedule to user U to achieve the goals that user U should accomplish based on profile information D1. For example, if profile information D1 includes information that user U wants to work hard at learning English, the learning support information may include an English learning schedule necessary to pass an English qualification exam. In this case, the response information generation unit 133 may input profile information D1 into the response generation model M1 and output response information D21 including the learning schedule.
[0075] The AI personality adjustment unit 138 may adjust the AI personality after the response information generation unit 133 has generated the response information D21. In this case, the response information generation unit 133 outputs response information D21 that is in line with the AI personality before adjustment (the initial AI personality set by the AI personality selection unit 137). The response information generation unit 133 may then output response information D21 that is in line with the adjusted AI personality at the next time it outputs response information D21. The AI personality adjustment unit 138 may also adjust the AI personality before the response information generation unit 133 generates the response information D21. In this case, the response information generation unit 133 outputs response information D21 that is in line with the adjusted AI personality.
[0076] The output unit 16 shown in Figure 7 provides response information D21 to user U in real time through an AI character with an AI personality. As shown in Figure 7, the output unit 16 includes an output format selection unit 161, a first information provision unit 162, a second information provision unit 163, and an output information selection unit 164.
[0077] The output format selection unit 161 uses the written information D11 and user operation information D12 to select the output format of the response information D21 from a plurality of output formats defined by at least one of visual, auditory, tactile, olfactory, and gustatory expressions. The output format selection unit 161 estimates the output format desired by user U from the written information D11 and user operation information D12. The output format selection unit 161 may, for example, select the output format of the response information D21 using an AI model that has been trained to select the output format of the response information D21 with the written information D11 and user operation information D12 as input. This AI model may be a response generation model M1.
[0078] The first information provision unit 162 and the second information provision unit 163 provide response information D21 to user U in the output format selected by the output format selection unit 161. The first information provision unit 162 provides response information D21 to user U through an AI character existing in the virtual space W1. The second information provision unit 163 provides response information D21 (communication information D20) to user U through an output device existing in the real space W2. The output device may be, for example, a display, a speaker, or a vibration device. The output device may be located on the user terminal 10, on the writing instrument 20, or in a location other than the user terminal 10 and the writing instrument 20.
[0079] For example, if the output format is a visual representation, the first information providing unit 162 provides the response information D21 in the form of a visual representation to the user U through the AI character displayed on the display 7. In this case, the second information providing unit 163 provides the response information D21 in visual representation to the user U through an output device such as a display in the real space W2. For example, if the output format is an auditory or tactile representation, the first information providing unit 162 provides the response information D21 in auditory or tactile representation to the user U through the voice of the AI character from the speaker 8 or vibrations from a vibration device (actuator). In this case, the second information providing unit 163 provides the response information D21 in auditory or tactile representation to the user U through an output device such as a speaker or vibration device.
[0080] The feedback unit 17 analyzes the user U's response and feeds the analysis results back to the profile information D1. The feedback unit 17 includes a feedback analysis unit 171 and a profile update unit 172.
[0081] The feedback analysis unit 171 analyzes the trend of user U's response to response information D21 by statistically processing the written information D11 and user operation information D12 acquired by the input unit 11 as user U's response to the response information D21 provided to user U. For example, the feedback analysis unit 171 outputs an analysis result indicating that user U has not shown interest in the response information D21.
[0082] The profile update unit 172 updates the profile information D1 recorded in the database 30 based on the analysis results of the feedback analysis unit 171. In other words, the profile update unit 172 updates the profile information D1 based on the written information D11 and user operation information D12 acquired in response to the response information D21. For example, if user U has not shown interest in the response information D21, the profile update unit 172 updates the preference information included in the profile information D1.
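The embodiment does not specify the statistical processing or the update rule of paragraphs [0081] and [0082]. As one assumed example, the following Python sketch computes, for each topic, the fraction of responses in which user U showed interest, and then adds or removes the topic from the preference information according to a threshold. The reaction format, the `"preferences"` key, and the threshold value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def analyze_feedback(reactions: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of interested reactions per topic.

    reactions: (topic, interested) pairs, assumed to be derived from
    D11 and D12 acquired after response information D21 was provided.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [interested, total]
    for topic, interested in reactions:
        counts[topic][0] += int(interested)
        counts[topic][1] += 1
    return {topic: pos / total for topic, (pos, total) in counts.items()}


def update_profile(profile: dict, interest: Dict[str, float],
                   threshold: float = 0.5) -> dict:
    """Return a copy of the profile with updated preference information."""
    preferences = set(profile.get("preferences", []))
    for topic, rate in interest.items():
        if rate >= threshold:
            preferences.add(topic)      # user U tends to show interest
        else:
            preferences.discard(topic)  # user U tends not to show interest
    updated = dict(profile)
    updated["preferences"] = sorted(preferences)
    return updated
```

The sketch returns a new profile instead of modifying the stored one, which keeps the previous profile information D1 available until the update is recorded.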
[0083] The input unit 11 retrieves the profile information D1, which has been updated by the profile update unit 172, from the database 30. The AI personality selection unit 137 inputs the updated profile information D1 into the response generation model M1 and selects (sets) an AI personality suitable for the updated profile information D1. As a result, the response information generation unit 133 outputs response information D21 that is in line with the AI personality suitable for the updated profile information D1.
[0084] As explained above, in the information processing system 1, the user terminal 10 generates response information D21 for user U using the response generation model M1. In the examples of Figures 7 and 8, the response information generation unit 133 outputs response information D21 that is in line with the AI personality of the AI character. However, the response information generation unit 133 may output response information D21 without being in line with the AI personality of the AI character (without selecting an AI personality in the AI personality selection unit 137). The response generation model M1 may output information other than response information D21 as communication information D20. The response generation model M1 may output information that can be provided to user U without going through the AI character as communication information D20. Information that can be provided to user U without going through the AI character includes, for example, a summary of the written information D11, information such as equipment malfunctions, and various types of information such as disaster information from specific external sources.
[0085] The response information generation unit 133 may input only the write information D11 and user operation information D12 to the response generation model M1 to generate the response information D21. In this case, processing by the AI character processing unit 132 may be omitted, and the display control unit 12 may not display the AI character in the virtual space W1. The output unit 16 may provide the response information D21 to the user U without going through the AI character.
[0086] In the examples of Figures 7 and 8, user terminal 10 processed multiple pieces of modal information D10 from a single user U, but user terminal 10 may also process multiple pieces of modal information from a user other than user U. The other user is, for example, a user of user terminals 10A, 10B (see Figure 2). In this case, input unit 11 may acquire multiple pieces of modal information D10 from user U (first user input information) and multiple pieces of modal information from the other user (second user input information). Response information generation unit 133 may input the first user input information and the second user input information into response generation model M1 to generate response information that facilitates communication between user U and the other user. Output unit 16 may provide the response information to at least one of user U and the other user through an AI character. The response information may be, for example, the intent of the input information from user U (or the other user), or auditory information to soothe at least one of user U and the other user. Furthermore, the AI character may also perform the functions of a true facilitator, such as presenting the initial agenda or timeline for the entire meeting, explaining the meeting rules, managing the progress of the meeting, soliciting opinions from each user as needed, simplifying the content of those opinions and presenting it to other users in an easy-to-understand way, or creating summaries of the content up to that point in text or diagrams and showing them to other users. If a certain path to a conclusion is required, the AI character may also guide the members in that direction in a natural way, summarize the final conclusion or action, confirm it with the members, save it as meeting minutes in a database, and report it to the supervisor, thus managing the entire process.
[0087] [Recognition of written information] In the examples of Figures 9, 10, and 11, the user terminal 10 recognizes the write information D11 using the recognition model M2. Details of the recognition model M2 will be described later. Figure 9 is a diagram showing the functional configuration related to the recognition unit 14 in more detail. Figures 10(a), 10(b), and 10(c) are diagrams showing examples of write information input to the recognition model M2. Figure 11 is a diagram showing an example of processing by the recognition model M2.
[0088] The input unit 11 shown in Figure 9 acquires write information D11, similar to the input unit 11 shown in Figure 7. Specifically, the first acquisition unit 112 acquires write information D1A, and the second acquisition unit 113 acquires write information D1B.
[0089] The write information determination unit 116 determines the input format of the write information D11 (i.e., write information D1A, D1B) input to the input unit 11. By determining the input format of the write information D11, the write information determination unit 116 obtains at least one of the following: 3D write information D14 for the virtual space W1 (see Figure 10(a)), 2D write information D15 for the virtual writing surface SU1 (see Figure 10(b)), and 2D write information D16 for the real writing surface SU2 (see Figure 10(c)).
[0090] For example, if the motion sensor 208 detects a three-dimensional trajectory of the writing instrument 20, the write information determination unit 116 acquires three-dimensional write information D14 as write information D11. If the motion sensor 208 detects a two-dimensional trajectory of the writing instrument 20 relative to the virtual writing surface SU1, the write information determination unit 116 acquires two-dimensional write information D15 as write information D11. If the camera 5 detects a two-dimensional trajectory of the writing instrument 20 relative to the real writing surface SU2, the write information determination unit 116 acquires two-dimensional write information D16 as write information D11.
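The three cases in paragraph [0090] amount to a mapping from the detecting sensor and the dimensionality of the detected trajectory to one of D14, D15, and D16. The following Python sketch shows that mapping. The sensor identifiers and the function name are hypothetical, and how the dimensionality of a trajectory is decided is outside the scope of this sketch.

```python
# (sensor, trajectory dimensions) -> type of write information D11.
# Sensor identifiers are labels introduced here for illustration.
WRITE_INFORMATION_TYPES = {
    ("motion_sensor_208", 3): "D14",  # 3D writing in the virtual space W1
    ("motion_sensor_208", 2): "D15",  # 2D writing on the virtual surface SU1
    ("camera_5", 2): "D16",           # 2D writing on the real surface SU2
}


def determine_write_information(sensor: str, dimensions: int) -> str:
    """Return which type of write information the trajectory represents."""
    key = (sensor, dimensions)
    if key not in WRITE_INFORMATION_TYPES:
        raise ValueError(f"unsupported combination: {key!r}")
    return WRITE_INFORMATION_TYPES[key]
```

For example, `determine_write_information("camera_5", 2)` returns `"D16"`, while a combination not described in the embodiment, such as a three-dimensional trajectory from the camera 5, raises `ValueError`.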
[0091] The virtual writing surface SU1 is a virtual writing surface displayed in the virtual space W1. The virtual writing surface SU1 is used to recognize the writing operation of the writing instrument 20 by the user U. The virtual writing surface SU1 may also be part of a virtual object displayed in the virtual space W1. The virtual writing surface SU1 may be represented by a two-dimensional plane or by a curved surface that changes in three dimensions. The real writing surface SU2 is a writing surface, such as a notebook, that actually exists in the real space W2. The real writing surface SU2 may be represented by a two-dimensional plane or by a curved surface that changes in three dimensions.
[0092] Thus, in the information processing system 1, three-dimensional write information D14 and two-dimensional write information D15, D16 can be acquired as write information D11 in response to the user U's write operation. In the following explanation, when three-dimensional write information D14 and two-dimensional write information D15, D16 are not distinguished, they will each be referred to as "write information D11".
[0093] The recognition unit 14 shown in Figure 9 generates various types of information using the recognition model M2 shown in Figure 11. As shown in Figure 11, the recognition model M2 is an AI model that has been trained to take written information D11 and additional information D18 as inputs and to output recognition result information D30. Details of the additional information D18 and recognition result information D30 will be described later. The recognition unit 14 comprises a first preprocessing unit 141, a second preprocessing unit 142, and a recognition processing unit 143.
[0094] The first preprocessing unit 141 associates additional information D18 with the written information D11. The first preprocessing unit 141 includes a feature analysis unit 144 and an association unit 145.
[0095] The feature analysis unit 144 acquires additional information D18. The feature analysis unit 144 performs calculations on the observed values of acceleration, angular velocity, and magnetism acquired by the acquisition unit 21 of the writing instrument 20 to acquire feature quantities (physical quantities) of the writing information D11. For example, the feature analysis unit 144 acquires the acceleration of the pen tip 204 and the movement speed of the pen tip 204 as feature quantities. The feature analysis unit 144 also acquires the acceleration of the pen tip 204 and the movement speed of the pen tip 204 as feature quantities for the two-dimensional writing information D16 for the actual writing surface SU2. For the two-dimensional writing information D16, the feature analysis unit 144 may also acquire the pen pressure acquired by the acquisition unit 21 of the writing instrument 20 as a feature quantity. The feature analysis unit 144 acquires additional information D18 indicating the acquired feature quantities.
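The specification does not state how the feature quantities are calculated. As one minimal sketch, assuming the acceleration samples are already gravity-compensated, expressed in a fixed frame, and that the pen tip is at rest at the first sample, the movement speed can be estimated by integrating acceleration with the trapezoidal rule. The function names and dictionary keys below are illustrative, not part of the specification.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pen_tip_speeds(times: Sequence[float], accels: Sequence[Vec3]) -> List[float]:
    """Estimate speed at each sample by trapezoidal integration of acceleration."""
    if len(times) != len(accels):
        raise ValueError("times and accels must have the same length")
    velocity = [0.0, 0.0, 0.0]  # assumption: pen tip at rest at the first sample
    speeds: List[float] = []
    for i in range(len(times)):
        if i > 0:
            dt = times[i] - times[i - 1]
            for axis in range(3):
                velocity[axis] += 0.5 * (accels[i - 1][axis] + accels[i][axis]) * dt
        speeds.append(math.sqrt(sum(v * v for v in velocity)))
    return speeds


def build_additional_info(times: Sequence[float], accels: Sequence[Vec3]) -> Dict[str, List[float]]:
    """Bundle acceleration magnitude and speed as a stand-in for additional information D18."""
    return {
        "acceleration": [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in accels],
        "speed": pen_tip_speeds(times, accels),
    }
```

In practice, integrating raw inertial data drifts over time, so a real system would likely combine this with the angular velocity and magnetism observations mentioned above.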
[0096] The association unit 145 associates the additional information D18 (for example, the acceleration and movement speed of the pen tip 204) with the written information D11. In this specification, associating one piece of information with another means linking the two pieces of information bidirectionally so that either one can be obtained from the other.
[0097] The second preprocessing unit 142 converts the input format of the write information D11. The second preprocessing unit 142 includes an expression format selection unit 146 and a conversion processing unit 147.
[0098] The expression format selection unit 146 uses the recognition model M2 to select the expression format (output format) of the recognition result information D30 intended by user U. In this case, the recognition model M2 is also an AI model that has been trained to take the written information D11 and the additional information D18 as input and output the expression format. In this way, the expression format selection unit 146 selects an expression format that matches the intention of user U, which is identified from the written information D11 and the additional information D18.
[0099] The expression format of the recognition result information D30 may be text information, image information, or 3D model information. "Image information" includes information such as still images or moving images. "Image information" may also include processed characters (for example, jagged characters) or a virtual background set as the background of the virtual space W1. "3D model information" includes information representing three-dimensional objects. "3D model information" may also include objects that represent characters in three dimensions. The expression format output by the recognition model M2 may include information indicating which of the AI recognition processing unit 148 and the generation AI processing unit 149, described later, will perform the processing.
[0100] The conversion processing unit 147 converts the input format of the written information D11 into an input format suitable for outputting the recognition result information D30 of the expression format selected by the expression format selection unit 146. For example, if the expression format is text information processed by the AI recognition processing unit 148, the conversion processing unit 147 converts the input format of the written information D11 into an input format suitable for input to the AI recognition processing unit 148, which includes an instruction to output text information.
[0101] The recognition processing unit 143 recognizes the written information D11 and outputs recognition result information D30 indicating the recognition result. The recognition processing unit 143 includes an AI recognition processing unit 148 and a generation AI processing unit 149. The AI recognition processing unit 148 mainly performs character recognition of the written information D11, while the generation AI processing unit 149 generates new content from the written information D11 using a generation AI model. If the converted input format of the written information D11 indicates that it should be processed by the AI recognition processing unit 148, the recognition processing unit 143 performs the processing by the AI recognition processing unit 148. If the converted input format of the written information D11 indicates that it should be processed by the generation AI processing unit 149, the recognition processing unit 143 performs the processing by the generation AI processing unit 149.
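The routing between the two processing units can be sketched as follows. This is a hedged illustration only: the dictionary key `"processor"`, the handler names, and the placeholder return values are assumptions, and the two handlers merely stand in for the AI recognition processing unit 148 and the generation AI processing unit 149.

```python
from typing import Callable, Dict


def recognize_characters(write_info: str) -> str:
    """Placeholder for the AI recognition processing unit 148."""
    return f"recognized:{write_info}"


def generate_content(write_info: str) -> str:
    """Placeholder for the generation AI processing unit 149."""
    return f"generated:{write_info}"


# Hypothetical routing table keyed by the processor named in the converted input.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "ai_recognition": recognize_characters,
    "generation_ai": generate_content,
}


def run_recognition(converted_input: Dict[str, str]) -> str:
    """Dispatch to the unit indicated by the converted input format."""
    processor = converted_input["processor"]
    handler = HANDLERS.get(processor)
    if handler is None:
        raise ValueError(f"unknown processor: {processor}")
    return handler(converted_input["write_info"])
```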
[0102] The AI recognition processing unit 148 inputs the written information D11 and the additional information D18 to the recognition model M2 to generate recognition result information D30. The recognition result information D30 generated by the AI recognition processing unit 148 includes at least one of text information, image information, and 3D model information, in which the written information D11 is reflected. The text information in which the written information D11 is reflected includes, for example, a string of characters that is the result of the written information D11 being recognized as characters. The image information in which the written information D11 is reflected includes, for example, an image showing the recognized characters in a processed state, or an image of what the recognized characters represent (for example, an image of an apple if the recognized characters spell "apple"). The 3D model information in which the written information D11 is reflected includes an object that represents the recognized characters in three dimensions.
[0103] The generation AI processing unit 149 inputs the write information D11 and the additional information D18 to the recognition model M2 to generate the recognition result information D30. The recognition model M2 used in the generation AI processing unit 149 is a generation AI model that generates content (in this embodiment, the recognition result information D30) in response to prompt input including the write information D11 and the additional information D18. The generation AI model may be configured to include, for example, a Large Language Model (LLM). An example of a generation AI model is ChatGPT.
[0104] The recognition result information D30 generated by the generation AI processing unit 149 includes at least one of text information, image information, and 3D model information, which reflect the written information D11. The text information reflecting the written information D11 includes, for example, a story themed on the content of a string of characters, which is the result of the written information D11 being recognized as characters. The image information reflecting the written information D11 includes, for example, an image of the object intended by user U as the written information D11. For example, if the written information D11 shows a rough drawing of an apple, the image information includes a clean drawing of an apple. The 3D model information reflecting the written information D11 includes, for example, a three-dimensional object intended by user U as the written information D11.
[0105] The recognition processing unit 143 may also output the analysis results obtained by analyzing the recognition result information D30 using the recognition model M2. In this case, the recognition model M2 is also an AI model that has been trained to take the recognition result information D30 as input and output the analysis results of the recognition result information D30. The analysis results of the recognition result information D30 may, for example, indicate that the user U's situation is normal as seen from the camera, or that the characters, sentences, or symbols written by user U are correct or incorrect. The analysis results of the recognition result information D30 may indicate abnormal situations such as user U not being present, user U sleeping, user U looking sleepy, user U looking dissatisfied, user U lying down, or user U being a different person. The analysis results of the recognition result information D30 may indicate that user U is doing something other than what was instructed, or that there is an abnormal sound inside or outside the room, an intrusion by an abnormal person, or an interruption from the management side.
[0106] The output unit 16 shown in Figure 9 provides the recognition result information D30 to user U. The first information providing unit 162 displays the recognition result information D30 in the virtual space W1 by outputting the recognition result information D30 to the display 7. The second information providing unit 163 provides the recognition result information D30 to user U through an output device such as a display in the real space W2.
[0107] [Analysis of recorded information] In the examples shown in Figures 12 and 13, the user terminal 10 analyzes the recorded information D3, which includes the written information D11, using the analysis model M3. Details of the analysis model M3 and the recorded information D3 will be described later. Figure 12 is a diagram showing the functional configuration related to the analysis unit 15 in more detail. Figure 13 is a diagram showing an example of processing by the analysis model M3.
[0108] The input unit 11 shown in Figure 12 acquires the write information D11, similar to the input unit 11 shown in Figure 7. That is, the first acquisition unit 112 acquires the write information D1A, and the second acquisition unit 113 acquires the write information D1B. The input format determination unit 114 shown in Figure 12 determines the input format of the write information D11 (i.e., the write information D1A and D1B), similar to the input format determination unit 114 shown in Figure 7. The input format determination unit 114 determines the input format of the write information D11 by determining which sensor (in this embodiment, the camera 5 or the motion sensor 208) each of the write information D1A and D1B was acquired from.
[0109] The input format integration unit 115 shown in Figure 12 integrates the input format of the write information D11 determined by the input format determination unit 114 into a common format for input to the analysis model M3. For example, the input format integration unit 115 integrates the input format of the write information D11 into an intermediate language format suitable for input to the analysis model M3. In other words, the input format integration unit 115 converts the write information D11 into an intermediate language having an input format that the analysis model M3 can process.
[0110] The input unit 11 shown in Figure 12 receives a query Q from user U. Query Q is a search condition for searching recorded information D3. Query Q includes at least one of the following: information expressed in natural language, visual information including user U's gestures, user U's emotional state, and user U's environment. Information expressed in natural language may be visual information such as text, or auditory information such as speech. The input unit 11 may acquire the written information D11 and user action information D12 as information expressed in natural language or visual information including gestures. The input unit 11 may input the written information D11 and user action information D12 into an analysis model M3 to acquire user U's emotional state. In this case, the analysis model M3 is also an AI model trained to take written information D11 and user action information D12 as input and output user U's emotional state. User U's environment may be directly input to the user terminal 10 by user U, or it may be acquired in real time using detection results from sensors such as the camera 5.
[0111] The analysis unit 15 shown in Figure 12 generates various types of information using the analysis model M3 shown in Figure 13. As shown in Figure 13, the analysis model M3 is an AI model trained with machine learning to take query Q and recorded information D3 as inputs and output the analysis result D40 of the recorded information D3 for query Q. Details of the analysis result D40 will be described later. The analysis unit 15 includes a recording unit 151, a search unit 152, and an editing unit 153.
[0112] The recording unit 151 records the recorded information D3 in the database 30. The recording unit 151 includes an information acquisition unit 154, a metadata generation unit 155, and a recording processing unit 156. The information acquisition unit 154 acquires the write information D11 acquired by the input unit 11.
[0113] The metadata generation unit 155 acquires metadata D19 of the write information D11 acquired by the information acquisition unit 154. The metadata D19 includes spatiotemporal information indicating at least one of the time and position of the write information D11, and write status information indicating the user U's writing status other than spatiotemporal information.
[0114] The time of the written information D11 is the time when the user U performed the write operation that generated the written information D11, for example, the time when the trajectory was detected by the motion sensor 208 and the camera 5. The position of the written information D11 may be an absolute position or a relative position with respect to the user terminal 10. The position of the written information D11 may be detected by the motion sensor 208 and the camera 5, or by GPS (Global Positioning System), etc.
[0115] The write status information includes at least one of the user U's emotional state when the write information D11 was acquired, an event (phenomenon) related to the write information D11, and the user U's environment when the write information D11 was acquired. The metadata generation unit 155 may input the write information D11 into the analysis model M3 to acquire the user U's emotional state. The event is, for example, an event that occurred near the position of the write information D11, or an event that occurred around the time of the write information D11. The metadata generation unit 155 may access a search site and acquire events returned by the search. The user U's environment may be directly input by the user U to the user terminal 10, or it may be acquired in real time using the detection results of sensors such as the camera 5.
[0116] The metadata generation unit 155 may acquire metadata D19 using the method described above, or it may acquire metadata D19 directly from the write information D11 and user operation information D12. For example, the metadata generation unit 155 may acquire write information D11 and user operation information D12 that indicate the time, location, or emotional state at the time of the write operation, and acquire metadata D19 directly from that information.
[0117] The recording processing unit 156 acquires, as recorded information D3, information including the metadata D19 associated with the write information D11. The recording processing unit 156 records the recorded information D3 in the database 30. The recording processing unit 156 may acquire recorded information D3 from each of multiple virtual spaces W1 and record it in the database 30. If the time scales differ among the multiple virtual spaces W1, the recording processing unit 156 may perform time synchronization on the recorded information D3 acquired from each of the multiple virtual spaces W1. This makes it possible to maintain temporal consistency across multiple virtual spaces W1 with different time scales.
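The specification does not describe how time synchronization is performed. As one hedged illustration, assume that each virtual space W1 keeps a local clock that is a linear function of a common reference time. Under that assumption, records from different spaces can be placed on one timeline as sketched below. `SpaceClock`, `Record`, `to_reference_time`, and `synchronize` are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpaceClock:
    """Hypothetical linear clock model for one virtual space W1."""
    origin: float  # reference time (s) at which the space's local time was 0
    scale: float   # local seconds that elapse per reference second


@dataclass(frozen=True)
class Record:
    """Simplified stand-in for one item of recorded information."""
    space_id: str
    local_time: float  # time of the write operation on the space's own clock
    write_info: str


def to_reference_time(clock: SpaceClock, local_time: float) -> float:
    """Convert a local timestamp to the common reference timeline."""
    if clock.scale <= 0:
        raise ValueError("scale must be positive")
    return clock.origin + local_time / clock.scale


def synchronize(records: List[Record], clocks: Dict[str, SpaceClock]) -> List[Tuple[float, Record]]:
    """Return (reference_time, record) pairs ordered on the common timeline."""
    timed = [(to_reference_time(clocks[r.space_id], r.local_time), r) for r in records]
    timed.sort(key=lambda pair: pair[0])
    return timed
```

For example, with a space whose clock runs twice as fast as the reference, a local timestamp of 10 s corresponds to 5 s after that space's origin on the reference timeline.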
[0118] The search unit 152 analyzes the recorded information D3 using query Q. The search unit 152 comprises a query reception unit 157, an analysis unit 158, and an analysis result output unit 159.
[0119] The query reception unit 157 obtains the query Q entered into the input unit 11. The query reception unit 157 may analyze the intent or context of user U by analyzing query Q. For example, the query reception unit 157 may input query Q into the analysis model M3 and output the intent or context of user U. In this case, the analysis model M3 is also an AI model that has been trained to take query Q as input and output the intent or context of user U.
[0120] The analysis unit 158 searches for recorded information D3 in the database 30 according to query Q and obtains the recorded information D3 as a search result. The analysis unit 158 inputs query Q into the analysis model M3 and obtains the search result for recorded information D3 for query Q. In addition to query Q, the analysis unit 158 may also input the intent or context of user U obtained by the query reception unit 157 into the analysis model M3. The analysis unit 158 inputs the search result for recorded information D3 into the analysis model M3 and outputs an analysis result D40 obtained by extracting and analyzing information related to query Q and metadata D19 from the search result.
[0121] Analysis result D40 includes a summary result compiled by analysis model M3 from the search results of recorded information D3 related to query Q. In this case, analysis model M3 can generate the summary result by extracting information related to query Q and metadata D19 from the search results of recorded information D3, and then summarizing the key points of the extracted information. The summary result may include, for example, a summary of recorded information D3, a detailed report on recorded information D3, or an interactive dashboard on recorded information D3. Query Q may be, for example, "What happened around this time last year?" In this case, analysis result D40 is obtained for the recorded information D3 corresponding to this time last year. By generating such analysis result D40, insights into recorded information D3 and trends in recorded information D3 can be provided to user U.
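In the specification, the search and summary are produced by the trained analysis model M3. The sketch below illustrates only the time-based narrowing that a query such as "What happened around this time last year?" implies, using the time stored in the metadata. The `Record` type, the function names, and the default window of 7 days are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Record:
    """Simplified stand-in for recorded information with a time in its metadata."""
    time: datetime
    write_info: str


def same_day_last_year(now: datetime) -> datetime:
    """Return the same calendar date one year earlier."""
    try:
        return now.replace(year=now.year - 1)
    except ValueError:
        # February 29 has no counterpart in the previous year; fall back to February 28.
        return now.replace(year=now.year - 1, day=28)


def around_this_time_last_year(records: List[Record], now: datetime, window_days: int = 7) -> List[Record]:
    """Keep records whose time lies within +/- window_days of the same date last year."""
    target = same_day_last_year(now)
    window = timedelta(days=window_days)
    return [r for r in records if abs(r.time - target) <= window]
```

The filtered records would then be passed to a summarization step, which is outside the scope of this sketch.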
[0122] The analysis result output unit 159 outputs the analysis result D40 obtained by the analysis unit 158 to the output unit 16.
[0123] The editing unit 153 edits metadata D19. The editing unit 153 includes an information modification unit 180, a metadata update unit 181, and a change saving unit 182.
[0124] The information modification unit 180 modifies the metadata D19 of the recorded information D3 recorded in the database 30 using the write information D11 and user operation information D12 input to the input unit 11. The information modification unit 180 obtains the modified metadata D19 from the write information D11 and user operation information D12. The information modification unit 180 may also input the write information D11 and user operation information D12 into the analysis model M3 to obtain the modified content. If the write information D11 and user operation information D12 indicate a modified content, the information modification unit 180 may directly obtain the modified content from the write information D11 and user operation information D12.
[0125] The metadata update unit 181 obtains the modified metadata D19 by reflecting the modifications acquired by the information modification unit 180 in the metadata D19. The metadata update unit 181 associates the modified metadata D19 with the write information D11 in place of the original metadata D19. As a result, the metadata update unit 181 updates the metadata D19.
[0126] The change saving unit 182 saves the modifications (changes) made in the metadata update unit 181. The saved changes may be shared with other users. In other words, metadata D19 acquired by other users may be updated in accordance with the changes made to metadata D19 by user U.
[0127] The relationship determination unit 117 of the input unit 11 determines the relationship between the written information D11 (current input information) input to the input unit 11 and the written information D11 (past input information) recorded in the database 30. The relationship determination unit 117 determines whether the degree of relationship between the written information D11, which is the current input information, and the written information D11, which is the past input information, is higher than a threshold. The degree of relationship is an indicator that shows the strength of the relationship between the current input information and the past input information; in other words, it is an indicator that shows the degree to which the past input information belongs to the current input information. For example, if the current input information and the past input information contain content related to an event common to both, the degree of relationship between the current input information and the past input information will be high.
[0128] The relationship determination unit 117 may, for example, input the current input information and past input information into the analysis model M3 and output the degree of relationship between the current input information and the past input information. In this case, the analysis model M3 is also an AI model that has been trained to take the current input information and past input information as input and output the degree of relationship between them. The relationship determination unit 117 may also output the degree of relationship between the current input information and the past input information using the result of a predefined function that calculates the similarity of various feature vectors or graph information stored in the database 30 or the like. The relationship determination unit 117 may perform this determination periodically.
[0129] The recording unit 151 records the current input information in the database 30, associating it with the past input information, when the relationship determination unit 117 determines that the degree of relationship between the current input information and the past input information is higher than a threshold.
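The specification leaves the "predefined function" for similarity open. As one possible illustration, cosine similarity between feature vectors can serve as the degree of relationship, with association performed only when it exceeds a threshold. Cosine similarity, the function names, and the default threshold of 0.8 are assumptions chosen for this sketch, not values taken from the specification.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_associate(current: Sequence[float], past: Sequence[float], threshold: float = 0.8) -> bool:
    """True when the degree of relationship is higher than the threshold."""
    return cosine_similarity(current, past) > threshold
```

When `should_associate` returns true, the current input information would be recorded in association with the past input information, as described in the preceding paragraph.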
[0130] The output unit 16 shown in Figure 12 provides the analysis result D40 to user U. The first information providing unit 162 displays the analysis result D40 in the virtual space W1 by outputting the analysis result D40 to the display 7. The second information providing unit 163 provides the analysis result D40 to user U through an output device such as a display in the real space W2.
[0131] The output information selection unit 164 selects recorded information D3 (written information D11 included in recorded information D3) that matches the intent of user U, which is identified from the written information D11 and user action information D12. The output information selection unit 164 may also input the written information D11 and user action information D12 into the analysis model M3 and select recorded information D3 that matches the intent of user U. In this case, the analysis model M3 is also an AI model that has been trained to take the written information D11 and user action information D12 as input and output recorded information D3 that matches the intent of user U. The first information provision unit 162 and the second information provision unit 163 provide user U with the written information D11 included in the recorded information D3 selected by the output information selection unit 164, for example by projecting it onto the display 7 or the like. User U can perform operations, such as touch operations, on the written information D11 projected onto the display 7 or the like.
[0132] [Operation of the Information Processing System] Next, the operation of the information processing system 1 according to this embodiment will be described. Figure 14 is a flowchart showing an example of the processing content of the information processing method implemented in the information processing system 1.
[0133] Referring to Figure 14, the overall process by the information processing system 1 will be explained. First, the information processing system 1 is initialized (step S11). Next, the input unit 11 reads the profile information D1 recorded in the database 30 (step S12). Next, the AI personality selection unit 137 initializes the AI personality using the profile information D1 (step S13). Next, various sensor information is detected in response to the user U's writing operations and other operations of the user U (step S14). For example, the pressure sensor 207 acquires the observed value of pen pressure, and the motion sensor 208 acquires the observed values of acceleration, angular velocity, and magnetism. In addition, the camera 5 detects visual information including the user U's gestures, and the microphone 6 detects auditory information including the user U's voice.
[0134] Next, the input unit 11 acquires write information D11 and user operation information D12 using various sensor information (step S15). For example, the first acquisition unit 112 acquires the trajectory of the writing instrument 20 detected by the motion sensor 208 as write information D11 written to the virtual space W1. The second acquisition unit 113 acquires the trajectory of the writing instrument 20 detected by the camera 5 as write information D11 written to the writing surface in the real space W2. The input unit 11 acquires the actions of user U detected by the camera 5 and microphone 6 as user operation information D12.
[0135] In the processing performed by the information processing system 1, the following processes are executed: a process to generate response information D21 using the write information D11 and user operation information D12 (step S2), a process to recognize the write information D11 (step S4), and a process to analyze the write information D11 (step S6).
[0136] Figure 15 is a flowchart showing an example of the process by which the response generation unit 13 generates response information D21. First, the input format determination unit 114 determines the input format of the write information D11 and the user operation information D12 (step S21). Next, the input format integration unit 115 integrates the input formats of the write information D11 and the user operation information D12 into a common format for input to the response generation model M1 (step S22).
[0137] Next, the analysis unit 131 inputs the written information D11 and user action information D12 into the response generation model M1 and outputs the analysis results for emotional state, context of dialogue content, and intent of dialogue content (step S23). Next, the response information generation unit 133 inputs the written information D11, user action information D12, and user situation information D2 into the response generation model M1 and generates response information D21 (step S24). The response information generation unit 133 may also input the analysis results of user U's intent, etc., obtained in step S23 into the response generation model M1.
[0138] Next, the AI personality adjustment unit 138 inputs the written information D11 and user operation information D12 to the response generation model M1 to adjust the AI personality selected by the AI personality selection unit 137 (step S25). Step S25 may be performed before step S24. In this case, the response information generation unit 133 outputs response information D21 that is in line with the adjusted AI personality.
[0139] Next, the output format selection unit 161 uses the write information D11 and user operation information D12 to select the output format of the response information D21 from a plurality of output formats defined by at least one of visual, auditory, tactile, olfactory, and gustatory expressions (step S26). Then, the first information provision unit 162 and the second information provision unit 163 provide the response information D21 to the user U in the output format selected in step S26 (step S27).
[0140] Next, the feedback analysis unit 171 analyzes the user U's response trends to the response information D21 (step S28). For example, the feedback analysis unit 171 analyzes the user U's response trends by statistically processing the written information D11 and user action information D12 in relation to the response information D21. Next, the profile update unit 172 updates the profile information D1 recorded in the database 30 based on the analysis results in step S28 (step S29). Through steps S21 to S29, the process of generating the response information D21 is carried out.
[0141] Figures 16 and 17 are flowcharts illustrating an example of the recognition process for write information D11 performed by the recognition unit 14. First, the write information determination unit 116 determines the input format of the write information D11 (step S41). If the motion sensor 208 detects a three-dimensional trajectory of the writing instrument 20, the write information determination unit 116 acquires three-dimensional write information D14 as write information D11 (step S42). If the motion sensor 208 detects a two-dimensional trajectory of the writing instrument 20 relative to the virtual writing surface SU1, the write information determination unit 116 acquires two-dimensional write information D15 as write information D11 (step S43). If the camera 5 detects a two-dimensional trajectory of the writing instrument 20 relative to the real writing surface SU2, the write information determination unit 116 acquires two-dimensional write information D16 as write information D11 (step S44).
[0142] Next, the recognition processing unit 143 recognizes the write information D11 (step S45). In step S45, steps S49 to S52 shown in Figure 17 are executed. In step S49, the feature analysis unit 144 analyzes the write information D11. For example, the feature analysis unit 144 obtains the feature quantities (physical quantities) of the write information D11, and the association unit 145 associates the additional information D18, which indicates the feature quantities, with the write information D11. Next, the expression format selection unit 146 inputs the write information D11 and the additional information D18 into the recognition model M2 and selects the expression format (output format) of the recognition result information D30 intended by the user U (step S50). In step S50, the conversion processing unit 147 also converts the input format of the write information D11 into an input format suitable for outputting the recognition result information D30 in the expression format selected by the expression format selection unit 146.
[0143] If the write information D11 obtained in step S50 indicates that processing will be performed by the AI recognition processing unit 148, then processing by the AI recognition processing unit 148 is executed (step S51). In step S51, the AI recognition processing unit 148 inputs the write information D11 and the additional information D18 to the recognition model M2 to generate recognition result information D30.
[0144] If the write information D11 obtained in step S50 indicates that it will be processed by the generative AI processing unit 149, then the processing by the generative AI processing unit 149 is executed (step S52). In step S52, the generative AI processing unit 149 inputs the write information D11 and the additional information D18 to the generative AI model, which is the recognition model M2, and generates the recognition result information D30.
[0145] Next, as shown in Figures 16 and 17, the recognition result information D30 generated by the AI recognition processing unit 148 or the generative AI processing unit 149 is output to the output unit 16 (step S46).
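The routing between steps S51 and S52 may be pictured with the following minimal sketch. How the converted write information indicates its destination is not specified in the embodiment; the `processor` field, the `ConvertedWriteInfo` type, and the two placeholder functions are assumptions made for illustration, and the placeholders return strings rather than calling an actual recognition model M2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ConvertedWriteInfo:
    points: List[Tuple[float, float, float]]  # write information D11 after step S50
    additional_info: Dict[str, float]         # additional information D18
    processor: str                            # hypothetical routing flag set in step S50


def ai_recognition_unit(info: ConvertedWriteInfo) -> str:
    # Placeholder for step S51 (AI recognition processing unit 148).
    return f"recognized {len(info.points)} points"


def generative_ai_unit(info: ConvertedWriteInfo) -> str:
    # Placeholder for step S52 (generative AI processing unit 149).
    return f"generated output from {len(info.points)} points"


PROCESSORS: Dict[str, Callable[[ConvertedWriteInfo], str]] = {
    "ai_recognition": ai_recognition_unit,
    "generative_ai": generative_ai_unit,
}


def recognize(info: ConvertedWriteInfo) -> str:
    """Dispatch to step S51 or S52 and return recognition result information D30."""
    try:
        handler = PROCESSORS[info.processor]
    except KeyError:
        raise ValueError(f"unknown processor: {info.processor}") from None
    return handler(info)
```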
[0146] Next, the first information provision unit 162 outputs the recognition result information D30 to the display 7, thereby displaying the recognition result information D30 in the virtual space W1 (step S47). The recognition processing unit 143 may also output analysis results obtained by analyzing the recognition result information D30 using the recognition model M2 (step S48). The recognition process for the write information D11 is carried out through steps S41 to S52 described above.
[0147] Figure 18 is a flowchart showing the overall process of analyzing the written information D11. First, the input format determination unit 114 determines the input format of the written information D11 (step S61). Next, the input format integration unit 115 integrates the input formats of the written information D11 into a common format for input into the analysis model M3 (step S62). In the processing by the information processing system 1, the following are performed: recording of the written information D11 (step S63), searching of the recorded information D3 (step S64), editing of the metadata D19 (step S65), and projection of the written information D11 (step S66).
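One possible reading of the integration in step S62 is sketched below. The embodiment does not state what the common format is; this sketch assumes, purely for illustration, that every trajectory is represented as a list of three-dimensional points and that two-dimensional points are lifted onto the plane z = 0 of their writing surface. The function name `to_common_format` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def to_common_format(points: Sequence[Sequence[float]]) -> List[Point3D]:
    """Convert 2D or 3D trajectory points into a single 3D representation.

    Assumption: 2D points are given in writing-surface coordinates and are
    placed on the plane z = 0.
    """
    unified: List[Point3D] = []
    for p in points:
        if len(p) == 2:
            unified.append((float(p[0]), float(p[1]), 0.0))
        elif len(p) == 3:
            unified.append((float(p[0]), float(p[1]), float(p[2])))
        else:
            raise ValueError(f"expected 2 or 3 coordinates, got {len(p)}")
    return unified
```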
[0148] Figure 19 is a flowchart showing the recording process of the write information D11. In the recording process, first, the information acquisition unit 154 obtains the write information D11 acquired by the input unit 11 (step S67). Next, the metadata generation unit 155 generates metadata D19 for the write information D11 (step S68). The metadata D19 includes spatiotemporal information and, in addition to the spatiotemporal information, write status information indicating the writing status of user U.
[0149] Next, the recording processing unit 156 acquires information including metadata D19 associated with the write information D11 as recording information D3, and records the recording information D3 in the database 30 (step S69). If the time scales are different in each of the multiple virtual spaces W1, the recording processing unit 156 performs time synchronization on the recording information D3 acquired from each of the multiple virtual spaces W1 (step S70).
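The time synchronization of step S70 could, for example, map each virtual space's local time onto a shared reference clock. The embodiment does not describe how the time scales relate; the sketch below assumes a linear relation (reference time = origin + scale × local time), and the names `Record`, `SpaceClock`, `synchronize`, and the metadata key `reference_time` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    space_id: str        # which virtual space W1 the record came from
    local_time: float    # timestamp in that space's own time scale
    write_info: list     # write information D11
    metadata: Dict[str, object] = field(default_factory=dict)  # metadata D19


@dataclass
class SpaceClock:
    scale: float   # reference seconds per local second (assumed linear)
    origin: float  # reference time at local time 0


def synchronize(records: List[Record], clocks: Dict[str, SpaceClock]) -> List[Record]:
    """Attach a reference-clock timestamp to each record and sort by it."""
    synced = []
    for r in records:
        clock = clocks[r.space_id]
        reference_time = clock.origin + clock.scale * r.local_time
        synced.append(
            Record(r.space_id, r.local_time, r.write_info,
                   {**r.metadata, "reference_time": reference_time})
        )
    return sorted(synced, key=lambda r: r.metadata["reference_time"])
```

The input records are left unmodified, so the original local timestamps remain available after synchronization.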
[0150] Figure 20 is a flowchart showing the search process for recorded information D3. In the search process, first, the query reception unit 157 obtains the query Q entered in the input unit 11 (step S71). Next, the query reception unit 157 analyzes the user U's intent or context by analyzing the query Q (step S72).
[0151] Next, the analysis unit 158 searches the database 30 for recorded information D3 according to query Q and obtains the search results for recorded information D3 (step S73). For example, the analysis unit 158 inputs query Q into the analysis model M3 and obtains the search results for recorded information D3 for query Q. Next, the analysis unit 158 inputs the search results for recorded information D3 into the analysis model M3 and outputs the analysis result D40 for the search results for recorded information D3 (step S74). Next, the analysis result output unit 159 outputs the analysis result D40 obtained in step S74 to the output unit 16 (step S75).
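The flow of steps S73 and S74 can be illustrated with a simplified stand-in. In the embodiment both the search and the analysis use the analysis model M3; the sketch below replaces the model with plain keyword matching over metadata and a trivial summary, so it shows only the data flow, not the model. The record layout and the function names are assumptions.

```python
from typing import Dict, List


def search_records(database: List[Dict], query: str) -> List[Dict]:
    """Stand-in for step S73: return records whose metadata contains every query term."""
    terms = query.lower().split()
    results = []
    for record in database:
        haystack = " ".join(str(v) for v in record["metadata"].values()).lower()
        if all(term in haystack for term in terms):
            results.append(record)
    return results


def analyze_results(results: List[Dict]) -> Dict:
    """Stand-in for step S74: summarize the search results as an analysis result D40."""
    return {"count": len(results), "ids": [r["id"] for r in results]}
```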
[0152] Figure 21 is a flowchart showing the editing process for metadata D19. In the editing process, first, the input unit 11 acquires the write information D11 and the user operation information D12 (step S76). Next, the information modification unit 180 uses the write information D11 and the user operation information D12 to acquire the modification content of the metadata D19 of the recorded information D3 recorded in the database 30. The metadata update unit 181 reflects the modification content in the metadata D19 to acquire the modified metadata D19 (step S77).
[0153] Next, the metadata update unit 181 associates the modified metadata D19 with the write information D11, replacing the original metadata D19. This causes the metadata update unit 181 to update the metadata D19 (step S78). Next, the change saving unit 182 saves the modified information (changes) from the metadata update unit 181 (step S79).
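Steps S77 to S79 can be sketched as an update that also records what changed. The dictionary layout of the record and the shape of the change log are assumptions for illustration; the embodiment does not specify how the change saving unit 182 stores changes.

```python
import copy
from typing import Dict, List


def update_metadata(record: Dict, modification: Dict, change_log: List[Dict]) -> Dict:
    """Apply a modification to metadata D19 and save the change.

    record       -- recorded information D3, assumed to be {"id": ..., "metadata": {...}}
    modification -- modification content derived from D11 and D12 (step S77)
    change_log   -- hypothetical store used in place of the change saving unit 182
    """
    before = copy.deepcopy(record["metadata"])
    after = {**before, **modification}           # S77: reflect the modification
    record["metadata"] = after                   # S78: replace the original metadata
    change_log.append({                          # S79: save the change
        "record_id": record["id"],
        "before": before,
        "after": copy.deepcopy(after),
    })
    return record
```

Keeping both the previous and the updated metadata in the log makes it possible to review or revert an edit later.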
[0154] Figure 22 is a flowchart showing the projection process of the written information D11. In the projection process, first, the output information selection unit 164 selects the recorded information D3 (the written information D11 included in the recorded information D3) that matches the intention of user U, which is identified from the written information D11 and the user operation information D12 (step S80). Next, the first information provision unit 162 projects the recorded information D3 selected in step S80 onto the display 7 (step S81). As a result, the recorded information D3 that matches the intention of user U is fed back to user U (step S82). The analysis process for the written information D11 is carried out through steps S61 to S82 described above.
[0155] [Operation and Effects] The operation and effects of the information processing system 1 and the information processing method according to this embodiment will now be described. In this embodiment, three-dimensional writing information D14 generated by the three-dimensional trajectory of the writing instrument 20 accompanying a writing operation in the virtual space W1 is acquired, and the three-dimensional writing information D14 is recognized using a recognition model M2 trained by machine learning, which takes the three-dimensional writing information D14 as input and outputs its recognition result. By using the recognition model M2 in this way, even if the trajectory of the writing instrument 20 accompanying a writing operation in the virtual space W1 is a complex trajectory that changes in three dimensions, the three-dimensional writing information D14 generated by that trajectory can be easily recognized. Therefore, according to this embodiment, recognition of the three-dimensional writing information D14 can be realized easily.
[0156] As in this embodiment, the input unit 11 may acquire the writing information D11 in any of the following three ways. When the motion sensor 208 detects the three-dimensional trajectory of the writing instrument 20 accompanying a writing operation in the virtual space W1, the input unit 11 acquires the three-dimensional writing information D14 as the writing information D11. When the motion sensor 208 detects the two-dimensional trajectory of the writing instrument 20 accompanying a writing operation on the virtual writing surface SU1 set in the virtual space W1, the input unit 11 acquires, as the writing information D11, the two-dimensional writing information D15 generated on the virtual writing surface SU1 by that trajectory. When the camera 5 detects the two-dimensional trajectory of the writing instrument 20 accompanying a writing operation on the real writing surface SU2 in the real space W2, the input unit 11 acquires, as the writing information D11, the two-dimensional writing information D16 generated on the real writing surface SU2 by that trajectory. In this configuration, the writing information D11 can be easily recognized regardless of whether the input to the recognition model M2 is the three-dimensional writing information D14 for the virtual space W1, the two-dimensional writing information D15 for the virtual writing surface SU1 of the virtual space W1, or the two-dimensional writing information D16 for the real writing surface SU2 of the real space W2. Increasing the input methods available to user U for the recognition model M2 in this way improves the convenience of user U.
[0157] As in this embodiment, the recognition result information D30 may include at least one of text information, image information, and 3D model information that reflects the written information D11. In this case, the recognition result information D30 can be provided to user U in an optimal format that aligns with user U's intentions. This can improve user U's satisfaction.
[0158] As in this embodiment, the motion sensor 208 detects at least one of the physical quantities of the speed and acceleration of the writing instrument 20 that change with the writing operation, and the user terminal 10 may include an association unit 145 that associates additional information D18 indicating physical quantities with the writing information D11 acquired by the input unit 11. In this case, for example, by using the additional information D18 associated with the writing information D11 as input to the recognition model M2, it becomes possible to accurately recognize the three-dimensional writing operation based on more information about the writing operation.
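To make the notion of additional information D18 concrete, the following sketch derives speed and a scalar acceleration from a sampled trajectory by finite differences and bundles them into a dictionary. This is an assumption for illustration: in the embodiment the motion sensor 208 detects these physical quantities, whereas the sketch computes them from positions and timestamps. The value labeled acceleration here is the rate of change of speed, not the full acceleration vector, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def speeds(points: Sequence[Point], times: Sequence[float]) -> List[float]:
    """Average speed over each sampling interval (distance / elapsed time)."""
    if len(points) != len(times):
        raise ValueError("points and times must have the same length")
    result = []
    for i in range(1, len(points)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        result.append(math.dist(points[i - 1], points[i]) / dt)
    return result


def speed_change_rates(speed_values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Rate of change of speed; each speed is attributed to its interval midpoint."""
    mids = [(times[i - 1] + times[i]) / 2 for i in range(1, len(times))]
    return [
        (speed_values[i] - speed_values[i - 1]) / (mids[i] - mids[i - 1])
        for i in range(1, len(speed_values))
    ]


def build_additional_info(points: Sequence[Point], times: Sequence[float]) -> Dict[str, List[float]]:
    """Bundle the feature quantities in the role of additional information D18."""
    v = speeds(points, times)
    return {"speed": v, "acceleration": speed_change_rates(v, times)}
```

For example, `build_additional_info([(0, 0, 0), (1, 0, 0), (3, 0, 0)], [0.0, 1.0, 2.0])` yields speeds of 1.0 and 2.0 for the two intervals and a single speed change rate of 1.0.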
[0159] As in this embodiment, the recognition model M2 may take the write information D11 and the additional information D18 as input and output the converted write information D11, which has been converted into an output format that matches the intent of user U identified from the write information D11 and the additional information D18, as recognition result information D30. In this case, since recognition result information D30 in an output format that matches the intent of user U is provided to user U, user U satisfaction can be improved.
[0160] As in this embodiment, the output unit 16 may output the recognition result information D30 to the display 7, thereby displaying the recognition result information D30 in the virtual space W1. In this case, by providing the recognition result information D30 to the user U through the virtual space W1, a high sense of immersion in the virtual space W1 can be given to the user.
[0161] [Modifications] The embodiments have been described in detail above. However, the present disclosure is not limited to the embodiments described above. Various modifications are possible without departing from the spirit of the disclosure.
[0162] In the embodiment described above, a writing instrument 20 was used as an example of an input device. However, the input device may be any device other than the writing instrument 20, as long as it is movable in conjunction with the user's writing operation. For example, the input device may be a mouse held in the user U's hand, or a ring-shaped device worn on the user U's finger.
[0163] In the information processing system 1, the written information D11 and the user operation information D12 may be labeled with a confidentiality level. The confidentiality level of each type of information may be set by the user U, or may be set by an AI model that takes that information as input. In the information processing system 1, various types of information may be generated according to the confidentiality level of the information.
[0164] The processing procedures performed in the information processing system 1 are not limited to the examples shown in the embodiments described above. For example, some of the steps (processes) described above may be omitted, or each step may be performed in a different order. In addition, any two or more of the steps described above may be combined, or some of the steps may be modified or deleted. Alternatively, other steps may be performed in addition to each of the above steps.
[0165] [Supplementary Notes] The gist of the present invention is set out below.
[1] An information processing system comprising:
an input device that is held by a user and is movable in conjunction with a writing operation of the user;
a display device that displays a virtual space corresponding to the user's position in the surrounding real space;
a sensor that is disposed on the input device or around the input device and detects a trajectory of the input device accompanying the writing operation; and
a processing unit that is communicably connected to the display device and the sensor and processes writing information generated by the trajectory of the input device detected by the sensor,
wherein the processing unit includes:
an input unit that, when the sensor detects a three-dimensional trajectory of the input device accompanying the writing operation in the virtual space, acquires three-dimensional writing information generated by the three-dimensional trajectory as the writing information;
a recognition unit that recognizes the writing information using a recognition model trained by machine learning, the recognition model taking the writing information as input and outputting a recognition result of the writing information; and
an output unit that outputs recognition result information indicating the recognition result.
[2] The information processing system according to [1], wherein
the sensor includes:
a first trajectory detection sensor that is disposed on the input device and detects the trajectory of the input device accompanying the writing operation in the virtual space; and
a second trajectory detection sensor that is disposed around the input device and detects the trajectory of the input device accompanying the writing operation on a writing surface in the real space,
the input device is a pen-type device that includes the first trajectory detection sensor and is capable of writing on the writing surface in the real space, and
the input unit:
acquires the three-dimensional writing information as the writing information when the first trajectory detection sensor detects the three-dimensional trajectory of the input device accompanying the writing operation in the virtual space;
acquires, as the writing information, two-dimensional writing information generated on a virtual writing surface by a two-dimensional trajectory when the first trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying the writing operation on the virtual writing surface set in the virtual space; and
acquires, as the writing information, two-dimensional writing information generated on a real writing surface by a two-dimensional trajectory when the second trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying the writing operation on the real writing surface existing in the real space.
[3] The information processing system according to [1] or [2], wherein the recognition result information includes at least one of text information, image information, and three-dimensional model information in which the writing information is reflected.
[4] The information processing system according to any one of [1] to [3], wherein the sensor detects at least one physical quantity of the speed and the acceleration of the input device, which change with the writing operation, and the processing unit further includes an association unit that associates additional information indicating the physical quantity with the writing information acquired by the input unit.
[5] The information processing system according to [4], wherein the recognition model takes the writing information and the additional information as input and outputs, as the recognition result information, converted writing information that has been converted into an output format matching the user's intent identified from the writing information and the additional information.
[6] The information processing system according to any one of [1] to [5], wherein the output unit outputs the recognition result information to the display device, thereby displaying the recognition result information in the virtual space.
[7] An information processing method comprising:
a step of displaying a virtual space corresponding to a user's position in the surrounding real space;
a step of detecting a trajectory of an input device accompanying a writing operation of the user; and
a step of processing writing information generated by the trajectory of the input device,
wherein the step of processing the writing information includes:
a step of acquiring, when a three-dimensional trajectory of the input device accompanying the writing operation in the virtual space is detected, three-dimensional writing information generated by the three-dimensional trajectory as the writing information;
a step of recognizing the writing information using a recognition model trained by machine learning, the recognition model taking the writing information as input and outputting a recognition result of the writing information; and
a step of outputting recognition result information indicating the recognition result.
[Explanation of Symbols]
[0166] 1...Information processing system, 5...Camera (second trajectory detection sensor), 7...Display (display device), 10, 10A, 10B...User terminal (processing unit), 11...Input unit, 14...Recognition unit, 16...Output unit, 20...Writing instrument (input device), 145...Association unit, 208...Motion sensor (first trajectory detection sensor), D11...Writing information (first input information), D14...3D writing information, D15, D16...2D writing information, D18...Additional information, D30...Recognition result information, M2...Recognition model, SU1...Virtual writing surface, SU2...Real writing surface, U...User, W1...Virtual space, W2...Real space.
Claims
1. An information processing system comprising:
an input device that is held by a user and is movable in conjunction with a writing operation of the user;
a display device that displays a virtual space corresponding to the user's position in the surrounding real space;
a sensor that is disposed on the input device or around the input device and detects a trajectory of the input device accompanying the writing operation; and
a processing unit that is communicably connected to the display device and the sensor and processes writing information generated by the trajectory of the input device detected by the sensor,
wherein the processing unit includes:
an input unit that, when the sensor detects a three-dimensional trajectory of the input device accompanying the writing operation in the virtual space, acquires three-dimensional writing information generated by the three-dimensional trajectory as the writing information;
a recognition unit that recognizes the writing information using a recognition model trained by machine learning, the recognition model taking the writing information as input and outputting a recognition result of the writing information; and
an output unit that outputs recognition result information indicating the recognition result.
2. The information processing system according to claim 1, wherein
the sensor includes:
a first trajectory detection sensor that is disposed on the input device and detects the trajectory of the input device accompanying the writing operation in the virtual space; and
a second trajectory detection sensor that is disposed around the input device and detects the trajectory of the input device accompanying the writing operation on a writing surface in the real space,
the input device is a pen-type device that includes the first trajectory detection sensor and is capable of writing on the writing surface in the real space, and
the input unit:
acquires the three-dimensional writing information as the writing information when the first trajectory detection sensor detects the three-dimensional trajectory of the input device accompanying the writing operation in the virtual space;
acquires, as the writing information, two-dimensional writing information generated on a virtual writing surface by a two-dimensional trajectory when the first trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying the writing operation on the virtual writing surface set in the virtual space; and
acquires, as the writing information, two-dimensional writing information generated on a real writing surface by a two-dimensional trajectory when the second trajectory detection sensor detects the two-dimensional trajectory of the input device accompanying the writing operation on the real writing surface existing in the real space.
3. The information processing system according to claim 1 or 2, wherein the recognition result information includes at least one of text information, image information, and three-dimensional model information in which the writing information is reflected.
4. The information processing system according to claim 1 or 2, wherein the sensor detects at least one physical quantity of the speed and the acceleration of the input device, which change with the writing operation, and the processing unit further includes an association unit that associates additional information indicating the physical quantity with the writing information acquired by the input unit.
5. The information processing system according to claim 4, wherein the recognition model takes the writing information and the additional information as input and outputs, as the recognition result information, converted writing information that has been converted into an output format matching the user's intent identified from the writing information and the additional information.
6. The information processing system according to claim 1 or 2, wherein the output unit outputs the recognition result information to the display device, thereby causing the recognition result information to be displayed in the virtual space.
7. An information processing method comprising:
a step of displaying a virtual space corresponding to a user's position in the surrounding real space;
a step of detecting a trajectory of an input device accompanying a writing operation of the user; and
a step of processing writing information generated by the trajectory of the input device,
wherein the step of processing the writing information includes:
a step of acquiring, when a three-dimensional trajectory of the input device accompanying the writing operation in the virtual space is detected, three-dimensional writing information generated by the three-dimensional trajectory as the writing information;
a step of recognizing the writing information using a recognition model trained by machine learning, the recognition model taking the writing information as input and outputting a recognition result of the writing information; and
a step of outputting recognition result information indicating the recognition result.