Information processing system, information processing method, and program

The information processing system addresses the challenge of generating diverse and consistent motion data by masking and reconstructing motion features based on user intent, effectively reducing costs and improving efficiency in virtual model operations.

WO2026120940A1 · PCT designated stage · Publication Date: 2026-06-11 · SONY GROUP CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SONY GROUP CORP
Filing Date
2025-10-24
Publication Date
2026-06-11

AI Technical Summary

Technical Problem

Existing technologies struggle to generate diverse motion data that reflects user intent and ensures overall consistency, particularly in operating virtual models using motion data.

Method used

An information processing system that includes a masking unit to mask motion features and a reconstruction unit to generate new motion data based on user intent features, using a neural network to reconstruct motion features in masked areas.

Benefits of technology

Generates diverse motion data that reflects user intentions while ensuring overall consistency, reducing the time and cost associated with generating motion data from scratch.

✦ Generated by Eureka AI based on patent content.

Abstract figure: JP2025037417_11062026_PF_FP_ABST

Abstract

[Problem] To generate various motion data items that reflect intentions of users. [Solution] Provided is an information processing system comprising: a mask unit that masks a part of motion features extracted from motion data items; and a reconstruction unit that reconstructs, on the basis of an intention feature extracted from a user instruction, the motion features in a mask region masked by the mask unit.

Description

Information Processing System, Information Processing Method, and Program
【0001】 The present disclosure relates to an information processing system, an information processing method, and a program.
【0002】 In recent years, technologies for operating virtual models using motion data have become widespread. In addition, technologies for generating new motion data from existing motion data have been developed, as disclosed, for example, in Patent Document 1.
【0003】 Patent Document 1: Japanese Patent Application Laid-Open No. 2024-113487
【0004】 However, the technology disclosed in Patent Document 1 merely connects parts of existing motion data, which makes it difficult to generate new and diverse motion data.
【0005】 According to one aspect of the present disclosure, there is provided an information processing system including: a mask unit that masks a part of motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in a mask region masked by the mask unit, based on intent features extracted from an instruction by a user.
【0006】 According to another aspect of the present disclosure, there is provided an information processing method including: masking, by a processor, a part of motion features extracted from motion data; and reconstructing, by the processor, the motion features in the mask region based on intent features extracted from an instruction by a user.
【0007】 According to yet another aspect of the present disclosure, there is provided a program for causing a computer to function as an information processing system including: a mask unit that masks a part of motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in a mask region masked by the mask unit, based on intent features extracted from an instruction by a user.
【0008】 Figure 1 is a diagram illustrating an overview of an information processing method according to one embodiment of the present disclosure.
Figure 2 is a block diagram showing an example of a functional configuration related to the reconstruction of motion data according to the same embodiment.
Figure 3 is a block diagram showing an example of the device configuration of the information processing system 1 according to the same embodiment.
Figure 4 is a block diagram showing an example of the functional configuration of a user terminal 20 according to the same embodiment.
Figure 5 is a block diagram showing an example of the functional configuration of a server 30 according to the same embodiment.
Figure 6 is a flowchart showing an example of the flow of motion data generation according to the same embodiment.
Figure 7 is a diagram showing an example of the configuration of a neural network that generates motion data according to the same embodiment.
Figure 8 is a diagram showing an example of an interface 315 controlled by an interface control unit 310 according to the same embodiment.
Figure 9 is a diagram showing an example of how the results of reconstructing motion data and applying the reconstructed motion data to a model are presented according to the same embodiment.
Figure 10 is a diagram illustrating the interface 315 provided to a head-mounted display according to the same embodiment.
Figure 11 is a block diagram showing an example of the hardware configuration of an information processing device 90 according to the same embodiment.
【0009】 Preferred embodiments of the present disclosure will be described in detail below with reference to the attached drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant descriptions are omitted.
【0010】 In addition, in this specification and the drawings, multiple components having substantially the same functional configuration may be distinguished from one another by appending different letters or other symbols to the same reference numeral.
However, when there is no particular need to distinguish such components from one another, the letters or other symbols may be omitted and only the common reference numeral used, in which case the description applies to all of those components.
【0011】 The description will proceed in the following order.
1. Embodiments
1.1. Overview
1.2. Example Functional Configuration
1.3. Processing Details
2. Example Hardware Configuration
3. Summary
<1. Embodiments>
<<1.1. Overview>>
【0012】 As described above, technologies for operating virtual models using motion data are becoming widespread.
【0013】 Motion data is generated, for example, by capturing human movements with a tracking device. However, generating motion data from scratch entails costs such as time and money.
【0014】 To reduce these costs, technologies have been developed for generating new motion data from existing motion data.
【0015】 However, many existing technologies leave room for improvement in generating diverse motion data, generating motion data that reflects user intent, and generating motion data that is consistent as a whole.
【0016】 The technical concept according to one embodiment of the present disclosure was conceived in view of the above points, and aims to generate diverse motion data that reflects the user's intent while ensuring overall consistency.
【0017】 Figure 1 is a diagram illustrating an overview of an information processing method according to one embodiment of the present disclosure.
【0018】 One feature of the information processing method according to the present embodiment is that it generates new motion data M1, M2, M3, and so on by reconstructing part of existing motion data M0 based on instructions from a user 50.
【0019】 More specifically, the information processing method according to the present embodiment reconstructs part of the features extracted from the existing motion data M0 (also referred to as motion features) based on features extracted from the instructions of the user 50 (also referred to as intent features).
【0020】 As illustrated in Figure 1, the user 50 may give instructions by voice S1. Alternatively, the user 50 may give instructions by text, images, or the like.
【0021】 In addition, the user 50 may change the section of the motion features to be reconstructed using the interface 315 (see Figure 8), which will be described later.
【0022】 Based on the various inputs from the user 50 described above, the information processing system 1 (see Figures 2 to 4) that executes the information processing method according to the present embodiment generates a variety of motion data M1, M2, M3, and so on, which differ in, for example, the reconstructed section, while reflecting the intentions of the user 50.
【0023】 An example of the functional configuration of the information processing system 1 that realizes the above will be described in detail below. The following description mainly uses, as an example, a case in which the information processing system 1 according to the present embodiment generates motion data to be applied to models placed at a live music event held in a metaverse space. However, the information processing system 1 may also generate motion data to be applied to models placed in another virtual space, such as a game space.
<<1.2. Example Functional Configuration>>
【0024】 First, among the components of the information processing system 1 according to the present embodiment, an example of the functional configuration related to the reconstruction of motion data will be described.
【0025】 Figure 2 is a block diagram showing an example of a functional configuration related to the reconstruction of motion data according to the present embodiment.
【0026】 As shown in Figure 2, the information processing system 1 according to the present embodiment includes a voice input unit 112, a text input unit 114, and an image input unit 116.
【0027】 The voice input unit 112, the text input unit 114, and the image input unit 116 are mainly used to input instructions from the user.
【0028】 The voice input unit 112 includes a microphone that collects the user's speech.
【0029】 The text input unit 114 includes a keyboard, a touch panel, or the like that the user uses to input text.
【0030】 The image input unit 116 includes a camera that captures the user's actions.
【0031】 As also shown in Figure 2, the information processing system 1 according to the present embodiment includes a speech recognition unit 122, a text feature extraction unit 124, and an image feature extraction unit 126.
【0032】 The speech recognition unit 122 uses a speech recognizer to convert the speech collected by the voice input unit 112 into text.
【0033】 The text feature extraction unit 124 extracts features (also referred to as text features) from text input via the text input unit 114 or text output by the speech recognition unit 122.
【0034】 The text feature extraction unit 124 may be implemented using, for example, a natural language processing model such as a large language model (LLM).
【0035】 The image feature extraction unit 126 extracts features (also referred to as image features) from images input from the image input unit 116.
【0036】 When the image feature extraction unit 126 receives multiple images in time-series order from the image input unit 116 (for example, the still images constituting a video), it may extract a single image feature from that set of images.
【0037】 The text features extracted by the text feature extraction unit 124 and the image features extracted by the image feature extraction unit 126 can be regarded as features representing the user's intent (also referred to as intent features).
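The single-feature extraction in 【0036】 and the notion of intent features in 【0037】 can be pictured with the minimal Python sketch below. It assumes per-frame image features and text features are already available as plain lists of floats. The mean pooling over frames and the averaging of text and image features are hypothetical choices for illustration, not the aggregation the disclosure specifies, and the names `pool_frame_features` and `intent_feature` are invented here.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def pool_frame_features(frame_features: Sequence[Vector]) -> Vector:
    """Collapse per-frame features (e.g. of a video) into one vector by mean pooling.

    Mean pooling is an assumed stand-in for whatever aggregation unit 126 uses.
    """
    if not frame_features:
        raise ValueError("at least one feature vector is required")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all feature vectors must have the same dimension")
    n = len(frame_features)
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


def intent_feature(text_feature: Optional[Vector], image_feature: Optional[Vector]) -> Vector:
    """Build a hypothetical intent feature from whichever modalities were provided.

    If both text and image features are present they are averaged; this
    fusion rule is an illustrative assumption.
    """
    present = [f for f in (text_feature, image_feature) if f is not None]
    if not present:
        raise ValueError("a text feature or an image feature is required")
    return pool_frame_features(present)
```

A learned model would more likely attend over frames or project both modalities into a shared space; mean pooling is used only because it is the simplest order-independent summary.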
【0038】 For example, the user inputs by text or voice an instruction expressing how they want the motion data to be reconstructed (edited), such as "raise both hands." In this case, the text features can be regarded as features reflecting the user's intent to "raise both hands."
【0039】 The user may also give an instruction containing such an intent through images (video). For example, the user may input an image of a person raising both hands. In this case, the image features can be regarded as features reflecting the user's intent of "raising both hands."
【0040】 As shown in Figure 2, the information processing system 1 according to the present embodiment also includes a motion selection unit 130, a motion clipping unit 132, a motion feature extraction unit 134, a music clipping unit 136, and a music feature extraction unit 138.
【0041】 The motion selection unit 130 is used by the user to specify the motion data that serves as the editing source and a section of that motion data.
【0042】 The motion selection unit 130 includes, for example, an input device such as a mouse, a touch panel, or a keyboard.
【0043】 Based on the input from the motion selection unit 130, the motion clipping unit 132 clips the section specified by the user out of the motion data specified by the user, and outputs the clipped motion data to the motion feature extraction unit 134.
【0044】 The motion feature extraction unit 134 extracts features (also referred to as motion features) from the motion data input from the motion clipping unit 132.
【0045】 Based on the input from the motion selection unit 130, the music clipping unit 136 clips the music data associated with the motion data in the section specified by the user, and outputs the clipped music data to the music feature extraction unit 138.
【0046】 The music feature extraction unit 138 extracts features (also referred to as music features) from the music data input from the music clipping unit 136.
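The clipping in 【0043】 and 【0045】 has to cut the same time section out of two time series that are usually sampled at different rates. A minimal sketch, assuming motion is stored as frames at a fixed frame rate and music as samples (or feature frames) at a fixed rate; `Section`, `clip_by_time`, and `clip_motion_and_music` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Section:
    """A user-selected section, in seconds from the start of the data."""
    start_s: float
    end_s: float


def clip_by_time(samples: Sequence[T], rate_hz: float, section: Section) -> List[T]:
    """Return the samples of a uniformly sampled series that fall inside the section."""
    if section.end_s <= section.start_s:
        raise ValueError("section must have a positive length")
    start = max(0, round(section.start_s * rate_hz))
    end = min(len(samples), round(section.end_s * rate_hz))
    return list(samples[start:end])


def clip_motion_and_music(
    motion_frames: Sequence[T], motion_fps: float,
    music_samples: Sequence[U], music_rate_hz: float,
    section: Section,
) -> Tuple[List[T], List[U]]:
    """Clip the same time section from motion data and its associated music data."""
    return (clip_by_time(motion_frames, motion_fps, section),
            clip_by_time(music_samples, music_rate_hz, section))
```

Converting both boundaries from one shared `Section` keeps the clipped motion and music aligned in time, which is what allows the music features to serve later as reference features for the same span.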
【0047】 The music features reflect properties of the music data such as tone, pitch, timbre, and harmony.
【0048】 As shown in Figure 2, the information processing system 1 according to the present embodiment also includes a model attribute input unit 140.
【0049】 The model attribute input unit 140 is used by the user to specify the attributes of the model to which the reconstructed motion data is applied.
【0050】 The model attribute input unit 140 includes, for example, an input device such as a mouse, a touch panel, or a keyboard.
【0051】 As shown in Figure 2, the information processing system 1 according to the present embodiment further includes a mask region determination unit 152, a motion feature mask unit 154, a motion reconstruction unit 156, and a motion coupling unit 158.
【0052】 The mask region determination unit 152 determines the portion of the motion features to be masked (also referred to as the mask region).
【0053】 For example, the mask region determination unit 152 may determine the mask region based on the model attributes input from the model attribute input unit 140.
【0054】 The mask region determination unit 152 may also determine the mask region based on a degree of change specified by the user.
【0055】 The mask region determination unit 152 may also determine the mask region based on the music features.
【0056】 The mask region determination unit 152 may also determine the mask region randomly.
【0057】 Furthermore, the mask region determination unit 152 may determine the mask region based on a combination of multiple elements, including the model attributes, the degree specified by the user, the music features, and randomness.
【0058】 Determining the mask region based on elements such as those described above increases the variety of the reconstructed motion data.
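【0053】 to 【0058】 list several signals that may drive the mask region. The sketch below combines them in one hypothetical scoring rule, assuming the mask region is chosen in units of feature-vector segments (as in Figure 7): the user's degree of change sets how many segments are masked, a per-segment music feature (for example, loudness) multiplied by a weight derived from the model attributes ranks the segments, and optional random jitter adds variety. The function name, its parameters, and the scoring rule are all assumptions. In the embodiment this role may instead be learned by the mask network 344.

```python
import random
from typing import List, Optional, Sequence


def determine_mask_region(
    degree: float,
    music_energy: Sequence[float],
    attribute_weight: float = 1.0,
    randomness: float = 0.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Return indices of the segments to mask (hypothetical heuristic).

    degree: user-specified degree of change in [0, 1]; sets how many segments are masked.
    music_energy: one music feature per segment (an assumed scalar, e.g. loudness).
    attribute_weight: assumed scalar derived from model attributes (e.g. fan-base type).
    randomness: scale of uniform random jitter added to each segment's score.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must be in [0, 1]")
    num_segments = len(music_energy)
    k = round(degree * num_segments)  # more change -> more masked segments
    if k == 0:
        return []
    rng = random.Random(seed)
    scores = [attribute_weight * e + randomness * rng.random() for e in music_energy]
    # Mask the k highest-scoring segments, e.g. the most energetic parts of the song.
    ranked = sorted(range(num_segments), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

For instance, `determine_mask_region(1 / 3, [0.1, 0.2, 0.9, 0.3, 0.8, 0.1])` masks two of six segments and returns `[2, 4]`, the third and fifth segments, which happens to match the Figure 7 example. Passing a `seed` makes the random variant reproducible.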
【0059】 The motion feature mask unit 154 masks, among the motion features input from the motion feature extraction unit 134, the motion features corresponding to the mask region determined by the mask region determination unit 152.
【0060】 The motion feature mask unit 154 according to the present embodiment is an example of a mask unit that masks part of the motion features extracted from motion data.
【0061】 The motion reconstruction unit 156 (also simply referred to as the reconstruction unit) reconstructs the motion features in the mask region masked by the motion feature mask unit 154 based on the intent features.
【0062】 Reconstructing motion features based on intent features makes it possible to generate motion data that reflects the user's editing intent.
【0063】 The motion reconstruction unit 156 may further reconstruct the motion features in the mask region based on reference features extracted from time-series reference data associated with the motion data.
【0064】 One example of such reference data is music data.
【0065】 At a live music performance in the metaverse, there is a correspondence between the music data (time-series data) of the performers' performance and the motion data (time-series data) applied to the models representing the audience.
【0066】 Thus, reconstructing motion features based on reference features extracted from time-series reference data associated with the motion data makes it possible to generate consistent motion data suited to the target event, such as a live music performance.
【0067】 Other examples of reference data include audio data of the performers speaking into the microphone, call-and-response exchanges between the performers and the audience, and motion data applied to existing models placed in the metaverse space.
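The role of the motion reconstruction unit 156 in 【0061】 to 【0066】 (refill only the masked segments, conditioned on intent features and on time-aligned reference features such as music features) can be shown with a deliberately simple, non-learned stand-in. This sketch assumes all features share one dimension and that one reference feature exists per segment. The blending rule and the `alpha` parameter are hypothetical; an actual embodiment would use a trained network such as the mask filler 346.

```python
from typing import List, Sequence

Vector = List[float]


def reconstruct(
    features: Sequence[Vector],
    mask_region: Sequence[int],
    intent: Vector,
    reference: Sequence[Vector],
    alpha: float = 0.5,
) -> List[Vector]:
    """Replace masked segments with a blend of the intent feature and the
    reference feature of the same segment; unmasked segments pass through.

    alpha weights the intent feature and (1 - alpha) weights the reference feature.
    """
    if len(reference) != len(features):
        raise ValueError("need one reference feature per segment")
    masked = set(mask_region)
    out: List[Vector] = []
    for i, f in enumerate(features):
        if i in masked:
            out.append([alpha * a + (1.0 - alpha) * r for a, r in zip(intent, reference[i])])
        else:
            out.append(list(f))  # outside the mask region: keep the original feature
    return out
```

Because the reference feature differs from segment to segment, the same intent yields different fills at different points in the music, which is a rough picture of how time-series reference data can tie the reconstruction to the event.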
【0068】 The motion coupling unit 158 combines the motion features reconstructed by the motion reconstruction unit 156 with the motion features outside the mask region, and decodes the combined features into motion data.
【0069】 The motion coupling unit 158 may also perform processing such as linear interpolation so that the motion features are joined more smoothly.
【0070】 An example of the functional configuration related to the reconstruction of motion data among the components of the information processing system 1 has been described above.
【0071】 Next, an example of the device configuration of the information processing system 1 will be described. Figure 3 is a block diagram showing an example of the device configuration of the information processing system 1 according to the present embodiment.
【0072】 As shown in Figure 3, the information processing system 1 according to the present embodiment may include a user terminal 20 and a server 30.
【0073】 The user terminal 20 is a terminal used by the user. The user terminal 20 may be, for example, a personal computer (PC), a tablet, or a head-mounted display (HMD).
【0074】 The server 30 is a computer that performs processing such as reconstructing motion data and controlling the interface 315.
【0075】 The user terminal 20 and the server 30 are connected via a network so that they can exchange information with each other.
【0076】 Figure 4 is a block diagram showing an example of the functional configuration of the user terminal 20 according to the present embodiment.
【0077】 As shown in Figure 4, the user terminal 20 according to the present embodiment includes an input unit 210, a display unit 220, a control unit 230, and a communication unit 240.
【0078】 The input unit 210 accepts input from the user. The input unit 210 functions as the voice input unit 112, the text input unit 114, the image input unit 116, the motion selection unit 130, and the model attribute input unit 140 described above.
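The smoothing mentioned in 【0069】 can be illustrated as a linear crossfade at the seam between a kept clip and a reconstructed clip after decoding. This is a minimal sketch under the assumption that both clips are lists of per-frame pose vectors of equal dimension; `crossfade` and `overlap` are hypothetical names, and the disclosure does not specify where, or over how many frames, the interpolation is applied.

```python
from typing import List, Sequence

Vector = List[float]


def crossfade(a: Sequence[Vector], b: Sequence[Vector], overlap: int) -> List[Vector]:
    """Join clip a to clip b, linearly interpolating across `overlap` frames.

    The last `overlap` frames of a are blended with the first `overlap` frames
    of b; the weight on b rises linearly from 1/(overlap+1) to overlap/(overlap+1).
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter clip length")
    head = [list(f) for f in a[:len(a) - overlap]]
    blended = []
    for k in range(overlap):
        t = (k + 1) / (overlap + 1)
        fa, fb = a[len(a) - overlap + k], b[k]
        blended.append([(1.0 - t) * x + t * y for x, y in zip(fa, fb)])
    tail = [list(f) for f in b[overlap:]]
    return head + blended + tail
```

The joined clip is `overlap` frames shorter than the two inputs combined, so a real pipeline would need to account for that when keeping the result aligned with the music.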
【0079】 The display unit 220 displays visual information under the control of the control unit 230. For this purpose, the display unit 220 includes a display.
【0080】 The control unit 230 controls each component of the user terminal 20.
【0081】 The control unit 230 also controls the interface 315 in cooperation with the server 30.
【0082】 The functions of the control unit 230 are realized through the cooperation of various processors and memory.
【0083】 The communication unit 240 exchanges information with the server 30 under the control of the control unit 230.
【0084】 For example, the communication unit 240 transmits information input using the input unit 210 to the server 30.
【0085】 The communication unit 240 also receives, for example, control signals related to the interface 315 from the server 30.
【0086】 Next, an example of the functional configuration of the server 30 according to the present embodiment will be described. Figure 5 is a block diagram showing an example of the functional configuration of the server 30 according to the present embodiment.
【0087】 As shown in Figure 5, the server 30 according to the present embodiment includes an interface control unit 310, a clipping unit 320, a feature extraction unit 330, a motion generation unit 340, a model application unit 350, and a communication unit 360.
【0088】 The interface control unit 310 controls the interface 315 in cooperation with the user terminal 20.
【0089】 Details of the interface 315 according to the present embodiment will be described later.
【0090】 The clipping unit 320 functions as the motion clipping unit 132 and the music clipping unit 136 described above.
【0091】 The feature extraction unit 330 functions as the speech recognition unit 122, the text feature extraction unit 124, the image feature extraction unit 126, the motion feature extraction unit 134, and the music feature extraction unit 138 described above.
【0092】 The motion generation unit 340 functions as the mask region determination unit 152, the motion feature mask unit 154, the motion reconstruction unit 156, and the motion coupling unit 158 described above.
【0093】 The model application unit 350 applies the motion data reconstructed by the motion generation unit 340 to a model.
【0094】 The model application unit 350 may apply the motion data to an existing model, or may generate a new model and apply the motion data to it.
【0095】 The functions of the interface control unit 310, the clipping unit 320, the feature extraction unit 330, the motion generation unit 340, and the model application unit 350 are realized through the cooperation of various processors and memory.
【0096】 The communication unit 360 exchanges information with the user terminal 20 under the control of the interface control unit 310.
【0097】 For example, the communication unit 360 transmits control signals related to the interface 315 to the user terminal 20.
【0098】 The communication unit 360 also receives, for example, various types of information entered by the user from the user terminal 20.
【0099】 An example of the functional configuration of the information processing system 1 according to the present embodiment has been described above. Note that the functional configuration described with reference to Figures 2 to 5 is merely an example, and the functional configuration of the information processing system 1 is not limited to this example.
【0100】 For example, the components shown in Figure 5 may be distributed across multiple computers.
【0101】 Furthermore, although the above examples show the user's intent being indicated by text, voice, or images, the intent may also be indicated by information obtained by capturing the user's actions. Alternatively, the intent may be indicated by specifying other motion data to be used as a reference.
【0102】 The functional configuration of the information processing system 1 according to the present embodiment can be flexibly modified according to specifications and operations.
<<1.3. Processing Details>>
【0103】 Next, the processing performed by the information processing system 1 according to the present embodiment will be described in more detail.
【0104】 First, the flow of motion data generation (reconstruction) by the information processing system 1 according to the present embodiment will be described using an example.
【0105】 Figure 6 is a flowchart showing an example of the flow of motion data generation according to the present embodiment.
【0106】 In the example shown in Figure 6, the information processing system 1 first receives input from the user (S101).
【0107】 In step S101, the information processing system 1 receives information such as an instruction related to the editing intent, the motion data to be used as the basis for reconstruction, the section to be reconstructed, and the attributes of the model to which the reconstructed motion data will be applied.
【0108】 Next, based on the information received in step S101, the information processing system 1 performs intent feature extraction (S102), motion feature extraction (S103), and music feature extraction (S104).
【0109】 The information processing system 1 also determines the mask region (S105). The information processing system 1 may determine the mask region based on, for example, the model attributes received in step S101 and the music features extracted in step S104.
【0110】 Next, the information processing system 1 generates (reconstructs) motion data (S106) based on, for example, the intent features extracted in step S102, the music features extracted in step S104, and the mask region determined in step S105.
【0111】 Next, the information processing system 1 controls the presentation of the motion data generated in step S106 (S107).
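Steps S101 to S107 of Figure 6 can be wired together as below. This is a minimal sketch in which each step is an injected callable, so any extractor or generator can be plugged in; `UserInput`, `GenerationPipeline`, and all field and parameter names are assumptions, not names from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class UserInput:
    """Information received in step S101 (field names are hypothetical)."""
    instruction: str                   # instruction expressing the editing intent
    motion: List[List[float]]          # motion data used as the basis for reconstruction
    section: Tuple[float, float]       # section to reconstruct, in seconds
    model_attributes: Dict[str, Any]   # attributes of the target model(s)


@dataclass
class GenerationPipeline:
    extract_intent: Callable[[str], Any]
    extract_motion: Callable[[List[List[float]], Tuple[float, float]], Any]
    extract_music: Callable[[Tuple[float, float]], Any]
    determine_mask: Callable[[Dict[str, Any], Any], Any]
    generate: Callable[[Any, Any, Any, Any], Any]
    present: Callable[[Any], Any]

    def run(self, inp: UserInput) -> Any:
        intent = self.extract_intent(inp.instruction)                    # S102
        motion_feats = self.extract_motion(inp.motion, inp.section)      # S103
        music_feats = self.extract_music(inp.section)                    # S104
        mask = self.determine_mask(inp.model_attributes, music_feats)    # S105
        result = self.generate(motion_feats, mask, intent, music_feats)  # S106
        return self.present(result)                                      # S107
```

Keeping each step injectable fits the remark in 【0099】 to 【0102】 that the functional configuration is only an example and may be distributed or modified; the same `run` method works whether a step is a local function or a call to a remote server.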
【0112】 An example of the flow of motion data generation (reconstruction) by the information processing system 1 according to the present embodiment has been described above.
【0113】 Next, the generation of motion data according to the present embodiment will be described in more detail.
【0114】 The generation of motion data according to the present embodiment may be realized using a neural network.
【0115】 Figure 7 is a diagram showing an example of the configuration of a neural network that generates motion data according to the present embodiment.
【0116】 As shown in Figure 7, the motion generation unit 340 may include an encoder 342, a mask network 344, a mask filler 346, and a decoder 348.
【0117】 The encoder 342 is a network that encodes the input motion data M0 into feature vectors.
【0118】 The encoder 342 corresponds to the motion feature extraction unit 134 described above.
【0119】 The encoder 342 may, for example, divide the motion data M0, which is time-series data, into predetermined time intervals and convert each interval into a feature vector.
【0120】 In the example shown in Figure 7, the encoder 342 converts the motion data M0 into six feature vectors.
【0121】 The mask network 344 is a network that determines a mask region and masks it.
【0122】 The mask network 344 corresponds to the mask region determination unit 152 and the motion feature mask unit 154 described above.
【0123】 The mask network 344 determines the mask region based on the input model attributes. In doing so, the mask network 344 may determine the mask region in units of the feature vectors produced by the encoder 342.
【0124】 In the example shown in Figure 7, the mask network 344 determines the third and fifth feature vectors from the top as the mask region and masks these feature vectors.
【0125】 In Figure 7, unmasked feature vectors are shown with a dot pattern, and masked feature vectors are shown filled in solid.
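The encoder and masking steps of Figure 7 (【0119】 to 【0124】) can be imitated without any learning: split the time series into fixed-length intervals, summarize each interval as one feature vector, then overwrite the vectors in the mask region with a mask token. This is a hypothetical stand-in. The per-interval mean replaces the learned encoder 342, and the zero-valued mask token and the names `encode_segments` and `apply_mask` are assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def encode_segments(motion: Sequence[Vector], frames_per_segment: int) -> List[Vector]:
    """Stand-in for encoder 342: one feature vector per fixed-length time interval.

    Each interval is summarized by its per-dimension mean; a trailing partial
    interval is kept and averaged over the frames it actually contains.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    if not motion:
        return []
    dim = len(motion[0])
    features = []
    for start in range(0, len(motion), frames_per_segment):
        chunk = motion[start:start + frames_per_segment]
        features.append([sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)])
    return features


def apply_mask(features: Sequence[Vector], mask_region: Sequence[int], mask_token: Vector) -> List[Vector]:
    """Stand-in for the masking half of mask network 344: replace masked vectors with a token."""
    masked = set(mask_region)
    return [list(mask_token) if i in masked else list(f) for i, f in enumerate(features)]
```

With 12 one-dimensional frames and two frames per interval this yields six feature vectors, and masking indices 2 and 4 corresponds to the third and fifth vectors from the top in Figure 7.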
【0126】 The ability of the mask network 344 to determine the mask region may be acquired through training that involves human evaluation of input model attributes and the mask regions output for them.
【0127】 The mask filler 346 is a network that reconstructs the motion features in the mask region based on the intent features, the music features, and the like.
【0128】 The mask filler 346 corresponds to the motion reconstruction unit 156 described above.
【0129】 In the example shown in Figure 7, the mask filler 346 reconstructs the masked third and fifth feature vectors from the top.
【0130】 In Figure 7, the reconstructed feature vectors are indicated by diagonal hatching.
【0131】 The mask filler 346 may be implemented using, for example, a recurrent neural network (RNN), the self-attention mechanism of a Transformer, or the like.
【0132】 The decoder 348 is a network that decodes feature vectors into new motion data Mn.
【0133】 The decoder 348 corresponds to the motion coupling unit 158 described above.
【0134】 In the example shown in Figure 7, the decoder 348 converts the six feature vectors, including the reconstructed ones, into motion data Mn.
【0135】 An example of the configuration of a neural network that generates motion data according to the present embodiment has been described above.
【0136】 This network configuration makes it possible to generate diverse motion data that reflects the user's intent.
【0137】 Note that the network configuration described above with reference to Figure 7 is merely an example, and the network configuration according to the present embodiment is not limited to this example.
【0138】 For example, the motion generation unit 340 may reconstruct motion data using generative AI. In this case, the user's intent may be given to the generative AI as a prompt.
【0139】 Next, a specific example of the interface 315 according to the present embodiment will be described.
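【0131】 names the self-attention mechanism of a Transformer as one option for the mask filler 346. The following untrained, single-head sketch shows only the data flow: masked slots are seeded with the intent feature (an assumed conditioning scheme), each masked slot then attends over all slots with scaled dot-product attention using identity projections, and a toy decoder repeats each feature vector for a fixed number of frames to stand in for the decoder 348. Everything here, including `attention_fill` and `decode_segments`, is hypothetical and omits the learned weights a real model would have.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fill(features: Sequence[Vector], mask_region: Sequence[int], intent: Vector) -> List[Vector]:
    """Fill masked slots by single-head scaled dot-product self-attention.

    Queries, keys, and values are the slot vectors themselves (identity
    projections); masked slots start from the intent feature.
    """
    masked = set(mask_region)
    tokens = [list(intent) if i in masked else list(f) for i, f in enumerate(features)]
    scale = 1.0 / math.sqrt(len(intent))
    out: List[Vector] = []
    for i, query in enumerate(tokens):
        if i not in masked:
            out.append(query)  # unmasked slots are passed through unchanged
            continue
        weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in tokens])
        out.append([sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(len(query))])
    return out


def decode_segments(features: Sequence[Vector], frames_per_segment: int) -> List[Vector]:
    """Toy stand-in for decoder 348: hold each feature vector for a fixed number of frames."""
    return [list(f) for f in features for _ in range(frames_per_segment)]
```

Because the keys include the unmasked neighbors, each filled vector mixes the intent with the surrounding motion context, which is the intuition behind reconstructing a masked region so that it stays consistent with the rest of the sequence.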
【0140】 Figure 8 is a diagram showing an example of the interface 315 controlled by the interface control unit 310 according to the present embodiment.
【0141】 As shown in Figure 8, a preview 500 is displayed on the interface 315.
【0142】 The preview 500 is a time-series rendering of the content of a live music performance in the metaverse space.
【0143】 Using bars, buttons, and other controls arranged with the preview 500, the user may perform operations such as starting and stopping playback and specifying the playback position.
【0144】 The user may also select the motion data to be used for motion feature extraction, that is, the motion data serving as the basis for reconstruction, for example by clicking an existing model 510 included in the preview 500.
【0145】 In the example shown in Figure 8, the user selects the motion data applied to a model 510B by clicking the model 510B included in the preview 500.
【0146】 In this way, the interface control unit 310 according to the present embodiment accepts, via the interface 315, the selection of the motion data from which motion features are to be extracted.
【0147】 The interface 315 displays the motion data 520 selected by the user and music data 530 associated with the motion data 520.
【0148】 The user may select the section from which motion features are to be extracted by specifying a range on the motion data 520, the music data 530, or the like.
【0149】 In the example shown in Figure 8, the section selected by the user is indicated by a thick line.
【0150】 In this way, the interface control unit 310 according to the present embodiment accepts, via the interface 315, the selection of the section of the motion data from which motion features are to be extracted.
【0151】 The interface 315 also includes fields, icons, and other components with which the user inputs instructions related to their intent.
【0152】 A field 542 is used by the user to enter text indicating their intent.
【0153】 An icon 544 is used by the user to input voice indicating their intent.
【0154】 An icon 546 is used by the user to input an image indicating their intent.
【0155】 In this way, the interface control unit 310 according to the present embodiment accepts, via the interface 315, input of instructions related to the user's intent.
【0156】 The interface 315 also includes a component for specifying the degree of change applied when reconstructing the motion data.
【0157】 The indicator 550 shown in Figure 8 is an example of such a component.
【0158】 In the example shown in Figure 8, by operating the indicator 550, the user may specify how much the reconstructed motion data is to differ from the motion data 520.
【0159】 In this way, the interface control unit 310 according to the present embodiment accepts, via the interface 315, a specification of the degree of change in the reconstruction.
【0160】 When the user specifies a degree of change, the motion generation unit 340 determines the mask region based at least on the specified degree of change.
【0161】 The interface 315 also includes components such as fields and checkboxes for specifying the attributes of the model to which the reconstructed motion data is applied.
【0162】 A region 560 in Figure 8 is an example of a region in which such components are arranged.
【0163】 In the example shown in Figure 8, the user may be able to specify, for each type of fan base (core fans, casual fans, first-time viewers, companions, etc.), the number of models to which the reconstructed motion data is applied.
【0164】 The user may also be able to specify the emotion of the models to which the reconstructed motion data is applied.
【0165】 Furthermore, the user may be able to specify the positions at which the models to which the reconstructed motion data is applied are placed.
【0166】 Note that the fan base type, emotion, and placement of a model are examples of model attributes.
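The attribute specification in 【0163】 to 【0166】 (a count per fan-base type, an emotion, and a placement) maps naturally onto a small data structure that can be expanded into individual models for the model application unit 350. In this sketch `ModelSpec`, `ModelInstance`, `expand_models`, the rectangular placement area, and the even left-to-right spacing are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ModelSpec:
    counts_by_fan_base: Dict[str, int]  # e.g. {"core": 2, "casual": 1}
    emotion: str = "neutral"
    area: Tuple[float, float, float, float] = (0.0, 0.0, 10.0, 10.0)  # x0, y0, x1, y1


@dataclass
class ModelInstance:
    fan_base: str
    emotion: str
    position: Tuple[float, float]


def expand_models(spec: ModelSpec) -> List[ModelInstance]:
    """Create one model per requested count, spaced evenly along the middle of the area."""
    if any(n < 0 for n in spec.counts_by_fan_base.values()):
        raise ValueError("counts must be non-negative")
    x0, y0, x1, y1 = spec.area
    total = sum(spec.counts_by_fan_base.values())
    fan_bases = [fb for fb, n in spec.counts_by_fan_base.items() for _ in range(n)]
    step = (x1 - x0) / (total + 1)
    y_mid = (y0 + y1) / 2
    return [
        ModelInstance(fb, spec.emotion, (x0 + (i + 1) * step, y_mid))
        for i, fb in enumerate(fan_bases)
    ]
```

Further attributes such as gender, age, or social status could be added as extra fields on `ModelSpec` in the same way.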
【0167】 Other examples of model attributes include gender, age, and social status (including occupation). 【0168】 Thus, the interface control unit 310 according to this embodiment receives, via the interface 315, specifications such as the attributes, number, and positions of the models to which the reconstructed motion data will be applied. 【0169】 Furthermore, the interface 315 includes components with which the user instructs execution of motion reconstruction. 【0170】 The button 570 shown in Figure 8 is an example of such a component. 【0171】 When the user presses button 570, the interface control unit 310 acquires the information entered on the interface 315. 【0172】 The cropping unit 320, feature extraction unit 330, motion generation unit 340, and model application unit 350 then perform their processing based on the information acquired via the interface 315. 【0173】 Furthermore, the interface control unit 310 controls the presentation of the reconstructed motion data and of the results of applying the reconstructed motion data to the models. 【0174】 Figure 9 shows an example of motion data reconstructed according to this embodiment and the result of applying the reconstructed motion data to models. 【0175】 The upper part of Figure 9 shows a preview 500A before the motion data is reconstructed, and the lower part shows a preview 500B after the motion data has been reconstructed and applied to the models. 【0176】 Thus, the interface control unit 310 may reflect these results in the preview 500 and present a metaverse space (an example of a virtual space) in which the models to which the reconstructed motion data has been applied are placed. 【0177】 With an interface 315 such as the one illustrated above, users can work while checking both the reconstructed motion data and the virtual space in which the models using it are placed, which greatly improves work efficiency.
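Paragraph 【0172】 lists the units that act on the information acquired when button 570 is pressed, but not their internals. The following is a minimal sketch under stated assumptions: the units run in the listed order, motion data is a frame sequence at a fixed frame rate, and only text intent input (field 542) is handled. It collects the interface inputs into one request, crops the selected section (the role of cropping unit 320), and hands the result to injected stand-ins for units 330, 340, and 350. `ReconstructionRequest`, `crop_section`, `Pipeline`, and all field names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class ReconstructionRequest:
    """Inputs read from interface 315 when button 570 is pressed (hypothetical fields)."""
    motion_frames: Sequence[Any]   # motion data of the clicked model, e.g. model 510B
    fps: float                     # assumed fixed frame rate of that motion data
    section: Tuple[float, float]   # selected range (start, end) in seconds
    intent_text: str               # text from field 542; voice/image inputs omitted
    degree: float                  # indicator 550, assumed to lie in [0, 1]
    model_counts: Dict[str, int] = field(default_factory=dict)  # per fan-base type (region 560)


def crop_section(frames: Sequence[Any], fps: float, start_s: float, end_s: float) -> List[Any]:
    """Stand-in for cropping unit 320: keep frames whose time i / fps lies in [start_s, end_s)."""
    if end_s <= start_s:
        raise ValueError("section end must come after its start")
    first = max(0, math.ceil(start_s * fps))
    last = min(len(frames), math.ceil(end_s * fps))
    return list(frames[first:last])


@dataclass
class Pipeline:
    """Runs the units in the listed order; each learned stage is injected as a callable."""
    extract: Callable[[List[Any], str], Tuple[Any, Any]]  # unit 330 -> (motion feats, intent feats)
    generate: Callable[[Any, Any, float], Any]            # unit 340
    apply: Callable[[Any, Dict[str, int]], Any]           # unit 350

    def run(self, req: ReconstructionRequest) -> Any:
        start_s, end_s = req.section
        cropped = crop_section(req.motion_frames, req.fps, start_s, end_s)
        motion_feat, intent_feat = self.extract(cropped, req.intent_text)
        new_motion = self.generate(motion_feat, intent_feat, req.degree)
        return self.apply(new_motion, req.model_counts)
```

Injecting the three later stages as callables keeps the processing order explicit without committing to a particular feature extractor or generative model, which the description leaves open.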
【0178】 Note that the interface 315 described with reference to Figure 8 is merely an example, and the functions of the interface control unit 310 according to this embodiment are not limited to the control described above. 【0179】 For example, the interface control unit 310 may present one or more example pairs, each consisting of an attribute set (either preconfigured or randomly selected) and a virtual space generated based on that set. 【0180】 In this case, users can intuitively understand the relationship between an attribute set and the generated result, and can reflect the generated result in the virtual space simply by selecting their preferred set. 【0181】 Furthermore, Figure 8 shows an example in which the interface 315 is provided to a user terminal 20 such as a PC or tablet. 【0182】 The interface control unit 310 can also provide the interface 315 to a user terminal 20 that is a head-mounted display. 【0183】 Figure 10 is a diagram illustrating the interface 315 provided to the head-mounted display according to this embodiment. 【0184】 As shown in Figure 10, the interface control unit 310 causes the preview 500 to be displayed on the user terminal 20, which is a head-mounted display. 【0185】 This allows user 50 to enter the metaverse space, perform editing work there, and grasp the results of the editing more realistically. 【0186】 In this case, user 50 may input the editing content by utterance S0 or by using the virtual panel 600. The virtual panel 600 has components equivalent to those illustrated in Figure 8. 【0187】 Furthermore, if a tracking device can capture the actions of user 50, user 50 may also input their intent through their own actions. 【0188】 When user 50 performs an action along with an utterance S0 such as "make this kind of movement," the feature extraction unit 330 extracts intent features from the information on user 50's actions captured by the tracking device.
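Paragraph 【0179】 mentions presenting example pairs of a randomly selected attribute set and the virtual space generated from it, without saying how the sets are drawn. A minimal sketch, assuming each set holds a model count per fan-base type (the types listed for region 560) plus one emotion label; the emotion labels, the count range, and the name `random_attribute_sets` are hypothetical. Generating the paired virtual space from each set is outside the sketch.

```python
import random
from typing import Dict, List

# Fan-base types listed for region 560 in Figure 8.
FAN_BASES = ["core", "casual", "first-time viewer", "companion"]
# Hypothetical emotion labels; the description does not enumerate them.
EMOTIONS = ["excited", "calm", "moved"]


def random_attribute_sets(n: int, max_models: int, seed: int = 0) -> List[Dict[str, object]]:
    """Draw n attribute sets; a fixed seed makes every set reproducible."""
    if n < 0 or max_models < 0:
        raise ValueError("n and max_models must be non-negative")
    rng = random.Random(seed)
    sets: List[Dict[str, object]] = []
    for _ in range(n):
        # Number of models per fan-base type, inclusive range [0, max_models].
        counts = {fan_base: rng.randint(0, max_models) for fan_base in FAN_BASES}
        sets.append({"model_counts": counts, "emotion": rng.choice(EMOTIONS)})
    return sets
```

Keeping the seed alongside each preview lets the system regenerate exactly the set a user picks, so selecting a preferred pair can be turned into a concrete reconstruction request.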
【0189】 The processing performed by the information processing system 1 according to this embodiment has been described above. With the information processing system 1, motion data and models used in live music performances and similar events in the metaverse space can be edited easily. 【0190】 Furthermore, the information processing system 1 can also generate motion data during a live music performance and apply it to the models. 【0191】 In this case, the information processing system 1 reconstructs, for example, the motion data for the interval from M to N seconds after the current time. 【0192】 Here, the information processing system 1 may use, as reference data, music data for the interval from M to N seconds ahead, estimated from the music currently playing or from a preset progression scenario. Alternatively, the information processing system 1 may use motion data from O to P seconds before the current time as reference data. 【0193】 As the model attributes, the attributes preconfigured for the models to be reconstructed may be used. 【0194】 In this case, the intent features may also be extracted from information on the performer's actions, speech, emotions, and the like. 【0195】 For example, at real-world concerts, performers often ask the audience to raise their hands or wave them in time with the music. 【0196】 The information processing system 1 can estimate such requests from the performer, i.e., the performer's intent, and extract intent features from the information used for the estimation, thereby enabling the models to perform motions reflecting the performer's intent M to N seconds later. 【0197】 Furthermore, the information processing system 1 may select the models to be reconstructed based on the performer's gaze point, field of view, and the like.
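Paragraphs 【0191】 to 【0198】 describe the live mode only qualitatively: M, N, O, and P have no stated values, and the field-of-view test is not defined. A minimal sketch under stated assumptions: times are in seconds, O < P so the reference window runs from P seconds before now to O seconds before now, model positions are 2-D floor-plane coordinates, the field of view is a horizontal cone around the performer's facing direction, and a model is reconstructed only if it is both inside that cone and within a radius of the gaze point (models failing either test keep their current motion). All names and parameters are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def target_window(now_s: float, m_s: float, n_s: float) -> Tuple[float, float]:
    """Interval from M to N seconds after now, whose motion is reconstructed ahead of time."""
    if not 0.0 <= m_s < n_s:
        raise ValueError("expected 0 <= M < N")
    return (now_s + m_s, now_s + n_s)


def past_reference_window(now_s: float, o_s: float, p_s: float) -> Tuple[float, float]:
    """Interval from P to O seconds before now (assumes O < P), used as reference motion."""
    if not 0.0 <= o_s < p_s:
        raise ValueError("expected 0 <= O < P")
    return (now_s - p_s, now_s - o_s)


@dataclass
class PlacedModel:
    name: str
    x: float  # floor-plane position in the virtual space (assumed 2-D)
    y: float


def select_models(
    models: Sequence[PlacedModel],
    performer: Tuple[float, float],
    facing_deg: float,
    fov_deg: float,
    gaze: Tuple[float, float],
    gaze_radius: float,
) -> List[str]:
    """Return names of models that are inside the field of view and near the gaze point."""
    px, py = performer
    chosen = []
    for m in models:
        bearing = math.degrees(math.atan2(m.y - py, m.x - px))
        offset = (bearing - facing_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        in_view = abs(offset) <= fov_deg / 2.0
        near_gaze = math.hypot(m.x - gaze[0], m.y - gaze[1]) <= gaze_radius
        if in_view and near_gaze:
            chosen.append(m.name)
    return chosen
```

Restricting reconstruction to the selected subset is what reduces the processing load; the windows would be recomputed each time `now_s` advances.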
【0198】 For example, the information processing system 1 may choose to reconstruct models close to the performer's gaze point and models located within the performer's field of view, and may choose not to reconstruct models far from the performer's gaze point and models located outside the performer's field of view. 【0199】 The control methods described above can reduce the processing load required to reconstruct motion data. 【0200】 <2. Hardware Configuration Example> Next, a hardware configuration example common to the user terminal 20 and the server 30 according to one embodiment of the present disclosure will be described. 【0201】 Figure 11 is a block diagram showing an example of the hardware configuration of an information processing device 90 according to one embodiment of the present disclosure. The information processing device 90 may be a device having a hardware configuration equivalent to that of each of the above-described devices. 【0202】 As shown in Figure 11, the information processing device 90 includes, for example, a processor 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. Note that the hardware configuration shown here is just an example, and some of the components may be omitted. Furthermore, the information processing device 90 may include components other than those shown here. 【0203】 (Processor 871) The processor 871 functions, for example, as an arithmetic processing unit or a control unit, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or a removable storage medium 901. 【0204】 (ROM 872, RAM 873) ROM 872 stores programs read by the processor 871 and data used for calculations.
RAM 873 temporarily or permanently stores, for example, programs read by the processor 871 and various parameters that change as appropriate when those programs are executed. 【0205】 (Host bus 874, bridge 875, external bus 876, interface 877) The processor 871, ROM 872, and RAM 873 are interconnected via, for example, the host bus 874, which is capable of high-speed data transmission. The host bus 874 is in turn connected, for example via the bridge 875, to the external bus 876, which has a relatively low data transmission speed. The external bus 876 is also connected to various components via the interface 877. 【0206】 (Input device 878) The input device 878 may include, for example, a mouse, keyboard, touch panel, buttons, switches, and levers. Furthermore, the input device 878 may also include a remote controller (hereinafter referred to as a remote control) capable of transmitting control signals using infrared rays or other radio waves. The input device 878 may also include an audio input device such as a microphone. 【0207】 (Output device 879) The output device 879 is a device that can visually or audibly notify the user of acquired information, for example a display device such as a CRT (Cathode Ray Tube), LCD, or organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimuli. 【0208】 (Storage 880) Storage 880 is a device for storing various types of data. Examples of devices used as the storage 880 include magnetic storage devices such as hard disk drives (HDDs), semiconductor storage devices, optical storage devices, and magneto-optical storage devices.
【0209】 (Drive 881) Drive 881 is a device that reads information recorded on a removable storage medium 901, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, or writes information to the removable storage medium 901. 【0210】 (Removable storage medium 901) The removable storage medium 901 is, for example, DVD media, Blu-ray® media, HD DVD media, or various semiconductor storage media. The removable storage medium 901 may also be, for example, an IC card equipped with a contactless IC chip, or an electronic device. 【0211】 (Connection port 882) Connection port 882 is a port for connecting an external connected device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal. 【0212】 (External connected device 902) The external connected device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder. 【0213】 (Communication device 883) The communication device 883 is a communication device for connecting to a network, and is, for example, a communication card for wired or wireless LAN, Bluetooth®, or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communication. 【0214】 <3. Summary> As described above, the information processing system 1 according to one embodiment of the present disclosure includes a motion feature masking unit 154 that masks a part of the motion features extracted from motion data, and a motion reconstruction unit 156 that reconstructs the motion features in the mask region masked by the motion feature masking unit 154 based on intent features extracted from an instruction by a user. 【0215】 The above configuration makes it possible to generate diverse motion data that reflects the user's intentions.
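Paragraph 【0214】 names the two core units, and paragraph 【0160】 adds that the mask region can depend on a user-specified degree of change. A minimal sketch of that data flow follows, under explicit assumptions: motion features are per-frame float vectors, the degree of change maps linearly to the fraction of masked frames, which form one contiguous span, and the neural network that the description uses for reconstruction is replaced by a toy stand-in (linear interpolation from unmasked neighbours, then a fixed blend toward the intent feature vector). All function names and the `weight` parameter are hypothetical; this is not the disclosed implementation.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def mask_from_degree(n_frames: int, degree: float, rng: Optional[random.Random] = None) -> List[bool]:
    """Stand-in for mask region determination unit 152 (True = frame is masked).

    Assumes degree in [0, 1] maps linearly to the fraction of masked frames
    and that the masked frames form one contiguous span.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    n_masked = round(degree * n_frames)
    if rng is None:
        start = (n_frames - n_masked) // 2           # centre the span
    else:
        start = rng.randint(0, n_frames - n_masked)  # random placement for variety
    return [start <= i < start + n_masked for i in range(n_frames)]


def mask_features(features: Sequence[Vector], mask: Sequence[bool]) -> List[Optional[Vector]]:
    """Stand-in for motion feature mask unit 154: masked frames become None."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [None if m else list(f) for f, m in zip(features, mask)]


def reconstruct(masked: Sequence[Optional[Vector]], intent: Vector, weight: float = 0.5) -> List[Vector]:
    """Toy stand-in for motion reconstruction unit 156 (not a neural network).

    Each masked frame is linearly interpolated from its nearest unmasked
    neighbours and then blended toward the intent feature vector.
    """
    known = [i for i, f in enumerate(masked) if f is not None]
    if not known:
        raise ValueError("at least one frame must remain unmasked")
    out: List[Vector] = []
    for i, f in enumerate(masked):
        if f is not None:
            out.append(list(f))  # unmasked frames are kept as-is
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            base = list(masked[right])
        elif right is None:
            base = list(masked[left])
        else:
            t = (i - left) / (right - left)
            base = [a + t * (b - a) for a, b in zip(masked[left], masked[right])]
        out.append([(1.0 - weight) * b + weight * c for b, c in zip(base, intent)])
    return out
```

Only the masked span differs from the source motion, and passing a seeded `random.Random` to `mask_from_degree` moves the span, which is one simple way to obtain several different candidates from the same source. With `degree=1.0` every frame is masked and this toy `reconstruct` raises an error; a learned generator conditioned on the intent features would not need unmasked neighbours.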
【0216】 While preferred embodiments of the present disclosure have been described in detail above with reference to the attached drawings, the technical scope of the present disclosure is not limited to such examples. It is clear to any person with ordinary skill in the art of the present disclosure that various modifications or alterations may be conceived within the scope of the technical ideas described in the claims, and these will naturally also fall within the technical scope of the present disclosure. 【0217】Furthermore, each step of the processing described in this disclosure does not necessarily have to be processed chronologically in the order shown in the flowchart or sequence diagram. For example, each step of the processing for each device may be processed in an order different from the order described, or may be processed in parallel. 【0218】 Furthermore, the series of processes performed by each device described in this disclosure may be implemented by a program stored on a non-transitory computer-readable storage medium. Each program is, for example, loaded into RAM when executed by a computer and executed by a processor such as a CPU. The storage medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, or flash memory. Alternatively, the program may be distributed without using a storage medium, for example, via a network. 【0219】 Furthermore, the effects described herein are merely descriptive or illustrative and not limiting. In other words, the technology relating to this disclosure may produce other effects that are obvious to those skilled in the art from the description herein, in addition to or instead of the effects described herein. 
【0220】 Furthermore, the following configurations also fall within the technical scope of the present disclosure.

(1) An information processing system comprising: a masking unit that masks a portion of motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in the mask region masked by the masking unit based on intent features extracted from a user instruction.

(2) The information processing system according to (1), wherein the reconstruction unit further reconstructs the motion features in the mask region based on reference features extracted from time-series reference data associated with the motion data.

(3) The information processing system according to (2), wherein the reference data includes music data.

(4) The information processing system according to any one of (1) to (3), further comprising a mask region determination unit that determines the mask region.

(5) The information processing system according to (4), wherein the mask region determination unit determines the mask region based on attributes of a model to which the reconstructed motion data is applied.

(6) The information processing system according to (4) or (5), wherein the mask region determination unit determines the mask region based on a degree specified by the user.

(7) The information processing system according to any one of (1) to (6), wherein the user instruction is given by text, voice, or an image.

(8) The information processing system according to any one of (1) to (7), further comprising an application unit that applies the reconstructed motion data to a model.

(9) The information processing system according to any one of (1) to (8), further comprising an interface control unit that controls an interface that receives the user instruction and presents the reconstructed motion data.

(10) The information processing system according to (9), wherein the interface control unit accepts a selection of the motion data from which the motion features are to be extracted.

(11) The information processing system according to (9) or (10), wherein the interface control unit accepts a selection of a section of the motion data from which the motion features are to be extracted.

(12) The information processing system according to any one of (9) to (11), wherein the interface control unit accepts a specification of attributes of a model to which the reconstructed motion data is to be applied.

(13) The information processing system according to any one of (9) to (12), wherein the interface control unit accepts a specification of a position at which a model to which the reconstructed motion data is to be applied is to be placed.

(14) The information processing system according to any one of (9) to (13), wherein the interface control unit accepts a specification of the number of models to which the reconstructed motion data is to be applied.

(15) The information processing system according to any one of (9) to (14), wherein the interface control unit accepts a specification of a degree of change for the reconstructed motion data.

(16) The information processing system according to any one of (9) to (15), wherein the interface control unit controls presentation of a virtual space in which a model to which the reconstructed motion data has been applied is placed.

(17) The information processing system according to (16), wherein the interface control unit provides the interface to a head-mounted display.

(18) The information processing system according to (16), wherein the virtual space includes a metaverse space.

(19) An information processing method comprising: masking, by a processor, a portion of motion features extracted from motion data; and reconstructing, by the processor, the motion features in the masked region based on intent features extracted from a user instruction.

(20) A program that causes a computer to function as an information processing system comprising: a masking unit that masks a portion of motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in the mask region masked by the masking unit based on intent features extracted from a user instruction.

【0221】 1 Information Processing System; 124 Text Feature Extraction Unit; 126 Image Feature Extraction Unit; 134 Motion Feature Extraction Unit; 138 Music Feature Extraction Unit; 152 Mask Region Determination Unit; 154 Motion Feature Mask Unit; 156 Motion Reconstruction Unit; 158 Motion Coupling Unit; 20 User Terminal; 30 Server; 310 Interface Control Unit; 315 Interface; 320 Cropping Unit; 330 Feature Extraction Unit; 340 Motion Generation Unit; 350 Model Application Unit

Claims

1. An information processing system comprising: a masking unit that masks a portion of the motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in the masked area based on intent features extracted from user instructions.

2. The information processing system according to claim 1, wherein the reconstruction unit further reconstructs the motion features in the mask region based on reference features extracted from time-series reference data associated with the motion data.

3. The information processing system according to claim 2, wherein the reference data includes music data.

4. The information processing system according to claim 1, further comprising a mask region determination unit for determining the mask region.

5. The information processing system according to claim 4, wherein the mask region determination unit determines the mask region based on the attributes of the model to which the motion data to be reconstructed is applied.

6. The information processing system according to claim 4, wherein the mask region determination unit determines the mask region based on a degree specified by the user.

7. The information processing system according to claim 1, wherein the user's instructions are provided by text, voice, or images.

8. The information processing system according to claim 1, further comprising an application unit that applies reconstructed motion data to a model.

9. The information processing system according to claim 1, further comprising: an interface control unit that controls an interface that receives instructions from the user and presents reconstructed motion data.

10. The information processing system according to claim 9, wherein the interface control unit accepts the selection of motion data to be used for the extraction of motion features.

11. The information processing system according to claim 9, wherein the interface control unit accepts the selection of a section in the motion data from which motion features are to be extracted.

12. The information processing system according to claim 9, wherein the interface control unit receives a specification of the attributes of the model to which the reconstructed motion data is to be applied.

13. The information processing system according to claim 9, wherein the interface control unit receives a specification of a position at which the model to which the reconstructed motion data is to be applied is to be placed.

14. The information processing system according to claim 9, wherein the interface control unit accepts a specification of the number of models to which the reconstructed motion data is to be applied.

15. The information processing system according to claim 9, wherein the interface control unit accepts a specification of the degree to which the motion data to be reconstructed should be changed.

16. The information processing system according to claim 9, wherein the interface control unit controls the presentation of a virtual space in which a model to which reconstructed motion data has been applied is placed.

17. The information processing system according to claim 16, wherein the interface control unit provides the interface to a head-mounted display.

18. The information processing system according to claim 16, wherein the virtual space includes a metaverse space.

19. An information processing method comprising: a processor masking a portion of motion features extracted from motion data; and reconstructing the motion features in the masked area based on intent features extracted from user instructions.

20. A program that causes a computer to function as an information processing system comprising: a masking unit that masks a portion of motion features extracted from motion data; and a reconstruction unit that reconstructs the motion features in the masked area based on intent features extracted from user instructions.