Image generation method and image generation program

By changing the avatar image's color based on audio or light, the method enhances the relevance and immersion of synthesized images, addressing the weak connection issue in existing technologies.

JP2026105161A · Active · Publication Date: 2026-06-26 · COVER CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: COVER CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

Abstract

[Problem] To provide an image generation method and an image generation program capable of generating realistic avatar images. [Solution] An external input image captured from external video data by a capture device 112 is used as the background image, and an avatar image based on an avatar 3D model created by an avatar 3D model generation program is placed in front of the external input image to create and output a composite image. In this configuration, the brightness of the multiplication image that is multiplied with the avatar image is changed based on the external input audio attached to the external video data, so that the color of the avatar image placed in front of the external input image changes based on the audio attached to the external input image.

Description

Technical Field

[0001] The present invention relates to an image generation method and an image generation program capable of generating an avatar image according to a detected action and synthesizing the generated avatar image with a specific image for output.

Background Art

[0002] Conventionally, for example, a technique has been proposed in which by performing face tracking on a face photographed with a smartphone, the expression of the photographed face can be reflected in the expression of an avatar such as a character created by CG or the like (see, for example, Patent Document 1).

[0003] In addition, a technique has been proposed for generating and outputting an image in which a specific image is synthesized on the back of an avatar image (see, for example, Patent Document 2).

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Patent Document 2

Summary of the Invention

Problems to be Solved by the Invention

[0005] However, if, as in Patent Document 2, an image is generated merely by synthesizing another specific image behind an avatar image, the relevance between the avatar image and the specific image is weak, and the resulting image lacks a sense of presence.

[0007] The image generation method of means 1 comprises: a motion detection step of detecting a performer's movements; an avatar image generation step of generating an avatar image corresponding to the detected movements; a specific image input step of inputting a specific image; a composite image generation step of generating a composite image by placing the avatar image in front of the input specific image; and an image output step of outputting the composite image, and is characterized by further comprising: a voice input step of inputting audio attached to the specific image; and a color change step of changing the color of the avatar image based on the audio input in the voice input step. According to this feature, the color of the avatar image placed in front of the specific image changes based on the audio attached to that image, so the specific image and the avatar image together can produce a more immersive image.
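The composite image generation step places the avatar image in front of the specific image. A minimal sketch of that placement is shown below, assuming images are stored as rows of 8-bit RGB tuples and the avatar carries an alpha channel; the function name `composite_over` and the pixel layout are illustrative assumptions, not taken from the source.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_over(background: List[List[RGB]],
                   avatar: List[List[RGBA]]) -> List[List[RGB]]:
    """Place the avatar in front of the background (alpha-over blend).

    Assumes both images have the same size (hypothetical simplification).
    """
    out = []
    for bg_row, av_row in zip(background, avatar):
        row = []
        for (b_r, b_g, b_b), (a_r, a_g, a_b, a_a) in zip(bg_row, av_row):
            a = a_a / 255.0  # avatar coverage at this pixel, 0.0..1.0
            row.append((
                round(a_r * a + b_r * (1.0 - a)),
                round(a_g * a + b_g * (1.0 - a)),
                round(a_b * a + b_b * (1.0 - a)),
            ))
        out.append(row)
    return out
```

Any color change would be applied to `avatar` before this call, so the background itself is left untouched.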

[0008] The image generation method of means 2 is the image generation method described in means 1, characterized in that, in the color change step, the color of the avatar image is changed when the volume of the audio input in the voice input step is above a predetermined threshold. According to this feature, loud audio input can be reflected in the color of the avatar image.
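The volume check of means 2 could look roughly like the following sketch. The source does not say how volume is measured, so root-mean-square amplitude over normalized samples is an assumption here, as are the names `rms_volume` and `should_change_color` and the default threshold value.

```python
import math
from typing import Sequence


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of samples normalized to -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_change_color(samples: Sequence[float],
                        threshold: float = 0.3) -> bool:
    """True when the volume reaches the threshold.

    Whether the comparison is inclusive is not stated in the source;
    ">=" is an assumption, and 0.3 is a hypothetical default.
    """
    return rms_volume(samples) >= threshold
```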

[0009] The image generation method of means 3 is the image generation method described in means 1 or 2, characterized in that, in the color change step, the color of the avatar image is changed when a specific voice is input in the voice input step. According to this feature, the input of a specific voice can be reflected in the color of the avatar image.

[0010] The image generation method of means 4 is the image generation method described in means 3, characterized in that the specific voice is a voice in a specific sound range. According to this feature, the input of a voice in a specific sound range can be reflected in the color of the avatar image.

[0011] The image generation method of means 5 is the image generation method described in means 3 or 4, characterized in that the specific voice includes a plurality of predetermined types of voices, and, in the color change step, the color of the avatar image is changed to a color set according to the type of voice input in the voice input step. According to this feature, the color of the avatar image can be changed to a color corresponding to the type of input voice.
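Means 3 to 5 can be illustrated with a small sketch that estimates the pitch of the input audio and looks up a color for each voice type. The zero-crossing pitch estimate, the frequency limits, the two voice types and the colors in `VOICE_COLORS` are all assumptions chosen for illustration; the source only states that voice types in specific sound ranges map to preset colors.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical color table: one preset color per voice type.
VOICE_COLORS = {"high": (255, 200, 200), "low": (200, 200, 255)}


def estimate_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Rough pitch estimate: two sign changes correspond to one cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration) if duration > 0 else 0.0


def classify_voice(freq_hz: float, low_max: float = 200.0,
                   high_min: float = 1000.0) -> Optional[str]:
    """Map a frequency to a voice type, or None if outside both ranges."""
    if freq_hz >= high_min:
        return "high"
    if 0 < freq_hz <= low_max:
        return "low"
    return None


def color_for_voice(samples: Sequence[float],
                    sample_rate: int) -> Optional[Tuple[int, int, int]]:
    """Return the preset color for the detected voice type, if any."""
    kind = classify_voice(estimate_frequency(samples, sample_rate))
    return VOICE_COLORS.get(kind)
```

A return value of `None` means no specific voice was detected, in which case the avatar color would be left unchanged.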

[0012] The image generation method of means 6 is the image generation method described in any one of means 1 to 5, characterized in that, in the color change step, the color of the avatar image is changed by multiplying the avatar image by a specific color or a specific image. According to this feature, the color of the avatar image can be changed by a simple process.
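Multiplying the avatar image by a specific color can be sketched as a per-channel multiply blend. The 8-bit RGBA pixel layout and the name `multiply_color` are assumptions for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def multiply_color(avatar: List[List[RGBA]], tint: RGB) -> List[List[RGBA]]:
    """Multiply blend: each channel becomes avatar * tint / 255.

    Integer division rounds down; the alpha channel is kept as-is.
    """
    t_r, t_g, t_b = tint
    return [
        [(r * t_r // 255, g * t_g // 255, b * t_b // 255, a)
         for (r, g, b, a) in row]
        for row in avatar
    ]
```

A white tint `(255, 255, 255)` leaves the avatar unchanged, and darker or more saturated tints pull the avatar toward that color.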

[0013] The image generation method of means 7 is the image generation method described in any one of means 1 to 6, characterized in that, in the color change step, the color of the avatar image is changed by changing the brightness. According to this feature, the color of the avatar image can be changed by a simple process.
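One way to read means 7 together with the abstract is that the brightness of the multiplied color is raised while the audio is loud. The sketch below assumes a volume value in 0.0..1.0 is already available; the names `brightness_factor` and `scale_brightness`, the threshold and the boost value are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def brightness_factor(volume: float, threshold: float = 0.3,
                      boost: float = 1.5) -> float:
    """Return a brightness gain: boosted while the volume reaches the threshold."""
    return boost if volume >= threshold else 1.0


def scale_brightness(color: RGB, factor: float) -> RGB:
    """Scale each channel by the factor and clamp the result to 0..255."""
    r, g, b = (min(255, max(0, round(c * factor))) for c in color)
    return (r, g, b)
```

The scaled color would then be used as the color multiplied with the avatar image.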

[0014] The image generation method of means 8 is the image generation method described in any one of means 1 to 5, characterized in that the avatar image generation step includes: a 3D model generation step of generating a 3D model of an avatar corresponding to the detected movements; and a 3D model placement step of placing the generated 3D model in a 3D space, the avatar image being generated from the 3D model placed in the 3D space, and the color change step includes an irradiation step of irradiating the 3D model placed in the 3D space with virtual irradiation light based on the audio input in the voice input step, the color of the avatar image being changed by generating the avatar image from the 3D model irradiated with the virtual irradiation light in the irradiation step. According to this feature, the color of the avatar image can be changed with a natural appearance.
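The source does not name a lighting model for the virtual irradiation light. As a stand-in, the sketch below uses plain Lambertian diffuse shading for a single surface point, with colors as floats in 0.0..1.0; `lambert_shade`, the `ambient` term and the vector conventions are assumptions for illustration only.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def lambert_shade(albedo: Vec3, normal: Vec3, light_dir: Vec3,
                  light_color: Vec3, ambient: float = 0.2) -> Vec3:
    """Shade one surface point with one virtual light.

    light_dir points from the surface toward the light. In the audio-driven
    case, light_color would be chosen from the analyzed audio.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    r, g, b = (min(1.0, a * (ambient + n_dot_l * c))
               for a, c in zip(albedo, light_color))
    return (r, g, b)
```

Surfaces facing the light pick up its color, while surfaces facing away receive only the ambient term, which is why lighting the 3D model tends to look more natural than a flat multiply.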

[0015] The image generation method of means 9 is the image generation method described in any of means 1 to 8, characterized in that, in the color change step, the color of the avatar image is changed based on the specific image. According to this feature, since the color of the avatar image placed in front of the specific image is changed based on the specific image, the specific image and the avatar image together can produce a more immersive image.

[0016] The image generation method of means 10 is the image generation method described in means 9, characterized in that, in the color change step, the color of the avatar image is changed based on the color of the specific image. According to this feature, the color of the specific image can be reflected in the avatar image.

[0017] The image generation method of means 11 is the image generation method described in means 9 or 10, characterized in that the specific image input in the specific image input step is a video, the method further comprises a motion amount detection step of detecting the amount of motion of the specific image, and, in the color change step, the color of the avatar image is changed based on the amount of motion of the specific image detected in the motion amount detection step. According to this feature, the color of the avatar image placed in front of the specific image changes based on the amount of motion of the specific image, so the specific image and the avatar image together can produce a more immersive image.
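The source does not define how the amount of motion is measured. One simple possibility, sketched here under that assumption, is the mean absolute difference between consecutive frames, mapped to a 0.0..1.0 strength; `motion_amount`, `motion_to_strength` and the `full_scale` value are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def motion_amount(prev_frame: List[List[RGB]],
                  curr_frame: List[List[RGB]]) -> float:
    """Mean absolute per-channel difference between two frames (0..255)."""
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += sum(abs(a - b) for a, b in zip(p, c))
            count += len(p)
    return total / count if count else 0.0


def motion_to_strength(amount: float, full_scale: float = 64.0) -> float:
    """Map a motion amount to a color-change strength clamped to 0.0..1.0."""
    return max(0.0, min(1.0, amount / full_scale))
```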

[0018] The image generation method of means 12 is the image generation method described in any of means 1 to 11, characterized in that the composite image is an image that constitutes a video, in the color change step the color of the avatar image is changed for each frame of the video based on the audio input in the voice input step, and in the composite image generation step the composite image is generated for each frame of the video using the specific image and the avatar image whose color has been changed. According to this feature, a video of the avatar image that reflects, frame by frame, the audio attached to the specific image can be generated.
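Per-frame processing needs the audio that belongs to each video frame. A minimal sketch, assuming a constant frame rate and a mono sample list, splits the audio into one chunk per frame and decides per frame whether the color should change; the names and the peak-based test are assumptions, not taken from the source.

```python
from typing import Iterator, List, Sequence


def audio_chunks_per_frame(samples: Sequence[float], sample_rate: int,
                           fps: int) -> Iterator[Sequence[float]]:
    """Yield the slice of audio samples that accompanies each video frame."""
    per_frame = sample_rate // fps
    if per_frame <= 0:
        raise ValueError("sample_rate must be at least fps")
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - per_frame + 1, per_frame):
        yield samples[start:start + per_frame]


def per_frame_flags(samples: Sequence[float], sample_rate: int, fps: int,
                    threshold: float) -> List[bool]:
    """For each frame, True when the peak amplitude reaches the threshold."""
    return [
        max(abs(s) for s in chunk) >= threshold
        for chunk in audio_chunks_per_frame(samples, sample_rate, fps)
    ]
```

Each flag would drive the color change for the matching frame before that frame is composited.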

[0019] The image generation program of means 13 is an image generation program that causes a computer to execute: a motion detection step of detecting a performer's movements; an avatar image generation step of generating an avatar image corresponding to the detected movements; a specific image input step of inputting a specific image; a composite image generation step of generating a composite image by placing the avatar image in front of the input specific image; and an image output step of outputting the composite image, and is characterized by further causing the computer to execute: a voice input step of inputting audio attached to the specific image; and a color change step of changing the color of the avatar image based on the audio input in the voice input step. According to this feature, the color of the avatar image placed in front of the specific image changes based on the audio attached to that image, so the specific image and the avatar image together can produce a more immersive image.

[0020] Furthermore, the present invention may have only the inventive features described in the claims, or may have those features together with other features not described in the claims.

Brief Description of the Drawings

[0021]
[Figure 1] A block diagram showing an example configuration of a computer terminal used in Embodiments 1 and 2 of the present invention.
[Figure 2] A diagram showing the configuration of the programs implemented in the computer terminal used in Embodiment 1 of the present invention.
[Figure 3] A diagram showing the relationships between the programs implemented in the computer terminal in Embodiment 1 of the present invention.
[Figure 4] A diagram showing the relationships between the programs implemented in the computer terminal in Embodiment 1 of the present invention.
[Figure 5] A diagram showing the process by which the computer terminal synthesizes an external input image and an avatar image and outputs the result in Embodiment 1 of the present invention.
[Figure 6] A flowchart showing the control details of the color correction program executed by the computer terminal in Embodiment 1 of the present invention.
[Figure 7] A diagram showing a specific example of color correction of an avatar image by the computer terminal in Embodiment 1 of the present invention.
[Figure 8] A diagram showing a specific example of color correction of an avatar image by the computer terminal in Embodiment 1 of the present invention.
[Figure 9] A diagram showing a modified example of color correction of an avatar image by the computer terminal in Embodiment 1 of the present invention.
[Figure 10] A diagram showing the configuration of the programs implemented in the computer terminal used in Embodiment 2 of the present invention.
[Figure 11] A diagram showing the relationships between the programs implemented in the computer terminal in Embodiment 2 of the present invention.
[Figure 12] A diagram showing the relationships between the programs implemented in the computer terminal in Embodiment 2 of the present invention.
[Figure 13] A diagram showing the process by which the computer terminal synthesizes an external input image and an avatar image and outputs the result in Embodiment 2 of the present invention.
[Figure 14] A flowchart showing the control details of the 3D model color correction program executed by the computer terminal in Embodiment 2 of the present invention.
[Figure 15] A diagram showing a specific example of color correction of an avatar image by the computer terminal in Embodiment 2 of the present invention.

Modes for Carrying Out the Invention

[0022] [Form 1] The image generation method of form 1-1 comprises: a motion detection step (face tracking program, motion tracking program, etc.) of detecting a performer's movements; an avatar image generation step (avatar image generation program) of generating an avatar image corresponding to the detected movements; a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image by placing the avatar image in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, and is characterized by further comprising a color change step (color correction program) of changing the color of the avatar image based on the specific image (external input image). According to this feature, the color of the avatar image placed in front of the specific image is changed based on that specific image, so the specific image and the avatar image together can produce a more immersive image.

[0023] The image generation method of form 1-2 is the image generation method described in form 1-1, characterized in that, in the color change step (color correction program), the avatar image is multiplied by the color of the specific image (external input image). According to this feature, the color of the specific image can be reflected in the avatar image.

[0024] The image generation method of form 1-3 is the image generation method described in form 1-2, characterized in that, in the color change step (color correction program), the avatar image is multiplied by the average value of the colors of the specific image (external input image). According to this feature, the average color of the specific image can be reflected in the color of the avatar image.
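The average value of the colors of the specific image can be computed as a per-channel mean, as in the sketch below. The pixel layout (rows of 8-bit RGB tuples) and the name `average_color` are assumptions for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def average_color(image: List[List[RGB]]) -> RGB:
    """Per-channel mean of all pixels, rounded down to an integer."""
    pixels = [p for row in image for p in row]
    if not pixels:
        raise ValueError("image has no pixels")
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )
```

The resulting color would then serve as the single color multiplied with every avatar pixel.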

[0025] The image generation method of form 1-4 is the image generation method described in form 1-3, characterized in that, in the color change step (color correction program), the avatar image is multiplied by the average value of the colors in a portion of the specific image (external input image). According to this feature, the average color of a portion of the specific image, rather than of the entire image, is reflected in the color of the avatar image. For example, reflecting the average color of the central area or a characteristic area of the specific image in the color of the avatar image strengthens the relevance to the specific image.

[0026] The image generation method of form 1-5 is the image generation method described in form 1-3 or form 1-4, characterized in that, in the color change step (color correction program), the avatar image is multiplied by the average value of those colors in the specific image (external input image) that have a predetermined brightness or higher. According to this feature, only colors at or above the predetermined brightness contribute to the average that is reflected in the color of the avatar image, which prevents the avatar image from becoming too dark.
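Averaging only sufficiently bright colors can be sketched as a filtered mean. The source does not say how brightness is computed; the ITU-R BT.601 luma weights used below are an assumption, as are the names `luma` and `average_bright_color` and the default cutoff.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def luma(pixel: RGB) -> float:
    """Approximate brightness using the BT.601 luma weights (an assumption)."""
    return 0.299 * pixel[0] + 0.587 * pixel[1] + 0.114 * pixel[2]


def average_bright_color(image: List[List[RGB]],
                         min_luma: float = 64.0) -> Optional[RGB]:
    """Average only pixels whose brightness is at least min_luma.

    Returns None when no pixel qualifies, so the caller can skip the tint.
    """
    bright = [p for row in image for p in row if luma(p) >= min_luma]
    if not bright:
        return None
    n = len(bright)
    return (
        sum(p[0] for p in bright) // n,
        sum(p[1] for p in bright) // n,
        sum(p[2] for p in bright) // n,
    )
```

Because dark pixels are excluded, a mostly dark background no longer drags the multiplied color toward black.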

[0027] The image generation method of form 1-6 is the image generation method described in any of forms 1-2 to 1-5, characterized in that, in the color change step (color correction program), the color to be multiplied is varied for each of a plurality of regions according to the color distribution of the specific image (external input image). According to this feature, the color of the avatar image changes region by region according to the color distribution of the specific image, so the specific image and the avatar image together can produce a more immersive image.
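A region-by-region multiply can be sketched with the simplest possible split: the left half of the avatar is multiplied by the average color of the left half of the background, and likewise for the right half. The two-region split, equal image sizes (at least two pixels wide) and the function names are assumptions for illustration; the source does not specify how regions are chosen.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def region_average(image: List[List[RGB]], x0: int, x1: int) -> RGB:
    """Per-channel mean of the columns x0..x1-1 over all rows."""
    pixels = [p for row in image for p in row[x0:x1]]
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )


def tint_by_halves(avatar: List[List[RGBA]],
                   background: List[List[RGB]]) -> List[List[RGBA]]:
    """Multiply each avatar half by the average color of the same background half."""
    width = len(background[0])
    mid = width // 2
    left = region_average(background, 0, mid)
    right = region_average(background, mid, width)
    out = []
    for row in avatar:
        new_row = []
        for x, (r, g, b, a) in enumerate(row):
            t_r, t_g, t_b = left if x < mid else right
            new_row.append((r * t_r // 255, g * t_g // 255, b * t_b // 255, a))
        out.append(new_row)
    return out
```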

[0028] The image generation method of form 1-7 is the image generation method described in form 1-2, characterized in that, in the color change step (color correction program), the specific image (external input image) is blurred and then multiplied with the avatar image. According to this feature, a blurred version of the specific image is multiplied with the avatar image. As a result, the specific image does not appear sharply on the avatar image, and the colors of the specific image are reflected in the avatar image naturally.
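Blurring the specific image before multiplying can be sketched with a box blur followed by a per-pixel multiply. The blur kernel is not specified in the source; the box blur, the `radius` parameter and both function names are assumptions, and the two images are assumed to have the same size.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def box_blur(image: List[List[RGB]], radius: int = 1) -> List[List[RGB]]:
    """Average each pixel with its neighbors within `radius`, clipped at edges."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = [0, 0, 0]
            n = 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    p = image[yy][xx]
                    acc[0] += p[0]
                    acc[1] += p[1]
                    acc[2] += p[2]
                    n += 1
            row.append((acc[0] // n, acc[1] // n, acc[2] // n))
        out.append(row)
    return out


def multiply_images(avatar: List[List[RGBA]],
                    tint_image: List[List[RGB]]) -> List[List[RGBA]]:
    """Multiply each avatar pixel by the tint pixel at the same position."""
    return [
        [(r * t[0] // 255, g * t[1] // 255, b * t[2] // 255, a)
         for (r, g, b, a), t in zip(av_row, t_row)]
        for av_row, t_row in zip(avatar, tint_image)
    ]
```

A larger `radius` removes more detail from the background before it is multiplied, which keeps recognizable shapes from appearing on the avatar.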

[0029] The image generation method of form 1-8 is the image generation method described in any of forms 1-2 to 1-7, characterized in that, in the color change step (color correction program), the brightness of the multiplied color differs between the central region and the outer region of the avatar image. According to this feature, making the brightness of the multiplied color differ between the central and outer regions of the avatar image allows the colors of the specific image to be reflected in the avatar image with a three-dimensional appearance.
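Varying the brightness of the multiplied color between the center and the outside of the avatar image can be sketched with a gain that changes linearly with distance from the image center. The source does not say which region is brighter or how the transition is shaped; the linear falloff, the default gains and the name `radial_tint` are assumptions.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def radial_tint(avatar: List[List[RGBA]], tint: RGB,
                center_gain: float = 1.0,
                edge_gain: float = 0.6) -> List[List[RGBA]]:
    """Multiply by a tint whose brightness varies from center to corners."""
    height, width = len(avatar), len(avatar[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    max_dist = math.hypot(cx, cy) or 1.0  # avoid dividing by zero for 1x1 images
    out = []
    for y, row in enumerate(avatar):
        new_row = []
        for x, (r, g, b, a) in enumerate(row):
            t = math.hypot(x - cx, y - cy) / max_dist  # 0 at center, 1 at corners
            gain = center_gain + (edge_gain - center_gain) * t
            t_r, t_g, t_b = (min(255, round(c * gain)) for c in tint)
            new_row.append((r * t_r // 255, g * t_g // 255, b * t_b // 255, a))
        out.append(new_row)
    return out
```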

[0030] The image generation method of form 1-9 is the image generation method described in any of forms 1-1 to 1-8, characterized in that the specific image (external input image) input in the specific image input step (capture of an external input image) is a video, the method further comprises a motion amount detection step (image analysis program) of detecting the amount of motion of the specific image (external input image), and, in the color change step (color correction program), the color of the avatar image is changed based on the amount of motion of the specific image (external input image) detected in the motion amount detection step (image analysis program). According to this feature, the color of the avatar image placed in front of the specific image changes based on the amount of motion of the specific image, so the specific image and the avatar image together can produce a more immersive image.

[0031] The image generation method of form 1-10 is the image generation method described in any of forms 1-1 to 1-9, characterized in that the composite image is an image that constitutes a video, and, in the composite image generation step (image synthesis program), the color of the avatar image is changed based on the specific image (external input image) for each frame of the video, and the composite image is generated using the specific image (external input image) and the avatar image whose color has been changed. According to this feature, a video of the avatar image that reflects the specific image in each frame can be generated.

[0032] The image generation program of form 1-11 is an image generation program that causes a computer to execute: a motion detection step (face tracking program, motion tracking program, etc.) of detecting a performer's movements; an avatar image generation step (avatar image generation program) of generating an avatar image corresponding to the detected movements; a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image by placing the avatar image in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, and is characterized by further causing the computer to execute a color change step (color correction program) of changing the color of the avatar image based on the specific image (external input image). According to this feature, the color of the avatar image placed in front of the specific image is changed based on that specific image, so the specific image and the avatar image together can produce a more immersive image.

[0033] [Form 2] The image generation method of form 2-1 comprises: a motion detection step (face tracking program, motion tracking program, etc.) of detecting a performer's movements; a 3D model generation step (avatar 3D model generation program) of generating a 3D model of an avatar (avatar 3D model) corresponding to the detected movements; a 3D model placement step (object placement process) of placing the generated 3D model (avatar 3D model) in a 3D space (virtual 3D space); an avatar image generation step (avatar image generation process) of generating an avatar image from the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space); a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image by placing the avatar image in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, and is characterized by further comprising an irradiation step (irradiation light identification process, irradiation light placement process) of irradiating the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space) with virtual irradiation light based on the specific image (external input image), wherein, in the avatar image generation step (avatar image generation process), the avatar image is generated from the 3D model (avatar 3D model) irradiated with the virtual irradiation light in the irradiation step. According to this feature, an avatar image generated from a 3D model irradiated with virtual irradiation light based on the specific image is used as the avatar image placed in front of the specific image. As a result, the color of the avatar image is changed based on the specific image, so the specific image and the avatar image together can produce a more immersive image.

[0034] The image generation method of form 2-2 is the image generation method described in form 2-1, characterized in that, in the irradiation step (irradiation light identification process, irradiation light placement process), irradiation light of the color of the specific image (external input image) is irradiated. According to this feature, the color of the specific image can be reflected in the avatar image.

[0035] The image generation method of form 2-3 is the image generation method described in form 2-2, characterized in that, in the irradiation step (irradiation light identification process, irradiation light placement process), irradiation light of the average color of the specific image (external input image) is irradiated. According to this feature, the average color of the specific image can be reflected in the color of the avatar image.

[0036] The image generation method of form 2-4 is the image generation method described in form 2-3, characterized in that, in the irradiation step (irradiation light identification process, irradiation light placement process), irradiation light of the average color of a portion of the specific image (external input image) is irradiated. According to this feature, the average color of a portion of the specific image, rather than of the entire image, is reflected in the color of the avatar image. For example, reflecting the average color of the central area or a characteristic area of the specific image in the color of the avatar image strengthens the relevance to the specific image.

[0037] The image generation method of form 2-5 is the image generation method described in form 2-3 or form 2-4, characterized in that, in the irradiation step (irradiation light identification process, irradiation light placement process), irradiation light of the average value of those colors in the specific image (external input image) that have a predetermined brightness or higher is irradiated. According to this feature, only colors at or above the predetermined brightness contribute to the average that is reflected in the color of the avatar image, which prevents the avatar image from becoming too dark.

[0038] The image generation method of form 2-6 is the image generation method described in any of forms 2-2 to 2-5, characterized in that, in the irradiation step (irradiation light identification process, irradiation light placement process), irradiation light of different colors is irradiated from a plurality of directions according to the color distribution of the specific image (external input image). According to this feature, the color of the avatar image changes area by area according to the color distribution of the specific image, so the specific image and the avatar image together can produce a more immersive image.
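Irradiating from several directions with different colors can be sketched by summing the diffuse contribution of each virtual light at a surface point. As before, Lambertian shading is a stand-in chosen for illustration, not the lighting model of the source; `shade_multi` and its conventions (unit-length vectors, colors as floats in 0.0..1.0) are assumptions. Each light color might, for example, be taken from the matching side of the specific image.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def shade_multi(albedo: Vec3, normal: Vec3,
                lights: List[Tuple[Vec3, Vec3]],
                ambient: float = 0.0) -> Vec3:
    """Shade one surface point lit by several colored directional lights.

    lights is a list of (direction_toward_light, rgb_color) pairs; the
    normal and all directions are assumed to be unit length.
    """
    total = [ambient, ambient, ambient]
    for direction, color in lights:
        n_dot_l = max(0.0, sum(n * d for n, d in zip(normal, direction)))
        for i in range(3):
            total[i] += n_dot_l * color[i]
    r, g, b = (min(1.0, a * t) for a, t in zip(albedo, total))
    return (r, g, b)
```

A surface facing the red light picks up red and one facing the blue light picks up blue, so different areas of the avatar take on different colors.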

[0039] The image generation method of form 2-7 is the image generation method described in any of forms 2-1 to 2-6, characterized in that the specific image (external input image) input in the specific image input step (capture of an external input image) is a video, the method further comprises a motion amount detection step (image analysis program) of detecting the amount of motion of the specific image (external input image), and, in the irradiation step (irradiation light identification process, irradiation light placement process), the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space) is irradiated with virtual irradiation light based on the amount of motion of the specific image (external input image) detected in the motion amount detection step (image analysis program). According to this feature, the color of the avatar image placed in front of the specific image changes based on the amount of motion of the specific image, so the specific image and the avatar image together can produce a more immersive image.

[0040] The image generation method of form 2-8 is the image generation method described in any of forms 2-1 to 2-7, characterized in that the composite image is an image that constitutes a video, in the irradiation step (irradiation light identification process, irradiation light placement process) virtual irradiation light based on the specific image (external input image) is irradiated for each frame of the video, in the avatar image generation step (avatar image generation process) the avatar image is generated for each frame of the video from the 3D model (avatar 3D model) irradiated with the virtual irradiation light in the irradiation step, and in the composite image generation step (image synthesis program) the composite image is generated for each frame of the video using the specific image (external input image) and the avatar image. According to this feature, a video of the avatar image that reflects the specific image in each frame can be generated.

[0041] The image generation program of form 2-9 is an image generation program that causes a computer to execute: a motion detection step (face tracking program, motion tracking program, etc.) of detecting a performer's movements; a 3D model generation step (avatar 3D model generation program) of generating a 3D model of an avatar (avatar 3D model) corresponding to the detected movements; a 3D model placement step (object placement process) of placing the generated 3D model (avatar 3D model) in a 3D space (virtual 3D space); an avatar image generation step (avatar image generation process) of generating an avatar image from the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space); a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image by placing the avatar image in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, and is characterized by further causing the computer to execute an irradiation step (irradiation light identification process, irradiation light placement process) of irradiating the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space) with virtual irradiation light based on the specific image (external input image), wherein, in the avatar image generation step (avatar image generation process), the avatar image is generated from the 3D model (avatar 3D model) irradiated with the virtual irradiation light in the irradiation step. According to this feature, an avatar image generated from a 3D model irradiated with virtual irradiation light based on the specific image is used as the avatar image placed in front of the specific image. As a result, the color of the avatar image is changed based on the specific image, so the specific image and the avatar image together can produce a more immersive image.

[0042] [Form 3] The image generation method of form 3-1 comprises: a motion detection step (face tracking program, motion tracking program, etc.) of detecting a performer's movements; an avatar image generation step (avatar image generation program) of generating an avatar image corresponding to the detected movements; a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image by placing the avatar image in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, and is characterized by further comprising: a voice input step (voice analysis program) of inputting audio (external input audio) attached to the specific image (external input image); and a color change step (color correction program) of changing the color of the avatar image based on the audio (external input audio) input in the voice input step (voice analysis program). According to this feature, the color of the avatar image placed in front of the specific image changes based on the audio attached to that image, so the specific image and the avatar image together can produce a more immersive image.

[0043] The image generation method of form 3-2 is the image generation method described in form 3-1, In the color change step (color correction program), the color of the avatar image is changed if the volume of the audio input (external input audio) in the audio input step (audio analysis program) is above a predetermined threshold (volume setting value). It is characterized by the following. This feature allows the avatar image's color to reflect loud audio input.

[0044] The image generation method of form 3-3 is the image generation method described in form 3-1 or form 3-2, In the aforementioned color change step (color correction program), the color of the avatar image is changed when a specific voice (high-pitched voice, low-pitched voice) is input in the aforementioned voice input step (voice analysis program). It is characterized by the following. This feature allows the avatar image's color to be reflected when specific audio is input.

[0045] The image generation method of form 3-4 is the image generation method described in form 3-3, characterized in that the specific sound is a sound within a specific frequency range (high-pitched sound, low-pitched sound). With this feature, the input of a sound within a specific frequency range can be reflected in the color of the avatar image.

[0046] The image generation method of form 3-5 is the image generation method described in form 3-3 or form 3-4, characterized in that the specific sound includes a plurality of predetermined sounds (high-pitched sound, low-pitched sound), and, in the color change step (color correction program), the color of the avatar image is changed to a color set according to the type of sound input in the audio input step (audio analysis program). With this feature, the color of the avatar image can be changed according to the type of audio input.

[0047] The image generation method of form 3-6 is the image generation method described in any of forms 3-1 to 3-5, characterized in that, in the color change step (color correction program), the color of the avatar image is changed by multiplying the avatar image by a specific color or a specific image. With this feature, the color of the avatar image can be changed with simple processing.

[0048] The image generation method of form 3-7 is the image generation method described in any of forms 3-1 to 3-6, characterized in that, in the color change step (color correction program), the color of the avatar image is changed by changing its brightness. With this feature, the color of the avatar image can be changed with simple processing.

[0049] The image generation method of form 3-8 is the image generation method described in any of forms 3-1 to 3-6, characterized in that the avatar image generation step includes a 3D model generation step (avatar 3D model generation program) of generating a 3D model of the avatar (avatar 3D model) corresponding to the detected movements, and a 3D model placement step (object placement process) of placing the generated 3D model (avatar 3D model) in a 3D space (virtual 3D space), the avatar image being generated from the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space); and the color change step includes an irradiation step (irradiation light identification process, irradiation light placement process) of irradiating the 3D model (avatar 3D model) placed in the 3D space (virtual 3D space) with virtual irradiation light based on the audio (external input audio) input in the audio input step (audio analysis program), the color of the avatar image being changed by generating the avatar image from the 3D model (avatar 3D model) irradiated with the virtual irradiation light in the irradiation step. With this feature, the color of the avatar image can be changed in a natural-looking way.

[0050] The image generation method of form 3-9 is the image generation method described in any of forms 3-1 to 3-8, characterized in that, in the color change step (color correction program), the color of the avatar image is changed based on the specific image (external input image). With this feature, the color of the avatar image placed in front of the specific image changes based on that specific image, so a more immersive image can be generated from the specific image and the avatar image.

[0051] The image generation method of form 3-10 is the image generation method described in form 3-9, characterized in that, in the color change step (color correction program), the color of the avatar image is changed based on the color of the specific image (external input image). With this feature, the colors of the specific image can be reflected in the avatar image.

[0052] The image generation method of form 3-11 is the image generation method described in form 3-9 or form 3-10, characterized in that the specific image (external input image) input in the specific image input step (capture of an external input image) is a video; the method further comprises a motion amount detection step (image analysis program) of detecting the amount of motion of the specific image (external input image); and, in the color change step (color correction program), the color of the avatar image is changed based on the amount of motion of the specific image (external input image) detected in the motion amount detection step (image analysis program). With this feature, the color of the avatar image placed in front of the specific image changes based on the amount of motion of the specific image, so a more immersive image can be generated from the specific image and the avatar image.

[0053] The image generation method of form 3-12 is the image generation method described in any of forms 3-1 to 3-11, characterized in that the composite image is an image that constitutes a video; in the color change step (color correction program), the color of the avatar image is changed for each frame of the video based on the audio (external input audio) input in the audio input step (audio analysis program); and, in the composite image generation step (image synthesis program), the composite image is generated for each frame of the video from the specific image (external input image) and the avatar image whose color has been changed. With this feature, a video can be generated in which the avatar image reflects, frame by frame, the audio attached to the specific image.

[0054] The image generation program of form 3-13 is an image generation program that causes a computer to execute: a motion detection step (face tracking program, motion tracking program, etc.) of detecting the performer's movements; an avatar image generation step (avatar image generation program) of generating an avatar image corresponding to the detected movements; a specific image input step (capture of an external input image) of inputting a specific image (external input image); a composite image generation step (image synthesis program) of generating a composite image in which the avatar image is placed in front of the input specific image (external input image); and an image output step (output of the composite image) of outputting the composite image, the program being characterized by causing the computer to further execute: an audio input step (audio analysis program) of inputting audio (external input audio) attached to the specific image (external input image); and a color change step (color correction program) of changing the color of the avatar image based on the audio (external input audio) input in the audio input step (audio analysis program). With this feature, the color of the avatar image placed in front of the specific image changes based on the audio attached to that image, so a more immersive image can be generated from the specific image and the avatar image.

[0055] Embodiments for carrying out the present invention will be described below with reference to the drawings, based on examples. [Examples]

[0056] [Computer terminal] Figure 1 is a block diagram showing an example configuration of computer terminal 1 used in Embodiment 1 of the present invention. The same computer terminal 1 configuration will also be used in Embodiment 2, which will be described later. The computer terminal 1 shown in Figure 1 can be a desktop PC or workstation with a separate display device, keyboard, mouse, or other input device; a notebook PC with an integrated display device and input device; or a mobile device such as a tablet or smartphone with a touch panel on the display device. However, this explanation will assume the use of a desktop PC or notebook PC.

[0057] As shown in Figure 1, the computer terminal 1 is equipped with a processor 101, memory 102, and storage 103 such as a hard disk or SSD. These components are connected via a data bus 111, and the processor 101 can perform various processes according to the programs stored in the storage 103.

[0058] Furthermore, a display device 105 and a speaker 106 are connected to the data bus 111 via an output interface (not shown), enabling the output of images and sound based on the processing performed by the processor 101.

[0059] Furthermore, an input device 107, a camera 108, a microphone 109, a vital sign measuring instrument 110, a capture device 112, and an audio input device 113 are connected to the data bus 111 via an input interface (not shown), so that information from these devices can be input. The input device 107 is a device such as a keyboard or mouse through which various commands can be input. Alternatively, commands may be input through a touch panel formed integrally with the display device 105. The camera 108 is a 3D camera and can input captured image data that includes depth data. The microphone 109 may be a stereo microphone or a mono microphone and can input ambient sound, including the voice of the subject. The vital sign measuring instrument 110 is a wearable device such as a smart band and can input vital data (heart rate, blood pressure, body temperature, etc.). The capture device 112 receives external video data and captures the images that constitute that data at a predetermined frame rate. The audio input device 113 can input the audio that is attached to the external video data and output together with it.

[0060] Furthermore, a communication interface 104 is connected to the data bus 111, enabling communication with other computer terminals and server computers via wired or wireless connections through public networks such as local area networks and the Internet.

[0061] In this embodiment, the image generation method of the present invention is implemented using one computer terminal, but it may also be implemented using multiple computer terminals. Furthermore, in this embodiment, computer terminal 1 is equipped with a display device 105 and a speaker 106 as output devices, but these output devices may be omitted and images and sound may instead be output from the output device of another computer terminal. In addition, in this embodiment, computer terminal 1 is equipped with an input device 107 such as a keyboard and mouse, a camera 108, and a microphone 109, but it is sufficient to have at least an input device such as a touch panel or a keyboard and mouse, a camera capable of capturing images, and a capture device capable of capturing images from external video data.

[0062] The storage 103 of computer terminal 1 stores various programs executed by computer terminal 1, in addition to an operating system (OS) not shown in the diagram, as shown in Figure 2. Specifically, it stores a face tracking program, a motion tracking program, a voice detection program, an input detection program, a vital sign detection program, a parameter correction program, an avatar image generation program, an image analysis program, a voice analysis program, a color correction program, an image synthesis program, and a distribution program. Although not shown in the diagram, it also stores configuration settings, material data used by the avatar image generation program, etc.

[0063] As shown in Figures 3 to 5, the computer terminal 1 is configured such that an avatar image generation program generates avatar images with facial expressions and motions based on facial expression parameters and motion parameters based on image data output from the camera 108, an image synthesis program generates a composite image by using an external input image captured by the capture device 112 as a background image and compositing the avatar image generated by the avatar image generation program in front of it, and a distribution program distributes a video using the composite image generated by the image synthesis program.

[0064] Furthermore, the facial expression parameters output by the face tracking program are not output directly to the avatar image generation program but, as shown in Figure 3, pass through a parameter correction program first. The parameter correction program corrects at least some of the facial expression parameters based on some of the facial expression parameters and on other parameters (voice parameters, input parameters, vital sign parameters, configuration parameters), and outputs the corrected facial expression parameters to the avatar image generation program.

[0065] Furthermore, the avatar image generated by the avatar image generation program is not output directly to the image synthesis program. Instead, as shown in Figures 4 and 5, the avatar image is color-corrected by the color correction program before being output to the image synthesis program. The color correction program performs color correction on the avatar image based on the color of the external input image to be synthesized by the image synthesis program, the amount of motion of the external input image, and the external input audio input along with the external input image. The color-corrected avatar image is then output to the image synthesis program.

[0066] Color correction, or adjusting color, refers to correcting the elements that make up color: hue, saturation, and brightness. It can be a configuration that corrects all three elements (hue, saturation, and brightness), or a configuration that corrects only some of them.
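As a rough illustration of correcting hue, saturation, and brightness either together or individually, the following minimal sketch uses Python's standard colorsys module. The function name adjust_hsv and its parameters are hypothetical and do not appear in this specification; the specification does not state which color model the color correction program uses, so the HSV model here is an assumption.

```python
import colorsys


def adjust_hsv(rgb, hue_shift=0.0, sat_scale=1.0, val_scale=1.0):
    """Correct one RGB color (components in 0.0-1.0) in HSV space.

    hue_shift is added to the hue (wrapping around at 1.0);
    sat_scale and val_scale multiply saturation and brightness.
    Leaving a parameter at its default leaves that element unchanged.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0
    s = min(1.0, max(0.0, s * sat_scale))
    v = min(1.0, max(0.0, v * val_scale))
    return colorsys.hsv_to_rgb(h, s, v)


# Correct only the brightness of a mid-gray color.
brighter = adjust_hsv((0.5, 0.5, 0.5), val_scale=1.2)
```

Passing only val_scale corresponds to a configuration that corrects brightness alone, which is the element the flowchart of Figure 6 varies.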

[0067] Here, computer terminal 1 can take in video content such as gameplay videos and movies as external video data from an external device and capture images from this video data as external input images. By distributing a video that combines the input gameplay videos, movies, and other video content with an avatar image, it is possible to distribute game commentary videos and movie commentary videos as new videos.

[0068] [Program] Next, the programs executed by computer terminal 1 in this embodiment will be described.

[0069] As shown in Figure 3, the face tracking program uses image data, including depth data, input from camera 108 to detect the state of multiple parts that make up the subject's face image, and outputs facial expression parameters that quantify the degree of movement of each part from the detected state. Facial expression parameters are parameters that can identify the subject's facial expression, and in this embodiment, they are parameters that quantify the degree of movement of 6 types related to the movement of the left eye, 6 types related to the movement of the right eye, 27 types related to the movement of the mouth and jaw, 10 types related to the movement of the eyebrows, cheeks and nose, and 1 type related to the movement of the tongue, each within the range of 0.0 to 1.0. Note that the facial expression parameters are not limited to these examples, and more subdivided types of facial expression parameters may be used, or fewer types of facial expression parameters may be used.

[0070] As shown in Figure 3, the motion tracking program detects the movement of a subject's body using image data, including depth data, input from camera 108, and outputs motion parameters that identify the position and orientation of the parts that make up the body. The motion parameters are parameters that can identify the movement of the subject's body, and in this embodiment, they include a head pose indicating the position coordinates and orientation of the head, all finger joints indicating the position coordinates and orientation of all finger joints, and a hand pose indicating the position coordinates and orientation of the hand. In this embodiment, the motion parameters do not include parameters for the movement of the entire body, but they may also be configured to include parameters for the movement of the entire body.

[0071] As shown in Figure 3, the voice detection program detects the voice of a subject using voice data input from the microphone 109 and outputs voice parameters that quantify the components of the detected voice. The voice parameters are parameters that can identify the voice of a subject, and in this embodiment, they are parameters that quantify voice volume and voice pitch, each within the range of 0.0 to 1.0.

[0072] As shown in Figure 3, the input detection program is a program that converts input data from the input device 107 into input parameters and outputs them. The input parameters are parameters that can identify the input status from the input device 107, and in this embodiment, they are parameters that indicate the type of command input from a keyboard or the like. Command inputs include, for example, command inputs that specify emotions such as joy, anger, sadness, and pleasure in stages, and command inputs that specify decorative images.

[0073] As shown in Figure 3, the vital sign detection program converts vital data (heart rate, blood pressure, body temperature, etc.) input from the vital sign measuring instrument 110 into vital sign parameters and outputs them. The vital sign parameters are parameters that can identify the vital signs of the subject, and in this embodiment, they are obtained by converting heart rate, blood pressure, and body temperature into numerical values between 0.0 and 1.0.

[0074] As shown in Figure 3, the parameter correction program primarily corrects the facial expression parameters created by the face tracking program. It includes a parameter correction process, which corrects at least some of the facial expression parameters created by the face tracking program based on some of the facial expression parameters, the voice parameters from the voice detection program, the input parameters from the input detection program, the vital sign parameters from the vital sign detection program, and the configuration parameters based on preset configurations (setting values), and a parameter creation process, which creates new facial expression parameters based on the input parameters. The parameter correction program is executed every frame at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60fps, the parameter correction program is executed 60 times per second.

[0075] The configuration (setting value) is a value that can be set to determine the amount of change in the facial expression parameter. The configuration setting screen is displayed on the display device 105, and the setting can be made using the input device 107. The configuration (setting value) can be set for each type of facial expression parameter. In this embodiment, a value between 0.0 and 1.0 can be individually set for each facial expression parameter: the facial expression parameter related to eye movement, the facial expression parameter related to mouth and jaw movement, the facial expression parameter related to eyebrow movement, the facial expression parameter related to cheek and nose movement, and the facial expression parameter related to tongue movement. The set value is used as the configuration parameter. In this embodiment, the larger the value set as the configuration, the greater the amount of change in the corresponding facial expression parameter, and the greater the movement of the corresponding part.

[0076] As shown in Figure 3, the avatar image generation program generates an avatar image based on facial expression parameters N and extension parameters from the parameter correction program, motion parameters from the motion tracking program, voice parameters from the voice detection program, and input parameters from the input detection program, and outputs the generated avatar image. It includes a face image generation process to generate a face image, a face image correction process to correct the face image, a costume image generation process to generate a costume image, and a decoration image generation process to generate a decoration image. The avatar image generation program is executed every frame at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60fps, the avatar image generation program is executed 60 times per second.

[0077] As shown in Figure 4, the image analysis program identifies the amount of motion in the external input image based on the difference between the currently captured external input image and the previously captured external input image, and outputs it as motion detection data. Specifically, it identifies the ratio of pixels whose data differs between the two images as the amount of motion and outputs it as a numerical value between 0.0 and 1.0. Note that it is also possible to compare only a part of the external input image, rather than its entire area, and identify the amount of motion from that difference. The capture device 112 captures the external input image from the external video data frame by frame at intervals corresponding to the frame rate of the video distributed by the distribution program, and the image analysis program is likewise executed frame by frame at those intervals. For example, if the frame rate is 60fps, the external input image is captured and the image analysis program is executed 60 times per second.
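The ratio-of-differing-pixels calculation described above can be sketched as follows. This is a minimal illustration only: the function name motion_amount and the representation of a frame as a list of rows of (r, g, b) tuples are assumptions made for the example and are not taken from this specification.

```python
def motion_amount(current, previous):
    """Return the ratio (0.0-1.0) of pixels that differ between two frames.

    Each frame is a list of rows, and each row is a list of (r, g, b)
    tuples. Both frames are assumed to have the same dimensions.
    """
    total = 0
    changed = 0
    for row_cur, row_prev in zip(current, previous):
        for px_cur, px_prev in zip(row_cur, row_prev):
            total += 1
            if px_cur != px_prev:
                changed += 1
    return changed / total if total else 0.0


# Two 2x2 frames in which exactly one pixel has changed.
prev_frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
cur_frame = [[(0, 0, 0), (255, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
amount = motion_amount(cur_frame, prev_frame)
```

To compare only part of the image, the same function could be applied to cropped sub-lists of each frame.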

[0078] As shown in Figure 4, the audio analysis program identifies the volume and pitch of the audio input by the audio input device 113, i.e., the audio that is attached to the external video data and output together with it. The program then outputs volume data indicating the identified volume and pitch data indicating the identified pitch, each as a numerical value between 0.0 and 1.0. The audio analysis program is executed every frame at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60fps, the audio analysis program is executed 60 times per second.
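This specification does not state how volume and pitch are measured or normalized. Purely as one possible sketch, the following example estimates volume from the root mean square of the samples and pitch from the zero-crossing rate, then maps both to 0.0 to 1.0. The function name analyze_audio, the sample format (floats in -1.0 to 1.0), and the min_hz and max_hz normalization bounds are all assumptions introduced for illustration.

```python
import math


def analyze_audio(samples, sample_rate, min_hz=50.0, max_hz=2000.0):
    """Return (volume, pitch), each normalized to 0.0-1.0.

    samples: audio samples for one video frame, floats in -1.0 to 1.0.
    Volume is the RMS level; pitch is a zero-crossing frequency estimate
    mapped linearly from [min_hz, max_hz] (assumed bounds) to [0.0, 1.0].
    """
    if not samples:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    volume = min(1.0, rms)
    # A periodic signal crosses zero about twice per cycle.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0.0) != (b < 0.0)
    )
    freq = crossings * sample_rate / (2.0 * len(samples))
    clamped = min(max_hz, max(min_hz, freq))
    pitch = (clamped - min_hz) / (max_hz - min_hz)
    return volume, pitch


# A 440 Hz sine tone sampled at 8000 Hz for 0.1 seconds.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
volume, pitch = analyze_audio(tone, 8000)
```

A zero-crossing estimate is crude for real program audio such as game or movie soundtracks; a spectral method would likely be more robust, but the simpler estimate keeps the sketch self-contained.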

[0079] As shown in Figure 4, the color correction program corrects the color of the avatar image generated by the avatar image generation program based on the color of the external input image captured by the capture device 112 from external video data, the motion detection data output from the image analysis program, and the volume data and pitch data output from the audio analysis program, and outputs the corrected avatar image. The program includes a multiplication image generation process that generates a multiplication image to be multiplied with the avatar image, a multiplication process that multiplies the avatar image by the multiplication image, and a brightness change process that changes the brightness of the avatar image. The color correction program is executed every frame at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60fps, the color correction program is executed 60 times per second.

[0080] As shown in Figure 4, the image synthesis program uses the external input image captured by the capture device 112 from external video data as the background image, and generates and outputs a composite image (see Figure 5) in which the avatar image, generated by the avatar image generation program and color-corrected by the color correction program, is placed in front of the external input image. The image synthesis program is executed every frame at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60fps, the image synthesis program is executed 60 times per second.
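Placing the avatar image in front of the background image can be sketched as ordinary alpha compositing. The specification does not say how transparent areas of the avatar image are represented, so the per-pixel alpha channel, the function name composite, and the list-of-rows image layout below are assumptions for illustration.

```python
def composite(background, avatar):
    """Place the avatar image in front of the background image.

    background: rows of (r, g, b) tuples, components in 0.0-1.0.
    avatar: rows of (r, g, b, a) tuples of the same dimensions, where
    a is the opacity (0.0 = fully transparent, 1.0 = fully opaque).
    """
    out = []
    for bg_row, av_row in zip(background, avatar):
        out_row = []
        for (b_r, b_g, b_b), (a_r, a_g, a_b, a_a) in zip(bg_row, av_row):
            out_row.append((
                a_r * a_a + b_r * (1.0 - a_a),
                a_g * a_a + b_g * (1.0 - a_a),
                a_b * a_a + b_b * (1.0 - a_a),
            ))
        out.append(out_row)
    return out


# A blue background with an opaque red avatar pixel and a transparent one.
background = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
avatar = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.0)]]
frame = composite(background, avatar)
```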

[0081] As shown in Figure 4, the distribution program distributes video data consisting of a composite image formed by placing an avatar image in front of an external input image output from the image synthesis program, audio input from microphone 109, and audio input from audio input device 113. The frame rate of the video data distributed by the distribution program can be set from multiple options. Based on the frame rate set here, the external input image is captured by the capture device 112 for each frame of the video data distributed by the distribution program, as described above, and the parameter correction program, avatar image generation program, image analysis program, audio analysis program, color correction program, and image synthesis program are executed.

[0082] [Color Correction Program] Figure 6 is a flowchart showing the control details of the color correction program.

[0083] As shown in Figure 6, the color correction program first acquires an external input image captured by the capture device 112 from external video data (Sa1). Next, it crops a set area from the acquired external input image (Sa2). The area to be cropped can be set arbitrarily; in this embodiment, as shown in Figure 7(a), the central area of the external input image is cropped.

[0084] Next, the cropped set area is divided into multiple regions (Sa3). The number and size of the regions can be set arbitrarily; in this embodiment, as shown in Figure 7(a), the area is divided into 3 vertically and 3 horizontally, resulting in 9 regions.

[0085] Next, colors with a brightness of 50% or more are extracted for each region, and their average value is calculated. As shown in Figure 7(b), the calculated average value is then set as the color for each region (Sa4).
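Steps Sa2 to Sa4 can be sketched as follows. The function names crop_center and region_colors are hypothetical. Two points are assumptions not stated in this specification: brightness is taken to be the largest of the r, g, and b components, and a region containing no pixel at or above the brightness limit falls back to the average of all of its pixels.

```python
def crop_center(image, frac=0.5):
    """Sa2: crop the central area (frac of each dimension) of an image.

    image is a list of rows of (r, g, b) tuples, components in 0.0-1.0.
    """
    h, w = len(image), len(image[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in image[top:top + ch]]


def region_colors(image, rows=3, cols=3, min_brightness=0.5):
    """Sa3 and Sa4: split into rows x cols regions and average bright colors.

    The image must be at least rows pixels high and cols pixels wide.
    Returns a rows x cols grid of averaged (r, g, b) colors.
    """
    h, w = len(image), len(image[0])
    grid = []
    for gy in range(rows):
        y0, y1 = gy * h // rows, (gy + 1) * h // rows
        grid_row = []
        for gx in range(cols):
            x0, x1 = gx * w // cols, (gx + 1) * w // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Assumed brightness measure: the largest color component.
            bright = [p for p in pixels if max(p) >= min_brightness]
            chosen = bright or pixels  # assumed fallback for dark regions
            n = len(chosen)
            grid_row.append(tuple(sum(p[i] for p in chosen) / n for i in range(3)))
        grid.append(grid_row)
    return grid


# A 6x6 red image with one dark pixel, which the average ignores.
image = [[(1.0, 0.0, 0.0) for _ in range(6)] for _ in range(6)]
image[0][0] = (0.1, 0.1, 0.1)
grid = region_colors(image)
```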

[0086] Next, as shown in Figure 7(c), the boundaries between multiple regions are blurred (Sa5), and then, as shown in Figure 7(d), the image is cropped to match the contour shape of the avatar image (Sa6).
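The blurring in Sa5 could be done in many ways, and this specification does not name a particular filter. The sketch below uses a simple box blur as an assumed stand-in; the function name box_blur and the radius parameter are hypothetical. Cropping to the avatar's contour (Sa6) is not shown.

```python
def box_blur(image, radius=1):
    """Blur an image by averaging each pixel with its neighbors.

    image is a list of rows of (r, g, b) tuples. Neighbors outside the
    image are ignored, so edge pixels average over fewer samples.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            n = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    p = image[yy][xx]
                    acc[0] += p[0]
                    acc[1] += p[1]
                    acc[2] += p[2]
                    n += 1
            out_row.append((acc[0] / n, acc[1] / n, acc[2] / n))
        out.append(out_row)
    return out


# A hard black-to-white boundary becomes a gradual transition.
strip = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
blurred = box_blur(strip, radius=1)
```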

[0087] Next, as shown in Figure 8(a), the outer edge region of the image cropped to the contour shape is set (Sa7), and as shown in Figure 8(b), the brightness is gradually reduced from the inside to the outside of the outer edge region (Sa8).
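The gradual brightness reduction of Sa7 and Sa8 can be sketched as a linear falloff across the outer edge region. For simplicity this example assumes a rectangular image rather than the avatar's actual contour shape, and the function name edge_falloff, the edge_width parameter, and the min_scale value are all hypothetical; the specification gives no falloff curve or amounts.

```python
def edge_falloff(image, edge_width=2, min_scale=0.5):
    """Reduce brightness gradually toward the outside of the edge region.

    image is a list of rows of (r, g, b) tuples. Pixels at least
    edge_width pixels from the border are unchanged; within the edge
    region the scale falls linearly to min_scale at the border itself.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            dist = min(x, y, w - 1 - x, h - 1 - y)  # distance to nearest border
            if dist >= edge_width:
                scale = 1.0
            else:
                scale = min_scale + (1.0 - min_scale) * dist / edge_width
            out_row.append(tuple(c * scale for c in image[y][x]))
        out.append(out_row)
    return out


# A 5x5 white image: darkest at the border, unchanged at the center.
white = [[(1.0, 1.0, 1.0)] * 5 for _ in range(5)]
shaded = edge_falloff(white)
```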

[0088] Next, the avatar image output from the avatar image generation program is acquired (Sa9), and as shown in Figure 8(c), the acquired avatar image is multiplied by the multiplication image generated in Sa1 to Sa8 (Sa10).
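The multiplication in Sa10 corresponds to a per-pixel, per-channel product of the two images. A minimal sketch follows, assuming color components in the range 0.0 to 1.0 and an avatar image that carries an opacity channel which is left untouched; the function name multiply_blend and both of those representation choices are assumptions for illustration.

```python
def multiply_blend(avatar, multiplication_image):
    """Multiply each avatar pixel by the matching multiplication-image pixel.

    avatar: rows of (r, g, b, a) tuples, components in 0.0-1.0.
    multiplication_image: rows of (r, g, b) tuples of the same dimensions.
    The opacity a is passed through unchanged.
    """
    return [
        [
            (a_r * m_r, a_g * m_g, a_b * m_b, a_a)
            for (a_r, a_g, a_b, a_a), (m_r, m_g, m_b) in zip(av_row, m_row)
        ]
        for av_row, m_row in zip(avatar, multiplication_image)
    ]


# A bluish multiplication image tints and darkens the avatar pixel.
tinted = multiply_blend([[(1.0, 0.5, 0.25, 1.0)]], [[(0.5, 0.5, 1.0)]])
```

Because every component is at most 1.0, multiplication can only keep or darken a color, which is consistent with the later remark that averaging only bright colors prevents the avatar image from becoming too dark.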

[0089] Next, based on the motion detection data output from the image analysis program, it is determined whether the amount of motion of the external input image is equal to or greater than a preset motion setting value (Sa11). If the amount of motion is less than the motion setting value, the process proceeds to Sa13. If the amount of motion is equal to or greater than the motion setting value, 20% is added to the brightness of the avatar image multiplied by the multiplication image in Sa10 (Sa12). The motion setting value referenced in Sa11 can be set arbitrarily, and the brightness to be varied in Sa12 can also be set arbitrarily.

[0090] Next, based on the volume data and pitch data output from the audio analysis program, it is determined whether the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or greater than the treble setting value (Sa13). If the volume of the external input audio is less than the volume setting value or the pitch is less than the treble setting value, the process proceeds to Sa15. If the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or greater than the treble setting value, 20% is added to the brightness of the avatar image multiplied by the multiplication image in Sa10 (Sa14). The volume setting value and treble setting value referenced in Sa13 can be set arbitrarily, and the brightness to be varied in Sa14 can also be set arbitrarily.

[0091] Next, based on the volume data and pitch data output from the audio analysis program, it is determined whether the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or less than the bass setting value (Sa15). If the volume of the external input audio is less than the volume setting value or the pitch is greater than the bass setting value, the process proceeds to Sa17. If the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or less than the bass setting value, the brightness of the avatar image multiplied by the multiplication image in Sa10 is reduced by 20% (Sa16). The volume setting value and bass setting value referenced in Sa15 can be set arbitrarily, and the brightness to be varied in Sa16 can also be set arbitrarily.
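The three determinations in Sa11 to Sa16 can be summarized in a short sketch that returns the net brightness change for one frame. The function name brightness_change and the default threshold values are hypothetical, since the specification leaves the setting values arbitrary; only the 20% step comes from the embodiment. How the resulting change is applied to pixel brightness (for example additively or as a scale factor) is not specified and is left out.

```python
def brightness_change(motion, volume, pitch,
                      motion_setting=0.5, volume_setting=0.5,
                      treble_setting=0.7, bass_setting=0.3, step=0.20):
    """Return the net brightness change for one frame (Sa11 to Sa16).

    motion, volume, and pitch are the 0.0-1.0 values output by the
    image analysis and audio analysis programs. All *_setting defaults
    are placeholder values chosen for this example.
    """
    change = 0.0
    # Sa11 -> Sa12: large motion in the external input image.
    if motion >= motion_setting:
        change += step
    # Sa13 -> Sa14: loud, high-pitched external input audio.
    if volume >= volume_setting and pitch >= treble_setting:
        change += step
    # Sa15 -> Sa16: loud, low-pitched external input audio.
    if volume >= volume_setting and pitch <= bass_setting:
        change -= step
    return change


# A fast-moving scene with loud, high-pitched audio brightens twice.
delta = brightness_change(motion=0.6, volume=0.8, pitch=0.9)
```

With the placeholder settings, the treble and bass conditions cannot both hold for the same frame, so the audio contributes at most one step in either direction.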

[0092] Next, the color-corrected avatar image is output to the image synthesis program (Sa17).

[0093] [Effect 1] In this embodiment, an external input image captured by the capture device 112 from external video data is used as the background image, and a composite image in which an avatar image generated by the avatar image generation program is placed in front of the external input image is generated and output. Since the color of the avatar image placed in front of the external input image is changed based on the external input image, it is possible to generate an image with a sense of realism from the external input image and the avatar image.

[0094] Furthermore, in this embodiment, the color of the avatar image is changed based on the external input image each time a video frame is generated, so it is possible to generate a video of the avatar image that reflects the external input image based on external video data frame by frame.

[0095] Alternatively, the avatar image's color could be changed based on the external input image each time multiple frames are generated.

[0096] Furthermore, in this embodiment, a multiplication image based on the color of the external input image is multiplied with the avatar image, allowing the color of the external input image to be reflected in the avatar image. This makes the avatar appear to be watching a monitor that displays the external input image, creating a sense of realism.

[0097] Furthermore, in this embodiment, the average value of the colors in the external input image whose brightness is above a predetermined brightness (50% or more in this embodiment) is multiplied by the avatar image. This allows the average value of the colors in the external input image to be reflected in the avatar image's color, while preventing the avatar image from becoming too dark.

[0098] In this embodiment, the avatar image is multiplied by the average value of the colors in the external input image whose brightness is above a predetermined brightness. However, it is also possible to multiply the avatar image by the average value of all the colors in the external input image, and even with such a configuration, the average value of the colors in the external input image can be reflected in the colors of the avatar image.

[0099] Furthermore, in this embodiment, the average value of the colors in a portion of the external input image is multiplied by the avatar image. Since the average value of the colors in a portion of the external input image, rather than the entire external input image, is reflected in the avatar image's color, for example, the correlation with the external input image can be enhanced by reflecting the average value of the colors in the central or characteristic region of the external input image in the avatar image's color.

[0100] Alternatively, the system can be configured to multiply the avatar image by the average color value of the entire external input image. This configuration reduces the processing load while reflecting the average color value of the external input image in the avatar image.

[0101] Furthermore, in this embodiment, the external input image is divided into multiple regions, the average value of the colors in each region is set as the color of that region, and the resulting image is multiplied with the avatar image. As a result, the color of the avatar image changes region by region according to the color distribution of the external input image, making it possible to generate an even more realistic image from the external input image and the avatar image.

[0102] Furthermore, in a multiplicative image in which a different color is set for each of the multiple regions, a blurring effect is applied to the boundaries between those regions. This prevents sharp color boundaries from being projected onto the avatar image, so the different colors of the regions are reflected naturally in the avatar image.
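As a rough sketch of [0101] and [0102], the code below computes one average color per region, fills a multiplicative image with those colors, and softens the region boundaries. The box blur is an assumption (the text does not name a blur method), and the function names are hypothetical. The image must have at least as many rows and columns as there are regions.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]  # rows of RGB pixels, channels in 0.0 to 1.0


def region_averages(img: Image, rows: int, cols: int) -> List[List[Color]]:
    """Average color of each cell in a rows x cols grid laid over the image."""
    h, w = len(img), len(img[0])
    sums = [[[0.0, 0.0, 0.0, 0] for _ in range(cols)] for _ in range(rows)]
    for y in range(h):
        for x in range(w):
            cell = sums[y * rows // h][x * cols // w]
            for i in range(3):
                cell[i] += img[y][x][i]
            cell[3] += 1
    return [[(s[0] / s[3], s[1] / s[3], s[2] / s[3]) for s in row] for row in sums]


def fill_regions(avgs: List[List[Color]], h: int, w: int) -> Image:
    """Build an h x w multiplicative image with one flat color per region."""
    rows, cols = len(avgs), len(avgs[0])
    return [[avgs[y * rows // h][x * cols // w] for x in range(w)] for y in range(h)]


def box_blur(img: Image, radius: int) -> Image:
    """Assumed blur: mean of the (2*radius+1)^2 neighborhood, clipped at edges."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            px = [img[yy][xx] for yy in ys for xx in xs]
            row.append(tuple(sum(p[i] for p in px) / len(px) for i in range(3)))
        out.append(row)
    return out
```

After the blur, pixels near a region boundary take a mix of the neighboring region colors, so no hard edge is carried over to the avatar image.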

[0103] In this embodiment, a multiplicative image in which a different color is set for each of the multiple regions is multiplied with the avatar image. However, a single-color image, such as the average color of all or part of the external input image, may instead be multiplied with the avatar image. Such a configuration can reduce the processing load.

[0104] Furthermore, in this embodiment, the brightness of the outer edge region of the multiplicative image based on the external input image is reduced. Because the brightness of the multiplied color differs between the central and outer regions of the avatar image, the colors of the external input image can be reflected in the avatar image in a three-dimensional manner.

[0105] In particular, in this embodiment, by gradually decreasing the brightness from the inside to the outside of the outer edge region of the multiplicative image, the colors of the external input image can be reflected in the avatar image in a natural and three-dimensional manner.
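One way to realize the gradual darkening toward the outer edge is a linear falloff inside an edge band. The sketch below assumes a linear ramp, a band covering 25% of the image on each side, and a minimum brightness factor of 0.5; none of these values appear in the text, and the function names are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]


def edge_brightness(x: int, y: int, w: int, h: int,
                    band: float = 0.25, floor: float = 0.5) -> float:
    """Brightness factor: 1.0 in the center, ramping down to `floor` at the edge."""
    if band <= 0.0:
        return 1.0
    # Normalized distance from the pixel center to the nearest image edge.
    dx = min(x + 0.5, w - x - 0.5) / w
    dy = min(y + 0.5, h - y - 0.5) / h
    d = min(dx, dy)
    if d >= band:
        return 1.0
    return floor + (1.0 - floor) * (d / band)


def darken_edges(img: Image, band: float = 0.25, floor: float = 0.5) -> Image:
    """Apply the edge falloff to every pixel of a multiplicative image."""
    h, w = len(img), len(img[0])
    return [
        [tuple(ch * edge_brightness(x, y, w, h, band, floor) for ch in img[y][x])
         for x in range(w)]
        for y in range(h)
    ]
```

Setting `band` to 0 disables the falloff, which corresponds to the lower-load variant in which brightness does not vary by region.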

[0106] Alternatively, the brightness may be left unchanged across the regions of the multiplicative image, which reduces the processing load.

[0107] In this embodiment, the avatar image is multiplied by the average color value based on the external input image. However, as shown in Figure 9, the avatar image may also be multiplied by the external input image itself. In this configuration as well, the colors of the external input image can be reflected in the avatar image.

[0108] Furthermore, in this case, instead of directly multiplying the external input image by the avatar image, as shown in Figure 9, blurring the external input image before multiplying it by the avatar image prevents the external input image from being projected sharply onto the avatar image, allowing the colors of the external input image to be naturally reflected in the avatar image.

[0109] Furthermore, in this embodiment, when it is detected that the amount of movement of the external input image exceeds a preset amount, the brightness of the multiplicative image multiplied with the avatar image is increased. As a result, the color of the avatar image placed in front of the external input image is changed based on the amount of movement of the external input image, making it possible to generate a more realistic image using both the external input image and the avatar image.
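The text does not state how the amount of movement is measured. Purely as an illustration, the sketch below assumes the mean absolute per-channel difference between two consecutive frames and brightens the multiplicative color when that value exceeds a threshold. The metric, the threshold of 0.1, and the gain of 1.2 are all assumptions.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def motion_amount(prev: List[Color], curr: List[Color]) -> float:
    """Assumed metric: mean absolute channel difference between two frames."""
    total = 0.0
    count = 0
    for p, c in zip(prev, curr):
        for a, b in zip(p, c):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def boost_if_moving(tint: Color, prev: List[Color], curr: List[Color],
                    threshold: float = 0.1, gain: float = 1.2) -> Color:
    """Brighten the multiplicative color when motion exceeds the threshold."""
    if motion_amount(prev, curr) > threshold:
        # Clamp so the multiplicative color stays within the valid range.
        return tuple(min(1.0, ch * gain) for ch in tint)
    return tint
```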

[0110] In this embodiment, the avatar image's color is changed by altering the brightness of the multiplicative image applied to the avatar image when it is detected that the amount of movement of the external input image exceeds a preset amount. However, the avatar image's color may also be changed by multiplying or adding a specific color or image to the avatar image when such movement is detected. In such a configuration as well, the color of the avatar image placed in front of the external input image is changed based on the amount of movement of the external input image, enabling the creation of a more realistic image using both the external input image and the avatar image.

[0111] Furthermore, in this embodiment, the color of the avatar image is changed based on the amount of motion of the external input image each time a video frame is generated. Therefore, it is possible to generate a video of the avatar image that reflects the external input image based on the amount of motion of the external input image for each frame.

[0112] Alternatively, the avatar image's color may be changed based on the amount of movement of the external input image each time multiple frames are generated.

[0113] Furthermore, in this embodiment, the brightness of the multiplicative image multiplied by the avatar image is changed based on the external input audio attached to the external video data. Since the color of the avatar image placed in front of the external input image is changed based on the audio attached to the external video data, it is possible to generate a more realistic image using the external input image and the avatar image.

[0114] In this embodiment, the avatar image's color is changed by altering the brightness of a multiplicative image applied to the avatar image based on external audio input added to the external video data. However, the avatar image's color may also be changed by multiplying or adding a specific color or image to the avatar image based on external audio input added to the external video data. In such a configuration, the color of the avatar image placed in front of the external input image is changed based on the audio added to the external video data, thus enabling the creation of a more realistic image using both the external input image and the avatar image.

[0115] Furthermore, in this embodiment, the avatar image color is changed when the volume of the external input audio attached to the external video data exceeds a preset volume setting value, so the input of loud audio can be reflected in the avatar image color.

[0116] Furthermore, in this embodiment, when pre-set high-frequency sounds or pre-set low-frequency sounds are input as external audio added to external video data, that is, when a specific sound is input, the color of the avatar image is changed, so that the input of a specific sound can be reflected in the color of the avatar image.

[0117] Furthermore, in this embodiment, the avatar image's color is changed when audio of a specific frequency range is input as external audio added to the external video data, thus reflecting the input of audio of a specific frequency range in the avatar image's color.

[0118] Furthermore, in this embodiment, the avatar image's color is changed to a color set according to the type of audio input, i.e., whether a pre-set high-frequency audio or a pre-set low-frequency audio is input as external audio added to the external video data. This allows the avatar image's color to change according to the type of audio input.
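A simple way to picture the per-type color selection is a lookup keyed by the detected sound type. In the sketch below, the two colors, the type labels, and the rule that the volume threshold must also be met are assumptions made for illustration; the text only states that a color is set according to whether a preset high-frequency or low-frequency sound is input.

```python
from typing import Dict, Optional, Tuple

Color = Tuple[float, float, float]

# Hypothetical colors; the text does not specify which color maps to which type.
TYPE_COLORS: Dict[str, Color] = {
    "treble": (1.0, 0.9, 0.6),
    "bass": (0.5, 0.6, 1.0),
}


def classify_audio(volume: float, pitch_hz: float, volume_min: float,
                   treble_min_hz: float, bass_max_hz: float) -> Optional[str]:
    """Return 'treble', 'bass', or None when no specific sound is detected."""
    if volume < volume_min:
        return None
    if pitch_hz >= treble_min_hz:
        return "treble"
    if pitch_hz <= bass_max_hz:
        return "bass"
    return None


def color_for_audio(base: Color, sound_type: Optional[str]) -> Color:
    """Pick the color for the detected type, or keep the base color."""
    if sound_type is None:
        return base
    return TYPE_COLORS.get(sound_type, base)
```

The same table could be extended with further keys, for example a background music genre, if a classifier for that type of sound were available.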

[0119] In this embodiment, the system identifies a specific frequency range of sound and changes the color of the avatar image according to its type. However, it is also possible to identify the type of sound, such as the genre of background music, and change the color of the avatar image according to the identified type of sound. In such a configuration, the color of the avatar image can also be changed to a color corresponding to the input sound type.

[0120] Furthermore, in this embodiment, the color of the avatar image is changed based on the external input audio attached to the external video data each time a video frame is generated. Therefore, it is possible to generate a video of the avatar image that reflects the audio based on the external video data frame by frame.

[0121] Alternatively, the system could be configured so that the avatar image's color changes based on the external audio input attached to the external input image each time multiple frames are generated.

[0122] Furthermore, in this embodiment, color correction based on the color of the external input image, color correction based on the amount of motion of the external input image, and color correction based on the external input audio are all performed each time a video frame is generated. However, the frequency at which color correction based on the color of the external input image, color correction based on the amount of motion of the external input image, and color correction based on the external input audio are performed may be different. For example, color correction based on the color of the external input image may be performed each time a video frame is generated, while color correction based on the amount of motion of the external input image and color correction based on the external input audio may be performed every time multiple frames are generated.
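The differing update frequencies in [0122] amount to a per-correction frame interval. A minimal sketch, assuming each correction is assigned an interval in frames (the names and the interval of 4 are hypothetical):

```python
from typing import Dict, List


def corrections_due(frame_index: int, intervals: Dict[str, int]) -> List[str]:
    """Return the corrections to run on this frame, given per-correction intervals."""
    return [name for name, every in intervals.items() if frame_index % every == 0]


# Example: color correction on every frame, motion and audio every 4th frame.
INTERVALS = {"color": 1, "motion": 4, "audio": 4}
```

On frames where a correction is not due, the most recently computed value for that correction would simply be reused.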

[0123] Although Embodiment 1 of the present invention has been described above with reference to the drawings, the present invention is not limited to Embodiment 1, and it goes without saying that any modifications or additions that do not depart from the spirit of the present invention are also included in the present invention.

[0124] For example, Embodiment 1 above describes an example in which the composite image generated by the image synthesis program is used for video distribution. However, the composite image may be put to any use; for example, it may be recorded as an archive or used for animation production.

[0125] Furthermore, in Embodiment 1 above, the facial expression parameters output from the face tracking program are corrected by a parameter correction program, and the image generation program generates an avatar image using the corrected facial expression parameters. However, the image generation program may also be configured to generate an avatar image directly from the parameters output by the face tracking program, without using a parameter correction program.

[0126] Furthermore, in Embodiment 1 above, an external input image captured by the capture device 112 from external video data is used as a background image, and a composite image in which an avatar image generated by the avatar image generation program is placed in front of the external input image is generated and output. In this configuration, the color of the avatar image is corrected based on the color of the external input image, the amount of motion of the external input image, and the external input audio attached to the external input image. However, it is sufficient for the color of the avatar image to be corrected based on at least one of these three elements.

[0127] Furthermore, in Embodiment 1 above, the avatar image generation program generates an avatar image, and a separate color correction program modifies the color of that avatar image based on an external input image. However, the avatar image generation program may itself be configured to both generate the avatar image and modify its color based on the external input image.

[0128] [Embodiment 2] Embodiment 2 of the present invention will be described below. Since the computer terminal 1 used in Embodiment 2 has the same configuration as the computer terminal 1 used in Embodiment 1, the same reference numerals will be used, and a detailed explanation will be omitted here. Furthermore, any configurations and modifications thereof applicable to Embodiment 1 are also applicable to this embodiment.

[0129] In this embodiment, the storage 103 of the computer terminal 1 stores various programs executed by the computer terminal 1, in addition to an operating system (OS) not shown in the figure. Specifically, it stores a face tracking program, a motion tracking program, a voice detection program, an input detection program, a vital sign detection program, a parameter correction program, an avatar 3D model generation program, an image analysis program, a voice analysis program, a 3D model color correction program, an image synthesis program, and a distribution program. Although not shown in the figure, it also stores configuration settings, material data used by the image generation program, and the like.

[0130] As shown in Figures 11 to 13, the computer terminal 1 is configured such that an avatar 3D model generation program generates an avatar 3D model with facial expressions based on facial expression parameters and motion parameters based on image data output from camera 108, an image synthesis program generates a composite image by using an external input image captured by capture device 112 as a background image and compositing an avatar image based on the avatar 3D model in front of it, and a distribution program distributes a video using the composite image generated by the image synthesis program.

[0131] Furthermore, the facial expression parameters output by the face tracking program are not directly output to the avatar 3D model generation program, but rather, as shown in Figure 11, are output to the avatar 3D model generation program via a parameter correction program. The parameter correction program corrects at least some of the facial expression parameters using some facial expression parameters and other parameters (voice parameters, input parameters, vital parameters, configuration parameters), and outputs the corrected facial expression parameters to the avatar 3D model generation program.

[0132] Furthermore, the avatar 3D model generated by the avatar 3D model generation program is not directly output to the image synthesis program as an avatar image based on that avatar 3D model. Instead, as shown in Figures 12 and 13, the 3D model color correction program irradiates the avatar 3D model with virtual light, correcting the color of the avatar image based on the avatar 3D model. The color-corrected avatar image is then output to the image synthesis program. The 3D model color correction program irradiates the avatar 3D model with virtual light based on the color of the external input image to be synthesized in the image synthesis program, the amount of motion of the external input image, and the external input audio input along with the external input image. It then performs color correction on the avatar image based on the avatar 3D model and outputs the color-corrected avatar image to the image synthesis program.

[0133] In this embodiment, the computer terminal 1, similar to Embodiment 1, can input video content such as gameplay videos and movies as external video data from an external device and capture images of this video data as external input images. By distributing a video that combines the input gameplay videos, movies, and other video content with an avatar image, it is configured to distribute game commentary videos and movie commentary videos as new videos.

[0134] [Program] Next, the programs executed by the computer terminal 1 in this embodiment will be described. Descriptions of programs that are the same as in Embodiment 1 are omitted.

[0135] As shown in Figure 11, the avatar 3D model generation program generates an avatar 3D model based on facial expression parameters N and extended parameters from the parameter correction program, motion parameters from the motion tracking program, voice parameters from the voice detection program, and input parameters from the input detection program, and outputs the generated avatar 3D model. It includes a face model generation process to generate a face model, a face model correction process to correct the face model, and an outfit model generation process to generate an outfit model. The avatar 3D model generation program is executed once per frame, at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60 fps, the avatar 3D model generation program is executed 60 times per second.

[0136] As shown in Figure 12, the 3D model color correction program corrects the color of the avatar image based on the avatar 3D model generated by the avatar 3D model generation program. It does so by irradiating the avatar 3D model with virtual illumination light based on the color of the external input image captured by the capture device 112 from external video data, the motion amount detection data output from the image analysis program, and the volume data and pitch data output from the audio analysis program, and it then generates and outputs the corrected avatar image. The program includes object placement processing to place the avatar 3D model in a virtual 3D space, illumination light identification processing to identify the illumination light to be irradiated onto the avatar 3D model, illumination light placement processing to place the identified illumination light in the virtual 3D space, and avatar image generation processing to generate an avatar image based on the avatar 3D model. The 3D model color correction program is executed once per frame, at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60 fps, the 3D model color correction program is executed 60 times per second.

[0137] As shown in Figure 12, the image synthesis program uses the external input image captured by the capture device 112 from external video data as the background image, places in front of it the avatar image that is based on the avatar 3D model generated by the avatar 3D model generation program and color-corrected by the 3D model color correction program, and generates and outputs the resulting composite image (see Figure 13). The image synthesis program is executed once per frame, at intervals corresponding to the frame rate of the video distributed by the distribution program. For example, if the frame rate is 60 fps, the image synthesis program is executed 60 times per second.

[0138] As shown in Figure 12, the distribution program is a program that distributes video data consisting of a video composed of a composite image in which an avatar image is placed in front of an external input image output from the image synthesis program, audio input from microphone 109, and audio input from audio input device 113. The frame rate of the video data distributed by the distribution program can be set from multiple types, and at intervals based on the frame rate set here, the external input image is captured by the capture device 112 as described above for each frame of the video data distributed by the distribution program, and the parameter correction program, avatar 3D model generation program, image analysis program, audio analysis program, 3D model color correction program, and image synthesis program are executed.

[0139] [3D Model Color Correction Program] Figure 14 is a flowchart showing the control details of the 3D model color correction program.

[0140] As shown in Figure 14, the 3D model color correction program first places the avatar 3D model generated by the avatar 3D model generation process into a virtual 3D space (Sb1).

[0141] Next, the external input image captured by the capture device 112 from the external video data is acquired (Sb2). Then, a set area of the acquired external input image is cropped (Sb3). The area to be cropped can be set arbitrarily; in this embodiment, as shown in Figure 15(a), the central area of the external input image is cropped.

[0142] Next, the cropped area is divided into multiple regions (Sb4). The number and size of the regions to be divided can be set arbitrarily, and in this embodiment, as shown in Figure 15(a), an example is shown in which the area is divided into three vertically and three horizontally, resulting in nine regions.

[0143] Next, colors with a brightness of 50% or higher are extracted for each region, and their average value is calculated. As shown in Figure 15(b), the calculated average value is then set as the color for each region (Sb5).

[0144] Next, as shown in Figures 15(c) to (e), virtual light sources are placed in the virtual 3D space at coordinates corresponding to the regions divided in Sb4, facing the direction of the 3D avatar (Sb6). These virtual light sources emit colors set for each of the multiple regions in Sb5. As a result, the avatar 3D model is illuminated from the directions set for each of the multiple regions with light of the colors set for each region based on the external input image, and the color of the avatar 3D model is changed.
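Steps Sb3 to Sb6 can be sketched as a small pipeline: crop the center, compute a bright-pixel average per region, and place one light per region. The coordinate layout (lights on a plane at `plane_z` in front of an avatar at the origin), the brightness measure, the black fallback for regions with no bright pixels, and all names are assumptions, since the text leaves these details open.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]


@dataclass
class Light:
    position: Tuple[float, float, float]  # location in the virtual 3D space
    color: Color
    intensity: float = 1.0


def crop_center(img: Image, frac: float) -> Image:
    """Sb3: keep the central `frac` of the image in each dimension."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h * frac)), max(1, round(w * frac))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return [row[x0:x0 + cw] for row in img[y0:y0 + ch]]


def bright_region_colors(img: Image, rows: int = 3, cols: int = 3,
                         threshold: float = 0.5) -> List[List[Color]]:
    """Sb4 and Sb5: per-region average of colors with brightness >= threshold."""
    h, w = len(img), len(img[0])
    cells: List[List[List[Color]]] = [[[] for _ in range(cols)] for _ in range(rows)]
    for y in range(h):
        for x in range(w):
            if max(img[y][x]) >= threshold:  # assumed brightness: largest channel
                cells[y * rows // h][x * cols // w].append(img[y][x])
    out = []
    for row in cells:
        out_row = []
        for px in row:
            if px:
                out_row.append(tuple(sum(p[i] for p in px) / len(px) for i in range(3)))
            else:
                out_row.append((0.0, 0.0, 0.0))  # hypothetical: no bright pixels, no light
        out.append(out_row)
    return out


def place_lights(colors: List[List[Color]], plane_z: float = 2.0) -> List[Light]:
    """Sb6: one light per region, on a plane facing an avatar at the origin."""
    rows, cols = len(colors), len(colors[0])
    lights = []
    for r in range(rows):
        for c in range(cols):
            x = -1.0 + (2 * c + 1) / cols
            y = 1.0 - (2 * r + 1) / rows  # row 0 is the top of the image
            lights.append(Light((x, y, plane_z), colors[r][c]))
    return lights
```

With the default 3 by 3 grid this yields nine lights, with the center region's light directly in front of the avatar.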

[0145] Next, based on the motion amount detection data output from the image analysis program, it is determined whether the amount of motion of the external input image is equal to or greater than a preset motion setting value (Sb7). If it is not, the process proceeds to Sb9. If it is, the brightness of all the light sources placed in Sb6 is increased by 20% (Sb8). The motion setting value referenced in Sb7 can be set arbitrarily, and the amount by which the brightness of the light sources is changed in Sb8 can also be set arbitrarily.

[0146] Next, based on the volume data and pitch data output from the audio analysis program, it is determined whether the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or greater than the treble setting value (Sb9). If the volume of the external input audio is less than the volume setting value or the pitch is less than the treble setting value, the process proceeds to Sb11. If the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or greater than the treble setting value, the brightness of all the light sources placed in Sb6 is increased by 20% (Sb10). The volume setting value and treble setting value referenced in Sb9 can be set arbitrarily, and the brightness of the light sources to be varied in Sb10 can also be set arbitrarily.

[0147] Next, based on the volume data and pitch data output from the audio analysis program, it is determined whether the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or less than the bass setting value (Sb11). If the volume of the external input audio is less than the volume setting value or the pitch is greater than the bass setting value, the program proceeds to Sb13. If the volume of the external input audio is equal to or greater than the volume setting value and the pitch is equal to or less than the bass setting value, the brightness of all the light sources placed in Sb6 is reduced by 20% (Sb12). The volume setting value and bass setting value referenced in Sb11 can be set arbitrarily, and the brightness of the light sources to be varied in Sb12 can also be set arbitrarily.
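The three checks in Sb7 to Sb12 can be summarized in one function. This sketch treats each 20% change as a multiplicative factor applied to the current brightness, which is one possible reading of the text; the parameter names are hypothetical.

```python
def adjust_brightness(intensity: float, motion: float, volume: float, pitch: float,
                      *, motion_min: float, volume_min: float,
                      treble_min: float, bass_max: float,
                      step: float = 0.2) -> float:
    """Apply the Sb7 to Sb12 checks to a light source brightness value."""
    # Sb7 -> Sb8: enough motion in the external input image brightens the lights.
    if motion >= motion_min:
        intensity *= 1.0 + step
    # Sb9 -> Sb10: loud, high-pitched audio brightens the lights.
    if volume >= volume_min and pitch >= treble_min:
        intensity *= 1.0 + step
    # Sb11 -> Sb12: loud, low-pitched audio dims the lights.
    if volume >= volume_min and pitch <= bass_max:
        intensity *= 1.0 - step
    return intensity
```

The returned value would be applied to every light source placed in Sb6 before the avatar image is generated in Sb13.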

[0148] Next, as shown in Figure 15(d), an avatar image is generated from the avatar 3D model placed in the virtual 3D space (Sb13). Specifically, it is generated by exporting an image of the avatar 3D model as viewed from a specified direction within the virtual 3D space.

[0149] Next, the generated avatar image is output to the image synthesis program (Sb14).

[0150] [Effect 2] In this embodiment, an external input image captured by the capture device 112 from external video data is used as a background image, and a composite image in which an avatar image based on the avatar 3D model generated by the avatar 3D model generation program is placed in front of the external input image is generated and output. In this configuration, the avatar image placed in front of the external input image is generated from an avatar 3D model illuminated with virtual light based on the external input image. As a result, the color of the avatar image changes based on the external input image, making it possible to generate a more realistic image from the external input image and the avatar image.

[0151] Furthermore, in this embodiment, each time a video frame is generated, a virtual illumination light based on an external input image is projected onto the avatar 3D model. Each time a video frame is generated, an avatar image is generated from the avatar 3D model that has been illuminated by the virtual illumination light. As a result, the color of the avatar image is changed, making it possible to generate a video of an avatar image that reflects the external input image based on external video data frame by frame.

[0152] Alternatively, the system could be configured so that the color of the avatar image based on the 3D avatar model is changed by modifying the virtual illumination light based on the external input image each time multiple frames are generated.

[0153] Furthermore, in this embodiment, illumination light in the color of the external input image is shone onto the avatar 3D model, changing the color of the avatar image based on the avatar 3D model, so the color of the external input image is reflected in the avatar image. This gives the avatar image a sense of realism, as if the avatar were watching a monitor that displays the external input image.

[0154] Furthermore, in this embodiment, the avatar 3D model is illuminated with light of the average value of the colors in the external input image whose brightness is above a predetermined brightness (50% or more in this embodiment), thereby changing the color of the avatar image based on the avatar 3D model. This allows the average value of the colors in the external input image to be reflected in the color of the avatar image, while preventing the avatar image from becoming too dark.

[0155] In this embodiment, the avatar 3D model is illuminated with light of the average value of the colors in the external input image whose brightness is above a predetermined brightness. However, the avatar 3D model may also be illuminated with light of the average value of all the colors in the external input image. Even in such a configuration, the average value of the colors in the external input image can be reflected in the color of the avatar image.

[0156] Furthermore, in this embodiment, the avatar 3D model is illuminated with light of the average color of a specific area of the external input image. Since the average color of a specific area of the external input image, rather than the entire image, is reflected in the avatar image's color, the correlation with the external input image can be enhanced by, for example, reflecting the average color of the central or characteristic area of the external input image in the avatar image's color.

[0157] Alternatively, the avatar 3D model may be illuminated with light representing the average color of the entire external input image. This configuration allows the processing load to be reduced while reflecting the average color of the external input image onto the avatar image.

[0158] Furthermore, in this embodiment, the external input image is divided into multiple regions, the average value of the colors in each region is set as the color of that region, and illumination light of the set color corresponding to each region is projected onto the avatar 3D model from the coordinates corresponding to each region. As a result, the color changes for each of the multiple regions according to the color scheme of the external input image, and an image with even greater realism can be generated by combining the external input image and the avatar image.
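To illustrate why lights placed at region-specific coordinates tint different parts of the model differently, the sketch below shades a single surface point with a simple Lambertian diffuse model. The shading model is an assumption; the text does not specify how the 3D engine computes lighting, and the tuple layout for lights is hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[float, float, float]
LightSpec = Tuple[Vec3, Color, float]  # (position, color, intensity)


def shade(albedo: Color, normal: Vec3, point: Vec3, lights: List[LightSpec]) -> Color:
    """Sum the diffuse contribution of each colored light at one surface point."""
    rgb = [0.0, 0.0, 0.0]
    for pos, color, intensity in lights:
        to_light = tuple(p - q for p, q in zip(pos, point))
        length = math.sqrt(sum(v * v for v in to_light))
        if length == 0.0:
            continue
        # Cosine of the angle between the unit normal and the light direction.
        n_dot_l = max(0.0, sum(n * v / length for n, v in zip(normal, to_light)))
        for i in range(3):
            rgb[i] += albedo[i] * color[i] * intensity * n_dot_l
    return (min(1.0, rgb[0]), min(1.0, rgb[1]), min(1.0, rgb[2]))
```

A surface facing the left-hand lights picks up more of the colors from the left regions of the external input image, and likewise for the other directions, which is the region-dependent color change described above.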

[0159] In this embodiment, multiple illumination lights, each with a different color set for multiple regions, are directed towards the avatar 3D model. However, it is also possible to use a configuration where a single-color illumination light, such as the average color value based on the entire or a portion of the external input image, is directed towards the avatar 3D model. Such a configuration can reduce the processing load. In this case, there may be multiple illumination light sources or just one.

[0160] Furthermore, in this embodiment, when it is detected that the amount of movement of the external input image exceeds a preset amount, the brightness of the light source illuminating the avatar 3D model is increased. As a result, the color of the avatar image placed in front of the external input image is changed based on the amount of movement of the external input image, making it possible to generate a more realistic image using both the external input image and the avatar image.

[0161] In this embodiment, the configuration changes the color of the avatar image by changing the brightness of the light source illuminating the avatar 3D model when it is detected that the amount of movement of the external input image exceeds a preset amount. However, the configuration may also change the color of the avatar image by changing the color of the light source illuminating the avatar 3D model when it is detected that the amount of movement of the external input image exceeds a preset amount. In such a configuration as well, the color of the avatar image placed in front of the external input image is changed based on the amount of movement of the external input image, so that a more realistic image can be generated with the external input image and the avatar image.

[0162] Furthermore, in this embodiment, each time a video frame is generated, the brightness of the light source illuminating the avatar 3D model is changed based on the amount of motion of the external input image, and the color of the avatar image is changed. Therefore, it is possible to generate a video of the avatar image that reflects the external input image based on the amount of motion of the external input image for each frame.

[0163] Alternatively, the system may be configured to change the brightness of the light source illuminating the 3D avatar model based on the amount of movement of the external input image each time multiple frames are generated, thereby changing the color of the avatar image.

[0164] Furthermore, in this embodiment, the brightness of the light source illuminating the avatar 3D model is changed based on external input audio added to the external video data. As a result, the color of the avatar image, which is placed in front of the external input image, is changed based on the audio added to the external video data. Therefore, a more realistic image can be generated using both the external input image and the avatar image.

[0165] In this embodiment, the avatar image's color is changed by altering the brightness of the light source illuminating the avatar 3D model based on external audio input added to the external video data. However, the avatar image's color may also be changed by altering the color of the light source illuminating the avatar 3D model based on external audio input added to the external video data. In this configuration as well, the color of the avatar image positioned in front of the external input image is changed based on the audio added to the external video data, thus enabling the creation of a more realistic image with both the external input image and the avatar image.

[0166] Furthermore, in this embodiment, if the volume of the external input audio attached to the external video data exceeds a preset volume setting value, the brightness of the light source illuminating the avatar 3D model is changed, thereby changing the color of the avatar image. This allows the input of loud audio to be reflected in the color of the avatar image.

[0167] Furthermore, in this embodiment, when pre-set high-frequency sounds or pre-set low-frequency sounds are input as external audio added to external video data, that is, when a specific sound is input, the brightness of the light source illuminating the avatar 3D model is changed, and the color of the avatar image is changed, thereby reflecting the input of a specific sound in the color of the avatar image.

[0168] Furthermore, in this embodiment, when audio of a specific frequency range is input as external audio added to external video data, the brightness of the light source illuminating the avatar 3D model is changed, thereby changing the color of the avatar image. This allows the input of audio of a specific frequency range to be reflected in the color of the avatar image.

[0169] Furthermore, in this embodiment, the brightness of the light source illuminating the avatar 3D model is changed to a brightness set according to the type of audio input, that is, according to whether a preset high-frequency sound or a preset low-frequency sound is input as the external input audio added to the external video data. The color of the avatar image is thereby changed to a color corresponding to the type of audio input.
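
One possible reading of paragraphs [0167] to [0169] is sketched below: the dominant frequency of a short audio block is estimated, the block is classified as a preset low-frequency sound, a preset high-frequency sound, or neither, and a brightness is looked up per type. The naive discrete Fourier transform, the band limits, the brightness table, and all names are assumptions chosen for brevity and are not taken from the embodiment.

```python
import cmath
import math
from typing import Optional, Sequence


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Frequency in Hz of the strongest bin of a naive O(n^2) DFT (0.0 for silence)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def classify_sound(freq_hz: float, low_max: float = 200.0, high_min: float = 2000.0) -> Optional[str]:
    """Return "low", "high", or None when the sound is not one of the preset types."""
    if freq_hz <= 0:
        return None
    if freq_hz <= low_max:
        return "low"
    if freq_hz >= high_min:
        return "high"
    return None


# Assumed brightness per sound type; other sounds leave the default unchanged.
BRIGHTNESS_BY_TYPE = {"low": 0.3, "high": 1.0}


def brightness_for_sound(samples: Sequence[float], sample_rate: int, default: float = 0.6) -> float:
    """Look up the brightness set for the detected sound type."""
    kind = classify_sound(dominant_frequency(samples, sample_rate))
    return BRIGHTNESS_BY_TYPE.get(kind, default)
```

A production system would more likely use a fast Fourier transform or band-pass filters; the naive transform is used here only to keep the sketch free of external dependencies.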

[0170] In this embodiment, the system identifies a specific frequency range of sound and changes the color of the avatar image according to its type. However, it is also possible to identify the type of sound, such as the genre of background music, and change the color of the avatar image according to the identified type of sound. In such a configuration, the color of the avatar image can also be changed to a color corresponding to the input sound type.

[0171] Furthermore, in this embodiment, each time a video frame is generated, the brightness of the light source illuminating the avatar 3D model is changed based on the external input audio attached to the external video data, thereby changing the color of the avatar image. This makes it possible to generate a video in which the avatar image reflects the audio attached to the external video data frame by frame.

[0172] Alternatively, the system may be configured to change the brightness of the light source illuminating the avatar 3D model based on the external input audio attached to the external video data once every several frames, rather than every frame, thereby changing the color of the avatar image.

[0173] Furthermore, in this embodiment, the changes to the illumination light based on the color of the external input image, on the amount of motion of the external input image, and on the external input audio are all performed each time a video frame is generated. However, these three kinds of change may be performed at different frequencies. For example, the change based on the color of the external input image may be performed each time a video frame is generated, while the changes based on the amount of motion of the external input image and on the external input audio may be performed once every several frames.
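
The differing update frequencies described in paragraph [0173] could be organized as in the following sketch, in which each source of change (image color, motion amount, audio) has its own update interval in frames and its last value is held between updates. The class name `LightUpdateSchedule`, the source names, and the interval values are hypothetical.

```python
from typing import Callable, Dict


class LightUpdateSchedule:
    """Recompute each illumination contribution every N frames and hold it in between."""

    def __init__(self, intervals: Dict[str, int]) -> None:
        self.intervals = intervals          # update interval in frames, per source
        self.values: Dict[str, float] = {}  # last computed value, per source

    def step(self, frame_index: int, sources: Dict[str, Callable[[], float]]) -> Dict[str, float]:
        """Update the sources that are due at this frame and return all current values."""
        for name, compute in sources.items():
            interval = self.intervals.get(name, 1)
            if name not in self.values or frame_index % interval == 0:
                self.values[name] = compute()
        return dict(self.values)


# Example matching [0173]: color every frame, motion and audio every third frame.
schedule = LightUpdateSchedule({"color": 1, "motion": 3, "audio": 3})
```

Over six frames with these intervals, the color contribution is recomputed six times and the motion and audio contributions twice each (at frames 0 and 3).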

[0174] Although Embodiment 2 of the present invention has been described above with reference to the drawings, the present invention is not limited to Embodiment 2, and it goes without saying that any modifications or additions that do not depart from the spirit of the present invention are included in the present invention. Furthermore, any aspects of the configuration of Embodiment 1 and its modifications that are applicable thereto may also be applied to Embodiment 2.

[0175] For example, in the above embodiment 2, an example was described in which the facial expression parameters output from the face tracking program are corrected by a parameter correction program, and the avatar 3D model generation program generates an avatar 3D model using the corrected facial expression parameters. However, it is also possible to configure the system so that the avatar 3D model generation program directly uses the parameters output from the face tracking program to generate an avatar 3D model without using a parameter correction program.

[0176] Furthermore, in Embodiment 2 above, an external input image captured from external video data by the capture device 112 is used as a background image, and a composite image in which an avatar image based on an avatar 3D model generated by the avatar 3D model generation program is placed in front of the external input image is generated and output. In this configuration, the illumination light illuminating the avatar 3D model is changed based on the color of the external input image, the amount of motion of the external input image, and the external input audio added to the external video data, thereby correcting the color of the avatar image based on the avatar 3D model. However, it is sufficient for the illumination light illuminating the avatar 3D model to be changed, and the color of the avatar image thereby corrected, based on at least one of these three elements.
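
The "at least one of" condition in paragraph [0176] can be illustrated with the following sketch, in which each of the three elements contributes an optional brightness adjustment and any element that is not used is simply omitted. Combining the adjustments by addition, the function name `combined_brightness`, and the base value are assumptions; the embodiment does not state how the elements are combined.

```python
from typing import Optional


def combined_brightness(
    base: float = 0.6,
    color_term: Optional[float] = None,   # adjustment from the image color, if used
    motion_term: Optional[float] = None,  # adjustment from the motion amount, if used
    audio_term: Optional[float] = None,   # adjustment from the input audio, if used
) -> float:
    """Add the enabled adjustments to the base brightness and clamp to 0..1."""
    terms = [t for t in (color_term, motion_term, audio_term) if t is not None]
    if not terms:
        raise ValueError("at least one of color, motion, or audio must be used")
    return max(0.0, min(1.0, base + sum(terms)))
```

Calling `combined_brightness(audio_term=0.2)` corresponds to a configuration that uses the audio alone, while passing all three terms corresponds to the configuration of Embodiment 2.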

[0177] Furthermore, in the above embodiment 2, the avatar 3D model generation program generates an avatar 3D model, and a separate 3D model color correction program generates an avatar image by irradiating the avatar 3D model generated by the avatar 3D model generation program with virtual illumination light based on an external input image, thereby changing the color of the avatar image based on the external input image. However, the avatar 3D model generation program may also generate an avatar 3D model and generate an avatar image by irradiating it with virtual illumination light based on an external input image, thereby changing the color of the avatar image.

Explanation of Symbols

[0178]
1 Computer terminal
101 Processor
102 Memory
103 Storage
104 Communication interface
105 Display device
106 Speaker
107 Input device
108 Camera
109 Microphone
110 Vital signs measuring instrument
111 Data bus
112 Capture device
113 Voice input device

Claims

1. An image generation method comprising: a motion detection step of detecting a motion of a performer; an avatar image generation step of generating an avatar image corresponding to the detected motion; a specific image input step of inputting a specific image; a composite image generation step of generating a composite image in which the avatar image is placed in front of the input specific image; and an image output step of outputting the composite image, the image generation method being characterized by further comprising: a voice input step of inputting audio attached to the specific image; and a color change step of changing the color of the avatar image based on the voice input in the voice input step.

2. In the color change step, the color of the avatar image is changed if the volume of the voice input in the voice input step is above a predetermined threshold. The image generation method according to claim 1.

3. In the color change step, the color of the avatar image is changed when a specific voice is input in the voice input step. The image generation method according to claim 1.

4. The specific voice is a voice within a specific frequency range. The image generation method according to claim 3.

5. The specific voice includes a predetermined number of different types of voice. In the color change step, the color of the avatar image is changed to a color set according to the type of voice input in the voice input step. The image generation method according to claim 3.

6. In the aforementioned color change step, the color of the avatar image is changed by multiplying it with a specific color or a specific image. The image generation method according to claim 1.

7. In the aforementioned color change step, the color of the avatar image is changed by changing the brightness. The image generation method according to claim 1.

8. The avatar image generation step includes: a 3D model generation step of generating a 3D model of an avatar corresponding to the detected motion; and a 3D model placement step of placing the generated 3D model in a 3D space, and the avatar image is generated from the 3D model placed in the 3D space. The color change step includes an irradiation step of irradiating the 3D model placed in the 3D space with virtual illumination light based on the voice input in the voice input step, and the color of the avatar image is changed by generating the avatar image from the 3D model irradiated with the virtual illumination light. The image generation method according to claim 1.

9. In the color change step, the color of the avatar image is changed based on the specific image. The image generation method according to claim 1.

10. In the color change step, the color of the avatar image is changed based on the color of the specific image. The image generation method according to claim 9.

11. The specific image input in the specific image input step is a video. The method further includes a motion amount detection step of detecting the amount of motion of the specific image. In the color change step, the color of the avatar image is changed based on the amount of motion of the specific image detected in the motion amount detection step. The image generation method according to claim 9.

12. The composite image is an image that constitutes a video. In the color change step, the color of the avatar image is changed for each frame of the video based on the voice input in the voice input step. In the composite image generation step, the composite image is generated for each frame of the video using the specific image and the avatar image whose color has been changed. The image generation method according to any one of claims 1 to 11.

13. An image generation program that causes a computer to execute: a motion detection step of detecting a motion of a performer; an avatar image generation step of generating an avatar image corresponding to the detected motion; a specific image input step of inputting a specific image; a composite image generation step of generating a composite image in which the avatar image is placed in front of the input specific image; and an image output step of outputting the composite image, the image generation program being characterized by further causing the computer to execute: a voice input step of inputting audio attached to the specific image; and a color change step of changing the color of the avatar image based on the voice input in the voice input step.