A single-image-based object illumination editing method, system and medium
By training specular decomposition, normal, and lighting networks, an inverse rendering model was constructed, solving the problem of converting a single image into a 3D model, enabling the editing of lighting and materials, and improving the user experience of augmented reality technology.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2022-09-30
- Publication Date
- 2026-06-12
Figure CN115719399B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of augmented reality technology, specifically to a method, system, and medium for object lighting editing based on a single image, used to achieve inverse rendering of an image.
Background Technology
[0002] With the continuous development of modern augmented reality (AR) technology, mobile AR has been widely applied, for example to insert new objects into real images and videos. At present, however, the objects to be inserted must be virtual objects whose 3D models are built by professionals. This step is very unfriendly to amateur users, who lack the background needed to build the 3D models they want. Most current mobile AR applications, such as Snapchat and IKEA Place, only support inserting pre-built virtual objects, which significantly limits the user experience. A more appealing setup would let users automatically extract objects from photos and insert them into a target scene. The bottleneck is a technology that automatically converts a single image into a 3D model. Image-based relighting (changing the lighting of an object to match that of the target scene) remains a key challenge in graphics and vision. Relighting requires recovering the current lighting, geometry, and material information of the real object; together, these make up the inverse rendering problem in computer graphics.
[0003] Several existing technologies address this problem from other perspectives:
1) Relighting methods for scene images: Paper [1] (Y. Yu, A. Meka, M. Elgharib, H.-P. Seidel, C. Theobalt, and W. A. Smith, “Self-supervised outdoor scene relighting,” in European Conference on Computer Vision. Springer, 2020, pp. 84–101.) proposes a scene relighting method based on deep learning. For images of outdoor buildings, assuming the material is matte, it estimates the normal (geometric information), material color, and shadow of the input image, and then renders the scene image under new lighting. This method requires only one input image and no other information. Its disadvantage is that it is effective only for building images and outdoor scenes, and performs poorly on object images.
2) Deep learning inverse rendering technology based on synthetic data: This task is an ill-posed problem. Deep learning tools are good at solving such problems, but they require a large amount of labeled training data. Such real image data is difficult to obtain because the material and lighting information of objects is difficult to capture, so using synthetic data for training is a common approach. Method [2] (M. Janner, J. Wu, T. D. Kulkarni, I. Yildirim, and J. Tenenbaum, “Self-supervised intrinsic image decomposition,” in NIPS, 2017, pp. 5936–5946.) is based on large-scale synthetic data. For images of a single object, it can effectively recover geometry and other three-dimensional information. However, the feature-space distributions of synthetic data and real data do not correspond, so real test data does not follow the distribution of the synthetic data in the feature space. As a result, the trained method performs very poorly on real data.
3) Lighting rendering technology for matte objects: After the inverse rendering step, differentiable rendering is the next step. Currently, PyTorch3D only supports rendering with point light sources, and the existing method [3] (R. Ramamoorthi and P. Hanrahan, “An efficient representation for irradiance environment maps,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques, 2001, pp. 497–500.) only supports rendering of matte materials. The matte assumption is largely valid for scene images, but for individual objects specular reflection is widespread. Matte rendering therefore cannot realistically simulate the lighting effects of objects.
4) Lighting editing techniques based on image histograms: Another type of method [4] (Shu Z, Hadap S, Shechtman E, et al. Portrait lighting transfer using a mass transport approach[J]. ACM Transactions on Graphics (TOG), 2017, 36(4):1.) does not consider three-dimensional information at all, nor does it re-render the object. It achieves a visual effect similar to lighting editing by simply transferring the color histogram between two images. This type of method has significant limitations and may also change the material color.
Summary of the Invention
[0004] The technical problem to be solved by the present invention is to provide a method, system and medium for object lighting editing based on a single image, which can realize the automatic conversion from a single image to a three-dimensional model and can assign lighting and material information as needed, and can be widely used in augmented reality technology.
[0005] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows:
[0006] A method for object lighting editing based on a single image, comprising:
[0007] S101: Remove strong highlights from a single image of the target object using a trained specular decomposition network.
[0008] S102: Input the image after removing strong highlights into the trained normal network to estimate the normal map of the target object in the image, and input the image after removing strong highlights into the trained lighting network to estimate the lighting map of the target object in the image.
[0009] S103: Perform matte rendering based on the normal map and the lighting map to obtain the shading map of the target object (the per-pixel matte shading rendered from the normals and the lighting), divide the original single image by the shading map to obtain the material map, and thereby obtain the inverse rendering model composed of the normal map, the lighting map, the shading map, and the material map.
[0010] S104: Assign at least one of new lighting and new materials to the inverse rendering model of the target object, and then perform specular rendering on the inverse rendering model of the target object to obtain an image of the target object under the new lighting and materials.
[0011] Optionally, before the strong highlights of the input single image are removed by the trained specular decomposition network in step S101, the saturated-pixel ratio of the input single image is also detected. If the saturated-pixel ratio is greater than a set threshold, the strong highlights of the input single image are removed by the trained specular decomposition network; otherwise, the input single image is used directly as the image after removing strong highlights, and the process proceeds to step S102.
[0012] Optionally, the normal network in step S102 includes an encoder and a decoder connected in sequence. The encoder is used to encode the image after removing strong highlights to obtain a normal encoding vector, and the decoder is used to decode the encoding vector into a normal map of the object in the image.
[0013] Optionally, the illumination network in step S102 includes an encoder, a connection layer, a multilayer perceptron, and a spherical harmonic coefficient layer connected in sequence. The encoder is used to encode the image after removing strong highlights to extract the illumination encoding vector. The connection layer is used to connect the illumination encoding vector and the image after removing strong highlights as the input of the multilayer perceptron to obtain illumination coefficient information. The spherical harmonic coefficient layer is used to estimate multiple spherical harmonic coefficients based on the second-order spherical harmonic basis function to serve as the illumination map of the object in the image.
[0014] Optionally, before step S101, the process may include training a specular decomposition network, a normal network, and an illumination network:
[0015] S201, Construct a video dataset with object alignment between frames but different ambient lighting;
[0016] S202, constructing a low-rank error as the loss function, and performing unsupervised training on the specular decomposition network, normal network, and illumination network. The unsupervised training of the specular decomposition network, normal network, and illumination network is divided into two rounds. In the first round of training, the normal network is fixed to train the specular decomposition network and illumination network until the low-rank error converges. In the second round, the illumination network is fixed to train the specular decomposition network and normal network until the low-rank error converges.
[0017] Optionally, constructing the low-rank error as the loss function in step S202 includes:
[0018] S301: Estimate the normal map and lighting map from the original image of the sample object, perform matte rendering to obtain the shading map of the sample object, divide the original image of the sample object by this shading map to obtain the material map, and construct matrix R by using each of the material maps from the same batch as one row of R.
[0019] S302: Perform singular value decomposition on matrix R and extract its low-rank approximation matrix.
[0020] S303: Use the squared Frobenius norm of the difference between matrix R and its low-rank approximation matrix as the loss function.
[0021] Optionally, in step S201, the video dataset with aligned target objects but different ambient lighting in each frame is constructed as follows: the camera and the target object are fixed relative to each other on the same turntable, and the camera captures images of the target object while the turntable rotates. The position of the target object in the captured images therefore stays unchanged, which aligns the target object across frames, while the ambient lighting varies with the angle of the turntable.
[0022] Optionally, the function expression for matte rendering based on the normal map and lightmap in step S103 is:
[0023]
[0024] In the above formula, $I_d(p)$ is the matte reflection color at any pixel p, $a_p$ is the matte material color at point p, $l_w$ is the luminous intensity of point light source w in the lighting map, $L_w$ is the direction of point light source w in the lighting map, $n_p$ is the normal vector at point p in the normal map, and $L$ is the set of point light sources in the lighting map; $C_{l,m}$ are the spherical harmonic coefficients, $Y_{l,m}(\theta,\phi)$ are the spherical harmonic basis functions, $n_p = (x,y,z)$, and $(\theta,\phi)$ are the spherical coordinates corresponding to $(x,y,z)$;
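The formula image for this paragraph did not survive in the text. Reading the symbol definitions back, one plausible form is a Lambertian sum over the point lights that is then approximated by a second-order spherical harmonic expansion. This is offered only as an assumption, not as the patent's exact expression; in particular, any clamping of negative cosine terms and any normalization constants are unknown.

```latex
I_d(p) \;=\; a_p \sum_{w \in L} l_w \,\bigl(L_w \cdot n_p\bigr)
\;\approx\; a_p \sum_{l=0}^{2} \sum_{m=-l}^{l} C_{l,m}\, Y_{l,m}(\theta,\phi)
```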
[0025] The function expression for specular rendering of the inverse rendering model of the target object in step S104 is:
[0026]
[0027] In the above formula, $H(p)$ is the specular reflection color at pixel p, $s_p$ is the specular material color at point p, v is the viewing direction, α is the specular material parameter, and the remaining term is the spherical harmonic basis function of the specular (highlight) basis.
[0028] Furthermore, the present invention also provides an object lighting editing system based on a single image, including a microprocessor and a memory interconnected thereto, wherein the microprocessor is programmed or configured to execute the object lighting editing method based on a single image.
[0029] Furthermore, the present invention also provides a computer-readable storage medium storing a computer program that is programmed or configured by a microprocessor to perform the object lighting editing method based on a single image.
[0030] Compared with existing technologies, the present invention has the following advantages: The method of the present invention includes removing strong highlights from a single image of the target object through a trained specular decomposition network; inputting the image after removing strong highlights into a trained normal network to estimate the normal map of the target object in the image; inputting the image after removing strong highlights into a trained lighting network to estimate the lighting map of the target object in the image; inputting the normal map and lighting map into a trained differentiable rendering layer to obtain the inverse rendering model of the target object; assigning new lighting and materials to the inverse rendering model of the target object; and then re-rendering the lighting and material information of the inverse rendering model of the target object to obtain an image of the target object under the new lighting and materials. The present invention can directly realize the automatic conversion from a single image to a three-dimensional model, can assign lighting and material information as needed, and can be widely used in augmented reality technology.
Attached Figure Description
[0031] Figure 1 is a schematic diagram of the basic process of the method in an embodiment of the present invention.
[0032] Figure 2 is a schematic diagram illustrating the basic principle of the method in an embodiment of the present invention.
[0033] Figure 3 is a schematic diagram of the training process of the method in an embodiment of the present invention.
[0034] Figure 4 is a schematic diagram of the spherical harmonic basis functions of the specular (highlight) basis in an embodiment of the present invention.
[0035] Figure 5 shows test results of the method of the present invention in multiple scenarios.
[0036] Figure 6 is a schematic diagram comparing the effects of the method in this embodiment of the invention with direct insertion.
Detailed Implementation
[0037] The problem to be solved by the single-image-based object lighting editing method in this embodiment is the lighting editing technology of single images. It can enable users to take any object image and automatically change its lighting effects to the target scene, so as to achieve a realistic augmented reality effect of inserting the object into the new scene.
[0038] As shown in Figure 1 and Figure 2, this embodiment provides a method for editing object lighting based on a single image, including:
[0039] S101, remove strong highlights from a single image of the target object using a trained specular decomposition network;
[0040] S102, Input the image after removing strong highlights into the trained normal network to estimate the normal map of the target object in the image, and input the image after removing strong highlights into the trained lighting network to estimate the lighting map of the target object in the image.
[0041] S103: Perform matte rendering based on the normal map and the lighting map to obtain the shading map of the target object (the per-pixel matte shading rendered from the normals and the lighting). Divide the original single image by the shading map to obtain the material map (that is, divide the value of each pixel in the original image by the value of the corresponding pixel in the shading map to obtain the material color of that pixel). The result is an inverse rendering model composed of the normal map, the lighting map, the shading map, and the material map.
[0042] S104: Assign at least one of new lighting and materials to the inverse rendering model of the target object, and then perform specular rendering on the inverse rendering model of the target object to obtain an image of the target object under the new lighting and materials.
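The per-pixel division in step S103 and the recomposition under new lighting can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function names, the name `shading` for the map rendered from the normals and lighting, and the small `eps` guard against division by zero are all assumptions introduced here.

```python
import numpy as np


def material_map(image, shading, eps=1e-6):
    """Divide the original image by the rendered shading, pixel by pixel.

    `eps` (an assumption, not from the patent) keeps the division finite
    where the shading is zero or near zero.
    """
    image = np.asarray(image, dtype=np.float64)
    shading = np.asarray(shading, dtype=np.float64)
    return image / np.maximum(shading, eps)


def relight(material, new_shading):
    """Recompose a matte image from the material map and a new shading map."""
    return np.asarray(material, dtype=np.float64) * np.asarray(new_shading, dtype=np.float64)
```

Because the material map is stored separately from the shading, either factor can be swapped out on its own, which is what makes the later lighting and material edits possible.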
[0043] Referring to Figure 2, in this embodiment the components that perform matte rendering in step S103 and specular rendering in step S104 are together referred to as the differentiable rendering layer. The process above assumes that the object is made of matte material, but many real images contain strong highlights, in which case the Lambertian assumption of diffuse inverse rendering does not hold in general. To solve this problem, a specular decomposition network is added to remove strong highlights before the differentiable rendering layer performs diffuse inverse rendering. We observed that in highlight regions the pixel color value is usually saturated (equal to 255), and if all three channels are saturated the highlight often appears white. Considering the diversity of input images, and so that the specular decomposition network runs only when needed, step S101 in this embodiment detects the saturated-pixel ratio of the input single image before passing it through the trained specular decomposition network. If the saturated-pixel ratio is greater than a set threshold (10% in this embodiment), the input single image is passed through the trained specular decomposition network to remove strong highlights; otherwise, the input single image is used directly as the image after removing strong highlights, and the process proceeds to step S102. We found that this setting gives better results than performing specular decomposition on all images, because learning-based specular removal methods tend to over-extract highlights from diffuse images.
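The on-demand gating of the specular decomposition network can be sketched as below. The text does not say exactly how a pixel is counted as saturated, so this sketch assumes an 8-bit RGB image and counts a pixel when any channel reaches 255; `spec_net` is a hypothetical callable standing in for the trained network, and both function names are illustrative.

```python
import numpy as np


def saturated_pixel_ratio(image, level=255):
    """Fraction of pixels with at least one channel at or above `level`.

    Assumption: 'any channel saturated' is used as the criterion; the
    patent text does not pin this down.
    """
    image = np.asarray(image)
    saturated = (image >= level).any(axis=-1)
    return float(saturated.mean())


def remove_strong_highlights(image, spec_net, threshold=0.10):
    """Run the (hypothetical) specular network only above the threshold."""
    if saturated_pixel_ratio(image) > threshold:
        return spec_net(image)
    # Below the threshold the input is passed on to step S102 unchanged.
    return image
```

With the 10% threshold from this embodiment, an image in which 20% of the pixels are saturated would go through `spec_net`, while one with 5% would be passed on unchanged.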
[0044] In this embodiment, the specular decomposition network is specifically Spec-Net. Paper [1] (Y. Yu, A. Meka, M. Elgharib, H.-P. Seidel, C. Theobalt, and W. A. Smith, “Self-supervised outdoor scene relighting,” in European Conference on Computer Vision. Springer, 2020, pp. 84–101.) defines a low-rank error for unsupervised training by considering the consistency of rg chromaticity in the matte reflection portion after specular separation.
[0045] Referring to Figure 3, in this embodiment the normal network in step S102 includes an encoder and a decoder connected in sequence. The encoder encodes the image after removing strong highlights into a normal encoding vector, and the decoder decodes this encoding vector into a normal map of the object in the image.
[0046] Referring to Figure 3, in this embodiment the illumination network in step S102 includes an encoder, a connection layer, a multilayer perceptron, and a spherical harmonic coefficient layer connected in sequence. The encoder encodes the image after removing strong highlights to extract the illumination encoding vector. The connection layer joins the illumination encoding vector with the image after removing strong highlights as the input of the multilayer perceptron, which produces illumination coefficient information. The spherical harmonic coefficient layer estimates multiple spherical harmonic coefficients based on the second-order spherical harmonic basis functions, and these coefficients serve as the illumination map of the object in the image.
[0047] To address the data bottleneck caused by the unsatisfactory results of synthetic data and the lack of labeled real-world data, this embodiment proposes an unsupervised training method. For unlabeled training data, this embodiment proposes a lightweight network structure (a specular decomposition network, a normal network, and an illumination network). Referring to Figure 3, in this embodiment the following steps for training the specular decomposition network, the normal network, and the illumination network are performed before step S101:
[0048] S201, Construct a video dataset with object alignment between frames but different ambient lighting;
[0049] S202, constructing a low-rank error as the loss function, and performing unsupervised training on the specular decomposition network, normal network, and illumination network. The unsupervised training of the specular decomposition network, normal network, and illumination network is divided into two rounds. In the first round of training, the normal network is fixed to train the specular decomposition network and illumination network until the low-rank error converges. In the second round, the illumination network is fixed to train the specular decomposition network and normal network until the low-rank error converges.
[0050] In this embodiment, the entire training process is self-supervised. Because the coordinate axes of the normals can be chosen in several ways, the normal network is first pre-trained on a small batch of synthetic data to initialize its coordinate axes. After pre-training, the normal network and the lighting network are learned alternately. Training both with the same low-rank error leads to a chicken-and-egg problem between the two networks, which makes simultaneous training difficult, so the normal network is first fixed while the lighting network is trained. Once the error converges, the lighting network is fixed and the normal network is trained, alternating in this way until the error converges. This process continuously reduces the error, which improves the final results. After the networks are trained, specular reflections can be rendered rapidly at test time, supporting simultaneous editing of lighting and materials. The method in this embodiment is ultimately implemented as an Android application, supporting lightweight augmented reality effects on mobile devices.
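The alternating schedule described above can be sketched in framework-agnostic form. Everything named here is hypothetical: `step_lighting` and `step_normal` stand for callables that perform one optimisation step on one network while the other stays frozen and return the current low-rank error, and the convergence test (change in loss below `tol`) is an assumption, since the text only says "until the error converges".

```python
def run_until_converged(step, tol=1e-8, max_steps=10_000):
    """Call step() repeatedly until the returned loss stops changing."""
    previous = float("inf")
    loss = previous
    for _ in range(max_steps):
        loss = step()
        if abs(previous - loss) < tol:
            break
        previous = loss
    return loss


def alternate_training(step_lighting, step_normal, rounds=2, tol=1e-8):
    """Alternate the two phases and record the loss after each phase.

    Phase A: normal network frozen, lighting network trained.
    Phase B: lighting network frozen, normal network trained.
    """
    history = []
    for _ in range(rounds):
        history.append(run_until_converged(step_lighting, tol=tol))
        history.append(run_until_converged(step_normal, tol=tol))
    return history
```

In a real training loop each step function would update only its own network's parameters, for example by building the optimizer over that network alone.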
[0051] This embodiment proposes a novel low-rank error definition, which exhibits good convergence and can effectively complete unsupervised learning of the lightweight network structure mentioned above (the specular decomposition network, normal network, and illumination network). Specifically, in step S202 of this embodiment, constructing the low-rank error as the loss function includes:
[0052] S301: Estimate the normal map and lighting map from the original image of the sample object, perform matte rendering to obtain the shading map of the sample object, divide the original image of the sample object by this shading map to obtain the material map (that is, divide the value of each pixel in the original image by the value of the corresponding pixel in the shading map to obtain the material color of that pixel), and construct matrix R by using each of the material maps in the same batch as one row of R.
[0053] S302, perform singular value decomposition on matrix R and extract low-rank approximate matrices.
[0054] The singular value decomposition of matrix R can be expressed as:
[0055] $R = U \Sigma V^{T}$,
[0056] In the above formula, U is the matrix of left singular vectors, Σ is the diagonal matrix of singular values, and V is the matrix of right singular vectors, with:
[0057] $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{16})$,
[0058] In the above formula, $\sigma_1$ to $\sigma_{16}$ are the 16 singular values obtained by performing singular value decomposition on matrix R.
[0059] In this embodiment, let $\Sigma' = \mathrm{diag}(\sigma_1, 0, \ldots, 0)$ to obtain the low-rank approximation matrix $\tilde{R}$, whose function expression is:
[0060] $\tilde{R} = U \Sigma' V^{T}$
[0061] S303: The squared Frobenius norm of the difference between matrix R and its low-rank approximation matrix $\tilde{R}$ is used as the loss function, which can be expressed as:
[0062] $loss_{LR} = \lVert R - \tilde{R} \rVert_F^{2}$
[0063] In the above formula, $loss_{LR}$ is the loss function. This definition of the loss function helps keep the gradient descent speed stable during training and gives good convergence.
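The low-rank error of steps S301 to S303 can be sketched in NumPy as follows. This only evaluates the loss value; the embodiment would need a differentiable implementation (it mentions PyTorch for the rendering part) to train with it. The function name and the flattening of each material map into one row are assumptions consistent with "one material map per row".

```python
import numpy as np


def low_rank_loss(material_maps):
    """Squared Frobenius distance between R and its rank-1 approximation.

    material_maps: array of shape (batch, ...) holding one material map
    per batch element (16 per batch in this embodiment).
    Returns (loss, R_tilde).
    """
    maps = np.asarray(material_maps, dtype=np.float64)
    R = maps.reshape(maps.shape[0], -1)            # one flattened map per row
    U, sigma, Vt = np.linalg.svd(R, full_matrices=False)
    sigma_prime = np.zeros_like(sigma)
    sigma_prime[0] = sigma[0]                      # keep only the largest singular value
    R_tilde = (U * sigma_prime) @ Vt               # U diag(sigma') V^T
    loss = float(np.sum((R - R_tilde) ** 2))
    return loss, R_tilde
```

Two properties are worth keeping in mind: the loss equals the sum of the squared discarded singular values, and rows that differ only by a global scale factor still give zero loss, since such a matrix already has rank 1.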
[0064] To drive network training, this embodiment employs a video dataset (referred to as the Relit dataset) where target objects are aligned across frames but ambient lighting varies. To improve the acquisition efficiency of this video dataset, as an optional implementation, step S201 involves fixing the camera and target object relatively on the same turntable, rotating the turntable, and capturing images of the target object through the camera. This ensures the target object's position remains constant, achieving target object alignment across frames while varying ambient lighting with the turntable's angle. This results in a video dataset where target objects are aligned across frames but ambient lighting varies, enabling rapid, large-scale acquisition of pixel-aligned video data, solving the mapping domain migration problem of synthetic data, and ensuring effectiveness during real-data training. Specifically, in this embodiment, the turntable is an electric turntable. Through a stepper motor or servo motor, the ambient lighting can be precisely controlled as the turntable's angle changes. Since the camera and object are placed on an electric turntable, the turntable rotates during shooting, causing the lighting effects on the object to constantly change. However, the camera and object remain relatively stationary, so the object's position in the video remains constant. Ultimately, the Relit dataset obtained in this embodiment contains 500 videos, including various indoor and outdoor lighting conditions and over 100 objects. Each video is 50 seconds long and can contribute 1500 object-aligned images. The Relit dataset contains a total of 750K images. All objects encompass various shapes, materials, and textures. The Relit dataset obtained in this embodiment can be used for many tasks, such as image relighting, segmentation, and inverse rendering.
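As a quick consistency check of the dataset figures above (the frame rate is not stated in the text; 30 images per second is only what the quoted numbers imply):

```latex
\frac{1500\ \text{images}}{50\ \text{s}} = 30\ \text{images per second},
\qquad
500\ \text{videos} \times 1500\ \text{images per video} = 750{,}000\ \text{images} = 750\text{K}
```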
[0065] After the dataset is obtained, training proceeds as shown in Figure 3. During training, 16 images are taken from the same video each time as a batch, so each batch contains images of the same object from the same viewpoint under different lighting conditions. For images with highlights, the specular decomposition network first removes the highlights (see the description of step S101). The normal network (Normal-Net) then estimates the normal information of the object, and the lighting network (Light-Net) estimates the lighting information corresponding to each image (represented with second-order spherical harmonic basis functions, so only nine spherical harmonic coefficients need to be estimated). With the normals and lighting available, a differentiable rendering layer renders the shading of the object image. According to the image formation equation, the material information is obtained by dividing the original image by this shading. Since the material colors of images in the same batch should be consistent, a constraint is added: the 16 material images are used as the 16 rows of a matrix R, and the rank of this matrix is required to be 1, that is, the 16 material images should be as similar as possible. This is the low-rank error. With this low-rank error, the normal network and lighting network can be trained without supervision. When the technique is used, the trained networks are applied directly. First, specular decomposition is performed to remove specular reflections. Then, the normal network and lighting network are used to estimate the normals, material color, and illumination coefficients of the object in the image. To insert the object into a new scene, the illumination coefficients of the new scene are pre-calculated. Given a new material smoothness (specular reflection parameter), the specular reflection rendering layer proposed in this invention is used for rapid rendering, ultimately resulting in a seamless augmented reality object insertion effect.
[0066] For scene images, it is commonly assumed that the materials are matte. Differentiable rendering currently supports only matte material rendering under ambient lighting, because specular highlights in a scene are usually few. This is not the case for objects: metal or plastic objects often exhibit strong specular highlights. The lack of a differentiable specular rendering technique (one implemented as a deep neural network rendering process in which all calculations are differentiable, so that gradients can propagate during network training) limits the speed and effectiveness of augmented reality applications. To address this issue, this embodiment proposes a specular rendering technique based on spherical harmonic lighting, which can quickly render specular highlights and supports lighting editing for non-matte materials. Specifically, the function expression for matte rendering based on the normal map and lighting map in step S103 of this embodiment is:
[0067]
[0068] In the above formula, $I_d(p)$ is the matte reflection color at any pixel p, $a_p$ is the matte material color at point p, $l_w$ is the luminous intensity of point light source w in the lighting map, $L_w$ is the direction of point light source w in the lighting map, $n_p$ is the normal vector at point p in the normal map, and $L$ is the set of point light sources in the lighting map; $C_{l,m}$ are the spherical harmonic coefficients, $Y_{l,m}(\theta,\phi)$ are the spherical harmonic basis functions, $n_p = (x,y,z)$, and $(\theta,\phi)$ are the spherical coordinates corresponding to $(x,y,z)$. This formulation can be found in method [2] (M. Janner, J. Wu, T. D. Kulkarni, I. Yildirim, and J. Tenenbaum, “Self-supervised intrinsic image decomposition,” in NIPS, 2017, pp. 5936–5946.);
[0069] The function expression for specular rendering of the inverse rendering model of the target object in step S104 is:
[0070]
[0071] In the above formula, $H(p)$ is the specular reflection color at pixel p, $s_p$ is the specular material color at point p, v is the viewing direction, α is the specular material parameter, and the remaining term is the spherical harmonic basis function of the specular (highlight) basis. New lighting changes the spherical harmonic coefficients $C_{l,m}$, and a new material changes the specular material color $s_p$ at point p and the specular material parameter α. With these changes applied, the above formula can be used for specular rendering, giving a specular-rendered image on top of the matte-rendered image.
[0072] In this embodiment, the spherical harmonic coefficients $C_{l,m}$ comprise nine values. To accelerate rendering, the images of the target object rendered under each of the nine specular-basis spherical harmonic basis functions can be pre-calculated; at render time, only a weighted sum of these nine rendered images, with the spherical harmonic coefficients $C_{l,m}$ as weights, is needed. The nine specular-basis spherical harmonic basis functions in this embodiment are defined as follows:
[0073]
[0074]
[0075]
[0076]
[0077]
[0078]
[0079]
[0080]
[0081]
[0082] In the above formulas, (x,y,z) is the normal vector at point p in the normal map, and c0 to c5 are constant coefficients, which in this embodiment are as follows:
[0083] c0=0.282095, c1=0.488603, c2=1.092548, c3=0.315392, c5=0.546274.
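The nine formula images for the specular basis did not survive in the text, so they are not reproduced here. As a point of reference only, the constants listed above coincide with those of the standard real spherical harmonic basis of orders 0 to 2 (the matte basis of method [3]). The sketch below evaluates that standard basis and then forms the weighted sum of nine pre-rendered basis images described in this embodiment. It is an assumption that the standard basis is a useful stand-in; the patent's specular basis may differ, and the function names are illustrative.

```python
import numpy as np

# Constants as listed in the text (c4 is not given there and is not used here).
C0, C1, C2, C3, C5 = 0.282095, 0.488603, 1.092548, 0.315392, 0.546274


def sh_basis(normals):
    """Standard real SH basis (orders 0-2) at unit normals.

    normals: array (..., 3) of unit vectors (x, y, z); returns (..., 9).
    Assumption: this is the usual matte basis, not the patent's specular one.
    """
    n = np.asarray(normals, dtype=np.float64)
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    return np.stack([
        C0 * np.ones_like(x),
        C1 * y, C1 * z, C1 * x,
        C2 * x * y, C2 * y * z,
        C3 * (3.0 * z * z - 1.0),
        C2 * x * z,
        C5 * (x * x - y * y),
    ], axis=-1)


def weighted_render(basis_images, coeffs):
    """Weighted sum of nine pre-rendered basis images.

    basis_images: (..., 9), one pre-rendered image per basis function.
    coeffs: the nine spherical harmonic coefficients of the lighting.
    """
    basis_images = np.asarray(basis_images, dtype=np.float64)
    return basis_images @ np.asarray(coeffs, dtype=np.float64)
```

Because the final image is linear in the coefficients, changing the lighting only changes the nine weights; the basis images can be computed once per object and material and reused.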
[0084] To achieve the material editing effect, this embodiment also proposes a method for re-rendering the specular reflection portion, improving the original spherical harmonic basis into the highlight basis shown in Figure 4. In Figure 4, (a) is a schematic diagram of the relationship between the direction $L_w$ of point light source w, the viewing direction v, and their angle bisector b; (b) is a schematic diagram of the yz plane, in which the angle between the direction $L_w$ of point light source w and the viewing direction v is $\theta_{L_w} = 2\theta_b$, where $\theta_b$ is the angle between the angle bisector b and the direction $L_w$ of point light source w; (c) is a schematic diagram of the xy plane, in which the direction $L_w$ of point light source w, the viewing direction v, and the angle bisector b coincide, and the angle between the direction $L_w$ of point light source w and the y-axis is equal to the angle between the angle bisector b and the y-axis; (d) is a schematic diagram of the spherical harmonic basis functions used for specular rendering. In Figure 4(d), the nine specular-basis spherical harmonic basis functions are arranged in a pyramid (1 at the top, 3 in the middle, and 5 at the bottom). Because the three specular-basis spherical harmonic basis functions on the left are structurally symmetrical with the three on the right, the three on the left are omitted from Figure 4(d).
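The geometric relation in Figure 4(a) and (b), namely that the angle between the light direction and the viewing direction is twice the angle between the bisector and the light direction, can be checked numerically. In this sketch the bisector is computed as the normalized sum of the two unit directions; the function names are illustrative, and the case where the two directions are exactly opposite (zero sum) is not handled.

```python
import numpy as np


def normalize(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def angle_bisector(light_dir, view_dir):
    """Unit vector bisecting the angle between light and view directions."""
    return normalize(normalize(light_dir) + normalize(view_dir))


def angle_between(a, b):
    """Angle in radians between two vectors."""
    cos = np.clip(np.dot(normalize(a), normalize(b)), -1.0, 1.0)
    return float(np.arccos(cos))
```

For a light direction 60 degrees away from the viewing direction in the yz plane, the bisector lies 30 degrees from each of them.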
[0085] In this embodiment, the specular rendering part is implemented in PyTorch and has been verified. Ultimately, this embodiment achieves the lighting and material editing effects for a single image shown in Figure 5. In Figure 5, (a), (b), and (c) are the original images; (a-1) to (a-3) are images of the target object of (a) under new lighting and materials, obtained using the method of this embodiment; (b-1) to (b-3) are images of the target object of (b) under new lighting and materials, obtained using the method of this embodiment; and (c-1) to (c-3) are images of the target object of (c) under new lighting and materials, obtained using the method of this embodiment. As can be seen from Figure 5, after the target object is inserted into the new scene using the method of this embodiment, the lighting and shadows appear harmonious and consistent.
[0086] Figure 6 compares the method of this embodiment with directly inserting the target object (without any lighting editing). In Figure 6, (a) shows the first group, consisting of the target object (upper half) and the scene (lower half); (a-1) is the image obtained by inserting the target object into the scene using the method of this embodiment; and (a-2) is the image obtained by inserting the target object directly. (b) shows the second group, consisting of the target object (upper half) and the scene (lower half); (b-1) is the image obtained using the method of this embodiment; and (b-2) is the image obtained by direct insertion. (c) shows the third group, consisting of the target object (upper half) and the scene (lower half); (c-1) is the image obtained using the method of this embodiment; and (c-2) is the image obtained by direct insertion. As can be seen from Figure 6, compared with directly inserting the target object (without any lighting editing), the augmented reality effect achieved by the method of this embodiment is more realistic. Fast rendering is performed through the specular reflection rendering layer, finally yielding a seamless augmented reality object insertion effect.
[0087] Furthermore, this embodiment also provides an object lighting editing system based on a single image, including a microprocessor and a memory interconnected thereto, wherein the microprocessor is programmed or configured to execute the aforementioned object lighting editing method based on a single image.
[0088] In addition, this embodiment also provides a computer-readable storage medium storing a computer program, wherein the computer program is programmed or configured to be executed by a microprocessor to perform the aforementioned object lighting editing method based on a single image.
[0089] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0090] The above description is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should also be considered within the scope of protection of the present invention.
Claims
1. A method for object lighting editing based on a single image, characterized in that it comprises: S101, removing strong highlights from a single image of the target object using a trained specular decomposition network; S102, inputting the image with strong highlights removed into a trained normal network to estimate the normal map of the target object in the image, and inputting the image with strong highlights removed into a trained lighting network to estimate the lighting map of the target object in the image; S103, performing matte rendering based on the normal map and the estimated lighting map to obtain the rendered lighting map of the target object, and dividing the original single image by the rendered lighting map to obtain the material map, thus obtaining an inverse rendering model composed of the normal map, the estimated lighting map, the rendered lighting map, and the material map; the function expression for matte rendering based on the normal map and the lighting map is: [formula not reproduced]; the terms of this formula denote, in order: the matte reflection color at any pixel p; the matte material color at p; the light intensity of point light source w in the lighting map; the direction of point light source w in the lighting map; the normal direction at point p in the normal map; L, the set of point light sources in the lighting map; the spherical harmonic coefficients, which are the parameters of the spherical harmonic basis functions; the spherical harmonic basis functions; and the corresponding spherical coordinates; S104, assigning at least one of new lighting and new materials to the inverse rendering model of the target object, and then performing specular rendering on the inverse rendering model of the target object to obtain an image of the target object under the new lighting and materials; the function expression for specular rendering on the inverse rendering model of the target object is: [formula not reproduced]; the terms of this formula denote, in order: the specular reflection color at pixel p; the specular material color at p; the viewing direction; the parameters of the specular material; and the spherical harmonic basis functions of the specular basis.
2. The object lighting editing method based on a single image according to claim 1, characterized in that, before the strong highlights of the input single image are removed by the trained specular decomposition network in step S101, the proportion of saturated pixels in the input single image is also detected; if the proportion is greater than a set threshold, the strong highlights of the input single image are removed by the trained specular decomposition network; otherwise, the input single image is used directly as the image with strong highlights removed, and the process proceeds to step S102.
3. The object lighting editing method based on a single image according to claim 1, characterized in that the normal network in step S102 includes an encoder and a decoder connected in sequence; the encoder is used to encode the image with strong highlights removed to obtain a normal encoding vector, and the decoder is used to decode the normal encoding vector into the normal map of the target object in the image.
4. The object lighting editing method based on a single image according to claim 1, characterized in that the lighting network in step S102 includes an encoder, a connection layer, a multilayer perceptron, and a spherical harmonic coefficient layer connected in sequence; the encoder is used to encode the image with strong highlights removed to extract a lighting encoding vector; the connection layer is used to connect the lighting encoding vector with the image with strong highlights removed and feed the result to the multilayer perceptron to obtain lighting coefficient information; and the spherical harmonic coefficient layer is used to estimate multiple spherical harmonic coefficients based on the second-order spherical harmonic basis functions, which serve as the lighting map of the target object in the image.
5. The object lighting editing method based on a single image according to any one of claims 1 to 4, characterized in that, before step S101, the method further comprises steps of training the specular decomposition network, the normal network, and the lighting network: S201, constructing a video dataset in which the object is aligned across frames but the ambient lighting differs; S202, constructing a low-rank error as the loss function and performing unsupervised training on the specular decomposition network, the normal network, and the lighting network, wherein the unsupervised training is divided into two rounds: in the first round, the normal network is fixed and the specular decomposition network and the lighting network are trained until the low-rank error converges; in the second round, the lighting network is fixed and the specular decomposition network and the normal network are trained until the low-rank error converges.
6. The object lighting editing method based on a single image according to claim 5, characterized in that constructing the low-rank error as the loss function in step S202 includes: S301, extracting the normal map and the lighting map from the original image of a sample object, performing matte rendering to obtain the rendered lighting map of the sample object, dividing the original image of the sample object by the rendered lighting map of the sample object to obtain the material map, and using each of the material maps from the same batch as one row of matrix R to construct matrix R; S302, performing singular value decomposition on matrix R and extracting its low-rank approximation matrix; S303, using the square of the Frobenius norm of the difference between matrix R and its low-rank approximation matrix as the loss function.
7. The object lighting editing method based on a single image according to claim 6, characterized in that constructing, in step S201, a video dataset in which the target object is aligned across frames but the ambient lighting differs means: fixing the camera and the target object relative to each other on the same turntable, and rotating the turntable while the camera captures images of the target object, so that the position of the target object in the captured images remains unchanged and the target object is aligned across frames, while the ambient lighting varies with the angle of the turntable, thus obtaining a video dataset in which the target object is aligned across frames but the ambient lighting differs from frame to frame.
8. A single-image-based object lighting editing system, comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the object lighting editing method based on a single image as described in any one of claims 1 to 7.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program is programmed or configured to be executed by a microprocessor to perform the object lighting editing method based on a single image as described in any one of claims 1 to 7.
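The low-rank loss of steps S301 to S303 in claim 6 can be sketched in NumPy as follows. This is an illustrative sketch, not the claimed implementation: the function names, the `eps` guard against division by zero, and the choice of a rank-1 approximation are assumptions, since the claims do not state which rank is kept.

```python
import numpy as np


def material_map(image, shading, eps=1e-6):
    """S301: divide the original image by the lighting map obtained by matte rendering."""
    return image / np.maximum(shading, eps)


def low_rank_loss(material_maps, rank=1):
    """S301-S303: squared Frobenius distance between R and its low-rank approximation."""
    # Each flattened material map of the batch becomes one row of R.
    R = np.stack([m.reshape(-1) for m in material_maps], axis=0)
    # S302: singular value decomposition, keeping the largest `rank` singular values.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_low = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # S303: squared Frobenius norm of the residual.
    return float(np.sum((R - R_low) ** 2))


# Frames of one aligned object under different lighting: if the decomposition
# is right, every frame yields the same material map and the loss is near zero.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 0.9, size=(4, 4))
shadings = [rng.uniform(0.5, 1.5, size=(4, 4)) for _ in range(3)]
images = [albedo * s for s in shadings]

consistent = [material_map(img, s) for img, s in zip(images, shadings)]
inconsistent = [material_map(img, np.ones((4, 4))) for img in images]

loss_good = low_rank_loss(consistent)
loss_bad = low_rank_loss(inconsistent)
```

With correct lighting estimates, all rows of R coincide and the rank-1 residual vanishes; with wrong estimates, lighting leaks into the material maps and the residual grows, which is the signal the unsupervised training relies on.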