An AR image enhancement processing method and system for new media content creation

Homography transformation and an illumination reflection model are used to process AR images on AR devices. This solves the missing background texture and inconsistent lighting caused by occlusion by the calibration object, achieves high-quality integration of virtual content with real scenes, and improves the visual effect and user experience of new media content creation.

CN122243761A · Pending · Publication Date: 2026-06-19 · ZHENTANLIN CULTURE TECHNOLOGY (CHENGDU) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
ZHENTANLIN CULTURE TECHNOLOGY (CHENGDU) CO LTD
Filing Date
2026-03-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing AR technology suffers from background texture loss due to occlusion by the marker object in new media content creation, and it is difficult to achieve precise geometric alignment and adaptive fusion of complex lighting under dynamic perspectives. This results in inconsistencies between virtual content and real-world lighting, affecting the integrity of the image and the audience's immersive experience.

Method used

By acquiring the original scene image collected by the AR device, the prior reference texture image is mapped to the current viewpoint using the homography transformation matrix. Then, a gradient domain constraint equation is constructed based on the illumination reflection model, and the illumination coefficient set is solved to perform global energy minimization interpolation processing to generate an enhanced scene image consistent with the ambient lighting.

Benefits of technology

It achieves seamless integration of light and shadow between new media virtual content and real scenes, enhances visual realism, eliminates the "pasted-on" look and incongruity caused by lighting mismatch, keeps textures geometrically aligned and consistently lit under dynamic viewing angles, and lowers the technical threshold and time cost.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses an AR image enhancement processing method and system for new media content creation, comprising: acquiring a first original scene image captured by an AR acquisition device for a real scene; acquiring a first reference texture image corresponding to the real scene before the placement of a calibration object from a preset prior knowledge base; generating a first reconstructed texture image with real texture information; performing global energy minimization interpolation processing on a first illumination coefficient set to obtain a second illumination coefficient set covering the target area; and using the second illumination coefficient set to perform nonlinear correction on the target area pixels in the first reconstructed texture image to generate a second enhanced scene image consistent with the ambient lighting. This invention achieves seamless integration of light and shadow between new media virtual content and real scenes, significantly improving visual realism and effectively eliminating the sense of incongruity caused by lighting mismatch in traditional AR technology. It greatly reduces the technical threshold and time cost of new media content creation.

Description

Technical Field

[0001] This invention relates to the field of augmented reality and digital media processing technology, and more specifically, to an AR image enhancement processing method and system for new media content creation.

Background Technology

[0002] With the booming development of the new media industry, augmented reality (AR) technology has become a core driving force in short video creation, live-streaming e-commerce, and interactive marketing. In the process of creating new media content, creators often need to overlay virtual marketing objects, special effects elements, or information tags onto real-world scenes to enhance visual impact and user interactivity. To achieve precise integration of virtual content with the real environment, specific markers (such as QR codes, AR codes, etc.) are typically placed in the real-world scene as anchor points for 3D registration, enabling AR devices to calculate camera poses in real time and lock virtual objects in designated positions.

[0003] However, the calibration object obscures the background texture of the real scene, leaving noticeable "patches" or black blocks in the final image and severely damaging its integrity and aesthetics. To address this issue, existing image inpainting techniques typically fill the occluded area using information from surrounding pixels. In static image processing, methods based on sample block matching or diffusion can achieve some results. In AR dynamic video streams, however, the camera moves continuously, the perspective changes constantly, and real-world lighting fluctuates in complex ways, so simple texture copying often produces perspective errors, texture jitter, or lighting that is inconsistent with the surroundings in the inpainted area. The virtual content then appears stiffly "pasted" onto the screen and lacks realism.

[0004] Furthermore, lighting in real-world scenes is often non-uniform and includes shadows, highlights, and diffusely reflected ambient light. Traditional image fusion methods often employ global linear adjustment or simple edge feathering, which cannot accurately simulate the spatial variation of light on object surfaces. Existing technologies such as Chinese invention patent CN111145341B propose a single-light-source method for rendering virtual-real fusion with consistent lighting: the lighting of virtual objects is adjusted by estimating the parameters of a single dominant light source, which improves the fusion effect to some extent. However, this method relies heavily on the single-light-source assumption and struggles with the non-uniform lighting commonly encountered in new media live streaming or short video shooting, such as intersecting light sources, local occlusion shadows, and complex diffuse ambient light. Reference textures stored in the prior knowledge base are usually collected at different times or under different lighting conditions. When they are mapped to the current dynamic viewpoint without adaptively matching the current scene's light intensity, color temperature, and shadow direction, the repaired area shows significant brightness discontinuities or color deviations relative to the surrounding background. This discontinuity in light and shadow is particularly noticeable in high-definition new media video streams and directly reduces the professionalism of the content and the viewer's sense of immersion.

[0005] While existing technologies offer some basic image restoration solutions, they still suffer from the following major drawbacks: First, the lack of geometric consistency constraints for dynamic viewpoints leads to misalignment and jitter in restored textures during camera movement. Second, overly simplified lighting modeling fails to handle complex lighting fields with spatial variations, resulting in a noticeable "pasting" effect and incongruity between the restored area and the real environment. Third, interpolation algorithms are mostly local processing methods, prone to producing blocky effects or abrupt boundary changes in lighting transition areas, making it difficult to achieve a smooth, natural blend globally. Therefore, there is an urgent need for an AR image enhancement processing method that can simultaneously solve the problems of dynamic geometric alignment and complex lighting adaptation to meet the pressing demand for highly realistic virtual-real fusion in new media content creation.

Summary of the Invention

[0006] The purpose of this invention is to overcome the shortcomings of the prior art and provide an AR image enhancement processing method and system for new media content creation. It addresses the loss of background texture caused by occlusion by the calibration object in existing AR technology, and the difficulty of achieving accurate geometric alignment and adaptive fusion under complex lighting for the repaired area under a dynamic viewpoint.

[0007] The objective of this invention is achieved through the following technical solution. An AR image enhancement processing method for new media content creation includes: acquiring a first original scene image captured by an AR acquisition device for a real scene, wherein the first original scene image contains a calibration object region for 3D registration and a new media content overlay region to be enhanced; obtaining, from a preset prior knowledge base, a first reference texture image corresponding to the real scene before the placement of the calibration object, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; calculating, based on the feature point coordinates of the calibration object region, the homography transformation matrix between the first original scene image and the first reference texture image, and mapping the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image with real texture information; extracting the surrounding neighborhood pixels of the target region in the first reconstructed texture image, constructing a gradient domain constraint equation based on a preset illumination reflection model, and solving it to obtain a first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes a spatially varying gain field and bias field; performing global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region, wherein the global energy minimization interpolation refers to globally smoothing the sparse coefficients using a radial basis function network; and performing nonlinear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set to generate a second enhanced scene image consistent with the ambient lighting.

[0008] As a preferred embodiment, before acquiring the first original scene image captured by the AR acquisition device for the real scene, the method further includes: controlling the AR acquisition device, in a fixed pose, to acquire an initial reference image without the calibration object and a calibration positioning image after the calibration object is placed; and storing the initial reference image as the first reference texture image, with the calibration positioning image used as the geometric constraint reference for subsequently solving the homography transformation matrix.

[0009] As a preferred embodiment, the step of mapping the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image with real texture information includes: identifying the coordinates of the four corner points of the calibration object in the first original scene image and defining them as the first feature point set; identifying the coordinates of the four corresponding corner points of the calibration object in the calibration positioning image and defining them as the second feature point set; calculating, based on the first feature point set and the second feature point set, the homography transformation matrix describing the planar projection relationship; and applying the homography transformation matrix to the pixel blocks in the first reference texture image corresponding to the area occluded by the calibration object, to generate a first reconstructed texture image filled in with the real texture.

[0010] As a preferred method, constructing the gradient domain constraint equation based on a preset illumination reflection model and solving it to obtain a first illumination coefficient set describing changes in ambient light includes: selecting the annular neighborhood extending beyond the target region $\Omega$ in the first reconstructed texture image as the guiding region $\Omega_g$; constructing a guiding vector field, wherein the gradient field of the first reference texture image $I_{\mathrm{ref}}$ within the guiding region $\Omega_g$ is defined as the guiding vector field, denoted $\mathbf{v}$, that is, $\mathbf{v} = \nabla I_{\mathrm{ref}}$, which characterizes the edge orientation and intensity variation of the real texture structure to be recovered; and constructing, based on the guiding vector field, a Poisson fusion constraint equation whose core calculation formula is as follows:

$$\Delta f(x, y) = \operatorname{div}\,\mathbf{v}(x, y)$$

where $\Delta$ represents the Laplacian operator, used to describe the second-order differential properties of an image; $f(x, y)$ represents the pixel grayscale value of the fused image to be solved at coordinates $(x, y)$; and $\operatorname{div}$ represents the divergence operator, which acts on a vector field. When solving the Poisson equation above, Dirichlet boundary conditions are applied so that the value of $f$ on the edge $\partial\Omega$ of the target region equals the edge pixel value of the first original scene image, achieving adaptive fusion with the lighting environment. The above equation is discretized using the finite difference method to reconstruct the illumination distribution of the target region, and the spatially varying first gain field and first bias field are then obtained, which together constitute the first illumination coefficient set.
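The finite-difference discretization mentioned above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes a grayscale image stored as nested lists, the five-point Laplacian stencil, and plain Gauss-Seidel sweeps as the solver. The names `poisson_blend`, `reference`, `frame`, and `mask` are hypothetical, and the step that derives the gain and bias fields from the solved illumination is not specified in the source, so it is not shown.

```python
def laplacian(img, x, y):
    """Five-point discrete Laplacian at an interior pixel."""
    return (img[y][x - 1] + img[y][x + 1] + img[y - 1][x] + img[y + 1][x]
            - 4.0 * img[y][x])


def poisson_blend(source, target, mask, iterations=500):
    """Solve lap(f) = lap(source) inside mask, with f = target on the boundary.

    source: texture whose gradients guide the result (the reference texture).
    target: live frame that supplies the Dirichlet boundary values.
    mask:   True for pixels to solve; it must not touch the image border.
    """
    f = [row[:] for row in target]
    height, width = len(target), len(target[0])
    for _ in range(iterations):  # Gauss-Seidel sweeps (assumed solver)
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if mask[y][x]:
                    neighbours = (f[y][x - 1] + f[y][x + 1]
                                  + f[y - 1][x] + f[y + 1][x])
                    f[y][x] = (neighbours - laplacian(source, x, y)) / 4.0
    return f


# Hypothetical data: the reference ramp is 50 levels darker than the live frame.
size = 7
reference = [[2.0 * x + 3.0 * y for x in range(size)] for y in range(size)]
frame = [[reference[y][x] + 50.0 for x in range(size)] for y in range(size)]
mask = [[2 <= x <= 4 and 2 <= y <= 4 for x in range(size)] for y in range(size)]
for y in range(2, 5):
    for x in range(2, 5):
        frame[y][x] = 0.0  # pixels hidden by the calibration object
fused = poisson_blend(reference, frame, mask)
```

In this toy case the solved pixels keep the gradients of the reference ramp while picking up the brightness offset of the live frame from the boundary, which is the adaptive behaviour the Dirichlet condition is meant to provide.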

[0011] As a preferred method, performing global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region includes: discretely sampling the already solved first gain field and first bias field in the guiding region $\Omega_g$ to construct a sparse constraint point set $\{(\mathbf{x}_i, g_i)\}_{i=1}^{N}$;

[0012] constructing a global energy functional based on radial basis functions and minimizing it to obtain a continuous coefficient surface within the target region $\Omega$, with the energy minimization formula as follows:

$$E(G) = \sum_{i=1}^{N} w_i \left( G(\mathbf{x}_i) - g_i \right)^2 + \lambda \iint_{\Omega} \left| \nabla^2 G \right|^2 \, dA$$

where $E$ represents the total energy functional; $G$ represents the continuous gain coefficient function to be determined; $N$ represents the total number of constraint points; $w_i$ is the weight coefficient of the $i$-th constraint point; $g_i$ is the known gain value of the $i$-th constraint point; $\nabla^2 G$ represents the second derivative (curvature) of the coefficient function; $dA$ represents an area element; and $\lambda$ is the smoothing regularization parameter, used to balance fitting error and surface smoothness. The above variational problem is solved using a Gaussian radial basis function $\phi(r)$ as the kernel function, where $r$ represents the Euclidean distance from the target point to the constraint point and the shape parameter $\sigma$ controls the width of the kernel function. Solving yields the dense second gain field and second bias field covering the entire target area, which together constitute the second illumination coefficient set.
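A rough sketch of interpolating sparse gain samples with Gaussian radial basis functions is given below. It makes several assumptions that are not in the source: the kernel is parameterized as exp(-r^2 / (2 sigma^2)), all constraint weights are equal, and the curvature integral is replaced by a simpler ridge term (lam added to the diagonal of the kernel matrix), which is a common smoothing substitute and not the functional itself. The function names and sample data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gaussian(r, sigma):
    """Assumed Gaussian kernel form; sigma controls the kernel width."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


def fit_rbf(points, values, sigma, lam):
    """Solve (Phi + lam * I) alpha = values for the RBF weights alpha."""
    n = len(points)
    phi = [[gaussian(math.dist(points[i], points[j]), sigma)
            + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve_linear(phi, values)


def evaluate_rbf(points, alpha, sigma, query):
    """Evaluate the fitted coefficient surface at one query position."""
    return sum(a * gaussian(math.dist(p, query), sigma)
               for p, a in zip(points, alpha))


# Hypothetical data: eight gain samples on a ring of radius 3 around the target.
ring = [(3.0 * math.cos(k * math.pi / 4), 3.0 * math.sin(k * math.pi / 4))
        for k in range(8)]
gains = [1.0 + 0.05 * k for k in range(8)]
alpha = fit_rbf(ring, gains, sigma=2.0, lam=1e-3)
centre_gain = evaluate_rbf(ring, alpha, 2.0, (0.0, 0.0))
```

Evaluating `evaluate_rbf` at every pixel of the target region would give a dense gain field; the same fit can be repeated with the bias samples. A larger `lam` trades fidelity at the constraint points for a smoother surface.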

[0013] As a preferred method, performing non-linear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set to generate a second enhanced scene image consistent with the ambient lighting includes: traversing each pixel within the target area and reading its corresponding second gain field value and second bias field value; multiplying the original color value of the pixel in the first reconstructed texture image by the second gain field value and adding the second bias field value to obtain the corrected color value; and replacing all pixels in the original target area with the corrected color values to generate a second enhanced scene image that is free of lighting discontinuities and has continuous texture.
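The per-pixel correction described above amounts to a multiply-and-add with spatially varying coefficients. A minimal sketch, assuming a single-channel image stored as nested lists and 8-bit value limits; the clipping step and the names `apply_illumination`, `gain`, and `bias` are assumptions added for illustration.

```python
def apply_illumination(texture, gain, bias, mask, max_value=255.0):
    """Return a copy of texture with masked pixels set to gain * value + bias."""
    out = [row[:] for row in texture]
    for y, mask_row in enumerate(mask):
        for x, inside in enumerate(mask_row):
            if inside:
                corrected = texture[y][x] * gain[y][x] + bias[y][x]
                # Clip to the valid range (an assumption, not stated in the source).
                out[y][x] = min(max(corrected, 0.0), max_value)
    return out


# Hypothetical 2x2 example; the bottom-left pixel lies outside the target area.
texture = [[100.0, 200.0], [50.0, 250.0]]
gain = [[1.25, 0.5], [1.0, 1.5]]
bias = [[5.0, -10.0], [0.0, 0.0]]
mask = [[True, True], [False, True]]
enhanced = apply_illumination(texture, gain, bias, mask)
```

For a color image the same operation would be applied per channel, with either shared or per-channel gain and bias fields.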

[0014] A second aspect of the present invention provides an AR image enhancement processing system for new media content creation, comprising: a first image acquisition unit, used to acquire a first original scene image captured by the AR acquisition device for a real scene, wherein the first original scene image includes a calibration object region for 3D registration and a new media content overlay region to be enhanced; a prior data retrieval unit, used to obtain, from a preset prior knowledge base, a first reference texture image corresponding to the real scene before the calibration object is placed, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; a texture reconstruction unit, used to calculate the homography transformation matrix between the first original scene image and the first reference texture image based on the feature point coordinates of the calibration object region, and to map the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image with real texture information; a gradient domain solving unit, used to extract the surrounding neighborhood pixels of the target region in the first reconstructed texture image, construct the gradient domain constraint equation based on the preset illumination reflection model, and solve it to obtain the first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes a spatially varying gain field and bias field; a global interpolation unit, used to perform global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region, wherein the global energy minimization interpolation refers to globally smoothing the sparse coefficients using a radial basis function network; and an image enhancement output unit, used to perform non-linear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set, thereby generating a second enhanced scene image consistent with the ambient lighting.

[0015] As a preferred embodiment, the texture reconstruction unit includes: a feature point recognition subunit, used to identify the corner coordinates of the calibration object in the first original scene image to form a first feature point set, and to identify the corresponding corner coordinates in the calibration positioning image to form a second feature point set; a matrix solving subunit, used to solve for the homography transformation matrix describing the planar projection relationship based on the first feature point set and the second feature point set; and a texture mapping subunit, used to apply the homography transformation matrix to the pixel blocks in the first reference texture image corresponding to the area occluded by the calibration object, to generate a first reconstructed texture image filled in with the real texture.

[0016] As a preferred embodiment, the global interpolation unit includes: a constraint point set construction subunit, used to discretely sample the solved first gain field and first bias field in the guiding region to construct a sparse constraint point set; an energy functional construction subunit, used to construct a global energy functional based on radial basis functions (RBF) and to set a smoothing regularization parameter with area dimensions to balance fitting error and surface smoothness; and a variational solution subunit, used to solve the variational problem using the Gaussian radial basis function as the kernel function and obtain a dense second gain field and second bias field covering the entire target region, which together constitute the second illumination coefficient set.

[0017] As a preferred approach, the system further includes a new media interactive rendering unit, used to: receive new media interaction commands from external input, including gesture recognition signals or voice control signals; generate a virtual marketing object at a specified location in the second enhanced scene image according to the commands; and call the light source direction parameters from the second illumination coefficient set to perform real-time shadow projection on the virtual marketing object, so as to fuse virtual and real light and shadow.

[0018] The present invention has at least the following beneficial effects. First, it achieves seamless integration of new media virtual content with the light and shadow of real scenes, significantly enhancing visual realism. By constructing gradient domain constraint equations and solving for an illumination coefficient set that includes spatially varying gain and bias fields, it captures the complex, non-uniform illumination distribution of real scenes. This coefficient set is then used to non-linearly correct the reconstructed texture so that the repaired area is consistent with the current shooting scene in brightness, contrast, and ambient color cast, effectively eliminating the "pasted-on" look and incongruity caused by illumination mismatch in traditional AR technology. Second, it overcomes the challenge of geometric consistency in occlusion removal and texture reconstruction under dynamic viewing angles. Using a homography transformation matrix, the prior reference texture is accurately mapped to the current viewing angle, so that during camera movement the scene texture occluded by the calibration object stays geometrically aligned with the real scene in perspective and detail, avoiding image jitter and texture misalignment. Third, it ensures smooth, globally optimized coefficient interpolation under complex lighting conditions. Global energy minimization interpolation based on a radial basis function network transforms the sparse illumination coefficients into a dense coefficient field. This enforces global smoothness of the illumination surface and suppresses the blocky effects and abrupt boundary changes that local interpolation is prone to, producing a natural and smooth illumination transition. Finally, an automated closed-loop process for prior restoration and illumination adaptation is constructed, enabling real-time generation of virtual-real fusion video with matched lighting on mobile devices without manual parameter tuning. This significantly reduces the technical threshold and time cost of new media content creation.

Attached Figure Description

[0019] To further illustrate the specific implementation details of the present invention, the relevant drawings will be briefly described below. It should be particularly noted that these drawings are only used to illustrate some preferred embodiments of the present invention and are intended to aid in understanding the technical solutions; they are not intended to limit the scope of protection of the present invention. For those skilled in the art, based on the core concepts shown in these drawings, other equivalent or modified embodiments and corresponding drawings can be derived without creative effort, and these derived solutions should also be considered as an integral part of the technical system of the present invention.

[0020] Figure 1 is a flowchart illustrating, as an example, an AR image enhancement processing method for new media content creation. Figure 2 is a schematic diagram of a terminal device structure.

Detailed Implementation

[0021] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.

[0022] The embodiments of this disclosure will be described in detail below with reference to the accompanying drawings. However, it must be understood that the scope of protection of this disclosure is by no means limited to the specific embodiments shown herein; rather, it is intended to cover all variations, equivalent substitutions, and alternatives based on the concept of this disclosure. Furthermore, in the description of the drawings, the same reference numerals will always be used to refer to the same or similar constituent elements to ensure consistency and clarity of description.

[0023] In the various embodiments of this disclosure, the terms "first," "second," and similar expressions are used only to distinguish different components or elements, without implying any order, hierarchy of importance, or limitation on the function of the components. For example, "first user equipment" and "second user equipment" simply represent two different equipment entities, both of which fall under the category of user equipment. Furthermore, this naming is interchangeable: without departing from the substance of this disclosure, a "first component" can be called a "second component," and vice versa; such changes in name do not affect the essential attributes of the component itself or its actual role in the technical solution.

[0024] As shown in Figure 1, an AR image enhancement processing method includes: acquiring a first original scene image captured by an AR acquisition device for a real scene, wherein the first original scene image contains a calibration object region for 3D registration and a new media content overlay region to be enhanced; obtaining, from a preset prior knowledge base, a first reference texture image corresponding to the real scene before the placement of the calibration object, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; and calculating, based on the feature point coordinates of the calibration object region, the homography transformation matrix between the first original scene image and the first reference texture image, then using the homography transformation matrix to map the first reference texture image to the current viewpoint, generating a first reconstructed texture image with real texture information. For example, the pixel coordinates of a known calibration object in the image (such as the four corners of a piece of paper) and the corresponding standard world coordinates are identified, and a system of linear equations is constructed from them. The Direct Linear Transform (DLT) algorithm is then used to solve the system of equations and obtain the homography transformation matrix describing the geometric relationship between the two. Finally, this matrix is applied to perform perspective projection and resampling on the reference texture image, so that it is accurately mapped to the target area under the current viewpoint, generating a first reconstructed texture image with a realistic sense of space.
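The corner-based estimation described above can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: it assumes exactly four correspondences and fixes the last matrix entry to 1 (the inhomogeneous form of DLT), so the eight remaining unknowns follow from an 8x8 linear system solved here by elimination instead of SVD. The function names and corner coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4_points(src, dst):
    """Estimate the 3x3 matrix H (with h33 = 1) that maps each src point to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1 . p) / (h3 . p)
        # and v = (h2 . p) / (h3 . p) with p = (x, y, 1).
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Project a 2D point through H and divide by the homogeneous coordinate."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corner coordinates: reference view -> current view.
reference_corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
current_corners = [(12.0, 8.0), (118.0, 15.0), (110.0, 120.0), (5.0, 104.0)]
H = homography_from_4_points(reference_corners, current_corners)
```

With more than four correspondences or noisy corners, a least-squares or SVD-based solve with coordinate normalization would normally be preferred over this exact four-point form.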

[0025] Extract the surrounding neighborhood pixels of the target region in the first reconstructed texture image, construct the gradient domain constraint equation based on the preset illumination reflection model, and solve to obtain the first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes spatially varying gain field and bias field. The first set of illumination coefficients is subjected to global energy minimization interpolation to obtain a second set of illumination coefficients covering the target region. The global energy minimization interpolation refers to the process of globally smoothing the sparse coefficients using a radial basis function network. The target region pixels in the first reconstructed texture image are nonlinearly corrected using the second set of illumination coefficients to generate a second enhanced scene image consistent with the ambient lighting.

[0026] This method aims to solve two major problems: image loss due to occlusion by the calibration object, and the mismatch between virtual content and ambient lighting. First, the system uses an AR device to capture real-time images containing the calibration object and retrieves from the prior knowledge base the original texture map of the scene in its state without the calibration object. Next, by accurately identifying the feature points of the calibration object, the system calculates the geometric perspective relationship (homography matrix) between the real-time viewpoint and the historical map. The historical map is then warped to fit the current viewpoint, which effectively erases the calibration object and restores the true background texture of the occluded area, completing the first step of geometric restoration. Subsequently, to address the inconsistency between the restored texture and the current ambient light, the system extracts pixel information from the edges of the restoration area and constructs constraint equations based on the illumination reflection model to infer the spatial brightness gain and shadow bias distribution of the current ambient light. For this sparse lighting data, the method employs a radial basis function network for global energy minimization interpolation, spreading the discrete lighting sample points across the entire target area and generating a continuous, natural global illumination coefficient map. Finally, this illumination coefficient map is used to perform non-linear brightness and contrast correction on the restored background texture, so that the historical texture, which would otherwise look flatly pasted in, matches the current brightness variation and color temperature. The final output is a high-quality enhanced image that is not obstructed by the calibration object and is integrated with the real ambient lighting, realizing fully automated processing from geometric structure restoration to light and shadow fusion.

[0027] In a preferred embodiment, before acquiring the first original scene image captured by the AR acquisition device for the real scene, the method further includes: controlling the AR acquisition device, in a fixed pose, to acquire an initial reference image without the calibration object and a calibration positioning image after the calibration object is placed; and storing the initial reference image as the first reference texture image, with the calibration positioning image used as the geometric constraint reference for subsequently solving the homography transformation matrix.

[0028] Before the real-time enhancement process begins, a precise geometric and texture reference is established for subsequent image restoration. Specifically, the system controls the AR acquisition device to collect scene data in two stages under a strictly fixed pose. The first stage acquires a clean "initial reference image" without any calibration object. This image fully records the original texture details of the scene and is archived directly as the "first reference texture image" for subsequent restoration, essentially an unobstructed "negative" of the scene. The second stage acquires a "calibration positioning image" after the calibration object is placed in the same scene. This image establishes the exact position and shape of the calibration object in the image and serves as the geometric constraint reference for subsequently calculating the viewpoint transformation. By capturing the reference image first and then placing the calibration object, the system constructs paired data samples with and without the calibration object. This ensures pixel-level alignment between the reference texture and the real scene and provides the geometric anchor points needed to restore the occluded area accurately using the homography matrix, eliminating at the source the restoration artifacts caused by background mismatch or positional deviation.

[0029] In a preferred embodiment, the step of mapping the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image with real texture information includes: identifying the coordinates of the four corner points of the calibration object in the first original scene image and defining them as the first feature point set; identifying the coordinates of the four corresponding corner points of the calibration object in the calibration positioning image and defining them as the second feature point set; calculating, based on the first feature point set and the second feature point set, the homography transformation matrix describing the plane projection relationship; and applying the homography transformation matrix to the pixel blocks in the first reference texture image that correspond to the area occluded by the calibration object, to generate a first reconstructed texture image that fills in the real texture.

[0030] The core of this step lies in using the "four-point positioning" principle to backfill the background occluded by the calibration object through a geometric projection transformation. The system first identifies the coordinates of the four corner points of the calibration object in both the real-time raw image and the pre-stored calibration positioning image, constructing feature point sets for the current viewpoint and the reference viewpoint, respectively. Then, based on these two sets of corresponding corner point coordinates, the algorithm calculates a homography transformation matrix describing the planar perspective relationship. This matrix is essentially a "geometric key" that describes the stretching, rotation, and perspective distortion from the reference viewpoint to the current real-time viewpoint. Finally, the system uses this "key" to perform a reverse mapping operation on the pre-stored unoccluded reference texture image, transferring the corresponding pixel blocks of the reference image, after geometric transformation, into the area occluded by the calibration object in the current image. The process is like taking a flat historical negative and cropping and fitting it to the current shooting angle, so that a reconstructed image filled with the real background texture is generated directly, without relying on complex 3D reconstruction, achieving "seamless repair" of the occluded area.
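The four-point estimation and reverse mapping described above can be sketched as follows. This is a minimal illustration, assuming NumPy arrays for the images and a boolean mask for the occluded area; the function names `homography_from_points`, `apply_homography` and `backfill`, the direct linear transform (DLT) solver and the nearest-neighbour sampling are illustrative choices, not details taken from the patent.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of the stacked system is the last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map an (N, 2) array of (x, y) points through H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

def backfill(current, reference, mask, h_ref_to_cur):
    """Fill masked pixels of `current` from `reference` by reverse mapping."""
    out = current.copy()
    ys, xs = np.nonzero(mask)
    h_inv = np.linalg.inv(h_ref_to_cur)
    ref_xy = apply_homography(h_inv, np.stack([xs, ys], axis=1))
    # Nearest-neighbour sampling, clamped to the reference image bounds.
    rx = np.clip(np.rint(ref_xy[:, 0]).astype(int), 0, reference.shape[1] - 1)
    ry = np.clip(np.rint(ref_xy[:, 1]).astype(int), 0, reference.shape[0] - 1)
    out[ys, xs] = reference[ry, rx]
    return out
```

Here `src` would hold the corner points from the calibration positioning image and `dst` the corner points detected in the current frame. A production system would more likely use bilinear sampling and a robust estimator, which are omitted to keep the sketch short.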

[0031] In a preferred embodiment, the step of constructing gradient domain constraint equations based on a preset illumination reflection model and solving them to obtain a first set of illumination coefficients describing changes in ambient light includes: selecting the annular neighborhood extending beyond the target region $\Omega$ in the first reconstructed texture image as the guiding region $\Omega_g$; constructing a guiding vector field, in which the gradient field of the first reference texture image $I_{ref}$ over the guiding region $\Omega_g$ is defined as the guiding vector field $\mathbf{v}$, i.e. $\mathbf{v} = \nabla I_{ref}$, which characterizes the edge orientation and intensity variation of the real texture structure to be recovered; and, based on this guiding vector field, constructing a Poisson fusion constraint equation whose core formula is:

$$\Delta f(x, y) = \operatorname{div}\, \mathbf{v}(x, y)$$

where $\Delta$ is the Laplacian operator, which describes the second-order differential properties of the image, with dimensions of pixel$^{-2}$; $f(x, y)$ is the pixel grayscale value of the fused image to be solved at coordinates $(x, y)$, in gray levels (G); $\operatorname{div}$ is the divergence operator acting on a vector field, with dimensions of pixel$^{-1}$; and $\mathbf{v}$ has dimensions of G/pixel (gray levels per pixel). When solving the Poisson equation, Dirichlet boundary conditions are applied on the edge $\partial\Omega$ of the target region, where the value of $f$ is constrained to equal the edge pixel value of the first original scene image, $f|_{\partial\Omega} = I_{orig}|_{\partial\Omega}$, to achieve adaptive fusion with the lighting environment. The equation is discretized using the finite difference method to reconstruct the illumination distribution of the target region, from which the spatially varying first gain field and first bias field are obtained; together they constitute the first illumination coefficient set.

[0032] This step utilizes known real texture information at the edges of the repair area to naturally extend the texture structure into the occluded interior region. The system first identifies a ring-shaped area surrounding the target restoration area as the "guide zone," extracting gradient information (i.e., the trend of pixel brightness changes and edge direction) from the original reference image within this area, and constructing it as a vector field indicating the direction. Next, the algorithm constructs a constraint equation based on the Poisson fusion principle. Essentially, this requires that the texture change rate of the internal region to be restored must be continuous and consistent with the change rate of the edge guide zone, as if "continuing" the edge strokes internally. Simultaneously, it mandates that the boundary pixels of the restoration area must perfectly align with the background of the current real-time image, serving as a fixed boundary condition for the solution. By solving this equation numerically, the system can "guess" the most reasonable texture distribution internally based solely on edge clues, even without knowing the specific internal content, thus reconstructing an image that conforms to both the original texture structure and the current edge lighting. Finally, this reconstructed ideal image is compared and analyzed with the original reconstructed texture to deduce the spatial brightness gain (adjusting brightness) and bias (adjusting base brightness) required to achieve this perfect fusion effect. These two parameters together constitute the first set of illumination coefficients describing ambient light changes, providing precise local illumination basis for subsequent global smoothing processing.
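A minimal sketch of the gradient-domain solve described above, assuming grayscale images held as NumPy arrays and a boolean mask marking the target region. The function name `poisson_blend`, the Jacobi iteration and the fixed iteration count are assumptions made for illustration; the guidance term is taken as the five-point Laplacian of the reference texture, and the boundary values come from the current frame (Dirichlet condition). How the gain and bias fields are derived from the fused result is not shown, because the text does not specify it.

```python
import numpy as np

def poisson_blend(reference, current, mask, iters=2000):
    """Solve Laplacian(f) = Laplacian(reference) inside mask, f = current elsewhere."""
    ref = reference.astype(float)
    f = current.astype(float)
    # Discrete divergence of the reference gradient (5-point Laplacian).
    lap = np.zeros_like(ref)
    lap[1:-1, 1:-1] = (ref[:-2, 1:-1] + ref[2:, 1:-1] +
                       ref[1:-1, :-2] + ref[1:-1, 2:] - 4.0 * ref[1:-1, 1:-1])
    # Unknowns are masked pixels that are not on the image border.
    inner = mask.astype(bool).copy()
    inner[0, :] = inner[-1, :] = inner[:, 0] = inner[:, -1] = False
    for _ in range(iters):
        nb = np.zeros_like(f)
        nb[1:-1, 1:-1] = f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
        # Jacobi update: 4 f = sum(neighbours) - lap
        f[inner] = (nb[inner] - lap[inner]) / 4.0
    return f
```

If the current frame is uniformly brighter than the reference, the solved interior reproduces the reference texture shifted by the same brightness offset, which is the behaviour the paragraph describes as conforming to "the current edge lighting". Plain Jacobi iteration converges slowly on large regions; it is used here only because it is short.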

[0033] In a preferred embodiment, obtaining a second illumination coefficient set covering the target region by performing global energy minimization interpolation on the first illumination coefficient set includes: within the guiding region, performing discrete sampling on the already solved first gain field and first bias field to construct a sparse constraint point set.

[0034] A global energy functional based on radial basis functions (RBF) is constructed, and a continuous coefficient surface within the target region is obtained by minimizing this functional. The energy minimization formula is:

$$E(g) = \sum_{i=1}^{N} w_i \bigl( g(\mathbf{x}_i) - g_i \bigr)^2 + \lambda \iint_{\Omega} \left| \nabla^2 g \right|^2 \, dA$$

where $E(g)$ is the total energy functional; $g$ is the continuous gain coefficient function to be determined, and $g(\mathbf{x}_i)$ is its value at the position $\mathbf{x}_i$ of the $i$-th constraint point; $N$ is the total number of constraint points; $w_i$ is the weight coefficient of the $i$-th constraint point; $g_i$ is the known gain value of the $i$-th constraint point; $\nabla^2 g$ is the second derivative (curvature) of the coefficient function, in units of pixel$^{-2}$; $dA$ is the area element, in units of pixel$^{2}$, so that the integral term has overall units of pixel$^{-2}$; and $\lambda$ is the smoothing regularization parameter, used to balance fitting error against surface smoothness, with units of pixel$^{2}$ so as to offset the dimensions of the integral term and keep the overall energy functional $E$ dimensionless. The variational problem is solved using the Gaussian radial basis function $\varphi(r) = e^{-(\varepsilon r)^2}$ as the kernel function, where $r$ is the Euclidean distance from the target point to the constraint point, in pixels, and $\varepsilon$ is the shape parameter controlling the width of the kernel function, in units of pixel$^{-1}$. Solving yields the dense second gain field and second bias field covering the entire target region, which together constitute the second illumination coefficient set.

[0035] First, the calculated gain and bias values within the guiding region are treated as discrete "lighting anchor points." Although these points are accurate, they are sparsely distributed and cannot be used directly to modify the entire region. Next, the algorithm introduces a radial basis function (RBF) network to construct a global energy model. The core logic of this model is to balance "faithfully fitting the known anchor points" against "maintaining the overall smoothness of the surface." On the one hand, the generated lighting surface must be as close as possible to the known anchor point values. On the other hand, the second derivative (i.e., curvature) of the surface is minimized to penalize drastic fluctuations and prevent unnatural abrupt changes or noise in the lighting. In this process, the Gaussian radial basis function acts as a "smoothing interpolator," spreading the influence of each anchor point into the surrounding space according to distance: the closer the point, the greater the influence, and the farther the point, the smaller the influence, so that the discrete points are joined into a whole, like the gradations of an ink wash painting. Finally, by solving this energy minimization problem, the system generates a dense and continuous second illumination coefficient map (containing a gain field and a bias field). This map inherits the real illumination characteristics of the edges and ensures a smooth, natural illumination transition within the target area, providing a globally consistent illumination basis for the final nonlinear correction step.
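The interpolation of sparse anchor values into a dense field can be sketched as below. This is a simplified stand-in, not the patent's exact solver: it fits a Gaussian RBF expansion to the anchor values and uses the parameter `lam` as a ridge term on the kernel matrix in place of the curvature integral, which is a common approximation. The function names and the parameters `eps` (kernel shape parameter) and `lam` are assumptions for illustration.

```python
import numpy as np

def gaussian_rbf_fit(points, values, eps, lam):
    """Solve (Phi + lam * I) w = values for the RBF weights w."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)
    return np.linalg.solve(phi + lam * np.eye(len(points)), values)

def gaussian_rbf_eval(points, weights, query, eps):
    """Evaluate the fitted surface at the (M, 2) query coordinates."""
    d = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=-1)
    return np.exp(-(eps * d) ** 2) @ weights
```

With `lam = 0` the surface passes through every anchor value; increasing `lam` trades fitting accuracy for smoothness, mirroring the balance the paragraph describes. The same two functions would be called once for the gain anchors and once for the bias anchors, with `query` holding the pixel coordinates of the target region.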

[0036] In a preferred embodiment, performing non-linear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set to generate a second enhanced scene image consistent with ambient lighting includes: traversing each pixel within the target region and reading its corresponding second gain field value and second bias field value; multiplying the original color value of the pixel in the first reconstructed texture image by the second gain field value and adding the second bias field value to obtain the corrected color value; and replacing all pixels in the original target region with the corrected color values to generate a second enhanced scene image that eliminates lighting discontinuities and has continuous texture.
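The per-pixel correction above amounts to a gain-and-bias update restricted to the target region. A minimal sketch, assuming 8-bit intensity range, NumPy arrays, and a boolean mask; the function name `apply_illumination` and the clipping to [0, 255] are assumptions not stated in the text.

```python
import numpy as np

def apply_illumination(recon, gain, bias, mask):
    """Return a copy of `recon` with value * gain + bias applied inside `mask`."""
    out = recon.astype(float).copy()
    g = gain[mask]
    b = bias[mask]
    if out.ndim == 3:  # colour image: broadcast the fields over the channels
        g, b = g[:, None], b[:, None]
    out[mask] = np.clip(out[mask] * g + b, 0.0, 255.0)
    return out
```

Pixels outside the mask are left untouched, so the function can be applied directly to the full reconstructed frame.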

[0037] In a preferred embodiment, to further eliminate texture quality differences caused by the asynchronous acquisition time of the reference texture image and the current video stream (e.g., the reference image may have been captured in a low-noise environment, while the current video stream has high ISO noise; or the reference image is clear, while the current frame has motion blur), a dynamic texture denoising and detail enhancement step based on spectral consistency constraints is included before generating the second enhanced scene image. This step aims to ensure that the microscopic texture statistics of the repaired area (such as noise power spectrum and edge sharpness) remain strictly consistent with the surrounding real scene, avoiding a patchy feel of "over-smoothing" or "noise anomaly" in the repaired area from the frequency domain dimension.

[0038] The specific implementation process is as follows. First, the target region $\Omega$ and its extended annular guiding region $\Omega_g$ are defined. A two-dimensional discrete Fourier transform (2D-DFT) is performed on the pixel blocks of the first reconstructed texture image within $\Omega_g$, and their average power spectral density function $P_{ref}(u, v)$ is calculated and used as the frequency domain benchmark for the texture quality of the current scene. Second, the same frequency domain analysis is performed on the pixel block of the target region to be corrected, to obtain its initial power spectrum $P_{tgt}(u, v)$ and the original frequency domain signal $F_{tgt}(u, v)$, which contains phase information.

[0039] Subsequently, a frequency domain correction operator is constructed to reshape the spectral energy distribution (amplitude) of the target region so that it matches that of the guiding region, while fully preserving the phase information of the target region to maintain the geometric structure of the texture. The corrected frequency domain signal $F'(u, v)$ is determined by the following formula:

$$F'(u, v) = F_{tgt}(u, v) \, \sqrt{\frac{P_{ref}(u, v)}{P_{tgt}(u, v) + \epsilon}}$$

where $F'(u, v)$ is the corrected frequency domain signal (complex), whose power spectrum approximates the reference and which retains the original phase, in gray levels (G); $P_{ref}(u, v)$ is the reference power spectral density of the guiding region, in gray levels squared per unit frequency; $P_{tgt}(u, v)$ is the current power spectral density of the target region, in gray levels squared per unit frequency; $F_{tgt}(u, v)$ is the frequency domain signal (complex) of the original image of the target region, containing amplitude and phase information, in gray levels (G); $\epsilon$ is a small regularization term introduced to prevent the denominator from being zero and to suppress amplification of high-frequency noise; and $(u, v)$ are the frequency domain coordinates, in cycles per pixel. In a preferred embodiment, $\epsilon$ is set adaptively to a small fixed multiple of the maximum value of the average power spectrum in the guiding region, so that its unit is consistent with that of the power spectral density.

[0040] Finally, the spatial domain pixel block replacement is performed. 1. An inverse discrete Fourier transform (IDFT) is applied to the corrected frequency domain signal to obtain the corrected texture image patch in the spatial domain. This patch is a pixel matrix of the same size as the target region, in which each element (pixel) represents a grayscale or color value after spectral correction. 2. Data overlay: all pixel data of the corrected texture image patch directly overwrite the original pixel data at the coordinates of the target region in the first reconstructed texture image. After the replacement is completed, the updated first reconstructed texture image enters the subsequent lighting correction stage. This method ensures that the repaired area blends statistically with the real environment in high-frequency details (such as fabric texture, wall particles, and grass details), significantly improving the texture realism of new media high-definition video streams.
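The spectral matching steps above can be sketched with NumPy's FFT routines. This is an illustrative sketch under stated assumptions: the guiding region is supplied as a list of patches with the same shape as the target patch, the amplitude is scaled by the square root of the power ratio, and the regularization multiplier `eps_scale` is a hypothetical parameter whose default value is not taken from the patent.

```python
import numpy as np

def match_power_spectrum(target_patch, guide_patches, eps_scale=1e-3):
    """Rescale the target amplitude spectrum toward the guide average, keeping phase."""
    f_tgt = np.fft.fft2(target_patch)
    p_tgt = np.abs(f_tgt) ** 2
    # Average power spectrum over the guiding-region patches.
    p_ref = np.mean([np.abs(np.fft.fft2(g)) ** 2 for g in guide_patches], axis=0)
    eps = eps_scale * p_ref.max()  # regularization tied to the reference peak
    # A real, non-negative scale factor leaves the phase of f_tgt unchanged.
    f_corr = f_tgt * np.sqrt(p_ref / (p_tgt + eps))
    return np.real(np.fft.ifft2(f_corr))
```

The returned patch has the same shape as `target_patch` and can overwrite the target region directly. Because the scale factor is real and symmetric for real-valued inputs, the inverse transform is real up to rounding error, which is why only the real part is kept.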

[0041] In another embodiment, considering the rapid motion blur that may occur during AR device movement, the instantaneous jitter in the detection of calibrated object corners, and sudden changes in ambient lighting (such as suddenly turning on the flash or entering a shadow area), simple spatial interpolation may lead to flickering, ghosting, or lighting abrupt changes at the repaired edges. Therefore, this invention introduces an adaptive hybrid weighting scheme based on spatiotemporal confidence. By quantifying the geometric stability and lighting continuity of the current frame, the fusion weights of the "reconstructed texture" and the "original scene background" in the edge transition zone are dynamically adjusted to achieve spatiotemporal fusion.

[0042] Specifically, for any pixel $p$ at the edge of the target region, the final output pixel value $I_{out}(p)$ is defined as the weighted fusion of the reconstructed texture value $I_{rec}(p)$ and the original scene observation $I_{obs}(p)$, calculated as:

$$I_{out}(p) = \alpha(p) \, I_{rec}(p) + \bigl(1 - \alpha(p)\bigr) \, I_{obs}(p)$$

The spatiotemporal adaptive weight coefficient $\alpha$ is obtained by mapping the linear weighted sum of the geometric confidence $C_{geo}$ and the illumination confidence $C_{light}$ through an activation function:

$$\alpha = \sigma\bigl(\beta_1 C_{geo} + \beta_2 C_{light}\bigr)$$

The geometric confidence $C_{geo}$ characterizes the stability of calibration object tracking in the current frame and is calculated from the corner error $d$ and the feature reference length $L$ (formula not reproduced in this text). The illumination confidence $C_{light}$ characterizes the reliability of the illumination prediction for the current frame and is calculated from the squared illumination gradient difference and the tolerance threshold $\tau$ (formula not reproduced in this text).

In these expressions, $I_{out}(p)$ is the final output grayscale value of pixel $p$, in gray levels (G); $I_{rec}(p)$ is the grayscale value of the reconstructed texture after geometric correction, spectral correction, and illumination correction, in gray levels (G); $I_{obs}(p)$ is the original scene grayscale value captured directly by the AR acquisition device, in gray levels (G); $\alpha$ is the spatiotemporally adaptive hybrid weight, with values in [0, 1]; $\sigma(\cdot)$ is the Sigmoid activation function, $\sigma(z) = 1 / (1 + e^{-z})$, used to map the linear combination to the interval (0, 1); $C_{geo}$ is the geometric confidence, with values in (0, 1); $C_{light}$ is the illumination confidence, with values in (0, 1); $d$ is the Euclidean distance error between the calibration object corner point detected in the current frame and the corner point predicted from the previous frame, characterizing tracking stability, in pixels; $L$ is the feature reference length, in pixels, which in a preferred embodiment is taken as 1% of the average side length of the calibration object in the image or the length of the image diagonal; $\mathbf{g}_{meas}$ is the measured illumination gradient vector of the guiding region in the current frame, in gray levels per pixel (G/pixel); $\mathbf{g}_{pred}$ is the illumination gradient vector predicted from the prior knowledge base, in gray levels per pixel (G/pixel); $\|\mathbf{g}_{meas} - \mathbf{g}_{pred}\|^2$ is the squared magnitude of the illumination gradient difference, in units of (G/pixel)$^2$; $\tau$ is the tolerance threshold parameter for illumination changes; and $\beta_1$ and $\beta_2$ are the balance coefficients weighting geometry and illumination, respectively.

In actual new media live streaming scenarios, the two balance coefficients are set dynamically according to the scene mode. Mode 1: static / slow-speed interview scenario. When the movement speed of the AR device is detected to be below a set threshold and the scene lighting is stable, the coefficients are set so that a high weight is assigned to the geometric term, making the system highly sensitive to minute corner fluctuations. Once instability is detected (the geometric confidence decreases), the reconstruction weight is reduced immediately, prioritizing edge sharpness and preventing ghosting.

[0043] Mode 2: dynamic / complex lighting scenes (e.g., outdoor street photography, live stage broadcasts). When there are fast-moving objects or strobe light sources in the scene, the balance coefficients are set so that the weight of the illumination term is significantly increased. This forces the system, when the illumination prediction error is large (the illumination confidence decreases), to return quickly to the original observation value and so avoid producing false bright spots or dark block artifacts.

[0044] Adaptive acquisition method: in a further preferred solution, the two balance coefficients can be set, respectively, as a logarithmic function of the image signal-to-noise ratio (SNR) and as an inverse proportional function of the scene optical flow variance, thereby achieving fully automatic parameter adaptation.

[0045] This embodiment constructs a dynamic trust assessment system. In a high-confidence state, when tracking of the calibration object is stable (the geometric confidence is high) and the illumination prediction is accurate (the illumination confidence is high), the weighted combination value is large and the spatiotemporal adaptive weight $\alpha$ obtained after Sigmoid mapping approaches 1. The fusion formula then degenerates to $I_{out} \approx I_{rec}$, that is, the output is approximately the reconstructed texture value, and the system relies on the reconstructed texture to maximize the repair effect. Conversely, in a low-confidence state, when violent motion causes tracking divergence (the geometric confidence is low) or a sudden change in ambient light causes the prediction to fail (the illumination confidence is low), the weighted combination value decreases and $\alpha$ automatically approaches 0. The formula then degenerates to $I_{out} \approx I_{obs}$, that is, the output is approximately the original observation, and the system degrades gracefully by retaining the original observations. In essence, this method performs dynamic alpha blending at the pixel level, suppressing the edge flicker, ghosting, and abrupt illumination artifacts common in AR video streams and significantly improving visual stability and professionalism in new media live streaming scenarios.
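The confidence-weighted blending of this embodiment can be sketched as follows. The closed-form confidence expressions are not available in this text, so the sketch assumes exponential decays for both confidences; these forms, the function names, and the `offset` parameter (a hypothetical shift that lets the weight fall below 0.5 when both confidences are low) are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blend_weight(corner_err, ref_len, grad_meas, grad_pred, tol,
                 k_geo, k_light, offset=0.0):
    """Map tracking and illumination confidence to a blend weight in (0, 1)."""
    # ASSUMED form: confidence decays exponentially with relative corner error.
    c_geo = np.exp(-corner_err / ref_len)
    diff = np.asarray(grad_meas, dtype=float) - np.asarray(grad_pred, dtype=float)
    # ASSUMED form: confidence decays with the squared gradient mismatch.
    c_light = np.exp(-float(np.sum(diff ** 2)) / tol ** 2)
    return sigmoid(k_geo * c_geo + k_light * c_light - offset)

def blend(recon, observed, alpha):
    """Pixel-level alpha blend of reconstructed texture and raw observation."""
    return alpha * recon + (1.0 - alpha) * observed
```

With stable tracking and matching illumination gradients the weight approaches 1 and the reconstructed texture dominates; with large corner error or gradient mismatch it drops toward 0 and the raw observation is retained, matching the two limiting cases described above.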

[0046] In a preferred embodiment, the method is also applicable to the dynamic generation of new media marketing content, including: after generating the second enhanced scene image, detecting user gestures or voice commands; if a preset new media interaction instruction is detected, overlaying a virtual marketing object on the enhanced area of the second enhanced scene image; and rendering the overlaid virtual marketing object with shadows so that its shadow direction is consistent with the light source direction inferred from the second illumination coefficient set.

[0047] An AR image enhancement processing system for new media content creation includes: The first image acquisition unit is used to acquire a first original scene image captured by the AR acquisition device for a real scene, wherein the first original scene image includes a calibration object area for 3D registration and a new media content overlay area to be enhanced. The prior data retrieval unit is used to obtain a first reference texture image corresponding to the real scene before the calibration object is placed from a preset prior knowledge base, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; The texture reconstruction unit is used to calculate the homography transformation matrix between the first original scene image and the first reference texture image based on the feature point coordinates of the calibration object region, and to use the homography transformation matrix to map the first reference texture image to the current viewpoint to generate a first reconstructed texture image with real texture information. The gradient domain solving unit is used to extract the surrounding neighbor pixels of the target region in the first reconstructed texture image, construct the gradient domain constraint equation based on the preset illumination reflection model, and solve to obtain the first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes spatially varying gain field and bias field. A global interpolation unit is used to perform global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region. The global energy minimization interpolation refers to the process of globally smoothing the sparse coefficients using a radial basis function network. 
The image enhancement output unit is used to perform non-linear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set, thereby generating a second enhanced scene image consistent with the ambient lighting.

[0048] This system aims to integrate virtual content seamlessly into the real world, with a workflow that proceeds from geometric reconstruction to the blending of lighting and shadow. First, the system acquires a real-time scene image containing the calibration object together with a preset unobstructed background map. Using a homography transformation matrix, the background map is projected onto the current viewpoint, filling in the areas occluded by the calibration object and completing the geometric restoration of the texture. Next, to address lighting mismatch, the system extracts the real lighting gradient at the edges of the restoration area as a guiding clue, calculates local lighting adjustment coefficients, and uses radial basis functions to extend these sparse local data smoothly into a continuous lighting field covering the entire area, ensuring a natural transition of light and shadow without abrupt changes. Finally, based on this global lighting coefficient, the system performs non-linear pixel correction on the reconstructed image, adjusting the brightness and color of the target area so that it conforms visually to the lighting conditions of the real environment, and outputs an enhanced image with realistic textures, harmonious lighting and shadows, and virtual and real content that are visually indistinguishable.

[0049] In a preferred embodiment, the texture reconstruction unit includes: a feature point recognition subunit, used to identify the corner coordinates of the calibration object in the first original scene image to form a first feature point set, and to identify the corresponding corner coordinates in the calibration positioning image to form a second feature point set; a matrix solving subunit, used to solve for the homography transformation matrix describing the plane projection relationship based on the first feature point set and the second feature point set; and a texture mapping subunit, used to apply the homography transformation matrix to the pixel blocks in the first reference texture image that correspond to the area occluded by the calibration object, to generate a first reconstructed texture image that fills in the real texture.

[0050] In a preferred embodiment, the global interpolation unit includes: The constraint point set construction sub-unit is used to discretize the solved first gain field and first bias field in the guiding region to construct a sparse constraint point set; Energy functional construction subunits are used to construct global energy functionals based on radial basis functions (RBF), and a smoothing regularization parameter with area dimensions is set to balance fitting error and surface smoothness. The variational solution sub-unit is used to solve the variational problem using the Gaussian radial basis function as the kernel function, and obtains a dense second gain field and a second bias field covering the entire target region, which together constitute the second illumination coefficient set.

[0051] In a preferred embodiment, the preset illumination reflection model is an improved Retinex theoretical model, and the gradient domain solution unit is specifically used for: The image is decomposed into a reflection component (an inherent property of the object) and an illumination component (ambient lighting). Assuming that the reflection component of an object remains constant over a short time series, only the illumination component changes with the movement of the AR device or changes in ambient light; By constraining the illumination component in the gradient domain, accurate estimation of complex non-uniform illumination can be achieved.

[0052] In a preferred embodiment, the radial basis function network in the global interpolation unit employs compactly supported radial basis functions, including: defining a support radius R such that the basis function value is zero when the Euclidean distance between a pixel and a constraint point is greater than R; using sparse matrix techniques to accelerate the solution of the energy functional in order to meet the real-time requirements of new media live streaming scenarios; and dynamically adjusting the support radius R to accommodate the occluded areas of calibration objects of different sizes.
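To illustrate why a compact support leads to a sparse system, the sketch below uses the Wendland C2 function as one example of a compactly supported radial basis function. The patent does not name a specific function, so this choice and the names `wendland_c2` and `csrbf_matrix` are assumptions.

```python
import numpy as np

def wendland_c2(r, radius):
    """Wendland C2 kernel: (1 - r/R)^4 * (4 r/R + 1) for r < R, zero otherwise."""
    q = np.clip(r / radius, 0.0, 1.0)
    return (1.0 - q) ** 4 * (4.0 * q + 1.0)

def csrbf_matrix(points, radius):
    """Dense kernel matrix; entries for pairs farther apart than R are exactly zero."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return wendland_c2(d, radius)
```

Because every pair of constraint points separated by more than R contributes an exact zero, the matrix can be stored and solved in a sparse format; a smaller R gives a sparser system at the cost of a more local interpolant.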

[0053] In a preferred embodiment, a new media interactive rendering unit is further included, for: Receive new media interaction commands from external input, including gesture recognition signals or voice control signals; A virtual marketing object is generated at a specified location in the second enhanced scene image according to the instructions; The light source direction parameters from the second set of illumination coefficients are called to perform real-time shadow projection processing on the virtual marketing object in order to achieve the fusion of virtual and real light and shadow.

[0054] In a preferred embodiment, the gradient domain solving unit employs a multigrid method when solving the discretized Poisson equation, including: constructing a hierarchy of grids from fine to coarse; rapidly eliminating low-frequency errors on the coarse grids and high-frequency errors on the fine grids; and using V-cycle or W-cycle iterations to reduce the solution complexity to linear order, ensuring efficient operation on mobile AR devices.
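The V-cycle idea can be shown compactly on a one-dimensional model problem, $-u'' = f$ with zero boundary values. This is a simplified sketch rather than the two-dimensional solver the embodiment implies; the weighted Jacobi smoother, full-weighting restriction, linear interpolation and all function names are standard textbook choices assumed for illustration. The grid must have a power-of-two number of intervals.

```python
import numpy as np

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; damps high-frequency error on the current grid."""
    for _ in range(sweeps):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    n = len(u) - 1                      # number of intervals (power of two)
    if n == 2:                          # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h)                 # pre-smoothing on the fine grid
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)           # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)   # coarse-grid error equation
    e = np.zeros_like(u)                # linear interpolation of the correction
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    return smooth(u, f, h)              # post-smoothing
```

Each level does work proportional to its number of grid points and the grids halve in size, so one cycle costs a constant multiple of the finest-grid size, which is the source of the linear complexity mentioned above.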

[0055] A terminal device, see Figure 2, includes: a processor, used to execute the aforementioned AR image enhancement processing method for new media content creation; a memory, used to store the first original scene image, the first reference texture image, the first reconstructed texture image, the first illumination coefficient set, the second illumination coefficient set, and the second enhanced scene image; a display, used to show the second enhanced scene image in real time for new media content creators to preview and edit; and a communication interface, used to upload the generated second enhanced scene image to a new media social platform or a cloud rendering server.

[0056] In summary, this invention proposes an AR image enhancement processing method and system for new media content creation. Its core innovation lies in constructing a full-link solution from "geometric structure restoration" to "light and shadow texture fusion" and then to "spatiotemporal dynamic stabilization." The method uses the unoccluded texture maps in a prior knowledge base to restore, through homography transformation, the real background obscured by the calibration object, fundamentally removing the visual interference caused by AR registration markers. It further introduces a combined algorithm based on the gradient domain Poisson equation and radial basis function global interpolation to extend the illumination cues of local edges smoothly into a globally continuous illumination coefficient field, achieving seamless integration between the restored area and the real environment under non-uniform illumination. In addition, by calibrating the micro-statistical characteristics of the texture through spectral consistency constraints and establishing an adaptive hybrid weight mechanism based on spatiotemporal confidence, it suppresses flicker and artifacts caused by device motion and sudden changes in illumination, significantly improving the visual stability of the video stream. With the addition of real-time shadow rendering for virtual objects that supports gesture and voice interaction, this solution lowers the technical threshold for new media content production and ensures that the final output images meet professional standards in geometric accuracy, naturalness of lighting and shadow, and realism of interaction, providing technical support for the large-scale application of AR technology in scenarios such as live marketing and immersive storytelling.

[0057] It should be noted that the above description is merely a preferred embodiment of the present invention, intended to clearly illustrate the technical concept and core principles of the present invention, and not to limit the scope of protection of the present invention. For those skilled in the art, based on a full understanding of the basic inventive concept of the present invention, equivalent substitutions, logical adjustments, or functional extensions can be made to the technical details in the above embodiments according to the needs of specific application scenarios. Therefore, the scope of protection of the present invention should not be limited to the specific embodiments described herein. Any technical solution formed based on the core idea of the present invention through equivalent transformations, improvements, optimizations, or adaptive modifications, as long as it does not depart from the spirit and essence of the present invention, should be considered to fall within the scope of protection defined by the claims of the present invention.

Claims

1. An AR image enhancement processing method for new media content creation, characterized in that the method comprises: acquiring a first original scene image captured by an AR acquisition device for a real scene, wherein the first original scene image contains a calibration object region for 3D registration and a new media content overlay region to be enhanced; obtaining, from a preset prior knowledge base, a first reference texture image corresponding to the real scene before the placement of the calibration object, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; calculating, based on the feature point coordinates of the calibration object region, the homography transformation matrix between the first original scene image and the first reference texture image, and mapping the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image with real texture information; extracting the surrounding neighborhood pixels of the target region in the first reconstructed texture image, constructing the gradient domain constraint equation based on the preset illumination reflection model, and solving it to obtain the first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes a spatially varying gain field and bias field; performing global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region, wherein the global energy minimization interpolation refers to the process of globally smoothing the sparse coefficients using a radial basis function network; and performing non-linear correction on the target region pixels in the first reconstructed texture image using the second illumination coefficient set to generate a second enhanced scene image consistent with the ambient lighting.

2. The AR image enhancement processing method for new media content creation according to claim 1, characterized in that, before acquiring the first original scene image captured by the AR acquisition device for the real scene, the method further comprises: controlling the AR acquisition device to acquire an initial reference image without the calibration object and a calibration positioning image after the calibration object has been placed in a fixed pose; and storing the initial reference image as the first reference texture image, and using the calibration positioning image as the geometric constraint reference for subsequently solving the homography transformation matrix.

3. The AR image enhancement processing method for new media content creation according to claim 1, characterized in that mapping the first reference texture image to the current viewpoint using the homography transformation matrix to generate the first reconstructed texture image carrying real texture information comprises: identifying the coordinates of the four corner points of the calibration object in the first original scene image and defining them as a first feature point set; identifying the coordinates of the four corresponding corner points of the calibration object in the calibration positioning image and defining them as a second feature point set; calculating, based on the first feature point set and the second feature point set, the homography transformation matrix describing the planar projection relationship; and applying the homography transformation matrix to the pixel blocks in the first reference texture image that correspond to the area occluded by the calibration object, to generate a first reconstructed texture image in which the real texture is filled in.
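The corner-based homography step of claim 3 can be illustrated with a minimal sketch. It assumes the standard direct linear transform (DLT) as the solver and nearest-neighbour inverse mapping as the warp; neither choice is stated in the claims, and the function names `homography_from_points`, `apply_homography`, and `warp_nearest` are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, from >= 4 point pairs (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints per correspondence.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an (N, 2) array of (x, y) points through H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def warp_nearest(ref, H, out_shape):
    """Warp ref into the current view by inverse mapping (nearest neighbour)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src = apply_homography(np.linalg.inv(H), pts)
    sx = np.rint(src[:, 0]).astype(int)
    sy = np.rint(src[:, 1]).astype(int)
    valid = (sx >= 0) & (sx < ref.shape[1]) & (sy >= 0) & (sy < ref.shape[0])
    out = np.zeros((h * w,) + ref.shape[2:], dtype=ref.dtype)
    out[valid] = ref[sy[valid], sx[valid]]
    return out.reshape((h, w) + ref.shape[2:])

# Second feature point set (reference view) and first feature point set (current view).
ref_corners = [(0, 0), (4, 0), (4, 4), (0, 4)]
cur_corners = [(2, 1), (6, 1), (6, 5), (2, 5)]
H = homography_from_points(ref_corners, cur_corners)
ref_img = np.arange(25).reshape(5, 5)
warped = warp_nearest(ref_img, H, (6, 7))
```

In this toy case the four corner pairs differ by a pure translation, so `H` reduces to a translation matrix. A production system would more likely restrict the warp to the occluded pixel blocks and use bilinear sampling.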

4. The AR image enhancement processing method for new media content creation according to claim 1, characterized in that constructing the gradient domain constraint equation based on the preset illumination reflection model and solving it to obtain the first illumination coefficient set describing the change of ambient light comprises: selecting the annular neighborhood extending outward from the target region in the first reconstructed texture image as the guiding region $\Omega_g$; constructing a guiding vector field, wherein, within the guiding region $\Omega_g$, the gradient field of the first reference texture image $I_{\mathrm{ref}}$ is defined as the guiding vector field $\mathbf{v}$, i.e. $\mathbf{v} = \nabla I_{\mathrm{ref}}$, which characterizes the edge orientation and intensity variation of the real texture structure to be recovered; constructing, based on the guiding vector field, a Poisson fusion constraint equation whose core formula is $\Delta f(x, y) = \operatorname{div}\,\mathbf{v}(x, y)$, where $\Delta$ denotes the Laplacian operator, which describes the second-order differential properties of the image, $f(x, y)$ denotes the pixel grayscale value of the fused image to be solved at coordinates $(x, y)$, and $\operatorname{div}$ denotes the divergence operator acting on the vector field; when solving the Poisson equation, applying Dirichlet boundary conditions that constrain the value of $f$ on the edge of the target region to equal the edge pixel values of the first original scene image, so as to achieve adaptive fusion with the lighting environment; and discretizing the above equation using the finite difference method to reconstruct the illumination distribution of the target region, and thereby obtaining a spatially varying first gain field and first bias field, which together constitute the first illumination coefficient set.
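The finite-difference Poisson solve of claim 4 can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the 5-point Laplacian stencil and plain Jacobi iteration (the claims name only "the finite difference method"), works on a single grayscale channel, and stops at the reconstructed image $f$ without deriving the gain and bias fields, since the claims do not say how those are extracted. The function name `poisson_blend` and its parameters are hypothetical.

```python
import numpy as np

def poisson_blend(target, guide, mask, iters=2000):
    """Solve lap(f) = div(grad(guide)) inside mask, with f = target elsewhere.

    target: image supplying the Dirichlet boundary values (original scene image).
    guide:  image whose gradient field is the guiding vector field (reference texture).
    mask:   boolean array, True inside the region to be solved.
    """
    f = target.astype(float).copy()
    g = guide.astype(float)
    # div(grad g) discretised as the 5-point Laplacian of the guide image.
    lap_g = np.zeros_like(g)
    lap_g[1:-1, 1:-1] = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
                         - 4.0 * g[1:-1, 1:-1])
    # Pixels on the image border have no full stencil, so they are never updated.
    inner = mask.astype(bool).copy()
    inner[0, :] = inner[-1, :] = inner[:, 0] = inner[:, -1] = False
    for _ in range(iters):
        nb = np.zeros_like(f)
        nb[1:-1, 1:-1] = f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
        f_new = (nb - lap_g) / 4.0  # Jacobi update from the previous iterate
        f[inner] = f_new[inner]
    return f

# Toy check: the guide differs from the true scene by a constant lighting offset.
yy, xx = np.mgrid[0:12, 0:12].astype(float)
true_scene = xx ** 2 + 3.0 * yy
guide = true_scene + 10.0
mask = np.zeros((12, 12), dtype=bool)
mask[4:9, 4:9] = True
occluded = true_scene.copy()
occluded[mask] = 0.0  # texture hidden by the calibration object
restored = poisson_blend(occluded, guide, mask)
```

Because only the gradient of the guide enters the equation, the constant offset between `guide` and `true_scene` drops out, and the boundary values pull the solution to the scene's own lighting level. Jacobi iteration is chosen for brevity; a sparse direct or multigrid solver would be the usual choice for larger regions.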

5. The AR image enhancement processing method for new media content creation according to claim 4, characterized in that performing global energy minimization interpolation on the first illumination coefficient set to obtain the second illumination coefficient set covering the target region comprises: discretely sampling the already solved first gain field and first bias field within the guiding region $\Omega_g$ to construct a sparse constraint point set; constructing a global energy functional based on radial basis functions and minimizing this functional to obtain a continuous coefficient surface within the target region, the energy minimization formula being $E(G) = \sum_{i=1}^{N} w_i \left( G(\mathbf{x}_i) - G_i \right)^2 + \lambda \iint \left| \nabla^2 G \right|^2 \, dA$, where $E$ denotes the total energy functional, $G$ denotes the continuous gain coefficient function to be determined, $N$ denotes the total number of constraint points, $w_i$ is the weight coefficient of the $i$-th constraint point, $\mathbf{x}_i$ is the position of the $i$-th constraint point, $G_i$ is the known gain value of the $i$-th constraint point, $\nabla^2 G$ denotes the second derivative (curvature) of the coefficient function, $dA$ denotes an area element, and $\lambda$ is the smoothing regularization parameter used to balance fitting error against surface smoothness; solving the above variational problem using a Gaussian radial basis function $\phi(r)$ as the kernel function, where $r$ denotes the Euclidean distance from the target point to a constraint point and a shape parameter controls the width of the kernel function; and obtaining, by solving, a dense second gain field and second bias field covering the entire target region, which together constitute the second illumination coefficient set.
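The interpolation of claim 5 can be approximated by a regularized Gaussian RBF fit, sketched below. Several points are assumptions rather than the claimed method: the kernel is taken as $\exp(-r^2 / (2\sigma^2))$, since the claims do not give the exact Gaussian form; all constraint weights are set to 1; and the curvature penalty is replaced by a ridge term `lam` on the kernel system, which is a simpler stand-in for the variational problem. The names `gaussian_kernel`, `fit_rbf`, `eval_rbf`, `sigma`, and `lam` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian kernel between point sets a (M, 2) and b (N, 2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(points, values, sigma=5.0, lam=1e-3):
    """Solve (K + lam * I) w = values for the RBF weights w."""
    K = gaussian_kernel(points, points, sigma)
    return np.linalg.solve(K + lam * np.eye(len(points)), values)

def eval_rbf(points, weights, query, sigma=5.0):
    """Evaluate the fitted surface at the query positions (Q, 2)."""
    return gaussian_kernel(query, points, sigma) @ weights

# Sparse gain samples on a ring around the target region (toy values).
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.stack([5.0 * np.cos(theta), 5.0 * np.sin(theta)], axis=1)
gain_samples = 1.0 + 0.1 * np.cos(theta)

weights = fit_rbf(ring, gain_samples, sigma=2.0, lam=1e-8)
# Dense evaluation on a grid covering the interior of the ring.
gy, gx = np.mgrid[-3:4, -3:4].astype(float)
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
dense_gain = eval_rbf(ring, weights, grid, sigma=2.0).reshape(7, 7)
```

The same fit would be repeated for the bias samples to obtain the second bias field. Larger `lam` trades fidelity at the constraint points for a smoother surface, mirroring the role of $\lambda$ in the energy functional. A plain Gaussian expansion decays toward zero far from the samples, so `sigma` needs to be comparable to the size of the target region for the interior values to be meaningful.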

6. The AR image enhancement processing method for new media content creation according to claim 5, characterized in that applying nonlinear correction to the pixels of the target region in the first reconstructed texture image using the second illumination coefficient set to generate the second enhanced scene image consistent with the ambient lighting comprises: traversing each pixel within the target region and reading its corresponding second gain field value and second bias field value; multiplying the original color value of the pixel in the first reconstructed texture image by the second gain field value and adding the second bias field value to obtain a corrected color value; and replacing all pixels in the original target region with the corrected color values to generate a second enhanced scene image that eliminates lighting discontinuities and has continuous texture.
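The per-pixel correction of claim 6 amounts to `corrected = color * gain + bias` inside the target region. The sketch below vectorizes the traversal with NumPy. Clipping to the 0 to 255 range is an assumption for 8-bit images and is not stated in the claims; the function name `apply_gain_bias` is hypothetical.

```python
import numpy as np

def apply_gain_bias(image, gain, bias, mask):
    """Return a copy of image with color * gain + bias applied where mask is True.

    image: (H, W) or (H, W, C) array; gain, bias: (H, W) fields; mask: (H, W) bool.
    """
    out = image.astype(float).copy()
    if image.ndim == 3:
        # Broadcast the per-pixel fields over the color channels.
        gain = gain[..., None]
        bias = bias[..., None]
    corrected = np.clip(out * gain + bias, 0.0, 255.0)  # assumed 8-bit range
    out[mask] = corrected[mask]
    return out

# Toy example: brighten a 2x2 target region in a flat gray image.
reconstructed = np.full((4, 4, 3), 100.0)
gain_field = np.full((4, 4), 1.2)
bias_field = np.full((4, 4), 5.0)
target_mask = np.zeros((4, 4), dtype=bool)
target_mask[1:3, 1:3] = True
enhanced = apply_gain_bias(reconstructed, gain_field, bias_field, target_mask)
```

Pixels outside `target_mask` are left untouched, so only the region formerly occluded by the calibration object is relit.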

7. An AR image enhancement processing system for new media content creation, characterized in that it comprises: a first image acquisition unit, configured to acquire a first original scene image captured by an AR acquisition device for a real scene, wherein the first original scene image contains a calibration object region used for 3D registration and a new media content overlay region to be enhanced; a prior data retrieval unit, configured to obtain, from a preset prior knowledge base, a first reference texture image corresponding to the real scene before the calibration object was placed, wherein the first reference texture image records the texture information of the real scene that is not occluded by the calibration object; a texture reconstruction unit, configured to calculate a homography transformation matrix between the first original scene image and the first reference texture image based on the feature point coordinates of the calibration object region, and to map the first reference texture image to the current viewpoint using the homography transformation matrix to generate a first reconstructed texture image carrying real texture information; a gradient domain solving unit, configured to extract the surrounding neighborhood pixels of a target region in the first reconstructed texture image, construct a gradient domain constraint equation based on a preset illumination reflection model, and solve it to obtain a first illumination coefficient set describing the change of ambient light, wherein the first illumination coefficient set includes a spatially varying gain field and bias field; a global interpolation unit, configured to perform global energy minimization interpolation on the first illumination coefficient set to obtain a second illumination coefficient set covering the target region, wherein the global energy minimization interpolation refers to globally smoothing the sparse coefficients using a radial basis function network; and an image enhancement output unit, configured to apply nonlinear correction to the pixels of the target region in the first reconstructed texture image using the second illumination coefficient set, thereby generating a second enhanced scene image consistent with the ambient lighting.

8. The AR image enhancement processing system for new media content creation according to claim 7, characterized in that the texture reconstruction unit comprises: a feature point recognition subunit, configured to identify the corner coordinates of the calibration object in the first original scene image to form a first feature point set, and to identify the corresponding corner coordinates in the calibration positioning image to form a second feature point set; a matrix solving subunit, configured to solve for the homography transformation matrix describing the planar projection relationship based on the first feature point set and the second feature point set; and a texture mapping subunit, configured to apply the homography transformation matrix to the pixel blocks in the first reference texture image that correspond to the area occluded by the calibration object, to generate a first reconstructed texture image in which the real texture is filled in.

9. The AR image enhancement processing system for new media content creation according to claim 7, characterized in that the global interpolation unit comprises: a constraint point set construction subunit, configured to discretely sample the solved first gain field and first bias field within the guiding region to construct a sparse constraint point set; an energy functional construction subunit, configured to construct a global energy functional based on radial basis functions and to set a smoothing regularization parameter, having dimensions of area, that balances fitting error against surface smoothness; and a variational solution subunit, configured to solve the variational problem using a Gaussian radial basis function as the kernel function and to obtain a dense second gain field and second bias field covering the entire target region, which together constitute the second illumination coefficient set.

10. The AR image enhancement processing system for new media content creation according to any one of claims 7 to 9, characterized in that it further comprises a new media interactive rendering unit, configured to: receive externally input new media interaction commands, including gesture recognition signals or voice control signals; generate a virtual marketing object at a specified location in the second enhanced scene image according to the commands; and retrieve the light source direction parameters from the second illumination coefficient set to perform real-time shadow projection on the virtual marketing object, so as to fuse virtual and real light and shadow.