Method and device for automatically matching virtual model with picture space, storage medium and computer device

By marking a quadrilateral on a 2D image and calculating its vanishing points, the method recovers 3D spatial direction vectors, solving the problem of fusing a 3D virtual model with a single 2D image and achieving efficient, accurate virtual model matching.

CN122244299A · Pending · Publication Date: 2026-06-19 · ZHONGXIN SOFTWARE (SHANGHAI) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZHONGXIN SOFTWARE (SHANGHAI) CO LTD
Filing Date: 2026-02-26
Publication Date: 2026-06-19

Application Information

Patent Timeline

26 Feb 2026

Application

19 Jun 2026

Publication

CN122244299A

IPC: G06T17/00; G06T19/20; G06T7/73


AI Technical Summary

⚠Technical Problem

Existing technologies struggle to quickly and easily infer spatial perspective relationships from a single 2D image without any camera intrinsics, extrinsics, multi-view images, or deep learning training, thus hindering the accurate fusion of 3D virtual models and 2D images.

⚗Method used

Four points are marked on the 2D image to form a quadrilateral; the horizontal and vertical vanishing points are calculated, the 3D spatial direction vectors are recovered, the 3D projection plane is determined, and the 3D virtual model is adaptively adjusted so that it is consistent with the perspective relationship of the 2D image.

🎯Benefits of technology

It achieves high-precision and high-efficiency automatic matching of 3D virtual models with single 2D images, with low computational overhead, fast response speed, and suitability for lightweight environments.

✦ Generated by Eureka AI based on patent content.


Figure CN122244299A_ABST


Abstract

This application discloses a method, apparatus, storage medium, and computer device for automatically matching a virtual model to an image space. The method includes: receiving a two-dimensional image to be processed; obtaining four marker points marked on the two-dimensional image; constructing a target quadrilateral based on the four marker points, calculating the intersection point of the lines containing the first set of opposite sides to obtain the horizontal vanishing point, and calculating the intersection point of the lines containing the second set of opposite sides to obtain the vertical vanishing point; calculating the horizontal direction vector and the vertical direction vector according to the positions of the horizontal and vertical vanishing points in the two-dimensional image, determining the plane formed by the two vectors, calculating the normal vector of the plane, and determining the three-dimensional projection plane based on the normal vector; aligning the bottom surface of the three-dimensional virtual model with the three-dimensional projection plane, and adaptively adjusting the three-dimensional virtual model so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image; and rendering a fused image of the three-dimensional virtual model and the two-dimensional image.


Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, storage medium, and computer equipment for automatically matching virtual models to image spaces.

Background Technology

[0002] With the increasing popularity of applications such as augmented reality, virtual try-on, virtual home design, and industrial simulation, the demand for seamlessly integrating 3D virtual models into 2D images is growing. These applications typically require accurately reconstructing the perspective structure of an image, ensuring that the spatial position, orientation, and scale of the 3D virtual model visually match the image background. Traditional perspective matching methods often rely on known camera intrinsic and extrinsic parameters, multi-view image sequences, or physical calibration objects. They calculate the geometric information of the scene using photogrammetry or 3D reconstruction techniques, thereby enabling the registration and overlay of 3D virtual models.

[0003] However, in real-world applications, many images are single, ordinary 2D photographs without any calibration data. This makes it impossible to obtain intrinsic parameters such as the focal length, principal point, and distortion coefficient of the shooting device, as well as extrinsic parameters such as shooting position and pose. Existing technologies, such as feature point matching-based 3D reconstruction methods, require multi-view images or continuous video frames, making them unsuitable for single-image scenarios. End-to-end methods based on deep learning require massive amounts of labeled data to train the model and have limited generalization ability to image content. While registration methods based on manually calibrated objects offer high accuracy, they require the placement of specific markers during shooting, resulting in cumbersome operations, poor user experience, and difficulty in widespread adoption in consumer applications. Therefore, how to quickly and easily infer spatial perspective relationships from a single 2D image without any prior perspective calibration information, and thereby accurately fuse a 3D virtual model with the 2D image, has become a core bottleneck restricting the development of related technologies.

Summary of the Invention

[0004] In view of this, this application provides a method, apparatus, storage medium, and computer equipment for automatically matching virtual models to image space that require no camera intrinsics, extrinsics, multi-view images, deep learning training, or physical calibration objects. Compared with existing technologies, the solution overcomes the inability of multi-view methods to handle single images, avoids the dependence of deep learning methods on massive labeled datasets and their limited generalization beyond specific scenes, and completely eliminates the cumbersome operation and equipment constraints of manual calibration objects. The entire calculation has extremely low computational overhead, fast response, and unique, deterministic results. It can easily be deployed in lightweight environments such as mobile and web terminals, providing a high-precision, high-efficiency, and universally applicable automatic path for matching 3D virtual models to single 2D image spaces.

[0005] According to one aspect of this application, a method for automatically matching a virtual model to an image space is provided, comprising: receiving a two-dimensional image to be processed; obtaining four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral corresponding to the perspective distortion of a rectangle in the real world; constructing a target quadrilateral based on the four marker points, calculating the intersection point of the lines containing the first set of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculating the intersection point of the lines containing the second set of opposite sides of the target quadrilateral to obtain the vertical vanishing point; calculating the corresponding three-dimensional horizontal direction vector and vertical direction vector based on the positions of the horizontal and vertical vanishing points in the two-dimensional image, determining the plane formed by these two vectors, calculating the normal vector of that plane, and determining the three-dimensional projection plane based on the normal vector; aligning the bottom surface of the preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane, and adaptively adjusting the model so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image; and rendering a fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.

[0006] According to another aspect of this application, an apparatus for automatically matching a virtual model to an image space is provided, comprising: an image receiving module, configured to receive a two-dimensional image to be processed; a marker point acquisition module, configured to acquire four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral corresponding to the perspective distortion of a rectangle in the real world; a vanishing point calculation module, configured to construct a target quadrilateral based on the four marker points, calculate the intersection point of the lines containing the first set of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculate the intersection point of the lines containing the second set of opposite sides to obtain the vertical vanishing point; a projection plane determination module, configured to calculate the corresponding three-dimensional horizontal direction vector and vertical direction vector based on the positions of the horizontal and vertical vanishing points in the two-dimensional image, determine the plane formed by the two vectors, calculate the normal vector of the plane, and determine the three-dimensional projection plane based on the normal vector; a virtual model adjustment module, configured to align the bottom surface of the preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane and adaptively adjust the model so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image; and a rendering module, configured to render the fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.

[0007] According to another aspect of this application, a storage medium is provided that stores a computer program thereon, which, when executed by a processor, implements the method for automatically matching the virtual model to the image space described above.

[0008] According to another aspect of this application, a computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor executes the program to implement the method of automatically matching the virtual model to the image space described above.

[0009] By employing the above technical solutions, this application provides a method, apparatus, storage medium, and computer device for automatically matching virtual models to image space. This method requires no camera intrinsics, extrinsics, multi-view images, deep learning training, or physical calibration objects. Compared to existing technologies, it overcomes the inability of multi-view methods to handle single images, avoids the dependence of deep learning methods on massive labeled datasets and their limited generalization beyond specific scenes, and completely eliminates the cumbersome operation and equipment constraints of manual calibration objects. The entire computation has extremely low computational overhead, fast response, and unique, deterministic results. It can easily be deployed in lightweight environments such as mobile and web terminals, providing a high-precision, high-efficiency, and broadly applicable automatic path for matching 3D virtual models to single 2D image spaces.

[0010] The above description is merely an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, specific embodiments of this application are given below.

Attached Figure Description

[0011] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
Figure 1 is a flowchart of a method for automatically matching a virtual model to an image space according to an embodiment of this application;
Figure 2 is a schematic diagram of a two-dimensional image to be processed according to an embodiment of this application;
Figure 3 is a schematic diagram of a target quadrilateral according to an embodiment of this application;
Figure 4 is a schematic diagram of a horizontal vanishing point and a vertical vanishing point according to an embodiment of this application;
Figure 5 is a schematic diagram of a horizontal direction vector and a vertical direction vector according to an embodiment of this application;
Figure 6 is a schematic diagram of a fused image according to an embodiment of this application;
Figure 7 is a schematic structural diagram of an apparatus for automatically matching a virtual model to an image space according to an embodiment of this application;
Figure 8 is a schematic structural diagram of a computer device according to an embodiment of this application.

Detailed Implementation

[0012] The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present application can be combined with each other.

[0013] This embodiment provides a method for automatically matching a virtual model to an image space. As shown in Figure 1, the method includes:
Step 101: Receive the two-dimensional image to be processed.

[0014] Step 102: Obtain four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral, and the quadrilateral corresponds to the perspective distortion of a rectangle in the real world.

[0015] Step 103: Based on the four marker points, construct the target quadrilateral; calculate the intersection point of the lines containing the first set of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculate the intersection point of the lines containing the second set of opposite sides of the target quadrilateral to obtain the vertical vanishing point.

[0016] Step 104: Based on the positions of the horizontal vanishing point and the vertical vanishing point in the two-dimensional image, calculate the corresponding three-dimensional spatial horizontal direction vector and vertical direction vector respectively, determine the plane formed by the horizontal direction vector and the vertical direction vector, calculate the normal vector of the plane, and determine the three-dimensional projection plane based on the normal vector.

[0017] Step 105: Align the bottom surface of the preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane, and make adaptive adjustments to the preset three-dimensional virtual model to be superimposed so that the projection of the preset three-dimensional virtual model to be superimposed in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image.

[0018] Step 106: Render the fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.

[0019] This application provides a method for automatically matching a virtual model to an image space. First, a two-dimensional image to be processed is received. This image is an ordinary single digital image that contains no camera intrinsic or extrinsic parameters and no depth information; it is the original input of the method. Figure 2 shows a two-dimensional image according to an embodiment of this application.

[0020] Subsequently, the four marker points manually marked by the user on the 2D image are obtained. The four points outline an arbitrary quadrilateral that follows the visual shape of a rectangle under perspective projection. This quadrilateral appears in the 2D image as a perspective-distorted outline corresponding to the projection of a standard rectangle in the real world, and thus provides a geometric constraint, based on human prior knowledge, for the subsequent perspective analysis.

[0021] Next, the four marker points are connected in sequence to generate the target quadrilateral, and the intersection points, on the two-dimensional image plane, of the lines containing its two pairs of opposite sides are calculated. In a specific embodiment, the target quadrilateral may be the red box shown in Figure 3. Under perspective projection, a set of parallel lines in space that is not parallel to the image plane converges to a single point on the 2D image plane. Therefore, the intersections of the two pairs of opposite sides give the vanishing points in two directions, called the horizontal vanishing point and the vertical vanishing point, respectively. This step uses 2D geometric operations to recover, from the target quadrilateral constructed from the four user-marked points, the perspective convergence implicit in the 2D image. Figure 4 shows a horizontal vanishing point and a vertical vanishing point according to an embodiment of this application, where the red dot represents the horizontal vanishing point and the green dot represents the vertical vanishing point.
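The convergence property above can be checked numerically. The following Python sketch is illustrative only: it assumes a synthetic pinhole camera (focal length `f`, principal point `(cx, cy)`) that is not part of the described method, projects two parallel 3D lines, and intersects their images in homogeneous coordinates. The helper names `project`, `image_line`, and `meet` are hypothetical.

```python
from typing import Optional, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Line = Tuple[float, float, float]  # homogeneous line a*x + b*y + c = 0


def project(p: Point3, f: float, cx: float, cy: float) -> Point2:
    """Pinhole projection of a camera-frame point (camera looks along +z)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (cx + f * x / z, cy + f * y / z)


def image_line(p: Point2, q: Point2) -> Line:
    """Homogeneous line through two image points: cross product of (x, y, 1) vectors."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def meet(l: Line, m: Line) -> Optional[Point2]:
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l
    a2, b2, c2 = m
    w = a1 * b2 - b1 * a2
    if w == 0:
        return None
    return ((b1 * c2 - c1 * b2) / w, (c1 * a2 - a1 * c2) / w)


# Hypothetical synthetic camera and two parallel 3D lines sharing direction d.
f, cx, cy = 800.0, 320.0, 240.0
d = (1.0, 0.0, 2.0)
starts = [(0.0, 1.0, 4.0), (0.0, -1.0, 4.0)]

lines = []
for s in starts:
    a = project(s, f, cx, cy)
    b = project(tuple(si + 3.0 * di for si, di in zip(s, d)), f, cx, cy)
    lines.append(image_line(a, b))

vp = meet(lines[0], lines[1])
# The vanishing point of direction d is the projection of the point at infinity along d.
expected = (cx + f * d[0] / d[2], cy + f * d[1] / d[2])
print(vp, expected)
```

The two image lines meet at the same point predicted by projecting the shared direction `d` alone, which is the geometric fact the vanishing point step relies on.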

[0022] Next, based on the pixel positions of the horizontal and vertical vanishing points in the 2D image, the corresponding 3D spatial direction vectors, namely the horizontal and vertical direction vectors, are calculated. Specifically, the plane containing the 2D image is defined as a virtual imaging plane, and a virtual viewpoint is preset along its normal direction. 3D rays are cast from this virtual viewpoint through the two vanishing point positions; the unit direction vectors of these two rays represent the orientations of the corresponding sets of parallel lines in the 3D scene. The two direction vectors uniquely determine the orientation of the spatial plane they span, so the plane's normal vector can be calculated, thereby reconstructing the 3D projection plane corresponding to the rectangular plane in the real world. This step performs the inverse calculation from 2D perspective cues to 3D spatial geometry. Figure 5 shows a horizontal direction vector and a vertical direction vector according to an embodiment of this application, where the red line segment represents the horizontal direction vector and the green line segment represents the vertical direction vector.
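A minimal sketch of the plane construction, assuming the normal is taken as the normalized cross product of the two direction vectors (the source names the normal but not the formula). The direction vectors fix only the plane's orientation, not its distance from the viewpoint, so the sketch places the plane through a hypothetical `anchor` point; all function names are illustrative.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def plane_normal(h_dir: Vec3, v_dir: Vec3, eps: float = 1e-9) -> Vec3:
    """Unit normal of the plane spanned by the horizontal and vertical direction vectors."""
    n = cross(normalize(h_dir), normalize(v_dir))
    if math.sqrt(dot(n, n)) < eps:
        raise ValueError("direction vectors are parallel; the plane is undefined")
    return normalize(n)


def plane_through(normal: Vec3, anchor: Vec3) -> Tuple[Vec3, float]:
    """Plane n . X = offset passing through a (hypothetical) anchor point."""
    n = normalize(normal)
    return n, dot(n, anchor)


n = plane_normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
plane = plane_through(n, (5.0, 5.0, 2.0))
print(n, plane)
```

Swapping the argument order of `plane_normal` flips the normal's sign; which side the normal should point to (for example, toward the viewpoint) is a convention the caller has to choose.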

[0023] Then, the bottom surface of the preset 3D virtual model to be superimposed is aligned with the determined 3D projection plane, and adaptive adjustments such as rotation, translation, and scaling are applied to the model. Specifically, this includes: rotating the model so that the normal vector of its bottom surface is parallel to the normal vector of the 3D projection plane; translating the model so that its bottom surface coincides with the 3D projection plane; and scaling the model to a visually plausible size. The purpose of these operations is to make the pose of the preset 3D virtual model in 3D space match the perspective structure implicit in the image background, so that its projection in the 2D image is consistent with the original perspective relationship of the 2D image.
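The source does not prescribe how the rotation is computed. The sketch below assumes Rodrigues' rotation formula to map the model's up axis onto the plane normal, then scales about a hypothetical `bottom_center` and translates that point to a hypothetical `anchor` on the plane. It assumes the model's bottom face is perpendicular to `model_up` and that `plane_n` points to the side on which the model should stand.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rotation_between(a: Vec3, b: Vec3) -> Mat3:
    """Rotation matrix R with R a = b for directions a, b (Rodrigues' formula)."""
    a, b = normalize(a), normalize(b)
    v = cross(a, b)   # rotation axis scaled by sin(angle)
    c = dot(a, b)     # cos(angle)
    s2 = dot(v, v)    # sin(angle) squared
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if s2 < 1e-24:
        if c > 0.0:
            return eye  # already aligned
        # Opposite directions: rotate 180 degrees about an axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        k = normalize(cross(a, helper))
        return [[2.0 * k[i] * k[j] - eye[i][j] for j in range(3)] for i in range(3)]
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    f = (1.0 - c) / s2
    return [[eye[i][j] + K[i][j] + f * K2[i][j] for j in range(3)] for i in range(3)]


def place_model(vertices: Sequence[Vec3], model_up: Vec3, bottom_center: Vec3,
                plane_n: Vec3, anchor: Vec3, scale: float = 1.0) -> List[Vec3]:
    """X' = anchor + scale * R (X - bottom_center), with R mapping model_up onto plane_n."""
    R = rotation_between(model_up, plane_n)
    placed = []
    for x in vertices:
        local = (x[0] - bottom_center[0], x[1] - bottom_center[1], x[2] - bottom_center[2])
        r = mat_vec(R, local)
        placed.append((anchor[0] + scale * r[0], anchor[1] + scale * r[1],
                       anchor[2] + scale * r[2]))
    return placed


# Hypothetical model: bottom at z = 0, up axis +z; plane normal +y through anchor (2, 3, 4).
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(place_model(verts, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.0, 3.0, 4.0), 2.0))
```

Because the bottom center is mapped exactly to the anchor and the up axis to the plane normal, every vertex of the bottom face ends up on the plane; the choice of `scale` is left to the caller, matching the "visually plausible size" wording above.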

[0024] Finally, the pose-adjusted preset 3D virtual model to be superimposed is placed on the 3D projection plane and rendered using the perspective projection parameters determined by the previously calculated horizontal and vertical vanishing points, generating a fused image of the preset 3D virtual model and the original 2D image. The rendering simulates a virtual camera whose perspective matches the original shooting environment, so that the final output 2D image shows the preset 3D virtual model as if it actually existed in the image scene. Figure 6 shows a fused image according to an embodiment of this application, where the red model is the preset three-dimensional virtual model to be superimposed.
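The source does not state how the rendered model and the photograph are combined into the fused image. One common choice, assumed here, is Porter-Duff "over" compositing of a rendered RGBA layer onto the opaque background. The sketch uses plain nested lists with float channels in [0, 1] and straight (non-premultiplied) alpha; `composite_over` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def composite_over(background: Sequence[Sequence[RGB]],
                   render: Sequence[Sequence[RGBA]]) -> List[List[RGB]]:
    """Fuse a rendered RGBA layer over an opaque RGB background ("over" operator)."""
    if len(background) != len(render):
        raise ValueError("image heights differ")
    fused = []
    for bg_row, fg_row in zip(background, render):
        if len(bg_row) != len(fg_row):
            raise ValueError("image widths differ")
        row = []
        for (r0, g0, b0), (r1, g1, b1, a) in zip(bg_row, fg_row):
            # Model color weighted by coverage a, background shows through by (1 - a).
            row.append((r1 * a + r0 * (1.0 - a),
                        g1 * a + g0 * (1.0 - a),
                        b1 * a + b0 * (1.0 - a)))
        fused.append(row)
    return fused


# A 1x3 blue background with a red model pixel that is opaque, half-covered, and absent.
background = [[(0.0, 0.0, 1.0)] * 3]
render = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 0.0, 0.0)]]
fused = composite_over(background, render)
print(fused)
```

Partial alpha along the model's silhouette (for example, from anti-aliasing) blends smoothly with the photograph instead of leaving a hard edge.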

[0025] By applying the technical solution of this embodiment, no camera intrinsic parameters, extrinsic parameters, multi-view images, deep learning training, or physical calibration objects are required. Compared with the prior art, it overcomes the inability of multi-view methods to process single images, avoids the dependence of deep learning methods on massive labeled data and their limited generalization beyond specific scenes, and completely eliminates the cumbersome operation and equipment constraints of manual calibration objects. The entire calculation has extremely low computational overhead, fast response, and unique, deterministic results. It can easily be deployed in lightweight environments such as mobile and web terminals, providing a high-precision, high-efficiency, and broadly applicable automatic path for matching 3D virtual models to single 2D image spaces.

[0026] In this embodiment of the application, optionally, the part of step 103 of "calculating the intersection point of the lines containing the first set of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculating the intersection point of the lines containing the second set of opposite sides of the target quadrilateral to obtain the vertical vanishing point" includes: identifying, based on the coordinate positions of the four marker points in the two-dimensional image, the first set of opposite sides and the second set of opposite sides constituting the target quadrilateral; designating the first set of opposite sides as a first horizontal line segment and a second horizontal line segment, and obtaining the two endpoints of each; designating the second set of opposite sides as a first vertical line segment and a second vertical line segment, and obtaining the two endpoints of each; calculating, from the endpoints of the two horizontal line segments, the first intersection point of the lines containing them, and using the first intersection point as the horizontal vanishing point; and calculating, from the endpoints of the two vertical line segments, the second intersection point of the lines containing them, and using the second intersection point as the vertical vanishing point.

[0027] In this embodiment, the two pairs of opposite sides constituting the target quadrilateral are first determined from the coordinates of the four marker points marked by the user on the 2D image. The four marker points are connected in the order in which the user marked them to form the target quadrilateral, and each pair of non-adjacent sides constitutes a pair of opposite sides. Since users typically mark the four corner points of a rectangle in clockwise or counterclockwise order, the first and second pairs of opposite sides can be identified automatically from the connection order; they correspond to the two pairs of parallel sides of the real-world rectangle. This step provides well-defined geometric objects for the subsequent vanishing point calculation.
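A minimal sketch of identifying the opposite sides from the marking order, assuming the points arrive in clockwise or counterclockwise order as described. Which pair is called "horizontal" is an assumption of this sketch, and the convexity check is an added safeguard (not in the source) that rejects crossed, bow-tie orderings; both function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def opposite_side_pairs(points: Sequence[Point]) -> Tuple[Tuple[Segment, Segment],
                                                          Tuple[Segment, Segment]]:
    """Split a quadrilateral, given in marking order, into its two pairs of opposite sides."""
    if len(points) != 4:
        raise ValueError("exactly four marker points are required")
    p0, p1, p2, p3 = points
    first = ((p0, p1), (p3, p2))   # sides 0-1 and 3-2: assumed "horizontal" pair
    second = ((p1, p2), (p0, p3))  # sides 1-2 and 0-3: assumed "vertical" pair
    return first, second


def is_convex_in_order(points: Sequence[Point]) -> bool:
    """True if the points, joined in the given order, form a convex, non-crossing quadrilateral."""
    turns = []
    for i in range(4):
        (ax, ay), (bx, by), (qx, qy) = points[i], points[(i + 1) % 4], points[(i + 2) % 4]
        # z-component of the cross product of consecutive edges: sign gives the turn direction.
        turns.append((bx - ax) * (qy - by) - (by - ay) * (qx - bx))
    return all(t > 0 for t in turns) or all(t < 0 for t in turns)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bowtie = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
print(opposite_side_pairs(square), is_convex_in_order(square), is_convex_in_order(bowtie))
```

If every consecutive turn has the same sign, the four-point polygon is simple and convex, so the pairing by marking order is meaningful.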

[0028] Subsequently, the first set of opposite sides identified are designated as the first horizontal line segment and the second horizontal line segment, respectively, and the coordinates of the endpoints of these two line segments are obtained. Here, "horizontal" does not refer to the line segments being horizontal in the two-dimensional image, but rather to the horizontal direction of the rectangles they correspond to in three-dimensional space. By obtaining the two endpoints of each horizontal line segment, the equation of the line containing that horizontal line segment can be uniquely determined, preparing for the calculation of the line intersection point.

[0029] Similarly, the second set of opposite sides is designated as the first and second vertical line segments, and the coordinates of their two endpoints are obtained. Here, the vertical direction corresponds to another set of parallel sides of a rectangle in the real world (e.g., vertical or depth direction). At this point, the information of the four line segments corresponding to the two sets of opposite sides of the target quadrilateral and their endpoints has been fully obtained.

[0030] Next, based on the coordinates of the endpoints of the first and second horizontal line segments, the mathematical equations of the lines containing these two line segments are calculated, and the intersection point of the two lines is determined. In perspective projection geometry, a set of parallel lines in space converges at a single point on the two-dimensional image plane; this point is called the vanishing point. Since these two horizontal line segments are precisely the projections of a set of parallel lines (the two horizontal sides of a rectangle) in the real world onto the two-dimensional image, their intersection point is the vanishing point of this set of parallel lines, called the horizontal vanishing point.

[0031] Similarly, based on the endpoint coordinates of the first and second vertical line segments, the intersection point of the lines containing the two vertical line segments is calculated, and this intersection point is taken as the vertical vanishing point. This intersection point corresponds to the convergence point of the other set of parallel sides (vertical sides) of a rectangle in the real world. Through the above two line intersection point calculations, the vanishing point information of two orthogonal directions in the two-dimensional image is completely extracted using only the coordinates of four marked points, laying a key foundation for subsequent three-dimensional spatial perspective reconstruction.
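The endpoint-based calculation in the two paragraphs above can be sketched with the line equation a*x + b*y = c and Cramer's rule. This is a minimal sketch, not the patented implementation: it assumes the marking-order pairing (sides 0-1 and 3-2 horizontal, sides 1-2 and 0-3 vertical), and it returns `None` when two sides are parallel in the image (vanishing point at infinity), a case the source does not discuss. The function names and the tolerance `rel_eps` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def line_equation(seg: Segment) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) of the line a*x + b*y = c through the segment's endpoints."""
    (x1, y1), (x2, y2) = seg
    a = y2 - y1
    b = x1 - x2
    return a, b, a * x1 + b * y1


def line_intersection(s1: Segment, s2: Segment, rel_eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the infinite lines containing two segments, via Cramer's rule.

    Returns None when the lines are parallel in the image (vanishing point at infinity).
    """
    a1, b1, c1 = line_equation(s1)
    a2, b2, c2 = line_equation(s2)
    scale = math.hypot(a1, b1) * math.hypot(a2, b2)
    if scale == 0.0:
        raise ValueError("a segment has coincident endpoints")
    det = a1 * b2 - a2 * b1
    if abs(det) <= rel_eps * scale:  # |det| / scale is the sine of the angle between lines
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def vanishing_points(quad: Sequence[Point]) -> Tuple[Optional[Point], Optional[Point]]:
    """Horizontal and vertical vanishing points of a quadrilateral given in marking order."""
    p0, p1, p2, p3 = quad
    horizontal = line_intersection((p0, p1), (p3, p2))  # first set of opposite sides
    vertical = line_intersection((p1, p2), (p0, p3))    # second set of opposite sides
    return horizontal, vertical


quad = [(0.0, 0.0), (4.0, 0.0), (3.0, 2.0), (1.0, 2.0)]
print(vanishing_points(quad))
```

For this trapezoid the horizontal sides are parallel in the image, so no finite horizontal vanishing point exists, while the vertical sides meet at (2, 4). Using a relative tolerance keeps the parallel test independent of the image resolution.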

[0032] In this embodiment, the entire vanishing point calculation process is based entirely on solving the equations of straight lines and intersection points in two-dimensional analytic geometry. It requires no iterative optimization, numerical approximation, or complex matrix operations, resulting in extremely high computational efficiency and uniquely deterministic results. Users only need to perform a simple four-point marking operation, and the system can automatically complete the vanishing point localization, completely avoiding the dependence on camera parameters, multi-view images, or physical calibration objects found in traditional methods. This step is not only lightweight and stable but also independent of the two-dimensional image content, making it widely applicable to various scenarios involving the fusion of uncalibrated single images with three-dimensional virtual models. It truly achieves instant response and accurate output from user interaction to the fused image.

[0033] Optionally, in this embodiment, step 104, "calculating the corresponding three-dimensional spatial horizontal direction vector and vertical direction vector according to the positions of the horizontal vanishing point and the vertical vanishing point in the two-dimensional image," includes: defining the plane where the two-dimensional image is located as a virtual imaging plane, and pre-setting a virtual viewpoint in the normal direction of the virtual imaging plane; generating a first three-dimensional ray from the virtual viewpoint to the horizontal vanishing point according to the position of the horizontal vanishing point in the two-dimensional image, and determining the unit direction vector of the first three-dimensional ray as the horizontal direction vector; generating a second three-dimensional ray from the virtual viewpoint to the vertical vanishing point according to the position of the vertical vanishing point in the two-dimensional image, and determining the unit direction vector of the second three-dimensional ray as the vertical direction vector.

[0034] In this embodiment, the plane containing the 2D image is first defined as a virtual imaging plane. This allows the 2D image to be viewed as a picture taken by a virtual observer, thus establishing a spatial reference frame for subsequent 2D-to-3D backprojection. The virtual imaging plane itself does not carry any real camera parameters; it is merely an ideal plane that carries the image content, allowing each pixel in the 2D image to find its corresponding 2D coordinate position on this plane, providing a geometric starting point for extending rays from 2D points into 3D space.

[0035] Building upon this, a virtual viewpoint is pre-defined along the normal direction of the virtual imaging plane. The virtual viewpoint is a virtual 3D point representing the observer's eye position, located somewhere directly in front of (or behind) the virtual imaging plane. It is crucial to emphasize that the specific distance of this virtual viewpoint is irrelevant; whether it is pre-defined as 1 unit or 100 units from the virtual imaging plane, the ray direction originating from this virtual viewpoint and passing through the same point on the virtual imaging plane will be completely consistent. Therefore, this pre-defined location does not introduce any real camera intrinsic parameters, nor does it violate the core premise of uncalibrated design; it merely provides a legitimate 3D starting point for the rays.

[0036] Next, based on the pixel position of the horizontal vanishing point in the 2D image, the 2D coordinates of that point are located on the virtual imaging plane. Then, a first 3D ray is generated in 3D space, originating from the virtual viewpoint, passing through the horizontal vanishing point, and extending outwards. Since a vanishing point in perspective geometry is the convergence point of a set of parallel lines in space, the direction of this ray represents the actual orientation of those parallel lines in 3D space. The direction of this first 3D ray is normalized to obtain a unit direction vector, which is the 3D horizontal direction vector. The entire process relies only on the geometric relationship between the virtual viewpoint and the virtual imaging plane and on the pixel coordinates provided by the 2D image itself, without requiring any intrinsic or extrinsic parameters of a real camera. It should be noted that for any fixed point on the virtual imaging plane, rays pointing to that point from any different position on the normal of the virtual imaging plane will all obtain the same unit direction vector after normalization. Therefore, the specific value of the virtual viewpoint distance does not affect the calculation results of the horizontal and vertical direction vectors.
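A minimal sketch of the backprojection, assuming the virtual viewpoint sits on the normal through a hypothetical `principal_point` (for example, the image center) at distance `d`, measured in pixels, from the virtual imaging plane. In this sketch `d` is an explicit, assumed parameter that plays the role of a focal length: under this model the normalized direction of `(x, y, d)` changes with `d` for any pixel off the principal point, so the example evaluates two values of `d` to make that visible. `ray_direction` is a hypothetical name.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def ray_direction(pixel: Point, principal_point: Point, d: float) -> Vec3:
    """Unit direction of the ray from the virtual viewpoint through a pixel.

    Frame: x right, y down (image axes), z from the viewpoint toward the imaging plane.
    """
    if d <= 0.0:
        raise ValueError("viewpoint distance d must be positive")
    u, v = pixel
    cx, cy = principal_point
    x, y = u - cx, v - cy              # coordinates on the virtual imaging plane (pixels)
    n = math.sqrt(x * x + y * y + d * d)
    return (x / n, y / n, d / n)


center = (320.0, 240.0)               # hypothetical principal point of a 640x480 image
h_near = ray_direction((820.0, 240.0), center, 500.0)
h_far = ray_direction((820.0, 240.0), center, 1000.0)
print(h_near, h_far)
```

The same function applied to the vertical vanishing point yields the vertical direction vector; the pixel at the principal point always maps to the optical axis `(0, 0, 1)`.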

[0037] The vertical vanishing point is processed in the same way to obtain the vertical direction vector in 3D space. Since the horizontal and vertical vanishing points correspond to the convergence points of the two sets of mutually orthogonal parallel sides of a real-world rectangle, the two rays are orthogonal in theory; because the virtual viewpoint is preset rather than calibrated, in practice they are orthogonal or approximately orthogonal. Together they provide the directional reference for the subsequent construction of the 3D projection plane.
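A short sketch of how the two direction vectors can feed the construction of the 3D projection plane: the plane normal is taken as the normalized cross product of the horizontal and vertical direction vectors, and the angle between the two vectors shows how close they come to the ideal orthogonal relationship. The function `projection_plane_normal` and its helpers are hypothetical names used only for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / n for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def projection_plane_normal(d_h, d_v):
    """Return (unit plane normal, angle in degrees between d_h and d_v).

    The normal is the normalized cross product of the horizontal and vertical
    direction vectors; the angle shows how close the two are to orthogonal.
    """
    d_h, d_v = _normalize(d_h), _normalize(d_v)
    cos_angle = max(-1.0, min(1.0, _dot(d_h, d_v)))  # clamp rounding error
    angle = math.degrees(math.acos(cos_angle))
    # Raises ValueError if the directions are parallel (degenerate quadrilateral).
    normal = _normalize(_cross(d_h, d_v))
    return normal, angle
```

Swapping the argument order flips the sign of the normal; which of the two opposite normals an implementation keeps is a convention the sketch does not fix.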

[0038] Without relying on any real camera parameters, the embodiments of this application convert the pixel coordinates of the vanishing points in a 2D image into ray direction vectors in 3D space using only a purely geometric assumption (virtual imaging plane plus virtual viewpoint). This conversion is analytical and deterministic, requiring no iterative optimization or numerical approximation, and is therefore computationally inexpensive. Furthermore, because the horizontal and vertical direction vectors do not depend on the absolute scale chosen for the distance from the virtual viewpoint to the virtual imaging plane, the method sidesteps the inherent inability to measure distances from a single uncalibrated image and recovers the spatial perspective structure of the scene from direction information alone.

[0039] In this embodiment, optionally, the virtual viewpoint corresponds to the optical center position of a preset virtual camera, and the virtual imaging plane corresponds to the imaging plane of the preset virtual camera; the step of "generating a first three-dimensional ray from the virtual viewpoint to the lateral vanishing point" includes: converting the position of the lateral vanishing point in the two-dimensional image into screen coordinates in the screen coordinate system of the preset virtual camera; generating a three-dimensional ray that originates from the optical center of the preset virtual camera and passes through the screen coordinates through a back projection operation from the screen point of the preset virtual camera to three-dimensional space; and using the direction vector of the three-dimensional ray as the unit direction vector of the first three-dimensional ray.

[0040] In this embodiment, the aforementioned virtual viewpoint can be the optical center of a preset virtual camera, and the aforementioned virtual imaging plane can be the imaging plane of that preset virtual camera. The preset virtual camera is not a real shooting device but the standard mathematical model used in 3D graphics to simulate perspective projection. The optical center is the point at which all imaging rays of the preset virtual camera converge; the imaging plane is the surface onto which points in 3D space are projected to form the 2D image.

[0041] Based on this, first, the position of the horizontal vanishing point in the 2D image is converted into screen coordinates in the screen coordinate system of the preset virtual camera. The screen coordinate system is the standardized coordinate system the preset virtual camera uses to describe 2D positions on its imaging plane; it is typically measured in pixels, has its origin at the top-left or bottom-left corner of the image, and spans the image resolution. This conversion maps the vanishing point's position on the 2D image to the corresponding pixel position on the preset virtual camera's imaging plane, so that the built-in geometric interface of the preset virtual camera can process the vanishing point coordinates. The conversion relies only on the pixel size of the 2D image and the preset screen resolution, and requires no calibration information from a real camera.

[0042] Subsequently, through a back-projection operation from the screen point of the preset virtual camera to three-dimensional space, a three-dimensional ray is generated, originating from the optical center of the preset virtual camera, passing through the screen coordinates corresponding to the transverse vanishing point, and continuing to extend outward. The back-projection operation is the standard inverse transformation process of the virtual camera model: given a two-dimensional point on the imaging plane, it is mapped back to three-dimensional space using the intrinsic parameter matrix of the preset virtual camera (such as focal length and principal point coordinates), resulting in a ray originating from the optical center, i.e., the first three-dimensional ray. It is important to emphasize that the preset virtual camera intrinsic parameters here are not obtained from the actual shooting device, but rather use default preset values (e.g., setting the focal length to the image width, and the principal point to the image center).

[0043] Finally, the direction vector of the first three-dimensional ray is normalized to obtain the unit direction vector of the first three-dimensional ray. The calculation process for the longitudinal direction vector is exactly the same, except that the input is replaced with the longitudinal vanishing point.
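The back-projection and normalization steps can be sketched with a standard pinhole model. The default intrinsics follow the example given in the text (focal length equal to the image width, principal point at the image center); the function names, the absence of skew, and the downward-growing screen y axis are assumptions of this sketch rather than details fixed by the application.

```python
import math


def default_intrinsics(width, height):
    """Hypothetical preset intrinsics tied to the image size.

    Follows the example defaults in the text: focal length = image width,
    principal point = image center. Returns (fx, fy, cx, cy).
    """
    f = float(width)
    return f, f, width / 2.0, height / 2.0


def back_project(u, v, intrinsics):
    """Unit direction of the ray from the optical center through screen point (u, v).

    For a pinhole intrinsic matrix K without skew, K^-1 @ [u, v, 1] equals
    [(u - cx) / fx, (v - cy) / fy, 1]; normalizing it gives the ray direction.
    Screen y is assumed to grow downwards (origin at the top-left corner).
    """
    fx, fy, cx, cy = intrinsics
    x = (u - cx) / fx
    y = (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)
```

Calling `back_project` once with the horizontal vanishing point and once with the vertical vanishing point yields the two unit direction vectors; only the input changes, as the paragraph above describes.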

[0044] The embodiments of this application thus retain the uncalibrated property of requiring no real camera parameters. All intrinsic parameters of the virtual camera can use default preset values tied to the image size, so the horizontal and vertical direction vectors can be determined without introducing any external prior information.

[0045] In this embodiment of the application, optionally, the step of "converting the position of the horizontal vanishing point in the two-dimensional image into screen coordinates in the screen coordinate system of a preset virtual camera" includes: obtaining the canvas size of the two-dimensional image in the interface display area; converting the interface coordinates of the horizontal vanishing point in the interface display area into relative coordinates with the center of the canvas as the origin; and determining the screen coordinates of the horizontal vanishing point in the screen coordinate system of the preset virtual camera based on the relative coordinates, the canvas size, and the preset screen resolution.

[0046] In this embodiment, firstly, the canvas size of the 2D image in the interface display area is obtained. Here, the interface display area refers to the visual container on the user interface used to present the 2D image, such as the image display panel in an application or the canvas control in a webpage; the canvas size refers to the pixel width and pixel height of the 2D image within that container. In a specific embodiment, the 2D image can be displayed at its original pixel size, so the canvas size is the original pixel size of the 2D image. The purpose of this step is to establish a clear reference system for subsequent coordinate transformation, using the actual pixel coordinates of the 2D image as the mapping basis, thereby ensuring that the vanishing point position can be correctly transferred from the user-interacting interface space to the subsequent virtual camera screen coordinate system.

[0047] Next, the calculated pixel coordinates of the horizontal vanishing point in the 2D image are converted into relative coordinates with the canvas center as the origin. The horizontal vanishing point is the 2D intersection of the lines containing the first pair of opposite sides of the target quadrilateral. Its original pixel coordinates are measured from the top-left corner of the canvas; that is, they give the point's pixel distance from the left and top edges of the canvas. These absolute coordinates depend on the canvas size: when the same 2D image is displayed on a larger canvas, the pixel coordinates of the horizontal vanishing point increase proportionally. To remove this dependence, the position of the horizontal vanishing point is re-expressed as an offset from the canvas center. Specifically, the pixel coordinates of the canvas center (half the canvas width and half the canvas height) are computed first, and the center coordinates are then subtracted from the original pixel coordinates of the horizontal vanishing point to obtain the horizontal and vertical offsets. For example, for a canvas 800 pixels wide and 600 pixels high, the center is (400, 300); if the vanishing point is at (600, 400), the offset is (600 - 400, 400 - 300) = (200, 100), meaning the vanishing point lies 200 pixels to the right of and 100 pixels below the canvas center. Dividing these offsets by half the canvas width and half the canvas height, respectively, removes the influence of the absolute canvas size.
Therefore, however the canvas is enlarged or reduced, as long as the relative position of the vanishing point in the image remains unchanged, its normalized offset from the canvas center remains constant (0.5 horizontally and 1/3 vertically in the example above). The position of the vanishing point thus depends only on its relative placement within the canvas, which provides a scale-independent basis for mapping these offsets without loss onto the screen coordinate system of a virtual camera of any resolution: whether the preset virtual camera screen resolution is 1920×1080 or any other size, scaling the normalized offset proportionally recovers the corresponding position of the vanishing point on the virtual camera's imaging plane.
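The centering and normalization steps can be written as two small functions. The names are hypothetical; the arithmetic follows the worked example in the text (offset = vanishing point minus canvas center), and the normalized offset is the offset divided by the canvas half-size.

```python
def centered_offset(px, py, canvas_w, canvas_h):
    """Offset of (px, py) from the canvas center, in pixels (x right, y down)."""
    return px - canvas_w / 2.0, py - canvas_h / 2.0


def normalized_offset(px, py, canvas_w, canvas_h):
    """Offset from the canvas center as a fraction of the canvas half-size.

    Points on the canvas edges map to -1 or +1; a vanishing point outside the
    canvas simply yields a magnitude greater than 1.
    """
    dx, dy = centered_offset(px, py, canvas_w, canvas_h)
    return dx / (canvas_w / 2.0), dy / (canvas_h / 2.0)


# Worked example from the text: 800x600 canvas, vanishing point at (600, 400).
print(centered_offset(600, 400, 800, 600))  # (200.0, 100.0)
```

Displaying the same image on a 1600×1200 canvas moves the vanishing point to (1200, 800), but `normalized_offset` still returns (0.5, 1/3), which is the scale independence the paragraph relies on.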

[0048] Finally, the screen coordinates of the horizontal vanishing point in the virtual camera's screen coordinate system are calculated from the relative coordinates, the canvas size, and the preset screen resolution of the virtual camera obtained earlier. The preset screen resolution is the pixel range covered by the virtual camera's imaging plane, i.e., the width and height of the final rendered image (for example, the display resolution, such as 1920×1080, for full-screen rendering, or the size of a fixed texture for off-screen rendering). The core task of this step is to map the horizontal vanishing point's offset from the canvas center proportionally onto the virtual camera screen. In a specific embodiment, the center of the canvas corresponds to the center of the virtual camera screen, and the vanishing point's offset from the center, expressed as a fraction of the half-size, is the same on both. It is therefore sufficient to divide the horizontal offset in the relative coordinates by half the canvas width to obtain a ratio, multiply this ratio by half the width of the virtual camera screen, and add the X coordinate of the screen center to obtain the X coordinate of the vanishing point on the virtual camera screen. The vertical coordinate is calculated in the same way (with the sign of the vertical offset reversed if the screen coordinate system places its origin at the bottom-left corner). For the 800×600 canvas example above and a 1920×1080 screen with a top-left origin, this gives screen coordinates (1440, 720). In this way, regardless of the canvas size or the screen resolution of the virtual camera, the horizontal vanishing point is mapped to the same relative position on the virtual camera screen as long as its relative position within the canvas remains unchanged.
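The complete canvas-to-screen mapping can be sketched as a single function. The name `canvas_to_screen` and the `flip_y` switch (for a screen coordinate system with a bottom-left origin, which the text mentions as one possibility) are illustrative assumptions, not part of the application.

```python
def canvas_to_screen(px, py, canvas_w, canvas_h, screen_w, screen_h, flip_y=False):
    """Map a canvas pixel position to virtual-camera screen coordinates.

    The canvas center is mapped to the screen center, and the offset from the
    center, expressed as a fraction of the half-size, is preserved. Set
    flip_y=True if the screen origin is the bottom-left corner (y up) while the
    canvas origin is the top-left corner (y down).
    """
    rx = (px - canvas_w / 2.0) / (canvas_w / 2.0)
    ry = (py - canvas_h / 2.0) / (canvas_h / 2.0)
    if flip_y:
        ry = -ry
    sx = screen_w / 2.0 + rx * (screen_w / 2.0)
    sy = screen_h / 2.0 + ry * (screen_h / 2.0)
    return sx, sy
```

With the text's example, `canvas_to_screen(600, 400, 800, 600, 1920, 1080)` gives approximately (1440, 720). Because each axis is scaled independently, a canvas and a screen with different aspect ratios would stretch the offsets non-uniformly; the sketch simply follows the per-axis proportional rule described above.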

[0049] The final calculated screen coordinates are the two-dimensional input positions required by the preset virtual camera to perform the screen point to 3D space back projection operation. Based on these coordinates, the preset virtual camera can emit a 3D ray from its optical center passing through that point. The entire transformation process does not introduce any real camera parameters; it relies solely on two completely controllable known quantities: the canvas size and the preset screen resolution. It should be noted that the screen coordinates of the vertical vanishing point can be determined in the same way.

[0050] By introducing the canvas size and the preset screen resolution as intermediate parameters, the embodiments of this application adapt to display containers of different sizes and rendering targets of different resolutions without hardcoding any absolute pixel values. The conversion involves only simple linear operations and coordinate translations, so its computational overhead is very low; it is also independent of image content and can run stably in various front-end interactive environments.

[0051] Optionally, in this embodiment, step 105 includes: determining the target orientation of the bottom surface of the preset three-dimensional virtual model to be superimposed in three-dimensional space based on the normal vector of the three-dimensional projection plane; rotating the preset three-dimensional virtual model to be superimposed so that the normal vector of the bottom surface of the preset three-dimensional virtual model to be superimposed is parallel to the normal vector of the three-dimensional projection plane; translating the preset three-dimensional virtual model to be superimposed so that the bottom surface of the preset three-dimensional virtual model to be superimposed coincides with the three-dimensional projection plane; scaling the preset three-dimensional virtual model to be superimposed so that the projection size of the preset three-dimensional virtual model to be superimposed in the two-dimensional image matches the two-dimensional image; placing the preset three-dimensional virtual model to be superimposed after rotation, translation and scaling on the three-dimensional projection plane, and rendering it based on the perspective constraints determined by the horizontal vanishing point and the vertical vanishing point to generate a projection of the preset three-dimensional virtual model to be superimposed with correct perspective in the two-dimensional image.

[0052] In this embodiment, firstly, based on the calculated normal vector of the 3D projection plane, the target orientation of the bottom surface of the preset 3D virtual model to be superimposed is determined in 3D space. The normal vector of the 3D projection plane is a direction vector perpendicular to the plane, representing the orientation of a rectangular plane in space in the real world; while the bottom surface refers to the preset bottom reference plane of the preset 3D virtual model to be superimposed, which can be the lowest plane in the model's local coordinate system or a plane in contact with the ground. By establishing a correspondence between the spatial orientation of the bottom surface and the direction indicated by the normal vector of the 3D projection plane, a clear target direction can be set for subsequent rotation adjustments, ensuring that the preset 3D virtual model to be superimposed can stand correctly on the 3D projection plane in space.

[0053] Subsequently, the preset 3D virtual model to be superimposed is rotated and adjusted so that the normal vector of its bottom surface is parallel to the normal vector of the 3D projection plane. The bottom surface normal vector is a unit vector perpendicular to the bottom surface, and its direction determines the tilt of the bottom of the preset 3D virtual model to be superimposed. The rotation adjustment is performed through a 3D rotation transformation, rotating the local coordinate system of the preset 3D virtual model to be superimposed around an appropriate axis until the bottom surface normal vector is parallel to the normal vector of the 3D projection plane, so that they point in the same or completely opposite directions. This operation fundamentally eliminates the pitch or tilt deviation of the preset 3D virtual model to be superimposed caused by the perspective angle, and makes the spatial posture of the bottom surface of the preset 3D virtual model to be superimposed strictly aligned with the normal direction of the rectangular plane in the 2D image, laying the geometric foundation for subsequent position fitting.

[0054] After the rotation adjustment, the preset 3D virtual model to be superimposed is translated so that its bottom surface coincides with the 3D projection plane. The translation adjustment is a 3D translation transformation that moves the model in space until a reference point on its bottom surface (for example, the lowest point or the center of the bottom surface) lies on the 3D projection plane. Because the rotation has already made the bottom surface parallel to the 3D projection plane, bringing one point of the bottom surface onto the plane makes the entire bottom surface coplanar with it, which is what coincidence means here. This step moves the preset 3D virtual model from virtual space to the position of the real-world rectangular plane reconstructed from the perspective of the 2D image, achieving spatial matching between the model and the scene.

[0055] Next, the preset 3D virtual model to be overlaid is scaled and adjusted so that the projected size of the model in the 2D image visually matches the 2D image. This scaling adjustment can be achieved through automatic estimation based on the model's default size and the pixel aspect ratio of the target quadrilateral, or through real-time adjustment via user interaction; this solution does not impose specific limitations on either method.

[0056] In one specific embodiment, scaling can be achieved in the following way: (1) Scaling based on the default size of the model. The preset 3D virtual model to be superimposed has a preset default initial size, which is represented by the length of a virtual unit in 3D space. After placing the preset 3D virtual model to be superimposed on the 3D projection plane, perspective projection rendering can be performed directly using this default initial size, so that a visually reasonable projection size can be obtained in the 2D image. Experiments show that for most application scenarios such as virtual try-on and virtual equipment placement, the default size of the model can achieve an acceptable visual effect.

[0057] (2) Scaling based on the visual proportion of the target quadrilateral. As another implementation, the scaling factor can be determined from the ratio between the pixel size of the target quadrilateral in the 2D image and the projected size of the bottom surface of the preset 3D virtual model to be superimposed at its default size. Specifically, the number of pixels H_q spanned by the height (or width) of the target quadrilateral in the 2D image is first calculated; next, the bottom surface of the preset 3D virtual model to be superimposed is placed on the 3D projection plane at the default size, and its initial projected height (or width) H_m in the 2D image is calculated; the scaling factor s = H_q / H_m is then calculated; finally, the preset 3D virtual model to be superimposed is scaled uniformly by s. This method makes the projected size of the preset 3D virtual model more closely match the quadrilateral area marked by the user, which is especially suitable for scenarios where the model needs to fill a specific screen area accurately.
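The ratio s = H_q / H_m can be computed as below. How "height" is measured for a perspective-distorted quadrilateral is not specified in the text; this sketch assumes, hypothetically, that it is the mean pixel length of the two vertical sides, with the four points listed in order around the shape (p0-p1 and p3-p2 horizontal, p1-p2 and p0-p3 vertical).

```python
import math


def _side_length(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])


def pixel_height(quad):
    """Mean pixel length of the two vertical sides of an ordered quad [p0, p1, p2, p3]."""
    p0, p1, p2, p3 = quad
    return 0.5 * (_side_length(p1, p2) + _side_length(p3, p0))


def scale_factor(target_quad, projected_bottom_quad):
    """s = H_q / H_m: target quadrilateral height over the model's projected bottom height."""
    h_q = pixel_height(target_quad)
    h_m = pixel_height(projected_bottom_quad)
    if h_m == 0.0:
        raise ValueError("projected model bottom has zero height")
    return h_q / h_m
```

For a target quadrilateral 200 pixels tall and a default-size projection 100 pixels tall, `scale_factor` returns 2.0.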

[0058] Finally, the preset 3D virtual model to be superimposed, after the rotation, translation, and scaling adjustments, is placed on the 3D projection plane and rendered under the perspective constraints determined by the horizontal and vertical vanishing points, producing a projection of the model in the 2D image with correct perspective. These perspective constraints are the virtual camera pose and projection parameters defined by the previously calculated horizontal and vertical vanishing points, which together determine the mapping from the 3D scene to the 2D image. The rendering process simulates a perspective imaging process consistent with the original shooting environment, so that the outline, size, and occlusion relationships of the preset 3D virtual model in the final fused image conform to the perspective of the background image, achieving seamless visual fusion between the preset 3D virtual model and the real scene.
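At the core of the rendering step is a perspective projection of the adjusted model vertices through the preset virtual camera. The pinhole sketch below assumes the vertices are already expressed in the virtual camera's coordinate frame (z pointing forward) and reuses the example defaults from the text (focal length equal to the image width, principal point at the image center); `project` and `project_all` are hypothetical names. A full renderer would additionally rasterize faces and resolve occlusion, which the sketch omits.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space 3D point (z forward) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is not in front of the virtual camera")
    return (fx * x / z + cx, fy * y / z + cy)


def project_all(points, fx, fy, cx, cy):
    """Project a list of model vertices (already rotated, translated, scaled)."""
    return [project(p, fx, fy, cx, cy) for p in points]


# Preset intrinsics for a 1920x1080 image: f = width, principal point = center.
print(project((0.5, 0.0, 1.0), 1920.0, 1920.0, 960.0, 540.0))  # (1920.0, 540.0)
```

As a consistency check, points far along any line with direction (1, 0, 2) project ever closer to (1920, 540), the vanishing point of that direction under these intrinsics; this is the same correspondence between directions and vanishing points that the earlier back-projection step relies on.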

[0059] Optionally, in this embodiment, the four marker points are generated in response to a user's dragging or clicking operation on the two-dimensional image.

[0060] In this embodiment, the four marker points can be generated in response to a user's dragging or clicking actions on the 2D image. Dragging refers to a user's continuous interactive behavior of pressing and moving the cursor on the image interface using a mouse, stylus, or finger to drag out a quadrilateral area or sequentially locate the four corner points; clicking refers to a user clicking sequentially at four different locations on the 2D image, generating a marker point each time. Both of these operation methods belong to the most common and intuitive point-and-click and box-selection interaction paradigms in graphical user interfaces.

[0061] The embodiments of this application can complete data input simply by clicking and dragging with a mouse or touch screen. This interaction method does not rely on any image content recognition algorithm and can work stably for images in any scene, under any lighting, and at any shooting angle, with strong generalization ability and robustness.

[0062] Furthermore, as a specific implementation of the method shown in Figure 1, this application provides a device for automatically matching a virtual model to an image space. As shown in Figure 7, the device includes the following modules. The image receiving module is configured to receive the two-dimensional image to be processed. The marker point acquisition module is configured to acquire four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral corresponding to the perspective-distorted image of a real-world rectangle. The vanishing point calculation module is configured to construct a target quadrilateral from the four marker points, calculate the intersection point of the lines containing the first pair of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculate the intersection point of the lines containing the second pair of opposite sides to obtain the vertical vanishing point. The projection plane determination module is configured to calculate the corresponding three-dimensional horizontal direction vector and vertical direction vector from the positions of the horizontal and vertical vanishing points in the two-dimensional image, determine the plane spanned by the two direction vectors, calculate the normal vector of that plane, and determine the three-dimensional projection plane from the normal vector.
The virtual model adjustment module is configured to align the bottom surface of the preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane and to adaptively adjust the model so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image. The rendering module is configured to render the fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.

[0063] Optionally, the vanishing point calculation module is configured to: identify, from the coordinate positions of the four marker points in the two-dimensional image, the first pair and the second pair of opposite sides of the target quadrilateral; define the first pair of opposite sides as the first and second horizontal line segments and obtain the two endpoints of each; define the second pair of opposite sides as the first and second vertical line segments and obtain the two endpoints of each; calculate, from the endpoints of the first and second horizontal line segments, the first intersection point of the straight lines containing them, and take the first intersection point as the horizontal vanishing point; and calculate, from the endpoints of the first and second vertical line segments, the second intersection point of the straight lines containing them, and take the second intersection point as the vertical vanishing point.
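The intersection of the lines containing two opposite sides can be computed compactly in homogeneous coordinates: the line through two points is their cross product, and the intersection of two lines is the cross product of the lines. The sketch below uses that standard technique; the point ordering convention and the absolute tolerance `eps` for detecting parallel sides (a vanishing point at infinity) are assumptions of the sketch.

```python
def _line_through(p, q):
    """Homogeneous line through 2D points p and q: (x1, y1, 1) x (x2, y2, 1)."""
    return (p[1] - q[1], q[0] - p[0], p[0] * q[1] - p[1] * q[0])


def _intersect(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if (nearly) parallel."""
    x = l1[1] * l2[2] - l1[2] * l2[1]
    y = l1[2] * l2[0] - l1[0] * l2[2]
    w = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(w) < eps:
        return None  # parallel sides: vanishing point at infinity
    return (x / w, y / w)


def vanishing_points(quad):
    """Horizontal and vertical vanishing points of an ordered quadrilateral.

    `quad` lists the four marker points in order around the shape (assumed
    convention): p0-p1 and p3-p2 are the first pair of opposite sides
    (horizontal), p1-p2 and p0-p3 the second pair (vertical).
    """
    p0, p1, p2, p3 = quad
    horizontal = _intersect(_line_through(p0, p1), _line_through(p3, p2))
    vertical = _intersect(_line_through(p1, p2), _line_through(p0, p3))
    return horizontal, vertical


print(vanishing_points([(0, 0), (10, 1), (10, 9), (0, 10)]))  # ((50.0, 5.0), None)
```

In the example, the two horizontal sides converge at (50, 5), while the vertical sides are exactly parallel in the image, so no finite vertical vanishing point exists; a practical implementation would need a policy for that case.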

[0064] Optionally, the projection plane determination module is used for: The plane containing the two-dimensional image is defined as a virtual imaging plane, and a virtual viewpoint is preset in the normal direction of the virtual imaging plane; Based on the position of the horizontal vanishing point in the two-dimensional image, a first three-dimensional ray is generated from the virtual viewpoint to the horizontal vanishing point, and the unit direction vector of the first three-dimensional ray is determined as the horizontal direction vector. Based on the position of the vertical vanishing point in the two-dimensional image, a second three-dimensional ray is generated from the virtual viewpoint to the vertical vanishing point, and the unit direction vector of the second three-dimensional ray is determined as the vertical direction vector.

[0065] Optionally, the virtual viewpoint corresponds to the optical center position of a preset virtual camera, and the virtual imaging plane corresponds to the imaging plane of the preset virtual camera; the projection plane determination module is further configured to: The position of the horizontal vanishing point in the two-dimensional image is converted into screen coordinates in the screen coordinate system of a preset virtual camera; By performing a back projection operation from the screen point of the preset virtual camera to three-dimensional space, a three-dimensional ray is generated that originates from the optical center of the preset virtual camera and passes through the screen coordinates. The direction vector of the three-dimensional ray is used as the unit direction vector of the first three-dimensional ray.

[0066] Optionally, the projection plane determination module is further configured to: Obtain the canvas size of the two-dimensional image in the interface display area; The interface coordinates of the horizontal vanishing point in the interface display area are converted into relative coordinates with the center of the canvas as the origin. Based on the relative coordinates, the canvas size, and the preset screen resolution, determine the screen coordinates of the horizontal vanishing point in the preset virtual camera's screen coordinate system.

[0067] Optionally, the virtual model adjustment module is used for: Based on the normal vector of the three-dimensional projection plane, determine the target orientation of the bottom surface of the preset three-dimensional virtual model to be superimposed in three-dimensional space; The preset three-dimensional virtual model to be superimposed is rotated and adjusted so that the normal vector of the bottom surface of the preset three-dimensional virtual model to be superimposed is parallel to the normal vector of the three-dimensional projection plane; The preset three-dimensional virtual model to be superimposed is translated and adjusted so that the bottom surface of the preset three-dimensional virtual model to be superimposed coincides with the three-dimensional projection plane; The preset three-dimensional virtual model to be superimposed is scaled and adjusted so that the projection size of the preset three-dimensional virtual model in the two-dimensional image matches the two-dimensional image. The preset 3D virtual model to be superimposed, after being rotated, translated, and scaled, is placed on the 3D projection plane, and rendered based on the perspective constraints determined by the horizontal vanishing point and the vertical vanishing point, to generate a projection of the preset 3D virtual model to be superimposed in the 2D image with correct perspective relationship.

[0068] Optionally, the four markers are generated in response to a user's dragging or clicking action on the two-dimensional image.

[0069] It should be noted that other descriptions of the functional units of the device for automatically matching a virtual model to an image space provided in this embodiment of the application can be found in the corresponding descriptions of the method shown in Figures 1 to 6, and will not be repeated here.

[0070] This application also provides a computer device, which may be a personal computer, a server, a network device, or the like. As shown in Figure 8, the computer device includes a bus, a processor, a memory, and a communication interface, and may also include an input/output interface and a display device. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database stores location information. The communication interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the steps of the method embodiments described above.

[0071] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of part of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

[0072] In one embodiment, a computer-readable storage medium is provided, which may be non-volatile or volatile, having stored thereon a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0073] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0074] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0075] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0076] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0077] The embodiments described above illustrate only several implementations of this application. Although they are described specifically and in detail, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A method for automatically matching a virtual model to an image space, characterized in that the method comprises: receiving a two-dimensional image to be processed; obtaining four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral that corresponds to a real-world rectangle as distorted by perspective; constructing a target quadrilateral based on the four marker points, calculating the intersection point of the lines containing the first pair of opposite sides of the target quadrilateral to obtain a horizontal vanishing point, and calculating the intersection point of the lines containing the second pair of opposite sides of the target quadrilateral to obtain a vertical vanishing point; calculating, based on the positions of the horizontal vanishing point and the vertical vanishing point in the two-dimensional image, a corresponding horizontal direction vector and vertical direction vector in three-dimensional space, respectively; determining the plane formed by the horizontal direction vector and the vertical direction vector, calculating the normal vector of the plane, and determining a three-dimensional projection plane based on the normal vector; aligning the bottom surface of a preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane, and adaptively adjusting the preset three-dimensional virtual model to be superimposed so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image; and performing rendering to obtain a fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.
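The normal-vector step of claim 1 can be illustrated with a short sketch. It assumes the horizontal and vertical direction vectors are already available as 3-tuples, and it fixes the plane through an anchor point, `point_on_plane`, which is a hypothetical parameter: the claim does not state which point the three-dimensional projection plane passes through. The names `cross`, `normalize`, `plane_normal` and `projection_plane` are illustrative only.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    """Scale v to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n < 1e-12:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def plane_normal(h_dir: Vec3, v_dir: Vec3) -> Vec3:
    """Unit normal of the plane spanned by the horizontal and vertical directions."""
    n = cross(h_dir, v_dir)
    if math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2) < 1e-9:
        raise ValueError("direction vectors are parallel; the plane is undefined")
    return normalize(n)


def projection_plane(h_dir: Vec3, v_dir: Vec3,
                     point_on_plane: Vec3 = (0.0, 0.0, 0.0)) -> Tuple[Vec3, float]:
    """Return (n, d) describing the plane n . x + d = 0 through point_on_plane."""
    n = plane_normal(h_dir, v_dir)
    d = -(n[0] * point_on_plane[0] + n[1] * point_on_plane[1] + n[2] * point_on_plane[2])
    return n, d
```

Because the cross product does not require its two inputs to be orthogonal, the sketch still yields a plane normal when the recovered direction vectors are only approximately perpendicular.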

2. The method according to claim 1, characterized in that the step of calculating the intersection point of the lines containing the first pair of opposite sides of the target quadrilateral to obtain the horizontal vanishing point, and calculating the intersection point of the lines containing the second pair of opposite sides of the target quadrilateral to obtain the vertical vanishing point, comprises: identifying, from the four marker points and based on their coordinate positions in the two-dimensional image, the first pair of opposite sides and the second pair of opposite sides constituting the target quadrilateral; defining the first pair of opposite sides as a first horizontal line segment and a second horizontal line segment, and obtaining the two endpoints of each of them; defining the second pair of opposite sides as a first vertical line segment and a second vertical line segment, and obtaining the two endpoints of each of them; calculating, based on the endpoints of the first and second horizontal line segments, a first intersection point of the straight line containing the first horizontal line segment and the straight line containing the second horizontal line segment, and taking the first intersection point as the horizontal vanishing point; and calculating, based on the endpoints of the first and second vertical line segments, a second intersection point of the straight line containing the first vertical line segment and the straight line containing the second vertical line segment, and taking the second intersection point as the vertical vanishing point.
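The intersection in claim 2 is conveniently computed in homogeneous coordinates: the line through two points is their cross product, and the intersection of two lines is again a cross product. The sketch below assumes the four marker points are supplied in order around the quadrilateral (p0, p1, p2, p3), with p0p1 and p3p2 taken as the first (horizontal) pair of opposite sides and p0p3 and p1p2 as the second (vertical) pair. The claim identifies the pairs from the coordinates without fixing such an order, so this ordering, the function names and the tolerance `eps` are assumptions.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # (a, b, c) for a*x + b*y + c = 0


def line_through(p: Point, q: Point) -> Line:
    """Homogeneous line through p and q: the cross product (x1, y1, 1) x (x2, y2, 1)."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1: Line, l2: Line, eps: float = 1e-9) -> Optional[Point]:
    """Intersection of two homogeneous lines (their cross product), or None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    x = b1 * c2 - c1 * b2
    y = c1 * a2 - a1 * c2
    w = a1 * b2 - b1 * a2
    if abs(w) < eps:
        return None  # parallel in the image: the vanishing point is at infinity
    return (x / w, y / w)


def vanishing_points(quad: Sequence[Point]) -> Tuple[Optional[Point], Optional[Point]]:
    """Return (horizontal, vertical) vanishing points of an ordered quadrilateral.

    Assumption: p0-p1 and p3-p2 form the first (horizontal) pair of opposite
    sides; p0-p3 and p1-p2 form the second (vertical) pair.
    """
    p0, p1, p2, p3 = quad
    horizontal = intersect(line_through(p0, p1), line_through(p3, p2))
    vertical = intersect(line_through(p0, p3), line_through(p1, p2))
    return horizontal, vertical
```

For example, the quadrilateral (0, 0), (4, 1), (4, 3), (0, 4) has horizontal sides converging at (8, 2), while its vertical sides are parallel, so the vertical result is None. A caller would have to treat that infinite case separately, for instance by using the common edge direction.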

3. The method according to claim 1, characterized in that the step of calculating the corresponding horizontal and vertical direction vectors in three-dimensional space based on the positions of the horizontal and vertical vanishing points in the two-dimensional image comprises: defining the plane containing the two-dimensional image as a virtual imaging plane, and presetting a virtual viewpoint along the normal direction of the virtual imaging plane; generating, based on the position of the horizontal vanishing point in the two-dimensional image, a first three-dimensional ray from the virtual viewpoint to the horizontal vanishing point, and determining the unit direction vector of the first three-dimensional ray as the horizontal direction vector; and generating, based on the position of the vertical vanishing point in the two-dimensional image, a second three-dimensional ray from the virtual viewpoint to the vertical vanishing point, and determining the unit direction vector of the second three-dimensional ray as the vertical direction vector.
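Claim 3 can be read as a pinhole model in which the virtual viewpoint sits at the origin and the virtual imaging plane lies at distance `focal` along the viewing axis. Under that assumption, which the claim does not spell out, the ray to a vanishing point (x, y), measured from the foot of the perpendicular `center`, has direction (x, y, focal), and its unit vector is the recovered three-dimensional direction. `focal`, `center` and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def direction_from_vanishing_point(vp: Tuple[float, float], focal: float,
                                   center: Tuple[float, float] = (0.0, 0.0)) -> Vec3:
    """Unit direction of the ray from the viewpoint (origin) through vp.

    Assumes the imaging plane is z = focal and that `center`, the point where
    the perpendicular from the viewpoint meets the image, maps to (0, 0).
    """
    x = vp[0] - center[0]
    y = vp[1] - center[1]
    return _unit((x, y, focal))


def direction_at_infinity(edge_dx: float, edge_dy: float) -> Vec3:
    """When both sides are parallel in the image, the 3D direction is parallel
    to the imaging plane and follows the common edge direction."""
    return _unit((edge_dx, edge_dy, 0.0))
```

With an exact focal length and center, the directions of two perpendicular real-world edges satisfy h . v = 0, which in centered image coordinates reads x_h x_v + y_h y_v + focal^2 = 0. With a preset, approximate `focal`, the two recovered vectors are in general only close to perpendicular.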

4. The method according to claim 3, characterized in that the virtual viewpoint corresponds to the optical center of a preset virtual camera, and the virtual imaging plane corresponds to the imaging plane of the preset virtual camera; and generating the first three-dimensional ray from the virtual viewpoint to the horizontal vanishing point comprises: converting the position of the horizontal vanishing point in the two-dimensional image into screen coordinates in the screen coordinate system of the preset virtual camera; generating, by a back-projection operation from the screen point of the preset virtual camera into three-dimensional space, a three-dimensional ray that originates at the optical center of the preset virtual camera and passes through the screen coordinates; and using the direction vector of this three-dimensional ray, normalized to unit length, as the unit direction vector of the first three-dimensional ray.
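The back-projection in claim 4 resembles a standard perspective-camera unprojection. The sketch assumes a virtual camera at the origin looking along the negative z axis (the OpenGL convention), a vertical field of view `fov_y_deg`, and a screen of `width` by `height` pixels with its origin at the top-left and y pointing down. None of these conventions is fixed by the claim, and all names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def screen_to_ray(sx: float, sy: float, width: int, height: int,
                  fov_y_deg: float) -> Tuple[Vec3, Vec3]:
    """Back-project a screen point to a ray (origin, unit direction) in camera space.

    Assumed camera: optical center at the origin, looking along -z, y up,
    symmetric frustum with vertical field of view fov_y_deg.
    """
    # Pixel coordinates -> normalized device coordinates in [-1, 1];
    # y is flipped because screen y grows downward.
    ndc_x = 2.0 * sx / width - 1.0
    ndc_y = 1.0 - 2.0 * sy / height
    # Half-extent of the image plane at distance 1 in front of the camera.
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = half_h * (width / height)
    # Point on the plane z = -1 that corresponds to the screen point.
    dx, dy, dz = ndc_x * half_w, ndc_y * half_h, -1.0
    n = math.sqrt(dx * dx + dy * dy + dz * dz)
    origin: Vec3 = (0.0, 0.0, 0.0)  # the optical center
    return origin, (dx / n, dy / n, dz / n)
```

The ray is expressed in camera space; if the virtual camera is placed at the world origin without rotation, camera space and world space coincide. Rendering libraries offer equivalent operations, for example `Raycaster.setFromCamera` in three.js, which starts from normalized device coordinates.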

5. The method according to claim 4, characterized in that the step of converting the position of the horizontal vanishing point in the two-dimensional image into screen coordinates in the screen coordinate system of the preset virtual camera comprises: obtaining the canvas size of the two-dimensional image in the interface display area; converting the interface coordinates of the horizontal vanishing point in the interface display area into relative coordinates with the center of the canvas as the origin; and determining, based on the relative coordinates, the canvas size, and a preset screen resolution, the screen coordinates of the horizontal vanishing point in the screen coordinate system of the preset virtual camera.
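Claim 5 converts an interface position into the virtual camera's screen coordinates in three steps. A minimal sketch follows. It assumes the interface coordinates are measured in the display area with y pointing down, that the canvas's top-left corner sits at (`canvas_left`, `canvas_top`) in that area, and that the screen coordinate system has its origin at the top-left with the given resolution. These conventions and parameter names are assumptions; the claim names only the inputs (relative coordinates, canvas size, screen resolution).

```python
from typing import Tuple


def interface_to_screen(ix: float, iy: float,
                        canvas_w: float, canvas_h: float,
                        screen_w: int, screen_h: int,
                        canvas_left: float = 0.0,
                        canvas_top: float = 0.0) -> Tuple[float, float]:
    """Map an interface-area point to screen coordinates of the virtual camera."""
    # Step 1: relative coordinates with the canvas center as the origin.
    rel_x = ix - (canvas_left + canvas_w / 2.0)
    rel_y = iy - (canvas_top + canvas_h / 2.0)
    # Step 2: scale canvas units to screen pixels, independently per axis.
    rel_x *= screen_w / canvas_w
    rel_y *= screen_h / canvas_h
    # Step 3: move the origin back to the top-left corner of the screen.
    return rel_x + screen_w / 2.0, rel_y + screen_h / 2.0
```

A vanishing point often lies outside the canvas. Because the mapping is linear, such a point simply maps to screen coordinates outside the screen bounds and can still be back-projected. If the canvas and screen aspect ratios differ, the per-axis scaling stretches the image; a single uniform scale factor is an alternative design choice.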

6. The method according to claim 3, characterized in that the step of aligning the bottom surface of the preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane, and adaptively adjusting the preset three-dimensional virtual model to be superimposed so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image, comprises: determining, based on the normal vector of the three-dimensional projection plane, a target orientation in three-dimensional space for the bottom surface of the preset three-dimensional virtual model to be superimposed; rotating the preset three-dimensional virtual model to be superimposed so that the normal vector of its bottom surface is parallel to the normal vector of the three-dimensional projection plane; translating the preset three-dimensional virtual model to be superimposed so that its bottom surface coincides with the three-dimensional projection plane; scaling the preset three-dimensional virtual model to be superimposed so that its projected size in the two-dimensional image matches the two-dimensional image; and placing the rotated, translated, and scaled preset three-dimensional virtual model to be superimposed on the three-dimensional projection plane and rendering it under the perspective constraints determined by the horizontal vanishing point and the vertical vanishing point, so as to generate a projection of the preset three-dimensional virtual model to be superimposed in the two-dimensional image with a correct perspective relationship.
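The rotate, translate and scale steps of claim 6 can be sketched with Rodrigues' rotation formula, which gives the rotation carrying one unit vector onto another. The sketch assumes the model's vertices are given in model space, that `model_up` is the normal of the bottom face pointing into the model, that `anchor` is a hypothetical point on the projection plane where the bottom face should sit, and that the scale factor is already known; the claim does not say how the scale is chosen. All names and the tolerance are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _identity() -> Mat3:
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def rotation_aligning(a: Vec3, b: Vec3) -> Mat3:
    """Rotation R with R @ unit(a) == unit(b), via Rodrigues' formula."""
    a, b = _unit(a), _unit(b)
    v = _cross(a, b)   # rotation axis scaled by sin(theta)
    c = _dot(a, b)     # cos(theta)
    s2 = _dot(v, v)    # sin^2(theta)
    if s2 < 1e-18:
        if c > 0.0:
            return _identity()  # already aligned
        # Opposite vectors: 180-degree turn about any axis u perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        u = _unit(_cross(a, helper))
        return [[2.0 * u[i] * u[j] - (1.0 if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    k = [[0.0, -v[2], v[1]],
         [v[2], 0.0, -v[0]],
         [-v[1], v[0], 0.0]]  # skew-symmetric cross-product matrix of v
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    f = (1.0 - c) / s2
    eye = _identity()
    return [[eye[i][j] + k[i][j] + f * k2[i][j] for j in range(3)] for i in range(3)]


def place_model(vertices: Sequence[Vec3], model_up: Vec3, plane_normal: Vec3,
                anchor: Vec3, scale: float,
                bottom_center: Vec3 = (0.0, 0.0, 0.0)) -> List[Vec3]:
    """Rotate and scale about the bottom center, then translate onto the plane."""
    R = rotation_aligning(model_up, plane_normal)
    placed = []
    for p in vertices:
        q = (p[0] - bottom_center[0], p[1] - bottom_center[1], p[2] - bottom_center[2])
        r = [scale * (R[i][0] * q[0] + R[i][1] * q[1] + R[i][2] * q[2]) for i in range(3)]
        placed.append((r[0] + anchor[0], r[1] + anchor[1], r[2] + anchor[2]))
    return placed
```

The claim lists rotation, translation and scaling in that order; the sketch scales about the bottom center before translating, which keeps the bottom face in the projection plane regardless of the scale factor.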

7. The method according to claim 1, characterized in that, The four marker points are generated in response to a user's dragging or clicking action on the two-dimensional image.

8. A device for automatically matching a virtual model to an image space, characterized in that the device comprises: an image receiving module, configured to receive a two-dimensional image to be processed; a marker point acquisition module, configured to acquire four marker points marked on the two-dimensional image, wherein the four marker points form a quadrilateral that corresponds to a real-world rectangle as distorted by perspective; a vanishing point calculation module, configured to construct a target quadrilateral based on the four marker points, calculate the intersection point of the lines containing the first pair of opposite sides of the target quadrilateral to obtain a horizontal vanishing point, and calculate the intersection point of the lines containing the second pair of opposite sides of the target quadrilateral to obtain a vertical vanishing point; a projection plane determination module, configured to calculate, based on the positions of the horizontal vanishing point and the vertical vanishing point in the two-dimensional image, a corresponding horizontal direction vector and vertical direction vector in three-dimensional space, respectively, determine the plane formed by the horizontal direction vector and the vertical direction vector, calculate the normal vector of the plane, and determine a three-dimensional projection plane based on the normal vector; a virtual model adjustment module, configured to align the bottom surface of a preset three-dimensional virtual model to be superimposed with the three-dimensional projection plane, and to adaptively adjust the preset three-dimensional virtual model to be superimposed so that its projection in the two-dimensional image is consistent with the perspective relationship of the two-dimensional image; and a rendering module, configured to render a fused image of the preset three-dimensional virtual model to be superimposed and the two-dimensional image.

9. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.

10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that, when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.