Graphic reconstruction method, related devices and computer program product

By obtaining the contextual text of the image to generate geometric constraints and correcting the initial pixel coordinates, the graphic reconstruction errors caused by hand-drawing errors and shooting distortion are resolved, achieving high-precision vector graphics reconstruction and improving the reliability of tasks such as intelligent teaching.

CN122199735A · Pending · Publication Date: 2026-06-12 · IFLYTEK CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: IFLYTEK CO LTD
Filing Date: 2026-05-18
Publication Date: 2026-06-12

AI Technical Summary

⚠Technical Problem

Existing technologies cannot effectively identify and correct geometric reconstruction errors caused by hand-drawing errors or photographic distortion in image processing, resulting in deviations between the reconstructed and real images.

⚗Method used

By acquiring the original image and its associated contextual text, the initial pixel coordinates of the graphic elements are detected, and geometric constraints are generated based on the contextual text. The initial pixel coordinates are then corrected to generate the target pixel coordinates, and finally, vector graphics are constructed.

🎯Benefits of technology

It significantly improves the accuracy of image reconstruction and the consistency between visual graphics and semantic text, providing more reliable support for downstream tasks such as intelligent teaching.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a graphics reconstruction method, related equipment and a computer program product, and relates to the technical field of image processing. The application detects initial pixel coordinates of graphic elements to be vectorized in an original image; generates geometric constraint conditions between the graphic elements in the original image according to context text; corrects the initial pixel coordinates of all the graphic elements according to the geometric constraint conditions between the graphic elements, and obtains target pixel coordinates of each graphic element; generates a vector expression of each graphic element according to the target pixel coordinates of all the graphic elements, and constructs a vector graphics of the original image based on the vector expressions of all the graphic elements. According to the application, the accuracy of graphics reconstruction and the consistency of visual graphics and semantic text can be significantly improved, and more reliable support can be provided for downstream tasks such as intelligent teaching.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and more specifically, to a graphic reconstruction method, related equipment, and computer program product.

Background Technology

[0002] With the deepening application of image processing and computer vision technologies in the field of intelligent interaction, transforming visual information in images into structured, editable vector graphics has become a key technological link supporting upper-level intelligent applications. For example, in the AI-powered problem-solving scenario in intelligent education, the geometric figures in a plane geometry problem photographed by a user can be reconstructed into vector graphics, providing a standardized display for interactive teaching tasks such as dynamic generation of auxiliary lines and deduction of graphic transformations.

[0003] Currently, vector reconstruction of geometric shapes in images can be based on traditional computer vision detection methods, which detect the pixel coordinates of graphic elements in the image and then connect the detection results to generate a vector path. However, this approach can only recognize geometric information at the pixel level. When the image contains hand-drawing errors or photographic distortion, the detector faithfully captures and reconstructs those errors without being able to recognize them as errors, resulting in a deviation between the reconstructed graphic and the true graphic.

Summary of the Invention

[0004] In view of the above problems, this application is made to provide a graphic reconstruction method, related equipment, and computer program product to improve the accuracy of reconstructed graphics. The specific solution is as follows:

[0005] In a first aspect, this application provides a method for graphic reconstruction, including:

[0006] Obtain the original image and the context text associated with the original image, the context text containing a textual description of the original image;

[0007] Detect the initial pixel coordinates of the graphic elements to be vectorized in the original image;

[0008] Based on the context text, generate geometric constraints between the graphic elements in the original image;

[0009] Based on the geometric constraints between the graphic elements, the initial pixel coordinates of all graphic elements are corrected to obtain the target pixel coordinates of each graphic element;

[0010] Based on the target pixel coordinates of all the graphic elements, a vector representation of each graphic element is generated, and a vector graphic of the original image is constructed based on the vector representations of all the graphic elements.

[0011] In one possible design, in another implementation of the first aspect of the embodiments of this application, the process of generating geometric constraints between graphic elements in the original image based on the context text includes:

[0012] The configured multimodal large model is invoked to instruct the multimodal large model to convert the original image into graphic description text, and to perform semantic parsing on the context text and the graphic description text. Based on the semantic parsing results, geometric constraints between the graphic elements to be vectorized in the original image are inferred and generated.

[0013] In one possible design, in another implementation of the first aspect of the embodiments of this application, the method further includes:

[0014] The vector graphics are visually rendered to obtain the corresponding visual image;

[0015] Receive geometric transformation operations initiated by the user for target geometric points in the visual image;

[0016] Based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, and the geometric constraints between the graphic elements in the vector graphic, the target pixel coordinates of all geometric points in the vector graphic are updated;

[0017] Based on the pixel coordinate update results of all geometric points, a target vector representation of each graphic element is generated, and based on the target vector representations of all graphic elements, the graphic transformation result after responding to the geometric transformation operation is rendered.

[0018] In one possible design, in another implementation of the first aspect of this application, the process of updating the target pixel coordinates of all geometric points in the vector graphic based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, and the geometric constraints between the graphic elements in the vector graphic, includes:

[0019] Based on the topological connection relationship between the target geometric point and other geometric points in the vector graphics, a geometric point topology map with the target geometric point as the root node is constructed;

[0020] Based on the geometric constraints between the graphic elements, starting from the root node, the pixel coordinates of each node are updated sequentially according to the hierarchical structure of the topological relationship graph of the geometric points, so as to obtain the pixel coordinate update results of all geometric points in the vector graphics.
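The level-by-level update described above can be sketched as a breadth-first traversal from the dragged point. This is a minimal illustration, not the method disclosed in the application: the function name `propagate_drag`, the adjacency representation, and the single constraint used here (each parent-child distance is preserved) are all assumptions chosen to keep the example short.

```python
from collections import deque
import math


def propagate_drag(coords, edges, root, new_root_xy):
    """Update all point coordinates after `root` is dragged to `new_root_xy`."""
    # Undirected adjacency list built from the topological connections.
    adjacency = {name: [] for name in coords}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    updated = {root: tuple(new_root_xy)}
    queue = deque([root])  # breadth-first order visits the tree level by level
    while queue:
        parent = queue.popleft()
        px, py = updated[parent]
        for child in adjacency[parent]:
            if child in updated:
                continue
            # Assumed constraint: the parent-child distance stays what it was.
            length = math.dist(coords[parent], coords[child])
            dx, dy = coords[child][0] - px, coords[child][1] - py
            norm = math.hypot(dx, dy)
            if norm == 0.0:
                # Degenerate case: reuse the original parent-to-child direction.
                dx = coords[child][0] - coords[parent][0]
                dy = coords[child][1] - coords[parent][1]
                norm = math.hypot(dx, dy) or 1.0
            updated[child] = (px + dx / norm * length, py + dy / norm * length)
            queue.append(child)
    # Points not connected to the root keep their original coordinates.
    return {**coords, **updated}
```

A real implementation would apply whatever geometric constraints were generated for the figure at each level; the fixed-length rule above only stands in for them.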

[0021] In one possible design, in another implementation of the first aspect of the embodiments of this application, the process of correcting the initial pixel coordinates of all graphic elements according to the geometric constraints between the graphic elements to obtain the target pixel coordinates of each graphic element includes:

[0022] Construct an objective function, which is a function that minimizes the difference between the target pixel coordinates and the initial pixel coordinates of each graphic element, constrained by the geometric constraints between the graphic elements.

[0023] Solving the objective function yields the target pixel coordinates for each graphic element.

[0024] In one possible design, in another implementation of the first aspect of the embodiments of this application, the process of constructing the objective function includes:

[0025] The difference between the pixel coordinates to be solved for each of the graphic elements and the initial pixel coordinates is used as the baseline shape-preserving term of the objective function;

[0026] The geometric constraints between the graphic elements are transformed into a penalty term that includes the pixel coordinates to be solved.

[0027] Weight coefficients are assigned to the baseline shape-preserving term and each of the penalty terms, wherein the weight coefficient of a penalty term derived from a first type of geometric constraint is higher than that of a penalty term derived from a second type of geometric constraint. The first type of geometric constraint restricts the pixel coordinates to be solved to a finite number of discrete point values, and the second type restricts them to an interval of values.

[0028] The objective function is obtained by weighted summation of the baseline shape-preserving term and all the penalty terms.
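One possible way to write such an objective is shown below. The notation is an assumption introduced for illustration and does not appear in the application: p_i^0 and p_i are the initial and to-be-solved coordinates of point i, P collects all p_i, φ_j and ψ_k are residuals of the first-type and second-type constraints (zero when the constraint holds), and squared residuals are just one common choice of penalty.

```latex
E(P) = w_0 \sum_{i} \lVert p_i - p_i^{0} \rVert^2
     + \sum_{j \in \mathcal{C}_1} w_j \, \phi_j(P)^2
     + \sum_{k \in \mathcal{C}_2} w_k \, \psi_k(P)^2,
\qquad w_j > w_k \ \ \text{for all } j \in \mathcal{C}_1,\ k \in \mathcal{C}_2,
\qquad P^{*} = \arg\min_{P} E(P)
```

For an interval constraint of the form l ≤ g(P) ≤ u, a residual such as ψ(P) = max(0, l − g(P), g(P) − u) is zero anywhere inside the interval and grows linearly outside it.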

[0029] In one possible design, in another implementation of the first aspect of this application, the process of detecting the initial pixel coordinates of the graphic element to be vectorized in the original image includes:

[0030] The configured multimodal large model is invoked to instruct the multimodal large model to detect the graphic elements to be vectorized in the original image and output the initial pixel coordinates of all the graphic elements to be vectorized.

[0031] The multimodal large model is obtained by fine-tuning the basic multimodal large model using an image sample dataset labeled with pixel coordinates of graphic elements.

[0032] In a second aspect, this application provides an electronic device, including a memory and a processor;

[0033] The memory is used to store programs;

[0034] The processor is configured to execute the program to implement the graphic reconstruction method described in any of the first aspects of this application.

[0035] In a third aspect, this application provides a readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the graphic reconstruction method described in any implementation of the first aspect of this application.

[0036] In a fourth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the graphic reconstruction method described in any implementation of the first aspect of this application.

[0037] By employing the aforementioned technical solution, the proposed graphic reconstruction method first identifies, at the pixel level, the initial pixel coordinates of the graphic elements to be vectorized in the original image, which establishes a pixel-level basis for graphic reconstruction and improves the morphological match between the reconstructed graphic and the original image. Furthermore, it identifies descriptive information about the graphic elements from the context text associated with the original image and parses out the constraints on the geometric relationships between the graphic elements, providing a reference that more closely matches the actual design intent and enabling precise correction of pixel-level geometric errors.

[0038] By correcting the initial pixel coordinates based on the extracted geometric constraints, geometric distortions introduced by factors such as hand-drawing bias and shooting distortion can be effectively eliminated, making the corrected target pixel coordinates more consistent with the true geometric features of the image. Finally, vector representations are generated and vector graphics are constructed based on the corrected target pixel coordinates, which can significantly improve the accuracy of image reconstruction and the consistency between visual graphics and semantic text, providing more reliable support for downstream tasks such as intelligent teaching.

Attached Figure Description

[0039] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:

[0040] Figure 1 is a schematic diagram of a system architecture for implementing the graphic reconstruction method provided in an embodiment of this application;

[0041] Figure 2 is a schematic flowchart of a graphic reconstruction method provided in an embodiment of this application;

[0042] Figure 3 is a schematic diagram of a geometric point topology map provided in an embodiment of this application;

[0043] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0044] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0045] This application can be applied to the field of image processing. The following section will introduce several application scenarios that have been implemented in products, taking image reconstruction as an example.

[0046] First, some potential application scenarios for this application are introduced. In the AI-powered problem-solving scenario within the field of intelligent education, geometric figures from a user-captured plane geometry problem are reconstructed into vector graphics, providing a standardized display for interactive teaching tasks such as dynamic generation of auxiliary lines and graphical transformation deduction. In the scenario of digitally editing architectural blueprints, the diagrams in the drawings are first converted into vector graphics to enable dynamic editing of architectural components (such as walls, doors, and windows). In the scientific research scenario, when researchers read scanned papers, they can use the scanning tools provided by the reading software to identify coordinate axes, data points, or molecular structures in the original charts and reconstruct them into vector graphics, solving the problem of not being able to directly edit the original charts from scanned papers.

[0047] In the graphic reconstruction process described above, traditional computer vision (CV) detection methods can typically be used. These methods apply edge detection (such as the Canny operator), the Hough transform, or deep learning-based object detection models (such as YOLO and Mask R-CNN) to directly identify the pixel positions of graphic elements such as line segments, circles, and corners in the original image, and then connect the identified pixels to form a vector path. Alternatively, an end-to-end generation scheme based on a large model (LLM) can be used, in which the problem image and text are input into the large model and the model's generation capabilities are used to directly output drawing code.

[0048] However, computer vision detection methods can only recognize pixel-level geometric information. When the image contains hand-drawing errors or photographic distortion, these errors are faithfully captured and reconstructed. For example, if a user photographs the page at an angle so that a perfect circle appears as an ellipse in the photo, the reconstructed graphic will be distorted in the same way. Graphics generated by large models, in turn, rely mainly on textual prompts and often ignore the specific positions and relative proportions of vertices in the original image, so the generated vector graphics are inconsistent with the original image. For example, vertex A of a triangle is at the top in the original problem, but may be in the lower left corner in the graphic generated by the large model. This visual difference can severely interfere with students' cognitive comparison.

[0049] This application provides a graphic reconstruction method, related equipment, and computer program product that can effectively solve the above problems. The graphic reconstruction method, related equipment, and computer program product of this application will be described in detail below with reference to the accompanying drawings.

[0050] Referring to Figure 1, which shows a schematic diagram of a system architecture, the system may include a terminal 100 and a server 200, where the server 200 may include one or more servers (Figure 1 uses one server as an illustration).

[0051] Either the terminal 100 or the server 200 can independently execute the graphic reconstruction method provided in the embodiments of this application. Alternatively, the terminal 100 and the server 200 can execute the method collaboratively.

[0052] In this application embodiment, the terminal 100 can be a mobile phone, tablet computer, teaching screen, learning machine, wearable device, conference terminal, augmented reality (AR) / virtual reality (VR) device, laptop computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), etc., and this application embodiment does not impose any restrictions on it.

[0053] This application provides a graphic reconstruction method, illustrated by applying the method to a computer device. Specifically, the computer device may be the terminal 100 in the system shown in Figure 1, or a combination of the terminal 100 and the server 200. Referring to Figure 2, which shows a schematic flowchart of the graphic reconstruction method provided in this application, the method may include steps S110 to S150, which are described in detail below.

[0054] Step S110: Obtain the original image and the context text associated with the original image.

[0055] Specifically, the original image and its associated context text can be obtained from local files, databases, network interfaces, etc. The context text contains a textual description of the original image. For example, as shown in Figure 3, the original problem includes text describing the primitives in its geometric image; an engineering report includes text on design specifications, material lists, and construction instructions for CAD engineering drawings; and an academic paper includes text that cites charts, model diagrams, and other images, along with captions, experimental conditions, results analysis, and discussion. The original image can be grayscale, binary, or in another format; the format is not restricted.

[0056] Step S120: Detect the initial pixel coordinates of the graphic elements to be vectorized in the original image.

[0057] This step locates the graphic elements to be vectorized (e.g., points, lines, line segments, circles, arcs) in the original image and extracts their positions in the image coordinate system as their initial pixel coordinates. For regular graphic elements, such as points, lines, and circles, the pixel coordinates of their key feature points can be extracted as the initial pixel coordinates: for example, the coordinates of a point, the coordinates of the two endpoints of a line, or the coordinates of the center of a circle and of a point on the circle. For irregular graphic elements, such as free-form curves and hand-drawn closed regions, the coordinates of all pixels on the outline of the graphic element can be extracted to form a set of coordinate points representing the graphic element.
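The key-feature-point idea can be illustrated with a few small data types. This is only a sketch: the class names `Point`, `Segment`, `Circle`, and `Outline` and their fields are hypothetical and are not defined in the application.

```python
from dataclasses import dataclass
import math


@dataclass
class Point:
    name: str
    x: float
    y: float


@dataclass
class Segment:
    start: Point
    end: Point

    def key_points(self):
        # A line segment is fully described by its two endpoints.
        return [self.start, self.end]


@dataclass
class Circle:
    center: Point
    on_circle: Point  # any detected point lying on the circle

    @property
    def radius(self):
        # The radius follows from the two key points; it is not stored.
        return math.hypot(self.on_circle.x - self.center.x,
                          self.on_circle.y - self.center.y)

    def key_points(self):
        return [self.center, self.on_circle]


@dataclass
class Outline:
    """Irregular element, stored as the full list of contour pixels."""
    pixels: list

    def key_points(self):
        return [Point(f"c{i}", x, y) for i, (x, y) in enumerate(self.pixels)]
```

Storing only key points for regular elements keeps the number of unknowns small when the coordinates are later corrected, while irregular elements keep their whole contour.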

[0058] This step plays a role in primitive localization during the entire graphic reconstruction process. The initial pixel coordinates of each detected graphic element can provide an initial position reference for the graphic elements in subsequent steps.

[0059] In one possible implementation, this step can employ image segmentation and edge detection algorithms to separate all graphic elements to be vectorized from the image, and then obtain the initial pixel coordinates of each graphic element through algorithms such as contour fitting and coordinate extraction.

[0060] In another possible implementation, this step can invoke a configured multimodal large model to instruct it to detect the graphic elements to be vectorized in the original image and output the initial pixel coordinates of all graphic elements to be vectorized; wherein, the multimodal large model is obtained by fine-tuning the basic multimodal large model using an image sample dataset labeled with the pixel coordinates of the graphic elements.

[0061] The image samples can cover realistic vectorized image scenarios such as geometric exercise diagrams, hand-drawn geometric figures, and scanned document images. Then, through manual or automatic annotation tools, the pixel coordinates of the graphic elements in each image sample are annotated in a uniform format. Examples include: "point":"keypoint coordinates{\"point name\":\"[x,y]\",...}", "line":"line segment information{\"line name\":{\"endpoints\":{\"endpoint 1\":\"[x,y]\",...}}}", "circle":"circle information{\"circle name\":{\"center\":{\"center point\":\"[x,y]\"},\"points\":{...}}}", "angle":"angle information{\"angle name\":{\"endpoints\":{\"endpoint 1\":...,\"vertex\":...,\"endpoint 2\":...}}}", etc.

[0062] A general-purpose basic multimodal large model is selected and fine-tuned using the aforementioned labeled image sample dataset as training data, so that it learns the mapping between the visual features of an image and the pixel coordinates of its graphic elements. The result is a dedicated multimodal large model adapted for graphic detection and coordinate output. Compared with directly using the basic multimodal large model to detect the original image, the fine-tuned model can more accurately identify and distinguish different graphic elements in the image, adapts better to geometric vectorization scenes, and performs better at detecting graphics in blurred or distorted images.

[0063] The fine-tuned multimodal large model is deployed to the inference environment. A prompt is generated based on the original image obtained in step S110 and input into the fine-tuned multimodal large model. The prompt may include role settings, task instructions (such as "Please identify different graphic elements in the input original image, such as points, lines, circles, etc., and determine the pixel coordinates of each graphic element in the image coordinate system of the original image"), and output requirements (such as "Output in JSON format"). The multimodal large model follows the instructions in the prompt to generate the initial pixel coordinates of each graphic element in the original image.
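As a rough sketch of how such a prompt might be assembled and the model's reply consumed, the snippet below builds the prompt text and parses a JSON reply into point coordinates. The role sentence, the function names, and the reply schema (a top-level `"point"` object mapping names to `[x, y]` arrays) are assumptions made for illustration; the call to the model itself is deliberately left out because the application does not specify an API.

```python
import json

ROLE = "You are an assistant that detects geometric elements in images."  # hypothetical role setting
TASK = (
    "Please identify different graphic elements in the input original image, "
    "such as points, lines, circles, etc., and determine the pixel coordinates "
    "of each graphic element in the image coordinate system of the original image"
)
OUTPUT_REQUIREMENT = "Output in JSON format"


def build_prompt():
    """Join role setting, task instruction, and output requirement."""
    return f"{ROLE}\n{TASK}.\n{OUTPUT_REQUIREMENT}."


def parse_points(reply):
    """Extract {name: (x, y)} from a reply, tolerating text around the JSON."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    # Assumed schema: {"point": {"A": [x, y], ...}}
    return {name: (float(x), float(y)) for name, (x, y) in data.get("point", {}).items()}
```

Slicing from the first `{` to the last `}` is a simple guard against replies that wrap the JSON in extra prose; stricter validation would be needed in practice.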

[0064] Step S130: Based on the context text, generate geometric constraints between the graphic elements in the original image.

[0065] It can be understood that the textual description of the original image contained in the context text may include graphic information that the original image cannot directly present visually. For example, referring to Figure 3, the original problem text states that "the side length of rhombus ABCD is 2cm," but this crucial geometric constraint cannot be derived through visual recognition of the image alone. Conversely, the textual descriptions in the context text alone cannot accurately determine how the graphic elements are distributed. For example, referring to Figure 3, the text states that the crease is EF, but does not make clear which sides points E and F fall on.

[0066] Based on the above considerations, this step combines the semantic descriptions of each graphic element in the original image in the context text with the actual positional relationships of each graphic element in the original image to generate sufficiently complete geometric constraints.

[0067] In one possible implementation, this step can employ a natural language processing algorithm to semantically parse the context text and extract initial geometric constraints based on the parsing results, including the type, quantity, size relationships, and positional relationships (such as parallel, perpendicular, collinear, etc.) of the graphic elements. These initial geometric constraints are then supplemented and corrected by combining the distribution information of the graphic elements in the original image to obtain the final usable geometric constraints.

[0068] In another possible implementation, this step can also invoke the configured multimodal large model to instruct the multimodal large model to convert the original image into graphic description text, and perform semantic parsing on the context text and graphic description text. Based on the semantic parsing results, it infers and generates geometric constraints between the graphic elements to be vectorized in the original image.

[0069] This embodiment utilizes the multimodal information fusion and geometric logic reasoning capabilities of a multimodal large model. Prompt words generated based on the original image and contextual text are input into the multimodal large model to instruct it to combine the semantic information in the contextual text and the image features of the original image to generate constraints that constrain the geometric relationships between the graphic elements in the original image.

[0070] The prompts generated based on the original image and contextual text can include expert roles, task instructions, output requirements, etc. For example: "I need you to play the role of a mathematical geometry expert. First, please 'look' at this geometry problem and interpret it in plain, easy-to-understand natural language. Your interpretation should cover the problem text, the structure of the figure, the analysis of the figure elements, and the final problem to be solved. After completing this step, please translate your natural language into a formatted geometry definition language (GDL)."

[0071] It can be understood that this step performs multimodal joint parsing of the context text and the original image to align the geometric semantic information on the text side with the visual geometric features on the image side, so that the generated geometric constraints between graphic elements are consistent with both the semantic description and the visual features. This gives step S140 a standardized geometric constraint basis for correcting the initial pixel coordinates, which were extracted in step S120 based on visual detection alone. The corrected graphic elements thus have precise spatial positions while conforming to the original design intent conveyed by the context text, and geometric deviations caused by factors such as distortion of the original image are corrected. Consequently, the final vectorized result better meets the design requirements and usage scenarios of practical applications.

[0072] Step S140: Based on the geometric constraints between graphic elements, the initial pixel coordinates of all graphic elements are corrected to obtain the target pixel coordinates of each graphic element.

[0073] In the specific implementation of this step, the geometric constraints obtained in step S130 can first be transformed into mathematical expressions (inequalities or equations). The initial pixel coordinates of the graphic elements involved in each mathematical expression are then substituted into them, and the initial pixel coordinates are adjusted to satisfy the mathematical expressions until the target pixel coordinates that satisfy all mathematical expressions are obtained, so that the graphic elements under the corrected target pixel coordinates satisfy the geometric constraints.
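To make the "constraint to mathematical expression" step concrete, the sketch below expresses three common constraints as residual functions that evaluate to zero when the constraint holds, then substitutes the detected coordinates to see which constraints are violated. The function names, the choice of constraints, and the tolerance are assumptions for illustration.

```python
import math


def _vec(P, a, b):
    """Vector from point a to point b, given a {name: (x, y)} mapping."""
    return (P[b][0] - P[a][0], P[b][1] - P[a][1])


def equal_length(P, a, b, c, d):
    """|ab| - |cd|; zero when the two segments have the same length."""
    return math.dist(P[a], P[b]) - math.dist(P[c], P[d])


def perpendicular(P, a, b, c, d):
    """Dot product of ab and cd; zero when the segments are perpendicular."""
    (ux, uy), (vx, vy) = _vec(P, a, b), _vec(P, c, d)
    return ux * vx + uy * vy


def parallel(P, a, b, c, d):
    """2-D cross product of ab and cd; zero when the segments are parallel."""
    (ux, uy), (vx, vy) = _vec(P, a, b), _vec(P, c, d)
    return ux * vy - uy * vx


def violated(P, constraints, tol=0.5):
    """Return the constraints whose residual exceeds `tol` at coordinates P.

    Note: residuals have different units (pixels vs. pixels squared), so a
    single tolerance is a simplification made for this sketch.
    """
    return [(fn.__name__, args) for fn, args in constraints
            if abs(fn(P, *args)) > tol]
```

For a detected quadrilateral with A=(0,0), B=(4,0), C=(4,3), the perpendicularity residual of AB and BC is zero, while the equal-length residual is nonzero, so only the latter would drive a coordinate adjustment.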

[0074] Step S150: Generate a vector representation of each graphic element based on the target pixel coordinates of all graphic elements, and construct a vector graphic of the original image based on the vector representations of all graphic elements.

[0075] This step uses the target pixel coordinates to generate a parametric representation (i.e., a vector representation) of each graphic element. The form of the vector representation can be selected according to the type of graphic element; for example, an ellipse can be represented by a quadratic equation, a straight line by a linear equation Ax + By + C = 0, and a curve by a Bézier curve. Finally, the vector representations, primitive types, relationships, and construction rules of all graphic elements are integrated to generate a vector graphic encapsulated in a standard format (such as JSON), completing the conversion of the original raster image into a reconstructed vector graphic.
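A minimal sketch of this step for lines and circles follows. It derives the coefficients of Ax + By + C = 0 from two corrected points and packages the result as JSON. The JSON layout and the function names are hypothetical; the application only states that a standard format such as JSON may be used.

```python
import json
import math


def line_coefficients(p, q):
    """Coefficients (A, B, C) of the line Ax + By + C = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def build_vector_graphic(points, segments, circles):
    """Serialize points, lines, and circles into one JSON document.

    points:   {name: (x, y)} corrected target pixel coordinates
    segments: [(name_a, name_b), ...]
    circles:  [(center_name, on_circle_name), ...]
    """
    elements = []
    for a, b in segments:
        A, B, C = line_coefficients(points[a], points[b])
        elements.append({"type": "line", "endpoints": [a, b],
                         "A": A, "B": B, "C": C})
    for center, on_circle in circles:
        radius = math.dist(points[center], points[on_circle])
        elements.append({"type": "circle", "center": center, "radius": radius})
    return json.dumps({
        "points": {name: list(xy) for name, xy in points.items()},
        "elements": elements,
    })
```

Keeping point names in the element records, rather than only raw coefficients, preserves the relationships between elements so that a later edit to one point can be traced to every element that uses it.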

[0076] In summary, the proposed graphic reconstruction method first identifies, at the pixel level, the initial pixel coordinates of the graphic elements to be vectorized in the original image, which establishes a pixel-level basis for graphic reconstruction and improves the morphological match between the reconstructed graphic and the original image. Furthermore, it identifies descriptive information about the graphic elements from the context text associated with the original image and parses out the constraints on the geometric relationships between the graphic elements, providing a reference that more closely matches the actual design intent and enabling precise correction of pixel-level geometric errors.

[0077] By correcting the initial pixel coordinates based on the extracted geometric constraints, geometric distortions introduced by factors such as hand-drawing bias and shooting distortion can be effectively eliminated, making the corrected target pixel coordinates more consistent with the true geometric features of the image. Finally, vector representations are generated and vector graphics are constructed based on the corrected target pixel coordinates, which can significantly improve the accuracy of image reconstruction and the consistency between visual graphics and semantic text, providing more reliable support for downstream tasks such as intelligent teaching.

[0078] Next, other possible implementations of the graphic reconstruction method proposed in this application will be described in detail through the following embodiments.

[0079] In one possible implementation, step S140, which involves correcting the initial pixel coordinates of all graphic elements based on the geometric constraints between graphic elements to obtain the target pixel coordinates of each graphic element, can be achieved by constructing an objective function and solving it with an optimization algorithm such as least squares, gradient descent, or linear programming to obtain the corrected target pixel coordinates.

[0080] First, the optimization objective is defined as minimizing the difference between the desired pixel coordinates and the initial pixel coordinates for each graphic element. Then, the constraint variables are defined, namely the geometric constraints between the graphic elements. Based on this, an objective function is constructed. Solving the objective function yields the optimal "desired pixel coordinates" that satisfy the constraints, i.e., the target pixel coordinates.

[0081] Understandably, during the numerical solution of the objective function, geometric constraints often conflict with one another, so that multiple constraints cannot all be satisfied simultaneously. In such cases, to obtain a reasonable optimal solution, the core constraints can be prioritized according to their importance or a pre-defined priority. To reflect this priority relationship, this application assigns corresponding weight coefficients to different geometric constraints when constructing the objective function. The weight coefficients quantify the influence of each constraint during the numerical solution process, so that the core constraints play the dominant role in the coordinate correction results.

[0082] Specifically, the process of constructing the objective function may include: using the difference between the pixel coordinates to be solved and the initial pixel coordinates of each graphic element as the baseline shape-preserving term of the objective function; transforming the geometric constraints between each graphic element into penalty terms that include the pixel coordinates to be solved; assigning weight coefficients to the baseline shape-preserving term and each penalty term, wherein the weight coefficient of the penalty term transformed based on the first type of geometric constraints is higher than the weight coefficient of the penalty term transformed based on the second type of geometric constraints. The first type of geometric constraints is used to constrain the pixel coordinates to be solved to take the values of a finite number of discrete points, and the second type of geometric constraints is used to constrain the pixel coordinates to be solved to take the values of an interval; and performing a weighted summation of the baseline shape-preserving term and all penalty terms to obtain the objective function.

[0083] Finding the optimal pixel coordinates requires considering two aspects: first, the geometric shape formed by the final target pixel coordinates should deform as little as possible relative to the geometric shape in the original image; second, the target pixel coordinates should satisfy the geometric constraints. The constructed objective function must therefore balance the two. In this application, the difference between the target pixel coordinates and the initial pixel coordinates is used as the baseline shape-preserving term, and each geometric constraint is transformed into a mathematical penalty term containing the target pixel coordinates. Corresponding weight coefficients are assigned to the baseline shape-preserving term and each penalty term. The baseline shape-preserving term and each penalty term are weighted and summed to form the overall objective function, as shown in equation (1).

[0084] $$E(P) = \lambda_{vis} \sum_{i} \lVert P_i - P_{obs,i} \rVert^2 + \lambda_{geo} E_{geometric}(P) \quad (1)$$

[0085] where $\sum_{i} \lVert P_i - P_{obs,i} \rVert^2$ is the baseline shape-preserving term, $P_i$ denotes the i-th pixel coordinate to be solved, and $P_{obs,i}$ denotes the i-th initial pixel coordinate. This term accumulates the differences between all pixel coordinates to be solved and their corresponding initial pixel coordinates: the smaller its value, the closer the pixel coordinates to be solved are to the initial pixel coordinates, and the less the geometric shape formed by the final target pixel coordinates deforms relative to the geometric shape in the original image. $E_{geometric}(P)$ denotes the penalty term, and $\lambda_{vis}$ and $\lambda_{geo}$ denote the weight coefficients of the baseline shape-preserving term and the penalty term, respectively.
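
To make the weighted objective concrete, the sketch below minimizes a shape-preserving term plus a single perpendicularity penalty (AO⊥BO) by plain gradient descent with a numerical gradient. Everything specific here is an assumption for illustration: the squared-distance form of the shape-preserving term, the squared-cosine form of the penalty, the weight values, the step size, and the sample coordinates are not taken from the application.

```python
import math

LAMBDA_VIS = 1.0     # assumed weight of the baseline shape-preserving term
LAMBDA_GEO = 1000.0  # assumed (higher) weight of the geometric penalty term


def cos_angle(p, vertex, q):
    """Cosine of the angle at `vertex` between the rays towards p and q."""
    ax, ay = p[0] - vertex[0], p[1] - vertex[1]
    bx, by = q[0] - vertex[0], q[1] - vertex[1]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))


def objective(flat, observed):
    """Weighted sum of the shape term and the penalty; flat = [Ax, Ay, Ox, Oy, Bx, By]."""
    shape = sum((v - o) ** 2 for v, o in zip(flat, observed))
    a, o, b = flat[0:2], flat[2:4], flat[4:6]
    penalty = cos_angle(a, o, b) ** 2  # zero only when AO is perpendicular to BO
    return LAMBDA_VIS * shape + LAMBDA_GEO * penalty


def minimize(observed, lr=0.01, steps=5000, h=1e-6):
    """Gradient descent using a central-difference gradient."""
    x = list(observed)
    for _ in range(steps):
        grad = []
        for k in range(len(x)):
            hi = x[:k] + [x[k] + h] + x[k + 1:]
            lo = x[:k] + [x[k] - h] + x[k + 1:]
            grad.append((objective(hi, observed) - objective(lo, observed)) / (2 * h))
        x = [v - lr * g for v, g in zip(x, grad)]
    return x


# Hypothetical detected coordinates: the angle AOB is visibly not a right angle.
observed = [10.0, 1.0, 0.0, 0.0, 1.0, 10.0]
solved = minimize(observed)
```

Because the penalty weight is finite, the solved angle approaches 90° without reaching it exactly; raising the assumed `LAMBDA_GEO` tightens the constraint at the cost of a smaller stable step size.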

[0086] Understandably, during the objective function solution process, the optimal solution is found by iteratively adjusting the pixel coordinates to be solved. If the current pixel coordinates do not satisfy a certain geometric constraint, the penalty term resulting from the corresponding geometric constraint will generate a large penalty value, which is accumulated with the baseline shape-preserving term. The more unsatisfied geometric constraints there are, the larger the cumulative penalty value of each penalty term, thus increasing the overall value of the overall objective function. Therefore, by minimizing the objective function value, the pixel coordinates to be solved can be driven to iterate in the direction that satisfies all geometric constraints, ultimately achieving automatic satisfaction of geometric constraints.

[0087] Furthermore, considering the aforementioned conflicts, such as conflicts between geometric constraints or between the visually observed pixel coordinates and the geometric constraints derived from the text description, this application first assigns a lower weight coefficient to the baseline shape-preserving term than to the penalty term, thereby prioritizing the satisfaction of geometric constraints and forcibly correcting visual errors during the solution of the objective function.

[0088] When faced with conflicting geometric constraints, this application categorizes all geometric constraints into two types based on the strictness of their requirements for pixel coordinate values, and assigns differentiated weight coefficients to the penalty terms corresponding to different types of geometric constraints, thereby effectively eliminating conflicts between different constraints.

[0089] Geometric constraints are categorized into two types based on the strictness of the requirements for pixel coordinate values. The first type requires that the pixel coordinates to be solved fall within a finite number of fixed discrete points, with no flexibility in the range of coordinate values. For example, the geometric constraint AO⊥BO requires that the angle formed by the lines connecting pixel coordinates A, O, and B must be equal to 90°, providing a unique constraint for the pixel coordinates of A, O, and B. The second type requires that the pixel coordinates to be solved fall within a certain continuous numerical range, limiting only the range of coordinate values while retaining flexibility for adjustment within the range. For example, if ∠AOB is an acute angle, then the angle formed by the lines connecting pixel coordinates A, O, and B only needs to be within the range (0, 90°), meaning the constraint for the pixel coordinates of A, O, and B is not unique.
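
The difference between the two constraint types can be sketched as two penalty shapes: an equality penalty that is zero only at the required value, and a one-sided (hinge) penalty that is zero anywhere inside the allowed interval. The concrete penalty formulas and the weight values below are illustrative assumptions, and boundary cases such as angles of exactly 0° or 90° for the acute-angle constraint are not handled.

```python
import math

W_FIRST = 100.0   # assumed weight for first-type (discrete-value) constraints
W_SECOND = 10.0   # assumed weight for second-type (interval) constraints


def perpendicular_penalty(angle_deg):
    """First type: zero only when the angle equals 90 degrees."""
    return math.cos(math.radians(angle_deg)) ** 2


def acute_penalty(angle_deg):
    """Second type: zero for any angle with a non-negative cosine, positive beyond 90 degrees."""
    return max(0.0, -math.cos(math.radians(angle_deg))) ** 2


def weighted_penalty(right_angle_deg, acute_angle_deg):
    """Weighted sum of one first-type and one second-type penalty term."""
    return (W_FIRST * perpendicular_penalty(right_angle_deg)
            + W_SECOND * acute_penalty(acute_angle_deg))
```

With these assumed weights, an equal angular violation of the first-type constraint contributes ten times more to the objective than a violation of the second-type constraint, which is what steers the solver towards satisfying the rigid constraint first.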

[0090] Therefore, the first type of constraint, as the core rigid constraint that cannot be deviated from in the vectorization scene, directly determines whether the vectorization result conforms to the original design intent, and its inviolability is far higher than that of the second type of constraint. Based on this, during the construction of the objective function, a higher weight coefficient is assigned to the first type of constraint than to the second type of constraint, and the weight coefficient of its corresponding penalty term in the objective function is also increased accordingly. During the iterative process of solving the objective function, the optimization algorithm will prioritize adjusting the pixel coordinates to be solved to satisfy the first type of constraint, thereby avoiding large penalty values due to violations of high-weight constraints, which would lead to an increase in the objective function value, and ultimately effectively eliminate the conflict between geometric constraints.

[0091] Finally, the minimum value of the objective function is calculated to obtain the target pixel coordinates of each graphic element. Using these target pixel coordinates, a vector representation of each graphic element is constructed, generating a vector graphic that can ultimately be used for visual rendering, providing data support for downstream intelligent teaching, graphic editing, and other applications.

[0092] In one possible downstream task implementation, the graphics reconstruction method may further include: visually rendering a vector graphic to obtain a corresponding visual image; receiving a geometric transformation operation initiated by a user for a target geometric point in the visual image; updating the target pixel coordinates of all geometric points in the vector graphic based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, as well as the geometric constraints between graphic elements in the vector graphic; generating a target vector representation for each graphic element based on the updated pixel coordinates of all geometric points, and rendering the graphics transformation result after responding to the geometric transformation operation based on the target vector representations of all graphic elements.

[0093] This process, based on the graphic information in the vector graphics obtained earlier, performs visual rendering to obtain the corresponding visual image, which is then displayed to the user through a pre-built interactive window, providing a visual medium for user interaction. The user selects a target geometric point in the visual image using mouse, touch, or other interactive methods and initiates geometric transformation operations such as translation, rotation, and scaling. The system captures this operation command, determining the original position and the expected position of the target geometric point after transformation.

[0094] It is understandable that a transformation of a target geometric point will affect other geometric points associated with it. If only the coordinates of the target geometric point are updated, it will lead to problems such as graphic deformation and breakage. Therefore, it is necessary to rely on the topological connections between graphic elements in vector graphics (such as the relationships between geometric points) and geometric constraints to update the pixel coordinates of all related geometric points in a coordinated manner, so as to ensure the integrity of the graphic after responding to the geometric transformation operation.

[0095] In one possible implementation, the topological connections and geometric constraints between geometric points can be represented by constraint equations. For example, the constraint equation (x2-x1)×(x3-x2)+(y2-y1)×(y3-y2)=0 represents that adjacent edges are perpendicular. When a point (x1,y1) is transformed, (x2,y2) changes accordingly to satisfy this constraint equation, thus obtaining the transformed graph.
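
One way to read this constraint equation is that (x2,y2) must lie on the circle whose diameter joins (x1,y1) and (x3,y3). The sketch below evaluates the residual of the equation and, after (x1,y1) has moved, restores the constraint by moving (x2,y2) to the nearest point on that circle. Choosing the nearest point and keeping (x3,y3) fixed are assumptions made for illustration; the application does not state how the new (x2,y2) is selected.

```python
import math


def residual(p1, p2, p3):
    """Left-hand side of (x2-x1)*(x3-x2) + (y2-y1)*(y3-y2) = 0."""
    return (p2[0] - p1[0]) * (p3[0] - p2[0]) + (p2[1] - p1[1]) * (p3[1] - p2[1])


def restore_perpendicular(p1_new, p2_old, p3):
    """Move p2 onto the circle with diameter p1_new-p3, as close as possible to p2_old."""
    cx, cy = (p1_new[0] + p3[0]) / 2, (p1_new[1] + p3[1]) / 2
    radius = math.dist(p1_new, p3) / 2
    dx, dy = p2_old[0] - cx, p2_old[1] - cy
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("p2_old coincides with the circle center; direction is undefined")
    return (cx + radius * dx / norm, cy + radius * dy / norm)


# Hypothetical right-angled corner at p2; the user then drags p1.
p1, p2, p3 = (0.0, 0.0), (4.0, 0.0), (4.0, 3.0)
p1_moved = (1.0, 2.0)
p2_updated = restore_perpendicular(p1_moved, p2, p3)
```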

[0096] In another possible implementation, the topological connections between geometric points can be represented through a relational graph, and the pixel coordinates can be updated accordingly. The specific process may include: constructing a topological relational graph of geometric points with the target geometric point as the root node, based on the topological connections between the target geometric point and other geometric points in the vector graphics; and, based on the geometric constraints between graphic elements, updating the pixel coordinates of each node sequentially according to the hierarchical structure of the topological relational graph, starting from the root node, to obtain the updated pixel coordinates of all geometric points in the vector graphics.

[0097] First, the direct or indirect relationships between geometric points in the image are identified. The target geometric point operated on by the user is taken as the root node. Geometric points directly associated with the root node are designated as first-level nodes, geometric points associated with first-level nodes but not directly with the root node are designated as second-level nodes, and so on. A directed acyclic graph (DAG) is used to construct a hierarchical topological relationship graph of geometric points. Simultaneously, geometric constraints (such as lines, arcs, etc.) are defined for the connecting edges that connect two geometric points within the topological relationship graph. This facilitates subsequent updates of the coordinates of each point in the graph based on the constraints of the connecting edges.
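
The level assignment described above corresponds to a breadth-first traversal from the root node. A minimal sketch follows; the adjacency-list input format and the point labels are hypothetical.

```python
from collections import deque


def build_levels(adjacency, root):
    """Assign each reachable geometric point its level (root = 0) by breadth-first search."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in level:  # first visit fixes the shortest-path level
                level[neighbor] = level[node] + 1
                queue.append(neighbor)
    return level


# Hypothetical connections: O is dragged; A and B touch O; G touches only A.
adjacency = {"O": ["A", "B"], "A": ["O", "B", "G"], "B": ["O", "A"], "G": ["A"]}
levels = build_levels(adjacency, "O")
```

Edges between two nodes of the same level (A and B here) do not change the levels; they only carry additional constraints.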

[0098] Taking Figure 3 as an example, the construction of a topological relationship graph of geometric points from the original problem is illustrated as follows. In a digital teaching scenario, the teaching whiteboard displays the geometric image from the original problem. In response to the teacher's dragging operation on point O in the geometric image, the system identifies the connections between the other geometric points A, B, C, D, E, and F and point O, as well as the connections among A, B, C, D, E, and F. A, B, C, D, E, and F are all first-level nodes relative to point O, and connecting edges are drawn that directly connect point O to each of A, B, C, D, E, and F. Simultaneously, based on the connections among A, B, C, D, E, and F, connecting edges are drawn between them.

[0099] Simultaneously, the geometric constraints between the graphic elements obtained in step S130 can be used to define geometric constraints for each connecting edge. For example, connecting edge 1 is defined with the geometric constraint "AO⊥BO∪AO⊥DO∪[AOC] collinear", and connecting edge 2 is defined with the geometric constraint "BO⊥AO∪BO⊥CO∪[BOD] collinear".

[0100] Furthermore, the coordinates of each geometric point are updated based on the constructed geometric point topology graph. Specifically, the transformed coordinates (x0+Δx, y0+Δy) of point O are first determined based on the changes in coordinate values (Δx, Δy) during the user's dragging operation on point O. Then, using the coordinates of the root node O as a reference, and combining the geometric constraints defined by each connecting edge, the updated coordinates of the first-level nodes are calculated. After the first-level nodes are updated, the updated coordinates of the first-level nodes are used as a reference, and the second-level nodes are updated based on the geometric constraints, and so on, until the coordinates of all nodes are updated.
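
The level-by-level update can be sketched as a traversal that moves the root first and then derives each child from its already updated parent. The per-edge rule used here (each child keeps its original distance to its parent) is only a stand-in assumption for the geometric constraints attached to the connecting edges; the function names and sample coordinates are likewise hypothetical.

```python
import math
from collections import deque


def keep_length(parent_old, parent_new, child_old):
    """Assumed edge rule: keep the original parent-child distance."""
    length = math.dist(parent_old, child_old)
    dx, dy = child_old[0] - parent_new[0], child_old[1] - parent_new[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # Degenerate case: fall back to translating the child with its parent.
        return (child_old[0] + parent_new[0] - parent_old[0],
                child_old[1] + parent_new[1] - parent_old[1])
    return (parent_new[0] + length * dx / norm, parent_new[1] + length * dy / norm)


def update_by_level(coords, adjacency, root, delta, rule=keep_length):
    """Move the root by delta, then update nodes level by level from the root."""
    updated = {root: (coords[root][0] + delta[0], coords[root][1] + delta[1])}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in adjacency.get(parent, []):
            if child in updated:
                continue  # already placed from a node closer to the root
            updated[child] = rule(coords[parent], updated[parent], coords[child])
            queue.append(child)
    return {**coords, **updated}  # unreachable points keep their coordinates


coords = {"O": (0.0, 0.0), "A": (3.0, 0.0), "G": (3.0, 4.0)}
adjacency = {"O": ["A"], "A": ["O", "G"], "G": ["A"]}
moved = update_by_level(coords, adjacency, "O", (0.0, 1.0))
```

A different `rule` callable could encode perpendicularity or collinearity instead; the traversal order stays the same.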

[0101] Understandably, by establishing the obtained geometric point topology map, the associated paths between each geometric point and the target geometric point can be quickly located during the coordinate update process, avoiding the tedious problem of finding associations when dealing with complex graphics. At the same time, hierarchical updates can reduce the redundancy of coordinate calculations, reduce the probability of constraint conflicts, and further improve the accuracy and efficiency of coordinate updates.

[0102] Finally, after updating the pixel coordinates of all geometric points, the vector representation of each graphic element is reconstructed using the updated pixel coordinates. The new vector data is then rendered into a visual image and presented to the user, completing the entire interactive loop.

[0103] Based on the above interaction methods, the device can support interactive operations between users and images. During the interaction, the geometric features of each graphic element are preserved, preventing the destruction of the geometric attributes of the graphics. This ensures the rationality and integrity of graphic transformations, makes the editing results meet the user's expectations for graphic editing, and is adaptable to various downstream tasks related to graphic reconstruction. Users can complete operations simply by interacting with the rendered visual image, without needing to concern themselves with the complex structure and processing logic of the underlying vector data. This effectively reduces the difficulty of operation and improves the user experience for various downstream tasks.

[0104] This application also provides an electronic device in its embodiments. Refer to Figure 4, which shows a structural schematic of an electronic device suitable for implementing the embodiments of this application. The electronic device in the embodiments of this application may include, but is not limited to, terminals such as mobile phones, tablets, large-screen teaching displays, and wearable devices. The electronic device shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0105] As shown in Figure 4, the electronic device may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 1, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 2 or a program loaded from a storage device 8 into a random access memory (RAM) 3, to implement the graphics reconstruction method of the foregoing embodiments of this application. When the electronic device is powered on, the RAM 3 also stores various programs and data required for the operation of the electronic device. The processing unit 1, ROM 2, and RAM 3 are interconnected via a bus 4. An input/output (I/O) interface 5 is also connected to the bus 4.

[0106] Typically, the following devices can be connected to the I/O interface 5: input devices 6 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 7 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 8 including, for example, memory cards, hard drives, etc.; and communication devices 9. The communication device 9 allows the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although Figure 4 shows an electronic device with various devices, it should be understood that not all of the devices shown are required to be implemented or present. More or fewer devices may alternatively be implemented or present.

[0107] This application also provides a computer program product including computer-readable instructions, which, when executed on an electronic device, cause the electronic device to implement any of the graphic reconstruction methods provided in this application.

[0108] This application also provides a computer-readable storage medium that carries one or more computer programs. When the one or more computer programs are executed by an electronic device, the electronic device can implement any of the graphic reconstruction methods provided in this application.

[0109] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0110] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0111] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0112] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

[0113] The various embodiments in this specification are described in a progressive manner. Each embodiment focuses on the differences from other embodiments. The various embodiments can be combined as needed, and the same or similar parts can be referred to each other.

Claims

1. An image reconstruction method, characterized in that it comprises: obtaining an original image and context text associated with the original image, the context text containing a textual description of the original image; detecting initial pixel coordinates of graphic elements to be vectorized in the original image; generating, based on the context text, geometric constraints between the graphic elements in the original image; correcting, based on the geometric constraints between the graphic elements, the initial pixel coordinates of all the graphic elements to obtain target pixel coordinates of each graphic element; and generating, based on the target pixel coordinates of all the graphic elements, a vector representation of each graphic element, and constructing a vector graphic of the original image based on the vector representations of all the graphic elements.

2. The image reconstruction method according to claim 1, characterized in that the process of generating geometric constraints between the graphic elements in the original image based on the context text includes: invoking a configured multimodal large model to instruct the multimodal large model to convert the original image into graphic description text, to perform semantic parsing on the context text and the graphic description text, and, based on the semantic parsing results, to infer and generate the geometric constraints between the graphic elements to be vectorized in the original image.

3. The image reconstruction method according to claim 1, characterized in that the method further includes: visually rendering the vector graphic to obtain a corresponding visual image; receiving a geometric transformation operation initiated by a user for a target geometric point in the visual image; updating the target pixel coordinates of all geometric points in the vector graphic based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, and on the geometric constraints between the graphic elements in the vector graphic; and generating a target vector representation of each graphic element based on the pixel coordinate update results of all geometric points, and rendering, based on the target vector representations of all graphic elements, the graphic transformation result after responding to the geometric transformation operation.

4. The image reconstruction method according to claim 3, characterized in that the process of updating the target pixel coordinates of all geometric points in the vector graphic based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, and on the geometric constraints between the graphic elements in the vector graphic, includes: constructing, based on the topological connection relationship between the target geometric point and other geometric points in the vector graphic, a topological relationship graph of geometric points with the target geometric point as the root node; and, based on the geometric constraints between the graphic elements, updating the pixel coordinates of each node sequentially, starting from the root node, according to the hierarchical structure of the topological relationship graph of geometric points, to obtain the pixel coordinate update results of all geometric points in the vector graphic.

5. The image reconstruction method according to claim 1, characterized in that the process of correcting the initial pixel coordinates of all graphic elements based on the geometric constraints between the graphic elements to obtain the target pixel coordinates of each graphic element includes: constructing an objective function whose optimization goal is to minimize the difference between the target pixel coordinates and the initial pixel coordinates of each graphic element, subject to the geometric constraints between the graphic elements; and solving the objective function to obtain the target pixel coordinates of each graphic element.

6. The image reconstruction method according to claim 5, characterized in that the process of constructing the objective function includes: using the difference between the pixel coordinates to be solved and the initial pixel coordinates of each of the graphic elements as the baseline shape-preserving term of the objective function; transforming the geometric constraints between the graphic elements into penalty terms that include the pixel coordinates to be solved; assigning weight coefficients to the baseline shape-preserving term and each of the penalty terms, wherein the weight coefficient of a penalty term obtained from a first type of geometric constraint is higher than the weight coefficient of a penalty term obtained from a second type of geometric constraint, the first type of geometric constraint constraining the pixel coordinates to be solved to take the values of a finite number of discrete points, and the second type of geometric constraint constraining the pixel coordinates to be solved to take values within an interval; and obtaining the objective function by weighted summation of the baseline shape-preserving term and all the penalty terms.

7. The image reconstruction method according to any one of claims 1-6, characterized in that the process of detecting the initial pixel coordinates of the graphic elements to be vectorized in the original image includes: invoking a configured multimodal large model to instruct the multimodal large model to detect the graphic elements to be vectorized in the original image and to output the initial pixel coordinates of all the graphic elements to be vectorized, wherein the multimodal large model is obtained by fine-tuning a basic multimodal large model using an image sample dataset labeled with pixel coordinates of graphic elements.

8. An electronic device, characterized in that it comprises a memory and a processor; the memory is used to store a program; and the processor is used to execute the program to implement the steps of the image reconstruction method as described in any one of claims 1 to 7.

9. A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the image reconstruction method as described in any one of claims 1 to 7.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the image reconstruction method as described in any one of claims 1 to 7.