An image generation method, apparatus, electronic device, and storage medium

By obtaining the layout information of the original image, generating a background image and adding a sample field, the problem that the sample image cannot reflect the layout and background information is solved, thus improving the training accuracy of the deep learning model.

CN115526964B · Active · Publication Date: 2026-06-19 · JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD
Filing Date: 2022-10-20
Publication Date: 2026-06-19

Application Information

IPC: G06T11/60; G06T3/04; G06V30/19; G06V30/146; G06V30/41; G06N3/0475; G06N3/094

AI Technical Summary

⚠Technical Problem

In existing technologies, sample images cannot reflect the layout and background information of the original images, which leads to a decrease in the recognition accuracy of deep learning models.

⚗Method used

By obtaining the layout information of the original image, removing some fields to generate a background image, and determining the sample fields in a pre-built corpus, the sample fields are added to the background image based on their positions to generate a sample image corresponding to the original image.

🎯Benefits of technology

The generated sample images can comprehensively and completely reflect the features of the original images, thus improving the accuracy of deep learning model training.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses an image generation method, apparatus, electronic device, and storage medium. The method includes: acquiring an original image and corresponding layout information; wherein the original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field; removing at least one original field from the original image to obtain a background image; determining a removal corpus corresponding to the field type of the removed original field in at least one pre-constructed corpus, and determining a sample field corresponding to the removed original field in the removal corpus; adding the sample field to the background image based on the field position of the removed original field in the original image to generate a sample image corresponding to the original image. The technical solution of this invention generates a sample image that comprehensively and completely reflects the features of the original image, which is beneficial for improving the accuracy of training results.

Description

Technical Field

[0001] The present invention relates to the field of image processing technology, and in particular to an image generation method, apparatus, electronic device and storage medium.

Background Technology

[0002] In many fields, deep learning models have become a widely used method for recognizing image content. For example, in some applications, deep learning models are used to recognize text information in product description images to quickly identify the product details. However, accurate recognition based on deep learning models requires a large amount of sample image data as technical support.

[0003] Currently, individual field terms can be identified in one or more original images, and sample images can then be generated in batches from the identified terms. However, in the process of implementing this invention, it has been found that the prior art has at least the following technical problem: although sample images whose term content is consistent with the original images can be obtained, they cannot reflect the layout relationships among the terms or the background information of the original images, which reduces the recognition accuracy of the deep learning model.

Summary of the Invention

[0004] This invention provides an image generation method, apparatus, electronic device, and storage medium to achieve the goal of generating sample images that can comprehensively and completely reflect the features of the original image, which is beneficial to improving the accuracy of training deep learning models.

[0005] In a first aspect, embodiments of the present invention provide an image generation method, comprising:

[0006] Obtain the original image and the layout information corresponding to the original image; wherein, the original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field;

[0007] Remove at least one original field from the original image to obtain the background image;

[0008] In at least one pre-built corpus, a removal corpus corresponding to the field type of the original field to be removed is determined, and a sample field corresponding to the original field to be removed is determined in the removal corpus.

[0009] Based on the position of the removed original field in the original image, the sample field is added to the background image to generate a sample image corresponding to the original image.

[0010] In a second aspect, embodiments of the present invention also provide an image generation apparatus, the apparatus comprising:

[0011] A layout information receiving module is used to acquire an original image and layout information corresponding to the original image; wherein, the original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field;

[0012] A background image determination module is used to remove at least one original field from the original image to obtain a background image;

[0013] The sample field determination module is used to determine, in at least one pre-built corpus, a removal corpus corresponding to the field type of the original field to be removed, and to determine, in the removal corpus, a sample field corresponding to the original field to be removed;

[0014] The sample image generation module is used to add the sample field to the background image based on the field position of the removed original field in the original image, thereby generating a sample image corresponding to the original image.

[0015] In a third aspect, embodiments of the present invention also provide an electronic device, the electronic device comprising:

[0016] One or more processors;

[0017] A storage device for storing one or more programs;

[0018] When the one or more programs are executed by the one or more processors, the one or more processors implement the image generation method provided in any embodiment of the present invention.

[0019] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the image generation method provided in any embodiment of the present invention.

[0020] This invention provides an image generation method that obtains an original image and its corresponding layout information. The layout information includes the field type and position of each original field. At least one original field is removed from the original image to obtain a background image. Based on the field type of the removed original field, a removal corpus corresponding to the removed original field can be determined from at least one pre-built corpus. A sample field corresponding to the removed original field is then determined from the removal corpus. Based on the position of the removed original field in the original image, the sample field is added to the background image to generate a sample image corresponding to the original image. This ensures that the generated sample image maintains the same layout information as the original image and still uses the background from the original image as the background of the sample image. This invention solves the problem that sample images in the prior art cannot reflect layout and background information. The generated sample image can comprehensively and completely reflect the features of the original image. When training deep learning models using the sample images generated by this invention, it helps improve the accuracy of the training results.

[0021] Furthermore, the image generation apparatus, electronic device, and storage medium provided by the present invention correspond to the above-described method and have the same beneficial effects.

Description of the Drawings

[0022] To more clearly illustrate the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of an image generation method provided in an embodiment of the present invention;

[0024] Figure 2 is a schematic diagram of an original image provided in an embodiment of the present invention;

[0025] Figure 3 is a schematic diagram of a removed field region before cropping, provided in an embodiment of the present invention;

[0026] Figure 4 is a schematic diagram of field region removal provided in an embodiment of the present invention;

[0027] Figure 5 is a schematic diagram of a background image provided in an embodiment of the present invention;

[0028] Figure 6 is a schematic diagram of a sample image provided in an embodiment of the present invention;

[0029] Figure 7 is a flowchart of another image generation method provided in an embodiment of the present invention;

[0030] Figure 8 is a structural diagram of an image generation device provided in an embodiment of the present invention;

[0031] Figure 9 is a structural diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0032] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention, and not all of the structures.

[0033] Before discussing the exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the operations (or steps) as sequential processes, many of these operations can be performed in parallel, concurrently, or simultaneously. Furthermore, the order of the operations can be rearranged. The process can be terminated when its operations are completed, but it may also have additional steps not included in the figures. The process may correspond to a method, function, procedure, subroutine, subprogram, etc.

[0034] To enable those skilled in the art to better understand the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0035] Before introducing the technical solution, an illustrative application scenario can be provided. This technical solution can generate a large number of sample images that match the layout and background information of the original image based on the acquired original image. These sample images are then used to train an image deep learning model, thereby improving the accuracy of the training results. For example, the original image could be an image used to display items or to explain scenery. Before the deep learning model can perform deep learning on images such as item display images or scenery explanation images, it needs a large number of sample images for training. The solution in this embodiment can quickly and efficiently generate sample images containing the layout and background information of the original image.

[0036] Figure 1 is a flowchart illustrating an image generation method provided in an embodiment of the present invention. This embodiment is applicable to situations where sample images containing the layout and background information of the original image are generated for training a deep learning model. The method can be executed by an image generation device, which can be implemented through software and/or hardware, and can be configured in a terminal and/or server to implement the image generation method of this embodiment.

[0037] As shown in Figure 1, the method in this embodiment may specifically include:

[0038] S110. Obtain the original image and the layout information corresponding to the original image.

[0039] The original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field.

[0040] In this embodiment, the original image can be the image that the deep learning model to be trained needs to recognize, such as a book introduction image, a landscape introduction image, or a historical figure introduction image. The original field can be a field in the original image composed of at least one of the following: text, letters, numbers, and punctuation marks. The field type can be used to reflect the content characteristics expressed by the original field. For example, if the original image is a book introduction image and includes fields such as "Book A" and "Author B," then the field type corresponding to "Book A" is the book title type, and the field type corresponding to "Author B" is the book author type. Each line of text in the original image can be called a field, or all the text corresponding to each field type can be called a field. The field position can be the image position of each original field in the original image. For example, the field position can include the coordinates of the four vertices of the smallest bounding rectangle to which the field belongs in the original image, or the coordinates of the center point of the original field in the original image, etc.
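The layout information described above can be pictured as a small record per original field. The following is a minimal sketch, not the patent's data format: the class name `OriginalField`, the type labels, and all coordinates are hypothetical, and the field position is stored as the four vertices of the bounding rectangle, with the center point derived from them.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass(frozen=True)
class OriginalField:
    """One original field plus its layout information (hypothetical schema)."""
    text: str
    field_type: str
    vertices: Tuple[Point, Point, Point, Point]  # corners of the bounding rectangle

    def center(self) -> Tuple[float, float]:
        """Center point of the bounding rectangle, an alternative field position."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (sum(xs) / 4, sum(ys) / 4)


# Made-up layout information for a book introduction image.
layout: List[OriginalField] = [
    OriginalField("Book A", "book_title", ((40, 20), (160, 20), (160, 50), (40, 50))),
    OriginalField("Author B", "book_author", ((40, 60), (140, 60), (140, 80), (40, 80))),
]

title_center = layout[0].center()
```

Either representation (four vertices or a center point) carries the field position; the sketch keeps the vertices because the center can always be recomputed from them.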

[0041] To illustrate this solution in detail, a book introduction image can be used as the original image. As shown in Figure 2, "Book A", "Natural Science", and "C12345" are all original fields of the original image. "C12345" corresponds to the book classification number type, "Book A" to the book title type, and "Natural Science" to the book type. The image also includes fields such as the book introduction, author information, publication date, and author. The field position of "Book A" can be the pixel coordinates of the vertices of the smallest bounding rectangle of "Book A" within the book introduction image.

[0042] In practice, to obtain richer and more diverse sample images, two or more original images and the layout information corresponding to each original image can be acquired. The layout information can also include writing feature information such as the font size, color, font and character fill of the characters that make up each original field. This is beneficial to generate sample images corresponding to the original images in a more vivid and comprehensive way by using the acquired layout information.

[0043] S120. Remove at least one original field from the original image to obtain the background image.

[0044] The background image can be the image composed of the remaining part of the original image after removing at least one original field. Different background images are obtained depending on the number and / or type of the original fields removed.

[0045] As shown in Figure 2, when all fields in the image are removed, the background image consists of the book cover image, dotted background, and diagonal background. When only some original fields such as "Book A", "Author B", "Natural Science", "××.××-××.××", "Author B..." and "This Book..." are removed, the background image can be composed of the remaining original fields, the book cover image, dotted background, and diagonal background.

[0046] In specific implementation, the method of removing at least one original field from the original image to obtain a background image includes: determining at least one field to be removed from each original field of the original image, determining the field region corresponding to the field to be removed in the original image, cropping the field region in the original image; extracting the background of the field region corresponding to the field to be removed, and adding the background of the field to the cropped region corresponding to the original image to obtain the background image.

[0047] The removed field can be the field that distinguishes the sample image to be generated from the original image, i.e., the field that the deep learning model to be trained needs to recognize. Those skilled in the art can determine the original field corresponding to the recognition field type in the original image as the removed field according to the recognition field type that the deep learning model to be trained actually needs to recognize. Only one removed field can be determined, or multiple removed fields can be determined at the same time.

[0048] The removed field region can be any region in the original image that contains the removed field, and it is determined by the field position of the removed field. The removed field region includes both the removed field and the removed region background, that is, the portion of the original image's background that falls within the removed field region.

[0049] In this embodiment, a corresponding removed field region can be determined for each removed field. In specific implementations, when the original image contains only one original field and that field is the removed field, the removed field region can be any image region in the original image that contains the removed field. When the original image contains two or more original fields, to avoid overlap and interference between the removed field regions, the minimum bounding region of the removed field in the original image can be determined based on the field position of the removed field and used as the removed field region. For example, the minimum bounding rectangle, minimum bounding ellipse, or minimum bounding trapezoid of the removed field in the original image can be used as the removed field region. As shown in Figure 3, when the removed fields are "Book A" and "Author B", the areas within the dashed boxes corresponding to "Book A" and "Author B" can be defined as the removed field regions.
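As an illustration of the minimum bounding rectangle case, the sketch below computes an axis-aligned rectangle from the pixel coordinates occupied by a field and checks that two such regions do not overlap. The function names, the `(left, top, right, bottom)` convention with exclusive right and bottom edges, and the coordinates are assumptions made for this example only.

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def min_bounding_rect(points: Iterable[Tuple[int, int]]) -> Rect:
    """Smallest axis-aligned rectangle containing every (x, y) pixel in `points`."""
    pts = list(points)
    if not pts:
        raise ValueError("a field must cover at least one pixel")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Made-up extreme pixels of two removed fields.
book_a_region = min_bounding_rect([(40, 20), (159, 49)])
author_b_region = min_bounding_rect([(40, 60), (139, 79)])
regions_are_disjoint = not rects_overlap(book_a_region, author_b_region)
```

Using the tightest rectangle per field is what keeps neighbouring removed field regions from interfering with each other in this toy setup.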

[0050] Furthermore, the removed field region can be cropped from the original image. After cropping, the cropped portion can be left blank or filled with a preset color and background. For example, after the regions corresponding to "Book A" and "Author B" are cropped, the resulting image is as shown in Figure 4.

[0051] To ensure that the background image reflects the background information of the original image more accurately and completely, the removed region background can be added to the corresponding cropped region of the original image to obtain a complete background image. For example, as shown in Figure 5, the removed region background for "Book A" and "Author B" is a diagonal background. This diagonal background can be filled into the cropped regions of the original image to obtain the background image after the original fields are removed.
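The crop-then-fill sequence can be sketched on a toy grayscale image stored as nested lists. The background estimate here is a deliberately simple stand-in chosen for this example (the most common pixel value on the one-pixel ring around the region); it is not the background extraction the patent describes, and a uniform fill cannot reproduce a patterned background such as the diagonal one in Figure 5. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop_region(image: Image, rect: Rect, fill: int = -1) -> Image:
    """Return a copy of `image` with every pixel inside `rect` set to `fill`."""
    left, top, right, bottom = rect
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out


def estimate_background(image: Image, rect: Rect) -> int:
    """Assumed heuristic: most common value on the 1-pixel ring around `rect`."""
    left, top, right, bottom = rect
    height, width = len(image), len(image[0])
    ring = [
        image[y][x]
        for y in range(max(top - 1, 0), min(bottom + 1, height))
        for x in range(max(left - 1, 0), min(right + 1, width))
        if not (left <= x < right and top <= y < bottom)
    ]
    if not ring:
        raise ValueError("region covers the whole image; no border to sample")
    return Counter(ring).most_common(1)[0][0]


def build_background(image: Image, rect: Rect) -> Image:
    """Crop the removed field region, then fill it with the estimated background."""
    return crop_region(image, rect, fill=estimate_background(image, rect))


# 4x6 image with background value 7 and three "text" pixels of value 0.
original = [[7] * 6 for _ in range(4)]
for x, y in [(1, 1), (2, 1), (3, 2)]:
    original[y][x] = 0
field_rect = (1, 1, 4, 3)
cropped = crop_region(original, field_rect)          # blanked region, as in Figure 4
background = build_background(original, field_rect)  # filled region, as in Figure 5
```

Only the pixels around the removed field region are inspected, which mirrors the point that the background of the whole image does not need to be analysed.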

[0052] In practical implementation, the removed region background can be extracted as follows: the removed field region and the removed field are input into a pre-trained deep learning network for background extraction, which outputs the removed region background. Specifically, this network can be pre-trained by using a generator to generate the background information of the removed field region of the original image and supervising the generated background information so that it is similar to the actual background information of the removed field region, thereby improving the accuracy with which the network extracts the removed region background. Alternatively, the removed region background can be determined directly through image information recognition and processing.

[0053] In this embodiment, the removed region background is extracted from the removed field region and added to the cropped region of the original image to obtain the background image. That is, in practice, only the background of the relatively small removed field region needs to be extracted, without identifying and determining the background information of every pixel in the original image. This greatly reduces the workload of determining the background image and helps improve efficiency.

[0054] Optionally, determining at least one removed field from the original fields of the original image may include: identifying, based on the field type, the original fields in the original image whose field type is an added field type, and determining at least one of the identified original fields as a removed field.

[0055] In this embodiment, the field types include fixed field types and added field types. The original fields corresponding to the fixed field types are fields that do not need to be replaced. For example, for any book introduction image, fields such as "book name", "book author", "book type", and "publication date" are required. Therefore, these fields can appear as fixed fields in the generated sample image. As for fields such as "Book A", "Author B", and "C12345", different books have different names, authors, book numbers, etc. Therefore, they can be set as the original fields of the added field type.

[0056] In practical implementation, only the fields to be removed can be identified from the original fields of the added field type and removed, while the original fields of the fixed field type remain unchanged. Through the implementation method in this embodiment, for original images with many original fields of the fixed field type, the original fields of the fixed field type can be regarded as part of the background image. This ensures the accuracy of the generated sample image while flexibly determining the background area, greatly reducing the total workload of replacing original fields when generating sample images and improving the efficiency of sample image generation.
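The split between fixed and added field types amounts to a filter over the layout information. In this minimal sketch the `Field` tuple, the type labels, and the set `ADDED_FIELD_TYPES` are all hypothetical; which types count as added would depend on what the model being trained has to recognize.

```python
from typing import List, NamedTuple, Set


class Field(NamedTuple):
    text: str
    field_type: str


# Assumed configuration: types whose content differs from book to book.
ADDED_FIELD_TYPES: Set[str] = {"book_title", "book_author", "classification_number"}


def select_removed_fields(fields: List[Field], added_types: Set[str]) -> List[Field]:
    """Keep only fields of an added field type; fixed-type fields stay in the background."""
    return [field for field in fields if field.field_type in added_types]


fields = [
    Field("Book name", "fixed_label"),
    Field("Book A", "book_title"),
    Field("Book author", "fixed_label"),
    Field("Author B", "book_author"),
]
removed_fields = select_removed_fields(fields, ADDED_FIELD_TYPES)
```

Fields that are filtered out are never cropped, so they remain part of the background image without any extra work.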

[0057] S130. In at least one pre-built corpus, determine the removal corpus corresponding to the field type of the original field to be removed, and determine the sample field corresponding to the original field to be removed in the removal corpus.

[0058] The corpus stores at least one sample field, which is used to replace the removed field in the original image to generate a sample image. Those skilled in the art can set the number of sample fields stored in the corpus according to the number of sample images to be generated. Corpora can be pre-built based on the field type of the original field. For example, for each field type of the original field, a matching corpus can be built, where the field types of the sample fields stored in this corpus are consistent with the field types of the original field. The removal corpus is the corpus that matches the field type of the removed field.

[0059] For example, for the "Book Title" field type in the original image, a corresponding "Book Title" corpus can be constructed. This corpus may include fields of book title type such as "a1 book", "a2 book", "a3 book", "a4 book", etc. Similarly, for the "Book Author" field type in the original image, a corresponding "Book Author" corpus can be constructed. This corpus may include fields of book author type such as "b1 author", "b2 author", "b3 author", "b4 author", etc.
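One way to picture per-type corpora is a mapping from field type to a list of candidate sample fields. The sketch below groups labelled strings by type; the function name `build_corpora`, the type labels, and the de-duplication step are assumptions made for this example rather than details from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_corpora(labeled_fields: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (field_type, text) pairs into one corpus per field type, without duplicates."""
    corpora: Dict[str, List[str]] = defaultdict(list)
    for field_type, text in labeled_fields:
        if text not in corpora[field_type]:
            corpora[field_type].append(text)
    return dict(corpora)


corpora = build_corpora([
    ("book_title", "a1 book"),
    ("book_author", "b1 author"),
    ("book_title", "a2 book"),
    ("book_title", "a1 book"),  # duplicate, stored only once
    ("book_author", "b2 author"),
])

# The removal corpus for a removed field is simply the corpus of its field type.
removal_corpus = corpora["book_title"]
```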

[0060] In practical implementation, when the removal corpus includes two or more fields, the number of sample fields to select can be determined based on the number of sample images to be generated. For example, let n be the number of sample images to be generated, where n is a positive integer greater than 1. When n is less than or equal to the number of fields in the removal corpus, n fields can be determined from the removal corpus as sample fields. For example, n fields can be selected at random from the removal corpus; or the number of characters in the removed field can be determined, and n fields whose character count is less than or equal to that number can be selected from the removal corpus as sample fields, so that the selected sample fields can be added to the removed field region without overlapping other fields.

[0061] When n is greater than the number of fields in the removal corpus, in order to obtain different sample images, the fields in the removal corpora corresponding to the respective removed fields can be randomly combined to obtain n different field combinations, and the fields in each field combination are determined as sample fields.
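The two selection cases can be sketched with the standard library. This is an illustrative reading of the text, not the patent's procedure: the function names are hypothetical, the character limit is passed in by the caller, and the combination case enumerates the full Cartesian product first, which is only practical for small corpora.

```python
import itertools
import random
from typing import List, Optional, Sequence, Tuple


def pick_sample_fields(corpus: Sequence[str], n: int,
                       max_chars: Optional[int] = None,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Case n <= corpus size: draw n distinct fields, optionally limited by length."""
    rng = rng or random.Random()
    candidates = [t for t in corpus if max_chars is None or len(t) <= max_chars]
    if n > len(candidates):
        raise ValueError("not enough candidates; combine corpora instead")
    return rng.sample(candidates, n)


def pick_field_combinations(corpora: Sequence[Sequence[str]], n: int,
                            rng: Optional[random.Random] = None) -> List[Tuple[str, ...]]:
    """Case n > corpus size: draw n distinct combinations, one field per removed field."""
    rng = rng or random.Random()
    all_combinations = list(itertools.product(*corpora))
    if n > len(all_combinations):
        raise ValueError("fewer than n distinct combinations exist")
    return rng.sample(all_combinations, n)


titles = ["a1 book", "a2 book", "a3 book"]
authors = ["b1 author", "b2 author"]
rng = random.Random(0)  # fixed seed only to make the example repeatable

# Two title replacements no longer than the removed field "Book A1" (7 characters).
picked_titles = pick_sample_fields(titles, 2, max_chars=7, rng=rng)

# Five sample images from corpora of size 3 and 2: combine them (3 * 2 = 6 options).
picked_combinations = pick_field_combinations([titles, authors], 5, rng=rng)
```

Sampling without replacement is what guarantees that the n generated sample images differ from one another.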

[0062] S140. Based on the field position of the removed original field in the original image, add the sample field to the background image to generate a sample image corresponding to the original image.

[0063] In this embodiment, the field position corresponding to the removed original field may include a field start position and a field center position. The field start position may include the coordinates of a first vertex, which is either left vertex of the first minimum bounding rectangle to which the removed original field belongs; the field center position may include the coordinates of a first center point, which is the center point of that first minimum bounding rectangle.

[0064] Specifically, adding the sample field to the background image based on the field start position includes: determining a second minimum bounding rectangle of the sample field; using the coordinates of the first vertex as the coordinates of a second vertex, where the second vertex is the corresponding left vertex of the second minimum bounding rectangle; and adding the sample field to the background image according to the coordinates of the second vertex.

[0065] Specifically, adding the sample field to the background image based on the field center position includes: determining the second minimum bounding rectangle of the sample field; using the coordinates of the first center point as the coordinates of a second center point, where the second center point is the center point of the second minimum bounding rectangle; and adding the sample field to the background image according to the coordinates of the second center point.
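Both placement rules reduce to simple coordinate arithmetic once the size of the sample field's bounding rectangle is known. The sketch below assumes the top-left vertex is the one chosen as the first vertex and that the sample field's width and height are already measured; the function names and numbers are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)
Size = Tuple[float, float]                # (width, height)


def place_by_start(first_rect: Rect, sample_size: Size) -> Rect:
    """Second rectangle whose top-left vertex equals the first rectangle's top-left vertex."""
    left, top = first_rect[0], first_rect[1]
    width, height = sample_size
    return (left, top, left + width, top + height)


def place_by_center(first_rect: Rect, sample_size: Size) -> Rect:
    """Second rectangle whose center point equals the first rectangle's center point."""
    center_x = (first_rect[0] + first_rect[2]) / 2
    center_y = (first_rect[1] + first_rect[3]) / 2
    width, height = sample_size
    return (center_x - width / 2, center_y - height / 2,
            center_x + width / 2, center_y + height / 2)


removed_rect = (40, 20, 160, 50)  # first minimum bounding rectangle (made-up values)
sample_size = (100, 30)           # measured size of the rendered sample field

start_aligned = place_by_start(removed_rect, sample_size)
center_aligned = place_by_center(removed_rect, sample_size)
```

Start alignment suits left-aligned layouts, while center alignment keeps a shorter or longer sample field visually balanced within the removed field region.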

[0066] In this embodiment, adding a sample field to a background image includes: determining the writing features of the original field corresponding to the sample field; performing feature transformation on the sample field according to the writing features; updating the field obtained after feature transformation to the sample field; and adding the updated sample field to the background image.

[0067] The writing features include at least one of the following: font size, font type, text color, and text fill.

[0068] To more comprehensively and completely represent the writing characteristics of the original fields in the original image, feature transformation can be performed on the sample fields before adding them to the background image. For example, the content of the removed original field is "B writer," font size 5, font type SimSun, color black, and fill color red-blue gradient; the content of the sample field is "D writer," font size 4, font type KaiTi, color red, and no fill. The content of the sample field can be transformed according to the writing characteristics of the removed original field, that is, the content "D writer" can be converted to a format with a font size of 5, font type SimSun, color black, and fill color red-blue gradient, and then the feature-transformed sample field can be added to the background image. This embodiment, by transforming the writing characteristics of the sample field according to the writing characteristics of the corresponding original field, helps to improve the accuracy of the generated sample image.
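The example above can be expressed as copying a set of style attributes from the original field onto the sample field while keeping the sample field's text. This is a minimal sketch of the direct (non-learned) variant; the class names and attribute names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WritingFeatures:
    font_size: int
    font_type: str
    color: str
    fill: str


@dataclass(frozen=True)
class StyledField:
    text: str
    features: WritingFeatures


def transform_features(sample: StyledField, original: StyledField) -> StyledField:
    """Keep the sample field's text but adopt the original field's writing features."""
    return replace(sample, features=original.features)


original_field = StyledField(
    "B writer", WritingFeatures(5, "SimSun", "black", "red-blue gradient"))
sample_field = StyledField(
    "D writer", WritingFeatures(4, "KaiTi", "red", "none"))

updated_sample = transform_features(sample_field, original_field)
```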

[0069] In this embodiment, feature transformation of sample fields according to writing characteristics may include: directly using image processing to identify the writing characteristics of the original field, and adjusting the writing characteristics of the sample field according to the identified writing characteristics, ultimately obtaining a sample field that matches the writing characteristics of the original field. Alternatively, feature transformation of sample fields according to writing characteristics may also include: inputting the original field and the sample field into a pre-trained deep learning network to perform feature transformation on the sample field. Specifically, the deep learning network used for feature transformation can be pre-trained by adversarial training based on the generator and discriminator in the deep learning network to generate a deep learning network that meets accuracy requirements; the sample field and the corresponding original field are both input into the trained deep learning network to generate a sample field corresponding to the writing characteristics of the original field. Obtaining the feature-transformed sample field through the generator in the deep learning network improves the convenience and effectiveness of feature transformation.

[0070] In practice, adding sample fields to a background image can be done by overlaying the sample fields onto the background image. To display the information of the sample fields more clearly and richly in the sample image, the background pattern portion that overlaps with the sample fields in the overlaid background image can be removed, leaving only the color, fill, and other features of the sample fields in the overlapping portion; alternatively, by adjusting the transmittance of the sample fields and the overlapping background pattern portion, the content of the sample fields and the content of the background pattern portion can be highlighted to different degrees, thereby enhancing the diversity of the generated sample images.
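The transmittance adjustment can be illustrated with ordinary alpha compositing, which is an assumption about how the overlay might be computed rather than something the text specifies. Here `field_opacity` is a hypothetical parameter equal to one minus the transmittance of the sample field: at 1.0 the background pattern under the field disappears, and lower values let it show through.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def blend_pixel(field_rgb: RGB, background_rgb: RGB, field_opacity: float) -> RGB:
    """Composite one sample-field pixel over one background pixel."""
    if not 0.0 <= field_opacity <= 1.0:
        raise ValueError("field_opacity must be between 0 and 1")
    return tuple(
        round(field_opacity * f + (1.0 - field_opacity) * b)
        for f, b in zip(field_rgb, background_rgb)
    )


text_pixel = (0, 0, 0)            # black text
pattern_pixel = (200, 100, 50)    # made-up background pattern colour

opaque = blend_pixel(text_pixel, pattern_pixel, 1.0)       # pattern fully hidden
half = blend_pixel(text_pixel, pattern_pixel, 0.5)         # both partly visible
transparent = blend_pixel(text_pixel, pattern_pixel, 0.0)  # only the pattern remains
```

Varying the opacity across generated images is one simple way to obtain the diversity mentioned above.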

[0071] In this embodiment, a sample field region containing the sample fields can also be obtained by adding the sample fields to the extracted background area. This sample field region is then added to the corresponding position of the background image's removed field region to generate a sample image. Similarly, deep learning or image recognition processing can be used to add the sample fields to the background area to obtain the sample field region.

[0072] To show in detail the differences between the sample image and the original image: as shown in Figure 2 and Figure 6, "Book A" in the original image is replaced with "Book a1" in the sample image, and "Author B" in the original image is replaced with "Author b1" in the sample image. The original fields corresponding to book type, author introduction, and book description are all replaced with sample fields. However, the layout information of the original image is still retained in the sample image; that is, the field type and field position have not changed, and the background information in the sample image is consistent with the background information in the original image. This embodiment of the invention can be used to generate a large number of sample images. While generating the sample images, the layout information of the sample images is determined simultaneously. When using the sample images for deep learning model training, there is no need to annotate the layout information of the sample images again, which helps to reduce training costs.

[0073] The image generation method provided by this invention involves obtaining an original image and its corresponding layout information. The original image includes at least one original field, and the layout information includes the field type and position of each original field. At least one original field is removed from the original image to obtain a background image. Based on the field type of the removed original field, a removal corpus corresponding to the removed original field can be determined from at least one pre-built corpus. A sample field corresponding to the removed original field is then determined from the removal corpus. Based on the position of the removed original field in the original image, the sample field is added to the background image to generate a sample image corresponding to the original image. This ensures that the generated sample image maintains the same layout information as the original image and still uses the original image's background as the background of the sample image. This solves the problem in existing technologies where sample images cannot reflect layout and background information. The generated sample image comprehensively and completely reflects the features of the original image. When training a deep learning model using the sample image generated by this invention, it helps improve the accuracy of the training results.

[0074] Figure 7 is a flowchart of another image generation method provided by an embodiment of the present invention; this embodiment is an optimization based on the above-described technical solutions. Optionally, before obtaining the background image, the maximum image range occupied by the original field in the original image can also be determined; optionally, based on the maximum image range, the sample field corresponding to the original field to be removed is determined in the removal corpus. The explanations of terms that are the same as or corresponding to those in the above embodiments will not be repeated here.

[0075] As shown in Figure 7, the method in this embodiment may specifically include:

[0076] S210. Obtain the original image and the layout information corresponding to the original image.

[0077] In this embodiment, to increase the diversity and richness of the generated sample images and improve the training accuracy of the deep learning model, the layout information of the original image can be updated, and corresponding sample images can be generated based on the original images with different layout information. Optionally, before determining the maximum image range occupied by each original field in the original image, the movement path of the original field is determined based on preset layout requirements; according to the movement path, the original field is moved on the original image, and the original image and layout information are updated based on the moved image.

[0078] The layout requirements can be as follows: the distance between the original field and the edge of the original image is greater than a preset threshold to prevent the original field from being too close to the image edge and difficult to recognize; or the distance between the original fields in the original image is less than a preset threshold to increase the correlation between the fields; or the fields in the image after movement do not overlap to ensure that the content of each original field displayed in the original image is complete and clear. The movement path can be any path that meets the layout requirements, and can be a straight line, polyline, or curve between the position before and after movement.

[0079] In practical implementation, a path that meets the layout requirements and has a total length greater than a preset length threshold can be determined as the movement path. This increases the degree of change in the movement of the original fields and enhances the distinction between the original image after the layout information is updated and the original image before the update. The preset length threshold is less than the diagonal length of the original image. Those skilled in the art can set the preset length threshold according to the actual application, and this embodiment of the invention does not limit this. Furthermore, the original fields can be moved according to the movement path. Based on the image obtained after the original fields to be moved are moved, the original image is updated, and new layout information is obtained based on the movement path.
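The layout requirements and the path-length condition above can be sketched as a validity check for a candidate move. This is an illustration under stated assumptions: only a straight-line path is considered, fields are modeled as axis-aligned rectangles, and the names `Box`, `overlaps`, `move_is_valid`, `edge_margin`, and `min_path_len` are hypothetical rather than taken from the specification.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned field rectangle; (x, y) is the top-left corner (assumed model)."""
    x: float
    y: float
    w: float
    h: float

    def moved(self, dx: float, dy: float) -> "Box":
        return Box(self.x + dx, self.y + dy, self.w, self.h)


def overlaps(a: Box, b: Box) -> bool:
    """True when the two rectangles share interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def move_is_valid(field: Box, dx: float, dy: float, others: List[Box],
                  img_w: float, img_h: float,
                  edge_margin: float, min_path_len: float) -> bool:
    """Check a straight-line move (dx, dy) against the layout requirements."""
    # The preset length threshold must be less than the image diagonal.
    if min_path_len >= math.hypot(img_w, img_h):
        raise ValueError("length threshold must be below the image diagonal")
    # The path must be longer than the threshold to change the layout enough.
    if math.hypot(dx, dy) <= min_path_len:
        return False
    target = field.moved(dx, dy)
    # Every side of the moved field must stay farther than edge_margin from the edge.
    away_from_edges = (target.x > edge_margin and target.y > edge_margin
                       and target.x + target.w < img_w - edge_margin
                       and target.y + target.h < img_h - edge_margin)
    # The moved field must not overlap any other field.
    return away_from_edges and not any(overlaps(target, o) for o in others)
```

A polyline or curved path could be handled the same way by summing segment lengths before applying the threshold; the non-overlap and edge-distance checks depend only on the final position.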

[0080] This embodiment improves the diversity and richness of sample images by transforming and updating the layout information of the original images, which helps to enhance the generalization ability of the deep learning model to be trained.

[0081] S220. Based on the layout information and preset layout requirements, determine the maximum image range occupied by each original field in the original image.

[0082] In this embodiment, the preset layout requirements include that the original fields in the original image do not overlap. The maximum image range may include the maximum area range and / or the maximum number of characters. The maximum area range is the largest image area composed of the image positions where the original fields are placed, when the layout requirements are met; the maximum area range can be a region composed of rectangles, circles, trapezoids, and irregular shapes. The maximum number of characters is the maximum number of characters that can be placed in the original image when the layout requirements are met.

[0083] S230. Remove at least one original field from the original image to obtain the background image.

[0084] S240. In at least one pre-built corpus, determine the removal corpus corresponding to the field type of the original field to be removed, and based on the maximum image range, determine the sample field corresponding to the original field to be removed in the removal corpus.

[0085] When the maximum image range is the maximum number of characters, the character count of each field in the removal corpus can be determined, and the fields whose character count is less than or equal to the maximum number of characters can be determined as sample fields, thereby ensuring that no field overlap occurs when the sample fields are added to the background image.
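The character-count filter can be sketched in a few lines. The function name `select_by_char_count` and the representation of the removal corpus as a list of strings are assumptions made for illustration.

```python
from typing import List


def select_by_char_count(removal_corpus: List[str], max_chars: int) -> List[str]:
    """Keep the corpus entries whose length does not exceed the character limit.

    `removal_corpus` is assumed to be a plain list of candidate field texts.
    """
    return [text for text in removal_corpus if len(text) <= max_chars]


# Example: with a limit of 7 characters, the long title is excluded.
candidates = select_by_char_count(
    ["Book a1", "A much longer replacement title", "Bk"], 7
)
```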

[0086] When the maximum image range is the maximum region range, the methods for determining the sample field corresponding to the original field to be removed in the removal corpus include: determining the field-occupied region range of each field in the removal corpus; and determining the fields in the removal corpus that meet the preset range requirements as sample fields.

[0087] The range requirement includes that the field-occupied region range be less than or equal to the maximum image range. The field-occupied region range can be the minimum bounding region of each field in the removal corpus.

[0088] To facilitate a quick and intuitive determination of whether a sample field meets the range requirements, the range of the area occupied by the field can be set to the minimum bounding rectangle range of the field in the corpus; the maximum image range is the maximum rectangle range; the range requirements include that the minimum horizontal range of the minimum bounding rectangle range is less than or equal to the maximum horizontal range of the maximum rectangle range, and the minimum vertical range of the minimum bounding rectangle range is less than or equal to the maximum vertical range of the maximum rectangle range.

[0089] In practical implementation, the vertex positions of each vertex of the maximum rectangular range of each original field can be determined by the field positions of each original field in the layout information. The maximum horizontal range is determined based on the distance between two vertex positions in the horizontal direction, and the maximum vertical range is determined based on the distance between two vertex positions in the vertical direction. In this embodiment, by defining fields whose occupied area is less than or equal to the maximum image range as sample fields, it can be ensured that there is no overlap between fields after the sample fields are superimposed on the background image. Furthermore, by setting the maximum image range as a rectangular area, the vertex positions can be intuitively and conveniently compared with the occupied area range of the fields, which helps to improve the efficiency of determining sample fields.
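The rectangle comparison described in the two preceding paragraphs can be sketched as follows. The names `rect_extent`, `fits`, and `select_by_region` are hypothetical, and the sketch assumes that the rendered width and height of each candidate field's minimum bounding rectangle have already been measured; how they are measured is not specified here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Size = Tuple[float, float]   # (width, height) of a minimum bounding rectangle


def rect_extent(vertices: List[Point]) -> Size:
    """Horizontal and vertical extent of a rectangle given its vertex positions."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return max(xs) - min(xs), max(ys) - min(ys)


def fits(field_size: Size, max_rect_vertices: List[Point]) -> bool:
    """True when the field's bounding rectangle fits in the maximum rectangle."""
    field_w, field_h = field_size
    max_w, max_h = rect_extent(max_rect_vertices)
    return field_w <= max_w and field_h <= max_h


def select_by_region(candidate_sizes: Dict[str, Size],
                     max_rect_vertices: List[Point]) -> List[str]:
    """Return the candidate texts whose bounding rectangle meets the range requirement.

    `candidate_sizes` maps each removal-corpus entry to its pre-measured size
    (an assumed input format).
    """
    return [text for text, size in candidate_sizes.items()
            if fits(size, max_rect_vertices)]
```

Because only two comparisons per candidate are needed, every entry of the removal corpus can be checked cheaply, which is consistent with the stated goal of filtering out all usable sample fields.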

[0090] This embodiment determines the sample fields based on the maximum image range, which ensures that no field overlap occurs; furthermore, by using the maximum image range, all fields in the corpus that can be used as sample fields can be filtered out, avoiding field omissions and improving the diversity of sample field selection.

[0091] S250. Based on the field position of the removed original field in the original image, add the sample field to the background image to generate a sample image corresponding to the original image.

[0092] The image generation method provided in this embodiment of the invention generates sample images that can comprehensively and completely reflect the features of the original image. When training a deep learning model using the sample images generated in this embodiment of the invention, it helps to improve the accuracy of the training results.
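Steps S210 to S250 can be summarized with a layout-level sketch. It deliberately leaves out the pixel operations (field removal and overlay) and only shows how the layout information is carried over unchanged while the field text is replaced. The `Field` record, the `generate_sample` function, the use of a character limit as the maximum image range, and the choice to keep the original field when no candidate fits are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    field_type: str   # e.g. "title" or "author" (hypothetical type names)
    text: str
    x: int            # field position in the image
    y: int
    max_chars: int    # maximum image range from S220, here a character limit


def generate_sample(layout: List[Field], corpora: Dict[str, List[str]],
                    rng: Optional[random.Random] = None) -> List[Field]:
    """Replace each field's text with a fitting entry from its removal corpus."""
    rng = rng or random.Random()
    sample: List[Field] = []
    for original in layout:
        # S240: pick the removal corpus by field type, then filter by range.
        removal_corpus = corpora.get(original.field_type, [])
        candidates = [t for t in removal_corpus if len(t) <= original.max_chars]
        if not candidates:
            # Assumption: a field without a fitting candidate is left in place.
            sample.append(original)
            continue
        # S250: the sample field takes over the original type and position.
        sample.append(Field(original.field_type, rng.choice(candidates),
                            original.x, original.y, original.max_chars))
    return sample
```

Since the returned records already contain the field type and position of every sample field, the sample image needs no further layout annotation before it is used for training.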

[0093] Figure 8 is a structural diagram of an image generation apparatus provided in an embodiment of the present invention. This apparatus is used to execute the image generation method provided in any of the above embodiments. This apparatus and the image generation methods of the above embodiments belong to the same inventive concept. Details not described in the embodiments of the image generation apparatus can be found in the embodiments of the above image generation methods. Specifically, the apparatus may include:

[0094] The layout information receiving module 10 is used to obtain the original image and the layout information corresponding to the original image; wherein, the original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field;

[0095] Background image determination module 11 is used to remove at least one original field from the original image to obtain a background image;

[0096] The sample field determination module 12 is used to determine, in at least one pre-built corpus, the removal corpus corresponding to the field type of the original field to be removed, and to determine the sample field corresponding to the original field to be removed in the removal corpus.

[0097] The sample image generation module 13 is used to add the sample field to the background image based on the field position of the removed original field in the original image, thereby generating a sample image corresponding to the original image.

[0098] Based on any optional technical solution in the embodiments of the present invention, the device may optionally further include:

[0099] The maximum image range determination module is used to determine the maximum image range occupied by each original field in the original image based on the layout information and preset layout requirements before removing at least one original field from the original image to obtain the background image; wherein, the preset layout requirements include that the original fields in the original image do not overlap or duplicate.

[0100] The sample field determination module 12 includes:

[0101] The sample field determination unit is used to determine, based on the maximum image range, the sample field corresponding to the original field to be removed in the removal corpus.

[0102] Based on any optional technical solution in the embodiments of the present invention, the device may optionally further include:

[0103] The movement path determination module is used to determine the movement path of the original fields based on the layout requirements before determining the maximum image range occupied by each original field in the original image;

[0104] The layout information update module is used to move the original field on the original image according to the moving path, and update the original image and the layout information based on the image obtained by the movement.

[0105] Based on any optional technical solution in the embodiments of the present invention, optionally, the sample field determination unit includes:

[0106] The field occupancy range determination subunit is used to determine the field-occupied region range of each field in the removal corpus;

[0107] The sample field determination subunit is used to determine the fields in the removal corpus that meet the preset range requirements as sample fields; wherein, the range requirements include that the field-occupied region range is less than or equal to the maximum image range.

[0108] Based on any optional technical solution in the embodiments of the present invention, optionally, the range of the field occupied area is the minimum bounding rectangle range of the field in the corpus, and the maximum image range is the maximum rectangle range; the range requirement includes that the minimum horizontal range of the minimum bounding rectangle range is less than or equal to the maximum horizontal range of the maximum rectangle range, and the minimum vertical range of the minimum bounding rectangle range is less than or equal to the maximum vertical range of the maximum rectangle range.

[0109] Based on any optional technical solution in the embodiments of the present invention, optionally, the sample image generation module 13 includes:

[0110] A writing feature determination unit is used to determine the writing features of the original field corresponding to the sample field, and to perform feature transformation on the sample field according to the writing features;

[0111] The sample field update unit is used to update the fields obtained after feature transformation into sample fields, and add the updated sample fields to the background image;

[0112] The writing features include at least one of font size, font, text color, and text fill.

[0113] Based on any optional technical solution in the embodiments of the present invention, optionally, the writing feature determination unit includes:

[0114] The field input subunit is used to input the original field and the sample field into a pre-trained deep learning network to perform feature transformation on the sample field.

[0115] Based on any optional technical solution in the embodiments of the present invention, optionally, the background image determination module 11 includes:

[0116] A field removal determination unit is configured to determine at least one field to be removed from the original fields of the original image, determine the removed-field region corresponding to the field to be removed in the original image, and crop the removed-field region from the original image.

[0117] The background extraction unit is used to extract the removed-region background corresponding to the removed-field region, and add the removed-region background to the cropped region of the original image to obtain the background image.

[0118] Based on any optional technical solution in the embodiments of the present invention, the field type may optionally include a fixed field type and an added field type;

[0119] The field removal determination unit includes:

[0120] The field removal determination subunit is used to determine, based on the field type, an original field in the original image whose field type is an added field type, and to determine at least one original field whose field type is an added field type as the field to be removed.
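The selection of fields to be removed by field type can be sketched as a simple filter. The labels `FIXED` and `ADDED`, the function `fields_to_remove`, and the mapping from field name to field type are hypothetical; reading "fixed" as template text that stays in the background and "added" as filled-in content is an assumption of this sketch.

```python
from typing import Dict, List

FIXED = "fixed"   # assumed label for fields kept in the background image
ADDED = "added"   # assumed label for fields eligible for removal


def fields_to_remove(field_kinds: Dict[str, str]) -> List[str]:
    """Return the names of the original fields whose type is the added field type."""
    return [name for name, kind in field_kinds.items() if kind == ADDED]


# Example layout: labels are fixed, values are added.
kinds = {
    "title_label": FIXED,
    "title_value": ADDED,
    "author_label": FIXED,
    "author_value": ADDED,
}
removable = fields_to_remove(kinds)
```

Any non-empty subset of the returned names can then be treated as the fields to be removed.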

[0121] The image generation apparatus provided in the embodiments of the present invention can execute the image generation method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of executing the method.

[0122] It is worth noting that in the embodiments of the above image generation device, the various units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be achieved; in addition, the specific names of each functional unit are only for easy distinction between each other and are not used to limit the scope of protection of the present invention.

[0123] Figure 9 is a structural diagram of an electronic device provided in an embodiment of the present invention. Figure 9 shows a block diagram of an exemplary electronic device 20 suitable for implementing embodiments of the present invention. The illustrated electronic device 20 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of the present invention.

[0124] As shown in Figure 9, the electronic device 20 is presented in the form of a general-purpose computing device. The components of the electronic device 20 may include, but are not limited to: one or more processors or processing units 201, system memory 202, and bus 203 connecting different system components (including system memory 202 and processing unit 201).

[0125] Bus 203 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0126] Electronic device 20 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 20, including volatile and non-volatile media, removable and non-removable media.

[0127] System memory 202 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 204 and / or cache memory 205. Electronic device 20 may further include other removable / non-removable, volatile / non-volatile computer system storage media. By way of example only, storage system 206 may be used to read and write non-removable, non-volatile magnetic media. Disk drives for reading and writing to removable non-volatile disks (e.g., "floppy disks") and optical disk drives for reading and writing to removable non-volatile optical disks (e.g., CD-ROMs, DVD-ROMs, or other optical media) may be provided. In these cases, each drive may be connected to bus 203 via one or more data media interfaces. Memory 202 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.

[0128] A program / utility 208 having a set (at least one) of program modules 207 may be stored, for example, in memory 202. Such program modules 207 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 207 typically perform the functions and / or methods described in the embodiments of the present invention.

[0129] Electronic device 20 can also communicate with one or more external devices 209 (e.g., keyboard, pointing device, display 210, etc.), and with one or more devices that enable a user to interact with electronic device 20, and / or with any device that enables electronic device 20 to communicate with one or more other computing devices (e.g., network card, modem, etc.). This communication can be performed via input / output (I / O) interface 211. Furthermore, electronic device 20 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 212. As shown, network adapter 212 communicates with other modules of electronic device 20 via bus 203. It should be understood that other hardware and / or software modules can be used in conjunction with electronic device 20, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0130] The processing unit 201 executes various functional applications and data processing by running programs stored in the system memory 202.

[0131] The present invention provides an electronic device capable of performing the following method: acquiring an original image and its corresponding layout information; wherein the original image includes at least one original field, and the layout information includes the field type and field position corresponding to each original field; removing at least one original field from the original image to obtain a background image; determining a removal corpus corresponding to the field type of the removed original field in at least one pre-constructed corpus, and determining a sample field corresponding to the removed original field in the removal corpus; adding the sample field to the background image based on the field position of the removed original field in the original image to generate a sample image corresponding to the original image. The embodiments of the present invention solve the problem that sample images in the prior art cannot reflect layout and background information. The generated sample image can comprehensively and completely reflect the features of the original image, which is beneficial to improving the accuracy of training results when using the sample images generated by the embodiments of the present invention for deep learning model training.

[0132] This invention provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform an image generation method, the method comprising:

[0133] The process involves obtaining an original image and its corresponding layout information. The original image includes at least one original field, and the layout information includes the field type and position of each original field. At least one original field is removed from the original image to obtain a background image. In at least one pre-built corpus, a removal corpus corresponding to the field type of the removed original field is determined, and a sample field corresponding to the removed original field is determined in the removal corpus. Based on the position of the removed original field in the original image, the sample field is added to the background image to generate a sample image corresponding to the original image. This invention solves the problem that sample images in the prior art cannot reflect layout and background information. The generated sample image can comprehensively and completely reflect the features of the original image. When training a deep learning model using the sample image generated by this invention, it helps improve the accuracy of the training results.

[0134] Of course, the computer-executable instructions provided in the embodiments of the present invention are not limited to the method operations described above, but can also perform related operations in the image generation method provided in any embodiment of the present invention.

[0135] The computer storage medium of this invention can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0136] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0137] Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.

[0138] Computer program code for performing the operations of embodiments of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0139] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.

Claims

1. An image generation method, characterized by comprising: obtaining an original image and the layout information corresponding to the original image, wherein the original image includes at least one original field, the layout information includes the field type and field position corresponding to each original field, and the field type is used to reflect the content features expressed by the original field; determining, based on the layout information and preset layout requirements, the maximum image range occupied by each original field in the original image, wherein the preset layout requirements include that the original fields in the original image do not overlap or duplicate; removing at least one original field from the original image to obtain a background image; determining, in at least one pre-built corpus, a removal corpus corresponding to the field type of the original field to be removed, determining the field-occupied region range of each field in the removal corpus, and determining the fields in the removal corpus that meet preset range requirements as sample fields, wherein the range requirements include that the field-occupied region range is less than or equal to the maximum image range; and adding, based on the field position of the removed original field in the original image, the sample field to the background image to generate a sample image corresponding to the original image.

2. The method of claim 1, wherein, Before determining the maximum image range occupied by each original field in the original image, the method further includes: Based on the preset layout requirements, determine the movement path of the original field; According to the movement path, the original field is moved on the original image, and the original image and the layout information are updated based on the image obtained from the movement.

3. The method of claim 1, wherein, The area occupied by the field is the minimum bounding rectangle of the field in the corpus, and the maximum image range is the maximum rectangle range; the range requirement includes that the minimum horizontal range of the minimum bounding rectangle is less than or equal to the maximum horizontal range of the maximum rectangle range, and the minimum vertical range of the minimum bounding rectangle is less than or equal to the maximum vertical range of the maximum rectangle range.

4. The method according to claim 1, characterized in that, Adding the sample field to the background image includes: Determine the writing characteristics of the original field corresponding to the sample field, and perform feature transformation on the sample field according to the writing characteristics; The fields obtained after feature transformation are updated as sample fields, and the updated sample fields are added to the background image; The writing features include at least one of font size, font, text color, and text fill.

5. The method of claim 4, wherein, The feature transformation of the sample field according to the writing features includes: The original field and the sample field are input into a pre-trained deep learning network to perform feature transformation on the sample field.

6. The method of claim 1, wherein, The step of removing at least one original field from the original image to obtain the background image includes: In the original image, at least one field to be removed is determined from each of the original fields, the region of the field to be removed in the original image is determined, and the region of the field to be removed is cropped in the original image; Extract the background of the removed area corresponding to the removed field area, and add the background of the removed area to the cropped area corresponding to the original image to obtain the background image.

7. The method of claim 6, wherein the field types include a fixed field type and an added field type; and determining at least one field to be removed among the original fields of the original image includes: determining, based on the field type, the original fields of the added field type in the original image, and determining at least one original field of the added field type as the field to be removed.
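The selection rule of claim 7 can be sketched as a filter over the layout information. The two-valued `FieldType` enum is a simplification made for this example, and `OriginalField`, `fields_to_remove`, and the optional `count` and `seed` parameters are hypothetical.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class FieldType(Enum):
    FIXED = "fixed"  # assumed to stay on the image
    ADDED = "added"  # assumed to be replaceable by corpus entries


@dataclass(frozen=True)
class OriginalField:
    text: str
    field_type: FieldType


def fields_to_remove(fields: Sequence[OriginalField],
                     count: Optional[int] = None,
                     seed: int = 0) -> List[OriginalField]:
    """Pick fields of the added type; all of them, or a random subset of size count."""
    added = [f for f in fields if f.field_type is FieldType.ADDED]
    if count is None:
        return added
    if count < 1:
        raise ValueError("at least one field must be removed")
    if count >= len(added):
        return added
    return random.Random(seed).sample(added, count)
```

Fields of the fixed type are never returned, so they remain part of the background image.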

8. An image generation apparatus, comprising: a layout information receiving module, used to acquire an original image and layout information corresponding to the original image, wherein the original image includes at least one original field, the layout information includes the field type and field position corresponding to each original field, and the field type is used to reflect the content features expressed by the original field; a background image determination module, used to remove at least one original field from the original image to obtain a background image; a sample field determination module, used to determine, in at least one pre-built corpus, a removal corpus corresponding to the field type of the original field to be removed, and to determine, in the removal corpus, a sample field corresponding to the original field to be removed; and a sample image generation module, used to add the sample field to the background image based on the field position of the removed original field in the original image, thereby generating a sample image corresponding to the original image. The apparatus further includes: a maximum image range determination module, used to determine, before at least one original field is removed from the original image to obtain the background image, the maximum image range occupied by each original field in the original image based on the layout information and preset layout requirements, wherein the preset layout requirements include that the original fields in the original image do not overlap or duplicate. The sample field determination module includes: a sample field determination unit, used to determine, based on the maximum image range, the sample field corresponding to the original field to be removed in the removal corpus. The sample field determination unit includes: a field occupancy range determination subunit, used to determine the field occupancy range of each field in the removal corpus; and a sample field determination subunit, used to determine the fields in the removal corpus that meet the preset range requirements as sample fields, wherein the range requirements include that the field occupancy range is less than or equal to the maximum image range.
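How the sample field determination module and the sample image generation module of claim 8 might cooperate can be sketched at the layout level, without rendering pixels. Everything named here is an assumption for illustration: `LayoutEntry`, `measure` (a crude stand-in for a minimum bounding rectangle), the corpus keys, and the precomputed `max_range`.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class LayoutEntry:
    text: str
    field_type: str  # selects the corpus; the keys used here are hypothetical
    position: Box    # where the field sits in the original image
    max_range: Box   # assumed precomputed: largest non-overlapping box


def measure(text: str, char_width: int = 10, line_height: int = 20) -> Tuple[int, int]:
    """Crude stand-in for a field's minimum bounding rectangle (width, height)."""
    return len(text) * char_width, line_height


def pick_sample_field(entry: LayoutEntry, corpora: Dict[str, Sequence[str]],
                      rng: random.Random) -> str:
    """Choose a corpus entry of the same field type that fits inside max_range."""
    max_w = entry.max_range[2] - entry.max_range[0]
    max_h = entry.max_range[3] - entry.max_range[1]
    fitting = [t for t in corpora[entry.field_type]
               if measure(t)[0] <= max_w and measure(t)[1] <= max_h]
    if not fitting:
        raise ValueError(f"no corpus entry fits field type {entry.field_type!r}")
    return rng.choice(fitting)


def build_sample_layout(layout: Sequence[LayoutEntry],
                        removed: Sequence[LayoutEntry],
                        corpora: Dict[str, Sequence[str]],
                        seed: int = 0) -> List[LayoutEntry]:
    """Swap each removed field for corpus text placed at the original position."""
    rng = random.Random(seed)
    sample = []
    for entry in layout:
        if entry in removed:
            text = pick_sample_field(entry, corpora, rng)
            width, height = measure(text)
            # Start at the original corner; shift left/up if needed to stay in max_range.
            left = min(entry.position[0], entry.max_range[2] - width)
            top = min(entry.position[1], entry.max_range[3] - height)
            entry = replace(entry, text=text,
                            position=(left, top, left + width, top + height))
        sample.append(entry)
    return sample
```

The returned layout doubles as annotation for the generated sample image, since each entry records the new text together with its position.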

9. An electronic device, comprising: one or more processors; and a memory, used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image generation method as described in any one of claims 1-7.

10. A computer-readable storage medium having stored thereon a computer program, wherein, when the computer program is executed by a processor, it implements the image generation method as described in any one of claims 1-7.

Citation Information

Patent Citations

Sample image generation method and character recognition model training method
CN114998897A
