Method, apparatus, device, and medium for accurate extraction of prints from objects

By employing targeted LoRA training and print repetition constraints, the problems of low text recognition accuracy, poor batch training consistency, and weak background interference removal capabilities of the Kontext model in print extraction are solved, achieving efficient and accurate print pattern extraction that is suitable for multiple materials and batch processing.

CN122265765A · Pending · Publication Date: 2026-06-23 · XIAMEN ZIXUN INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: XIAMEN ZIXUN INFORMATION TECHNOLOGY CO LTD
Filing Date: 2026-03-23
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing Kontext models suffer from low text recognition accuracy, poor batch training consistency, limited ability to handle complex prints, and weak ability to remove background interference in the extraction of prints from object surfaces, making it difficult to meet the needs of industrial-grade and standardized print extraction.

Method used

By optimizing the LoRA training with targeted printing repetition constraints, the LoRA module is inserted into the cross-attention layer of the diffusion model. Combined with a weighted loss function of character recognition loss, perceptual loss and structural similarity loss, and using a four-fold repetition constraint mechanism, the model's ability to accurately recognize printed text, color and texture is enhanced, and background interference is removed.

Benefits of technology

It achieves high consistency and integrity of print extraction results, high text recognition accuracy, low background residue rate, significantly improved efficiency, adaptability to various materials, and meets batch processing needs.

✦ Generated by Eureka AI based on patent content.

[Figure CN122265765A_ABST]

Abstract

The application provides a method, apparatus, device, and medium for accurately extracting prints from objects. The method comprises: collecting object images that each contain a single print as input images, and obtaining the clean print image corresponding to each input image as its target image; preprocessing the input images and target images to obtain a training dataset; inserting a LoRA module into the cross-attention layer of a diffusion model, and training the resulting model on the training dataset to obtain a print extraction model; inputting the object image to be processed, together with a preset extraction prompt, into the print extraction model so that it generates an intermediate image containing the print repeated four times; and removing the background of the intermediate image, extracting a single print unit from the background-removed image, and outputting a clean single print pattern, thereby improving extraction efficiency and accuracy.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method, apparatus, device, and medium for accurately extracting prints from objects.

Background Technology

[0002] In the production, processing, pattern reuse, information traceability, and quality inspection of printed objects such as textiles, daily necessities, and industrial components, accurately extracting the printed pattern from the object surface is a core technical step. It directly determines the effectiveness of subsequent work such as print replication, text verification, and pattern optimization. Print extraction based on the Kontext model, with its basic feature recognition and extraction capabilities, is currently a commonly used technique in this field. However, limited by model architecture, training conditions, and scene adaptability, this technique has several core defects: it cannot balance accuracy and consistency in print extraction, and it struggles to meet industrial-grade, standardized requirements. The specific problems are as follows.

First, existing Kontext models have significant limitations in recognizing printed text on object surfaces, so accurate text extraction is hard to guarantee. Even when the print is clear and the text is a simple layout of only one or two lines, the model is prone to confusing and misjudging characters; for example, it may misread the number "96" as "86". Such basic recognition errors distort the extracted text, so it cannot provide reliable data for the accurate reuse of printed patterns, for text traceability and verification, or for product information archiving, which severely restricts related work.

Second, LoRA fine-tuning based on the Kontext model is limited by hardware memory capacity: the batch size must be set to 1 during training, so each sample is trained independently. This training mode cannot learn features across a batch of similar print samples, which makes it difficult for the model to form a unified and stable standard for print extraction. For similar printed objects with consistent style and layout, the extraction results vary significantly: some samples preserve the detailed strokes of the printed text completely, while others show missing strokes or structural defects. The extraction quality is therefore inconsistent, and standardized batch extraction is not achievable.

Third, when traditional print extraction methods are used together with the Kontext model, their ability to handle complex print elements is limited. With details such as color gradients, fine lines, and intricate textures, two types of problems arise easily. (1) Print elements are missed: key elements such as corner details, fine lines, and texture layers of complex patterns are not fully extracted, so the print pattern is incomplete. (2) The extracted content is redundant: the model cannot accurately distinguish the print from the object's base and mistakenly includes the object's own background texture in the extraction range. The extracted print pattern then contains invalid, redundant information, lacks completeness and purity, and cannot be used directly for print replication, secondary design, and similar scenarios without additional manual correction.

Fourth, in practical applications, object surfaces commonly exhibit background interference such as fabric texture, shell reflections, and variations in base color. Existing print extraction technologies based on the Kontext model are weak at removing such interference and fail to separate the foreground print completely from the background. The extracted print patterns often retain residual background, color variations, and other impurities. This affects the appearance and usability of the print pattern and requires significant manual retouching and noise removal, which greatly increases the complexity and labor cost of the extraction process, reduces overall processing efficiency, and makes it difficult to meet the demands of large-scale, high-efficiency industrial production.

Summary of the Invention

[0003] The technical problem to be solved by this invention is to provide a method, apparatus, device and medium for accurate extraction of printed material. By targeted LoRA training optimization and print repetition arrangement constraints, the model’s ability to accurately recognize printed text, color and texture is enhanced, the consistency and completeness of the extraction results are improved, and finally a single printed pattern with no background interference and accurate information is output, which meets the needs of pattern reuse, batch processing and other scenarios, and reduces the cost of manual intervention.

[0004] In a first aspect, the present invention provides a method for accurately extracting prints from objects, comprising the following steps:
Step S1: Collect images of objects that each contain a single print as input images, and obtain the clean print image corresponding to each input image as the target image; preprocess the input images and the target images to unify the image resolution and remove image noise, obtaining the training dataset;
Step S2: Insert a LoRA module into the cross-attention layer of the diffusion model, and train the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model; wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss;
Step S3: Input the image of the object to be processed into the print extraction model, together with a preset extraction prompt, to guide the print extraction model to generate an intermediate image containing the print arranged in four repetitions;
Step S4: Remove the background from the intermediate image, extract a single print unit from the background-removed image, and output a clean single print pattern.

[0005] In a second aspect, the present invention provides a device for accurately extracting prints from objects, comprising:
a data construction module, which collects images of objects that each contain a single print as input images and obtains the clean print image corresponding to each input image as the target image, and which preprocesses the input images and target images to unify the image resolution and remove image noise, obtaining a training dataset;
a model training module, which inserts a LoRA module into the cross-attention layer of the diffusion model and trains the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model, wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss;
a print extraction module, which inputs the image of the object to be processed into the print extraction model, together with a preset extraction prompt, to guide the print extraction model to generate an intermediate image containing the print arranged in four repetitions;
a post-processing module, which removes the background from the intermediate image, extracts a single print unit from the background-removed image, and outputs a clean single print pattern.

[0006] In a third aspect, the present invention provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method described in the first aspect.

[0007] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the first aspect.

[0008] One or more technical solutions provided by this invention have at least the following technical effects or advantages: 1. The structural similarity index (SSIM) between the extracted pattern and the original printed pattern achieved by this invention can reach over 0.92, far exceeding the 0.75 to 0.85 of traditional methods. This is due to the design of the perceptual loss function, which extracts high-level semantic features through a pre-trained VGG16 network, ensuring the structural consistency between the extracted pattern and the original printed pattern.

[0009] 2. The color difference ΔE between the extracted result and the original print is controlled within 3.0, while traditional methods typically range from 5.0 to 8.0. The powerful generation capability of the diffusion model, combined with color correction in the post-processing stage, ensures accurate reproduction of the printed color.

[0010] 3. The background residue rate in the extraction results of this invention is no more than 2%, while that of traditional methods is as high as 10% to 15%. This advantage stems from the repetition constraint extraction mechanism, which requires the model to generate a four-fold repetition of the print, forcing the model to deeply understand the essential characteristics of the print, thereby more accurately distinguishing the print area from the background area.

[0011] 4. In terms of single image processing, the present invention can complete the print extraction of an image in only 2 to 3 seconds, while manual processing of the same task takes 3 to 5 minutes, improving efficiency by 60 to 100 times.

[0012] 5. This invention processes 1000 images in approximately 1 hour, compared to approximately 70 hours for manual processing, representing a 70-fold increase in efficiency. The batch processing module supports features such as automatic folder reading, parallel calling, and automatic retries on failure, ensuring reliability for large-scale applications.
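The efficiency figures in the two preceding paragraphs can be checked with simple ratios. This assumes that the speed-up is the manual time divided by the automated time, and that the 60 and 100 bounds both use the 3-second automated time; the text does not state how the bounds are paired.

$$\frac{3 \times 60\ \text{s}}{3\ \text{s}} = 60, \qquad \frac{5 \times 60\ \text{s}}{3\ \text{s}} = 100, \qquad \frac{70\ \text{h}}{1\ \text{h}} = 70$$

For the batch figure, 1000 images in 1 hour corresponds to $3600\ \text{s} / 1000 = 3.6\ \text{s}$ per image, and 70 hours of manual work corresponds to $70 \times 60\ \text{min} / 1000 = 4.2\ \text{min}$ per image, which lies within the 3 to 5 minute range given for manual processing.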

[0013] 6. In terms of complex background processing, this invention achieves fully automated processing without any manual intervention, while traditional methods often require manual and meticulous image cutout when processing complex backgrounds, which heavily relies on the operator's experience and patience.

[0014] 7. This invention uses no fewer than 300 images covering various materials such as textiles, plastics, metals, paper, and ceramics as training data, ensuring the model's generalization ability across different material surfaces and avoiding overfitting to specific materials. The image resolution is uniformly adjusted to 1024×1024 pixels, achieving an optimal balance between computational efficiency and detail preservation, avoiding the memory pressure caused by high resolution while ensuring that print details are not lost.

[0015] 8. This invention designs a combined loss function, including character recognition loss, perceptual loss, and structural similarity loss, while simultaneously optimizing three dimensions: character recognition, texture detail, and structural integrity. This avoids the limitations of a single loss function and achieves a comprehensive improvement in extraction quality. Employing LoRA lightweight fine-tuning technology, only about 0.1% of the model's parameters are updated, reducing training memory usage by 80% and training time by 70%, significantly lowering the computational resource requirements for model training and making it affordable for small and medium-sized enterprises.
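The parameter saving from LoRA can be illustrated with a small sketch. It assumes the standard LoRA formulation, in which a frozen weight matrix W receives an additive low-rank update B·A and one factor starts at zero so that training begins from the base model's behaviour. The function names and the 1024-wide layer are hypothetical; the sketch does not reproduce the 0.1% whole-model figure quoted above, which depends on the full architecture.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale=1.0):
    """Return W x + scale * B (A x).

    W is the frozen base weight (d_out x d_in); A (rank x d_in) and
    B (d_out x rank) are the only trainable factors.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA pair of the given rank."""
    return rank * (d_in + d_out)


# Hypothetical 1024 x 1024 projection with rank 16 (the rank used in this document).
added = lora_param_count(1024, 1024, 16)
fraction_of_layer = added / (1024 * 1024)
```

For this hypothetical layer, the rank-16 update adds 32,768 trainable parameters, about 3% of the layer's own weights; the share of the whole model is smaller because most layers receive no LoRA module.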

[0016] 9. The core innovation of this invention lies in the four-fold repetition constraint mechanism. This mechanism forces the model to generate consistent print units, effectively eliminating the instability of single generation and improving extraction completeness by more than 15% compared to unconstrained methods. This effect exceeded expectations: the four-fold repetition, originally only an output format requirement, proved to force the model to learn the consistent characteristics of the print.

[0017] 10. LoRA technology was originally designed for generative tasks such as style transfer. This invention is the first to apply it to the inverse task of extracting from the background, and unexpectedly found that it can effectively suppress background generation while preserving details, with performance even better than full model fine-tuning. This discovery was difficult for those skilled in the art to foresee.

[0018] The above description is merely an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention and to implement it in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described below.

Description of the Drawings

[0019] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0020] Figure 1 is a flowchart of the method in Embodiment 1 of the present invention; Figure 2 is a schematic diagram of the device in Embodiment 2 of the present invention.

Detailed Implementation

[0021] The overall concept of the technical solution in this application is as follows. The proposed method for accurate print extraction is based on a four-stage framework of "targeted data preparation, LoRA targeted training, print repetition constraint extraction, background purification post-processing," and is implemented through the following key modules.

Targeted LoRA training data construction module

Data selection criteria: 300 images of objects, each bearing a single print, are selected as the training set. The print features must be clear: legible text (character size of at least 12), complete pattern edges, and no serious occlusion. The object types cover common materials such as textiles, plastics, and metals to ensure the diversity and representativeness of the training data.

[0022] Data preprocessing rules: Standardize the training set images and uniformly adjust the resolution to 1024×1024 pixels. Use an adaptive threshold denoising algorithm to remove image noise (such as light noise during shooting and minor scratches on the object surface) to avoid noise interfering with the recognition of print features and reduce the impact of irrelevant background on training.

Kontext-LoRA targeted training module

[0023] Training objective: the training objective focuses on the core requirements of print extraction, namely to "accurately identify the text in the print, fully preserve the color and texture, and remove background interference". The loss function is a weighted sum of character recognition loss and pattern integrity loss, which strengthens the model's learning of text details and pattern structure.

[0024] Training parameter configuration: Under the condition of limited GPU memory (batchsize=1), the training strategy is optimized by extending the number of training iterations per sample to 100 rounds and using gradient accumulation technology to simulate the effect of batch training; the learning rate is set to 5e-5 and a cosine annealing decay strategy is adopted to avoid insufficient consistency caused by training oscillations; during the training process, the accuracy of printed text recognition and the pattern integrity index are monitored in real time, and training is stopped when the index is stable for 10 consecutive rounds.
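The three training controls named in this paragraph (gradient accumulation at batch size 1, cosine annealing from 5e-5, and stopping after 10 stable rounds) can be sketched as follows. This is a minimal illustration, not the training code of the invention: all function and class names are hypothetical, the toy model y = w·x stands in for the diffusion model, and the accumulation step count of 4 is taken from the embodiment described later in the text.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def sgd_with_accumulation(w, samples, lr, accum_steps=4):
    """Fit y = w * x one sample at a time, updating once per `accum_steps`.

    Averaging the per-sample gradients before the update gives the same
    step as a real batch of size `accum_steps`.
    """
    grad_sum, seen = 0.0, 0
    for x, y in samples:
        grad_sum += 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
        seen += 1
        if seen == accum_steps:
            w -= lr * grad_sum / accum_steps
            grad_sum, seen = 0.0, 0
    return w


class EarlyStopper:
    """Signal a stop when the metric has not improved for `patience` rounds."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.stale_rounds = 0

    def update(self, metric):
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience
```

In a real training loop, `cosine_lr` would be evaluated once per round, and `EarlyStopper.update` would be fed the monitored text recognition accuracy or pattern integrity index after each round.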

Print repetition constraint extraction module

[0025] Extraction prompt: a targeted extraction prompt is designed in the following form: "Extract the original pattern and text accurately from the surface of the object, including the pattern's colors, lines, and detailed textures, with no elements omitted. After extraction, the pattern needs to be repeated four times for arrangement, and finally output as a clear pattern file free from background interference." The prompt constrains the model to preserve all elements of the print during extraction and to arrange the print in four repetitions as required.

[0026] Extraction process optimization: The preprocessed object image is input into the trained Kontext-LoRA model. Based on the features learned through targeted training, the model first segments the print area from the background area, then extracts the core elements of the print (text, color, lines, texture, and so on). The extracted elements are arranged in four repetitions in a 2×2 grid to generate an intermediate result image containing the repeated print. The repeated arrangement strengthens the model's memory of the print features and its restoration accuracy.

[0027] Algorithm flow:
Targeted training data preparation
Data filtering step: Select 300 object images containing a single, clear print and categorize them by material type;
Data preprocessing step: Standardize resolution, remove noise, crop, and generate a training set;
Kontext-LoRA targeted training
Parameter configuration step: Set parameters such as batchsize=1, learning rate 5e-5, and number of iterations, and define the loss function;
Model training step: Input the training set for targeted training, monitor the metrics in real time, stop training after the metrics are met, and save the optimized Kontext-LoRA model;
Print repetition constraint extraction
Prompt input step: Input the constructed extraction prompt into the optimized model;
Image input step: Input the image of the object whose print is to be extracted (JPG and PNG formats are supported);
Repeated arrangement extraction step: The model performs print extraction and four-fold repeated arrangement, and outputs an intermediate result image;
Single print output
Single print extraction step: Merge the repeated prints, cut, and output a single print pattern without background interference;
Final output step: Save the extracted print pattern (PNG / JPG format, 1024×1024 resolution), supporting single-image extraction and batch processing to achieve accurate extraction of object prints.

Terminology

[0028] Kontext-LoRA: A lightweight fine-tuned model based on the Kontext base model. Targeted training improves its performance on a specific task (such as print extraction), and it is parameter-efficient and highly adaptable.

[0029] Targeted LoRA training: This refers to LoRA fine-tuning performed by selecting specific scene data (including images of objects with single prints) and focusing on the core task objective, aiming to improve the model's ability to recognize and process the features of the target scene.

[0030] Print integrity loss: This measures the degree of difference between the printed pattern extracted by the model and the original print. It covers indicators such as text accuracy, color similarity, and texture integrity. The lower the loss value, the better the extraction effect.

Application scenarios

[0031] Batch reuse of printed patterns: In mass-production industries such as clothing and home furnishings, the original prints on object surfaces need to be extracted for secondary design. This method quickly extracts accurate, interference-free printed patterns, supports batch processing, and improves design efficiency.

[0032] Printing quality inspection: In industrial production (such as textile printing and electronic product casing printing), it is necessary to inspect the integrity of printed text and patterns. The accurate prints extracted by this method can be used as a test benchmark to assist automated quality inspection systems in determining whether the prints are qualified.

[0033] Second-hand goods print traceability: On second-hand trading platforms, the prints on some items (such as limited edition clothing and collectibles) are key to authenticating them. This method can extract clear print patterns to help platforms or buyers compare print details and assist in judging authenticity.

[0034] Cultural and creative product development: The cultural and creative industry needs to extract printing elements from natural objects and traditional handicrafts for creative design. This method can fully preserve the color and texture characteristics of the print, provide high-quality materials for design, and reduce material processing time.

Embodiment 1

[0035] As shown in Figure 1, this embodiment provides a method for accurately extracting prints from objects, including the following steps:
Step S1: Collect images of objects that each contain a single print as input images, and obtain the clean print image corresponding to each input image as the target image; preprocess the input images and the target images to unify the image resolution and remove image noise, obtaining the training dataset;
Step S2: Insert a LoRA module into the cross-attention layer of the diffusion model, and train the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model; wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss;
Step S3: Input the image of the object to be processed into the print extraction model, together with a preset extraction prompt, to guide the print extraction model to generate an intermediate image containing the print arranged in four repetitions;
Step S4: Remove the background from the intermediate image, extract a single print unit from the background-removed image, and output a clean single print pattern.

[0036] In this embodiment, preferably, in step S1, the number of object images containing a single print is not less than 300, and the object material types include at least three of textiles, plastics, metals, paper, and ceramics; the preprocessing includes: uniformly adjusting the image resolution to 1024×1024 pixels, and using an adaptive threshold denoising algorithm to remove image noise; the adaptive threshold denoising algorithm adaptively adjusts the denoising intensity according to the local variance of the image, the larger the local variance of the image, the higher the denoising intensity, and the smaller the local variance, the lower the denoising intensity; Step S1 further includes data augmentation of the training dataset, which includes at least one of the following operations: random rotation, random scaling, color jitter, occlusion simulation, and background replacement.
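The variance-driven denoising rule in this paragraph can be sketched on a small grayscale image. This is an illustrative reading only: the text does not give the formula, so the blend weight `var / (var + k)`, the constant `k`, the 3×3 window, and the function names are all assumptions. The sketch follows the direction stated above, in which a larger local variance gives a stronger pull toward the local mean.

```python
def local_stats(img, r, c, radius=1):
    """Mean and variance of the window centred at (r, c), clipped at borders."""
    height, width = len(img), len(img[0])
    window = [img[i][j]
              for i in range(max(0, r - radius), min(height, r + radius + 1))
              for j in range(max(0, c - radius), min(width, c + radius + 1))]
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    return mean, var


def adaptive_denoise(img, k=100.0, radius=1):
    """Blend each pixel toward its local mean, more strongly where variance is high.

    `img` is a list of rows of grayscale values; `k` (hypothetical) sets how
    quickly the denoising strength saturates as local variance grows.
    """
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            mean, var = local_stats(img, r, c, radius)
            strength = var / (var + k)  # in [0, 1), increases with variance
            row.append((1.0 - strength) * img[r][c] + strength * mean)
        out.append(row)
    return out
```

A flat region (zero variance) passes through unchanged, while an isolated bright speck is pulled most of the way toward its neighbourhood mean.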

[0037] In this embodiment, preferably, the diffusion model is Stable Diffusion v2.1 or SDXL. The LoRA module is inserted into the QKV projection layers of the U-Net of the diffusion model, the rank of the LoRA module is set to 16, and its initial weights are 0. During training, only the parameters of the LoRA module are updated, and the base model parameters of the diffusion model are frozen.

The loss function is:

$$L_{\text{total}} = \alpha \, L_{\text{ocr}} + \beta \, L_{\text{perceptual}} + \gamma \, L_{\text{ssim}}$$

where $L_{\text{ocr}}$ is the character recognition loss: a pre-trained OCR model extracts the text sequence from the generated image, and the edit distance to the text annotation of the target image is calculated. $L_{\text{perceptual}}$ is the perceptual loss: a pre-trained VGG16 network extracts feature maps of the generated image and the target image at multiple network layers, and the L1 distance between them is calculated. $L_{\text{ssim}}$ is the structural similarity loss,

$$L_{\text{ssim}} = 1 - \mathrm{SSIM}(\text{generated image}, \text{target image}),$$

where $\mathrm{SSIM}(\cdot)$ is the structural similarity index. $\alpha$, $\beta$, and $\gamma$ are weighting coefficients with values in the ranges 0.2 to 0.4, 0.4 to 0.6, and 0.1 to 0.3, respectively, and $\alpha + \beta + \gamma = 1$.

The training parameters are configured as follows: batch size 1, gradient accumulation steps 4, learning rate $5 \times 10^{-5}$ with a cosine annealing decay strategy, and 100 training rounds. During training, the text recognition accuracy and the structural similarity index are monitored in real time, and training is stopped when the text recognition accuracy and the structural similarity index have not improved for 10 consecutive rounds.
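The weighted loss can be illustrated with a small sketch that combines the three terms. Several things here are assumptions rather than part of the text: the perceptual loss and the SSIM value are passed in as precomputed numbers, the edit distance is normalised by the target length, and the default weights 0.3, 0.5, and 0.2 are one arbitrary choice inside the stated ranges. The sketch only evaluates the loss value; the text does not specify how the OCR term is back-propagated.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def total_loss(pred_text, target_text, perceptual, ssim,
               alpha=0.3, beta=0.5, gamma=0.2):
    """alpha * L_ocr + beta * L_perceptual + gamma * (1 - SSIM)."""
    if not (0.2 <= alpha <= 0.4 and 0.4 <= beta <= 0.6 and 0.1 <= gamma <= 0.3):
        raise ValueError("weights outside the ranges given in the text")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Assumption: normalise the edit distance by the target length.
    l_ocr = edit_distance(pred_text, target_text) / max(len(target_text), 1)
    l_ssim = 1.0 - ssim
    return alpha * l_ocr + beta * perceptual + gamma * l_ssim


# The "96" misread as "86" example from the background section:
example = total_loss("86", "96", perceptual=0.1, ssim=0.92)
```

With these hypothetical inputs, the single wrong character contributes 0.15 of the total 0.216, which shows how the OCR term dominates when text is misread even though the image is otherwise structurally close to the target.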

[0038] In this embodiment, preferably, the extraction prompt is conditional input text that guides the model in performing the print extraction and repeated arrangement task. The extraction prompt includes at least the following semantic elements:
a semantic element instructing the model to perform the print extraction operation;
a semantic element instructing the model to preserve the colors, lines, and textures of the print;
a semantic element instructing the model to arrange the extracted print in a four-fold repetition;
a semantic element instructing the model to output a print image free of background interference.
The extraction prompt is converted into a conditioning vector by the CLIP text encoder and used as the conditional input to the print extraction model. The extraction prompt can be expressed in English or Chinese. The English expression includes prompts containing the following keywords: extract, pattern, text, colors, lines, textures, repeat four times, background interference. The Chinese expression includes prompts containing the Chinese equivalents of these keywords (extract, pattern, text, color, lines, texture, repeat four times, background interference). The print extraction model uses the DDIM sampling algorithm to generate intermediate images, with 50 sampling steps and a guidance scale of 7.5.
The intermediate image containing the four-fold repeated print is generated in one of the following ways:
Method 1: During the training process in step S2, print images arranged in a 2×2 grid are used as the target images. The trained print extraction model then directly generates an intermediate image containing four repeated arrangements during inference.
Method 2: During the inference process in step S3, the print extraction model first generates a single print image; post-processing then copies the single print image four times and splices the copies into a 2×2 grid to obtain an intermediate image containing four repeated arrangements.
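The splicing in Method 2 can be sketched without any imaging library. The sketch assumes an image is represented as a list of rows of pixel values; the function name is hypothetical, and a real pipeline would more likely operate on image arrays.

```python
def tile_2x2(unit):
    """Return a new image with `unit` repeated in a 2x2 grid.

    `unit` is a list of rows, each row a list of pixel values. The result
    has twice the height and twice the width, and shares no row objects
    with the input.
    """
    top = [list(row) + list(row) for row in unit]        # repeat horizontally
    bottom = [list(row) + list(row) for row in unit]     # second copy for the lower half
    return top + bottom                                  # repeat vertically


unit = [[1, 2],
        [3, 4]]
grid = tile_2x2(unit)
# grid is:
# [[1, 2, 1, 2],
#  [3, 4, 3, 4],
#  [1, 2, 1, 2],
#  [3, 4, 3, 4]]
```

Because every quadrant of the result is an exact copy, any one of them can later be cut out as the single print unit.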

[0039] In this embodiment, preferably, step S4 specifically comprises: Background detection is performed on the intermediate image. If the intermediate image is in RGBA format, the Alpha channel is directly retained as a transparent background. If the intermediate image is in RGB format, a pre-trained segmentation model is used to predict the print mask, or the background is separated based on a color clustering method to generate a print image with a transparent background. According to the layout rules of the print in the intermediate image, any complete print unit is located and cut out from the print image with transparent background. The layout rules are 2×2 grids and each print unit is the same size. Adjust the cut-out printing unit to 1024×1024 pixels, perform super-resolution enhancement, color correction and text sharpening on the adjusted image, and output a single printing pattern in PNG format with a transparent background. The pre-trained segmentation model is the Segment Anything Model; the color clustering method is the K-means clustering algorithm, with the number of clusters set to 2, corresponding to the printed area and the background area respectively.
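Two of the operations in step S4 can be sketched in a few lines: cutting one unit out of the 2×2 layout, and separating print from background with two clusters. The sketch makes several assumptions not stated in the text: images are lists of rows of grayscale values, clustering is done on intensity alone rather than on color, and the top-left pixel is taken to belong to the background. Function names are hypothetical.

```python
def crop_unit(grid, row=0, col=0):
    """Cut the unit at (row, col) out of a 2x2 grid of equally sized units."""
    height, width = len(grid), len(grid[0])
    if height % 2 or width % 2:
        raise ValueError("grid height and width must be even")
    unit_h, unit_w = height // 2, width // 2
    return [r[col * unit_w:(col + 1) * unit_w]
            for r in grid[row * unit_h:(row + 1) * unit_h]]


def two_means_mask(gray, iters=20):
    """Return a mask that is True for print pixels, using 1-D K-means with K=2.

    Assumption: the top-left pixel is background, so its cluster is
    labelled background and the other cluster is labelled print.
    """
    values = [v for row in gray for v in row]
    c0, c1 = float(min(values)), float(max(values))  # initial centres
    for _ in range(iters):
        near0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        near1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not near1:  # all pixels fall in one cluster
            break
        new0, new1 = sum(near0) / len(near0), sum(near1) / len(near1)
        if (new0, new1) == (c0, c1):  # converged
            break
        c0, c1 = new0, new1

    def label(v):
        return 0 if abs(v - c0) <= abs(v - c1) else 1

    background = label(gray[0][0])
    return [[label(v) != background for v in row] for row in gray]
```

The mask can then be written into an alpha channel so that background pixels become transparent before the unit is resized and saved as PNG.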

[0040] In this embodiment, preferably, the method further includes Step S5: receiving a specified folder path and reading all JPG, PNG, and JPEG image files in the folder; processing the read image files in parallel using the print extraction model; recording a processing log, including processing status, processing time, and reasons for processing failure; and automatically retrying images that fail to process, with a maximum of 3 retries.
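The batch step can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: `extract_fn` is a hypothetical stand-in for a call to the print extraction model, "a maximum of 3 retries" is read as up to 3 further attempts after the first failure, and the log is kept as a list of dictionaries rather than written to a file.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the JPG, JPEG, and PNG files in `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def process_with_retry(path, extract_fn, max_retries=3):
    """Run `extract_fn(path)`, retrying on failure, and return a log entry."""
    start = time.perf_counter()
    error = None
    for attempt in range(1 + max_retries):  # first try plus retries
        try:
            extract_fn(path)
            error = None
            break
        except Exception as exc:  # record the reason and try again
            error = repr(exc)
    return {
        "file": str(path),
        "status": "ok" if error is None else "failed",
        "attempts": attempt + 1,
        "seconds": time.perf_counter() - start,
        "error": error,
    }


def process_folder(folder, extract_fn, workers=4, max_retries=3):
    """Process every image in `folder` in parallel and return the log."""
    paths = list_images(folder)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda p: process_with_retry(p, extract_fn, max_retries), paths))
```

A thread pool is used here only for brevity; if the extraction model runs on a single GPU, the useful degree of parallelism depends on how the model is served, which the text does not specify.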

[0041] Originally intended only as an output format requirement, the "four-fold repetition" was unexpectedly found to force the model to learn the consistent characteristics of the print, improving the extraction completeness by more than 15%. This synergistic effect exceeded expectations.

[0042] The combination of character recognition loss and perceptual loss not only plays its own role, but also produces a synergistic effect—OCR loss enables the model to generate clear text, while perceptual loss ensures the natural integration of text and images, avoiding the text stiffness problem caused by simple OCR loss.

[0043] LoRA was originally designed for generative tasks such as style transfer. This invention applies it to the inverse task of "extracting from the background" for the first time, and unexpectedly found that it effectively suppresses background generation while preserving details, and its performance is better than full model fine-tuning.

[0044] Based on the same inventive concept, this application also provides an apparatus corresponding to the method in Embodiment 1, as detailed in Embodiment 2.

[0045] Example 2 As shown in Figure 2, this embodiment provides a device for accurately extracting print patterns from objects, including: a data construction module, which collects images of objects containing a single print as input images, obtains the clean print image corresponding to each input image as the target image, and preprocesses the input images and target images to unify the image resolution and remove image noise, thereby obtaining a training dataset; a model training module, which inserts a LoRA module into the cross-attention layer of the diffusion model and trains the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model, wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss; a print extraction module, which inputs the image of the object to be extracted into the print extraction model together with a preset extraction prompt, guiding the print extraction model to generate an intermediate image containing the print in a four-fold repeated arrangement; and a post-processing module, which removes the background from the intermediate image, extracts a single print unit from the background-removed image, and outputs a clean single print pattern.
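What the model training module does to a single projection layer can be sketched in NumPy, under stated assumptions. The class name, the scale factor, and the choice of which low-rank factor starts at zero are illustrative; this embodiment specifies only a rank of 16, zero initial weights, and a frozen base model. The sketch follows the common LoRA convention of zero-initializing one factor so that the adapted layer initially reproduces the frozen layer exactly.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=16, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen base weights
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, starts at 0
        self.scale = scale

    def __call__(self, x):
        # Base projection plus low-rank correction; the correction is exactly
        # zero at initialization because B is all zeros.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameter_count(self):
        return self.A.size + self.B.size
```

For a 64×64 projection, the rank-16 adapter trains 2 × 16 × 64 = 2,048 values instead of 4,096, and the saving grows with layer width. In practice this would be applied to the attention projections through a LoRA library rather than written by hand.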

[0046] In this embodiment, preferably, the number of object images containing a single print in the data construction module is no less than 300, and the object material types include at least three of textiles, plastics, metals, paper, and ceramics; the preprocessing includes: uniformly adjusting the image resolution to 1024×1024 pixels, and using an adaptive threshold denoising algorithm to remove image noise; the adaptive threshold denoising algorithm adaptively adjusts the denoising intensity according to the local variance of the image, the larger the local variance of the image, the higher the denoising intensity, and the smaller the local variance, the lower the denoising intensity; The data construction module also includes data augmentation of the training dataset, which includes at least one of the following operations: random rotation, random scaling, color jitter, occlusion simulation, and background replacement.
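The variance-driven denoising rule can be illustrated with a short NumPy sketch for a single-channel image. The text states only that denoising intensity rises with local variance; the specific mapping `var / (var + k)`, the box-filter smoothing, the window radius, and the constant `k` are all assumptions made for illustration.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2r+1) x (2r+1) window, using edge padding and an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def adaptive_denoise(img, radius=1, k=100.0):
    """Blend each pixel toward its local mean; stronger where local variance is larger."""
    img = img.astype(np.float64)
    mean = box_mean(img, radius)
    var = np.maximum(box_mean(img * img, radius) - mean * mean, 0.0)
    strength = var / (var + k)   # in [0, 1), increases with local variance
    return (1.0 - strength) * img + strength * mean
```

Flat regions (zero variance) pass through unchanged, while an isolated outlier is pulled toward its neighborhood mean. A color image could be handled by applying the function per channel.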

[0047] In this embodiment, preferably, the diffusion model is Stable Diffusion v2.1 or SDXL; the LoRA module is inserted into the QKV projection layers of the U-Net of the diffusion model, the rank of the LoRA module is set to 16, and its initial weights are 0; during training, only the parameters of the LoRA module are updated, and the base model parameters of the diffusion model are frozen. The loss function is: L_total = α × L_ocr + β × L_perceptual + γ × L_ssim, where L_ocr is the character recognition loss, computed by using a pre-trained OCR model to extract the text sequence from the generated image and calculating its edit distance to the text annotation of the target image; L_perceptual is the perceptual loss, computed by using a pre-trained VGG16 network to extract feature maps of the generated image and the target image at multiple network layers and calculating the L1 distance between them; L_ssim is the structural similarity loss, calculated as L_ssim = 1 - SSIM(generated image, target image), where SSIM(·) is the structural similarity index function; and α, β, and γ are weighting coefficients with values ranging from 0.2 to 0.4, 0.4 to 0.6, and 0.1 to 0.3, respectively, with α + β + γ = 1. The training parameters are configured as follows: a batch size of 1, 4 gradient accumulation steps, a learning rate of 5×10⁻⁵ with a cosine annealing learning rate decay strategy, and 100 training epochs. During training, the text recognition accuracy and the structural similarity index are monitored in real time, and training is stopped when neither has improved for 10 consecutive epochs.
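The weighted loss can be made concrete with a small sketch, assuming NumPy and plain Python. Several simplifications are assumptions of this sketch rather than statements of the application: the edit distance is normalized by target length, SSIM is computed over one global window instead of sliding windows, and the perceptual term is passed in as a precomputed number because it requires a VGG16 network. The text does not say how gradients flow through the edit distance, so this shows only how the three terms combine.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def ocr_loss(predicted_text, target_text):
    """Edit distance normalized by target length (normalization is an assumption)."""
    return edit_distance(predicted_text, target_text) / max(len(target_text), 1)

def global_ssim(x, y, data_range=255.0):
    """SSIM over a single whole-image window (simplified from windowed SSIM)."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))

def total_loss(l_ocr, l_perceptual, l_ssim, alpha=0.3, beta=0.5, gamma=0.2):
    """L_total = alpha * L_ocr + beta * L_perceptual + gamma * L_ssim."""
    if not (0.2 <= alpha <= 0.4 and 0.4 <= beta <= 0.6 and 0.1 <= gamma <= 0.3):
        raise ValueError("weights fall outside the ranges given in the text")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * l_ocr + beta * l_perceptual + gamma * l_ssim
```

As a usage example, misreading "96" as "86" gives an edit distance of 1 and a normalized OCR loss of 0.5; with a perceptual loss of 0.2 and identical structure (L_ssim = 0), the default weights give 0.3 × 0.5 + 0.5 × 0.2 + 0.2 × 0 = 0.25.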

[0048] In this embodiment, preferably, the extraction prompt is conditional input text used to guide the model in performing the print extraction and repeated arrangement task, and the extraction prompt includes at least the following semantic elements: a semantic element instructing the model to perform the print extraction operation; a semantic element instructing the model to preserve the color, lines, and texture of the print; a semantic element instructing the model to arrange the extracted print in a four-fold repetition; and a semantic element instructing the model to output a print image free of background interference. The extraction prompt is converted into a conditional vector by the CLIP text encoder and used as the conditional input of the print extraction model. The extraction prompt can be expressed in English or Chinese. The English expression includes prompts containing the following keywords: extract, pattern, text, colors, lines, textures, repeat four times, background interference. The Chinese expression includes prompts containing the Chinese equivalents of the following keywords: extract, pattern, text, color, lines, texture, repeat four times, background interference. The print extraction model uses the DDIM sampling algorithm to generate the intermediate image, with 50 sampling steps and a guidance scale of 7.5. The intermediate image containing the four-fold repeated print is generated in one of the following ways: Method 1: during training in the model training module, print images arranged in a 2×2 grid are used as the target images, so that the trained print extraction model directly generates an intermediate image containing the four-fold repeated arrangement during inference.
Method 2: during inference in the print extraction module, the print extraction model first generates a single print image, which is then copied four times in post-processing and stitched into a 2×2 grid to obtain an intermediate image containing the four-fold repeated arrangement.
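Method 2 amounts to tiling one image into a 2×2 grid. A minimal NumPy sketch follows; the function names are illustrative, and `split_2x2` is included only to show that the tiling is exactly invertible by the grid-based cropping used in post-processing.

```python
import numpy as np

def tile_2x2(unit):
    """Copy a single print image four times and stitch the copies into a 2x2 grid."""
    reps = (2, 2) + (1,) * (unit.ndim - 2)   # leave any channel axis untouched
    return np.tile(unit, reps)

def split_2x2(grid):
    """Cut a 2x2 grid back into its four equal-sized units (row-major order)."""
    h, w = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in (0, 1) for c in (0, 1)]
```

A 512×512 RGBA unit becomes a 1024×1024 intermediate image, and each of the four cells returned by `split_2x2` equals the original unit.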

[0049] In this embodiment, preferably, the post-processing module is specifically configured to: perform background detection on the intermediate image, where, if the intermediate image is in RGBA format, the alpha channel is retained directly as the transparent background, and, if the intermediate image is in RGB format, a pre-trained segmentation model is used to predict the print mask, or the background is separated using a color clustering method, to generate a print image with a transparent background. According to the layout rule of the prints in the intermediate image, any one complete print unit is located and cut out of the transparent-background print image; the layout rule is a 2×2 grid in which every print unit has the same size. The cut-out print unit is resized to 1024×1024 pixels, super-resolution enhancement, color correction, and text sharpening are applied to the resized image, and a single print pattern is output in PNG format with a transparent background. The pre-trained segmentation model is the Segment Anything Model; the color clustering method is the K-means clustering algorithm with the number of clusters set to 2, corresponding to the print area and the background area respectively.

[0050] In this embodiment, preferably, the device also includes a batch processing module, which receives a specified folder path and reads all JPG, PNG, and JPEG image files in the folder; processes the read image files in parallel using the print extraction model; records a processing log, including processing status, processing time, and reasons for processing failure; and automatically retries images that fail to process, up to 3 times.

[0051] Since the apparatus described in Embodiment 2 of the present invention is an apparatus used to implement the method of Embodiment 1 of the present invention, those skilled in the art can understand the specific structure and variations of the apparatus based on the method described in Embodiment 1 of the present invention, and therefore will not be described again here. All apparatuses used in the method of Embodiment 1 of the present invention fall within the scope of protection of the present invention.

[0052] Based on the same inventive concept, this application provides an electronic device embodiment corresponding to Embodiment 1, as detailed in Embodiment 3.

[0053] Example 3 This embodiment provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it can implement any of the implementation methods in Embodiment 1.

[0054] Since the electronic device described in this embodiment is the device used to implement the method in Embodiment 1 of this application, those skilled in the art can understand the specific implementation method and various variations of the electronic device in this embodiment based on the method described in Embodiment 1 of this application. Therefore, how the electronic device implements the method in the embodiment of this application will not be described in detail here. Any device used by those skilled in the art to implement the method in the embodiment of this application falls within the scope of protection of this application.

[0055] Based on the same inventive concept, this application provides a storage medium corresponding to Embodiment 1, as detailed in Embodiment 4.

[0056] Example 4 This embodiment provides a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it can implement any of the implementation methods in Embodiment 1.

[0057] The technical solutions provided in this application embodiment have at least the following technical effects or advantages: 1. The present invention has a printed text recognition accuracy of ≥99% and a character confusion error rate (e.g., “96” is misjudged as “86”) ≤1%, which is 93.3% lower than the traditional Kontext model (error rate ≥15%), ensuring the accuracy of printed text information.

[0058] 2. The color reproduction of the printed pattern extracted by this invention is ≥98%, the retention rate of line and texture details is ≥99%, there are no missing or redundant elements, and the core features of the print are completely transmitted.

[0059] 3. The background interference removal rate of this invention is ≥99.5%, and the output pattern has no background ghosting or material texture interference, and can be directly reused without manual retouching.

[0060] 4. When batch processing 100 similar printed images, the similarity of the extracted results is ≥95%, which is 18.75% higher than that of a conventional model trained with a batch size of 1 (similarity ≥80%), thus solving the problem of insufficient consistency caused by limited video memory.

[0061] 5. This invention supports the extraction of prints from 10+ common materials such as textiles, plastics, metals, and glass. The accuracy of text and the integrity of patterns under different materials are both ≥98%, without the need to adjust model parameters, and it has strong adaptability.

[0062] 6. The extraction time for the print on a single object image is ≤30 seconds, meeting the need for rapid processing; the qualification rate of the extraction results is ≥98%, with no need for manual correction of text errors, supplementation of pattern details, or removal of background interference, which reduces the manual intervention rate by 93.3% compared with traditional methods (manual intervention rate ≥30%) and significantly reduces operating costs.
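The percentage comparisons in items 1, 4, and 6 can be reproduced as relative changes against the stated baselines. The sketch below treats each "≥" or "≤" bound as an exact value, and for item 6 it assumes that a ≥98% qualification rate corresponds to a manual intervention rate of 2%; both are readings made for this check, not statements from the application.

```python
def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0

# Item 1: character confusion error rate, 15% -> 1%
error_rate_change = relative_change(15.0, 1.0)
# Item 4: batch similarity, 80% -> 95%
similarity_change = relative_change(80.0, 95.0)
# Item 6: manual intervention rate, 30% -> 2% (assumed as 100% - 98%)
intervention_change = relative_change(30.0, 2.0)
```

Under these readings the results are about -93.3%, +18.75%, and -93.3%, matching the figures quoted in the list above.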

[0063] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0064] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0065] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0066] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0067] While specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and not intended to limit the scope of the present invention. Equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the present invention should be covered within the scope of protection of the claims of the present invention.

Claims

1. A method for accurately extracting print patterns from objects, characterized in that it includes the following steps: Step S1: collecting images of objects containing a single print as input images, and obtaining the clean print image corresponding to each input image as the target image; preprocessing the input images and target images to unify the image resolution and remove image noise, thereby obtaining a training dataset; Step S2: inserting a LoRA module into the cross-attention layer of the diffusion model, and training the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model, wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss; Step S3: inputting the image of the object to be extracted into the print extraction model together with a preset extraction prompt, guiding the print extraction model to generate an intermediate image containing the print in a four-fold repeated arrangement; Step S4: removing the background from the intermediate image, extracting a single print unit from the background-removed image, and outputting a clean single print pattern.

2. The method according to claim 1, characterized in that, in step S1, the number of object images containing a single print is no less than 300, and the object material types include at least three of the following: textiles, plastics, metals, paper, and ceramics; the preprocessing includes uniformly adjusting the image resolution to 1024×1024 pixels and using an adaptive threshold denoising algorithm to remove image noise; the adaptive threshold denoising algorithm adaptively adjusts the denoising intensity according to the local variance of the image, such that the larger the local variance, the higher the denoising intensity, and the smaller the local variance, the lower the denoising intensity.

3. The method according to claim 1, characterized in that step S1 further includes data augmentation of the training dataset, the data augmentation including at least one of the following operations: random rotation, random scaling, color jitter, occlusion simulation, and background replacement.

4. The method according to claim 1, characterized in that the diffusion model is Stable Diffusion v2.1 or SDXL; the LoRA module is inserted into the QKV projection layers of the U-Net of the diffusion model, the rank of the LoRA module is set to 16, and its initial weights are 0; during training, only the parameters of the LoRA module are updated, and the base model parameters of the diffusion model are frozen. The loss function is: L_total = α × L_ocr + β × L_perceptual + γ × L_ssim, where L_ocr is the character recognition loss, computed by using a pre-trained OCR model to extract the text sequence from the generated image and calculating its edit distance to the text annotation of the target image; L_perceptual is the perceptual loss, computed by using a pre-trained VGG16 network to extract feature maps of the generated image and the target image at multiple network layers and calculating the L1 distance between them; L_ssim is the structural similarity loss, calculated as L_ssim = 1 - SSIM(generated image, target image), where SSIM(·) is the structural similarity index function; and α, β, and γ are weighting coefficients with values ranging from 0.2 to 0.4, 0.4 to 0.6, and 0.1 to 0.3, respectively, with α + β + γ = 1. The training parameters are configured as follows: a batch size of 1, 4 gradient accumulation steps, a learning rate of 5×10⁻⁵ with a cosine annealing learning rate decay strategy, and 100 training epochs. During training, the text recognition accuracy and the structural similarity index are monitored in real time, and training is stopped when neither has improved for 10 consecutive epochs.

5. The method according to claim 1, characterized in that the extraction prompt is conditional input text used to guide the model in performing the print extraction and repeated arrangement task, and the extraction prompt contains at least the following semantic elements: a semantic element instructing the model to perform the print extraction operation; a semantic element instructing the model to preserve the color, lines, and texture of the print; a semantic element instructing the model to arrange the extracted print in a four-fold repetition; and a semantic element instructing the model to output a print image free of background interference. The extraction prompt is converted into a conditional vector by the CLIP text encoder and used as the conditional input of the print extraction model. The extraction prompt can be expressed in English or Chinese. The English expression includes prompts containing the following keywords: extract, pattern, text, colors, lines, textures, repeat four times, background interference. The Chinese expression includes prompts containing the Chinese equivalents of the following keywords: extract, pattern, text, color, lines, texture, repeat four times, background interference. The print extraction model uses the DDIM sampling algorithm to generate the intermediate image, with 50 sampling steps and a guidance scale of 7.5. The intermediate image containing the four-fold repeated print is generated in one of the following ways: Method 1: during training in step S2, print images arranged in a 2×2 grid are used as the target images, so that the trained print extraction model directly generates an intermediate image containing the four-fold repeated arrangement during inference. Method 2: during inference in step S3, the print extraction model first generates a single print image, which is then copied four times in post-processing and stitched into a 2×2 grid to obtain an intermediate image containing the four-fold repeated arrangement.

6. The method according to claim 1, characterized in that step S4 specifically comprises: performing background detection on the intermediate image, where, if the intermediate image is in RGBA format, the alpha channel is retained directly as the transparent background, and, if the intermediate image is in RGB format, a pre-trained segmentation model is used to predict the print mask, or the background is separated using a color clustering method, to generate a print image with a transparent background. According to the layout rule of the prints in the intermediate image, any one complete print unit is located and cut out of the transparent-background print image; the layout rule is a 2×2 grid in which every print unit has the same size. The cut-out print unit is resized to 1024×1024 pixels, super-resolution enhancement, color correction, and text sharpening are applied to the resized image, and a single print pattern is output in PNG format with a transparent background. The pre-trained segmentation model is the Segment Anything Model; the color clustering method is the K-means clustering algorithm with the number of clusters set to 2, corresponding to the print area and the background area respectively.

7. The method according to claim 1, characterized in that it further includes step 5: receiving a specified folder path and reading all JPG, PNG, and JPEG image files in the folder; processing the read image files in parallel using the print extraction model; recording a processing log, including processing status, processing time, and reasons for processing failure; and automatically retrying images that fail to process, with a maximum of 3 retries.

8. A device for accurately extracting print patterns from objects, characterized in that it includes: a data construction module, which collects images of objects containing a single print as input images, obtains the clean print image corresponding to each input image as the target image, and preprocesses the input images and target images to unify the image resolution and remove image noise, thereby obtaining a training dataset; a model training module, which inserts a LoRA module into the cross-attention layer of the diffusion model and trains the diffusion model with the inserted LoRA module on the training dataset to obtain the print extraction model, wherein the loss function used in training is a weighted sum of character recognition loss, perceptual loss, and structural similarity loss; a print extraction module, which inputs the image of the object to be extracted into the print extraction model together with a preset extraction prompt, guiding the print extraction model to generate an intermediate image containing the print in a four-fold repeated arrangement; and a post-processing module, which removes the background from the intermediate image, extracts a single print unit from the background-removed image, and outputs a clean single print pattern.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the method as described in any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method as described in any one of claims 1 to 7.