Processing method, model training method, apparatus and system for an image and text reader
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- SHENZHEN GREEN CONNECTION TECH CO LTD
- Filing Date
- 2025-03-25
- Publication Date
- 2026-06-23
AI Technical Summary
Existing e-readers are inadequate in terms of multi-format support and parsing efficiency, failing to meet users' needs for efficient and convenient reading.
The system uses a multi-format parsing model preset on an AI NAS device to perform standardized parsing of target files, then applies text analysis and image processing operations to the parsed content. The parsing model is a deep learning model trained on the AI NAS device, and the AI NAS device also synchronizes reading data across devices, supporting intelligent parsing and rendering of multi-format files.
It achieves efficient parsing and rendering of multiple file formats, improves reading fluency and user experience, supports cross-device synchronization, and optimizes both the presentation of reading content and data consistency between devices.
Figure CN120409457B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and in particular to a processing method, model training method, apparatus and system for a text and image reader.
Background Technology
[0002] With the rapid development of computer technology, traditional paper-based reading increasingly fails to meet people's reading needs, while efficient and convenient electronic reading is gaining popularity.
[0003] Currently, mainstream e-book and comic readers on the market rely on the local parsing capability of the terminal device and use dedicated software to support specific formats. This results in insufficient multi-format support and low parsing efficiency. A technical solution that supports multi-format parsing and improves parsing efficiency is therefore needed.
Summary of the Invention
[0004] This invention provides a processing method, model training method, device, and system for an image and text reader, which can support multi-format parsing and improve parsing efficiency.
[0005] To address the aforementioned technical problems, the first aspect of this invention discloses a processing method for an image and text reader, the method being applied to an AI NAS device, the method comprising:
[0006] Standardized parsing is performed on a target file using the multi-format parsing model preset in the AI NAS device to obtain standardized content, which includes standardized text content and / or standardized image content.
[0007] When the standardized content includes the standardized text content, a text analysis operation is performed on the standardized text content to obtain the target text content. The text analysis operation includes at least one of chapter division operation, image-text separation operation, and language translation operation.
[0008] When the standardized content includes the standardized image content, image processing operations are performed on the standardized image content to obtain the target image content. The image processing operations include at least one of image enhancement operations, image denoising operations, and intelligent image stitching operations.
[0009] The target text content and / or the target image content are displayed based on a pre-built cross-device responsive interface, and the target user's reading data on the target text content and / or the target image content is synchronized across devices by means of the AI NAS device.
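The four steps above amount to a small dispatch routine. The sketch below is a minimal illustration under stated assumptions: `StandardizedContent`, `TargetContent` and `process` are hypothetical names, text and image operations are modeled as plain callables, and display and synchronization ([0009]) are left out. It shows only the conditional logic: text operations run only when text content is present, image operations only when image content is present, and at least one operation of each applicable kind is required.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

TextOp = Callable[[str], str]
# Image ops take the whole image set, since stitching needs several pages at once.
ImageOp = Callable[[List[bytes]], List[bytes]]

@dataclass
class StandardizedContent:
    text: Optional[str] = None            # standardized text content, if present
    images: Optional[List[bytes]] = None  # standardized image content, if present

@dataclass
class TargetContent:
    text: Optional[str] = None
    images: Optional[List[bytes]] = None

def process(content: StandardizedContent,
            text_ops: List[TextOp],
            image_ops: List[ImageOp]) -> TargetContent:
    result = TargetContent()
    if content.text is not None:
        # "At least one of" chapter division, image-text separation, translation.
        if not text_ops:
            raise ValueError("at least one text analysis operation is required")
        text = content.text
        for op in text_ops:
            text = op(text)
        result.text = text
    if content.images is not None:
        # "At least one of" enhancement, denoising, stitching.
        if not image_ops:
            raise ValueError("at least one image processing operation is required")
        images = content.images
        for op in image_ops:
            images = op(images)
        result.images = images
    return result
```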
[0010] As an optional implementation, in the first aspect of the present invention, performing standardized parsing on the target file using the multi-format parsing model preset in the AI NAS device to obtain standardized content includes:
[0011] Based on the multi-format parsing model preset in the AI NAS device, the standardized parsing task corresponding to the target file is decomposed into multiple sub-tasks corresponding to the target file;
[0012] The computational requirements of each subtask are dynamically evaluated based on a preset task scheduling algorithm, and the parallel processing mechanism of the target file is determined according to the computational requirements of each subtask.
[0013] According to the parallel processing mechanism, each subtask is processed in parallel using the multi-format parsing model to obtain the task processing result corresponding to each subtask.
[0014] The task processing results of all subtasks are merged to obtain the standardized content.
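One way to read the decompose, evaluate, parallelize and merge steps is sketched below. The patent does not disclose its task scheduling algorithm, so the per-page decomposition, the size-based cost estimate and the worker-count rule (`choose_workers`, `threshold`, `max_workers`) are all hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SubTask:
    index: int     # position in the file, kept for traceability
    payload: str   # raw content of one unit of work, e.g. one page
    cost: int = 0  # estimated computational requirement

def decompose(pages: List[str]) -> List[SubTask]:
    # Hypothetical decomposition: one subtask per page.
    return [SubTask(i, p) for i, p in enumerate(pages)]

def estimate_cost(task: SubTask) -> int:
    # Placeholder cost model: proportional to the payload size.
    return len(task.payload)

def choose_workers(tasks: List[SubTask], max_workers: int = 4, threshold: int = 1000) -> int:
    # Placeholder scheduling rule: run small jobs sequentially, otherwise
    # add roughly one worker per `threshold` units of estimated cost.
    total = sum(t.cost for t in tasks)
    if total < threshold:
        return 1
    return min(max_workers, len(tasks), total // threshold + 1)

def parse_in_parallel(pages: List[str], parse_page: Callable[[str], str]) -> str:
    tasks = decompose(pages)
    for t in tasks:
        t.cost = estimate_cost(t)
    with ThreadPoolExecutor(max_workers=choose_workers(tasks)) as pool:
        # pool.map returns results in input order, regardless of completion order.
        results = list(pool.map(lambda t: parse_page(t.payload), tasks))
    return "\n".join(results)  # merge step: concatenate results in page order
```

A thread pool keeps the example short; for CPU-bound parsing in CPython, a `ProcessPoolExecutor` would usually be the better fit because of the global interpreter lock.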
[0015] As an optional implementation, in the first aspect of the present invention, when the standardized content includes the standardized text content, performing text analysis on the standardized text content to obtain the target text content includes:
[0016] When the standardized content includes the standardized text content, the chapter information of the standardized text content is identified by combining preset chapter features, and the standardized text content is divided into chapters based on the chapter information to obtain the target text content. The chapter information includes chapter name and hierarchical relationship; and / or,
[0017] Extract image features and text features from the standardized text content, and jointly encode the image features and text features to obtain the encoding result;
[0018] Based on preset feature weights and the encoding results, the standardized text content is subjected to image-text separation to obtain the target text content; and / or,
[0019] Text content in a language other than the specified language is identified within the standardized text content, and a language translation operation is performed on it according to user needs to obtain the target text content. The user needs include translation language type requirements and / or translation range requirements.
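The chapter division branch can be illustrated with rule-based chapter features. In this sketch the "preset chapter features" are assumed to be regular expressions, each tied to a hierarchy level; the patterns and the `Chapter` and `divide_chapters` names are hypothetical, and a trained model could replace the regex match. A stack tracks the current path through the hierarchy, so each heading is attached to its nearest shallower ancestor.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical "preset chapter features": heading patterns paired with a hierarchy level.
CHAPTER_FEATURES = [
    (re.compile(r"^Part\s+\w+"), 1),
    (re.compile(r"^Chapter\s+\d+"), 2),
    (re.compile(r"^\d+\.\d+\s+\S"), 3),
]

@dataclass
class Chapter:
    name: str
    level: int
    lines: List[str] = field(default_factory=list)
    children: List["Chapter"] = field(default_factory=list)

def divide_chapters(text: str) -> Chapter:
    root = Chapter("ROOT", 0)  # holds any front matter before the first heading
    stack = [root]             # path from the root to the most recent heading
    for line in text.splitlines():
        level = next((lvl for pat, lvl in CHAPTER_FEATURES if pat.match(line)), None)
        if level is None:
            stack[-1].lines.append(line)  # body text belongs to the current chapter
            continue
        while stack[-1].level >= level:
            stack.pop()                   # climb back up to this heading's parent
        node = Chapter(line.strip(), level)
        stack[-1].children.append(node)
        stack.append(node)
    return root
```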
[0020] As an optional implementation, in the first aspect of the present invention, when the standardized content includes the standardized image content, performing image processing operations on the standardized image content to obtain the target image content includes:
[0021] When the standardized content includes the standardized image content, image enhancement operations are performed on the standardized image content based on a super-resolution model to obtain the target image content; and / or,
[0022] Noise information, which includes blemish information and / or watermark information, is extracted from the standardized image content, and image denoising targeting the noise information is performed on the standardized image content using a denoising autoencoder and edge-preserving filtering to obtain the target image content; and / or,
[0023] Extract feature point information from the standardized image content, and predict the target stitching region in the standardized image content based on the feature point information;
[0024] Based on the feature point information and the target stitching region, an intelligent image stitching operation is performed on the segmented images in the standardized image content to obtain the target image content.
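The stitching steps can be approximated for vertically split comic pages. This sketch deliberately uses a simpler technique than the one claimed: instead of matching learned feature points, it compares raw pixel rows and treats the overlap height with the lowest mean squared difference as the predicted stitching region. Images are assumed to be grayscale lists of equal-width, non-empty rows, and all names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values

def row_diff(a: List[int], b: List[int]) -> float:
    # Mean squared difference between two rows of equal width.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def find_overlap(top: Image, bottom: Image, min_overlap: int = 1) -> int:
    """Return the overlap height k whose rows match best (lowest mean error)."""
    best_k, best_err = 0, float("inf")
    for k in range(min_overlap, min(len(top), len(bottom)) + 1):
        err = sum(row_diff(t, b) for t, b in zip(top[-k:], bottom[:k])) / k
        if err < best_err:  # strict: ties keep the smaller overlap
            best_k, best_err = k, err
    return best_k

def stitch(top: Image, bottom: Image) -> Image:
    k = find_overlap(top, bottom)
    return top + bottom[k:]  # drop the duplicated rows from the lower segment
```

Keeping the smaller overlap on ties is a conservative choice: in blank margins every candidate overlap matches equally well, and the smallest one never discards rows that might carry content.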
[0025] As an optional implementation, in the first aspect of the present invention, synchronizing the target user's reading data on the target text content and / or the target image content across devices by means of the AI NAS device includes:
[0026] Acquire reading data of a target user on a target device for the target text content and / or the target image content, the reading data including reading progress data and reading setting data, the reading setting data including reading mode setting data and / or display content setting data;
[0027] The reading data is stored in the AI NAS device, and the reading data is synchronously updated to each reading device corresponding to the target user based on the AI NAS device.
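A minimal model of NAS-side synchronization is sketched below. The patent does not say how conflicting updates from two devices are resolved, so the last-writer-wins rule on `updated_at` is an assumption, as are the `ReadingData` and `NasSyncStore` names and the in-memory dictionaries that stand in for NAS storage and for pushing data to each reading device.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ReadingData:
    progress: float            # reading progress data, e.g. fraction of the item read
    settings: Dict[str, str]   # reading mode / display content settings
    updated_at: float          # time the change was made on the reading device

@dataclass
class NasSyncStore:
    user_devices: Dict[str, List[str]] = field(default_factory=dict)
    # (user, item) -> reading data stored on the NAS
    records: Dict[Tuple[str, str], ReadingData] = field(default_factory=dict)
    # (device, item) -> reading data pushed to a device
    device_views: Dict[Tuple[str, str], ReadingData] = field(default_factory=dict)

    def register(self, user: str, device: str) -> None:
        self.user_devices.setdefault(user, []).append(device)

    def upload(self, user: str, item: str, data: ReadingData) -> bool:
        current = self.records.get((user, item))
        if current is not None and current.updated_at >= data.updated_at:
            return False  # stale update rejected (assumed last-writer-wins policy)
        self.records[(user, item)] = data
        for device in self.user_devices.get(user, []):
            self.device_views[(device, item)] = data  # push to every device of the user
        return True
```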
[0028] As an optional implementation, in the first aspect of the present invention, the method further includes:
[0029] A deep learning model is built in the AI NAS device based on a preset framework;
[0030] Multiple file formats to be trained are determined, and data structure labels are generated for each file format, the data structure labels including chapter information labels, text content labels and image resource labels;
[0031] Files to be trained are obtained, and the deep learning model is trained based on the files to be trained and the data structure labels of each file format to obtain the multi-format parsing model. The files to be trained include sub-files of varying format complexity, including multilingual text sub-files and multi-resolution image sub-files.
[0032] As an optional implementation, in the first aspect of the present invention, training the deep learning model based on the files to be trained and the data structure labels of each file format to obtain the multi-format parsing model includes:
[0033] The structural features of the files to be trained are analyzed, and cluster analysis is performed on the files to be trained based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0034] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0035] During use of the multi-format parsing model, new-format files provided by users and their parsing results are obtained, and incremental learning is performed on the multi-format parsing model based on these files and parsing results to update the multi-format parsing model.
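Incremental learning can be illustrated without a deep network. The sketch below swaps in a nearest-centroid classifier over structural feature vectors (a stand-in, not the patent's model): each new-format file and its confirmed parsing result update one centroid with a running mean, so earlier training data never has to be revisited, and an unseen format family simply gets a new centroid. The class name, labels and feature vectors are hypothetical.

```python
from typing import Dict, List

class IncrementalCentroidModel:
    """Nearest-centroid classifier updated one sample at a time."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def update(self, features: List[float], label: str) -> None:
        if label not in self.centroids:
            # A previously unseen format family gets its own centroid.
            self.centroids[label] = list(features)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        c = self.centroids[label]
        for i, x in enumerate(features):
            c[i] += (x - c[i]) / n  # running mean: no pass over old data needed

    def predict(self, features: List[float]) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(c, features))
        return min(self.centroids, key=lambda k: sq_dist(self.centroids[k]))
```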
[0036] A second aspect of this invention discloses a method for building and training a multi-format parsing model, the method being applied to an AI NAS device, the method comprising:
[0037] In the AI NAS device, a deep learning model is built based on a preset framework, which includes a deep learning framework and a distributed computing framework.
[0038] Multiple file formats to be trained are determined, and data structure labels are generated for each file format, the data structure labels including chapter information labels, text content labels and image resource labels;
[0039] Files to be trained are obtained, and the deep learning model is trained based on the files to be trained and the data structure labels of each file format to obtain a multi-format parsing model. The files to be trained include sub-files of varying format complexity, including multilingual text sub-files and multi-resolution image sub-files.
[0040] As an optional implementation, in the second aspect of the invention, training the deep learning model based on the files to be trained and the data structure labels of each file format to obtain a multi-format parsing model includes:
[0041] The structural features of the files to be trained are analyzed, and cluster analysis is performed on the files to be trained based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0042] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0043] During use of the multi-format parsing model, new-format files provided by users and their parsing results are obtained, and incremental learning is performed on the multi-format parsing model based on these files and parsing results to update the multi-format parsing model.
[0044] As an optional implementation, in a second aspect of the present invention, the step of analyzing the structural features of the training file and performing cluster analysis on the training file based on the structural features and the data structure labels of each file format to obtain the analysis results includes:
[0045] The file to be trained is preprocessed to obtain the file content, which includes text structure content and / or image sequence content;
[0046] The file content is tagged according to the data structure labels of each file format to obtain tagged file content, and structural analysis is performed on the tagged file content to obtain its structural features;
[0047] Based on the tagged file content and the structural features, cluster analysis is performed on the files to be trained to obtain the analysis results.
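The tagging, structural-feature and clustering steps can be sketched with plain k-means. Assumptions: tags are strings named after the three data structure labels, the structural features are the share of each tag among all tags, and k-means initialized with the first k points (chosen for determinism; k-means++ is the usual choice) stands in for the unspecified cluster analysis. Function names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]

def structural_features(tags: List[str]) -> Vector:
    # Hypothetical structural features: share of chapter, text and image tags.
    n = len(tags) or 1
    return (tags.count("chapter_info") / n,
            tags.count("text_content") / n,
            tags.count("image_resource") / n)

def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```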
[0048] A third aspect of the present invention discloses a processing apparatus for a text and image reader, the apparatus being used in an AI NAS device, the apparatus comprising:
[0049] The parsing module is used to perform standardized parsing of the target file based on the preset multi-format parsing model in the AI NAS device to obtain standardized content, which includes standardized text content and / or standardized image content;
[0050] The text analysis module is used to perform text analysis operations on the standardized text content when the standardized content includes the standardized text content, to obtain the target text content. The text analysis operations include at least one of chapter division operation, image and text separation operation, and language translation operation.
[0051] The image processing module is used to perform image processing operations on the standardized image content when the standardized content includes the standardized image content, to obtain the target image content. The image processing operations include at least one of image enhancement operation, image denoising operation, and intelligent image stitching operation.
[0052] The display module is used to display the target text content and / or the target image content based on a pre-built cross-device responsive interface;
[0053] The update module is used to perform device synchronization updates based on the AI NAS device for the target user's reading data on the target text content and / or the target image content.
[0054] As an optional implementation, in a third aspect of the present invention, the parsing module performs standardized parsing of the target file based on a preset multi-format parsing model in the AI NAS device to obtain standardized content, specifically including:
[0055] Based on the multi-format parsing model preset in the AI NAS device, the standardized parsing task corresponding to the target file is decomposed into multiple sub-tasks corresponding to the target file;
[0056] The computational requirements of each subtask are dynamically evaluated based on a preset task scheduling algorithm, and the parallel processing mechanism of the target file is determined according to the computational requirements of each subtask.
[0057] According to the parallel processing mechanism, each subtask is processed in parallel using the multi-format parsing model to obtain the task processing result corresponding to each subtask.
[0058] The task processing results of all subtasks are merged to obtain the standardized content.
[0059] As an optional implementation, in a third aspect of the present invention, when the standardized content includes the standardized text content, the text analysis module performs text analysis on the standardized text content to obtain the target text content in the following specific ways:
[0060] When the standardized content includes the standardized text content, the chapter information of the standardized text content is identified by combining preset chapter features, and the standardized text content is divided into chapters based on the chapter information to obtain the target text content. The chapter information includes chapter name and hierarchical relationship; and / or,
[0061] Extract image features and text features from the standardized text content, and jointly encode the image features and text features to obtain the encoding result;
[0062] Based on preset feature weights and the encoding results, the standardized text content is subjected to image-text separation to obtain the target text content; and / or,
[0063] Text content in a language other than the specified language is identified within the standardized text content, and a language translation operation is performed on it according to user needs to obtain the target text content. The user needs include translation language type requirements and / or translation range requirements.
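The image-text separation branch (joint encoding plus preset feature weights) can be reduced to a linear scoring rule. In this sketch the per-block feature vectors are assumed to come from hypothetical text and image encoders, joint encoding is plain concatenation, and a block whose weighted score is positive is treated as text; none of these choices is specified by the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Block:
    content: str
    text_features: List[float]   # from a hypothetical text encoder
    image_features: List[float]  # from a hypothetical image encoder

def joint_encode(block: Block) -> List[float]:
    # Simplest joint encoding: concatenate the two feature vectors.
    return block.text_features + block.image_features

def separate(blocks: List[Block], weights: List[float],
             bias: float = 0.0) -> Tuple[List[str], List[str]]:
    text_blocks: List[str] = []
    image_blocks: List[str] = []
    for b in blocks:
        z = joint_encode(b)
        if len(z) != len(weights):
            raise ValueError("weights must match the length of the joint encoding")
        score = sum(w * x for w, x in zip(weights, z)) + bias
        # Positive weighted score -> text block, otherwise image block.
        (text_blocks if score > 0 else image_blocks).append(b.content)
    return text_blocks, image_blocks
```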
[0064] As an optional implementation, in a third aspect of the present invention, when the standardized content includes the standardized image content, the image processing module performs image processing operations on the standardized image content to obtain the target image content, specifically including:
[0065] When the standardized content includes the standardized image content, image enhancement operations are performed on the standardized image content based on a super-resolution model to obtain the target image content; and / or,
[0066] Noise information, which includes blemish information and / or watermark information, is extracted from the standardized image content, and image denoising targeting the noise information is performed on the standardized image content using a denoising autoencoder and edge-preserving filtering to obtain the target image content; and / or,
[0067] Extract feature point information from the standardized image content, and predict the target stitching region in the standardized image content based on the feature point information;
[0068] Based on the feature point information and the target stitching region, an intelligent image stitching operation is performed on the segmented images in the standardized image content to obtain the target image content.
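The edge-preserving filtering half of the denoising step can be illustrated with a median filter, a classic filter that removes impulse noise while keeping step edges sharp. The denoising autoencoder is not shown, and the choice of a median filter (rather than, for example, a bilateral filter) is an assumption; images are grayscale lists of rows.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values

def median_filter(img: Image, radius: int = 1) -> Image:
    """(2*radius+1)^2 median filter; border windows are clipped to the image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            # Lower median, so even-sized border windows do not shift edges.
            out[y][x] = window[(len(window) - 1) // 2]
    return out
```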
[0069] As an optional implementation, in the third aspect of the present invention, the update module synchronizes the target user's reading data on the target text content and / or the target image content across devices by means of the AI NAS device, specifically including:
[0070] Acquire reading data of a target user on a target device for the target text content and / or the target image content, the reading data including reading progress data and reading setting data, the reading setting data including reading mode setting data and / or display content setting data;
[0071] The reading data is stored in the AI NAS device, and the reading data is synchronously updated to each reading device corresponding to the target user based on the AI NAS device.
[0072] As an optional implementation, in a third aspect of the invention, the apparatus further includes:
[0073] A building module is used to build a deep learning model in the AI NAS device based on a preset framework;
[0074] The determination module is used to determine multiple file formats to be trained and generate data structure labels for each file format, the data structure labels including chapter information labels, text content labels and image resource labels;
[0075] The acquisition module is used to acquire the files to be trained and train the deep learning model based on the files to be trained and the data structure labels of each file format to obtain the multi-format parsing model. The files to be trained include sub-files of varying format complexity, including multilingual text sub-files and multi-resolution image sub-files.
[0076] As an optional implementation, in the third aspect of the present invention, the acquisition module trains the deep learning model based on the files to be trained and the data structure labels of each file format to obtain a multi-format parsing model, specifically including:
[0077] The structural features of the files to be trained are analyzed, and cluster analysis is performed on the files to be trained based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0078] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0079] During use of the multi-format parsing model, new-format files provided by users and their parsing results are obtained, and incremental learning is performed on the multi-format parsing model based on these files and parsing results to update the multi-format parsing model.
[0080] The fourth aspect of the present invention discloses a processing system for a text and image reader, the system comprising at least an electronic device and an AI NAS device, wherein the electronic device and the AI NAS device are communicatively connected;
[0081] The electronic device is configured with an application that can access the AI NAS device and read the target text content and / or target image content processed by the AI NAS device according to the processing method for an image and text reader disclosed in the first aspect of the present invention.
[0082] The fifth aspect of the present invention discloses a computer storage medium storing computer instructions, which, when invoked, are used to execute the processing method of the image and text reader disclosed in the first aspect of the present invention.
[0083] Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
[0084] In the embodiments of the present invention, standardized parsing is performed on the target file using the multi-format parsing model preset in the AI NAS device to obtain standardized content, so parsing and rendering tasks can be migrated to the AI NAS device, supporting multi-format parsing and improving parsing efficiency. Text analysis is performed on the standardized text content to obtain the target text content, and image processing is performed on the standardized image content to obtain the target image content, enabling intelligent text analysis and image processing and optimizing the reading content. The target text content and / or target image content are displayed based on a pre-built cross-device responsive interface, and the AI NAS device synchronizes the target user's reading data for the target text content and / or target image content across devices, supporting multi-device display and cross-device synchronization, improving reading fluency and enhancing user experience.
Description of the Drawings
[0085] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0086] Figure 1 is a flowchart of a processing method for an image and text reader disclosed in an embodiment of the present invention;
[0087] Figure 2 is a flowchart of another processing method for an image and text reader disclosed in an embodiment of the present invention;
[0088] Figure 3 is a flowchart of a training method for a multi-format parsing model disclosed in an embodiment of the present invention;
[0089] Figure 4 is a schematic diagram of a processing apparatus for an image and text reader disclosed in an embodiment of the present invention;
[0090] Figure 5 is a schematic diagram of another processing apparatus for an image and text reader disclosed in an embodiment of the present invention.
Detailed Implementation
[0091] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0092] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or other steps or units inherent to the process, method, product, or device.
[0093] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0094] This invention discloses a text and image reader processing method, model training method, apparatus, and system. It can migrate parsing and rendering tasks to AI NAS devices, supports multi-format parsing, improves parsing efficiency, enables intelligent text analysis and image processing, optimizes reading content, supports multi-device interface display and cross-device synchronization, enhances user reading fluency, and improves user experience. These are described in detail below.
[0095] Example 1
[0096] Referring to Figure 1, Figure 1 is a flowchart of a processing method for an image and text reader disclosed in an embodiment of the present invention. The processing method described in Figure 1 can be applied to a processing apparatus for an image and text reader, and the apparatus can be deployed on an AI NAS device. The processing apparatus may include an intelligent server or intelligent platform for standardized parsing of e-books and / or electronic images, and the intelligent server may include an AI NAS-side server or a cloud server; this embodiment of the invention imposes no limitation on this. As shown in Figure 1, the processing method of the image and text reader may include the following operations:
[0097] 101. Based on the multi-format parsing model preset in the AI NAS device, the target file is parsed in a standardized manner to obtain standardized content.
[0098] In this embodiment of the invention, optionally, an AI reader system may be pre-installed in the AI NAS device. The AI reader system may include a multi-format parsing model. The AI reader system can be used to parse and render files. That is, the parsing and rendering tasks for electronic files can be migrated to the AI reader system in the AI NAS device, utilizing its powerful computing capabilities and intelligent optimization functions to improve parsing and rendering efficiency. This invention does not limit this.
[0099] In this embodiment of the invention, optionally, the multi-format parsing model can be used to identify and parse ebooks and electronic images (such as comics) in various formats, including but not limited to PDF, EPUB, CBZ, and CBR. The multi-format parsing model can parse the target file based on a unified data structure and standardize the content of the parsed file to obtain standardized content. The target file may include electronic text files and / or electronic image files, and the standardized content may include standardized text content and / or standardized image content. This invention does not impose any limitations.
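Format identification can start from file signatures before any model is involved. The sketch below checks well-known magic bytes: `%PDF-` for PDF, the ZIP local-file-header signature for EPUB and CBZ (an EPUB carries a `mimetype` entry containing `application/epub+zip`), and the `Rar!` signature for CBR. Treating any ZIP with image entries as CBZ and any RAR as CBR are simplifying assumptions, and `detect_format` is a hypothetical name.

```python
import io
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

def detect_format(data: bytes) -> str:
    """Guess PDF / EPUB / CBZ / CBR from the leading bytes and, for ZIPs, the entries."""
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"Rar!\x1a\x07"):   # covers RAR 4.x and RAR 5.x signatures
        return "CBR"                        # assumption: any RAR archive is a comic archive
    if data.startswith(b"PK\x03\x04"):      # ZIP local file header
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = zf.namelist()
                if "mimetype" in names and zf.read("mimetype").strip() == b"application/epub+zip":
                    return "EPUB"
                if any(n.lower().endswith(IMAGE_EXTS) for n in names):
                    return "CBZ"            # assumption: a ZIP of images is a comic archive
        except zipfile.BadZipFile:
            pass                            # truncated or corrupt archive
    return "UNKNOWN"
```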
[0100] 102. When the standardized content includes standardized text content, perform text analysis on the standardized text content to obtain the target text content.
[0101] In this embodiment of the invention, optionally, when the standardized content includes standardized text content (that is, when an e-book is being parsed), text analysis can be performed on the standardized text content to obtain the target text content. The text analysis operation includes at least one of a chapter division operation, an image-text separation operation, and a language translation operation. The chapter division operation may include extracting chapter names and hierarchical relationships from the e-book's layout format. The image-text separation operation may include separating the text content from the image content in the document. The language translation operation may perform paragraph-level or full-text translation of content in the document that is not in the reader's native language. This invention imposes no limitation on this.
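The language translation operation can be split into detection and scoped translation. In this sketch the specified language is assumed to be the user's reading language, detection is a crude heuristic that only distinguishes CJK from non-CJK text (a real system would use a language-identification model), and `translate` is a hypothetical callable standing in for a translation model. The `scope` and `selected` parameters model the translation range requirement (full text versus selected paragraphs).

```python
from typing import Callable, List, Optional, Set

def is_cjk(ch: str) -> bool:
    # CJK Unified Ideographs block only; a coarse proxy for Chinese text.
    return "\u4e00" <= ch <= "\u9fff"

def needs_translation(paragraph: str, specified_lang: str, threshold: float = 0.3) -> bool:
    letters = [c for c in paragraph if c.isalpha()]
    if not letters:
        return False  # digits or punctuation only: nothing to translate
    cjk_ratio = sum(is_cjk(c) for c in letters) / len(letters)
    detected = "zh" if cjk_ratio >= threshold else "en"  # two-language heuristic
    return detected != specified_lang

def translate_document(paragraphs: List[str], specified_lang: str,
                       translate: Callable[[str, str], str],
                       scope: str = "full",
                       selected: Optional[Set[int]] = None) -> List[str]:
    out = []
    for i, p in enumerate(paragraphs):
        # Translation range requirement: whole text, or only the selected paragraphs.
        in_range = scope == "full" or (selected is not None and i in selected)
        if in_range and needs_translation(p, specified_lang):
            out.append(translate(p, specified_lang))  # language type requirement
        else:
            out.append(p)
    return out
```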
[0102] 103. When the standardized content includes standardized image content, perform image processing operations on the standardized image content to obtain the target image content.
[0103] In this embodiment of the invention, optionally, when the standardized content includes standardized image content, that is, when performing analysis operations on electronic images, image processing operations can be performed on the standardized image content to obtain the target image content. The image processing operations include at least one of image enhancement operations, image denoising operations, and intelligent image stitching operations. The image enhancement operations may include using super-resolution technology to improve the clarity of the comic image, and adjusting the image contrast and brightness to optimize the display effect. The image denoising operations may include using a deep learning denoising model to remove noise and watermarks from the image. The intelligent image stitching operations may include intelligently and automatically stitching the segmented comic pages. This invention is not limited to these methods.
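Contrast and brightness adjustment is commonly done with a linear point operation (a gain and a bias) followed by clipping to the valid pixel range. The sketch assumes 8-bit grayscale images stored as lists of rows; `alpha` (gain) and `beta` (bias) are illustrative parameter names, not values from the source.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale image as rows of pixel values

def adjust_contrast_brightness(img: Image, alpha: float = 1.0, beta: float = 0.0) -> Image:
    """Apply out = alpha * in + beta per pixel, then clip to [0, 255].

    alpha is the gain (contrast), beta the bias (brightness)."""
    return [[max(0, min(255, round(alpha * p + beta))) for p in row] for row in img]
```

Scaling about zero also shifts mean brightness; a common variant scales about mid-gray, `alpha * (p - 128) + 128 + beta`, when contrast and brightness should be controlled independently.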
[0104] It should be noted that steps 102 and 103 have no fixed execution order; they can be executed simultaneously or sequentially. Executing steps 102 and 103 simultaneously can improve the processing efficiency of the target file, and thereby its parsing efficiency.
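Because steps 102 and 103 are independent, they can be submitted as two concurrent jobs. A minimal sketch, with hypothetical function names and a thread pool chosen only for brevity:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

def run_steps_102_103(text: Optional[str], images: Optional[List[bytes]],
                      analyze_text: Callable[[str], str],
                      process_images: Callable[[List[bytes]], List[bytes]]
                      ) -> Tuple[Optional[str], Optional[List[bytes]]]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submit only the steps whose input is present; both can run at the same time.
        text_job = pool.submit(analyze_text, text) if text is not None else None
        image_job = pool.submit(process_images, images) if images is not None else None
        target_text = text_job.result() if text_job is not None else None
        target_images = image_job.result() if image_job is not None else None
    return target_text, target_images
```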
[0105] 104. Display the target text content and / or target image content based on a pre-built cross-device responsive interface, and synchronize the target user's reading data on the target text content and / or target image content across devices by means of the AI NAS device.
[0106] In this embodiment of the invention, optionally, the target text content and / or target image content can be displayed based on a pre-built cross-device responsive interface. The interface can support access from multiple devices and multiple reading modes, including night mode and audiobook mode, and supports touch screen operation, keyboard shortcuts, and voice control. Specifically, the responsive interface can be developed using Web technologies (such as HTML5 and JavaScript), is compatible with devices such as PCs, tablets, and mobile phones, is adapted to touch devices, and provides interactive operations such as swipe page turning and two-finger zoom. The target user's reading data for the target text content and / or target image content can be synchronized across devices by means of the AI NAS device. Specifically, the AI NAS device can be connected to a cloud storage service to automatically synchronize the user's reading progress, bookmarks, notes, and other reading data. This invention imposes no limitation on this.
[0107] In this embodiment of the invention, optionally, while the target text content and / or target image content is displayed based on the pre-built cross-device responsive interface, incremental parsing can be used: content is loaded only when the user scrolls to the corresponding chapter, which reduces resource consumption, and parsed files are cached to speed up subsequent access. This invention does not limit this.
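The incremental parsing and caching behaviour can be sketched as a loader that parses a chapter on first access and serves later accesses from a cache. This is an in-memory illustration with hypothetical names; the source's cache of parsed files would presumably persist on the NAS, and a bounded cache (for example `functools.lru_cache`) would cap memory use.

```python
from typing import Callable, Dict, List

class LazyChapterLoader:
    """Parse a chapter only when the reader scrolls to it, then cache the result."""

    def __init__(self, raw_chapters: List[str], parse: Callable[[str], str]) -> None:
        self._raw = raw_chapters
        self._parse = parse
        self._cache: Dict[int, str] = {}
        self.parse_calls = 0  # instrumentation: how many chapters were actually parsed

    def on_scroll_to(self, index: int) -> str:
        if index not in self._cache:
            self.parse_calls += 1
            self._cache[index] = self._parse(self._raw[index])
        return self._cache[index]
```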
[0108] It can be seen that implementing the processing method for the text and image reader described in Figure 1 provides the following operations. Standardized parsing is performed on the target file based on the multi-format parsing model preset in the AI NAS device to obtain standardized content. Text analysis is performed on the standardized text content to obtain the target text content, and image processing on the standardized image content to obtain the target image content. The target text content and / or target image content is displayed through a pre-built cross-device responsive interface. Device synchronization updates of the target user's reading data for that content are performed based on the AI NAS device. The method thus migrates parsing and rendering tasks to the AI NAS device, supports multi-format parsing, improves parsing efficiency, supports cross-device synchronization, and improves the user experience.
[0109] In an optional embodiment, when the standardized content includes standardized text content, performing text analysis on the standardized text content to obtain the target text content may include the following operations:
[0110] When the standardized content includes standardized text content, identify the chapter information of the standardized text content by combining preset chapter features, and divide the standardized text content into chapters based on the chapter information to obtain the target text content, where the chapter information includes chapter names and hierarchical relationships; and / or,
[0111] Extract image features and text features from the standardized text content, and jointly encode the image features and text features to obtain an encoding result;
[0112] Based on preset feature weights and the encoding result, perform image-text separation on the standardized text content to obtain the target text content; and / or,
[0113] Identify non-defined language text content within the standardized text content, and translate the non-defined language text content according to user needs to obtain the target text content, where the user needs include translation language type requirements and / or translation scope requirements.
[0114] In this optional embodiment, when the standardized content includes standardized text content, the chapter information of the standardized text content can be identified by combining preset chapter features. The preset chapter features may include many chapter-title font styles, chapter-title keywords, and so on, and the chapter information may include chapter names and hierarchical relationships. Dividing the standardized text content into chapters based on the chapter information yields the target text content; that is, a directory tree of the standardized text content can be generated from the chapter information. Optionally, a rule model (such as an analysis model based on HTML / XML structure) can be combined with a deep learning model to improve compatibility with different document structures. This embodiment does not limit this.
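The following is a minimal sketch of the rule-based half of chapter identification. It assumes the preset chapter features are regular expressions for title keywords ("Chapter N", "Part N", "N.M"), each mapped to a hierarchy level. The patterns and levels are hypothetical, and font-style features are not modelled. Detected headings are nested into a directory tree with a stack.

```python
import re

# Hypothetical preset chapter features: (compiled pattern, hierarchy level).
CHAPTER_FEATURES = [
    (re.compile(r"^(Chapter|Part)\s+\d+\b.*"), 1),
    (re.compile(r"^\d+\.\d+\s+\S.*"), 2),
]


def build_directory_tree(lines: list[str]) -> list[dict]:
    """Detect chapter headings and nest them by level into a directory tree."""
    root: list[dict] = []
    stack: list[tuple[int, dict]] = []  # (level, node) of currently open headings
    for line_no, line in enumerate(lines):
        text = line.strip()
        for pattern, level in CHAPTER_FEATURES:
            if pattern.match(text):
                node = {"name": text, "line": line_no, "children": []}
                # Close any open heading at the same or a deeper level.
                while stack and stack[-1][0] >= level:
                    stack.pop()
                (stack[-1][1]["children"] if stack else root).append(node)
                stack.append((level, node))
                break
    return root
```

For example, the lines `["Chapter 1 Intro", "text", "1.1 Background", "Chapter 2 Method"]` produce two top-level nodes, and "1.1 Background" is a child of the first.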
[0115] In this optional embodiment, image features and text features can be extracted from the standardized text content. Specifically, natural language processing (NLP) combined with computer vision techniques can be used to separate the text and image content of a document. The image and text features are jointly encoded to obtain an encoding result. Based on the preset feature weights and the encoding result, image-text separation is performed on the standardized text content to obtain the target text content. The preset feature weights can be set by the user or determined automatically from the document information, and may be the same or different for each file format. Optionally, image annotations can be generated automatically according to user needs or system settings for the user's reference. This embodiment does not impose any limitations.
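The following is a minimal sketch of weighted image-text separation. The feature vectors and per-format weights are hypothetical placeholders. "Joint encoding" is reduced to concatenating the two feature vectors, which is a deliberate simplification of the learned multimodal encoder the text describes. Each block is assigned to whichever modality has the higher weighted mean score.

```python
from dataclasses import dataclass

# Hypothetical per-format feature weights: (text weight, image weight).
FEATURE_WEIGHTS = {"pdf": (0.6, 0.4), "cbz": (0.3, 0.7)}


@dataclass
class Block:
    content: str
    text_features: list[float]   # e.g. character density, OCR confidence
    image_features: list[float]  # e.g. edge density, color variance


def joint_encode(block: Block) -> list[float]:
    # Simplest possible joint encoding: concatenate both feature vectors.
    return block.text_features + block.image_features


def separate(blocks: list[Block], fmt: str) -> tuple[list[str], list[str]]:
    """Split blocks into text and image groups using weighted scores from the encoding."""
    w_text, w_image = FEATURE_WEIGHTS.get(fmt, (0.5, 0.5))
    texts, images = [], []
    for block in blocks:
        code = joint_encode(block)
        n = len(block.text_features)
        text_score = w_text * sum(code[:n]) / max(n, 1)
        image_score = w_image * sum(code[n:]) / max(len(code) - n, 1)
        (texts if text_score >= image_score else images).append(block.content)
    return texts, images
```

Per-format weights matter for ambiguous blocks. A block with equal text and image features is classed as text under the `pdf` weights but as an image under the `cbz` weights.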
[0116] In this optional embodiment, non-defined language text content within the standardized text content can be identified. The user can preset a native language (for example, Chinese or English), and text and speech in the AI reader are displayed or played in that language. Non-defined language text content is text in any language other than the user-defined native language. For example, when the user's native language is Chinese, it includes all text content that is not in Chinese. The non-defined language text content can be translated according to user needs to obtain the target text content. The user needs include translation language type requirements, through which the user changes the target language, and / or translation scope requirements, through which the user changes the range of text that is translated. The AI reader supports both paragraph-level and full-text translation; this embodiment does not limit this.
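The following is a minimal sketch of identifying and translating non-defined language text. Language identification is approximated by a rough script count, with CJK Unified Ideographs treated as Chinese and ASCII letters as English. `translate` is a hypothetical placeholder for a machine-translation service. The `target` argument models the translation language type requirement, and `selected` models the translation scope requirement (full text versus chosen paragraphs).

```python
from __future__ import annotations


def detect_language(text: str) -> str:
    """Rough script-based guess: 'zh', 'en', or 'unknown' when no letters are found."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if cjk == latin == 0:
        return "unknown"
    return "zh" if cjk >= latin else "en"


def translate(text: str, target: str) -> str:
    # Hypothetical placeholder for a real machine-translation call.
    return f"[{target}] {text}"


def translate_non_native(paragraphs: list[str], native: str = "zh",
                         target: str | None = None,
                         selected: set[int] | None = None) -> list[str]:
    """Translate paragraphs not in the native language.

    target:   translation language type requirement (defaults to the native language).
    selected: translation scope requirement; None means full text, otherwise
              only the given paragraph indices (paragraph-level translation).
    """
    target = target or native
    out = []
    for i, para in enumerate(paragraphs):
        in_scope = selected is None or i in selected
        if in_scope and detect_language(para) not in (native, "unknown"):
            out.append(translate(para, target))
        else:
            out.append(para)
    return out
```

A production reader would use a proper language-identification model rather than script counting, which cannot tell apart languages that share a script.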
[0117] In this optional embodiment, keywords and content summaries can be generated for each chapter using the TF-IDF algorithm together with ranking models such as TextRank (a graph-based method) and deep learning models. Users can customize keyword weights according to actual needs. This embodiment does not impose any limitations on this.
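The following is a minimal sketch of per-chapter TF-IDF keyword extraction with user-adjustable keyword weights, using only the standard library. It uses the plain form tf(t, d) · log(N / df(t)), so a term that appears in every chapter scores zero. The tokenizer only handles lowercase English words; Chinese text would need word segmentation first. TextRank and the summarization step are not shown.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tfidf_keywords(chapters: list[str], top_k: int = 3,
                   user_weights: dict[str, float] | None = None) -> list[list[str]]:
    """Return the top_k TF-IDF keywords per chapter, scaled by optional user weights."""
    user_weights = user_weights or {}
    docs = [re.findall(r"[a-z]+", ch.lower()) for ch in chapters]
    n_docs = len(docs)
    # Document frequency: in how many chapters each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term]) * user_weights.get(term, 1.0)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result
```

For the chapters "the dragon flew over the castle dragon" and "the knight rode to the castle", the sketch returns `[["dragon", "flew", "over"], ["knight", "rode", "to"]]`. Passing `user_weights={"over": 5.0}` moves "over" to the top of the first list.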
[0118] As can be seen, when the standardized content includes standardized text content, implementing this optional embodiment provides three operations, which together yield the target text content. First, chapter information is identified by combining preset chapter features, and the text is divided into chapters. Second, image and text features are extracted and jointly encoded, and image-text separation is performed based on preset feature weights and the encoding result. Third, non-defined language text content is identified and translated according to user needs. This improves the efficiency of distinguishing image and text content, optimizes the presentation, improves the readability of non-native-language content, and improves the user experience.
[0119] In another optional embodiment, when the standardized content includes standardized image content, performing image processing operations on the standardized image content to obtain the target image content may include the following operations:
[0120] When the standardized content includes standardized image content, image enhancement operations are performed on the standardized image content based on a super-resolution model to obtain the target image content; and / or,
[0121] Extract noise information from the standardized image content, and denoise the standardized image content using a denoising autoencoder and edge-preserving filtering techniques to obtain the target image content, where the noise information includes blemish information and / or watermark information; and / or,
[0122] Extract feature point information from the standardized image content, and predict the target stitching region in the standardized image content based on the feature point information;
[0123] Based on feature point information and the target stitching region, intelligent image stitching is performed on the segmented images in the standardized image content to obtain the target image content.
[0124] In this optional embodiment, when the standardized content includes standardized image content, image enhancement operations can be performed on the standardized image content based on a super-resolution model to obtain the target image content. The super-resolution model may include a super-resolution generative adversarial network (SRGAN), which can be used to improve image clarity and enhance detail. In the process of performing image enhancement operations on the standardized image content, a contextual attention mechanism can be introduced to reduce the probability of losing key details in the enhanced image. Furthermore, parameters such as image contrast and brightness can be adjusted to optimize the display effect. This embodiment does not impose any limitations.
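The SRGAN stage is a trained network and is not reproduced here. The following is a minimal sketch of only the follow-up contrast and brightness adjustment, assuming an 8-bit grayscale image stored as nested lists. It uses the common linear form out = clamp(α · (p − 128) + 128 + β), where α is the contrast and β the brightness offset.

```python
def adjust_contrast_brightness(image: list[list[int]], contrast: float = 1.0,
                               brightness: int = 0) -> list[list[int]]:
    """Linear contrast/brightness adjustment of an 8-bit grayscale image.

    Contrast scales each pixel's distance from mid-gray (128); brightness adds
    a constant offset; results are rounded and clamped to [0, 255].
    """
    def clamp(value: float) -> int:
        return max(0, min(255, round(value)))

    return [[clamp(contrast * (p - 128) + 128 + brightness) for p in row] for row in image]
```

For example, with `contrast=1.5` and `brightness=10`, the row `[100, 128, 200]` becomes `[96, 138, 246]`.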
[0125] In this optional embodiment, noise information can be extracted from the standardized image content. The noise information includes blemish information and / or watermark information, such as blemishes in a scanned document. The standardized image content can then be denoised using a denoising autoencoder and edge-preserving filtering techniques to obtain the target image content. Specifically, denoising can use a denoising autoencoder together with a ResNet architecture, and edge-preserving filtering is introduced during denoising so that image clarity is not affected. According to user needs, the embodiment can also support manual denoising by the user and / or automatic denoising by the AI reader. This embodiment does not limit this.
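The denoising autoencoder and ResNet are trained models and are not shown. The following is a minimal sketch of only the edge-preserving filtering idea. It uses a bilateral filter, a common choice assumed here, since the text does not name a specific filter. Each pixel becomes a weighted mean of its neighbours. Each weight falls off with spatial distance and with intensity difference, so isolated specks are smoothed while strong edges survive.

```python
import math


def bilateral_filter(img: list[list[float]], radius: int = 1,
                     sigma_space: float = 1.0, sigma_range: float = 20.0) -> list[list[float]]:
    """Edge-preserving smoothing of a grayscale image (nested lists) with a bilateral filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        diff = img[ny][nx] - center
                        # Spatial Gaussian times intensity (range) Gaussian.
                        weight = math.exp(-(dy * dy + dx * dx) / (2 * sigma_space ** 2)
                                          - (diff * diff) / (2 * sigma_range ** 2))
                        num += weight * img[ny][nx]
                        den += weight
            out[y][x] = num / den
    return out
```

On a sharp 0/255 step edge, the output stays essentially unchanged. A single bright speck of 30 on a background of 10 is pulled to roughly 16.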
[0126] In this optional embodiment, feature point information is extracted from the standardized image content, and the target stitching region in the standardized image content is predicted from it. The feature point information may include boundary feature points of patterns or scenes in the image. Continuity parameters and abrupt-change parameters of the feature points can be determined from this information and used to predict the target stitching region. The target stitching region may represent a region of the image that was split by parsing or pagination. Based on the feature point information and the target stitching region, intelligent image stitching is performed on the segmented images in the standardized image content to obtain the target image content. Both vertical and horizontal stitching modes can be supported, and the mode can be selected automatically according to the image layout. This embodiment does not limit this.
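The following is a minimal sketch of seam-based stitching, under simplifying assumptions. The "feature points" are reduced to the pixels along the seam between two page images. A small mean difference across the seam is treated as continuity, meaning a split to be stitched. A large difference is treated as an abrupt change, meaning a real panel or page boundary. The stitching mode is chosen as whichever direction gives the more continuous seam. The threshold value is hypothetical.

```python
def boundary_difference(a: list[list[int]], b: list[list[int]], mode: str) -> float:
    """Mean absolute difference across the seam between image a and image b."""
    if mode == "vertical":   # a above b: compare a's bottom row with b's top row
        edge_a, edge_b = a[-1], b[0]
    else:                    # a left of b: compare a's right column with b's left column
        edge_a = [row[-1] for row in a]
        edge_b = [row[0] for row in b]
    if len(edge_a) != len(edge_b):
        return float("inf")  # seams of different length cannot be stitched
    return sum(abs(p - q) for p, q in zip(edge_a, edge_b)) / len(edge_a)


def stitch(a: list[list[int]], b: list[list[int]], threshold: float = 10.0):
    """Return (mode, stitched_image) if the seam is continuous, else (None, None)."""
    diffs = {mode: boundary_difference(a, b, mode) for mode in ("vertical", "horizontal")}
    mode = min(diffs, key=diffs.get)
    if diffs[mode] > threshold:
        return None, None    # abrupt change: keep the images separate
    if mode == "vertical":
        return mode, a + b
    return mode, [row_a + row_b for row_a, row_b in zip(a, b)]
```

A learned model would predict the stitching region from richer feature points, for example matched keypoints. The seam comparison only illustrates the continuity-versus-abrupt-change decision.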
[0127] As can be seen, implementing this optional embodiment enables image enhancement operations on standardized image content based on a super-resolution model when the standardized content includes standardized image content. It extracts noise information from the standardized image content and performs image denoising operations targeting the noise information using a denoising autoencoder and edge-preserving filtering techniques. It also extracts feature point information from the standardized image content, predicts the target stitching region based on the feature point information, and performs intelligent image stitching operations on the segmented images in the standardized image content based on the feature point information and the target stitching region to obtain the target image content. This allows for dynamic adjustment of the enhancement effect based on the image content. The intelligent image stitching technology improves the comic reading experience, achieving intelligent and precise image processing and significantly enhancing the user experience.
[0128] In yet another optional embodiment, device synchronization updates of reading data of a target user on target text content and / or target image content based on an AI NAS device may include the following operations:
[0129] Acquire the reading data of the target user on the target device for the target text content and / or target image content, where the reading data includes reading progress data and reading settings data, and the reading settings data includes reading mode settings data and / or display content settings data;
[0130] Store the reading data in the AI NAS device, and synchronize and update it to each reading device corresponding to the target user based on the AI NAS device.
[0131] In this optional embodiment, the reading data of the target user on the target device for the target text content and / or target image content may include reading progress data and reading settings data. The reading settings data includes reading mode settings data and / or display content settings data. The reading mode settings data may cover night mode, audiobook mode, and so on. Night mode can automatically adjust the background color and text brightness to reduce eye fatigue. Audiobook mode can use text-to-speech (TTS) technology to convert e-book content into speech, with custom adjustment of speech speed and timbre. The display content settings data may include the display size, color, font, line spacing, and so on of the e-book text content; this embodiment does not limit them.
[0132] In this optional embodiment, the reading data can be stored in the AI NAS device, and the user can log in to the same account on multiple devices. When the user reads with the AI reader on a target device, the reading data can be stored in the AI NAS device through real-time synchronization over WebSocket. It is then synchronized to each reading device corresponding to the target user, so that the reading content stays consistent when the user switches between devices. The data is encrypted during transmission and synchronization to ensure user data security. This embodiment does not impose any limitations.
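The following is a minimal sketch of the store-then-fan-out synchronization flow. `NasSyncHub` is a hypothetical in-memory stand-in for the AI NAS service. Each device's "inbox" list takes the place of a real encrypted WebSocket connection. The last-writer-wins conflict rule, keyed on a client timestamp, is an assumption; the text does not specify how conflicting updates are resolved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReadingData:
    book_id: str
    progress: float                               # reading progress, 0.0 to 1.0
    updated_at: float                             # client timestamp of the change (seconds)
    settings: dict = field(default_factory=dict)  # e.g. {"mode": "night", "font_size": 18}


class NasSyncHub:
    """Hypothetical in-memory stand-in for the AI NAS sync service (no real network I/O)."""

    def __init__(self) -> None:
        self.store: dict[tuple[str, str], ReadingData] = {}
        self.devices: dict[str, dict[str, list[ReadingData]]] = {}  # user -> device -> inbox

    def register(self, user: str, device: str) -> None:
        self.devices.setdefault(user, {})[device] = []

    def push(self, user: str, device: str, data: ReadingData) -> bool:
        """Store an update from one device and fan it out to the user's other devices."""
        key = (user, data.book_id)
        current = self.store.get(key)
        if current is not None and current.updated_at >= data.updated_at:
            return False  # stale update: last-writer-wins keeps the newer record
        self.store[key] = data
        for other, inbox in self.devices.get(user, {}).items():
            if other != device:
                inbox.append(data)  # a real system would send this over an encrypted WebSocket
        return True
```

In use, a phone pushing progress 0.4 for a book delivers that record to the tablet's inbox. A later-arriving but older update from the tablet is rejected.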
[0133] As can be seen, implementing this optional embodiment can acquire the reading data of the target user on the target device for the target text content and / or the target image content, store the reading data in the AI NAS device, and synchronize and update the reading data to each reading device corresponding to the target user based on the AI NAS device. This enables the user's reading progress and settings to be seamlessly synchronized across different devices, providing a consistent reading experience. It can also reduce the parsing waiting time when the user reads on different devices, reduce the number of file parsing operations, and improve the user's reading fluency.
[0134] Example 2
[0135] Please refer to Figure 2. Figure 2 is a flowchart of a processing method for a text and image reader disclosed in an embodiment of the present invention. The processing method described in Figure 2 can be applied to a processing device for a text and image reader, and the processing device can be applied to an AI NAS device. The processing device may include an intelligent server or intelligent platform for the standardized parsing of e-books and / or electronic images. The intelligent server may include an AI NAS-side server or a cloud server; this embodiment of the invention does not limit this. As shown in Figure 2, the processing method for the text and image reader may include the following operations:
[0136] 201. Based on the multi-format parsing model preset in the AI NAS device, the standardized parsing task corresponding to the target file is decomposed into multiple sub-tasks corresponding to the target file.
[0137] In this embodiment of the invention, optionally, the multi-format parsing model preset in the AI NAS device can be a deep learning model based on the Transformer architecture that employs a distributed computing framework. The model can decompose the standardized parsing task of the target file into multiple subtasks corresponding to the target file. For example, when the target file is an e-book file, the subtasks may include extracting text content, parsing the chapter structure, and extracting embedded images or tables. When the target file is an electronic image file, the subtasks may include decompressing the image sequence, preprocessing images, and extracting text from images. This invention does not impose any limitations.
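The following is a minimal sketch of the decomposition step. It uses a rule-based stand-in: the file suffix selects a subtask template that mirrors the examples above. The patented approach performs this decomposition with the Transformer-based parsing model, which is not reproduced here. The suffix sets and subtask names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical subtask templates following the examples in the text.
SUBTASK_TEMPLATES = {
    "ebook": ["extract_text", "parse_chapters", "extract_embedded_images"],
    "image": ["decompress_sequence", "preprocess_images", "extract_image_text"],
}
EBOOK_SUFFIXES = {".epub", ".pdf", ".mobi", ".txt"}
IMAGE_SUFFIXES = {".cbz", ".cbr", ".cb7"}


@dataclass(frozen=True)
class SubTask:
    file_name: str
    kind: str


def decompose(file_name: str) -> list[SubTask]:
    """Split the standardized parsing task for one file into subtasks."""
    suffix = PurePath(file_name).suffix.lower()
    if suffix in EBOOK_SUFFIXES:
        family = "ebook"
    elif suffix in IMAGE_SUFFIXES:
        family = "image"
    else:
        raise ValueError(f"unsupported format: {file_name}")
    return [SubTask(file_name, kind) for kind in SUBTASK_TEMPLATES[family]]
```

For example, `decompose("Book.EPUB")` yields the three e-book subtasks, and `decompose("vol1.cbz")` begins with `decompress_sequence`.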
[0138] 202. Dynamically evaluate the computational requirements of each subtask based on the preset task scheduling algorithm, and determine the parallel processing mechanism of the target file according to the computational requirements of each subtask.
[0139] In this embodiment of the invention, optionally, the preset task scheduling algorithm may include the Round-Robin algorithm or a dynamic priority scheduling algorithm, and the computational requirements of each subtask can be evaluated dynamically based on it. Specifically, the file size and task complexity of each subtask can be determined and used to evaluate that subtask's computational requirements. The parallel processing mechanism of the target file is then determined from the computational requirements of the subtasks. This invention does not impose any limitations.
[0140] In this embodiment of the invention, the parallel processing mechanism for the target file may optionally include a multi-threaded parallel processing mechanism, a distributed parallel processing mechanism, or a GPU-accelerated processing mechanism. The multi-threaded parallel processing mechanism may utilize multi-core CPUs and multi-threading technology on a single AI NAS device to process different files or different sub-tasks of the same file through concurrent threads. The distributed parallel processing mechanism may combine the distributed architecture of the AI NAS device to distribute the file parsing task to multiple AI NAS nodes or cloud servers for parallel processing. The GPU-accelerated processing mechanism may utilize GPU accelerators to further improve the processing speed for computationally intensive tasks (such as running deep learning models or image processing). This invention is not limited to these specific mechanisms.
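The following is a minimal sketch of cost evaluation and mechanism selection, under stated assumptions. Computational requirement is approximated as file size × a per-kind complexity factor. Subtasks that run a model go to the GPU mechanism, and large non-model subtasks go to the distributed mechanism. Jobs are ordered costliest first with a heap, which is the longest-processing-time-first heuristic. The complexity factors and the threshold are hypothetical. A genuinely dynamic scheduler would also re-evaluate priorities as node load changes, which this static ordering does not do.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field

# Hypothetical relative cost per MB for each subtask kind.
COMPLEXITY = {"extract_text": 1.0, "parse_chapters": 1.5,
              "preprocess_images": 3.0, "extract_image_text": 4.0}


@dataclass(order=True)
class Job:
    priority: float                    # only this field is used for ordering
    name: str = field(compare=False)
    cost: float = field(compare=False)
    mechanism: str = field(compare=False)


def estimate_cost(kind: str, size_mb: float) -> float:
    # Computational requirement approximated as file size x task complexity.
    return size_mb * COMPLEXITY.get(kind, 1.0)


def choose_mechanism(cost: float, uses_model: bool) -> str:
    # Hypothetical rules mapping to the three mechanisms named in the text.
    if uses_model:
        return "gpu"
    if cost > 100:
        return "distributed"
    return "multithread"


def schedule(subtasks) -> list[Job]:
    """subtasks: iterable of (kind, size_mb, uses_model); returns jobs, costliest first."""
    heap: list[Job] = []
    for kind, size_mb, uses_model in subtasks:
        cost = estimate_cost(kind, size_mb)
        # heapq is a min-heap, so push the negated cost to pop the heaviest job first.
        heapq.heappush(heap, Job(-cost, kind, cost, choose_mechanism(cost, uses_model)))
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

A Round-Robin scheduler, the other option the text names, would instead hand subtasks to workers in a fixed rotation regardless of cost.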
[0141] 203. Based on the parallel processing mechanism, each subtask is processed in parallel using a multi-format parsing model to obtain the task processing result corresponding to each subtask.
[0142] Optionally, in this embodiment of the invention, each subtask can be processed in parallel using a multi-format parsing model according to a parallel processing mechanism to obtain the task processing result corresponding to each subtask. During the processing, asynchronous I / O technology can be used to perform I / O operations (such as file reading and writing) in parallel during the parsing process to avoid performance degradation caused by I / O bottlenecks. This invention does not impose any limitations.
[0143] 204. Merge the results of each task to obtain standardized content.
[0144] In this embodiment of the invention, optionally, after each subtask is completed, the results of the subtasks can be merged through a unified task manager to generate a standardized output data structure.
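The following is a minimal sketch of steps 203 and 204 using Python's `asyncio`. The subtasks run concurrently with `asyncio.gather`, and `asyncio.sleep` stands in for non-blocking file I/O. A small merge function then plays the role of the unified task manager. `run_subtask` and the output layout of `merge_results` are hypothetical.

```python
import asyncio


async def run_subtask(file_name: str, kind: str) -> dict:
    """Hypothetical subtask: simulate asynchronous I/O, then return a partial result."""
    await asyncio.sleep(0.01)  # stands in for non-blocking file reads/writes
    return {"kind": kind, "payload": f"{kind}:{file_name}"}


def merge_results(results: list[dict]) -> dict:
    """Unified task manager step: fold subtask results into one standardized structure."""
    standardized = {"text": [], "images": [], "structure": []}
    bucket = {"extract_text": "text", "parse_chapters": "structure",
              "extract_embedded_images": "images"}
    for result in results:
        standardized[bucket.get(result["kind"], "structure")].append(result["payload"])
    return standardized


async def parse_file(file_name: str, kinds: list[str]) -> dict:
    # gather() runs the subtasks concurrently and returns results in input order.
    results = await asyncio.gather(*(run_subtask(file_name, k) for k in kinds))
    return merge_results(list(results))
```

Usage: `asyncio.run(parse_file("b.epub", ["extract_text", "parse_chapters", "extract_embedded_images"]))`. `asyncio` only overlaps I/O within one thread. CPU-heavy model inference would still be dispatched to the thread, distributed, or GPU mechanism chosen during scheduling.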
[0145] 205. When the standardized content includes standardized text content, perform text analysis on the standardized text content to obtain the target text content.
[0146] 206. When the standardized content includes standardized image content, perform image processing operations on the standardized image content to obtain the target image content.
[0147] 207. Display target text content and / or target image content based on a pre-built cross-device responsive interface, and update the target user's reading data on the target text content and / or target image content on the AI NAS device.
[0148] In this embodiment of the invention, for further description of steps 205-207, refer to the detailed description of steps 102-104 in Embodiment 1 of the invention; it is not repeated here.
[0149] It can be seen that implementing the processing method for the text and image reader described in Figure 2 provides the following operations. The standardized parsing task of the target file is decomposed into multiple subtasks based on the multi-format parsing model preset in the AI NAS device. The computational requirements of each subtask are dynamically evaluated with the preset task scheduling algorithm, and the parallel processing mechanism for the target file is determined accordingly. The subtasks are processed in parallel through the multi-format parsing model according to that mechanism to obtain the task processing result of each subtask. These results are merged to obtain the standardized content. Parallel processing shortens parsing time, improves resource utilization during file parsing, and reduces performance degradation caused by single-task overload or wasted resources. Parsing multiple file formats simultaneously with shared model training knowledge improves the efficiency and accuracy of multi-format parsing. The method further performs text analysis on the standardized text content to obtain the target text content and image processing on the standardized image content to obtain the target image content. It displays the target text content and / or target image content through a pre-built cross-device responsive interface and synchronizes the target user's reading data across devices based on the AI NAS device. It thereby migrates parsing and rendering tasks to the AI NAS device, supports multi-format parsing, improves parsing efficiency, supports cross-device synchronization, and improves the user experience.
[0150] In an optional embodiment, the processing method of the image and text reader may further include the following operations:
[0151] Build a deep learning model with a preset structure in the AI NAS device;
[0152] Determine multiple file formats to be trained, and generate data structure labels for each file format, where the data structure labels include chapter information labels, text content labels, and image resource labels;
[0153] Obtain the training file, and train the deep learning model based on the training file and the data structure labels of each file format to obtain a multi-format parsing model, where the training file includes sub-files of varying format complexity, including multilingual text sub-files and multi-resolution image sub-files.
[0154] In this optional embodiment, the deep learning model with the preset structure may include a deep learning model based on the Transformer architecture that employs a distributed computing framework. The file formats to be trained may include PDF, EPUB, CBZ, CBR, and so on. The data structure labels for each file format may include chapter information labels, text content labels, and image resource labels, and the labels of all file formats share a unified data structure; this embodiment does not limit this.
[0155] In this optional embodiment, the training file may include sub-files of varying format complexity, such as multilingual text sub-files and multi-resolution image sub-files. Preprocessing of the training file may include, but is not limited to, decompression and the extraction of XML structures and image sequences. The deep learning model can be trained based on the training file and the data structure labels of each file format to obtain the multi-format parsing model. This embodiment does not impose any limitations on this.
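EPUB and CBZ files are both ZIP archives: EPUB holds XHTML/XML markup plus resources, and CBZ holds a sequence of page images. For these two formats, the decompression and extraction steps of preprocessing can therefore be sketched with Python's standard `zipfile` module. CBR is a RAR archive and would need a third-party library, which is not shown. Ordering pages by sorted name assumes zero-padded page file names, as is common in CBZ files.

```python
import io
import zipfile

IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif", ".webp")
MARKUP_EXT = (".xhtml", ".html", ".htm", ".xml", ".opf", ".ncx")


def preprocess_zip_container(data: bytes) -> dict:
    """Decompress a ZIP-based e-book or comic (e.g. EPUB, CBZ) into markup and an image sequence."""
    markup, images = {}, []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            lower = name.lower()
            if lower.endswith(MARKUP_EXT):
                # XML/XHTML structure, decoded as UTF-8 for later labelling.
                markup[name] = archive.read(name).decode("utf-8", errors="replace")
            elif lower.endswith(IMAGE_EXT):
                # Sorted names give page order when page files are zero-padded.
                images.append(name)
    return {"markup": markup, "image_sequence": images}
```

Other entries, such as the EPUB `mimetype` file, are skipped. The returned structure is what later labelling steps would annotate.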
[0156] As can be seen, implementing this optional embodiment can build a deep learning model based on a preset structure in an AI NAS device, determine multiple file formats to be trained, generate data structure labels for each file format, obtain the file to be trained, and train the deep learning model based on the file to be trained and the data structure labels for each file format to obtain a multi-format parsing model. This model can perform multi-format parsing, overcome the limitations of traditional single-format parsing, and improve parsing performance and efficiency.
[0157] In another optional embodiment, training a deep learning model based on the file to be trained and the data structure labels for each file format to obtain a multi-format parsing model may include the following operations:
[0158] Analyze the structural features of the training files, and perform cluster analysis on the training files based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0159] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0160] The system obtains new format files and their parsing results from users during the use of the multi-format parsing model. Based on these new format files and their parsing results, it performs incremental learning on the multi-format parsing model to update it.
[0161] In this optional embodiment, the structural features of the files to be trained can be analyzed, and cluster analysis can be performed on these files based on the structural features and the data structure labels of each file format to obtain the analysis results. Specifically, a feature-based self-supervised learning method can be used to cluster the structural features of the input files in order to predict parsing rules for unseen formats. A deep learning model can then be trained on the analysis results to obtain the multi-format parsing model. In addition, while the multi-format parsing model is in use, new-format files and their parsing results can be collected through a user feedback mechanism. This dynamically updates the model's training set, and the collected results are used for incremental learning that continuously optimizes parsing performance and updates the multi-format parsing model. This embodiment is not limited to this.
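The following is a minimal sketch of clustering files by structural features and mapping an unseen format to its nearest cluster. It substitutes plain k-means, an unsupervised method, for the feature-based self-supervised method described above, which is not reproduced. The three-element feature vectors (text ratio, image ratio, markup depth) are hypothetical. Incremental learning from user feedback is not shown.

```python
import math
import random


def kmeans(points: list[list[float]], k: int, iters: int = 20, seed: int = 0):
    """Plain k-means: returns (centroids, labels) for structural feature vectors."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each file joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def assign_unseen(features: list[float], centroids) -> int:
    # An unseen format inherits the parsing rules of its nearest cluster.
    return min(range(len(centroids)), key=lambda c: math.dist(features, centroids[c]))
```

With two text-heavy and two image-heavy feature vectors, `kmeans(..., k=2)` separates them. A new text-heavy file is then assigned to the text-heavy cluster.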
[0162] As can be seen, implementing this optional embodiment can analyze the structural features of the files to be trained, and perform cluster analysis on the files to be trained based on the structural features and data structure labels of each file format to obtain the analysis results. Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model. The model can obtain new format files and their parsing results from users during the use of the multi-format parsing model, and perform incremental learning on the multi-format parsing model based on the new format files and their parsing results to update the multi-format parsing model. This can improve the model's adaptability to unseen format files based on adversarial training and self-supervised learning, and improve the long-term scalability of the parsing model through incremental learning, further improving the model's flexibility and parsing efficiency.
[0163] Example 3
[0164] Please refer to Figure 3. Figure 3 is a flowchart of a training method for a multi-format parsing model disclosed in an embodiment of the present invention. The training method described in Figure 3 can be applied to a processing device for a text and image reader, and the processing device can be applied to an AI NAS device. The processing device may include an intelligent server or intelligent platform for the standardized parsing of e-books and / or electronic images. The intelligent server may include an AI NAS-side server or a cloud server; this embodiment of the invention is not limited thereto. As shown in Figure 3, the training method for the multi-format parsing model may include the following operations:
[0165] 301. Construct a deep learning model in the AI NAS device based on a preset framework.
[0166] In this embodiment of the invention, optionally, the preset framework may include a deep learning framework and a distributed computing framework. The deep learning framework may include the Transformer architecture. The distributed computing framework may be based on a multi-task scheduling algorithm, such as the Round-Robin algorithm or a dynamic priority scheduling algorithm. In other words, the deep learning model can parse multiple file formats through multi-task learning: it supports multiple input streams and processes inputs of different formats simultaneously in the same model. Each file format has its own input encoder, and the encoder outputs are parsed uniformly by a shared decoder. This lets the model share weights across formats while retaining format-specific optimization capability. This invention does not limit this.
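The following is a toy sketch of the routing structure described above: one encoder per format and one decoder shared by all formats. It is not a Transformer. The encoders are hypothetical hand-written functions that map each format's raw statistics into a common three-dimensional feature space. The shared decoder is a single fixed weight matrix applied to every format. In a trained model, both the encoders and the shared decoder weights would be learned.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical per-format encoders into a shared space:
# [text_density, image_density, structure_depth].
ENCODERS: dict[str, Callable[[dict], list[float]]] = {
    "epub": lambda raw: [raw["chars"] / 1000, raw["images"] / 10, raw["toc_depth"] / 5],
    "pdf": lambda raw: [raw["chars"] / 1000, raw["images"] / 10, raw["outline_depth"] / 5],
    "cbz": lambda raw: [0.0, raw["pages"] / 10, 0.0],
}

# One decoder weight matrix shared by every format (toy values, normally learned).
SHARED_WEIGHTS = [
    [1.0, 0.0, 0.0],  # -> text content label score
    [0.0, 1.0, 0.0],  # -> image resource label score
    [0.0, 0.0, 1.0],  # -> chapter information label score
]
LABELS = ["text_content", "image_resource", "chapter_info"]


def shared_decode(features: list[float]) -> dict[str, float]:
    """Apply the shared weights to an encoded feature vector to score the unified labels."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in SHARED_WEIGHTS]
    return dict(zip(LABELS, scores))


def parse(fmt: str, raw: dict) -> dict[str, float]:
    """Route input through its format-specific encoder, then the shared decoder."""
    return shared_decode(ENCODERS[fmt](raw))
```

Supporting a new format in this structure means adding one encoder entry, while the decoder and its weights are reused unchanged. That reuse is the weight sharing the paragraph describes.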
[0167] 302. Determine the various file formats to be trained and generate data structure labels for each file format.
[0168] In this embodiment of the invention, the file formats to be trained may optionally include PDF, EPUB, CBZ, CBR, and so on. The data structure labels for each file format may include chapter information labels, text content labels, and image resource labels, and the labels of all file formats share a unified data structure; this invention does not limit this.
[0169] 303. Obtain the training file and train the deep learning model based on the training file and the data structure labels of each file format to obtain a multi-format parsing model.
[0170] In this embodiment of the invention, optionally, the training file may include sub-files of varying format complexity, such as multilingual text sub-files and multi-resolution image sub-files. Preprocessing of the training file may include, but is not limited to, decompression and the extraction of XML structures and image sequences. The deep learning model can be trained based on the training file and the data structure labels of each file format to obtain the multi-format parsing model. This invention does not impose any limitations.
[0171] It can be seen that implementing the training method for the multi-format parsing model described in Figure 3 provides the following operations. A deep learning model is built based on a preset framework in an AI NAS device. Multiple file formats to be trained are determined, and data structure labels are generated for each format. The files to be trained are obtained, and the deep learning model is trained on those files and the data structure labels to obtain a multi-format parsing model. This model can parse multiple formats, overcoming the limitations of traditional single-format parsing and improving parsing performance and efficiency.
[0172] In an optional embodiment, training a deep learning model based on the file to be trained and the data structure labels for each file format to obtain a multi-format parsing model may include the following operations:
[0173] Analyze the structural features of the training files, and perform cluster analysis on the training files based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0174] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0175] The system obtains new format files and their parsing results from users during the use of the multi-format parsing model. Based on these new format files and their parsing results, it performs incremental learning on the multi-format parsing model to update it.
[0176] In this optional embodiment, a feature-based self-supervised learning method can be used to perform cluster analysis on the structural features of the input file to predict parsing rules for unseen formats. Based on the analysis results, a deep learning model can be trained to obtain a multi-format parsing model. Furthermore, the model can obtain feedback from users regarding new format files and their parsing results during the use of the multi-format parsing model. Based on these new format files and their parsing results, the multi-format parsing model can undergo incremental learning. Through a user feedback mechanism, the training set of the model can be dynamically updated, and the parsing results of new format files can be used for incremental learning to continuously optimize the model's parsing performance and update the multi-format parsing model. This embodiment does not impose any limitations on this method.
[0177] As can be seen, implementing this optional embodiment makes it possible to analyze the structural features of the files to be trained, perform cluster analysis on them based on the structural features and the data structure labels of each file format to obtain the analysis results, and train the deep learning model on those results to obtain a multi-format parsing model. While the multi-format parsing model is in use, new format files and their parsing results can be obtained from users and used for incremental learning to update the model. This improves the model's adaptability to files in unseen formats through adversarial training and self-supervised learning, and improves the long-term scalability of the parsing model through incremental learning, further improving the model's flexibility and parsing efficiency.
[0178] In another optional embodiment, analyzing the structural features of the files to be trained and performing cluster analysis on them based on the structural features and the data structure labels of each file format to obtain the analysis results may include the following operations:
[0179] The training files are preprocessed to obtain file content, which includes text structure content and / or image sequence content.
[0180] The file content is tagged according to the data structure tags of each file format to obtain the tagged file content. The structure of the tagged file content is then analyzed to obtain the structural features of the tagged file content.
[0181] Based on the content and structural characteristics of the tagged files, cluster analysis is performed on the training files to obtain the analysis results.
[0182] In this optional embodiment, the training files can be preprocessed to obtain the file content. Specifically, each training file can be decompressed and its content extracted. The file content may include text structure content and / or image sequence content, where the text structure content includes XML structure content. The file content can then be tagged according to the data structure tags of each file format to obtain the tagged file content; that is, the file content is assigned unified data tags according to the data structure tags of each file format. The tagged file content undergoes structural analysis to obtain its structural features, and cluster analysis is performed on the training files based on the tagged file content and the structural features to obtain the analysis results. This embodiment of the invention is not limited in this respect.
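A minimal Python sketch of the preprocessing and tagging steps follows. It assumes the training file is a zip-based container, such as an EPUB (XHTML text) or a CBZ (numbered page images). The extension lists, the table-of-contents file names, and the tagging rules are illustrative assumptions, since the description does not specify them. For brevity the sketch works on entry names rather than reading each entry's bytes. The per-label counts it returns are the kind of input the structural-feature step could consume.

```python
import io
import zipfile
from collections import Counter
from typing import Dict, List, Tuple

TEXT_EXT = (".xhtml", ".html", ".xml")          # text structure content
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".webp")  # image sequence content
TOC_NAMES = ("toc.ncx", "nav.xhtml")            # hypothetical chapter-index files


def preprocess(archive: bytes) -> Tuple[List[str], List[str]]:
    """Decompress the container and split text entries from the ordered image sequence."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
    text_entries = [n for n in names if n.lower().endswith(TEXT_EXT)]
    # Sorting by name recovers reading order for zero-padded page files.
    image_sequence = sorted(n for n in names if n.lower().endswith(IMAGE_EXT))
    return text_entries, image_sequence


def tag_entries(text_entries: List[str], image_sequence: List[str]) -> List[Tuple[str, str]]:
    """Assign each entry one of the unified data structure labels."""
    tagged = []
    for name in text_entries:
        base = name.rsplit("/", 1)[-1].lower()
        tagged.append((name, "chapter_info" if base in TOC_NAMES else "text_content"))
    tagged.extend((name, "image_resource") for name in image_sequence)
    return tagged


def label_counts(tagged: List[Tuple[str, str]]) -> Dict[str, int]:
    """A simple structural summary: how many entries carry each label."""
    return dict(Counter(label for _, label in tagged))
```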
[0183] As can be seen, implementing this optional embodiment can preprocess the training files to obtain file content, label the file content according to the data structure labels of each file format, obtain processed labeled file content, perform structural analysis on the labeled file content to obtain the structural features of the labeled file content, and perform cluster analysis on the training files based on the labeled file content and structural features to obtain the analysis results. This can improve the model training accuracy of the multi-format parsing model, thereby improving the parsing efficiency and performance of the multi-format parsing model.
[0184] Embodiment 4
[0185] Please refer to Figure 4, which is a schematic diagram of the processing device for a text and image reader disclosed in an embodiment of the present invention. The processing device described in Figure 4 can be applied in an AI NAS device and may include an intelligent server or intelligent platform for the standardized parsing of e-books and / or electronic images. The intelligent server may include an AI NAS-side server or a cloud server; this embodiment of the invention is not limited in this respect. As shown in Figure 4, the processing device of the text and image reader may include:
[0186] The parsing module 401 is used to perform standardized parsing of the target file based on the multi-format parsing model preset in the AI NAS device to obtain standardized content, which includes standardized text content and / or standardized image content.
[0187] The text analysis module 402 is used to perform text analysis operations on the standardized text content when the standardized content includes standardized text content, to obtain the target text content. The text analysis operations include at least one of chapter division operation, image and text separation operation, and language translation operation.
[0188] The image processing module 403 is used to perform image processing operations on the standardized image content when the standardized content includes standardized image content, to obtain the target image content. The image processing operations include at least one of image enhancement operation, image denoising operation, and intelligent image stitching operation.
[0189] The display module 404 is used to display the target text content and / or target image content based on a pre-built cross-device responsive interface;
[0190] The update module 405 is used to perform device synchronization updates based on the AI NAS device for the target user's reading data on the target text content and / or target image content.
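The five modules can be pictured as one pipeline. The Python sketch below wires them together as injected callables. The class names, the signatures, and the `StandardizedContent` container are hypothetical. They are chosen only to show the conditional text and image branches, followed by the display and synchronization steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StandardizedContent:
    text: Optional[str] = None                         # standardized text content
    images: List[bytes] = field(default_factory=list)  # standardized image content


@dataclass
class ReaderPipeline:
    parse: Callable[[bytes], StandardizedContent]        # parsing module 401
    analyse_text: Callable[[str], str]                   # text analysis module 402
    process_image: Callable[[bytes], bytes]              # image processing module 403
    render: Callable[[Optional[str], List[bytes]], str]  # display module 404
    sync: Callable[[str, Dict], None]                    # update module 405

    def run(self, raw: bytes, user_id: str, reading_data: Dict) -> str:
        content = self.parse(raw)
        # Each branch runs only when that kind of standardized content is present.
        target_text = self.analyse_text(content.text) if content.text is not None else None
        target_images = [self.process_image(img) for img in content.images]
        view = self.render(target_text, target_images)
        self.sync(user_id, reading_data)
        return view
```

Injecting each module as a callable keeps the sketch agnostic about where a module runs. For example, parsing could stay on the AI NAS device while rendering happens on the client.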
[0191] It is evident that implementing the processing device of the text and image reader described in Figure 4 makes it possible to perform standardized parsing of a target file based on the multi-format parsing model preset in the AI NAS device to obtain standardized content, perform text analysis operations on the standardized text content to obtain target text content, perform image processing operations on the standardized image content to obtain target image content, display the target text content and / or target image content through a pre-built cross-device responsive interface, and use the AI NAS device to synchronize and update the target user's reading data for the target text content and / or target image content across devices. Parsing and rendering tasks can thus be migrated to the AI NAS device, which supports multi-format parsing, improves parsing efficiency, enables cross-device synchronization, and improves the user experience.
[0192] In an optional embodiment, as shown in Figure 5, the parsing module 401 performs standardized parsing of the target file based on the multi-format parsing model preset in the AI NAS device to obtain standardized content in the following specific manner:
[0193] Based on the multi-format parsing model preset in the AI NAS device, the standardized parsing task corresponding to the target file is decomposed into multiple sub-tasks corresponding to the target file.
[0194] The computational requirements of each subtask are dynamically evaluated based on a preset task scheduling algorithm, and the parallel processing mechanism of the target file is determined according to the computational requirements of each subtask.
[0195] Based on the parallel processing mechanism, each subtask is processed in parallel using a multi-format parsing model to obtain the task processing result corresponding to each subtask.
[0196] The results of each task are merged to obtain standardized content.
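The decompose, evaluate, process-in-parallel, and merge steps can be sketched in Python as follows. The sketch makes four assumptions:

- the file is split into fixed-size byte chunks;
- a subtask's computational requirement is estimated from its size;
- the worker count is derived from the total estimated cost;
- `parse_chunk` stands in for the multi-format parsing model.

A real decomposition would more likely split on format boundaries such as chapters or pages. A thread pool is used for brevity; CPU-bound parsing in CPython would normally go to a `ProcessPoolExecutor` instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubTask:
    index: int      # position in the original file, used when merging
    payload: bytes


def decompose(data: bytes, chunk_size: int) -> List[SubTask]:
    return [SubTask(i, data[off:off + chunk_size])
            for i, off in enumerate(range(0, len(data), chunk_size))]


def estimate_cost(task: SubTask) -> float:
    # Hypothetical cost model: work grows linearly with payload size.
    return float(len(task.payload))


def choose_workers(tasks: List[SubTask], cost_per_worker: float, max_workers: int) -> int:
    total = sum(estimate_cost(t) for t in tasks)
    return max(1, min(max_workers, math.ceil(total / cost_per_worker)))


def parse_in_parallel(data: bytes, parse_chunk: Callable[[bytes], str],
                      chunk_size: int = 4, cost_per_worker: float = 8.0,
                      max_workers: int = 4) -> str:
    tasks = decompose(data, chunk_size)
    if not tasks:
        return ""
    workers = choose_workers(tasks, cost_per_worker, max_workers)
    # Submit the most expensive subtasks first to avoid a long tail.
    ordered = sorted(tasks, key=estimate_cost, reverse=True)

    def run(task: SubTask) -> Tuple[int, str]:
        return task.index, parse_chunk(task.payload)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, ordered))
    # Merge step: restore the original order before concatenating.
    results.sort(key=lambda r: r[0])
    return "".join(text for _, text in results)
```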
[0197] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible to carry out the following steps:
- decompose the standardized parsing task for the target file into multiple subtasks, based on the multi-format parsing model preset in the AI NAS device;
- dynamically evaluate the computational requirements of each subtask using a preset task scheduling algorithm, and determine the parallel processing mechanism for the target file accordingly;
- process the subtasks in parallel through the multi-format parsing model under that mechanism, obtaining a task processing result for each subtask;
- merge the task processing results to obtain standardized content.

During file parsing, the parallel processing mechanism shortens parsing time and improves resource utilization. It also reduces the performance degradation caused by overloading a single task or by wasting resources. Multiple file formats can be parsed simultaneously while sharing the knowledge learned during model training, which improves the efficiency and accuracy of multi-format parsing. The device further performs text analysis operations on standardized text content to obtain target text content, and image processing operations on standardized image content to obtain target image content. It displays the target text content and / or target image content through a pre-built cross-device responsive interface, and uses the AI NAS device to synchronize and update the target user's reading data for that content. Parsing and rendering tasks are thereby migrated to the AI NAS device, which supports multi-format parsing, improves parsing efficiency, enables cross-device synchronization, and improves the user experience.
[0198] In another alternative embodiment, as shown in Figure 5, when the standardized content includes standardized text content, the text analysis module 402 performs text analysis operations on the standardized text content to obtain the target text content in the following specific manner:
[0199] When standardized content includes standardized text content, the chapter information of the standardized text content is identified by combining preset chapter features, and the standardized text content is divided into chapters based on the chapter information to obtain the target text content. The chapter information includes the chapter name and hierarchical relationship; and / or,
[0200] Extract image and text features from standardized text content, and jointly encode the image and text features to obtain the encoding result;
[0201] Based on preset feature weights and encoding results, image-text separation is performed on standardized text content to obtain the target text content; and / or,
[0202] Identify non-defined language text content within standardized text content, and perform language translation operations on the non-defined language text content based on user needs to obtain the target text content. User needs include translation language type requirements and / or translation scope requirements.
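Chapter division by preset chapter features can be approximated with heading patterns. In this Python sketch, the "preset chapter features" are assumed to be three regular expressions mapped to hierarchy levels: part, chapter, and numbered section. A real system would likely combine such rules with learned features. Each detected heading records its own title and its parent's title, which together give the chapter name and hierarchical relationship.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical chapter features: (hierarchy level, heading pattern).
CHAPTER_PATTERNS = [
    (1, re.compile(r"^(Part|Volume)\s+\w+", re.IGNORECASE)),
    (2, re.compile(r"^Chapter\s+\w+", re.IGNORECASE)),
    (3, re.compile(r"^\d+\.\d+\s+\S")),
]


@dataclass
class Chapter:
    title: str
    level: int
    parent: Optional[str] = None
    lines: List[str] = field(default_factory=list)


def heading_level(line: str) -> int:
    for level, pattern in CHAPTER_PATTERNS:
        if pattern.match(line):
            return level
    return 0  # not a heading


def split_chapters(text: str) -> List[Chapter]:
    chapters: List[Chapter] = []
    open_headings: List[Chapter] = []  # ancestors of the current position
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level = heading_level(line)
        if level:
            # Close any heading at the same or a deeper level.
            while open_headings and open_headings[-1].level >= level:
                open_headings.pop()
            parent = open_headings[-1].title if open_headings else None
            chapter = Chapter(line, level, parent)
            chapters.append(chapter)
            open_headings.append(chapter)
        elif chapters:
            chapters[-1].lines.append(line)
        # Body text before the first heading is dropped in this sketch.
    return chapters
```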
[0203] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible, when the standardized content includes standardized text content, to carry out the following operations:
- identify the chapter information of the standardized text content by combining preset chapter features, and divide the content into chapters based on that information;
- extract image features and text features from the standardized text content, jointly encode them to obtain an encoding result, and perform image-text separation based on preset feature weights and that result;
- identify text in a non-defined language within the standardized text content, and translate it according to user needs.

These operations yield the target text content. This improves the efficiency of distinguishing image and text content, optimizes the presentation, improves the readability of non-native-language content, and enhances the user experience.
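The identification of non-defined-language text can be approximated by Unicode script detection. The Python sketch below makes three assumptions:

- the user's defined language is described by a set of script names, taken from the first word of each character's Unicode name;
- the translation range is a range of segment indices;
- `translate` is a hypothetical callable standing in for whatever translation service honours the requested target language.

Script detection cannot tell apart languages that share a script, such as English and French, so this is only a first-pass filter.

```python
import unicodedata
from typing import Callable, Dict, List, Optional, Set


def dominant_script(text: str) -> Optional[str]:
    """Most frequent script among letters, e.g. 'LATIN', 'CJK', 'CYRILLIC'."""
    counts: Dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            script = name.split(" ")[0] if name else "UNKNOWN"
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None


def translate_out_of_language(segments: List[str], defined_scripts: Set[str],
                              scope: Optional[range],
                              translate: Callable[[str], str]) -> List[str]:
    """Translate in-scope segments whose script is not one of the defined scripts."""
    result = []
    for i, segment in enumerate(segments):
        script = dominant_script(segment)
        in_scope = scope is None or i in scope
        if in_scope and script is not None and script not in defined_scripts:
            result.append(translate(segment))
        else:
            result.append(segment)
    return result
```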
[0204] In yet another alternative embodiment, as shown in Figure 5, when the standardized content includes standardized image content, the image processing module 403 performs image processing operations on the standardized image content to obtain the target image content in the following specific manner:
[0205] When the standardized content includes standardized image content, image enhancement operations are performed on the standardized image content based on a super-resolution model to obtain the target image content; and / or,
[0206] Noise information is extracted from the standardized image content. Image denoising targeting that noise information is then performed on the standardized image content, using a denoising autoencoder and edge-preserving filtering techniques, to obtain the target image content. The noise information includes blemish information and / or watermark information; and / or,
[0207] Extract feature point information from the standardized image content, and predict the target stitching region in the standardized image content based on the feature point information;
[0208] Based on feature point information and the target stitching region, intelligent image stitching is performed on the segmented images in the standardized image content to obtain the target image content.
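Intelligent stitching of segmented comic pages can be illustrated, in a deliberately simplified form, by overlap detection between vertically adjacent segments. In this Python sketch, rows of pixels stand in for the feature point information, and the detected overlap plays the role of the predicted target stitching region. A production system would more likely match keypoint descriptors (for example ORB or SIFT) and estimate a geometric transform. The list-of-rows image format, the tolerance, and the maximum overlap are assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows (hypothetical representation)


def row_difference(a: List[int], b: List[int]) -> float:
    """Mean absolute pixel difference between two equally wide rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_overlap(top: Image, bottom: Image, max_overlap: int, tol: float) -> int:
    """Largest k such that the last k rows of `top` match the first k rows of `bottom`."""
    limit = min(max_overlap, len(top), len(bottom))
    for k in range(limit, 0, -1):
        if all(row_difference(top[len(top) - k + i], bottom[i]) <= tol for i in range(k)):
            return k
    return 0


def stitch_vertical(top: Image, bottom: Image, max_overlap: int = 10, tol: float = 2.0) -> Image:
    """Join two vertically adjacent segments, keeping only one copy of the overlap."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("segments must have the same width")
    k = find_overlap(top, bottom, max_overlap, tol)
    return [row[:] for row in top] + [row[:] for row in bottom[k:]]
```

Uniform areas such as white gutters can produce spurious matches. This is one reason real implementations rely on distinctive feature points rather than raw rows.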
[0209] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible, when the standardized content includes standardized image content, to carry out the following operations:
- perform image enhancement on the standardized image content based on a super-resolution model;
- extract noise information from the standardized image content, and remove that noise using a denoising autoencoder and edge-preserving filtering techniques;
- extract feature point information from the standardized image content, predict the target stitching region from it, and stitch the segmented images in the standardized image content based on the feature point information and the target stitching region.

These operations yield the target image content. The enhancement effect can be adjusted dynamically according to the image content. The intelligent image stitching improves the comic reading experience, achieving intelligent and precise image processing and significantly improving the user experience.
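A median filter is a classic example of edge-preserving filtering. It removes isolated specks while leaving sharp boundaries intact, unlike a box blur. The Python sketch below covers only this filtering stage of the denoising step. The denoising autoencoder and the watermark-specific handling described above are not modelled, and the list-of-rows image format is an assumption.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]  # grayscale pixel rows (hypothetical representation)


def median_filter(img: Image, radius: int = 1) -> Image:
    """Replace each pixel by the median of its (2*radius+1)^2 neighbourhood, clipped at borders."""
    height, width = len(img), len(img[0])
    out: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            # median_low always returns an actual pixel value, never an average.
            row.append(median_low(window))
        out.append(row)
    return out
```

On an image with a bright speck inside a dark region next to a sharp dark/bright edge, the speck is removed and the edge column stays exactly where it was.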
[0210] In yet another alternative embodiment, as shown in Figure 5, the update module 405 uses the AI NAS device to synchronize and update the target user's reading data for the target text content and / or target image content in the following specific manner:
[0211] Acquire the reading data of the target user on the target device for the target text content and / or target image content. The reading data includes reading progress data and reading settings data, and the reading settings data includes reading mode settings data and / or display content settings data.
[0212] The reading data is stored in the AI NAS device, and then synchronized and updated to each reading device corresponding to the target user based on the AI NAS device.
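The store-then-fan-out synchronization can be sketched in Python. The source does not say how conflicting updates from two devices are resolved. This sketch assumes a simple last-writer-wins rule on a client timestamp. The class and field names are hypothetical, and device identifiers are assumed to be unique.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReadingRecord:
    progress: float           # reading progress data, e.g. fraction read
    settings: Dict[str, str]  # reading mode / display content settings
    updated_at: float         # client timestamp; assumes roughly synced clocks


@dataclass
class NasSyncStore:
    """Hypothetical NAS-side store that fans reading data out to a user's devices."""
    records: Dict[Tuple[str, str], ReadingRecord] = field(default_factory=dict)  # (user, content)
    devices: Dict[str, List[str]] = field(default_factory=dict)                  # user -> devices
    pushed: Dict[Tuple[str, str], ReadingRecord] = field(default_factory=dict)   # (device, content)

    def register(self, user: str, device: str) -> None:
        self.devices.setdefault(user, []).append(device)

    def upload(self, user: str, content_id: str, record: ReadingRecord) -> bool:
        key = (user, content_id)
        current = self.records.get(key)
        if current is not None and current.updated_at >= record.updated_at:
            return False  # last-writer-wins: ignore an older update
        self.records[key] = record
        for device in self.devices.get(user, []):
            # Each device receives its own copy of the latest record.
            self.pushed[(device, content_id)] = ReadingRecord(
                record.progress, dict(record.settings), record.updated_at)
        return True
```

Timestamp-based resolution is the simplest option. If device clocks cannot be trusted, a server-assigned sequence number would be a safer ordering key.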
[0213] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible to acquire a target user's reading data on a target device for the target text content and / or target image content, store the reading data in the AI NAS device, and use the AI NAS device to synchronize and update it to each reading device corresponding to the target user. This keeps the user's reading progress and settings seamlessly synchronized across devices, providing a consistent reading experience. It also shortens parsing wait times when the user reads on different devices, reduces the number of file parsing operations, and improves reading fluency.
[0214] In yet another alternative embodiment, as shown in Figure 5, the processing device of the text and image reader may further include:
[0215] The building module 406 is used to build a deep learning model in the AI NAS device based on a preset structure;
[0216] The determination module 407 is used to determine the various file formats to be trained and generate data structure labels for each file format. The data structure labels include chapter information labels, text content labels, and image resource labels.
[0217] The acquisition module 408 is used to acquire the training file and train the deep learning model based on the training file and the data structure labels of each file format to obtain a multi-format parsing model. The training file includes sub-files with multi-format complexity, which include multilingual text sub-files and multi-resolution image sub-files.
[0218] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible to build a deep learning model based on a preset structure in the AI NAS device, determine the multiple file formats to be trained, generate data structure labels for each file format, obtain the files to be trained, and train the deep learning model based on those files and the data structure labels of each file format to obtain a multi-format parsing model. This model can parse multiple formats, overcoming the limitations of traditional single-format parsing and improving parsing performance and efficiency.
[0219] In yet another alternative embodiment, as shown in Figure 5, the acquisition module 408 trains the deep learning model based on the files to be trained and the data structure labels of each file format to obtain the multi-format parsing model in the following specific manner:
[0220] Analyze the structural features of the training files, and perform cluster analysis on the training files based on the structural features and the data structure labels of each file format to obtain the analysis results;
[0221] Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model;
[0222] The system obtains new format files and their parsing results from users during the use of the multi-format parsing model. Based on these new format files and their parsing results, it performs incremental learning on the multi-format parsing model to update it.
[0223] It is evident that implementing the processing device of the text and image reader described in Figure 5 makes it possible to analyze the structural features of the files to be trained, perform cluster analysis on them based on the structural features and the data structure labels of each file format to obtain the analysis results, and train the deep learning model on those results to obtain a multi-format parsing model. While the multi-format parsing model is in use, the device can obtain new format files and their parsing results from users and perform incremental learning on the model based on them to update it. This improves the model's adaptability to files in unseen formats through adversarial training and self-supervised learning, and improves the long-term scalability of the parsing model through incremental learning, further improving the model's flexibility and parsing efficiency.
[0224] Embodiment 5
[0225] This invention discloses a processing system for a text and image reader, characterized in that the system includes at least an electronic device and an AI NAS device, and the electronic device and the AI NAS device are communicatively connected;
[0226] The electronic device is configured with an application that can access the AI NAS device and read the target text content and / or target image content processed by the AI NAS device based on the image reader processing method described in Embodiment 1 or Embodiment 2 of the present invention.
[0227] Embodiment 6
[0228] This invention discloses a computer storage medium storing computer instructions. When these computer instructions are invoked, they are used to execute the steps in the image reader processing method described in Embodiment 1 or Embodiment 2 of this invention.
[0229] Embodiment 7
[0230] This invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the steps in the image reader processing method described in Embodiment 1 or Embodiment 2.
[0231] The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0232] Through the detailed description of the above embodiments, those skilled in the art can clearly understand that each implementation method can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
[0233] Finally, it should be noted that the processing method, model training method, apparatus, and system for a text and image reader disclosed in the embodiments of the present invention are merely preferred embodiments of the present invention and are only used to illustrate the technical solutions of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A processing method for a text and image reader, the method being applied in an AI NAS device, characterized in that, The method includes: The target file is standardized by using the multi-format parsing model preset in the AI NAS device to obtain standardized content, which includes standardized text content and / or standardized image content. When the standardized content includes the standardized text content, a text analysis operation is performed on the standardized text content to obtain the target text content. The text analysis operation includes at least one of chapter division operation, image-text separation operation, and language translation operation. When the standardized content includes the standardized image content, image processing operations are performed on the standardized image content to obtain the target image content. The image processing operations include at least one of image enhancement operations, image denoising operations, and intelligent image stitching operations. The target text content and / or the target image content are displayed based on a pre-built cross-device responsive interface, and the target user's reading data on the target text content and / or the target image content is updated synchronously on the device based on the AI NAS device; Furthermore, the method further includes: A deep learning model is constructed in the AI NAS device based on a preset structure; The system identifies multiple file formats to be trained and generates data structure labels for each file format, including chapter information labels, text content labels, and image resource labels. Obtain the training file and train the deep learning model based on the training file and the data structure labels of each file format to obtain the multi-format parsing model. The training file includes sub-files with multi-format complexity, which include multilingual text sub-files and multi-resolution image sub-files. 
And, the step of training the deep learning model based on the training file and the data structure labels of each file format to obtain the multi-format parsing model includes: The structural features of the files to be trained are analyzed, and cluster analysis is performed on the files to be trained based on the structural features and the data structure labels of each file format to obtain the analysis results; Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model; The system obtains new format files and their parsing results from users during the use of the multi-format parsing model, and performs incremental learning on the multi-format parsing model based on these files and their parsing results to update the multi-format parsing model.
2. The processing method of the image and text reader according to claim 1, characterized in that, The standardization parsing of the target file based on the preset multi-format parsing model in the AI NAS device yields standardized content, including: Based on the multi-format parsing model preset in the AI NAS device, the standardized parsing task corresponding to the target file is decomposed into multiple sub-tasks corresponding to the target file; The computational requirements of each subtask are dynamically evaluated based on a preset task scheduling algorithm, and the parallel processing mechanism of the target file is determined according to the computational requirements of each subtask. According to the parallel processing mechanism, each subtask is processed in parallel using the multi-format parsing model to obtain the task processing result corresponding to each subtask. The results of each task processing are merged to obtain standardized content.
3. The processing method of the image and text reader according to claim 1 or 2, characterized in that, When the standardized content includes the standardized text content, a text analysis operation is performed on the standardized text content to obtain the target text content, including: When the standardized content includes the standardized text content, the chapter information of the standardized text content is identified by combining preset chapter features, and the standardized text content is divided into chapters based on the chapter information to obtain the target text content. The chapter information includes chapter name and hierarchical relationship; and / or, Extract image features and text features from the standardized text content, and jointly encode the image features and text features to obtain the encoding result; Based on preset feature weights and the encoding results, the standardized text content is subjected to image-text separation to obtain the target text content; and / or, Identify non-defined language text content within the standardized text content, and perform language translation operations on the non-defined language text content based on user needs to obtain target text content. The user needs include translation language type requirements and / or translation range requirements.
4. The processing method of the image and text reader according to claim 1 or 2, characterized in that, When the standardized content includes the standardized image content, image processing operations are performed on the standardized image content to obtain the target image content, including: When the standardized content includes the standardized image content, image enhancement operations are performed on the standardized image content based on a super-resolution model to obtain the target image content; and / or, Noise information is extracted from the standardized image content, and image denoising is performed on the standardized image content using a denoising autoencoder and edge-preserving filtering techniques to target the noise information, thereby obtaining the target image content. The noise information includes blemish information and / or watermark information; and / or, Extract feature point information from the standardized image content, and predict the target stitching region in the standardized image content based on the feature point information; Based on the feature point information and the target stitching region, an intelligent image stitching operation is performed on the segmented images in the standardized image content to obtain the target image content.
5. The processing method of the image and text reader according to claim 1 or 2, characterized in that, The device synchronization update based on the AI NAS device for the target user's reading data regarding the target text content and / or the target image content includes: Acquire reading data of a target user on a target device for the target text content and / or the target image content, the reading data including reading progress data and reading setting data, the reading setting data including reading mode setting data and / or display content setting data; The reading data is stored in the AI NAS device, and the reading data is synchronously updated to each reading device corresponding to the target user based on the AI NAS device.
6. A training method for a multi-format parsing model, characterized in that, The method is applied to an AI NAS device, and the method includes: In the AI NAS device, a deep learning model is built based on a preset framework, which includes a deep learning framework and a distributed computing framework. The system identifies multiple file formats to be trained and generates data structure labels for each file format, including chapter information labels, text content labels, and image resource labels. Obtain the training file and train the deep learning model based on the training file and the data structure labels of each file format to obtain a multi-format parsing model. The training file includes sub-files with multi-format complexity, which include multilingual text sub-files and multi-resolution image sub-files. And, the step of training the deep learning model based on the training file and the data structure labels of each file format to obtain a multi-format parsing model includes: The structural features of the files to be trained are analyzed, and cluster analysis is performed on the files to be trained based on the structural features and the data structure labels of each file format to obtain the analysis results; Based on the analysis results, the deep learning model is trained to obtain a multi-format parsing model; The system obtains new format files and their parsing results from users during the use of the multi-format parsing model, and performs incremental learning on the multi-format parsing model based on these files and their parsing results to update the multi-format parsing model.
7. The training method for the multi-format parsing model according to claim 6, characterized in that, The analysis involves identifying the structural features of the files to be trained, and then performing cluster analysis on the files based on these structural features and the data structure labels for each file format to obtain the analysis results, including: The file to be trained is preprocessed to obtain the file content, which includes text structure content and / or image sequence content; The file content is tagged according to the data structure tags of each file format to obtain the tagged file content, and the tagged file content is structurally analyzed to obtain the structural features of the tagged file content; Based on the content of the tagged files and the structural features, cluster analysis is performed on the files to be trained to obtain the analysis results.
8. A processing device for a text and image reader, characterized in that the device is applied to an AI NAS device and comprises the following modules. The parsing module is configured to perform standardized parsing of a target file based on a preset multi-format parsing model in the AI NAS device to obtain standardized content, the standardized content including standardized text content and/or standardized image content. The text analysis module is configured to, when the standardized content includes the standardized text content, perform text analysis operations on the standardized text content to obtain target text content, the text analysis operations including at least one of a chapter division operation, an image-text separation operation, and a language translation operation. The image processing module is configured to, when the standardized content includes the standardized image content, perform image processing operations on the standardized image content to obtain target image content, the image processing operations including at least one of an image enhancement operation, an image denoising operation, and an intelligent image stitching operation.
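For the text analysis and image processing modules of claim 8, a minimal Python sketch of one operation from each list follows: chapter division of standardized text and denoising of a grayscale image. The `Chapter N` heading pattern, the 3x3 median filter and the dict-based `process` dispatcher are assumptions chosen for illustration; the source names the operations but not how they are carried out.

```python
import re
from statistics import median
from typing import Any

# Assumed heading pattern for chapter division; real files would need per-format rules.
CHAPTER_HEADING = re.compile(r"^chapter\s+\d+", re.IGNORECASE | re.MULTILINE)


def divide_chapters(text: str) -> list[str]:
    """Chapter division: split at each heading, keeping any preface as its own chunk."""
    bounds = sorted({0, len(text), *(m.start() for m in CHAPTER_HEADING.finditer(text))})
    chunks = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [c for c in chunks if c]


def median_denoise(img: list[list[int]]) -> list[list[int]]:
    """Image denoising: 3x3 median filter on a grayscale image.

    Border pixels use only the neighbours that exist inside the image.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))  # even-sized windows average the middle pair
        out.append(row)
    return out


def process(standardized: dict[str, Any]) -> dict[str, Any]:
    """Run text analysis only if text is present, image processing only if an image is."""
    result: dict[str, Any] = {}
    if "text" in standardized:
        result["chapters"] = divide_chapters(standardized["text"])
    if "image" in standardized:
        result["image"] = median_denoise(standardized["image"])
    return result
```

A median filter is used here because it removes isolated noisy pixels without blurring edges as much as averaging would; the claimed device may use a different denoising technique.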
The display module is configured to display the target text content and/or the target image content based on a pre-built cross-device responsive interface. The update module is configured to synchronize, across devices via the AI NAS device, the target user's reading data for the target text content and/or the target image content. The device further comprises: a building module, configured to build a deep learning model in the AI NAS device based on a preset framework; a determination module, configured to determine multiple file formats to be trained and generate data structure labels for each file format, the data structure labels including chapter information labels, text content labels, and image resource labels; and an acquisition module, configured to obtain training files and train the deep learning model based on the training files and the data structure labels of each file format to obtain the multi-format parsing model, wherein the training files include sub-files of multi-format complexity, namely multilingual text sub-files and multi-resolution image sub-files. The acquisition module trains the deep learning model based on the files to be trained and the data structure labels of each file format to obtain the multi-format parsing model as follows: analyzing the structural features of the files to be trained, and performing cluster analysis on the files to be trained based on the structural features and the data structure labels of each file format to obtain analysis results; training the deep learning model based on the analysis results to obtain the multi-format parsing model; and obtaining, while the multi-format parsing model is in use, new-format files and their parsing results from users, and performing incremental learning on the multi-format parsing model based on these files and parsing results to update the multi-format parsing model.
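The update module's cross-device synchronization could be organized with the AI NAS device as the single hub that stores each user's reading data per document. The Python sketch below assumes a last-writer-wins policy keyed on a client timestamp, with ties broken by device ID; the `ReadingRecord` fields, the `NasSyncStore` class and the conflict rule are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadingRecord:
    """Hypothetical reading data for one user and one document."""
    position: int       # e.g. page index or character offset
    updated_at: float   # client timestamp in seconds
    device_id: str


class NasSyncStore:
    """Toy AI NAS-side hub for cross-device reading-data sync.

    Assumed conflict rule: last writer wins, ties broken by device_id,
    so every device that pulls afterwards sees the same record.
    """

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], ReadingRecord] = {}

    def push(self, user: str, doc: str, rec: ReadingRecord) -> ReadingRecord:
        """Accept a device's update if it is newer; return the record the NAS now holds."""
        key = (user, doc)
        cur = self._records.get(key)
        if cur is None or (rec.updated_at, rec.device_id) > (cur.updated_at, cur.device_id):
            self._records[key] = rec
        return self._records[key]

    def pull(self, user: str, doc: str) -> Optional[ReadingRecord]:
        """Fetch the authoritative record for a device that is opening the document."""
        return self._records.get((user, doc))
```

Last-writer-wins is the simplest policy that keeps every device consistent with the NAS copy, but it depends on reasonably synchronized device clocks; a production system might prefer version numbers assigned by the NAS.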
9. A processing system for a text and image reader, characterized in that the system comprises at least an electronic device and an AI NAS device, the electronic device being communicatively connected to the AI NAS device; the electronic device is configured with an application through which it can access the AI NAS device and read the target text content and/or target image content processed by the AI NAS device according to the processing method for a text and image reader as described in any one of claims 1-5.
10. A computer storage medium, characterized in that the computer storage medium stores computer instructions which, when invoked, are used to execute the processing method for a text and image reader as described in any one of claims 1-5.