Document translation method, device, equipment, storage medium and computer program product

By acquiring standard block data from PDF documents using a preset document layout model and translating it, the problem of document format loss caused by OCR technology is solved, achieving accurate document translation and improved results.

CN122242532A · Pending · Publication Date: 2026-06-19 · BEIJING QIHOOD TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: BEIJING QIHOOD TECHNOLOGY CO LTD
Filing Date: 2024-12-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, when using OCR technology to process PDF documents, it is easy to lose the metadata contained in the document itself, resulting in poor document translation quality.

Method used

The standard block data of the target text blocks in the document to be translated is obtained by using a preset document layout model, the standard block data is translated, and a translated document is generated based on the document format.

Benefits of technology

It achieves accurate restoration of document format, thus improving the accuracy of document translation.


Figure CN122242532A_ABST

Abstract

This application discloses a document translation method, apparatus, device, storage medium, and computer program product, relating to the field of data processing technology. The method includes: obtaining standard block data corresponding to target text blocks in a document to be translated through a preset document layout model; translating the standard block data to obtain target translation block data; determining the document format corresponding to the document to be translated based on the standard block data; and generating a translated document corresponding to the document to be translated based on the document format and the target translation block data. This invention can process a document to be translated using a preset document layout model to obtain standard block data corresponding to target text blocks in the document, and generate a corresponding translated document based on the document format and the translated standard block data. This solves the technical problem that OCR technology cannot accurately restore the document format when processing documents, resulting in poor document translation quality.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to document translation methods, apparatus, devices, storage media, and computer program products.

Background Technology

[0002] Translating PDF document content into a specific language according to its format and presenting it side-by-side with the original text on the left and the translation on the right is often referred to as immersive translation. When performing immersive translation, to ensure the translation quality, the original and translated texts need to maintain consistency in layout, text style, and page background as much as possible. In other words, the accuracy of document format reproduction is a crucial factor in determining the quality of immersive translation.

[0003] In existing solutions, immersive translation products typically utilize OCR (Optical Character Recognition) technology to process PDF documents, extracting document formatting for format restoration. However, this method often results in the loss of metadata inherent in the PDF document itself, such as font, font weight, italics, and image information, leading to low accuracy in document restoration and consequently affecting the translation quality.

Summary of the Invention

[0004] The main objective of this application is to provide a document translation method, apparatus, device, storage medium, and computer program product that solve the technical problem in the prior art whereby the document format cannot be accurately restored when OCR technology is used to process documents, resulting in poor translation quality.

[0005] To achieve the above objectives, this application proposes a document translation method, which includes:

[0006] Obtaining standard block data corresponding to the target text block in the document to be translated through a preset document layout model;

[0007] Translating the standard block data to obtain target translation block data;

[0008] Determining the document format corresponding to the document to be translated based on the standard block data; and

[0009] Generating a translated document corresponding to the document to be translated based on the document format and the target translation block data.

[0010] In one embodiment, the step of obtaining standard block data corresponding to the target text block in the document to be translated through a preset document layout model includes:

[0011] Reading the document to be translated through a preset document reading program to obtain several original document pages;

[0012] Obtaining target block information corresponding to the target text block in the original document page through the preset document layout model; and

[0013] Obtaining the standard block data corresponding to the target text block based on the target block information.

[0014] In one embodiment, before the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model, the method further includes:

[0015] Performing block recognition on the original document page through the preset document layout model to determine all text blocks in the original document page;

[0016] Determining, based on the block information corresponding to the text blocks, whether any abnormal blocks exist among the text blocks; and

[0017] If an abnormal block exists, removing it to obtain the target text block.

[0018] In one embodiment, the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model includes:

[0019] Obtaining the marked text blocks and unmarked text blocks in the target text block based on the preset document layout model;

[0020] Performing paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain first block paragraph information and second block paragraph information; and

[0021] Obtaining the target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information.

[0022] In one embodiment, the step of performing paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain the first block paragraph information and the second block paragraph information includes:

[0023] Determining the first text layout information corresponding to the marked text block and the second text layout information corresponding to the unmarked text block;

[0024] Performing paragraph processing on the marked text block according to the first text layout information to obtain the first block paragraph information; and

[0025] Performing paragraph processing on the unmarked text block according to the second text layout information to obtain the second block paragraph information.

[0026] In one embodiment, the step of obtaining the standard block data corresponding to the target text block based on the target block information includes:

[0027] Determining the block category corresponding to the target text block based on the target block information;

[0028] Obtaining the text style information and text layout information corresponding to the target text block through a preset document parsing program; and

[0029] Obtaining the standard block data corresponding to the target text block based on the block category, the text style information, and the text layout information.

[0030] Furthermore, to achieve the above objectives, this application also proposes a document translation apparatus, the apparatus comprising:

[0031] A data acquisition module, used to acquire standard block data corresponding to the target text block in the document to be translated through a preset document layout model;

[0032] A data translation module, used to translate the standard block data to obtain target translation block data;

[0033] A document format determination module, used to determine the document format corresponding to the document to be translated based on the standard block data; and

[0034] A translated document generation module, used to generate a translated document corresponding to the document to be translated based on the document format and the target translation block data.

[0035] In addition, to achieve the above objectives, this application also proposes a document translation device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the document translation method as described above.

[0036] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and which, when executed by a processor, implements the steps of the document translation method described above.

[0037] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the document translation method described above.

[0038] This application provides a document translation method. The method comprises: obtaining standard block data corresponding to target text blocks in a document to be translated through a preset document layout model; translating the standard block data to obtain target translation block data; determining the document format corresponding to the document to be translated based on the standard block data; and generating a translated document corresponding to the document to be translated based on the document format and the target translation block data. Existing technologies that process documents with OCR easily lose the metadata contained in the document itself, which lowers the accuracy of document restoration and degrades translation quality. In contrast, this application processes the document to be translated through a preset document layout model to obtain standard block data corresponding to the target text blocks, translates the standard block data, and generates the translated document based on the document format and the resulting target translation block data. This solves the technical problem in existing technologies whereby OCR cannot accurately restore the document format, resulting in poor translation quality.

Description of the Drawings

[0039] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0040] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a flowchart illustrating the first embodiment of the document translation method of this application;

[0042] Figure 2 is an example diagram of standard block data in the document translation method of this application;

[0043] Figure 3 is a flowchart illustrating the second embodiment of the document translation method of this application;

[0044] Figure 4 is a diagram showing the target text blocks in a document page in the document translation method of this application;

[0045] Figure 5 is a flowchart illustrating the third embodiment of the document translation method of this application;

[0046] Figure 6 is an overall flowchart of the document translation method of this application;

[0047] Figure 7 is a schematic diagram of the module structure of the document translation apparatus according to an embodiment of this application;

[0048] Figure 8 is a schematic diagram of the device structure of the hardware operating environment involved in the document translation method in an embodiment of this application.

[0049] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

[0050] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0051] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0052] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or document translation device capable of performing the above functions. The following description uses a document translation device (hereinafter referred to as the device) as an example to illustrate this embodiment and the subsequent embodiments.

[0053] Based on this, embodiments of this application provide a document translation method. Refer to Figure 1, which is a flowchart illustrating the first embodiment of the document translation method of this application.

[0054] In this embodiment, the document translation method includes steps S10 to S40:

[0055] Step S10: Obtain the standard block data corresponding to the target text block in the document to be translated through the preset document layout model.

[0056] It should be noted that the aforementioned preset document layout model can be a model used to process layout and text information in a document, such as a layout model. In this embodiment, the preset document layout model can be used to process document understanding tasks. It can capture visual elements (such as images, tables, text, etc.) and their layout information in the document to be translated, thereby better understanding the overall content and structure of the document. The document to be translated is an image-based document, that is, a document that displays text or patterns in image form. It can be any PDF (Portable Document Format) document that requires text translation, or other documents that meet this condition. For example, the document to be translated can be an English PDF document, which can be translated into a corresponding Chinese document using the document translation method proposed in this embodiment.

[0057] It should be noted that the aforementioned target text block can be a specific area of the document divided based on a comprehensive understanding of the text, layout, and visual information in the document using a preset document layout model. In this embodiment, the target text block may include, but is not limited to, title blocks, reference blocks, formula blocks, table blocks, list blocks, table of contents blocks, code blocks, image blocks, etc.

[0058] It should be understood that the aforementioned standard block data can be normalized data used to describe target text blocks, arranged according to a prescribed format. Referring to Figure 2, which is an example diagram of standard block data in the document translation method of this application, the standard block data of a text block in this embodiment may specifically include: document page number (page); text content category (type, including text, title, image, table, formula, menu, list, concern, etc.); text rendering method (showType, including plain text and markdown syntax (md)); text content (data); block identifier (blockld); and block data (blockData). The block data (blockData) may include: text coordinates (bbox), angle (degree), basic text style (style), other styles (elseStyle), and reference position (concern).
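As a minimal sketch, the fields listed in paragraph [0058] could be modeled as a pair of Python dataclasses. The Python names (`StandardBlock`, `BlockData`, `block_id`, `show_type`, `else_style`), the field types, and the example values are assumptions made for illustration; the source only names the fields and does not specify their types.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BlockData:
    # Text coordinates; assumed here to be (x0, y0, x1, y1) in page units.
    bbox: Tuple[float, float, float, float]
    # Rotation angle of the text; the unit (degrees) is an assumption.
    degree: float = 0.0
    # Basic text style, e.g. font, size, weight (keys are hypothetical).
    style: Dict[str, object] = field(default_factory=dict)
    # Other styles that do not fit the basic style.
    else_style: Dict[str, object] = field(default_factory=dict)
    # Reference position; its exact representation is not given in the source.
    concern: Optional[str] = None


@dataclass
class StandardBlock:
    page: int        # document page number
    type: str        # text content category, e.g. "text", "title", "table"
    show_type: str   # text rendering method, e.g. "plain" or "md"
    data: str        # text content
    block_id: str    # block identifier
    block_data: BlockData


example = StandardBlock(
    page=1,
    type="title",
    show_type="plain",
    data="Introduction",
    block_id="p1-b0",
    block_data=BlockData(
        bbox=(72.0, 90.0, 300.0, 110.0),
        style={"font": "Helvetica", "size": 14, "bold": True},
    ),
)

# asdict() yields a plain nested dict that can be serialized or passed on.
record = asdict(example)
```

Keeping the layout fields in a nested `BlockData` mirrors the split between the top-level fields and `blockData` described above, so the translation step can touch only `data` while the layout fields pass through unchanged.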

[0059] In practical applications, after processing the document to be translated, the layout model can identify text blocks within the document. These text blocks not only contain text content but also incorporate the document's layout information, such as the text's position coordinates on the page (e.g., the coordinates of the top left and bottom right corners), size (e.g., width and height), and relative positional relationships with other text or image elements. Therefore, the device can use this data to determine the standard block data corresponding to the target text block identified by the layout model.

[0060] Step S20: Translate the standard block data to obtain the target translation block data.

[0061] It is understood that the aforementioned target translation block data can be block data obtained by translating the content data in the standard block data. In this embodiment, the device can input the content data in the standard block data into the translation model, and then translate the text content data through the translation model to obtain target translation block data that contains the translated text content data.
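Step S20 can be sketched as follows, assuming each standard block is a plain dict whose `data` key holds the text content. The `translate` callable stands in for the translation model; the lookup-table translator below is a hypothetical stub, not the model used in this application.

```python
from typing import Callable, Dict, List


def translate_blocks(blocks: List[dict], translate: Callable[[str], str]) -> List[dict]:
    """Return copies of the blocks with only the text content translated."""
    translated = []
    for block in blocks:
        new_block = dict(block)  # shallow copy: layout fields are carried over as-is
        new_block["data"] = translate(block["data"])
        translated.append(new_block)
    return translated


# Hypothetical stand-in for a translation model: a tiny lookup table.
TOY_GLOSSARY: Dict[str, str] = {"Introduction": "引言", "Conclusion": "结论"}


def toy_translate(text: str) -> str:
    return TOY_GLOSSARY.get(text, text)


source_blocks = [
    {"page": 1, "type": "title", "data": "Introduction",
     "blockData": {"bbox": (72, 90, 300, 110)}},
]
target_blocks = translate_blocks(source_blocks, toy_translate)
```

Because only `data` is replaced, each target translation block keeps the page number and `blockData` of the block it came from, which is what the later steps rely on to place the translated text.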

[0062] Step S30: Determine the document format corresponding to the document to be translated based on the standard block data.

[0063] It should be understood that the above document format can be various typesetting and layout parameters set in the document to be translated, such as text typesetting layout, style, etc. These parameters can determine the appearance and layout of the document, and the appearance and layout of the document can be adjusted by setting these parameters.

[0064] In this embodiment, the device can directly obtain information such as the coordinate position, size, and style of the target text block based on the standard block data, thereby determining the layout and arrangement of all text blocks in the document to be translated, and thus determining the document format of the document to be translated.
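One possible reading of step S30 is sketched below: the document format is represented as, for each page, an ordered list of slots holding the position and style of every block. This representation, the function name `determine_document_format`, the `block_id` key, and the assumption that the y coordinate grows downward from the top of the page are all illustrative choices, not details given in the source.

```python
from collections import defaultdict
from typing import Dict, List


def determine_document_format(blocks: List[dict]) -> Dict[int, List[dict]]:
    """Group blocks by page and record where and how each one is laid out."""
    pages = defaultdict(list)
    for block in blocks:
        block_data = block["blockData"]
        pages[block["page"]].append({
            "block_id": block["block_id"],
            "bbox": block_data["bbox"],            # (x0, y0, x1, y1)
            "style": block_data.get("style", {}),
        })
    # Order the slots top-to-bottom, then left-to-right (assumes y grows downward).
    for slots in pages.values():
        slots.sort(key=lambda slot: (slot["bbox"][1], slot["bbox"][0]))
    return dict(pages)


blocks = [
    {"page": 1, "block_id": "b", "blockData": {"bbox": (72, 200, 500, 260)}},
    {"page": 1, "block_id": "a",
     "blockData": {"bbox": (72, 50, 500, 80), "style": {"bold": True}}},
    {"page": 2, "block_id": "c", "blockData": {"bbox": (72, 50, 500, 120)}},
]
doc_format = determine_document_format(blocks)
```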

[0065] Step S40: Generate a translated document corresponding to the document to be translated based on the document format and the target translation block data.

[0066] In practical applications, after obtaining the target translation block data corresponding to the text blocks in the document to be translated, the target translation block data can be returned to the web browser or document generation program. This allows the web browser or document generation program to perform content replacement processing on the target translation block data based on the document format, thereby filling the target translation block data into the blank background document and finally generating the translated document corresponding to the document to be translated.

[0067] Further, step S40 includes:

[0068] Step S401: Perform content removal processing on the document to be translated to obtain the background document in the document to be translated.

[0069] It should be noted that the aforementioned background document can be a document obtained after removing the original content from the document to be translated. In this embodiment, the device can use the entire page of the document to be translated as the background image, and then erase the blocks in the page to finally obtain the background document in the document to be translated. The strategy for erasing the blocks can include: taking the average color of the blocks, Gaussian blurring, or having the image processed by an image model. This embodiment does not limit this.
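Of the erasing strategies mentioned above, the average-color strategy is the simplest to sketch. The example below assumes the page image is a list of rows of `(r, g, b)` tuples and that a block is given as integer pixel coordinates `(x0, y0, x1, y1)` with the right and bottom edges exclusive; both conventions and the function name are assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def erase_block_with_mean_color(image: Image, bbox: Tuple[int, int, int, int]) -> Image:
    """Return a copy of the image with the block area filled by its mean color."""
    x0, y0, x1, y1 = bbox
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    result = [row[:] for row in image]  # copy so the input page is left untouched
    if not pixels:
        return result
    count = len(pixels)
    mean = tuple(sum(p[channel] for p in pixels) // count for channel in range(3))
    for y in range(y0, y1):
        for x in range(x0, x1):
            result[y][x] = mean
    return result


# A 4x4 white page with one black "text" pixel inside the block to erase.
page = [[(255, 255, 255)] * 4 for _ in range(4)]
page[1][1] = (0, 0, 0)
background = erase_block_with_mean_color(page, (0, 0, 2, 2))
```

Filling with the mean color leaves a faint tint where dark text sat on a light background, which is one reason the text also lists Gaussian blurring or an image model as alternatives.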

[0070] Furthermore, before step S401, the method further includes: reading the page of the document to be translated using a preset document reading program; during the page reading process, marking the rendering blocks in the document to be translated to obtain the marked rendering blocks.

[0071] It should be understood that the aforementioned preset document reading program can be software or tools capable of opening, viewing, and processing documents. In this embodiment, the preset document reading program can read the document to be translated, thereby retrieving each page from the document.

[0072] It should be noted that the aforementioned rendering blocks can be blocks in the document to be translated that have undergone rendering processing. Correspondingly, the aforementioned marked rendering blocks can be the rendering blocks that were marked during the page reading process.

[0073] Accordingly, step S401 includes: removing the marked rendering blocks to obtain the background document in the document to be translated.

[0074] In this embodiment, the document to be translated can first be read by a preset document reading program. During the page reading process, the currently rendered blocks can be marked. Then, when it is necessary to translate and replace the document to be translated, the document to be translated can be modified to remove the corresponding marked rendering blocks from the rendering instructions, thereby directly clearing the original content in the document to be translated from the source and obtaining the corresponding background document.
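The idea of removing marked rendering blocks from the rendering instructions can be illustrated with a deliberately simplified model. Here a page is a list of dicts, each standing for one rendering instruction tagged with the block it belongs to; this structure and the names `op`, `block`, and `remove_marked_blocks` are hypothetical, and real PDF rendering instructions are considerably more involved.

```python
from typing import Iterable, List, Set


def remove_marked_blocks(render_ops: Iterable[dict], marked_ids: Set[str]) -> List[dict]:
    """Keep only the rendering instructions that do not belong to a marked block."""
    return [op for op in render_ops if op.get("block") not in marked_ids]


# Toy page: background drawing instructions plus two marked text blocks.
page_ops = [
    {"op": "fill_rect", "block": None},
    {"op": "show_text", "block": "b1", "text": "Hello"},
    {"op": "draw_image", "block": None},
    {"op": "show_text", "block": "b2", "text": "World"},
]
background_ops = remove_marked_blocks(page_ops, {"b1", "b2"})
```

Compared with erasing pixels from a rendered page image, dropping the instructions leaves the background and images exactly as they were, which matches the stated aim of clearing the original content at the source.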

[0075] Step S402: Generate a translated document corresponding to the document to be translated based on the document format, the target translation block data, and the background document.

[0076] In practical applications, before replacing the content of the target text block with the corresponding target translation block data, it is necessary to remove the original content from the document to be translated. The target translation block data is then filled into the corresponding area of the background document based on the document format, finally generating the translated document corresponding to the document to be translated.

[0077] This embodiment provides a document translation method. The method comprises: obtaining standard block data corresponding to target text blocks in a document to be translated through a preset document layout model; translating the standard block data to obtain target translation block data; determining the document format corresponding to the document to be translated based on the standard block data; and generating a translated document corresponding to the document to be translated based on the document format and the target translation block data. Existing technologies that process documents with OCR easily lose the metadata contained in the document itself, which lowers the accuracy of document restoration and degrades translation quality. In contrast, this embodiment processes the document to be translated through a preset document layout model to obtain standard block data corresponding to the target text blocks, translates the standard block data, and generates the translated document based on the document format and the resulting target translation block data. This solves the technical problem in existing technologies whereby OCR cannot accurately restore the document format when processing documents, leading to poor translation quality.
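Steps S10 to S40 of the first embodiment can be summarized as a small pipeline skeleton. The three callables (`layout_model`, `translate`, `render`) are placeholders for the preset document layout model, the translation model, and the document generation program; the stubs used below, the `block_id` key, and the dict-based document format are assumptions made only so that the sketch runs.

```python
from typing import Callable, Dict, List


def translate_document(
    document,
    layout_model: Callable[[object], List[dict]],
    translate: Callable[[str], str],
    render: Callable[[Dict[str, dict], List[dict]], object],
):
    # S10: obtain standard block data through the layout model.
    blocks = layout_model(document)
    # S20: translate the text content of each block.
    translated = [dict(block, data=translate(block["data"])) for block in blocks]
    # S30: derive the document format (here: block id -> layout data).
    doc_format = {block["block_id"]: block["blockData"] for block in blocks}
    # S40: generate the translated document from the format and the translations.
    return render(doc_format, translated)


# Hypothetical stubs standing in for the real components.
def fake_layout_model(document: List[str]) -> List[dict]:
    return [
        {"block_id": f"b{i}", "data": text, "blockData": {"bbox": (0, 20 * i, 100, 20 * i + 20)}}
        for i, text in enumerate(document)
    ]


def fake_render(doc_format: Dict[str, dict], translated: List[dict]) -> List[tuple]:
    # Pair each translated text with the position recorded in the document format.
    return [(doc_format[block["block_id"]]["bbox"], block["data"]) for block in translated]


result = translate_document(["hello", "world"], fake_layout_model, str.upper, fake_render)
```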

[0078] Based on the first embodiment of this application, for content in the second embodiment that is the same as or similar to that of the first embodiment, refer to the description above; it will not be repeated here. Refer to Figure 3, which is a flowchart illustrating the second embodiment of the document translation method of this application.

[0079] In this embodiment, step S10 includes steps S11 to S13:

[0080] Step S11: Use a preset document reading program to read the pages of the document to be translated to obtain several original document pages.

[0081] It is understood that the aforementioned original document pages can be the unprocessed original pages of the document to be translated. In this embodiment, the device can read each page of the document to be translated using a preset document reading program to obtain all the original document pages in the document to be translated.

[0082] Step S12: Obtain the target block information corresponding to the target text block in the original document page through a preset document layout model.

[0083] It should be understood that the aforementioned target block information can be related information of text blocks identified from the document to be translated by the preset document layout model, such as identifiers, text content, text attributes, text content categories, metadata, and association information. This embodiment does not limit this. Among them, the identifier can be an identifier used to uniquely identify the text block, such as an ID or name; the text content can be the actual text information in the text block; the text attributes can be style information such as the font, font size, color, and alignment (such as left alignment, right alignment, center alignment, etc.); the text content category can be the content type of the text block, such as a title, paragraph, list item, etc.; the metadata can be metadata related to the text block, such as creation time, modification time, author, etc.; and the association information can be the association information between the text block and other document elements (such as images, tables, other text blocks), which may help in understanding the structure and content of the document.

[0084] Furthermore, prior to step S12, the method further includes: performing block recognition on the original document page using a preset document layout model to determine all text blocks in the original document page; determining whether there are abnormal blocks in the text blocks based on the block information corresponding to the text blocks; if there are, removing the abnormal blocks to obtain the target text blocks.

[0085] It should be noted that the block information corresponding to the text block can be the coordinate information of the text block obtained through the preset document layout model. In practical applications, when the preset document layout model performs block recognition on the original document page, it can not only identify all the text blocks on the original document page, but also obtain the coordinate information of these text blocks.

[0086] In a specific implementation, refer to Figure 4, which is a diagram illustrating the target text blocks within a document page in the document translation method of this application. In this embodiment, a preset document layout model is used to perform block recognition on a page of the document to be translated, thereby identifying and marking all text blocks on the page, as shown by the red boxes in Figure 4. These text blocks can include ordinary text blocks as well as special text blocks such as tables, formulas, titles, citations, and images. The title block can be obtained from the Layout model's annotation of the document; if the Layout annotation is inaccurate, the title can instead be determined from the font weight, font size, and layout position of the text on the page. The citation block is used to establish an association with subscripts, superscripts, links, or specially formatted text on the page (such as the citation format [1] commonly used in papers), and can also be obtained from the Layout model's annotation. The image block needs to be processed by OCR and an image processing model: OCR extracts the text information from the image, and the image processing model erases the original text that was extracted. The processed image is then used as the background, and the translated text is overlaid on it to translate the image. For list blocks and table of contents blocks, the device can detect the layout and content format programmatically. For example, a table of contents line usually ends with a number, and these numbers increase progressively from line to line; a list item has a list-style symbol on the left, with the text on the right aligned to the right edge of the symbol. Code blocks are usually presented with top and bottom borders, resembling tables but with the content displayed as a single block, and their content mostly shows progressive indentation and alignment as lines are indented and unindented. They can therefore be roughly identified programmatically and obtained with the help of a layout model trained on annotated data.
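The table of contents heuristic described in paragraph [0086] (each line ends with a number, and the numbers increase progressively) can be sketched as a small check. The function name, the `min_lines` threshold, and the choice to accept equal consecutive numbers are assumptions; the source does not specify them.

```python
import re
from typing import List

# A run of digits at the very end of the line, optionally followed by whitespace.
_TRAILING_NUMBER = re.compile(r"(\d+)\s*$")


def looks_like_table_of_contents(lines: List[str], min_lines: int = 3) -> bool:
    """Heuristic: every line ends with a number and the numbers never decrease."""
    numbers = []
    for line in lines:
        match = _TRAILING_NUMBER.search(line)
        if match is None:
            return False
        numbers.append(int(match.group(1)))
    if len(numbers) < min_lines:
        return False
    return all(a <= b for a, b in zip(numbers, numbers[1:]))


toc_lines = ["1 Introduction ..... 1", "2 Methods ..... 4", "3 Results ..... 9"]
body_lines = ["We tested 3 models.", "Results follow", "in section 4"]
is_toc = looks_like_table_of_contents(toc_lines)
is_body_toc = looks_like_table_of_contents(body_lines)
```

A check like this is only a rough filter and would normally be combined with the layout model's own labels, as the paragraph suggests for the other block types.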

[0087] It should be understood that the above-mentioned abnormal blocks can be text blocks that overlap or exhibit other anomalies.

[0088] In practical applications, the device first obtains the original coordinate information of the text in the original document page of the document to be translated through a document parsing program. At the same time, it can perform block recognition on the original document page through a layout model to obtain the text blocks in the original document page and the corresponding coordinate information of those text blocks. It can then compare this coordinate information with the original coordinate information to determine whether any of the blocks currently identified by the layout model overlap or exhibit other anomalies. If so, these text blocks are identified as abnormal blocks and removed to obtain the target text block.
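The comparison in paragraph [0088] can be sketched with axis-aligned boxes. In this sketch a block is treated as abnormal if it has no area, if it covers none of the text boxes reported by the document parsing program, or if more than a given share of its area overlaps a block that has already been kept. These three rules and the `overlap_ratio` threshold are assumptions; the source only states that overlapping or otherwise anomalous blocks are removed.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)  # zero when the boxes do not overlap


def remove_abnormal_blocks(
    blocks: List[Box], text_boxes: List[Box], overlap_ratio: float = 0.5
) -> List[Box]:
    """Keep the layout-model blocks that pass the (assumed) sanity checks."""
    kept: List[Box] = []
    for block in blocks:
        block_area = area(block)
        if block_area == 0:
            continue  # degenerate box
        if not any(intersection_area(block, text) > 0 for text in text_boxes):
            continue  # no parsed text lies inside this block
        if any(intersection_area(block, other) / block_area > overlap_ratio for other in kept):
            continue  # mostly duplicates a block that was already kept
        kept.append(block)
    return kept


layout_blocks = [(0, 0, 100, 50), (10, 5, 90, 45), (0, 60, 100, 100), (0, 200, 100, 250)]
parsed_text = [(5, 5, 95, 45), (5, 65, 95, 95)]
target_blocks = remove_abnormal_blocks(layout_blocks, parsed_text)
```

In the example, the second block is dropped because it lies entirely inside the first, and the fourth is dropped because the parser found no text there.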

[0089] Further, step S12 includes:

[0090] Step S12a: Obtain the marked text blocks and unmarked text blocks in the target text block based on the preset document layout model.

[0091] It should be noted that the marked text blocks mentioned above can be text blocks that have been fully recognized by the preset document layout model; the unmarked text blocks mentioned above can be text blocks that have not been fully recognized or have been misrecognized by the preset document layout model.

[0092] Step S12b: Perform paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain the first block paragraph information and the second block paragraph information.

[0093] It should be understood that paragraph processing of text blocks can be the process of splitting the text within a text block into natural paragraphs.

[0094] It should be noted that the above-mentioned first block paragraph information can be the position information of paragraphs in the marked text block; correspondingly, the above-mentioned second block paragraph information can be the position information of paragraphs in the unmarked text block.

[0095] Specifically, step S12b includes: determining the first text layout information corresponding to the marked text block and the second text layout information corresponding to the unmarked text block; performing paragraph processing on the marked text block according to the first text layout information to obtain the first block paragraph information; and performing paragraph processing on the unmarked text block according to the second text layout information to obtain the second block paragraph information.

[0096] It should be noted that the aforementioned first text layout information can be the marked text block and the distribution information of the text within the marked text block, such as the left and right boundaries of the block, whether the text belongs to the end of a line, the distance between the end of the line and the right boundary, whether the block has an ending punctuation mark, and whether the next line of text has a first-line indent, etc. This embodiment does not limit this. Correspondingly, the aforementioned second text layout information can be the unmarked text block and the distribution information of the text within the unmarked text block, such as the text block boundaries, the horizontal and vertical spacing of the text block, etc. This embodiment does not limit this.

[0097] In practical applications, when segmenting target text blocks, the processing methods differ for text blocks fully recognized by the Layout model and those that are incompletely or incorrectly recognized. Therefore, this embodiment first uses the recognition results of the Layout model to determine the fully recognized marked text blocks and the incompletely or incorrectly recognized unmarked text blocks within the target text block. For marked text blocks, the natural paragraph endings within the marked text block can be determined directly by comprehensively considering information such as the left and right boundaries of the block, whether the text belongs to the end of a line, the distance between the end of the line and the right boundary, the presence of a closing punctuation mark, and whether the next line has a first-line indent. Paragraph processing is then performed based on the determined natural paragraph endings to obtain the first block's paragraph information. Furthermore, for unmarked text blocks, the text content can be sorted from top to bottom and from left to right. The natural paragraph endings within the unmarked text block can be determined comprehensively using previously determined text block boundaries, horizontal and vertical spacing, and other information. Paragraph processing is then performed based on the determined natural paragraph endings to obtain the second block's paragraph information.

[0098] Step S12c: Obtain the target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information.

[0099] In this embodiment, the device can determine the coordinate information of the marked text blocks and unmarked text blocks in the target text block according to the first block paragraph information and the second block paragraph information, respectively, and perform paragraph splitting based on these coordinate information, thereby splitting out all paragraphs in the target text block, and finally obtaining the target block information corresponding to the target text block based on the block information corresponding to these paragraphs.
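The paragraph-ending rules described for marked text blocks can be sketched as follows. This is only an illustration under stated assumptions: each line is given with its text and left and right x coordinates, a line closes a paragraph when it ends with closing punctuation and is either visibly short of the right boundary or followed by an indented line, and lines are rejoined with a space (which suits space-delimited languages only). The names `Line`, `is_paragraph_end`, `split_paragraphs`, and both tolerances are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of characters that can close a sentence.
ENDING_PUNCTUATION = (".", "!", "?", "。", "！", "？")


@dataclass
class Line:
    text: str
    x0: float  # left edge of the line
    x1: float  # right edge of the line


def is_paragraph_end(line: Line, next_line: Optional[Line],
                     left: float, right: float,
                     gap_tol: float = 10.0, indent_tol: float = 5.0) -> bool:
    """Decide whether `line` is the last line of a natural paragraph."""
    if next_line is None:
        return True  # the last line of the block always closes a paragraph
    ends_with_punct = line.text.rstrip().endswith(ENDING_PUNCTUATION)
    stops_short = (right - line.x1) > gap_tol            # gap to the right boundary
    next_indented = (next_line.x0 - left) > indent_tol   # first-line indent follows
    return ends_with_punct and (stops_short or next_indented)


def split_paragraphs(lines: List[Line], left: float, right: float) -> List[str]:
    """Group the lines of one marked text block into natural paragraphs."""
    paragraphs: List[str] = []
    current: List[str] = []
    for i, line in enumerate(lines):
        current.append(line.text.strip())
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        if is_paragraph_end(line, next_line, left, right):
            paragraphs.append(" ".join(current))
            current = []
    return paragraphs
```

For unmarked text blocks the same loop could be reused after sorting the lines from top to bottom and from left to right, with spacing-based rules replacing the boundary-based ones.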

[0100] Furthermore, prior to step S12c, the method further includes: detecting whether a first special format block exists in the marked text block, the first special format block including at least one of a formula block and a table block; if it exists, performing semantic processing on the first special format block to obtain block semantic information.

[0101] It is understandable that the first special format block mentioned above can be a block with special text format, such as a formula block and a table block.

[0102] It should be understood that the aforementioned block semantic information can be the text content information in the first special format block.

[0103] In the specific implementation, if the first special format block is a formula block, semantic processing of the formula block can extract all the text and parameters in the formula block to obtain the corresponding block semantic information. Whether the formula block is inline or a whole block, the formula content needs to be extracted using a formula-specific model and finally embedded into the markdown content using LaTeX syntax. If the first special format block is a table block, semantic processing of the table block can extract all the text and parameters in the table block to obtain the corresponding block semantic information. Table blocks are relatively complex. Most table processing models can convert table images into markdown tables, but in immersive translation, markdown tables lose the original table layout and cannot express cell spanning. Therefore, in this embodiment, when processing table blocks, the text blocks can be uniformly integrated into cells by scanning the text programmatically.
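One way to picture integrating text into table cells is the sketch below. It assumes the cell rectangles, including their row and column spans, are already known from some earlier table-structure step, and that each piece of text is represented by its center point; neither assumption is stated in this application. The names `Cell`, `Span`, and `fill_cells` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    row: int
    col: int
    x0: float
    y0: float
    x1: float
    y1: float
    row_span: int = 1  # kept so that merged cells survive, unlike a markdown table
    col_span: int = 1


@dataclass
class Span:
    text: str
    x: float  # center of the text fragment
    y: float


def fill_cells(cells: List[Cell], spans: List[Span]) -> Dict[Tuple[int, int], str]:
    """Assign each text fragment to the cell whose rectangle contains its center."""
    texts: Dict[Tuple[int, int], List[str]] = {(c.row, c.col): [] for c in cells}
    # Scan the text top to bottom, then left to right, so cell content stays ordered.
    for span in sorted(spans, key=lambda s: (s.y, s.x)):
        for cell in cells:
            if cell.x0 <= span.x < cell.x1 and cell.y0 <= span.y < cell.y1:
                texts[(cell.row, cell.col)].append(span.text)
                break
    return {key: " ".join(parts) for key, parts in texts.items()}
```

Keeping `row_span` and `col_span` on each cell is what lets the result express cell spanning, which the text above notes a plain markdown table cannot do.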

[0104] Accordingly, step S12c includes: obtaining target block information based on the block semantic information, the first block paragraph information, and the second block paragraph information.

[0105] In this embodiment, after obtaining the block semantic information, the first block paragraph information, and the second block paragraph information, the device can integrate this information to obtain the target block information. Since the block semantic information is text information from formulas and tables, this information can be added to the target block information using Markdown syntax. Furthermore, the showType of the first special format block is marked as markdown (i.e., md in Figure 2). Markdown is a lightweight markup language used for writing and formatting documents. Basic Markdown syntax can include paragraphs and line breaks, bold and italics, headings, lists, links, and images.

[0106] Step S13: Obtain the standard block data corresponding to the target text block based on the target block information.

[0107] In this embodiment, after determining the target block information corresponding to the target text block, the device can extract the target block information according to the data format required by the standard block data, thereby obtaining the corresponding standard block data.

[0108] This embodiment discloses a method for reading pages of a document to be translated using a preset document reading program to obtain several original document pages; obtaining target block information corresponding to target text blocks in the original document pages using a preset document layout model; and obtaining standard block data corresponding to the target text blocks based on the target block information. Since this embodiment can obtain target block information corresponding to target text blocks in the pages of the document to be translated using a preset document layout model, and obtain corresponding standard block data based on the target block information, it is possible to perform page-by-page translation of the document to be translated, which is beneficial to improving the accuracy of document translation.

[0109] Based on the first and/or second embodiments of this application, a third embodiment of this application is provided. For content that is the same as or similar to the above embodiments, refer to the above description; it is not repeated here. Please refer to Figure 5, which is a flowchart illustrating Embodiment 3 of the document translation method of this application.

[0110] In this embodiment, step S13 includes steps S131 to S133:

[0111] Step S131: Determine the block category corresponding to the target text block based on the target block information.

[0112] It should be understood that the above-mentioned block categories can be the categories of text content in the target text block. In this embodiment, the block categories may include, but are not limited to, titles, citations, block-level formulas, inline formulas, tables, lists, directories, code blocks, images, etc.

[0113] Step S132: Obtain the text style information and text layout information corresponding to the target text block through a preset document parsing program.

[0114] It is understandable that the aforementioned preset document parsing program can be software or tools used to parse documents and extract information from them.

[0115] It should be understood that the above text style information can be the style information of the text in the target text block, including the font, size, bold, italic, and text color; the above text layout information can be the layout information of the text in the target text block, including paragraph first-line indentation, line height, superscript and subscript, and text alignment.

[0116] In this embodiment, a preset document parsing program can be used to parse the document to be translated, thereby extracting the text style information and text layout information of all text blocks in the document to be translated. Then, the text style information and text layout information corresponding to the target text block in the document page can be determined based on this information.

[0117] Step S133: Obtain the standard block data corresponding to the target text block based on the block category, the text style information, and the text layout information.

[0118] In this embodiment, since the standard block data includes document page number, text content category, text rendering method, text content, block identifier and block data, this embodiment can integrate data such as block type, text style information and text layout information corresponding to the target text block to obtain the standard block data corresponding to the target text block.
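A minimal sketch of what one standard block record might look like is given below. The field names follow the data categories listed later in this description (type, showType, data, page, block identifier, blockData); the category strings, the `"text"` showType value, the identifier format, and the contents of `blockData` are assumptions made only for illustration. `StandardBlock` and `build_standard_block` are hypothetical names.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict


@dataclass
class StandardBlock:
    page: int        # document page number
    type: str        # text content category, e.g. "title" or "table"
    showType: str    # text rendering method
    data: str        # text content
    blockId: str     # block identifier
    blockData: Dict[str, Any] = field(default_factory=dict)


def build_standard_block(page: int, index: int, category: str, text: str,
                         style: Dict[str, Any], layout: Dict[str, Any]) -> StandardBlock:
    """Combine block category, text style and text layout into one record."""
    # Formula and table blocks are rendered as markdown, per the description above;
    # the "text" fallback value is an assumption.
    show_type = "markdown" if category in {"formula", "table"} else "text"
    return StandardBlock(
        page=page,
        type=category,
        showType=show_type,
        data=text,
        blockId=f"p{page}-b{index}",  # assumed identifier scheme
        blockData={"style": style, "layout": layout},
    )
```

Calling `asdict()` on the result gives a plain dictionary that could be serialized and handed to the translation step.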

[0119] Further, step S133 includes:

[0120] Step S133a: Determine whether there is a second special format block in the target text block based on the block category. The second special format block includes at least one of the following: list block, directory block, and code block.

[0121] It is understood that the aforementioned second special format block can be a block with a special text format, such as a list block, a directory block, a code block, etc., and this embodiment does not impose any restrictions on it.

[0122] Step S133b: If it exists, extract information from the second special format block to obtain special block information.

[0123] It should be understood that the aforementioned special block information can be the text content information in the second special format block. In this embodiment, the directory block and list block usually need to be extracted separately, so that the text content in the directory block and list block can be handed over to the translation model for overall translation, thereby preserving more contextual information and improving the overall translation quality.

[0124] Step S133c: Obtain the standard block data corresponding to the target text block based on the block category, the text style information, the text layout information, and the special block information.
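The idea of handing a list block or directory block to the translation model as a whole can be sketched as follows. The translation model is represented by an arbitrary callable, `translate`, which is a stand-in and not an API of any real library; joining items with newlines and falling back to item-by-item translation when the line count changes are both assumptions of this sketch.

```python
from typing import Callable, List


def translate_list_block(items: List[str],
                         translate: Callable[[str], str]) -> List[str]:
    """Translate all items of a list block in one call to keep shared context.

    Assumptions: items contain no newline characters, and `translate` usually
    preserves line breaks. If it does not, fall back to per-item translation
    so that the output still has one entry per input item.
    """
    if not items:
        return []
    joined = "\n".join(items)
    translated = translate(joined).split("\n")
    if len(translated) != len(items):
        translated = [translate(item) for item in items]
    return translated
```

The fallback trades context for structural safety: each output entry can still be written back to the position of its source item.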

[0125] Further, step S133c includes: generating a block list corresponding to the target text block based on the block category, the text style information, the text layout information, and the special block information; determining whether there is associated text in the target text block; if so, performing text association processing on the target text block based on the associated text; adjusting the parameters of the block list according to the text association processing result; and obtaining standard block data corresponding to the target text block based on the adjusted block list.

[0126] It should be noted that the above block list can be a list consisting of the data corresponding to the target text blocks. In practical applications, the structure of the block list can be as shown in Figure 2. The device can first determine the block category, text style information, and text layout information of the target text block, and then generate the block list corresponding to the target text block according to the list structure shown in Figure 2. In addition, if the target text block is a special text block, the corresponding block list can be generated by also combining the special block information.

[0127] It should be understood that the aforementioned associated text can be text that belongs together but has been split across two target text blocks by a column break or a page break. Since such text is usually related in content to the text in the next column or on the next page, the device can perform text association processing on the target text blocks based on the associated text in order to improve the accuracy of document translation.

[0128] Specifically, the step of performing text association processing on the target text block based on the associated text includes: determining the preceding associated text and the following associated text in the associated text; merging the second text block corresponding to the following associated text into the first text block corresponding to the preceding associated text, so as to perform text association processing on the target text block.

[0129] It should be noted that the aforementioned preceding associated text can be the text at the beginning of a text block in two associated text segments; correspondingly, the aforementioned following associated text can be the text at the end of a text block in two associated text segments. The first text block and the second text block are the text blocks to which the preceding and following associated texts belong, respectively. In this embodiment, when performing text association processing on the associated text in the target text block, the following second text block can be merged into the preceding first text block, thereby achieving column content association or pagination content association.

[0130] It should be noted that after associating the related text in the target text block, the block type and block identifier of the second text block can be set to the block type and block identifier corresponding to its associated block, thereby adjusting the parameters of the block list and completing the extraction of all information in the original document page.
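The merge described above can be sketched as follows. The rule used to decide that two blocks are associated is an assumption, since this description does not specify it: both blocks are body text, they sit on different pages or in different columns, and the earlier block does not end with closing punctuation. Blocks are plain dictionaries with hypothetical keys `page`, `column`, `type`, `blockId`, and `data`; `is_continuation` and `associate_blocks` are hypothetical names.

```python
from typing import Dict, List

# Assumed set of characters that close a sentence or introduce a new unit.
TERMINAL = (".", "!", "?", "。", "！", "？", ":", "：")


def is_continuation(prev: Dict, nxt: Dict) -> bool:
    """Heuristic (assumption): `nxt` continues `prev` across a column or page break."""
    text = prev["data"].rstrip()
    crosses_break = (prev["page"], prev.get("column", 0)) != (nxt["page"], nxt.get("column", 0))
    both_text = prev["type"] == "text" and nxt["type"] == "text"
    return both_text and crosses_break and bool(text) and not text.endswith(TERMINAL)


def associate_blocks(blocks: List[Dict]) -> List[Dict]:
    """Merge each following associated block into its preceding block, in place.

    `blocks` must be in reading order. The later block keeps its own entry, but
    its type and blockId are set to those of the block it was merged into.
    """
    head = None  # block that currently absorbs continuations
    for block in blocks:
        if head is not None and is_continuation(head, block):
            head["data"] = head["data"].rstrip() + " " + block["data"].lstrip()
            block["type"] = head["type"]
            block["blockId"] = head["blockId"]
        else:
            head = block
    return blocks
```

The sketch leaves the later block's own text in place; this description does not say whether it is cleared after merging, so downstream code would need to use the shared identifier to avoid translating the same text twice.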

[0131] In the specific implementation, refer to Figure 6, which is a flowchart illustrating the overall document translation method of this application. As shown in Figure 6, the device first uses a document reading program to read pages of the document to be translated, obtaining several document pages. Then, it calls the rendering stack API to retrieve text and images from the document pages, extracting text style information such as coordinates, angles, text type, font, size, color, bold, and italics. Since some metadata in the document is invisible on the page, the device performs visibility and garbled character detection on the text, excluding invisible and garbled text before performing OCR recognition and replacement. Simultaneously, the device can generate page screenshots using the page API, and then call the layout model to perform block recognition on each screenshot, obtaining all text blocks in the screenshot, including text, images, tables, formulas, titles, and citations. Subsequently, the device can determine whether there are overlapping or erroneous areas in these text blocks based on the block information. If so, it removes the overlapping or erroneous areas to obtain the target text block. Then, it determines the marked and unmarked text blocks in the target text block, performs semantic processing on the marked text blocks, and divides the marked and unmarked text blocks into paragraphs to finally obtain all the independent blocks on the page. Finally, the device can perform overall layout and formatting of these text blocks. Specifically, the device can format special blocks such as list blocks, directory blocks, and code blocks, and mark blocks that are center-aligned or right-aligned. It can also associate column content across column breaks, and associate pagination content between the text block at the end of a page and the text block at the start of the next page. During association, later text blocks can be merged into earlier text blocks. Finally, the block type and block identifier of each later text block can be set to the block type and block identifier of its associated block, thereby completing the extraction of all information within the original document page and ultimately obtaining standard block data. The data categories in the standard block data format can include: type, showType, data, page, blockId, blockData, etc.

[0132] This embodiment discloses a method for determining the block category corresponding to a target text block based on target block information; obtaining text style information and text layout information corresponding to the target text block through a preset document parsing program; and obtaining standard block data corresponding to the target text block based on the block category, text style information, and text layout information. Since this embodiment can obtain the corresponding standard block data based on the block category, text style information, and text layout information corresponding to the target text block, it can achieve accurate extraction of standard block data for all text blocks in the document to be translated, further improving the accuracy of subsequent document translation.

[0133] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the document translation method of this application. Any simple modifications based on this technical concept are within the protection scope of this application.

[0134] This application also provides a document translation device; please refer to Figure 7. The document translation device includes:

[0135] Data acquisition module 10 is used to acquire standard block data corresponding to the target text block in the document to be translated through a preset document layout model;

[0136] Data translation module 20 is used to translate the standard block data to obtain target translation block data;

[0137] The document format determination module 30 is used to determine the document format corresponding to the document to be translated based on the standard block data.

[0138] The translation document generation module 40 is used to generate a translation document corresponding to the document to be translated based on the document format and the target translation block data.

[0139] The document translation device provided in this application, employing the document translation method described in the above embodiments, can solve the technical problem in the prior art where the accurate restoration of document format cannot be achieved when using OCR technology to process documents, resulting in poor document translation quality. Compared with the prior art, the beneficial effects of the document translation device provided in this application are the same as those of the document translation method provided in the above embodiments, and other technical features in the document translation device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0140] This application provides a document translation device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the document translation method in Embodiment 1 above.

[0141] Referring now to Figure 8, which shows a structural schematic of a document translation device suitable for implementing embodiments of this application. The document translation device in these embodiments may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The document translation device shown in Figure 8 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.

[0142] As shown in Figure 8, the document translation device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the document translation device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including, for example, magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 allows the document translation device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows a document translation device with various devices, it should be understood that implementing or having all of the devices shown is not required. More or fewer devices may alternatively be implemented.

[0143] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0144] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0145] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the document translation method in the above embodiments.

[0146] The computer-readable storage medium provided in this application may be, for example, but is not limited to, a USB flash drive, or an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0147] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described document translation method. This solves the technical problem in the prior art where the accurate restoration of document format is impossible when using OCR technology to process documents, resulting in poor document translation quality. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the document translation method provided in the above embodiments, and will not be repeated here.

[0148] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the document translation method described above.

[0149] The computer program product provided in this application can solve the technical problem in the prior art that the document format cannot be accurately restored when using OCR technology to process documents, resulting in poor document translation quality. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the document translation method provided in the above embodiments, and will not be repeated here.

[0150] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

[0151] This application discloses A1, a document translation method, the method comprising:

[0152] Obtain standard block data corresponding to the target text block in the document to be translated by using a preset document layout model;

[0153] The standard block data is translated to obtain the target translated block data;

[0154] The document format corresponding to the document to be translated is determined based on the standard block data.

[0155] Based on the document format and the target translation block data, a translation document corresponding to the document to be translated is generated.

[0156] A2. The method described in A1, wherein the step of obtaining the standard block data corresponding to the target text block in the document to be translated through a preset document layout model includes:

[0157] The document to be translated is read by a preset document reading program to obtain several original document pages;

[0158] The target block information corresponding to the target text block in the original document page is obtained by using a preset document layout model;

[0159] Based on the target block information, obtain the standard block data corresponding to the target text block.

[0160] A3. The method described in A2, wherein, before the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model, the method further includes:

[0161] By using a preset document layout model, block recognition is performed on the original document page to determine all text blocks in the original document page;

[0162] Determine whether there are any abnormal blocks in the text block based on the block information corresponding to the text block;

[0163] If an abnormal block exists, it is removed to obtain the target text block.

[0164] A4. The method described in A2, wherein the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model includes:

[0165] Based on a preset document layout model, the marked text blocks and unmarked text blocks in the target text block are obtained;

[0166] The marked text blocks and the unmarked text blocks are processed into paragraphs to obtain paragraph information for the first block and paragraph information for the second block.

[0167] The target block information corresponding to the target text block is obtained based on the first block paragraph information and the second block paragraph information.

[0168] A5. The method described in A4, wherein the step of performing paragraph processing on the marked text block and the unmarked text block respectively to obtain paragraph information of the first block and paragraph information of the second block includes:

[0169] Determine the first text layout information corresponding to the marked text block, and the second text layout information corresponding to the unmarked text block;

[0170] The marked text block is processed according to the first text layout information to obtain the first block paragraph information;

[0171] The unmarked text block is processed into paragraphs based on the second text layout information to obtain the second block paragraph information.

[0172] A6. The method described in A4, wherein, before the step of obtaining the target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information, the method further includes:

[0173] Detect whether there is a first special format block in the marked text block, the first special format block including at least one of: formula block and table block;

[0174] If it exists, the first special format block is semantically processed to obtain the block semantic information;

[0175] The step of obtaining the target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information includes:

[0176] Target block information is obtained based on the block semantic information, the first block paragraph information, and the second block paragraph information.

[0177] A7. The method described in A2, wherein the step of obtaining the standard block data corresponding to the target text block based on the target block information includes:

[0178] The block category corresponding to the target text block is determined based on the target block information;

[0179] The text style information and text layout information corresponding to the target text block are obtained through a preset document parsing program;

[0180] Based on the block category, the text style information, and the text layout information, obtain the standard block data corresponding to the target text block.

[0181] A8. The method described in A7, wherein the step of obtaining the standard block data corresponding to the target text block based on the block category, the text style information, and the text layout information includes:

[0182] Based on the block category, it is determined whether there is a second special format block in the target text block. The second special format block includes at least one of the following: list block, directory block, and code block.

[0183] If it exists, information is extracted from the second special format block to obtain the special block information;

[0184] Based on the block category, the text style information, the text layout information, and the special block information, obtain the standard block data corresponding to the target text block.

[0185] A9. The method described in A8, wherein the step of obtaining the standard block data corresponding to the target text block based on the block category, the text style information, the text layout information, and the special block information includes:

[0186] A block list corresponding to the target text block is generated based on the block category, the text style information, the text layout information, and the special block information;

[0187] Determine whether associated text exists in the target text block;

[0188] If it exists, text association processing is performed on the target text block based on the associated text;

[0189] The parameters of the block list are adjusted based on the text association processing results;

[0190] Obtain the standard block data corresponding to the target text block based on the adjusted block list.
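
The source does not say how associated text is recognized. As a hedged illustration only, the sketch below assumes that associated text means a paragraph split across two adjacent blocks, detected when the earlier block lacks terminal punctuation and the later block starts with a lowercase letter. The `Block` type, the `TERMINALS` tuple, and the heuristic itself are assumptions.

```python
from dataclasses import dataclass

# Assumed set of sentence-ending characters.
TERMINALS = (".", "!", "?", ":", ";", "。", "！", "？")


@dataclass
class Block:
    category: str
    text: str
    page: int


def is_continuation(prev: Block, nxt: Block) -> bool:
    """Heuristic (assumed): prev is an unfinished paragraph that nxt continues."""
    if prev.category != "paragraph" or nxt.category != "paragraph":
        return False
    prev_text, next_text = prev.text.rstrip(), nxt.text.lstrip()
    if not prev_text or not next_text:
        return False
    return not prev_text.endswith(TERMINALS) and next_text[0].islower()


def find_associated_pairs(blocks):
    """Return index pairs (i, i + 1) of adjacent blocks holding associated text."""
    return [
        (i, i + 1)
        for i in range(len(blocks) - 1)
        if is_continuation(blocks[i], blocks[i + 1])
    ]
```

The lowercase test only makes sense for scripts with letter case; a production heuristic would need a different signal for languages such as Chinese.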

[0191] A10. The method as described in A9, wherein the step of performing text association processing on the target text block based on the associated text includes:

[0192] Determine the preceding associated text and the subsequent associated text within the associated text;

[0193] The second text block corresponding to the subsequent associated text is merged into the first text block corresponding to the preceding associated text to perform text association processing on the target text block.
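
The merge described in A10, together with the block list adjustment mentioned in A9, could be sketched as below. The `Block` fields, the `continuation_boxes` list, and the choice to join text with a single space are assumptions for illustration; the source only states that the second text block is merged into the first.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Block:
    text: str
    page: int
    bbox: tuple
    # Assumed bookkeeping: where the merged-in continuation originally sat.
    continuation_boxes: list = field(default_factory=list)


def merge_associated(blocks, pairs):
    """Merge each subsequent block into its preceding block.

    `pairs` holds (preceding_index, subsequent_index) tuples. The input list
    is left untouched; an adjusted copy of the block list is returned.
    """
    result = [replace(b, continuation_boxes=list(b.continuation_boxes)) for b in blocks]
    # Work from the end so deletions do not shift indices still to be used,
    # and so chains such as (0, 1), (1, 2) collapse into one block.
    for first_idx, second_idx in sorted(pairs, reverse=True):
        first, second = result[first_idx], result[second_idx]
        # Joining with a space suits space-delimited languages only.
        first.text = first.text.rstrip() + " " + second.text.lstrip()
        first.continuation_boxes.append((second.page, second.bbox))
        first.continuation_boxes.extend(second.continuation_boxes)
        del result[second_idx]
    return result
```

Merging before translation gives the translator a complete sentence instead of two fragments, which is presumably the motivation for the association step.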

[0194] A11. The method as described in any one of A1 to A10, wherein the step of generating a translation document corresponding to the document to be translated based on the document format and the target translation block data includes:

[0195] Content removal processing is performed on the document to be translated to obtain the background document in the document to be translated;

[0196] Based on the document format, the target translation block data, and the background document, a translation document corresponding to the document to be translated is generated.
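
As a rough illustration of the generation step in A11, the sketch below overlays translated blocks on background pages using a plain dictionary data model. The keys `page_size`, `default_font`, `bbox`, and `font` are hypothetical; the source does not define the contents of the document format or of the background document.

```python
def compose_translated_document(document_format, translated_blocks, background_pages):
    """Place translated blocks on top of background pages (illustrative model)."""
    pages = [
        {"size": document_format["page_size"], "background": bg, "text": []}
        for bg in background_pages
    ]
    for block in translated_blocks:
        pages[block["page"]]["text"].append(
            {
                "content": block["text"],
                "bbox": block["bbox"],
                # Fall back to the document-level font when a block carries none.
                "font": block.get("font", document_format["default_font"]),
            }
        )
    return pages
```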

[0197] A12. The method as described in A11, wherein, prior to the step of performing content removal processing on the document to be translated to obtain the background document in the document to be translated, the method further includes:

[0198] The document to be translated is read page by page using a preset document reading program;

[0199] During the page reading process, the rendering blocks in the document to be translated are marked to obtain the marked rendering blocks;

[0200] The step of removing content from the document to be translated to obtain the background document in the document to be translated includes:

[0201] The marked rendering blocks are removed to obtain the background document in the document to be translated.
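
The mark-then-remove sequence of A12 can be sketched as follows. The page model (a list of element dictionaries with a `type` key) and the assumption that rendering blocks are the elements of type `"text"` are illustrative only; the source does not define a rendering block at this point.

```python
def mark_rendering_blocks(pages):
    """Read page by page and record (page_index, element_index) of text elements."""
    marked = []
    for page_no, elements in enumerate(pages):
        for idx, element in enumerate(elements):
            if element["type"] == "text":  # assumed definition of a rendering block
                marked.append((page_no, idx))
    return marked


def remove_marked(pages, marked):
    """Return background pages: every element except the marked ones."""
    marked_set = set(marked)
    return [
        [el for idx, el in enumerate(elements) if (page_no, idx) not in marked_set]
        for page_no, elements in enumerate(pages)
    ]
```

Marking first and removing in a second pass keeps the original pages intact until every page has been read.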

[0202] The present invention also discloses B13, a document translation apparatus, the apparatus comprising:

[0203] The data acquisition module is used to acquire standard block data corresponding to the target text block in the document to be translated through a preset document layout model;

[0204] The data translation module is used to translate the standard block data to obtain the target translation block data;

[0205] The document format determination module is used to determine the document format corresponding to the document to be translated based on the standard block data;

[0206] The translation document generation module is used to generate a translation document corresponding to the document to be translated based on the document format and the target translation block data.
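
The four modules of B13 can be pictured as four pluggable callables wired in sequence. This is a sketch under assumptions: the class name, the field names, and the callable signatures are hypothetical, and each module's internals are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Block = Dict[str, Any]


@dataclass
class DocumentTranslationApparatus:
    acquire_data: Callable[[Any], List[Block]]                        # data acquisition module
    translate_data: Callable[[List[Block]], List[Block]]              # data translation module
    determine_format: Callable[[List[Block]], Dict[str, Any]]         # document format determination module
    generate_document: Callable[[Dict[str, Any], List[Block]], Any]   # translation document generation module

    def run(self, document: Any) -> Any:
        standard_blocks = self.acquire_data(document)
        translated_blocks = self.translate_data(standard_blocks)
        # The format is derived from the standard (untranslated) block data.
        document_format = self.determine_format(standard_blocks)
        return self.generate_document(document_format, translated_blocks)
```

Injecting the modules as callables lets each one be replaced or tested in isolation, for example by passing a stub translator.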

[0207] B14. In the apparatus described in B13, the data acquisition module is further configured to read pages of the document to be translated using a preset document reading program to obtain several original document pages; obtain target block information corresponding to the target text block in the original document pages using a preset document layout model; and obtain standard block data corresponding to the target text block based on the target block information.

[0208] B15. In the apparatus described in B14, the data acquisition module is further configured to perform block recognition on the original document page using a preset document layout model to determine all text blocks in the original document page; determine whether there are abnormal blocks in the text blocks based on the block information corresponding to the text blocks; if there are, remove the abnormal blocks to obtain the target text block.
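
The abnormal block check in B15 is not defined further in this passage. Purely as an illustration, the sketch below treats a block as abnormal when its text is empty, its bounding box has no area, or its recognition confidence falls below a threshold. The keys `text`, `bbox`, and `confidence` and the threshold value are assumptions.

```python
def is_abnormal(block, min_confidence=0.5):
    """Assumed criteria: empty text, degenerate box, or low model confidence."""
    x0, y0, x1, y1 = block["bbox"]
    has_area = x1 > x0 and y1 > y0
    return (
        not block["text"].strip()
        or not has_area
        or block["confidence"] < min_confidence
    )


def target_text_blocks(text_blocks, min_confidence=0.5):
    """Remove abnormal blocks and keep the remaining blocks in order."""
    return [b for b in text_blocks if not is_abnormal(b, min_confidence)]
```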

[0209] B16. The apparatus as described in B14, wherein the data acquisition module is further configured to acquire marked text blocks and unmarked text blocks in the target text block based on a preset document layout model; perform paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain first block paragraph information and second block paragraph information; and acquire target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information.

[0210] B17. In the apparatus described in B16, the data acquisition module is further configured to determine the block category corresponding to the target text block based on the target block information; acquire text style information and text layout information corresponding to the target text block through a preset document parsing program; and acquire standard block data corresponding to the target text block based on the block category, the text style information, and the text layout information.

[0211] The present invention also discloses C18, a document translation device, the device comprising: a memory, a processor, and a document translation program stored in the memory and executable on the processor, the document translation program being configured to implement the steps of the document translation method as described above.

[0212] The present invention also discloses D19, a storage medium storing a document translation program, wherein the document translation program, when executed by a processor, implements the steps of the document translation method described above.

[0213] The present invention also discloses E20, a computer program product comprising a computer program that, when executed by a processor, implements the steps of the document translation method described above.

Claims

1. A document translation method, characterized in that the method includes: obtaining standard block data corresponding to a target text block in a document to be translated through a preset document layout model; translating the standard block data to obtain target translation block data; determining the document format corresponding to the document to be translated based on the standard block data; and generating a translation document corresponding to the document to be translated based on the document format and the target translation block data.

2. The method as described in claim 1, characterized in that the step of obtaining the standard block data corresponding to the target text block in the document to be translated through a preset document layout model includes: reading the document to be translated with a preset document reading program to obtain several original document pages; obtaining target block information corresponding to the target text block in the original document pages through the preset document layout model; and obtaining the standard block data corresponding to the target text block based on the target block information.

3. The method as described in claim 2, characterized in that, before the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model, the method further includes: performing block recognition on the original document page through the preset document layout model to determine all text blocks in the original document page; determining whether there are abnormal blocks among the text blocks based on the block information corresponding to the text blocks; and, if there are, removing the abnormal blocks to obtain the target text block.

4. The method as described in claim 2, characterized in that the step of obtaining the target block information corresponding to the target text block in the original document page through a preset document layout model includes: obtaining marked text blocks and unmarked text blocks in the target text block based on the preset document layout model; performing paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain first block paragraph information and second block paragraph information; and obtaining the target block information corresponding to the target text block based on the first block paragraph information and the second block paragraph information.

5. The method as described in claim 4, characterized in that the step of performing paragraph processing on the marked text blocks and the unmarked text blocks respectively to obtain the first block paragraph information and the second block paragraph information includes: determining first text layout information corresponding to the marked text blocks and second text layout information corresponding to the unmarked text blocks; performing paragraph processing on the marked text blocks based on the first text layout information to obtain the first block paragraph information; and performing paragraph processing on the unmarked text blocks based on the second text layout information to obtain the second block paragraph information.

6. The method as described in claim 2, characterized in that the step of obtaining the standard block data corresponding to the target text block based on the target block information includes: determining the block category corresponding to the target text block based on the target block information; obtaining text style information and text layout information corresponding to the target text block through a preset document parsing program; and obtaining the standard block data corresponding to the target text block based on the block category, the text style information, and the text layout information.

7. A document translation device, characterized in that the device includes: a data acquisition module, used to acquire standard block data corresponding to a target text block in a document to be translated through a preset document layout model; a data translation module, used to translate the standard block data to obtain target translation block data; a document format determination module, used to determine the document format corresponding to the document to be translated based on the standard block data; and a translation document generation module, used to generate a translation document corresponding to the document to be translated based on the document format and the target translation block data.

8. A document translation device, characterized in that the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the document translation method as described in any one of claims 1 to 6.

9. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the document translation method as described in any one of claims 1 to 6.

10. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the steps of the document translation method as described in any one of claims 1 to 6.