Method, apparatus, and computer storage media for providing a digital image of a text string

By optimizing the layout of text strings through spectral sparsity operations, the problem of low coding efficiency in digital raster images is solved, achieving more efficient coding and resource utilization, which is suitable for image and video processing applications.

CN118262011B · Active · Publication Date: 2026-06-12 · AXIS

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: AXIS
Filing Date: 2023-12-22
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing technologies lack effective methods to optimize the spectral sparsity of digital raster images when rendering text, resulting in low coding efficiency and failure to fully utilize communication bandwidth or storage space.

Method used

By modifying the layout of text strings through spectral sparsity operations, including rotating, tilting, scaling, translating, and modifying the font of graphic elements, the spectral sparsity of the encoded blocks is ensured, making it suitable for block-by-block transformation encoding.

Benefits of technology

It improves the encoding efficiency of digital raster images, reduces the bit rate required for encoding, optimizes the utilization of system resources, and is suitable for applications such as image or video editing, content management, and video captioning.

✦ Generated by Eureka AI based on patent content.


Abstract

Encoder-optimized text rendering is disclosed. A method (100) for rendering a text string in a digital raster image suitable for block-based transform encoding comprises: obtaining (110) a partition of an image region into coding blocks for block-based transform encoding; representing (114) the text string as a plurality of graphical elements from a font arranged according to a tentative layout in the image region, wherein the tentative layout defines at least a position, an orientation and a size of each graphical element; modifying (116) the tentative layout by applying a spectral sparseness operation on at least one non-empty coding block, thereby obtaining a modified layout; and rendering (118) a digital raster image of the graphical elements arranged according to the modified layout.

Description

Technical Field

[0001] This disclosure relates to the field of two-dimensional image data generation. Specifically, it proposes a technique for rendering text in raster images suitable for block-by-block transform encoding.

Background Technology

[0002] Thanks to significant advancements in the field of digital image encoding, input images can be represented as a bitstream with an extremely low bit rate from which an image can be reconstructed without significantly degrading its visual quality. In some use cases, the input image is fixed, for example, it has already been recorded by a camera or pre-synthesized. The digital image encoding process can then typically control the bitstream's bit rate simply by applying various data compression techniques, including lossy and lossless compression. In other use cases, the input image can be modified to a certain extent. This is particularly applicable when the input image will be synthesized (rendered) simultaneously with image encoding, and the synthesis should meet end-user rendering specifications that leave some aspects undefined. For example, rendering specifications may define the geometry and position of multiple 3D objects, but not specify their colors and textures and / or the scene's lighting. In other words, the person or device responsible for rendering has the freedom to choose the colors, textures, or lighting during rendering, and each choice will be acceptable to the end user. The end user can be someone who will view the reconstructed image (e.g., a consumer), a processor performing optical character recognition (OCR) on the reconstructed image, or the owner of the system that actually performs image encoding on their behalf.

[0003] The inventors have recognized that this freedom can be used to improve the performance of image encoding processes. Notably, they have discovered untapped potential in the field of text rendering.

Summary of the Invention

[0004] One object of this disclosure is to provide a method for rendering a text string in a digital raster image in a manner suitable for block-by-block transform encoding. Another object is to provide a text rendering method for generating a digital raster image with a sparse spectrum. A further object is to provide a method that satisfies one or more layout constraints. Another object is to provide a text rendering method suitable for generating overlay text to be encoded together with a background digital image. Yet another object is to provide an apparatus and computer program having these capabilities.

[0005] At least some of these objectives are achieved by the invention as defined by the independent claims. The dependent claims relate to advantageous embodiments.

[0006] In a first aspect of this disclosure, a method is provided for rendering a text string in a digital raster image suitable for block-by-block transform coding. From the perspective of an entity performing the method, the text string is predefined, i.e., the text string can be received from an end user (see discussion above) or created by an executed software application, and the entity must not modify the text string. The method includes: obtaining a segmentation of an image region into coded blocks for block-by-block transform coding; representing the text string as a plurality of graphic elements from a font arranged according to a tentative layout in the image region, wherein the tentative layout defines at least the position, orientation, and size of each graphic element; modifying the tentative layout by applying a spectral sparsity operation to at least one non-empty coded block, thereby obtaining a modified layout; and rendering a digital raster image of the graphic elements arranged according to the modified layout.

[0007] A spectral sparsity operation (in other words, a spectral sparsification operation) makes the spectrum of the coded blocks of the digital raster image sparser than it would have been had the operation not been performed. That is, a spectral sparsity operation is likely to reduce the number of non-zero transform coefficients produced when block-by-block transform coding is applied to the coded blocks. In turn, this means that the digital raster image can be encoded at a slightly lower bit rate. The inventors have recognized that spectral sparsity is a key enabler for efficient image coding, and they have developed a class of beneficial spectral sparsity operations that are non-destructive when applied to text strings. More precisely, spectral sparsity operations modify the tentative layout (subject to optional layout constraints) but preserve the text string. The approach according to the first aspect can be regarded as a way to integrate the text rendering process more tightly with the subsequent image coding process, thereby leveraging the synergy between these processes. In this way, the encoded image with the rendered text string makes more efficient use of the available communication bandwidth or storage space in the system.

[0008] The spectral sparsity operation, which will be further specified below, can be characterized as a model-based open-loop operation that does not presuppose any interaction with the subsequent image encoding process. The operation is model-based in the sense that its impact on the digital raster image can be predicted accurately from the large amount of experience accumulated by the inventors. The model-based open-loop approach allows for economical use of processing resources and execution time, which makes the method according to the first aspect suitable for important mass-market use cases. In particular, the spectral sparsity operation is not a trial-and-error method. By contrast, a model-free closed-loop method used to achieve comparable bit rate reductions is likely to be more computationally expensive. Such a closed-loop method may, for example, include an iterative search in which each iteration includes (1) rendering the text string with a new layout, (2) encoding the image, and (3) evaluating the change in size of the encoded image, repeated until a satisfactory size has been reached. Because the iterative search is not guided by experience of how the spectrum of the digital image responds to layout modifications, it is likely to be less efficient than the method proposed herein. For example, each image may require a considerable number of encoding operations (2). Similarly, a model-free method that renders the text string with multiple randomly generated layouts and selects the best-performing layout based on the size of the encoded image is expected to perform equally poorly.

[0009] In some embodiments, spectral sparsity operations include one or more of the following operations: rotation of a graphic element, skewing of a graphic element, isotropic or anisotropic rescaling of a complete graphic element, isotropic or anisotropic rescaling of a portion of a graphic element, translation of a graphic element, font modification, font replacement, and contrast modification. The inventors have identified specific guidelines for use with these subtypes of spectral sparsity operations, which will be discussed in detail below.

[0010] Specifically, the spectral sparsity operation can consist of a rigid transformation of a graphic element, such as rotation, tilting, or translation.

[0011] In some embodiments, the method further includes obtaining one or more layout constraints, such as a specified font, maximum range, minimum range, and / or orientation.

[0012] In some embodiments, block-by-block transform coding may include projection onto an orthogonal basis of biperiodic functions, followed by a rounding operation toward zero.

[0013] In some embodiments, graphic elements include glyphs such as characters.

[0014] The second aspect of this disclosure relates to devices arranged to perform the methods of the first aspect. These devices may have different primary uses such as image or video editing, image or video content management, video captioning, image or video playback, and text processing, or may include other authoring tools. Furthermore, the devices may be designed for specific use cases with automatic annotation capabilities covering text strings, such as indoor or outdoor video surveillance. The devices within the second aspect of this disclosure generally share the effects and advantages of the first aspect, and they can be implemented with equivalent technical variations.

[0015] This disclosure further relates to a computer program containing instructions for causing a computer to perform the methods described above. The computer program may be stored or distributed on a data carrier. As used herein, a "data carrier" can be a transitory data carrier, such as modulated electromagnetic waves or light waves, or a non-transitory data carrier. Non-transitory data carriers include volatile and non-volatile memories, such as permanent and non-permanent storage media of the magnetic, optical, or solid-state types. Still within the scope of "data carrier," such a memory may be permanently mounted or portable.

[0016] Generally, unless expressly defined herein, all terms used in the claims shall be interpreted according to their ordinary meaning in the art. Unless otherwise expressly stated, all references to “a / an / the element, device, component, means, step, etc.” shall be openly interpreted as referring to at least one instance of the element, device, component, means, step, etc. Unless expressly indicated, the steps of any method disclosed herein need not be performed in the exact order disclosed.

Description of the Drawings

[0017] Now, by way of example, aspects and embodiments are described with reference to the accompanying drawings, in which:

[0018] Figure 1 is a flowchart of a method for rendering text strings in a digital image;

[0019] Figure 2 is a flowchart of a method for providing an encoded digital image with a rendered text string;

[0020] Figure 3 illustrates a frame used to specify certain layout constraints;

[0021] Figure 4 is a block diagram of a device suitable for executing the methods of Figure 1 and Figure 2;

[0022] Figure 5 and Figure 6 illustrate spectral sparsity operations that include rotation of graphic elements;

[0023] Figure 7 illustrates spectral sparsity operations that include tilting of graphic elements;

[0024] Figure 8, Figure 9 and Figure 10 illustrate spectral sparsity operations that include local rescaling and / or anisotropic rescaling of graphic elements;

[0025] Figure 11 and Figure 12 show coded blocks of a raster image;

[0026] Figure 13, Figure 14, Figure 15 and Figure 16 illustrate spectral sparsity operations that include translation of graphic elements;

[0027] Figure 17, Figure 18, Figure 19 and Figure 20 illustrate spectral sparsity operations that include various font modifications; and

[0028] Figure 21 is a plot of an orthogonal basis of biperiodic functions for an 8×8 pixel coded block.

Detailed Implementation

[0029] Aspects of this disclosure will now be described more fully below with reference to the accompanying drawings, which illustrate certain embodiments of the invention. However, these aspects may be implemented in many different forms and should not be construed as limiting; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and these embodiments will fully convey the scope of all aspects of the invention to those skilled in the art. Throughout the specification, the same reference numerals refer to the same elements.

[0030] With reference to Figure 1, a method 100 for rendering a text string in a digital raster image suitable for block-by-block transform encoding will now be described.

[0031] It is understood that a raster image comprises a matrix or grid of pixels, wherein the pixels are preferably square or rectangular. The pixels form an image region 310 (Figure 3). A raster image can be a pixel-buffer image in a block format. An image with graphic elements in vector format is not a raster image.

[0032] It is understood that a text string is an ordered sequence of characters selected from a predefined character table such as the Unicode table (ISO / IEC 10646). Characters can represent letters, syllables, ideograms, modifier letters, symbols, numbers, punctuation marks, mathematical symbols, currency symbols, separators, hyphens, etc.

[0033] Furthermore, it is understood that transform coding in general includes projecting the image data onto an orthogonal basis of biperiodic functions $\rho_{k_1,k_2}$:

[0034] $$c_{k_1,k_2} = \sum_{n_1}\sum_{n_2} x(n_1,n_2)\,\rho_{k_1,k_2}(n_1,n_2) \qquad (1)$$

[0035] where $c_{k_1,k_2}$ are the transform coefficients and $x(n_1,n_2)$ is the image data for pixel (n1, n2). Note that for the lowest (k1, k2) pairs, the restriction of $\rho_{k_1,k_2}$ to [0, N1] × [0, N2] can correspond to a single period or a constant value. Specifically, the basis can consist of real-valued biperiodic harmonic functions (e.g., DCT, DST, DFT, wavelet transform). The transform coefficients computed in the projection operation constitute a discrete representation of the spectrum of the image data. The encoded image may include the transform coefficients after lossless data compression (e.g., entropy, Huffman, Lempel-Ziv, run-length, binary or non-binary arithmetic coding, such as context-adaptive variable-length coding, CAVLC, or context-adaptive binary arithmetic coding, CABAC) and / or other processing steps. Specifically, the transform coefficients may undergo rounding toward zero, e.g., quantization. Rounding is important because, when the transform coefficients are fed into one of the aforementioned lossless data compression techniques, zero transform coefficients will generally not be encoded as zero-valued numbers (“0.0”) but can be omitted in such a way that they occupy much less space in the image bitstream than non-zero transform coefficients. The transform coding contemplated here is block-by-block in the sense that the processing of one coded block is independent of the processing of the other coded blocks.
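The projection and rounding described above can be illustrated with a minimal sketch. It assumes an 8×8 coded block, an unnormalised DCT-II basis as one possible choice of biperiodic basis, `n1` as the row index, and a hypothetical quantization step of 16; none of these choices is prescribed by the text, and the function names are illustrative only.

```python
import math

N = 8  # assumed block size (8x8 pixels)

def basis(k1, k2, n1, n2):
    """Unnormalised DCT-II basis function (an assumed choice of basis)."""
    return (math.cos(math.pi / N * (n1 + 0.5) * k1)
            * math.cos(math.pi / N * (n2 + 0.5) * k2))

def project(block):
    """Project an NxN block onto the basis: one coefficient per (k1, k2)."""
    return [[sum(block[n1][n2] * basis(k1, k2, n1, n2)
                 for n1 in range(N) for n2 in range(N))
             for k2 in range(N)]
            for k1 in range(N)]

def quantize(coeffs, step):
    """Round each coefficient toward zero in units of `step`."""
    return [[math.trunc(c / step) for c in row] for row in coeffs]

def count_nonzero(quantized):
    """Number of non-zero coefficients, a simple proxy for spectral sparsity."""
    return sum(1 for row in quantized for v in row if v != 0)

# A uniform block needs only the constant basis function.
flat = [[100] * N for _ in range(N)]
flat_nonzero = count_nonzero(quantize(project(flat), 16))

# Vertical stripes vary along n2 only, so every coefficient with k1 != 0 vanishes.
stripes = [[255 if n2 % 4 == 2 else 0 for n2 in range(N)] for n1 in range(N)]
stripes_q = quantize(project(stripes), 16)
```

Counting the non-zero entries after rounding is only a rough stand-in for the encoded size; a real encoder would also apply entropy coding to the surviving coefficients.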

[0036] Optionally, the transform coefficients of a coded block can be predicted and, specifically, encoded using intra-frame prediction (“intra-predictive”) coding. According to this well-known predictive coding technique, the transform coefficients of a coded block are represented differentially, as increments relative to one or more earlier or later coded blocks. This results in efficient data compression, especially if the image depicts a natural scene with high spatial autocorrelation. The inventors have recognized that data compression is generally more significant if the spectrum of the coded blocks is sparse, i.e., if they contain a majority of zero transform coefficients.

[0037] In the first step 110 of method 100, a segmentation 320 of image region 310 is obtained. Segmentation 320 can be obtained based on instructions from the end user, as a predefined segmentation (e.g., according to a prior protocol or standard specification), or it can be generated by the entity performing method 100. Segmentation 320 defines a set of coded blocks such that each point of image region 310 belongs to a coded block. Equivalently, the union of the coded blocks equals image region 310. In Figure 3, segmentation 320 is represented by dashed lines delineating the coded blocks. The same dashed-line notation is used in Figure 5, Figure 6, Figure 10 and Figures 13 to 19 to illustrate the boundaries of coded blocks. When the image is a frame of a video sequence, the coded blocks can be macroblocks in the sense of the ITU-T H.26x video coding standards; they can be transform blocks or prediction blocks, or blocks that serve both purposes. Note that Figure 3 is a simplified diagram for illustrative purposes only. In typical practice of this invention, finer divisions are used. For example, macroblocks in a video frame can be 4×4 pixels, 8×8 pixels, 16×16 pixels, 32×32 pixels, or 64×64 pixels.
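As a rough illustration of step 110, the sketch below generates a regular grid segmentation and checks the covering property stated above. The rectangle representation `(x0, y0, x1, y1)` and the function names are assumptions made for this example; a real encoder may use non-uniform block sizes.

```python
def partition(width, height, block):
    """Return coded blocks as (x0, y0, x1, y1) rectangles, x1 and y1 exclusive."""
    return [(x, y, min(x + block, width), min(y + block, height))
            for y in range(0, height, block)
            for x in range(0, width, block)]

def covers_region(blocks, width, height):
    """Check that every pixel of the image region belongs to a coded block."""
    hits = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in blocks:
        for y in range(y0, y1):
            for x in range(x0, x1):
                hits[y][x] += 1
    return all(h >= 1 for row in hits for h in row)

# Hypothetical 32x16 pixel image region split into 8x8 coded blocks.
blocks = partition(32, 16, 8)
```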

[0038] Figure 21 shows, in the plane (n1, n2), the 64 plots of a basis of real-valued biperiodic functions restricted to discrete cosine functions in the 8×8 pixel case:

[0039] $$\rho_{k_1,k_2}(n_1,n_2) = \cos\left[\frac{\pi}{8}\left(n_1+\frac{1}{2}\right)k_1\right]\cos\left[\frac{\pi}{8}\left(n_2+\frac{1}{2}\right)k_2\right], \qquad 0 \le k_1, k_2 \le 7.$$

[0040] The top row of Figure 21 contains the plots of

[0041] $\rho_{0,0}(n_1,n_2),\ \rho_{1,0}(n_1,n_2),\ \rho_{2,0}(n_1,n_2),\ \ldots,\ \rho_{7,0}(n_1,n_2)$,

[0042] and the plot of $\rho_{7,7}(n_1,n_2)$ is found in the bottom right corner. In Figure 21, white represents the highest value and black represents the lowest value of the basis functions. Coded blocks corresponding to a single basis function or to a linear combination of a few basis functions will be encoded at a relatively low cost (i.e., will have a sparser spectrum), whereas coded blocks with a complex appearance and / or high information content will have to be formed from a large number of basis functions. From an inspection of Figure 21, it can be seen that coded blocks with equidistant horizontal or vertical lines are likely to be encoded at low cost, for example using basis functions with k1 = 0 or k2 = 0. Coded blocks consisting primarily of diagonal elements can typically also be encoded at low cost, for example using basis functions with k1 = k2. In accordance with the Nyquist-Shannon sampling theorem,

[0043] $$\{\rho_{k_1,k_2} : 0 \le k_1, k_2 \le 7\}$$

[0044] is a complete basis for 8×8 pixel coded blocks.
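The completeness statement can be checked numerically. The sketch below assumes that the basis in question is the unnormalised DCT-II cosine basis for 8×8 blocks (the usual reading of "discrete cosine functions"; the exact normalisation is an assumption) and verifies that its 64 functions are non-zero and pairwise orthogonal. Since an 8×8 block has 64 degrees of freedom, 64 mutually orthogonal non-zero functions span every possible block.

```python
import math

N = 8  # 8x8 pixel coded block

def rho(k1, k2, n1, n2):
    """Assumed discrete cosine basis function (unnormalised DCT-II)."""
    return (math.cos(math.pi / N * (n1 + 0.5) * k1)
            * math.cos(math.pi / N * (n2 + 0.5) * k2))

def inner(a, b):
    """Inner product of basis functions a=(k1, k2) and b=(k1, k2) over the block."""
    return sum(rho(a[0], a[1], n1, n2) * rho(b[0], b[1], n1, n2)
               for n1 in range(N) for n2 in range(N))

def is_orthogonal_basis(tol=1e-9):
    """True if all 64 functions are non-zero and pairwise orthogonal."""
    idx = [(k1, k2) for k1 in range(N) for k2 in range(N)]
    for i, a in enumerate(idx):
        if inner(a, a) < tol:
            return False
        for b in idx[i + 1:]:
            if abs(inner(a, b)) > tol:
                return False
    return True
```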

[0045] The text string to be rendered constitutes the input data to method 100. That is, from the perspective of the entity executing the method, the text string is predefined: it can be received from the end user or created automatically by a software application, and the entity should not normally modify it. Method 100 may optionally accept one or more layout constraints as input. The layout constraints, obtained in an optional second step 112, may specify a maximum range 332 of the rendered text string (Figure 3), a minimum range 334 of the rendered text string and / or an orientation α of the rendered text string. Layout constraints can further specify one or more acceptable fonts. The text string will be rendered in compliance with these layout constraints, which can sometimes limit the effectiveness of the spectral sparsity operations to be performed.

[0046] In the third step 114, the text string is represented as multiple graphic elements from the font, arranged according to a tentative layout in image region 310. The tentative layout defines at least the position, orientation, and size of each graphic element. Graphic elements may include glyphs, and specifically glyphs representing characters. It is emphasized that graphic elements in this sense are not abstract characters but concrete representations of characters in the font (i.e., having concrete shapes). Furthermore, the characters in the text string and the graphic elements do not necessarily have a one-to-one relationship; the relationship can also be one-to-many or many-to-one. Graphic elements can be represented as vector graphics, such as scalable lines, curves, or polygons.
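One possible in-memory representation of such a layout is sketched below. The class and field names are hypothetical, and the sketch makes the simplifying assumption of one graphic element per character placed left to right, which the text explicitly does not require.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class GraphicElement:
    glyph: str        # hypothetical glyph identifier within the font
    x: float          # position of the element in the image region
    y: float
    angle_deg: float  # orientation
    size: float       # nominal size in pixels

@dataclass(frozen=True)
class Layout:
    font: str
    elements: Tuple[GraphicElement, ...]

def tentative_layout(text: str, font: str, size: float = 8.0) -> Layout:
    """Naive tentative layout: one upright glyph per character, evenly spaced."""
    elements = tuple(
        GraphicElement(glyph=ch, x=i * size, y=0.0, angle_deg=0.0, size=size)
        for i, ch in enumerate(text)
    )
    return Layout(font=font, elements=elements)

# "ExampleSans" is a placeholder font name, not a font named in the text.
layout = tentative_layout("A/", "ExampleSans")
```

Keeping the layout immutable means that a spectral sparsity operation returns a new, modified layout and leaves the tentative layout, and the text string it encodes, untouched.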

[0047] In the next step 116, the tentative layout is modified by applying a spectral sparsity operation. The spectral sparsity operation can be applied to a single coding block at a time, to a group of coding blocks at a time (e.g., consecutive coding blocks), or to the entire image region 310. The spectral sparsity operation is preferably limited to non-empty coding blocks, i.e., coding blocks containing at least one graphic element or a portion of a graphic element. As will be described in detail below, the spectral sparsity operation can include rotation of graphic elements (step 116.1), tilting of graphic elements (step 116.1), isotropic or anisotropic rescaling of complete graphic elements (step 116.2), isotropic or anisotropic rescaling of a portion of a graphic element (step 116.2), translation of graphic elements (step 116.3), font modification (step 116.4), font replacement or contrast modification (step 116.4), or various combinations thereof.
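Restricting the operation to non-empty coding blocks could be sketched as follows. The sketch approximates "contains at least a portion of a graphic element" by an overlap test between block rectangles and element bounding boxes, which is an assumption; a glyph's bounding box may overlap a block that its ink does not actually touch.

```python
def overlaps(block, box):
    """True if two (x0, y0, x1, y1) rectangles share at least one pixel."""
    return (block[0] < box[2] and box[0] < block[2]
            and block[1] < box[3] and box[1] < block[3])

def non_empty_blocks(blocks, element_boxes):
    """Coding blocks that intersect the bounding box of some graphic element."""
    return [b for b in blocks if any(overlaps(b, e) for e in element_boxes)]

# Four hypothetical 8x8 coding blocks and one element spanning the top two.
blocks = [(0, 0, 8, 8), (8, 0, 16, 8), (0, 8, 8, 16), (8, 8, 16, 16)]
selected = non_empty_blocks(blocks, [(2, 2, 10, 6)])
```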

[0048] The output of step 116 is a modified layout that forms the basis for step 118, in which a digital raster image of the graphic elements is rendered according to the modified layout. Rendering may include rasterization, i.e., converting vector graphics into a matrix of pixels. Rasterization may include performing line drawing or curve drawing algorithms. Alternatively, if the font represents the graphic elements as bitmaps, the rendering in step 118 may include combining such bitmaps into an output digital raster image using the size and position of the bitmaps according to the modified layout.
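For the bitmap-font case, the rendering step can be sketched as pasting glyph bitmaps into an output raster at the positions given by the modified layout. The sketch assumes binary bitmaps that are already scaled to the size required by the layout, so no resampling is shown, and all names are illustrative.

```python
def render_bitmaps(width, height, placed):
    """Compose glyph bitmaps into a binary raster image.

    `placed` is an iterable of (bitmap, x0, y0), where `bitmap` is a list of
    rows of 0/1 values and (x0, y0) is its top-left corner in the image.
    """
    image = [[0] * width for _ in range(height)]
    for bitmap, x0, y0 in placed:
        for dy, row in enumerate(bitmap):
            for dx, value in enumerate(row):
                x, y = x0 + dx, y0 + dy
                if value and 0 <= x < width and 0 <= y < height:
                    image[y][x] = 1
    return image

# A hypothetical 2x3 pixel glyph bitmap placed at (1, 0) in a 4x3 image.
glyph = [[1, 0], [1, 0], [1, 1]]
image = render_bitmaps(4, 3, [(glyph, 1, 0)])
```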

[0049] Rendering can be performed as a single operation across the entire image area or individually on each encoded block.

[0050] Step 118 may further include a preprocessing step prior to actual rendering. In the preprocessing step, groups of graphic elements arranged according to a modified layout are combined, such as by deforming (extending) parts of the graphic elements toward each other and / or by adding ligatures or connectors. In some non-Latin scripts, including Arabic, the appearance of the resulting connections may be mandatory, and it can be used as an option in Latin scripts for writing similar to cursive script.

[0051] As shown in Figure 2, the text rendering method 100 just described can be embedded in a method 200 for providing an encoded digital image of a text string. Such a method 200 may include rendering a digital raster image of the text string by performing method 100. Subsequently, a block-by-block transform coding operation 210 is applied to each block of the segmentation 320 of the image region. The transform coding may include a projection 210.1 onto an orthogonal basis of biperiodic functions (see Equation 1 above), followed by a rounding operation 210.2 toward zero. The output of method 200 is an encoded digital image that can be represented as a set of transform coefficient values with a predefined encoding.

[0052] Embodiments of method 200 can be particularly suitable for text overlay applications. In such embodiments, a background image is obtained (e.g., received from an end user, recorded by a camera, etc.) and combined with the digital raster image rendered in step 118. For example, the pixels representing rendered graphic elements can replace the corresponding pixels of the background image; the complement of these pixels is treated as transparent, i.e., the background image is unaffected there. The block-by-block transform encoding operation 210 is applied to the combined image. Optionally, pixels representing rendered graphic elements can be overlaid with configurable transparency, making the background image partially visible through the text. It is understood that the background image and the digital raster image may have to be adapted to each other, for example through rescaling, cropping, or expanding operations, before they are combined.
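The overlay described above can be sketched for greyscale images as follows. The parameter `alpha` is a hypothetical opacity (1.0 replaces the background pixel, lower values let the background show through), and the text mask is assumed to have already been adapted to the size of the background image.

```python
def overlay(background, text_mask, text_value=255, alpha=1.0):
    """Combine a greyscale background with rendered text.

    `text_mask` holds 1 where a rendered graphic element covers the pixel and
    0 elsewhere; pixels outside the mask keep their background value.
    """
    combined = []
    for bg_row, mask_row in zip(background, text_mask):
        combined.append([
            round(alpha * text_value + (1.0 - alpha) * bg) if covered else bg
            for bg, covered in zip(bg_row, mask_row)
        ])
    return combined

background = [[10, 20], [30, 40]]
mask = [[1, 0], [0, 1]]
opaque = overlay(background, mask)                 # text replaces background
blended = overlay([[100]], [[1]], text_value=200, alpha=0.5)
```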

[0053] Alternatively, the text string is represented on top of the background image according to the tentative layout (step 114), and the spectral sparsity operation (step 116) is applied to the combination of the background image and the text string. This allows the spectral sparsity operation to exploit synergy with the background image, for example by making the arranged graphic elements similar to the background image in some coding blocks, resulting in a sparser spectrum and therefore a lower encoding cost. For example, a successful spectral sparsity operation may identify a layout modification in a coding block such that the spectrum of the overlaid graphic elements has zero-valued transform coefficients in approximately the same positions as the spectrum of the background image in the same coding block.

[0054] Methods 100 and 200 can be executed by a general-purpose computer. In particular, they can be executed by a device 400 having the basic functional structure shown in Figure 4. As illustrated, device 400 includes processing circuitry 414, memory 410, and external interface 418. An internal data bus 416 facilitates communication between these components. Memory 410 may be adapted to store a computer program 412 having instructions for implementing any of methods 100, 200. External interface 418 may be a communication interface that allows device 400 to communicate, via a wide area network or local area network 420, with an analogous device (not shown) operated by a consumer or video content author (e.g., a recording device). Furthermore, device 400 may communicate with a host computer 430, such as a server for storing raw or encoded image data, fonts, etc., or with networked (“cloud”) processing resources that device 400 can use as needed. It is understood that computationally demanding operations within methods 100, 200 can be offloaded from device 400 to host computer 430.

[0055] Figure 5 and Figure 6 illustrate spectral sparsity operations that include rotation of graphic elements.

[0056] The rotation can be defined in such a way as to reduce the number of unique line directions in the coded block. In the left half of Figure 5, the symbol " / " (forward slash) is approximately parallel to the left stroke of the letter "A," so these graphic features can be expected to be encoded using the same or partially the same basis functions. In the right half of Figure 5, however, the symbol " / " and the left stroke of the letter "A" have different directions, which will require a larger number of transform coefficients with non-zero values. Therefore, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout resembling the right half of Figure 5 to be more similar to the left half of Figure 5.
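The notion of "unique line directions" can be made concrete with a small sketch. It assumes that each stroke in a coded block has been summarised by a direction angle in degrees, treats directions as undirected (modulo 180°), and merges directions within a hypothetical tolerance; the stroke angles in the example are invented for illustration and are not taken from Figure 5.

```python
def unique_directions(angles_deg, tol=1.0):
    """Count distinct undirected line directions, merging those within `tol`."""
    representatives = []
    for angle in angles_deg:
        a = angle % 180.0
        is_new = True
        for r in representatives:
            diff = abs(a - r)
            if min(diff, 180.0 - diff) <= tol:
                is_new = False
                break
        if is_new:
            representatives.append(a)
    return len(representatives)

# Hypothetical stroke directions: slash, left and right strokes of "A", bar.
aligned = unique_directions([65.0, 65.0, 115.0, 0.0])     # slash parallel to "A"
misaligned = unique_directions([75.0, 65.0, 115.0, 0.0])  # slash at its own angle
```

Under this measure, a rotation of the slash that brings it parallel to the left stroke of the "A" removes one direction from the coded block.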

[0057] The rotation can also be defined in such a way that line directions are aligned with the vertical or horizontal axis, or even with the diagonal directions between these axes. The vertical and horizontal axes correspond to the axes of the pixel matrix of the digital raster image to be rendered, which are also the axes in which the basis functions are parameterized (variables n1, n2). In the right half of Figure 6, the letter "L" is not aligned with the axes of the coded block. Here, the axes correspond to the boundaries of the coded block containing the letter "L," drawn with dashed lines. The misaligned letter "L" will be costly to encode because it cannot be represented as a combination of a small number of basis functions. From the perspective of block-by-block transform coding, the orientation in the left half of Figure 6 is likely more advantageous. For this reason, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout resembling the right half of Figure 6 to be more similar to the left half of Figure 6.
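A simple way to pick such a rotation is to snap a stroke direction to the nearest multiple of 45°, which covers the horizontal, vertical, and diagonal directions. The sketch below returns the signed rotation to apply; the function name and the 45° step are assumptions for illustration, and any layout constraint on orientation would still have to be checked separately.

```python
def snap_rotation(angle_deg, step=45.0):
    """Signed rotation (degrees) that aligns a line direction with the nearest
    multiple of `step`; directions are undirected, so angles are taken mod 180."""
    a = angle_deg % 180.0
    target = round(a / step) * step
    return target - a

rotation_for_l = snap_rotation(80.0)   # nearly vertical stem of a tilted "L"
```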

[0058] Figure 7 illustrates a spectral sparsity operation that includes tilting of graphic elements. In some fonts, a rightward tilt can correspond to gradually turning letters into italics. Conversely, tilting can be used to remove the italic slant from one or more letters so that their strokes (e.g., bars, shoulders, bowls, stems) end up better aligned with the vertical or horizontal axis, or with the diagonal directions. Tilting can help reduce the number of unique line directions in a coded block, help align line directions with the vertical or horizontal (or diagonal) axis, and / or help reduce the number of unique vertical or horizontal distances in a coded block. A “distance” in a coded block can be the thickness of a stroke or the length of a gap (space, interval). Furthermore, it is important to note that because the basis functions are periodic, transform coding will effectively sample the periodic extension of the coded block, and the spectral sparsity operation should therefore be designed to minimize the number of unique distances in the periodic extension of the coded block; this will be explained below with reference to Figure 12.
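Tilting can be modelled as a horizontal shear of the outline points of a graphic element. The sketch below assumes outline points given as `(x, y)` pairs with `y` measured from the baseline and the slant angle measured from the vertical; it shows how shearing by the slant angle makes an italic stem upright. The 12° slant is an invented example value.

```python
import math

def shear_x(points, slant_deg):
    """Horizontal shear that removes a rightward slant of `slant_deg` degrees."""
    t = math.tan(math.radians(slant_deg))
    return [(x - t * y, y) for x, y in points]

# A hypothetical italic stem starting at x = 2 and slanted 12 degrees rightward.
stem = [(2.0 + y * math.tan(math.radians(12.0)), float(y)) for y in range(8)]
upright = shear_x(stem, 12.0)
```

After the shear, every point of the stem has the same x coordinate, so the stem is aligned with the vertical axis of the pixel matrix.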

[0059] Figure 8 and Figure 9 illustrate spectral sparsity operations that include local rescaling and / or anisotropic rescaling of graphic elements.

[0060] Figure 8 illustrates how the x-height of the lowercase letter "h" can be varied through local rescaling in the vertical direction, so as to limit the number of unique vertical distances. (No horizontal rescaling is applied in Figure 8.) Assuming that the letter "h" occupies the entire coding block vertically, the middle shape in Figure 8 has a single vertical distance, since its x-height is approximately half the height of a capital letter. The left and right shapes in Figure 8 very likely include at least two unique vertical distances (i.e., the x-height and the ascender height), and they will therefore require a significantly larger number of non-zero transform coefficients. Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies the layout to more closely resemble the middle shape in Figure 8.

[0061] Figure 9 The illustration shows the effect of global rescaling in the horizontal direction only, as an example of anisotropic rescaling. Such rescaling can affect the number of unique distances within a coded block. Anisotropic rescaling also includes combinations of horizontal and vertical rescaling operations with different factors.

[0062] Figure 10 shows an example of how rescaling operations can be used for spectral sparsity purposes. The left half of Figure 10 corresponds to a tentative layout in which the lowercase letter "o" and the digit "0" (zero) occupy a common coding block. The letter "o" is slightly lower and narrower than the digit "0". In the right half of Figure 10, which corresponds to the output of a spectral sparsity operation, the graphic element representing the digit "0" has been isotropically scaled down to a height equal to that of the letter "o". As part of the spectral sparsity operation, it is further ensured that the horizontal spacing between the letter "o" and the digit "0" is approximately equal to the width of the letter "o"; this can be achieved by translating either the letter "o" or the digit "0" horizontally. As a result, the three horizontal distances indicated by the arrows are approximately equal, and the two vertical distances are also approximately equal. Based on these considerations, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 10 to be more similar to the right half of Figure 10. Note that because the transform coding is block-by-block, the neighboring letter "N" and digit "5" in the adjacent coding blocks need not be considered.
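The arithmetic behind the Figure 10 example can be sketched as follows. The bounding-box representation, the class and function names, and the numeric dimensions are hypothetical; the sketch only assumes that a reference element (the "o") stays fixed while the other element (the "0") is scaled isotropically to the reference height and then translated so that the gap equals the reference width.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width
    h: float  # height

def match_height_and_gap(ref, other):
    """Scale `other` isotropically to ref's height and place it one ref-width right of ref."""
    scale = ref.h / other.h                  # same factor in both directions
    new_x = ref.x + ref.w + ref.w            # right edge of ref plus a gap equal to ref.w
    return Box(x=new_x, y=other.y, w=other.w * scale, h=other.h * scale)

# Hypothetical dimensions: the "0" is slightly taller and wider than the "o".
letter_o = Box(x=0.0, y=0.0, w=4.0, h=5.0)
digit_0 = Box(x=5.0, y=0.0, w=5.0, h=6.25)
scaled_0 = match_height_and_gap(letter_o, digit_0)
gap = scaled_0.x - (letter_o.x + letter_o.w)
```

With these made-up numbers the width of the "o", the gap, and the width of the scaled "0" all come out equal, which mirrors the three approximately equal horizontal distances described for the right half of Figure 10.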

[0063] To illustrate the impact of spectral sparsity operations at the pixel level, Figures 11 and 12 show 8×8-pixel coding blocks of a raster image. In the left half of Figure 11, the lines have a uniform horizontal thickness of 1 dark pixel and are horizontally separated by 3 bright pixels. For the reasons explained above, the horizontal periodic extension of the coding block is considered, in which the outer bright pixels combine into a total interval of 2 + 1 = 3 bright pixels. Conceptually, the two vertical boundaries of the coding block can be thought of as "glued together", and likewise the two horizontal boundaries. In contrast, in the right half of Figure 11, two bright pixels, one dark pixel, three bright pixels, and two dark pixels can be identified horizontally. Because the number of unique horizontal distances is larger in the right half of Figure 11, that block will be more costly to encode. Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the right half of Figure 11 to be more similar to the left half of Figure 11.
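The counting of unique distances in the periodic extension can be illustrated with a short sketch. The two pixel rows below are reconstructed from the verbal description of Figure 11 (1 denotes a dark pixel, 0 a bright pixel); the function names and the cyclic run-length approach are assumptions made for illustration only.

```python
def periodic_run_lengths(row):
    """Return (value, length) runs of a pixel row whose two ends are glued together."""
    n = len(row)
    if all(p == row[0] for p in row):
        return [(row[0], n)]
    # Start at a position where the pixel value changes, so no run wraps around.
    start = next(i for i in range(n) if row[i] != row[i - 1])
    runs = []
    for k in range(n):
        p = row[(start + k) % n]
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)
        else:
            runs.append((p, 1))
    return runs

def unique_distances(row):
    """Set of distinct stroke thicknesses and gap lengths in the periodic extension."""
    return {length for _, length in periodic_run_lengths(row)}

left_row = [0, 0, 1, 0, 0, 0, 1, 0]    # 1 dark pixel separated by 3 bright pixels
right_row = [0, 0, 1, 0, 0, 0, 1, 1]   # 2 bright, 1 dark, 3 bright, 2 dark

left_distances = unique_distances(left_row)     # {1, 3}
right_distances = unique_distances(right_row)   # {1, 2, 3}
```

In the left row the outer bright pixels merge into one gap of 2 + 1 = 3 pixels, so only two unique distances remain, whereas the right row has three.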

[0064] In the first coding block of Figure 12, the upper part contains constant image data, and the lower part contains a graphic element similar to a forward slash, with horizontal distances of 1 and 7. (Recall that it is the distances in the periodic extension that matter.) In the second (middle) coding block of Figure 12, the upper part contains constant image data, and the lower part contains two copies of the graphic element, with horizontal distances of 1 and 3. This indicates that the costs of encoding the first and second coding blocks of Figure 12 are approximately equal. Accordingly, grouping two copies of the graphic element into the same coding block, rather than using a layout where they are located in two different coding blocks, will improve coding efficiency. It is furthermore advantageous to consider the third coding block of Figure 12, which has horizontal distances of 1 and 7 across its entire vertical range. The third coding block can be expected to have even more zero-valued transform coefficients than the first and second coding blocks of Figure 12. The third coding block can be obtained from the second coding block by shifting the graphic element on the right upwards until it aligns with the graphic element on the left.

[0065] In view of these considerations, some embodiments of method 100 include performing a spectral sparsity operation that involves translating one or more graphic elements such that geometrically similar graphic elements are grouped into one coding block and / or geometrically dissimilar graphic elements are separated into different coding blocks. As a result, the number of unique line directions and / or unique vertical or horizontal distances in the coding block is likely to be reduced.

[0066] Figures 13, 14, 15 and 16 illustrate spectral sparsity operations that include translation of graphic elements. As mentioned above, the purpose of such translation can be to cluster geometrically similar graphic elements and/or separate geometrically dissimilar graphic elements per coding block. These expressions can be understood in their usual sense. Alternatively, in some embodiments, the identification of “geometrically similar” and “geometrically dissimilar” graphic elements can be systematized as the identification of spectrally similar and spectrally dissimilar graphic elements. This is achieved by caching the spectra (transform coefficients) obtained from earlier block-by-block transform coding of various graphic elements. The spectra can be cached in full or in a simplified format; a useful simplified format specifies which transform coefficients in each spectrum are non-zero. For example, two graphic elements can be considered spectrally similar if they have the same or nearly the same set of non-zero transform coefficients, and spectrally dissimilar otherwise. Spectrally similar graphic elements should be clustered, and spectrally dissimilar graphic elements should be separated into different coding blocks.
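One possible reading of the simplified cache format is sketched below: each cached spectrum is reduced to the set of index pairs of its non-zero coefficients, and two graphic elements are called spectrally similar when these sets overlap strongly. The overlap measure (Jaccard index), the 0.8 threshold, the tolerance, and the tiny 2×2 "spectra" are all assumptions chosen for illustration; the disclosure only requires "the same or nearly the same" set of non-zero coefficients.

```python
def support(spectrum, tol=1e-9):
    """Simplified cache entry: index pairs (k1, k2) of the non-zero coefficients."""
    return frozenset(
        (k1, k2)
        for k1, row in enumerate(spectrum)
        for k2, coeff in enumerate(row)
        if abs(coeff) > tol
    )

def spectrally_similar(a, b, threshold=0.8):
    """Compare two supports by their Jaccard index (hypothetical criterion)."""
    if not a and not b:
        return True
    return len(a & b) / len(a | b) >= threshold

# Made-up 2x2 spectra standing in for cached transform coefficients.
cache = {
    "l": support([[4.0, 0.0], [1.5, 0.0]]),
    "I": support([[3.0, 0.0], [2.0, 0.0]]),
    "-": support([[4.0, 1.5], [0.0, 0.0]]),
}

cluster_l_with_I = spectrally_similar(cache["l"], cache["I"])     # same support
cluster_l_with_bar = spectrally_similar(cache["l"], cache["-"])   # different support
```

Storing only the supports keeps the cache small, at the price of ignoring coefficient magnitudes.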

[0067] To avoid overly short-sighted (non-global) optimization of the placement of graphic elements, such spectral sparsity operations involving translation of graphic elements are preferably applied to a search window of multiple adjacent coding blocks at once. Within the search window, multiple possible redistributions of the graphic elements to different coding blocks are evaluated, and a favorable one is selected. Among the possible redistributions, a more suitable one may satisfy one or more of the following criteria: a smaller total number of non-zero transform coefficients; a smaller percentage of high-frequency coefficients (large values of k1 and k2); and smaller variation in the composition of the set of non-zero transform coefficients between consecutive coding blocks (relative to the intra-frame prediction scan order). Under such a redistribution, the characters of the text string are not permuted; they retain their order. To limit computational complexity, however, the window should not be too wide, as it might be if an entire long text string were processed in a single spectral sparsity operation. For example, the search window can include 2 to 10 coding blocks in the writing direction, such as 3, 4, or 5 coding blocks. Furthermore, a sliding search window can be used.
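A minimal sketch of such a window search is given below, under several stated assumptions: the redistribution is modeled as an order-preserving split of the graphic elements over the blocks of one window, each block holds at most a fixed number of elements, and the number of unique line directions per block is used as a stand-in for the number of non-zero transform coefficients. The function names, the `DIRECTIONS` table, and the capacity limit are hypothetical. The example data loosely follows the "F" and backslash arrangement of Figure 13.

```python
def ordered_splits(n_items, n_blocks, capacity):
    """Yield tuples of block sizes summing to n_items, each at most capacity."""
    if n_blocks == 1:
        if n_items <= capacity:
            yield (n_items,)
        return
    for size in range(min(capacity, n_items) + 1):
        for rest in ordered_splits(n_items - size, n_blocks - 1, capacity):
            yield (size,) + rest

def best_redistribution(elements, n_blocks, capacity, block_cost):
    """Exhaustively evaluate order-preserving redistributions within one window."""
    best = None
    for sizes in ordered_splits(len(elements), n_blocks, capacity):
        blocks, i = [], 0
        for size in sizes:
            blocks.append(elements[i:i + size])
            i += size
        cost = sum(block_cost(b) for b in blocks)
        if best is None or cost < best[0]:
            best = (cost, blocks)
    return best

# Hypothetical stroke directions: h(orizontal), v(ertical), d(iagonal).
DIRECTIONS = {"F": {"h", "v"}, "\\": {"d"}}

def block_cost(block):
    """Proxy cost: number of unique line directions in one coding block."""
    return len(set().union(*(DIRECTIONS[e] for e in block)))

cost, layout = best_redistribution(["F", "\\", "\\", "\\"], n_blocks=2,
                                   capacity=3, block_cost=block_cost)
```

Because the search is exhaustive, its cost grows quickly with the window width, which is consistent with keeping the window to a few coding blocks. A sliding window would repeat the search with the window advanced along the writing direction.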

[0068] Figure 13 illustrates how translation is used to cluster geometrically similar graphic elements per coding block. In the left half of Figure 13, the left coding block is shared by the letter "F" and a fragment of the character "\" (backslash), while the right coding block contains a partial copy and two complete copies of the character "\". This is potentially suboptimal for encoding purposes, firstly because the letter "F" and the fragment of the character "\" are geometrically dissimilar. It is furthermore suboptimal because one character "\" occupies two coding blocks. In the periodic extension of a coding block, with its two vertical boundaries imagined as "glued together", this corresponds to a major discontinuity that is costly to represent in periodic basis functions. In the right half of Figure 13, the three copies of the character "\" have been shifted to the right so that all three copies are contained in the right coding block and the letter "F" is alone in the left coding block. Because there are only two unique line directions in the left coding block and only a single unique line direction in the right coding block, this shift is expected to reduce the total number of non-zero transform coefficients generated. The shift has also reduced the number of coding blocks occupied by the leftmost character "\". In view of this, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 13 to be more similar to the right half of Figure 13.

[0069] Figure 14 illustrates how translation is used to separate geometrically dissimilar graphic elements per coding block. In the left half of Figure 14, the graphic element representing the letter "P" is contained within a single coding block. Because the bowl and the stem of the letter are geometrically dissimilar, the letter has to be represented using a relatively large number of basis functions (i.e., at a relatively high cost). As shown in the right half of Figure 14, this can be improved, for example, by splitting the letter into a "D"-like part and an "I"-like part and translating these parts into two different coding blocks. (The spacing between the parts has been exaggerated in the figure for visual clarity.) This illustrates that the characters of the text string and the graphic elements can have a one-to-many relationship. Alternatively, the letter "P" can be split into an "I"-like part and a horizontally flipped "C"-like part, to be placed in two different coding blocks. In view of this, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 14 to be more similar to the right half of Figure 14.

[0070] Figure 15 illustrates how translation can be used to rearrange graphic elements within a coding block, thereby reducing the number of unique vertical distances within the coding block. In the left half of Figure 15, two graphic elements in the form of diacritic dots are located close to a third graphic element representing the lowercase letter "o". The coding block therefore has multiple vertical distances. In contrast, in the right half of Figure 15, the diacritic dots have been shifted upwards to a position above the "o" at a distance approximately equal to the height of the "o" itself. As a result, the vertical distances corresponding to the thickness of the dots and of the "o" are approximately equal. Furthermore, the interval corresponding to the height of the "o" (x-height, lower vertical double arrow), the vertical interval from the diacritic dots to the top of the "o" (upper vertical double arrow), and the sum of the free intervals above and below the complete letter form a triplet of approximately equal intervals. (Recall that it is the distances in the periodic extension of the coding block that matter; for vertical distances, the vertical periodic extension should be considered.) Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 15 to be more similar to the right half of Figure 15.

[0071] Figure 16 illustrates how translation can be used to rearrange graphic elements within a coding block, thereby reducing both the number of unique vertical distances and the number of unique horizontal distances within the coding block. The left half of Figure 16 is the same as the left half of Figure 15. In the right half of Figure 16, the diacritic dots have not only been shifted upwards as in Figure 15, but have also been spaced further apart horizontally, so that their horizontal spacing is approximately the same as the width of the "o". In the right half of Figure 16, not only the number of unique vertical distances but also the number of unique horizontal distances has been minimized. This can be beneficial to some extent in view of subsequent predictive coding. If intra-frame predictive coding is applied, the translation can bring even more significant advantages. Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 16 to be more similar to the right half of Figure 16.

[0072] Starting from the right half of Figure 16, even further spectral sparsity can be obtained by converting the diacritic dots into a diacritic bar, which some readers may accept as a variant character. To illustrate this mathematically, consider the following pixel representation of the right half of Figure 16:

[0073]

[0074] The DCT spectrum of such a coded block has the following appearance, where * denotes non-zero transform coefficients:

[0075]

[0076] If the diacritic dots are replaced with a diacritic bar, as follows:

[0077]

[0078] Then the DCT spectrum becomes

[0079]

[0080] What we see is that the number of non-zero transform coefficients has been reduced by three, allowing the coded block to be represented digitally in a more compact way.
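The kind of comparison made in paragraphs [0072] to [0080] can be sketched with a plain two-dimensional DCT-II and a count of the coefficients that exceed a small tolerance. The pixel matrices of those paragraphs are not available in this text, so the two 8×8 blocks below are hypothetical stand-ins (a constant block and a block of vertical stripes), and the orthonormal scaling, the tolerance, and the convention that k1 indexes the vertical frequency are assumptions.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for k1 in range(n):          # vertical frequency index (assumed convention)
        for k2 in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                cy = math.cos(math.pi * (2 * y + 1) * k1 / (2 * n))
                for x in range(n):
                    cx = math.cos(math.pi * (2 * x + 1) * k2 / (2 * n))
                    s += block[y][x] * cy * cx
            out[k1][k2] = alpha(k1) * alpha(k2) * s
    return out

def nonzero_indices(spectrum, tol=1e-9):
    """Index pairs (k1, k2) of coefficients whose magnitude exceeds tol."""
    return {(k1, k2)
            for k1, row in enumerate(spectrum)
            for k2, c in enumerate(row)
            if abs(c) > tol}

flat = [[1.0] * 8 for _ in range(8)]                     # constant image data
stripes = [[0, 0, 1, 0, 0, 0, 1, 0] for _ in range(8)]   # vertical lines only

flat_support = nonzero_indices(dct2(flat))
stripe_support = nonzero_indices(dct2(stripes))
```

The constant block yields only the DC coefficient, and the striped block, being constant in the vertical direction, yields non-zero coefficients only for k1 = 0. Applying `nonzero_indices` to two candidate layouts of the same coding block gives the coefficient counts that the spectral sparsity operation seeks to reduce.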

[0081] As mentioned above, step 118 may include a preprocessing step of joining one or more graphic elements. Joining is potentially useful where it can improve the visual appearance of a layout that has been modified by a spectral sparsity operation involving translation. For example, joining can make uneven spacing between letters appear more uniform and therefore less noticeable. Another use is to satisfy a layout constraint specifying a minimum extent 334 of the text string within the image region 310 (Figure 3). For example, the letters of a text string representing a word can be distributed across the minimum extent 334 and still be perceived as one word owing to the ligatures. Joining letters usually adds negligible coding effort and may even be beneficial. In the special case of Arabic script, such joining can be achieved through so-called kashida, that is, by elongating parts of graphic elements or inserting glyphs (tatwil) that act as ligatures. For further details, see the research paper “Arabic Text Justification: An Overview of Historical Approaches to Arabic Text Justification and Recommendation Algorithms” by M. J. E. Benatia et al., TUGboat, Vol. 27 (2006), No. 2, pp. 137-146.

[0082] Figures 17, 18, 19 and 20 illustrate spectral sparsity operations that include various font modifications or font replacements. Generally, the font modifications and replacements considered simplify the geometry of the graphic elements. This can help reduce the number of unique line directions in a coding block, help align line directions with the vertical or horizontal (or diagonal) axis, and/or help reduce the number of unique vertical or horizontal distances in a coding block. Note that if method 100 is performed under a layout constraint that specifies a font or a list of acceptable fonts, the font replacements considered must be limited accordingly.

[0083] Figure 17 illustrates a font modification in which the graphic element representing the uppercase letter "E" in a serif font (left half of Figure 17) is transformed into a sans-serif variant of the letter "E" (right half). In other words, the transformation simplifies the terminal shapes of the graphic element (i.e., by removing the serifs). The font modification further unifies the widths of the different strokes of the letter "E". Because the graphic element in the right half of Figure 17 has fewer unique line directions and/or fewer unique distances, it is likely to be encoded at a lower cost. In view of this, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 17 to be more similar to the right half of Figure 17. Equivalently, the spectral sparsity operation can replace the “E” from the serif font with an “E” from a sans-serif font (in particular, a sans-serif font with uniform stroke width).

[0084] Figure 18 illustrates a font modification that unifies the typeface weight within a coding block. Example font weights include ultralight, light, medium-light, bold, and heavy. In the left half of Figure 18, three graphic elements representing the lowercase letter "a" have three different weights, which corresponds to numerous unique vertical and horizontal distances in the coding block. In the right half, the weights have been brought closer to one another through font replacement or appropriate rescaling, which facilitates economical transform coding. Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the left half of Figure 18 to be more similar to the right half of Figure 18.

[0085] With reference to Figure 17, it was shown that the widths of different strokes of a graphic element can be unified. Figure 19 illustrates a font modification that unifies the width of specific strokes of graphic elements. In the right half of Figure 19, graphic elements from a font with non-uniform stroke weights are used to form the text string "abc". In the left half of the same figure, a font with uniform stroke weights is used. Furthermore, the spikelets and serifs seen in the right half of Figure 19 are not present in the left half. These two changes tend to make the arrangement of graphic elements in the left half less costly to encode. Accordingly, some embodiments of method 100 include performing a spectral sparsity operation that modifies a layout according to the right half of Figure 19 to be more similar to the left half of Figure 19.

[0086] Figure 20 illustrates a font modification that adjusts the aperture of graphic elements. More precisely, Figure 20 shows three graphic elements that represent the lowercase letter "e" and differ in the length of the lower open arc. The spacing from the vertex to the horizontal bar can be varied without the graphic element losing its meaning as the letter "e". This fact can be used to control the number of unique horizontal and/or unique vertical distances within the coding block. In view of these considerations, some embodiments of method 100 include performing a spectral sparsity operation that transforms the aperture of a graphic element in a manner similar to the variation among the three graphic elements in Figure 20.

[0087] Figure 21 has been described above.

[0088] As the numerous examples have shown, the inventors propose a way of arranging the graphic elements that constitute a given text string relative to a pattern of coding blocks so as to facilitate efficient transform coding. More specifically, the inventors have developed a toolkit of operations that change a tentative layout of graphic elements into a modified layout which has a sparser spectrum and can therefore be represented more compactly. These operations preserve the integrity of the text string, and thus its communicative meaning, and are generally visually inconspicuous to non-specialist viewers. The discreetness of the spectral sparsity operations can be further ensured by implementing one of the embodiments that accept and adhere to layout constraints.

[0089] The aspects of this disclosure have been described above primarily with reference to several embodiments. However, as will be readily understood by those skilled in the art, embodiments other than those disclosed above are equally possible within the scope of the invention as defined by the appended claims. For example, although the examples primarily relate to letters of the Latin alphabet, those skilled in the art will understand that the techniques disclosed herein are readily applicable to other scripts, such as Greek, Cyrillic, Arabic, and ideographic scripts.

Claims

1. A method for providing a digital image of a text string, the method comprising: rendering the text string in a digital raster image by: obtaining a partition of an image region into coding blocks for block-by-block transform coding; representing the text string as a plurality of graphic elements from a font, arranged according to a tentative layout in the image region, wherein the tentative layout defines at least the position, orientation, and size of each graphic element; modifying the tentative layout by applying a spectral sparsity operation to at least one non-empty coding block, thereby obtaining a modified layout; and rendering a digital raster image of the graphic elements arranged according to the modified layout; and obtaining the digital image of the text string by encoding, using block-by-block transform coding, the digital raster image or a combined image obtained by combining a background image and the digital raster image.

2. The method according to claim 1, wherein the spectral sparsity operation includes one or more of the following operations: rotation of a graphic element, slanting of a graphic element, isotropic or anisotropic rescaling of a complete graphic element, isotropic or anisotropic rescaling of a portion of a graphic element, translation of a graphic element, font modification, font replacement, and contrast modification.

3. The method according to claim 1, wherein the spectral sparsity operation includes rotating or slanting a graphic element.

4. The method according to claim 3, wherein the rotation or slanting reduces the number of unique line directions in the coding block.

5. The method according to claim 3, wherein the rotation or slanting aligns a line direction with the vertical or horizontal axis.

6. The method according to claim 1, wherein the spectral sparsity operation includes isotropic or anisotropic rescaling of a complete graphic element, or isotropic or anisotropic rescaling of a portion of a graphic element.

7. The method according to claim 6, wherein the rescaling reduces the number of unique vertical or unique horizontal distances in the coding block.

8. The method according to claim 1, wherein the spectral sparsity operation includes translation of a graphic element.

9. The method according to claim 8, wherein the translation clusters geometrically similar graphic elements and/or separates geometrically dissimilar graphic elements per coding block, thereby reducing the number of unique line directions and/or unique vertical or horizontal distances in the coding block.

10. The method according to claim 9, wherein the translation clusters spectrally similar graphic elements and/or separates spectrally dissimilar graphic elements per coding block on the basis of at least one spectrum cached from earlier block-by-block transform coding of these graphic elements.

11. The method according to claim 8, wherein the translation rearranges graphic elements within a coding block, thereby reducing the number of unique line directions and/or unique vertical or horizontal distances within the coding block.

12. The method according to claim 8, wherein the translation reduces the number of coding blocks occupied by a graphic element.

13. The method according to claim 1, wherein the spectral sparsity operation includes font modification or font replacement.

14. The method according to claim 13, wherein the font modification or font replacement simplifies the geometry of the graphic elements.

15. The method according to claim 13, wherein the font modification or font replacement includes one or more of the following operations: simplifying the terminal shape of the graphic element; removing tassels or serifs from the graphic element; unifying the width of the strokes on the graphic element; unifying the width of multiple strokes on the graphic element; unifying the font weight within a coding block; and unifying the aperture of one or more graphic elements.

16. An apparatus for providing a digital image of a text string, the apparatus comprising a processor configured to: render the text string in a digital raster image by: obtaining a partition of an image region into coding blocks for block-by-block transform coding; representing the text string as a plurality of graphic elements from a font, arranged according to a tentative layout in the image region, wherein the tentative layout defines at least the position, orientation, and size of each graphic element; modifying the tentative layout by applying a spectral sparsity operation to at least one non-empty coding block, thereby obtaining a modified layout; and rendering a digital raster image of the graphic elements arranged according to the modified layout; and obtain the digital image of the text string by encoding, using block-by-block transform coding, the digital raster image or a combined image obtained by combining a background image and the digital raster image.

17. A non-transitory computer storage medium having stored thereon instructions which, when executed on a device with processing capabilities, implement a method for providing a digital image of a text string, the method comprising: rendering the text string in a digital raster image by: obtaining a partition of an image region into coding blocks for block-by-block transform coding; representing the text string as a plurality of graphic elements from a font, arranged according to a tentative layout in the image region, wherein the tentative layout defines at least the position, orientation, and size of each graphic element; modifying the tentative layout by applying a spectral sparsity operation to at least one non-empty coding block, thereby obtaining a modified layout; and rendering a digital raster image of the graphic elements arranged according to the modified layout; and obtaining the digital image of the text string by encoding, using block-by-block transform coding, the digital raster image or a combined image obtained by combining a background image and the digital raster image.