Method, device and storage medium for processing inclined text picture
By using the Inception_V2 network and Bi-LSTM network for feature extraction and correction in tilted text detection, the problem of low detection efficiency is solved, achieving more efficient tilted text correction, improving detection accuracy and user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA UNITED NETWORK COMM GRP CO LTD
- Filing Date
- 2022-12-08
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for detecting tilted text have low efficiency, especially when using CTPN to detect tilted text in images.
The Inception V2 network is used for feature extraction, combined with a Bi-LSTM network for sequence feature extraction, and a fully connected network is used to obtain the position coordinate information of the tilted text. The image is rotated and corrected according to the tilt angle, and the tilted text detection model is used to correct the tilted text.
It improves detection efficiency, reduces parameter quantity and resource consumption, enhances detection accuracy and user experience, eliminates 5*5 convolution, and adopts BN algorithm to further improve detection speed and accuracy.
Figure CN116128746B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image text correction technology, and in particular to a method, apparatus and storage medium for processing tilted text images.
Background Technology
[0002] Image text correction technology detects and corrects skewed text in images, thereby obtaining images with straight text. In recent years, with the rise and development of deep learning, computer vision has changed dramatically. As an important research area in computer vision, skewed text detection has also entered the era of deep learning, and significant progress has been made in its approaches, methods, and results.
[0003] Currently, common methods for detecting slanted text typically involve first obtaining preliminary detection results of slanted text in an image using a deep learning model, and then further optimizing these results using relevant algorithms. For example, a Connectionist Text Proposal Network (CTPN) can be used to obtain preliminary detection results of slanted text in an image. The working principle of CTPN is as follows: First, the input image containing slanted text is preprocessed according to the input requirements of the Visual Geometry Group (VGG16) network to obtain an image that meets the input requirements. This preprocessing includes scaling and / or padding operations. Then, the VGG16 network is used to extract features from the image, generating a feature map. This feature map is then input into a Bidirectional Long Short-Term Memory (Bi-LSTM) network for sequence feature extraction, and finally fed into a fully connected network to obtain the detection region of the slanted text in the image.
[0004] However, CTPN has the problem of low detection efficiency in detecting tilted text in images.
Summary of the Invention
[0005] This application provides a method, apparatus, and storage medium for processing tilted text images to solve the problem of low detection efficiency.
[0006] In a first aspect, this application provides a method for processing tilted text images, comprising: acquiring an image to be processed, the image containing at least one line of tilted text; inputting the image to be processed into a tilted text detection model, extracting features from the image using the Inception_V2 network in the tilted text detection model to obtain a feature map; extracting sequence features from the feature map using a Bidirectional Long Short-Term Memory (Bi-LSTM) network in the tilted text detection model to obtain sequence features, which are then input into a fully connected network in the tilted text detection model to obtain position coordinate information of each line of tilted text output by the fully connected network; determining the tilt angle of each line of tilted text based on the position coordinate information of the tilted text; and rotating the image to be processed according to the tilt angle corresponding to each line of tilted text to obtain a correction result corresponding to the image to be processed.
[0007] Optionally, the image to be processed is rotated according to the tilt angle corresponding to at least one line of tilted text to obtain the correction result corresponding to the image to be processed, including: determining the average tilt angle corresponding to at least one line of tilted text as the rotation angle; using the rotation angle, rotating the image to be processed to obtain the correction result corresponding to the image to be processed.
[0008] Optionally, the image to be processed is rotated according to the tilt angle corresponding to at least one line of tilted text to obtain the correction result corresponding to the image to be processed. This includes: responding to the selection operation of the strategy scheme in the interactive interface, determining the rotation angle based on the tilt angle corresponding to at least one line of tilted text and the target strategy scheme of the selection operation, wherein the strategy scheme is used to provide the strategy for determining the rotation angle; and rotating the image to be processed using the rotation angle to obtain the correction result corresponding to the image to be processed.
[0009] Optionally, the Inception_V2 network contains multiple convolutional layers. In the last convolutional layer, a sliding window with a stride of N is used for feature extraction to obtain a feature map, where N is greater than 1.
[0010] Optionally, the tilt angle of the tilted text is determined based on the position coordinate information of the tilted text, including: determining the minimum bounding rectangle of the tilted text region based on the position coordinate information of the tilted text; and obtaining the tilt angle of the tilted text based on the minimum bounding rectangle.
[0011] Optionally, the tilted text detection model is trained by: acquiring sample images and the height information of the tilted text contained in the sample images; and training the tilted text detection model using the sample images and the height information to obtain the trained tilted text detection model.
[0012] Optionally, obtaining the height information of the slanted text contained in the sample image includes: displaying the sample image on the interactive page; receiving a text box input operation targeting the slanted text in the sample image; and obtaining the height information of the slanted text in the sample image based on the text box generated by the text box input operation, wherein the text box is aligned with the boundary of the slanted text.
[0013] In a second aspect, this application provides a processing apparatus for tilted text images, comprising: an acquisition module for acquiring an image to be processed, the image to be processed containing at least one line of tilted text; a processing module for inputting the image to be processed into a tilted text detection model, extracting features from the image to be processed through the Inception_V2 network in the tilted text detection model to obtain a feature map; extracting sequence features from the feature map through the Bi-LSTM network in the tilted text detection model to obtain sequence features and inputting them into a fully connected network in the tilted text detection model to obtain position coordinate information of each line of tilted text output by the fully connected network; a first determination module for determining the tilt angle of each line of tilted text in the at least one line of tilted text based on the position coordinate information of the tilted text; and a second determination module for rotating the image to be processed according to the tilt angle corresponding to each of the at least one line of tilted text to obtain a correction result corresponding to the image to be processed.
[0014] Optionally, the second determining module can be used to determine the average tilt angle of at least one line of tilted text as the rotation angle; using the rotation angle, the image to be processed is rotated to obtain the correction result corresponding to the image to be processed.
[0015] Optionally, the second determining module can also be used to respond to the selection operation of the strategy scheme applied to the interactive interface, and determine the rotation angle based on the tilt angle corresponding to at least one line of tilted text and the target strategy scheme applied by the selection operation, wherein the strategy scheme is used to provide the strategy for determining the rotation angle; using the rotation angle, the image to be processed is rotated to obtain the correction result corresponding to the image to be processed.
[0016] Optionally, the tilted text detection model includes the Inception_V2 network, which contains multiple convolutional layers. In the last convolutional layer, a sliding window with a stride of N is used for feature extraction to obtain a feature map, where N is greater than 1.
[0017] Optionally, the first determining module can be used to determine the minimum bounding rectangle of the tilted text region based on the position coordinate information of the tilted text; and to obtain the tilt angle of the tilted text based on the minimum bounding rectangle.
[0018] Optionally, the tilted text detection model is trained by: acquiring sample images and the height information of the tilted text contained in the sample images; and training the tilted text detection model using the sample images and the height information to obtain the trained tilted text detection model.
[0019] Optionally, the tilted text detection model is trained as follows: a sample image is displayed on the interactive page; a text box input operation targeting the tilted text in the sample image is received; the height information of the tilted text in the sample image is obtained based on the text box generated by the text box input operation, wherein the text box is aligned with the boundary of the tilted text; and the tilted text detection model is trained using the sample image and the height information to obtain the trained tilted text detection model.
[0020] In a third aspect, this application provides an electronic device, including: a memory and a processor; the memory for storing program instructions; and the processor for calling the program instructions to perform the method for processing tilted text images as provided in any implementation of the first aspect above.
[0021] In a fourth aspect, this application provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method for processing tilted text images as provided in any implementation of the first aspect above.
[0022] In a fifth aspect, this application provides a computer program product, including a computer program; when the computer program is executed, it implements the method for processing tilted text images as provided in any implementation of the first aspect above.
[0023] The method, apparatus, and storage medium for processing tilted text images provided in this application involve acquiring an image to be processed, which contains at least one line of tilted text; inputting the image to be processed into a tilted text detection model; extracting features from the image using the Inception_V2 network in the tilted text detection model to obtain a feature map; extracting sequence features from the feature map using the Bi-LSTM network in the tilted text detection model to obtain sequence features, which are then input into the fully connected network in the tilted text detection model to obtain the position coordinate information of each line of tilted text output by the fully connected network; determining the tilt angle of each line of tilted text based on the position coordinate information of the tilted text; and rotating the image to be processed according to the tilt angle corresponding to each line of tilted text to obtain the correction result corresponding to the image to be processed. The tilted text detection model in this application uses the Inception_V2 network to extract features from the image to be processed. The Inception_V2 network has no limit on the size of the input image to be processed, so there is no need to preprocess the image, which improves the detection efficiency. In addition, the Inception_V2 structure eliminates the 5*5 convolution, which can reduce the number of parameters and resource consumption. The Batch Normalization (BN) algorithm it uses can further improve the detection efficiency and accuracy, and improve the user experience.
Attached Figure Description
[0024] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0025] Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application;
[0026] Figure 2 is a flowchart of the method for processing tilted text images provided in an embodiment of this application;
[0027] Figure 3 is a schematic diagram of the interactive interface provided in an embodiment of this application;
[0028] Figure 4a is a schematic diagram of candidate boxes for conventional model training provided in an embodiment of this application;
[0029] Figure 4b is a schematic diagram of candidate boxes for training the tilted text detection model provided in an embodiment of this application;
[0030] Figure 4c is a schematic diagram comparing the candidate boxes provided in an embodiment of this application;
[0031] Figure 5 is a schematic diagram of the structure of the tilted text image processing apparatus provided in an embodiment of this application;
[0032] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0033] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.
Detailed Implementation
[0034] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0035] Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application. As shown in Figure 1, this application scenario includes user 101 and terminal device 102.
[0036] Terminal device 102 provides user 101 with a processing service for tilted text images, which is the implementation carrier of the tilted text image processing method provided in this application. For example, terminal device 102 may include an interactive interface and a processing system. The interactive interface can provide user 101 with processing options and the processing results of the tilted text image by the processing system. Specifically, user 101 inputs a tilted text image to be processed / corrected to terminal device 102, and terminal device 102 processes and displays the tilted text image based on the processing options input by user 101.
[0037] Terminal device 102 can be: mobile phone, tablet computer, laptop computer, handheld computer, mobile internet device (MID), smart home device (e.g., refrigerator, television, air conditioner, electricity meter, etc.), smart robot, wireless terminal device in remote medical surgery, or wireless terminal device in smart home, etc., without limitation.
[0038] Figure 2 is a flowchart of the method for processing tilted text images provided in an embodiment of this application. As shown in Figure 2, the processing method includes:
[0039] S201: Obtain the image to be processed, which contains at least one line of slanted text.
[0040] The image to be processed contains at least one line of slanted text, and may also contain multiple lines of slanted text. In embodiments containing multiple lines of slanted text, the lines of slanted text may have the same slant angle or different slant angles. The size of the image to be processed may also vary.
[0041] S202: Input the image to be processed into the tilted text detection model. The Inception_V2 network in the tilted text detection model extracts features from the image to obtain a feature map. The Bi-LSTM network in the tilted text detection model extracts sequence features from the feature map, and these sequence features are then input into the fully connected network in the tilted text detection model to obtain the position coordinate information of each line of tilted text output by the fully connected network.
[0042] The tilted text detection model provided in this application includes an Inception_V2 network, a Bi-LSTM network, and a fully connected network, wherein the Inception_V2 network is a convolutional neural network. It can be understood that the tilted text detection model is a fully convolutional neural network.
[0043] Optionally, the Inception_V2 network contains multiple convolutional layers. In the last convolutional layer, a sliding window with a stride of N is used for feature extraction to obtain a feature map, where N is greater than 1. For example, N can be 3, meaning that in the last convolutional layer, a sliding window with a stride of 3 is used for feature extraction to obtain a feature map. This improves the detection speed of the detection model while ensuring its detection accuracy.
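As a rough illustration of why a stride N greater than 1 speeds up detection, the sketch below applies the standard output-size formula for a sliding window. The input width of 224, the window size of 3 and the padding of 1 are assumptions chosen only for illustration; the text above specifies only the stride (for example, N = 3).

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Number of sliding-window positions along one dimension."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical values: feature width 224, 3-wide window, padding 1.
width = 224
dense = conv_output_size(width, kernel=3, stride=1, padding=1)
strided = conv_output_size(width, kernel=3, stride=3, padding=1)
print(f"stride 1: {dense} positions, stride 3: {strided} positions")
```

Under these assumptions a stride of 3 evaluates roughly one third as many window positions per row, which is where the speed-up would come from.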
[0044] Specifically, the structure of Inception_V2 includes 1*1 convolution, 3*3 convolution and 3*3 max pooling. Inception_V2 performs feature extraction on the image to be processed as follows: it uses 1*1 convolution, 3*3 convolution and 3*3 max pooling to obtain data of the image at different scales, and fuses these data to obtain the feature map.
[0045] Inception_V2 removes the 5x5 convolution branch because text height is typically around 10 pixels. Using a large convolution kernel often results in unclear line segmentation, a surge in parameters, excessive resource consumption, and increased processing time. Therefore, removing the 5x5 convolution branch reduces the number of parameters and resource consumption of the skewed text detection model, improving detection efficiency. Furthermore, Inception_V2 employs Batch Normalization (BN), a regularization method that standardizes the input layer information distribution to an N(0,1) Gaussian distribution, significantly improving convergence speed and further enhancing the detection speed of the skewed text detection model.
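The two claims in this paragraph can be checked with simple arithmetic. The sketch below counts the learnable parameters of a single convolution layer and standardizes a small batch the way BN does before its learned scale and shift are applied. The channel counts (64 in, 64 out) and the sample values are assumptions for illustration only, and the function names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels, bias=True):
    """Learnable parameters of one kernel x kernel convolution layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def batch_norm(values, eps=1e-5):
    """Standardize a batch of scalars to roughly zero mean and unit variance.

    The learned scale and shift of a full BN layer are omitted here.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical channel counts: 64 input channels, 64 output channels.
print("5x5 branch:", conv_params(5, 64, 64))
print("3x3 branch:", conv_params(3, 64, 64))

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print("normalized batch:", normalized)
```

With equal channel counts, a 5x5 kernel carries 25/9 as many weights as a 3x3 kernel, which is consistent with the parameter reduction described above.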
[0046] The Bi-LSTM network structure consists of two independent Long Short Term Memory (LSTM) neural networks. The input sequence is fed into the two LSTM neural networks in both forward and reverse order for feature extraction. The word vector formed by concatenating the two output vectors (i.e., the extracted feature vectors) is used as the final feature representation of the word.
[0047] A fully connected network (FCN) is a relatively simple artificial neural network structure, belonging to the feedforward neural network category. It consists of an input layer, hidden layers, and an output layer, with each hidden layer potentially containing multiple neurons. A fully connected network can transform the two-dimensional feature map output by convolution into a one-dimensional vector.
[0048] In this embodiment, the image to be processed is input into the tilted text detection model. The Inception_V2 network extracts features from the image to obtain a feature map. A Bi-LSTM network then extracts sequence features from the feature map, which are input into the fully connected network of the tilted text detection model. This results in the position coordinate information of each line of tilted text output by the fully connected network. The position coordinate information may include the position coordinates of some or all of the text detected by the tilted text detection model. Since Inception_V2 has no limit on the size of the input image, preprocessing such as padding is unnecessary, further improving the detection efficiency of the tilted text detection model.
[0049] S203: For each line of slanted text in at least one line of slanted text, determine the slant angle of the slanted text based on the position coordinate information of the slanted text.
[0050] For example, the tilt angle of the tilted text can be determined based on the position coordinates of the end characters in each line of tilted text. Specifically, if the position coordinates of the end characters A and B are detected as (x1, y1) and (x2, y2), the slope k of the line A→B can be calculated:
[0051] $k = \dfrac{y_2 - y_1}{x_2 - x_1}$
[0052] Calculate the arctangent using the slope k, then convert the resulting radians to degrees; this gives the tilt angle of the tilted text in that line. Specifically, this process can be implemented using computer program code, as shown below:
[0053] result = np.arctan(k) * 57.29577  # requires `import numpy as np`; 57.29577 ≈ 180/π converts radians to degrees
[0054] print("The angle of inclination of the line is: " + str(result) + " degrees")
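The slope-based calculation can also be written as a small self-contained function. This is a minimal sketch using only the Python standard library; the function name and the handling of vertical lines (where the slope k is undefined) are additions for illustration and are not taken from the source.

```python
import math


def tilt_angle_degrees(x1, y1, x2, y2):
    """Tilt angle in degrees of the line from A=(x1, y1) to B=(x2, y2)."""
    dx = x2 - x1
    dy = y2 - y1
    if dx == 0 and dy == 0:
        raise ValueError("A and B must be distinct points")
    if dx == 0:
        # Vertical line: the slope k is undefined, so return +/-90 directly.
        return 90.0 if dy > 0 else -90.0
    k = dy / dx  # slope of the line A -> B
    return math.degrees(math.atan(k))


angle = tilt_angle_degrees(0.0, 0.0, 100.0, 100.0)
print(f"The angle of inclination of the line is: {angle:.2f} degrees")
```

Note that in image coordinates the y axis usually points downward, so a positive angle here corresponds to text that descends from left to right on screen.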
[0055] S204: Rotate the image to be processed according to the tilt angle corresponding to at least one line of tilted text to obtain the correction result corresponding to the image to be processed.
[0056] For example, a rotation function can be called to rotate the image to be processed, such as `rotate()` or `warpAffine()` in the OpenCV library, or `imrotate()` in MATLAB. The correction result for the image to be processed refers to the rotation result that makes the tilted text in the image horizontal, so that the user can see the horizontally arranged text content when viewing the correction result directly.
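To show the geometry of the correction without depending on an image library, the sketch below builds a 2x3 rotation matrix about a chosen center and applies it to the two end points of a tilted line. The matrix layout follows the one documented for OpenCV's `getRotationMatrix2D`; the helper names, the sample points and the choice of center are assumptions for illustration. It transforms coordinates only, and a real implementation would pass such a matrix to an image-warping routine.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation about `center`."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ]


def apply_affine(matrix, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Hypothetical end points of one text line with slope 1 (tilt of 45 degrees).
start, end = (10.0, 20.0), (110.0, 120.0)
tilt = math.degrees(math.atan((end[1] - start[1]) / (end[0] - start[0])))
matrix = rotation_matrix_2d(center=(60.0, 70.0), angle_deg=tilt)
new_start, new_end = apply_affine(matrix, start), apply_affine(matrix, end)
print(new_start, new_end)
```

After the transform both end points share the same y coordinate, meaning the line has become horizontal, while the center of rotation stays in place.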
[0057] This embodiment of the application obtains an image to be processed, which contains at least one line of slanted text. The image is input into a slanted text detection model, where the Inception_V2 network extracts features to obtain a feature map. A Bi-LSTM network in the same model then extracts sequence features from the feature map, which are input into a fully connected network. The fully connected network outputs the position coordinates of each line of slanted text. For each line of slanted text, the slant angle is determined based on its position coordinates. The image is then rotated according to the slant angles of the at least one line of slanted text, resulting in a corrected image. This embodiment uses the Inception_V2 network for feature extraction. Since the Inception_V2 network has no size limit on the input image, preprocessing is unnecessary, improving detection efficiency. Furthermore, the Inception_V2 structure eliminates the 5x5 convolution, reducing parameter count and resource consumption. Its Batch Normalization (BN) algorithm further enhances detection efficiency and accuracy, improving user experience.
[0058] Based on the above embodiments, optionally, the image to be processed is rotated according to the tilt angle corresponding to at least one line of tilted text to obtain the correction result corresponding to the image to be processed. This includes: determining the average tilt angle corresponding to at least one line of tilted text as the rotation angle; using the rotation angle, rotating the image to be processed to obtain the correction result corresponding to the image to be processed. In this embodiment, the average tilt angle of each line of tilted text is used as the rotation angle for rotating the image to be processed, providing a specific method for obtaining the rotation angle.
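The averaging step is simple enough to sketch directly. The function name and the sample angles below are hypothetical.

```python
def mean_rotation_angle(tilt_angles):
    """Average of the per-line tilt angles, used as the rotation angle."""
    if not tilt_angles:
        raise ValueError("at least one tilt angle is required")
    return sum(tilt_angles) / len(tilt_angles)


# Hypothetical tilt angles (in degrees) of three detected text lines.
rotation_angle = mean_rotation_angle([10.0, 12.0, 14.0])
print(f"rotation angle: {rotation_angle} degrees")
```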
[0059] In other embodiments, optionally, the image to be processed is rotated according to the tilt angle corresponding to at least one line of tilted text to obtain the correction result corresponding to the image to be processed. This includes: responding to a selection operation of a strategy scheme in the interactive interface, determining a rotation angle based on the tilt angle corresponding to at least one line of tilted text and the target strategy scheme applied by the selection operation, wherein the strategy scheme is used to provide a strategy for determining the rotation angle; and rotating the image to be processed using the rotation angle to obtain the correction result corresponding to the image to be processed.
[0060] Figure 3 is a schematic diagram of the interactive interface provided in an embodiment of this application. As shown in Figure 3, the interactive interface includes, for example, an "Image Display Area," "Click to Select Input Image," "Input Strategy Scheme," and "Display Final Effect" options. The strategy scheme can include determining the rotation angle of the image to be processed. For example, it can output the average of the tilt angles corresponding to at least one line of tilted text by default, or display the tilt angle corresponding to each line of tilted text, allowing the user to select from multiple tilt angles.
[0061] When determining the rotation angle based on the tilt angle corresponding to at least one line of tilted text and the target strategy scheme of the selected operation, the interactive interface can display the correction result of rotating the image to be processed based on that angle in the image display area when the user clicks on each tilt angle. After comparing multiple correction results, the user selects the most satisfactory correction result as the final output result.
[0062] When performing specific operations based on the interactive interface provided in this application embodiment, the user first inputs an image to be processed into the terminal device, and the interactive interface can display the image to be processed in the image display area. Specifically, the user can select the image to be processed by clicking the "Input Image" option. This image can be backed up to the terminal device's storage system in advance, or it can be connected to the terminal device through other storage devices, which is not limited here.
[0063] After inputting the image to be processed, the terminal device outputs a rotation angle by executing the steps exemplified in S201 to S204. The rotation angle may include one or more. For example, the user selects one or more rotation angles by clicking on the input strategy option. At this time, the interactive interface can display the correction result of rotating the image to be processed based on each rotation angle selected by the user.
[0064] After selecting the rotation angle for the image to be processed, the user can select the "Show Final Effect" option to display and output the correction result of the image.
[0065] In other embodiments, users can be provided with other customization options. For example, multiple rotation angles can be sorted from smallest to largest, allowing users to customize the start and end points of the slanted text according to their needs, and then sum and average all angles between the start and end points. Alternatively, users can refer to the slanted text detection model to customize and input the target rotation angle value, and the terminal device can rotate the image to be processed based on the user-input target rotation angle value.
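One possible reading of the sorted-range option is sketched below: the angles are sorted in ascending order, the user picks a start and an end position in that ordering, and the angles in between are averaged. Treating the start and end points as positions in the sorted list is an assumption, as are the function name and the sample values.

```python
def range_average_angle(tilt_angles, start, end):
    """Average the sorted tilt angles at positions start..end (inclusive)."""
    ordered = sorted(tilt_angles)
    if not 0 <= start <= end < len(ordered):
        raise ValueError("start and end must select a non-empty range")
    selected = ordered[start:end + 1]
    return sum(selected) / len(selected)


# Hypothetical angles; 30.0 is an outlier that the user chooses to exclude.
angles = [30.0, 10.0, 12.0, 11.0]
rotation_angle = range_average_angle(angles, start=0, end=2)
print(f"rotation angle: {rotation_angle} degrees")
```

A selection like this lets a user drop outlying angles, for example from a short or misdetected line, before averaging.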
[0066] In some embodiments, determining the tilt angle of the tilted text based on its position coordinate information includes: determining the minimum bounding rectangle of the tilted text region based on its position coordinate information; and obtaining the tilt angle of the tilted text based on the minimum bounding rectangle. The position coordinate information output by the tilted text detection model includes the position coordinate information of some or all of the text in at least one line of tilted text. Using this position coordinate information, a minimum bounding rectangle for each line of tilted text is generated, and the tilt angle of the text is determined based on the tilt angle of the rectangle's border, which improves the error tolerance to some extent.
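The source does not say how the minimum bounding rectangle is computed. The sketch below shows one common approach, assumed here for illustration: build the convex hull of the detected points, try a rectangle aligned with each hull edge, and keep the one with the smallest area. The reported angle is that of the rectangle's longer side, folded into the range (-90, 90]. All function names are hypothetical.

```python
import math


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect_angle(points):
    """Return (angle_deg, long_side, short_side) of the min-area rectangle."""
    hull = convex_hull(points)
    if len(hull) < 2:
        raise ValueError("at least two distinct points are required")
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Hull coordinates in a frame whose x axis runs along this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            best = (w * h, theta, w, h)
    _, theta, w, h = best
    if h > w:
        # Report the direction of the longer side, taken as the text baseline.
        theta += math.pi / 2
        w, h = h, w
    angle = math.degrees(theta)
    while angle > 90:
        angle -= 180
    while angle <= -90:
        angle += 180
    return angle, w, h


# Hypothetical text region: a 100 x 10 box rotated by 30 degrees.
t = math.radians(30)
box = [(0, 0), (100, 0), (100, 10), (0, 10)]
pts = [(x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))
       for x, y in box]
print(min_area_rect_angle(pts))
```

Because the rectangle is fitted to all detected points of a line rather than only the two end characters, a single misplaced coordinate has less influence on the angle, which matches the error-tolerance remark above.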
[0067] In some embodiments, the slanted text detection model is trained by: acquiring sample images and the height information of the slanted text contained in the sample images; and training the slanted text detection model using the sample images and the height information to obtain the trained slanted text detection model. Specifically, a large number of sample images containing slanted text are collected. For each sample image containing slanted text, height information about the text is input, for example, directly inputting the text height data, or manually drawing a bounding box to limit the text height in the sample image. These sample images and the corresponding height information are then input into the slanted text detection model for model training.
[0068] In this embodiment, the tilted text detection model generates candidate boxes based on height information, compares the candidate boxes with the ground truth using the Intersection over Union (IoU) loss, and distinguishes between foreground and background by setting different IoU thresholds, thus completing error propagation. The model parameters are adjusted by the loss between the actual output and the expected output to achieve model training.
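The IoU comparison can be sketched for axis-aligned boxes as follows. The box format `(x1, y1, x2, y2)`, the function names, and the threshold values 0.7 and 0.3 are assumptions for illustration; the source states only that different IoU thresholds separate foreground from background.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidate(iou_value, fg_threshold=0.7, bg_threshold=0.3):
    """Classify a candidate box by its IoU with the ground truth."""
    if iou_value >= fg_threshold:
        return "foreground"
    if iou_value <= bg_threshold:
        return "background"
    return "ignored"  # neither clearly text nor clearly background


# Hypothetical candidate box and ground-truth box.
candidate, truth = (0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)
score = iou(candidate, truth)
print(score, label_candidate(score))
```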
[0069] Optionally, obtaining the height information of the slanted text contained in the sample image includes: displaying the sample image on the interactive page; receiving a text box input operation targeting the slanted text in the sample image; and obtaining the height information of the slanted text in the sample image based on the text box generated by the text box input operation, wherein the text box and the boundary of the slanted text are aligned. In this embodiment, after inputting the sample image into the training model, a text box input operation is performed on the sample image. For example, a text box is drawn on the sample image, and the text height is limited, so that the slanted text detection model can obtain the height information of the slanted text based on the input text box.
[0070] In the field of object detection, a series of candidate boxes (anchors) is usually preset when training detection models to complete the matching and correction between the true values and the predicted values. Figure 4a is a schematic diagram of candidate boxes for conventional model training provided in an embodiment of this application. As shown in Figure 4a, because the size and location of the target vary, multiple candidate boxes of different sizes must be set on the feature map; however, the more candidate boxes are set, the longer the model's computation time. Figure 4b is a schematic diagram of the candidate boxes for training the tilted text detection model provided in an embodiment of this application, in which the candidate boxes are determined based on the height information of the tilted text. Figure 4c is a schematic diagram comparing the candidate boxes provided in an embodiment of this application. In comparison, this application pre-inputs text height information, eliminating the need to set an excessive number of candidate box ratios. This improves both the model's computational speed and its detection accuracy.
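The effect of supplying the text height in advance can be illustrated by counting candidate boxes. In the sketch below, every value is an assumption chosen for illustration: the feature map size, the stride of 16, the fixed box width of 16, the five conventional heights, and the known text height of 10 pixels. The function name is hypothetical.

```python
def make_candidate_boxes(feat_w, feat_h, stride, heights, width=16):
    """Place one box per height at every feature-map position."""
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = col * stride + stride / 2  # box center in image coordinates
            cy = row * stride + stride / 2
            for h in heights:
                boxes.append((cx - width / 2, cy - h / 2,
                              cx + width / 2, cy + h / 2))
    return boxes


# Conventional setup: several preset heights to cover unknown text sizes.
conventional = make_candidate_boxes(4, 3, stride=16, heights=[8, 16, 32, 64, 128])
# Height-informed setup: the text height is supplied in advance.
informed = make_candidate_boxes(4, 3, stride=16, heights=[10])
print(len(conventional), len(informed))
```

Under these assumptions the height-informed setup produces one fifth as many candidate boxes to score and match, which illustrates the reduction in computation described above.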
[0071] The above embodiments provide a detailed description of the method for processing tilted text images provided in this application. The following will specifically explain the apparatus, electronic device, storage medium, and program product for processing tilted text images provided in the embodiments of this application.
[0072] Figure 5 is a schematic diagram of the structure of the apparatus for processing tilted text images provided in an embodiment of this application. As shown in Figure 5, the processing apparatus 500 includes:
[0073] The acquisition module 501 is used to acquire the image to be processed, which contains at least one line of slanted text.
[0074] The processing module 502 is used to input the image to be processed into the tilted text detection model; extract features from the image to be processed through the Inception_V2 network in the tilted text detection model to obtain a feature map; extract sequence features from the feature map through the Bi-LSTM network in the tilted text detection model; and input the sequence features into the fully connected network in the tilted text detection model to obtain the position coordinate information of each line of tilted text output by the fully connected network.
[0075] The first determining module 503 is used to determine the tilt angle of each line of tilted text in at least one line of tilted text based on the position coordinate information of the tilted text.
[0076] The second determining module 504 is used to rotate the image to be processed according to the tilt angle corresponding to at least one line of tilted text, so as to obtain the correction result corresponding to the image to be processed.
[0077] Optionally, the second determining module 504 can be used to determine the average of the tilt angles of the at least one line of tilted text as the rotation angle, and to rotate the image to be processed by the rotation angle to obtain the correction result corresponding to the image to be processed.
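A minimal sketch of this averaging step is given below. It assumes that tilt angles are expressed in degrees and do not wrap around (for example, all lie well within -90 to 90 degrees), and it demonstrates the correction on a single coordinate point rather than on pixel data. The helper names `mean_tilt_angle` and `rotate_point` and the sample angles are hypothetical.

```python
import math


def mean_tilt_angle(angles_deg):
    """Arithmetic mean of the per-line tilt angles, in degrees."""
    if not angles_deg:
        raise ValueError("at least one line of tilted text is required")
    return sum(angles_deg) / len(angles_deg)


def rotate_point(point, center, angle_deg):
    """Rotate `point` about `center` by `angle_deg` (standard 2-D rotation)."""
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + dx * math.cos(theta) - dy * math.sin(theta),
        center[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


# Tilt angles detected for three text lines (hypothetical values).
line_angles = [9.0, 10.0, 11.0]
rotation = mean_tilt_angle(line_angles)

# End point of a 100-pixel baseline tilted by 10 degrees from the centre.
center = (0.0, 0.0)
end = (100 * math.cos(math.radians(10)), 100 * math.sin(math.radians(10)))

# Rotating by the negative of the mean angle brings the baseline to horizontal.
corrected_end = rotate_point(end, center, -rotation)
```

Rotating an actual image would apply the same transform to every pixel, typically through an image library; the choice of library and interpolation method is outside what this application specifies.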
[0078] Optionally, the second determining module 504 can also be used to respond to a selection operation on a strategy scheme in the interactive interface, and to determine the rotation angle based on the tilt angles corresponding to the at least one line of tilted text and the target strategy scheme chosen by the selection operation, wherein the strategy scheme provides a strategy for determining the rotation angle; and to rotate the image to be processed by the rotation angle to obtain the correction result corresponding to the image to be processed.
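One possible way to organize such selectable strategies is a lookup table keyed by the option chosen in the interface. The three strategies below (mean, median, and the angle of the longest line) and their names are hypothetical examples introduced only for illustration; this paragraph does not enumerate the strategy schemes.

```python
import statistics


def angle_mean(angles, lengths):
    """Average of all per-line tilt angles."""
    return statistics.fmean(angles)


def angle_median(angles, lengths):
    """Median tilt angle, less sensitive to a single outlier line."""
    return statistics.median(angles)


def angle_of_longest_line(angles, lengths):
    """Tilt angle of the line with the greatest length."""
    longest = max(range(len(lengths)), key=lengths.__getitem__)
    return angles[longest]


# Hypothetical strategy schemes offered on the interactive interface.
STRATEGIES = {
    "mean": angle_mean,
    "median": angle_median,
    "longest_line": angle_of_longest_line,
}


def choose_rotation_angle(strategy_name, angles, lengths):
    """Determine the rotation angle using the selected target strategy."""
    if strategy_name not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy_name!r}")
    if not angles or len(angles) != len(lengths):
        raise ValueError("angles and lengths must be non-empty and equal in size")
    return STRATEGIES[strategy_name](angles, lengths)


# Three detected lines: tilt angles in degrees and line lengths in pixels.
angles = [4.0, 5.0, 30.0]
lengths = [350.0, 200.0, 40.0]
```

With these sample values, the short outlier line at 30 degrees pulls the mean to 13 degrees, while the median and longest-line strategies stay at 5 and 4 degrees respectively, which shows why offering a choice of strategy can matter.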
[0079] Optionally, the tilted text detection model includes the Inception_V2 network, which contains multiple convolutional layers. In the last convolutional layer, a sliding window with a stride of N is used for feature extraction to obtain a feature map, where N is greater than 1.
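The effect of a stride greater than 1 on the number of sliding-window positions can be sketched as follows. The 3 x 3 window size, the absence of padding, and the 32 x 64 input size are assumptions made for illustration; only the condition that N is greater than 1 comes from this application.

```python
def sliding_window_positions(size, window, stride):
    """Start offsets of windows along one axis, without padding."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if window > size:
        raise ValueError("window must not exceed the axis size")
    return list(range(0, size - window + 1, stride))


def output_shape(height, width, window, stride):
    """Number of window positions along the vertical and horizontal axes."""
    rows = len(sliding_window_positions(height, window, stride))
    cols = len(sliding_window_positions(width, window, stride))
    return rows, cols


# Stride 1 (dense) versus stride N = 2 on a 32 x 64 input to the last layer.
dense = output_shape(32, 64, window=3, stride=1)
strided = output_shape(32, 64, window=3, stride=2)
```

Under these assumptions, a stride of 2 reduces the number of window positions to roughly one quarter of the dense case, so the subsequent sequence and fully connected stages have correspondingly fewer positions to process.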
[0080] Optionally, the first determining module 503 can be used to determine the minimum bounding rectangle of the tilted text region based on the position coordinate information of the tilted text; and to obtain the tilt angle of the tilted text based on the minimum bounding rectangle.
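A self-contained sketch of deriving a tilt angle from a minimum bounding rectangle is given below. It assumes the position coordinate information is available as a set of (x, y) points outlining one text line, and it uses the known property that a minimum-area bounding rectangle has one side collinear with an edge of the convex hull. The function names, the convention of reporting the direction of the longer side, and the sample data are assumptions; this application does not prescribe a particular algorithm.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect_angle(points):
    """Return (angle_deg, long_side, short_side) of the minimum-area rectangle.

    The angle is the direction of the longer side, measured from the +x axis
    toward the +y axis of the input coordinate frame, folded into [-90, 90).
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Extents of the hull along the edge direction and along its normal.
        along = [x * c + y * s for x, y in hull]
        across = [-x * s + y * c for x, y in hull]
        w = max(along) - min(along)
        h = max(across) - min(across)
        if best is None or w * h < best[0]:
            # Report the direction of the longer side as the text direction.
            angle = math.degrees(theta) if w >= h else math.degrees(theta) + 90.0
            best = (w * h, angle, max(w, h), min(w, h))
    _, angle, long_side, short_side = best
    # A line direction is only defined modulo 180 degrees.
    angle = (angle + 90.0) % 180.0 - 90.0
    return angle, long_side, short_side


# Corners of a 100 x 20 text region tilted by 15 degrees (hypothetical data).
t = math.radians(15)
region = [(x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))
          for x, y in [(0, 0), (100, 0), (100, 20), (0, 20)]]
tilt, long_side, short_side = min_area_rect_angle(region)
```

In image coordinates the y axis usually points downward, so the sign of the angle is mirrored relative to the usual mathematical convention. In practice a library routine such as OpenCV's `cv2.minAreaRect` is commonly used for this step, although its angle convention should be checked for the installed version.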
[0081] Optionally, the tilted text detection model is trained by: acquiring sample images and the height information of the tilted text contained in the sample images; and training the tilted text detection model using the sample images and the height information to obtain the trained tilted text detection model.
[0082] Optionally, the tilted text detection model is trained as follows: a sample image is displayed on the interactive page; a text box input operation targeting the tilted text in the sample image is received; the height information of the tilted text in the sample image is obtained based on the text box generated by the text box input operation, wherein the text box is aligned with the boundary of the tilted text; and the tilted text detection model is trained using the sample image and the height information to obtain the trained tilted text detection model.
[0083] The apparatus provided in this application embodiment can be used to perform the above-described method for processing tilted text images. Its implementation and technical effects are similar, and will not be described again here.
[0084] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. As shown in Figure 6, the electronic device 600 includes:
[0085] Processor 601, memory 602, communication interface 603 and system bus 604.
[0086] The memory 602 and the communication interface 603 are connected to the processor 601 via the system bus 604 and communicate with one another through it. The memory 602 is used to store computer-executable instructions, the communication interface 603 is used to communicate with other devices, and the processor 601 is used to execute the computer-executable instructions so as to carry out the tilted text image processing method described in the above method embodiments.
[0087] Specifically, processor 601 may include one or more processing units. For example, processor 601 may be a CPU, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in this application may be embodied as being executed directly by a hardware processor, or as being executed by a combination of hardware and software modules within the processor.
[0088] The memory 602 can be used to store program instructions. The memory 602 may include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback), etc. The data storage area may store data created during the use of the electronic device 600 (such as audio data), etc. Furthermore, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, Universal Flash Storage (UFS), etc. The processor 601 executes various functional applications and data processing of the electronic device 600 by running the program instructions stored in the memory 602.
[0089] Communication interface 603 can provide solutions for wireless communication, including 2G / 3G / 4G / 5G, applied to electronic device 600. Communication interface 603 can receive electromagnetic waves via an antenna, and perform filtering, amplification, and other processing on the received electromagnetic waves before transmitting them to a modem processor for demodulation. Communication interface 603 can also amplify the signal modulated by the modem processor and radiate it as electromagnetic waves via the antenna. In some embodiments, at least some functional modules of communication interface 603 can be housed in processor 601. In some embodiments, at least some functional modules of communication interface 603 and at least some modules of processor 601 can be housed in the same device.
[0090] The system bus 604 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This system bus 604 can be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, it is represented by only one thick line in the diagram, but this does not indicate that there is only one bus or one type of bus.
[0091] It should be noted that the number of memories 602 and processors 601 is not limited in this embodiment; there can be one or more of each. Figure 6 shows one of each as an example. The memory 602 and the processor 601 can be connected via wired or wireless means, such as a bus connection. In practical applications, the electronic device 600 can be various forms of computers or mobile terminals. Computers include, for example, laptops, desktop computers, workstations, servers, blade servers, mainframe computers, etc.; mobile terminals include, for example, personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices.
[0092] The electronic device in this embodiment can be used to execute the technical solutions in the above method embodiments. Its implementation principle and technical effect are similar, and will not be repeated here.
[0093] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method for processing tilted text images in the above-described method embodiments.
[0094] This application also provides a computer program product, including a computer program; when the computer program is executed, it implements the method for processing tilted text images as described in the above method embodiments.
[0095] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.
[0096] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. A method for processing tilted text images, characterized in that it comprises: obtaining an image to be processed, wherein the image to be processed contains at least one line of tilted text; inputting the image to be processed into a tilted text detection model, extracting features from the image to be processed through the Inception_V2 network in the tilted text detection model to obtain a feature map, extracting sequence features from the feature map through the bidirectional long short-term memory (Bi-LSTM) network in the tilted text detection model, and inputting the sequence features into the fully connected network in the tilted text detection model to obtain the position coordinate information of each line of tilted text output by the fully connected network, wherein the position coordinate information represents the overall region of the corresponding line of tilted text; for each line of tilted text in the at least one line of tilted text, determining the minimum bounding rectangle of the tilted text region based on the position coordinate information of the tilted text, and obtaining the tilt angle of the tilted text based on the minimum bounding rectangle; and rotating the image to be processed based on the tilt angle corresponding to each of the at least one line of tilted text to obtain the correction result corresponding to the image to be processed.
2. The processing method according to claim 1, characterized in that the step of rotating the image to be processed based on the tilt angle corresponding to each of the at least one line of tilted text to obtain the correction result corresponding to the image to be processed includes: determining the average of the tilt angles corresponding to the at least one line of tilted text as the rotation angle; and rotating the image to be processed by the rotation angle to obtain the correction result corresponding to the image to be processed.
3. The processing method according to claim 1, characterized in that the step of rotating the image to be processed based on the tilt angle corresponding to each of the at least one line of tilted text to obtain the correction result corresponding to the image to be processed includes: in response to a selection operation on a strategy scheme in an interactive interface, determining a rotation angle based on the tilt angles corresponding to the at least one line of tilted text and the target strategy scheme chosen by the selection operation, wherein the strategy scheme provides a strategy for determining the rotation angle; and rotating the image to be processed by the rotation angle to obtain the correction result corresponding to the image to be processed.
4. The processing method according to any one of claims 1 to 3, characterized in that the Inception_V2 network contains multiple convolutional layers, and in the last convolutional layer a sliding window with a stride of N is used to extract features to obtain the feature map, where N is greater than 1.
5. The processing method according to any one of claims 1 to 3, characterized in that the tilted text detection model is trained in the following way: obtaining a sample image and height information of the tilted text contained in the sample image; and training the tilted text detection model using the sample image and the height information to obtain the trained tilted text detection model.
6. The processing method according to claim 5, characterized in that obtaining the height information of the tilted text contained in the sample image includes: displaying the sample image on an interactive page; receiving a text box input operation targeting the tilted text in the sample image; and obtaining the height information of the tilted text in the sample image based on the text box generated by the text box input operation, wherein the text box is aligned with the boundary of the tilted text.
7. A processing apparatus for tilted text images, characterized by comprising: an acquisition module, used to acquire an image to be processed, wherein the image to be processed contains at least one line of tilted text; a processing module, used to input the image to be processed into a tilted text detection model, extract features from the image to be processed through the Inception_V2 network in the tilted text detection model to obtain a feature map, extract sequence features from the feature map through the bidirectional long short-term memory (Bi-LSTM) network in the tilted text detection model, and input the sequence features into the fully connected network in the tilted text detection model to obtain the position coordinate information of each line of tilted text output by the fully connected network, wherein the position coordinate information represents the overall region of the corresponding line of tilted text; a first determining module, used to determine, for each line of tilted text in the at least one line of tilted text, the minimum bounding rectangle of the tilted text region based on the position coordinate information of the tilted text, and to obtain the tilt angle of the tilted text based on the minimum bounding rectangle; and a second determining module, used to rotate the image to be processed based on the tilt angle corresponding to each of the at least one line of tilted text, so as to obtain the correction result corresponding to the image to be processed.
8. An electronic device, characterized by comprising: a memory and a processor; wherein the memory is used to store program instructions, and the processor is configured to invoke the program instructions to execute the method for processing tilted text images as described in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, are used to implement the method for processing tilted text images as described in any one of claims 1 to 6.