Image correction method and device, electronic equipment and storage medium
An image correction technology applied in the field of image processing. It addresses the problems of inconvenient character recognition, low detection accuracy, and the inability to accurately detect document boundaries, and achieves high precision and a good correction effect.
Active Publication Date: 2022-03-08
BEIJING CENTURY TAL EDUCATION TECH CO LTD
19 Cites 0 Cited by
AI-Extracted Technical Summary
Problems solved by technology
[0002] With the continuous development of computer technology, electronic documents are used ever more widely. Converting paper documents into electronic documents requires shooting or scanning with cameras, scanners, smart terminals, and similar tools. During shooting or scanning, the resulting document image inevitably becomes distorted and deformed, which hinders subsequent operations such as text recognition.
However, the existing i...
Abstract
The invention relates to an image correction method. The method comprises the steps of: obtaining a target image and normalizing it; inputting the normalized target image into a pre-trained neural network model comprising a feature extraction module, a classification module, and a calculation module, wherein the feature extraction module extracts feature information of the target image, the classification module classifies the target image according to the feature information, and the calculation module generates first corner coordinates corresponding to the target image according to the feature information; if it is determined from the classification result that the target image contains a document, obtaining the first corner coordinates; and correcting the target image according to the first corner coordinates to obtain a corrected target image. The invention detects the document in the image to obtain its boundary and corrects the document, with high precision and a good correction effect.
Examples
- Experimental program(1)
Example Embodiment
[0032] In order to understand the above objects, features, and advantages of the present disclosure, embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; on the contrary, these embodiments are provided for a more thorough and complete understanding. It should be understood that the accompanying drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of the present disclosure.
[0033] It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
[0034] The term "comprising" as used herein and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the following description. It should be noted that references to "first", "second", and the like in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to define the order of the functions performed by these devices, modules, or units, or their interdependence.
[0035] It should be noted that the modifiers "a" and "an" mentioned in this disclosure are illustrative rather than restrictive, and those skilled in the art will appreciate that, unless the context clearly indicates otherwise, they should be understood as "one or more".
[0036] The names of messages or information exchanged between the plurality of devices in the present disclosure are for illustrative purposes only, and are not intended to limit the scope of such messages or information.
[0037] Specifically, the image correction method can be performed by a terminal or a server; that is, the terminal or server corrects the document content in the target image through a neural network model. The executing body of the training method of the neural network model may be the same as, or different from, that of the image correction method.
[0038] For example, in one application scenario, as shown in FIG. 1, which is a schematic diagram of an application scenario provided by an embodiment of the present disclosure, a server 12 trains a neural network model. A terminal 11 acquires the trained neural network model from the server 12, and the terminal 11 corrects the document content in the target image using the trained neural network model. The target image may be captured by the terminal 11, acquired by the terminal 11 from another device, or obtained after the terminal 11 performs image processing on a preset image, where the preset image may likewise be captured by the terminal 11 or acquired from another device. The other devices are not specifically limited here.
[0039] In another application scenario, the server 12 trains the neural network model, and the server 12 then corrects the document content in the target image using the trained neural network model. The manner in which the server 12 acquires the target image is similar to that described above and is not repeated here.
[0040] In yet another application scenario, the terminal 11 trains the neural network model, and the terminal 11 then corrects the document content in the target image using the trained neural network model.
[0041] It will be appreciated that the neural network model training method and the image correction method provided in the present disclosure are not limited to the possible scenarios described above. Since the trained neural network model is used by the image correction method, the neural network model training method is described before the image correction method is introduced.
[0042] The following takes the server 12 training the neural network model as an example to introduce the neural network model training method, that is, the training process of the neural network model. It will be appreciated that the training method also applies to the scenario in which the terminal 11 trains the neural network model.
[0043] FIG. 2 is a flowchart of a neural network model training method provided by an embodiment of the present disclosure, including the following steps S210 to S240 shown in FIG. 2:
[0044] S210, acquire document image samples and the label of each document image sample.
[0045] It will be appreciated that the terminal photographs a plurality of documents to obtain multiple document images and uploads them to the server; these document images serve as the training sample set for the neural network model. The terminal may be, for example, a mobile phone, and the server is the platform on which the neural network model is trained. The document content in each captured document image is complete, that is, the image includes all four boundaries of the document content, although the document may be tilted or photographed in perspective. After the server obtains the document image samples, it tags each of them; the label acquired in S210 is the label of a valid document, which facilitates the subsequent training of the classification module of the neural network model. The classification module performs binary classification: the classification result is either an image of a valid document or an image of an invalid document, where an image of an invalid document can be understood as an image that contains no document content, or only a relatively small part of the document content.
[0046] S220, label the corner points of the document in each document image sample to obtain third corner coordinates.
[0047] It will be appreciated that after S210, once the server has received the document image samples, the four corner points of the document in each document image sample are labeled with a labeling tool to obtain four third corner coordinates. A corner point may be the endpoint of a line segment, a point of maximum curvature, or the intersection of line segments. The following description takes as an example the four intersection points of the four boundary lines of the document in the document image sample.
[0048] Optionally, labeling the corner points of the document in the document image sample to obtain the third corner coordinates includes: sequentially acquiring multiple contour lines of the document in the document image sample in a given order; and calculating the intersections of the contour lines to generate at least one third corner coordinate.
[0049] It can be understood that the step of obtaining the third corner coordinates in S220 includes: using the labeling tool to sequentially acquire the contour lines of the document content in the document image sample in a given order, specifically the four contour lines of the document content. The four contour lines are labeled in clockwise order and distinguished from one another; for example, the upper contour line of the document content is labeled red, the right contour line green, the lower contour line blue, and the left contour line purple. After the four contour lines are labeled, the intersection of each pair of adjacent contour lines is calculated, generating four third corner coordinates.
[0050] For example, see FIG. 3, a sample image provided by an embodiment of the present disclosure. FIG. 3 includes a labeled document image sample 310, which contains the four contour lines of the document content, namely an upper contour line 311, a right contour line 312, a lower contour line 313, and a left contour line 314, distinguished by different colors. FIG. 3 also includes the four corner points corresponding to the document content, shown as black dots, where the intersection of the left contour line 314 and the upper contour line 311 is denoted corner point 1, the intersection of the upper contour line 311 and the right contour line 312 is corner point 2, the intersection of the right contour line 312 and the lower contour line 313 is corner point 3, and the intersection of the lower contour line 313 and the left contour line 314 is corner point 4.
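As an illustration of the intersection computation, the following is a minimal Python sketch that intersects two labeled contour lines using homogeneous coordinates; the function name and the example endpoints are hypothetical and not part of the embodiment.

```python
import numpy as np

def contour_intersection(p1, p2, p3, p4):
    """Intersect the line through p1, p2 with the line through p3, p4.

    Each point is an (x, y) pair; returns the intersection, i.e. one corner
    point such as corner point 1 of FIG. 3, or None if the lines are parallel.
    """
    l1 = np.cross(np.append(p1, 1.0), np.append(p2, 1.0))  # line through p1, p2
    l2 = np.cross(np.append(p3, 1.0), np.append(p4, 1.0))  # line through p3, p4
    pt = np.cross(l1, l2)                                  # homogeneous intersection
    if abs(pt[2]) < 1e-9:                                  # parallel contour lines
        return None
    return (pt[0] / pt[2], pt[1] / pt[2])

# Hypothetical endpoints of the left contour line 314 and upper contour line 311:
corner_1 = contour_intersection((12, 480), (15, 30), (15, 30), (500, 40))
```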
[0051] S230, transform the document image samples according to the third corner coordinates to obtain a plurality of augmented document image samples, each augmented document image sample having its own corresponding third corner coordinates.
[0052] It will be appreciated that, on the basis of S220, transformations that randomly simulate the tilt and perspective (trapezoidal) distortion of a mobile phone photograph, as well as lighting changes, contrast changes, blur, noise, and so on, are applied to the document image samples, yielding a massive set of augmented document image samples. There is no need to separately label third corner coordinates for the augmented samples: the third corner coordinates of the original document image sample can be transformed synchronously, using the same transformation rule applied to the sample, to obtain the third corner coordinates corresponding to each augmented document image sample. This reduces the labeling work required for augmentation.
[0053] Optionally, transforming the document image samples according to the third corner coordinates in S230 to obtain the plurality of augmented document image samples specifically includes: transforming the third corner coordinates based on a preset transformation matrix to obtain fourth corner coordinates; determining the parameters of the preset transformation matrix according to the third corner coordinates and the fourth corner coordinates; and transforming the document image samples according to the preset transformation matrix with the determined parameters to obtain the plurality of augmented document image samples.
[0054] It will be appreciated that transforming the document image samples in S230 includes the following steps. First, the labeled third corner coordinates are transformed by the preset transformation matrix to obtain the transformed fourth corner coordinates. The preset transformation matrix can be understood as a perspective transformation matrix; for example, the four third corner coordinates are each transformed by the perspective transformation matrix to obtain four corresponding fourth corner coordinates. The parameters of the preset transformation matrix are then determined from the third and fourth corner coordinates. The preset transformation matrix may be a 3*3 matrix in which at least some parameters are unknown; the relationship between the third and fourth corner coordinates can be used to solve for these unknown parameters. Finally, the document image sample and its labeled third corner coordinates are transformed by the preset transformation matrix with the determined parameters, yielding the augmented document image samples together with the third corner coordinates corresponding to each of them.
[0055] For example, see FIG. 3, which includes augmented images 320 and 330 generated from the document image sample 310; these can be regarded as perspective-transformed versions of the document image sample 310 obtained with preset transformation matrices whose parameters have been determined. It will be appreciated that the blank area in the augmented image 330, i.e., the region left over around the warped document image sample 310, is filled with constant pixel values to ensure that the augmented image 330 and the document image sample 310 have the same size (width and height). The fill value may be 128 or 0, and can be set according to user requirements.
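For illustration only, a minimal OpenCV sketch of one perspective augmentation step as described above: the sample is warped by the preset transformation matrix, the labeled third corner coordinates are transformed by the same matrix instead of being re-labeled, and the leftover blank area is filled with a constant value such as 128. Function and parameter names are our own assumptions.

```python
import cv2
import numpy as np

def augment_sample(image, corners, dst_corners, fill=128):
    """Warp a labeled document image sample and synchronize its corner labels.

    corners / dst_corners: 4x2 arrays of third / fourth corner coordinates.
    The blank area left after warping is filled with a constant pixel value
    (128 or 0, per the embodiment) so the augmented sample keeps the same size.
    """
    h, w = image.shape[:2]
    src = np.asarray(corners, dtype=np.float32)
    dst = np.asarray(dst_corners, dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)      # 3x3 preset transformation matrix
    warped = cv2.warpPerspective(image, M, (w, h),
                                 borderValue=(fill, fill, fill))
    # Apply the same matrix to the labels, so no re-labeling is needed.
    new_corners = cv2.perspectiveTransform(src.reshape(1, 4, 2), M).reshape(4, 2)
    return warped, new_corners
```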
[0056] Optionally, determining the parameters of the preset transformation matrix from the third and fourth corner coordinates specifically includes: generating a first matrix according to the third corner coordinates and the fourth corner coordinates; generating a second matrix according to the fourth corner coordinates; and determining the parameters of the preset transformation matrix according to the first matrix and the second matrix.
[0057] It will be appreciated that the transformation matrix is described by the following formulas (1) to (5). The preset transformation matrix and its parameters are shown in formula (1).
[0058] M = [[a, b, c],
            [d, e, f],
            [g, h, 1]]    Formula (1)
[0061] where M denotes the 3*3 preset transformation matrix, a to h are its unknown parameters, and the last element of the third row is fixed to 1.
[0062] x_ = (a*x + b*y + c) / (g*x + h*y + 1)
[0063] y_ = (d*x + e*y + f) / (g*x + h*y + 1)    Formula (2)
[0064] where (x, y) is a third corner coordinate and (x_, y_) is the corresponding fourth corner coordinate; formula (2) shows how a third corner coordinate generates a fourth corner coordinate from the parameters of the preset transformation matrix.
[0065] It will be appreciated that the four third corner coordinates of a document image sample are denoted (x0, y0), (x1, y1), (x2, y2), (x3, y3); these form a first array, denoted SRC = [[x0, y0], [x1, y1], [x2, y2], [x3, y3]]. The four transformed fourth corner coordinates are denoted (x_0, y_0), (x_1, y_1), (x_2, y_2), (x_3, y_3), and form a second array, denoted DST = [[x_0, y_0], [x_1, y_1], [x_2, y_2], [x_3, y_3]], which facilitates solving for the parameters of the preset transformation matrix.
[0066] It will be appreciated that the first matrix is generated from the third and fourth corner coordinates, that is, the first matrix M_SD is generated from the first array SRC and the second array DST, as shown in formula (3).
[0067] M_SD = [[x0, y0, 1, 0, 0, 0, -x0*x_0, -y0*x_0],
               [0, 0, 0, x0, y0, 1, -x0*y_0, -y0*y_0],
               [x1, y1, 1, 0, 0, 0, -x1*x_1, -y1*x_1],
               [0, 0, 0, x1, y1, 1, -x1*y_1, -y1*y_1],
               [x2, y2, 1, 0, 0, 0, -x2*x_2, -y2*x_2],
               [0, 0, 0, x2, y2, 1, -x2*y_2, -y2*y_2],
               [x3, y3, 1, 0, 0, 0, -x3*x_3, -y3*x_3],
               [0, 0, 0, x3, y3, 1, -x3*y_3, -y3*y_3]]    Formula (3)
[0075] It will be appreciated that the second matrix is generated from the fourth corner coordinates, that is, the second matrix T_SD is generated from the second array described above, as shown in formula (4).
[0076] T_SD = [[x_0], [y_0], [x_1], [y_1], [x_2], [y_2], [x_3], [y_3]] Formula (4)
[0077] It will be appreciated that the parameters of the preset transformation matrix are determined from the first and second matrices, that is, the eight parameters of the preset transformation matrix are obtained from the first matrix M_SD and the second matrix T_SD, as shown in formula (5).
[0078] P_M_SD = M_SD_i * T_SD    Formula (5)
[0079] where P_M_SD is the array of the eight parameters of the preset transformation matrix, M_SD_i is the inverse of the first matrix M_SD, and T_SD is the second matrix.
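For illustration, a Python sketch that assembles the first matrix M_SD of formula (3) and the second matrix T_SD of formula (4) from the four coordinate pairs and then solves formula (5) for the eight parameters. np.linalg.solve is used instead of forming the inverse explicitly, which is numerically equivalent here; for four point pairs the result matches what OpenCV's cv2.getPerspectiveTransform computes.

```python
import numpy as np

def solve_preset_matrix(src_pts, dst_pts):
    """Solve the eight unknown parameters a..h of the preset transformation matrix.

    src_pts: four third corner coordinates (x, y).
    dst_pts: four fourth corner coordinates (x_, y_).
    """
    rows = []
    for (x, y), (x_, y_) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -x * x_, -y * x_])
        rows.append([0, 0, 0, x, y, 1, -x * y_, -y * y_])
    m_sd = np.asarray(rows, dtype=np.float64)                   # formula (3)
    t_sd = np.asarray(dst_pts, dtype=np.float64).reshape(8, 1)  # formula (4)
    p_m_sd = np.linalg.solve(m_sd, t_sd)                        # formula (5)
    a, b, c, d, e, f, g, h = p_m_sd.ravel()
    return np.array([[a, b, c], [d, e, f], [g, h, 1.0]])        # formula (1)
```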
[0080] S240, train the neural network model according to the plurality of augmented document image samples and the third corner coordinates corresponding to each of them.
[0081] It will be appreciated that, on the basis of S230, the document image samples are input into the model, which outputs a predicted classification result and predicted third corner coordinates for each sample. The predicted classification result and predicted third corner coordinates output by the model are compared with the preset label and labeled coordinates of the document image sample to calculate loss functions, and the parameters of the neural network model are updated according to the calculated values of the loss functions.
[0082] For example, see FIG. 4, a structural diagram of a neural network model provided by an embodiment of the present disclosure. FIG. 4 includes a feature extraction module 410, a classification module 420, and a calculation module 430. The feature extraction module 410 includes a backbone network layer and a pooling layer; the backbone network layer may be a residual network, specifically a ResNet-101 network, and the pooling layer may be an adaptive average pooling layer (AdaptiveAvgPool). The backbone network layer is used to extract deep features of the document image sample, and the pooling layer is used to filter useless features, which can be understood as background features, out of the deep features. The classification module 420 includes at least one convolutional layer and is configured to classify the document image sample based on the deep feature information output by the feature extraction module 410, to determine whether the document image sample is an image containing a valid document. The calculation module 430 includes at least one convolutional layer and is used to calculate the corner coordinates of the document in the document image sample according to the deep feature information output by the feature extraction module 410, for example the coordinates of the four corner points in FIG. 3, which facilitates the subsequent correction of the document image sample according to the corner coordinates.
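By way of illustration, one way the structure of FIG. 4 could be realized in PyTorch is sketched below. The exact backbone truncation and head configuration are assumptions chosen to match the shapes described later in paragraphs [0093] to [0097]: 1024-channel backbone features, 5*5 pooled feature maps, a single-value classification output, and an 8-value corner output.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class DocCornerNet(nn.Module):
    """Sketch of FIG. 4: feature extraction module (backbone + pooling),
    classification module, and calculation module."""

    def __init__(self):
        super().__init__()
        backbone = resnet101(weights=None)
        # Keep the ResNet-101 stages up to the 1024-channel stage (layer3);
        # this split is an assumption consistent with the described shapes.
        self.backbone = nn.Sequential(*list(backbone.children())[:-3])
        self.pool = nn.AdaptiveAvgPool2d((5, 5))           # compress to 5x5
        self.cls_head = nn.Conv2d(1024, 1, kernel_size=5)  # classification module
        self.reg_head = nn.Conv2d(1024, 8, kernel_size=5)  # calculation module

    def forward(self, x):                                  # x: n x 3 x 512 x 512
        feat = self.pool(self.backbone(x))                 # n x 1024 x 5 x 5
        cls = self.cls_head(feat).flatten(1)               # n x 1 document logit
        corners = self.reg_head(feat).flatten(1)           # n x 8 corner coordinates
        return cls, corners
```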
[0083] In the neural network model training method provided by an embodiment of the present disclosure, the acquired document image samples are transformed to obtain a massive set of augmented samples, and the neural network model is then trained on these samples. With sufficient training samples, the accuracy of the trained neural network model is relatively high, so that it can accurately perform operations such as classification and corner-coordinate calculation, and training is also relatively fast.
[0084] FIG. 5 is a flowchart of a neural network model training method provided by an embodiment of the present disclosure. Optionally, training the neural network model according to the plurality of augmented document image samples and the third corner coordinates corresponding to each of them, that is, the internal training process of the neural network model, includes the following steps S510 to S560 shown in FIG. 5:
[0085] S510, normalize the third corner coordinates of each document image sample according to the size of that sample.
[0086] It will be appreciated that the server normalizes the third corner coordinates corresponding to each document image sample according to the size of the sample, that is, its width and height, as shown in formula (6).
[0087] x1 = float(x) / im_w
[0088] y1 = float(y) / im_h    Formula (6)
[0089] where (x, y) is a third corner coordinate, (x1, y1) is the normalized third corner coordinate, and im_w and im_h are the width and height of the document image sample.
[0090] S520, normalize each of the plurality of document image samples to a preset width and height.
[0091] It will be appreciated that, after S510, the massive set of document image samples generated by augmentation is normalized, that is, each sample is scaled up or down to the same size, for example 512*512.
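A minimal sketch of S510 and S520 together, assuming NumPy and OpenCV: the third corner coordinates are divided by the sample's width and height per formula (6), and the image itself is scaled to the preset 512*512 size. The function name is hypothetical.

```python
import cv2
import numpy as np

def normalize_sample(image, corners, size=512):
    """Normalize one document image sample and its third corner coordinates."""
    im_h, im_w = image.shape[:2]
    # Formula (6): divide each (x, y) by the sample width / height.
    norm_corners = np.asarray(corners, dtype=np.float32) / [im_w, im_h]
    resized = cv2.resize(image, (size, size))  # scale to the preset size
    return resized, norm_corners
```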
[0092] S530, the feature extraction module extracts the feature information of the normalized document image samples.
[0093] It will be appreciated that, on the basis of S520, the feature extraction module in the neural network model extracts the feature information of the normalized document image samples, that is, of the 512*512 document image samples described above. This feature information mainly comprises the features of the document content, while the background features of the document image sample are filtered out as far as possible, which effectively improves the accuracy of the neural network model. For example, suppose n document image samples are input into the neural network model, each with c channels, width w, and height h; an RGB image has 3 channels. The backbone network layer in the feature extraction module extracts features from the input samples and outputs n feature maps of size h*w*1024, where h*w is the spatial size of the features extracted by the backbone. The n h*w*1024 feature maps are then input into the pooling layer, which compresses them and outputs n feature maps of size 5*5*1024; the spatial size after compression is 5*5, the output size of the pooling layer being set to (5, 5).
[0094] S540, the classification module classifies each document image sample according to the feature information to obtain a predicted label, and a first loss value is calculated with a first loss function from the predicted label and the label of the document image sample.
[0095] It will be appreciated that the classification module in the neural network model classifies according to the feature information output by the feature extraction module, that is, the classification module classifies based on the n 5*5*1024 feature maps and obtains n predicted labels, one per document image sample; the predicted label indicates either an image containing a document or an image not containing a document. The classification module includes a convolutional layer with a 5*5 kernel, so each of the n predicted outputs is of size 1*1*1, that is, a single classification result per sample. After the predicted labels are obtained, the predicted label of each document image and its preset label are used as the inputs of the first loss function to calculate the first loss value; the first loss function is a binary-classification cross-entropy loss function.
[0096] S550, the calculation module calculates predicted corner coordinates of each document image sample according to the feature information, and a second loss value is calculated with a second loss function from the predicted corner coordinates and the third corner coordinates.
[0097] It will be appreciated that, on the basis of S530, the calculation module in the neural network model calculates the four predicted corner coordinates of the document in each document image sample from the feature information output by the feature extraction module, that is, the calculation module computes corner coordinates from the n 5*5*1024 feature maps, obtaining the predicted corner coordinates corresponding to the n document image samples, namely four predicted corner coordinates per sample. The calculation module includes a convolutional layer with a 5*5 kernel, and the predicted corner coordinates output for each document image sample form a 1*1*8 tensor: spatial size 1*1, with 8 channels. After the predicted corner coordinates corresponding to each document image sample are determined, the predicted corner coordinates and the third corner coordinates of the sample are used as the inputs of the second loss function to calculate the second loss value; the second loss function is a regression loss on the corner coordinates.
[0098] S560, update the parameters of the neural network model according to the first loss value and the second loss value.
[0099] It will be appreciated that updating the parameters of the neural network model according to the first and second loss values means updating the network parameters of the feature extraction module, the classification module, and the calculation module in the neural network model.
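A sketch of one S540-S560 training iteration, assuming the model sketched after paragraph [0082]. The first loss is the binary cross-entropy named in paragraph [0095]; the second loss is rendered here as smooth L1, and the equal weighting of the two losses is also our assumption, since the source specifies neither.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, labels, corners):
    """One training iteration.

    images: n x 3 x 512 x 512 normalized samples;
    labels: n x 1 float tensor (1 = valid document, 0 = invalid);
    corners: n x 8 normalized third corner coordinates.
    """
    cls_logits, pred_corners = model(images)
    loss_cls = F.binary_cross_entropy_with_logits(cls_logits, labels)  # first loss
    loss_reg = F.smooth_l1_loss(pred_corners, corners)                 # second loss
    loss = loss_cls + loss_reg   # assumed equal weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()             # S560: update all three modules' parameters
    return loss.item()
```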
[0100] In the neural network model training method provided by the present disclosure, after the massive set of augmented images is obtained, the sizes of the augmented images and the third corner coordinates are normalized, which facilitates the subsequent loss calculation. The normalized augmented images are input into the neural network model, and the loss values corresponding to the classification module and the calculation module are calculated to update the network parameters of each layer in the neural network model. This improves the accuracy of each module, and in turn of the entire neural network model, and the model training effect is better.
[0101] On the basis of the above embodiments, FIG. 6 is a flowchart of an image correction method provided by an embodiment of the present disclosure, that is, the process of applying the trained neural network model. Taking as an example a terminal that captures an image and uploads it to a server, the method includes the following steps S610 to S640 shown in FIG. 6:
[0102] S610, acquire the target image and normalize the target image.
[0103] It will be appreciated that the terminal photographs a document to generate the target image and then transmits it to the server. The server receives the target image and normalizes it; normalization here means scaling the target image to the same size as the document image samples used when training the neural network model, for example 512*512.
[0104] S620, input the normalized target image into a pre-trained neural network model, where the neural network model includes a feature extraction module, a classification module, and a calculation module; the feature extraction module extracts the feature information of the target image, the classification module classifies the target image according to the feature information, and the calculation module generates the first corner coordinates corresponding to the target image according to the feature information.
[0105] It will be appreciated that, on the basis of S610, the normalized target image is input into the pre-trained neural network model, which outputs a classification result for the target image and the first corner coordinates corresponding to the target image. The neural network model includes a feature extraction module, a classification module, and a calculation module. The feature extraction module is used to extract feature information about the document in the target image; the classification module is used to classify the target image according to the feature information, so that it can be determined from the classification result whether the target image contains a valid document; and the calculation module is used to generate the first corner coordinates corresponding to the target image according to the feature information. If the target image contains no valid document, the first corner coordinates may be meaningless. The classification module and the calculation module can run simultaneously.
[0106] S630, if it is determined from the classification result that the target image contains a document, obtain the first corner coordinates.
[0107] It will be appreciated that, after S620, the neural network model outputs the classification result corresponding to the target image, and whether the target image contains a valid document is determined from the classification result. A valid document can be understood as a complete document or most of a document; the scope of what counts as a valid document can be set according to user requirements via the labels of the document image samples used to train the neural network model. The classification result can be text or a number: for example, if the output is 1, the target image contains a valid document; if the output is 0, it does not. If it is determined from the classification result that the target image contains a document, the first corner coordinates output by the model are acquired; if it is determined that the target image does not contain a document, the correction process simply ends.
[0108] S640, correct the target image according to the first corner coordinates to obtain the corrected target image.
[0109] Optionally, correcting the target image according to the first corner coordinates to obtain the corrected target image specifically includes: obtaining second corner coordinates according to the size of the target image and the first corner coordinates, and correcting the target image based on the second corner coordinates to obtain the corrected target image.
[0110] It will be appreciated that, on the basis of S630, the second corner coordinates are obtained from the size of the target image and the first corner coordinates, that is, the first corner coordinates output by the model are mapped back to the original (target) image. From the training process of the neural network model, the first corner coordinates output by the neural network model correspond to the normalized target image, so the first corner coordinates obtained on the normalized target image must be converted into corner coordinates of the un-normalized target image; that is, the first corner coordinates (x1, y1) output by the model are mapped to the second corner coordinates (x, y), as shown in formula (7). After the second corner coordinates are determined, all pixels of the target image are inversely perspective-transformed based on the second corner coordinates to obtain the corrected target image, for example the corrected image 340 shown in FIG. 3, in which the document is no longer distorted or tilted.
[0111] x = int(x1 * im_w)
[0112] y = int(y1 * im_h)    Formula (7)
[0113] where (x, y) is a second corner coordinate, (x1, y1) is a first corner coordinate, and im_w and im_h are the width and height of the target image.
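A minimal OpenCV sketch of S640: the model's first corner coordinates are mapped back to the target image size per formula (7), and the document quadrilateral is then warped to an upright rectangle by an inverse perspective transformation. The corner ordering (clockwise from the top-left, matching the labeling order of S220) and the output size are assumptions.

```python
import cv2
import numpy as np

def correct_document(image, first_corners, out_w=1000, out_h=1400):
    """Correct the target image given the model's normalized corner output.

    first_corners: 4x2 corners in [0, 1], ordered clockwise from top-left.
    """
    im_h, im_w = image.shape[:2]
    # Formula (7): map first corner coordinates back to the original size.
    src = np.asarray(first_corners, dtype=np.float64) * [im_w, im_h]
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src.astype(np.float32), dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))
```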
[0114] It will be appreciated that, after S640, once the corrected target image is obtained, it can be cropped according to the corner coordinates to obtain the valid document region, and character recognition can then be performed on the valid document region map to obtain character recognition results; the accuracy of the character recognition is relatively high, and its speed is also relatively fast.
[0115] In the image correction method provided by the present disclosure, the acquired target image is normalized, the neural network model outputs the classification result and the first corner coordinates of the target image, and it is then determined from the classification result whether the target image contains a valid document. Before correcting the target image, the method first checks whether it contains a valid document, avoiding the case of correcting a target image that contains no document; the method is therefore more flexible and effectively reduces wasted resources. If it is determined from the classification result that the target image contains a valid document, the first corner coordinates are obtained and the target image is corrected; if the target image contains no document, the correction process ends directly. The correction method provided in this disclosure has relatively high accuracy, relatively fast correction speed, and good flexibility.
[0116] FIG. 7 is a structural diagram of an image correction device provided by an embodiment of the present disclosure. The image correction device can perform the processing flow provided by the image correction method embodiments. As shown in FIG. 7, the image correction device 700 includes:
[0117] a first acquisition unit 710, configured to acquire the target image and normalize the target image;
[0118] a processing unit 720, configured to input the normalized target image into a pre-trained neural network model, where the neural network model includes a feature extraction module, a classification module, and a calculation module; the feature extraction module extracts the feature information of the target image, the classification module classifies the target image according to the feature information, and the calculation module generates the first corner coordinates corresponding to the target image according to the feature information;
[0119] a second acquisition unit 730, configured to acquire the first corner coordinates if it is determined from the classification result that the target image contains a document; and
[0120] a correction unit 740, configured to correct the document in the target image according to the first corner coordinates to obtain the corrected target image.
[0121] Optionally, when correcting the target image according to the first corner coordinates to obtain the corrected target image, the correction unit 740 is specifically configured to:
[0122] obtain the second corner coordinates according to the size of the target image and the first corner coordinates; and
[0123] inversely perspective-transform the target image based on the second corner coordinates to obtain the corrected target image.
[0124] Optionally, the device 700 also includes a training unit, which is specifically configured to:
[0125] acquire document image samples and the label of each document image sample;
[0126] label the corner points of the document in each document image sample to obtain third corner coordinates;
[0127] transform the document image samples according to the third corner coordinates to obtain a plurality of augmented document image samples, each augmented document image sample having its own corresponding third corner coordinates; and
[0128] train the neural network model according to the plurality of augmented document image samples and the third corner coordinates corresponding to each of them.
[0129] Optionally, when labeling the corner points of the document in the document image sample to obtain the third corner coordinates, the training unit is specifically configured to:
[0130] sequentially acquire multiple contour lines of the document in the document image sample in a given order; and
[0131] calculate the intersections of the contour lines to generate at least one third corner coordinate.
[0132] Optionally, when transforming the document image samples according to the third corner coordinates to obtain the plurality of augmented document image samples, the training unit is specifically configured to:
[0133] transform the third corner coordinates according to the preset transformation matrix to obtain the fourth corner coordinates;
[0134] determine the parameters of the preset transformation matrix according to the third corner coordinates and the fourth corner coordinates; and
[0135] transform the document image samples according to the preset transformation matrix with the determined parameters to obtain the plurality of augmented document image samples.
[0136] Optionally, when determining the parameters of the preset transformation matrix according to the third corner coordinates and the fourth corner coordinates, the training unit is specifically configured to:
[0137] generate the first matrix according to the third corner coordinates and the fourth corner coordinates;
[0138] generate the second matrix according to the fourth corner coordinates; and
[0139] determine the parameters of the preset transformation matrix according to the first matrix and the second matrix.
[0140] Optionally, when training the neural network model according to the plurality of augmented document image samples and the third corner coordinates corresponding to each of them, the training unit is specifically configured to:
[0141] normalize the third corner coordinates corresponding to each document image sample according to the size of that sample;
[0142] normalize each of the plurality of document image samples to a preset width and height;
[0143] extract, by the feature extraction module, the feature information of the normalized document image samples;
[0144] classify, by the classification module, each document image sample according to the feature information to obtain a predicted label, and calculate the first loss value with the first loss function from the predicted label and the label of the document image sample;
[0145] calculate, by the calculation module, the predicted corner coordinates of each document image sample according to the feature information, and calculate the second loss value with the second loss function from the predicted corner coordinates and the third corner coordinates; and
[0146] update the parameters of the neural network model according to the first loss value and the second loss value.
[0147] The principle of the apparatus provided by this embodiment and the technical effects it achieves are the same as those of the foregoing method embodiments; for brevity, where this device embodiment is silent, reference may be made to the corresponding content of the foregoing method embodiments.
[0148] An exemplary embodiment of the present disclosure also provides an electronic device comprising: at least one processor; and a memory connected to the at least one processor. The memory stores a computer program executable by the at least one processor, and when the computer program is executed by the at least one processor, the electronic device performs the method according to the embodiments of the present disclosure.
[0149] An exemplary embodiment of the present disclosure also provides a computer program product comprising a computer program, wherein, when the computer program is executed by a processor of a computer, the computer is caused to perform the method according to the embodiments of the present disclosure.
[0150] Referring to FIG. 8, a structural block diagram of an electronic device 800 that can serve as the server or client of the present disclosure will now be described; it is an example of a hardware device that can be applied to various aspects of the present disclosure. Electronic devices are intended to represent various forms of digital electronic computer equipment, such as laptops, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the present disclosure described herein.
[0151] As shown in FIG. 8, the electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store the various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
[0152] A plurality of components in the electronic device 800 are connected to the I/O interface 805, including an input unit 806, an output unit 807, the storage unit 808, and a communication unit 809. The input unit 806 can be any type of device capable of inputting information to the electronic device 800; it can receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 807 can be any type of device that can present information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 808 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices over computer networks such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth(TM) device, a WiFi device, a WiMAX device, a cellular communication device, and/or the like.
[0153] The computing unit 801 can be a general-purpose and/or dedicated processing component with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above. For example, in some embodiments, the image correction method or the training method of the neural network model can be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 can be configured to perform the image correction method or the training method of the neural network model by any other suitable means (for example, by means of firmware).
[0154] The program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. Such program code can be provided to a processor or controller of a general-purpose computer, a dedicated computer, or another programmable data processing device, such that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program code can be executed entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.
[0155] In the context of the present disclosure, the machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
[0156] As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device for providing machine instructions and/or data to a programmable processor (for example, a magnetic disk, an optical disk, a memory, or a programmable logic device (PLD)), including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
[0157] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing device (for example, a mouse or a trackball) by which the user can provide input to the computer. Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
[0158] The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be connected to each other by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
[0159] A computer system can include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
[0160] The above are only specific embodiments of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.