A model training method, device, apparatus and medium
By adjusting the neural network parameters of the feature recognition model and the character recognition model using multi-dimensional labeled data, the method addresses the poor training effect of existing models and achieves more accurate text recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-16
AI Technical Summary
Existing text recognition models suffer from poor training performance and are unable to effectively recognize text content in images due to the limited dimensionality of the label data.
By inputting the feature tensors of the sample set into the feature recognition model, the first auxiliary model, the second auxiliary model, and the character recognition model, and using the output information of these models to adjust the neural network parameters of the feature recognition model and the character recognition model, the dimension of the label data is enriched, and the model training effect is improved.
The trained text recognition model can more accurately identify the text content in images, thus improving the model's recognition ability.
Figure CN115862029B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a model training method, apparatus, device, and medium.
Background Technology
[0002] Current techniques for training text recognition models rely primarily on supervised training with large amounts of labeled data. However, the labels in such data have limited dimensionality, so the resulting text recognition models cannot accurately capture the text content within images. Existing training methods therefore perform poorly, and the resulting models fail to adequately meet the demands of text recognition.
Summary of the Invention
[0003] This application provides a model training method, apparatus, device, and medium to improve the model training effect, so that the trained text recognition model better meets the needs of text recognition.
[0004] In a first aspect, embodiments of this application provide a model training method, which can be executed by a computer device. The method includes: inputting a sample set into a feature recognition model to obtain a feature tensor for each of at least one image; wherein the sample set includes the at least one image and, for each image, first character direction information, first character length information, and first character content information; inputting the feature tensor of each image into a first auxiliary model, a second auxiliary model, and a character recognition model; wherein the first auxiliary model is used to recognize the character direction information of the image based on the feature tensor of the image, the second auxiliary model is used to recognize the character length information of the image based on the feature tensor of the image, and the character recognition model is used to recognize the character content information of the image based on the feature tensor of the image; and adjusting the feature recognition model and the character recognition model according to the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model, to obtain a trained feature recognition model and a trained character recognition model.
[0005] In this scheme, the computer device inputs the feature tensors into the first auxiliary model, the second auxiliary model, and the character recognition model, and adjusts the feature recognition model and the character recognition model based on the label information in the sample set and the second character direction, length, and content information, ultimately obtaining the trained feature recognition model and the trained character recognition model. Because both models are adjusted repeatedly along the direction, length, and content dimensions of the images in the sample set (that is, the label data has richer dimensions), the trained models are more accurate. Furthermore, the first and second auxiliary models drive this continuous adjustment, so the feature tensors produced by the trained feature recognition model reflect the information of the sample images more faithfully, and the second character content information output by the trained character recognition model more closely approximates the first character content information.
[0006] Optionally, adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character direction information output by the first auxiliary model does not satisfy a first condition with the first character direction information, determining a first error based on the second character direction information output by the first auxiliary model and the first character direction information; and adjusting the neural network parameters of the feature recognition model based on the first error.
[0007] In this method, the computer device determines the first error based on the second character direction information and the first character direction information. Because the first error reflects the character direction information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the first error makes the feature tensors output by the adjusted feature recognition model reflect the information of the sample images increasingly faithfully. This yields a good training effect and ensures the soundness of the scheme.
[0008] Optionally, the method further includes: adjusting the neural network parameters of the first auxiliary model based on the first error.
[0009] In this way, the computer device can also adjust the neural network parameters of the first auxiliary model according to the first error, so that the second character direction information output by the adjusted first auxiliary model comes closer to the first character direction information, which improves the training effect and ensures the soundness of the solution.
[0010] Optionally, adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character length information output by the second auxiliary model does not satisfy the second condition with the first character length information, determining a second error based on the second character length information output by the second auxiliary model and the first character length information; and adjusting the neural network parameters of the feature recognition model based on the second error.
[0011] In this method, the computer device determines the second error based on the second character length information and the first character length information. Because the second error reflects the character length information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the second error makes the feature tensors output by the adjusted feature recognition model reflect the information of the sample images increasingly faithfully. This yields a good training effect and ensures the soundness of the scheme.
[0012] Optionally, the method further includes: adjusting the neural network parameters of the second auxiliary model based on the second error.
[0013] In this way, the computer device can also adjust the neural network parameters of the second auxiliary model according to the second error, so that the second character length information output by the adjusted second auxiliary model comes closer to the first character length information, which improves the training effect and ensures the soundness of the solution.
[0014] Optionally, adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character content information output by the character recognition model does not satisfy a third condition with the first character content information, determining a third error based on the second character content information output by the character recognition model and the first character content information; and adjusting the neural network parameters of at least one of the feature recognition model and the character recognition model based on the third error.
[0015] In this method, the computer device determines the third error based on the second character content information and the first character content information. Because the third error reflects the character content information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the third error makes the feature tensors output by the adjusted model reflect the information of the sample images more accurately, which yields good training performance and ensures the soundness of the solution. The computer device can also adjust the neural network parameters of the character recognition model based on the third error, so that the second character content information output by the adjusted model comes closer to the first character content information, with the same benefit.
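The error-to-module routing described in paragraphs [0006] to [0015] can be summarized in a short Python sketch. It assumes each condition has already been evaluated to a boolean; the function name plan_updates and the module labels are hypothetical, and the flag update_auxiliary stands for the optional adjustment of the auxiliary models. For the third error the text allows adjusting at least one of the two models; this sketch simply adjusts both.

```python
from typing import Dict, List


def plan_updates(direction_ok: bool, length_ok: bool, content_ok: bool,
                 update_auxiliary: bool = True) -> Dict[str, List[str]]:
    """Map each error that must be computed to the modules it adjusts.

    An error is only computed when its condition is NOT satisfied.
    Module names are hypothetical labels, not identifiers from the patent.
    """
    plan: Dict[str, List[str]] = {}
    if not direction_ok:
        # First error: always adjusts the feature model, optionally the first auxiliary model.
        plan["first_error"] = ["feature_model"] + (["first_aux_model"] if update_auxiliary else [])
    if not length_ok:
        # Second error: always adjusts the feature model, optionally the second auxiliary model.
        plan["second_error"] = ["feature_model"] + (["second_aux_model"] if update_auxiliary else [])
    if not content_ok:
        # Third error: the text allows "at least one of" the two; both are adjusted here.
        plan["third_error"] = ["feature_model", "char_model"]
    return plan
```

Note that the character recognition model only ever receives the third error; the direction and length errors shape it indirectly, through the shared feature tensors.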
[0016] Optionally, the method further includes: inputting the image to be recognized into the trained feature recognition model to obtain the feature tensor of the image to be recognized; and inputting the feature tensor of the image to be recognized into the trained character recognition model to obtain the character content information of the image to be recognized.
[0017] In this method, the computer device inputs the image to be recognized into the trained feature recognition model, whose output feature tensor reflects the information of that image more faithfully. That feature tensor is then input into the trained character recognition model, whose output reflects the character content of the image more accurately, which improves the completeness of the solution.
[0018] In a second aspect, embodiments of this application provide a character recognition method, which includes: inputting an image to be recognized into a trained feature recognition model to obtain a feature tensor of the image to be recognized; and inputting the feature tensor of the image to be recognized into a trained character recognition model to obtain character content information of the image to be recognized; wherein the trained feature recognition model and the trained character recognition model are trained by the model training method described in the first aspect or any optional embodiment of the first aspect.
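As a minimal sketch of this two-stage inference, assuming each trained model can be called as a plain function (the names recognize, feature_model, and char_model are hypothetical), note that neither auxiliary model is involved once training is complete:

```python
from typing import Any, Callable, List

FeatureTensor = List[List[float]]  # assumed matrix-like representation of a feature tensor


def recognize(image: Any,
              feature_model: Callable[[Any], FeatureTensor],
              char_model: Callable[[FeatureTensor], str]) -> str:
    """Run the trained feature model, then the trained character model."""
    features = feature_model(image)  # stage 1: image -> feature tensor
    return char_model(features)      # stage 2: feature tensor -> character content
```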
[0019] In a third aspect, this application provides a model training apparatus, which includes modules/units/technical means for performing the method described in the first aspect or any optional embodiment of the first aspect.
[0020] For example, the device may include:
[0021] The first input module is used to input a sample set into the feature recognition model; wherein, the sample set includes at least one image, first character direction information, first character length information, and first character content information for each image;
[0022] The processing module is used to obtain the feature tensor of each of the at least one image;
[0023] The first input module is further configured to: input the feature tensor of each image into the first auxiliary model, the second auxiliary model, and the character recognition model; wherein, the first auxiliary model is configured to recognize the character direction information of the image based on the feature tensor of the image, the second auxiliary model is configured to recognize the character length information of the image based on the feature tensor of the image, and the character recognition model is configured to recognize the character content information of the image based on the feature tensor of the image.
[0024] The processing module is further configured to: adjust the feature recognition model and the character recognition model according to the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model, so as to obtain the trained feature recognition model and the trained character recognition model.
[0025] Optionally, if the second character direction information output by the first auxiliary model does not satisfy the first condition with the first character direction information, the processing module is used to determine a first error based on the second character direction information output by the first auxiliary model and the first character direction information; and adjust the neural network parameters of the feature recognition model based on the first error.
[0026] Optionally, the processing module is further configured to: adjust the neural network parameters of the first auxiliary model based on the first error.
[0027] Optionally, if the second character length information output by the second auxiliary model does not satisfy the second condition with the first character length information, the processing module is used to determine a second error based on the second character length information output by the second auxiliary model and the first character length information; and adjust the neural network parameters of the feature recognition model based on the second error.
[0028] Optionally, the processing module is also used to: adjust the neural network parameters of the second auxiliary model according to the second error.
[0029] Optionally, if the second character content information output by the character recognition model does not satisfy the third condition with the first character content information, the processing module is used to determine a third error based on the second character content information output by the character recognition model and the first character content information; and adjust the neural network parameters of the feature recognition model based on the third error.
[0030] Optionally, the processing module is also used to: adjust the neural network parameters of the character recognition model based on the third error.
[0031] Optionally, the first input module is further configured to input the image to be recognized into the trained feature recognition model, and the processing module is further configured to obtain the feature tensor of the image to be recognized; the first input module is further configured to input the feature tensor of the image to be recognized into the trained character recognition model, and the processing module is further configured to obtain the character content information of the image to be recognized.
[0032] In a fourth aspect, this application provides a character recognition device, which includes modules/units/technical means for performing the method described in the second aspect or any optional embodiment of the second aspect.
[0033] For example, the device may include:
[0034] The second input module is used to input the image to be recognized into the trained feature recognition model;
[0035] The recognition module is used to obtain the feature tensor of the image to be recognized;
[0036] The second input module is also used to input the feature tensor of the image to be recognized into the trained character recognition model;
[0037] The recognition module is further configured to obtain character content information of the image to be recognized; wherein the trained feature recognition model and the trained character recognition model are trained by the model training method described in the first aspect or any optional implementation of the first aspect.
[0038] In a fifth aspect, this application provides an electronic device, including: at least one processor; and a memory and a communication interface communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor, by executing the instructions stored in the memory, causes the electronic device to perform, through the communication interface, the method described in the first aspect or any optional implementation thereof, or in the second aspect or any optional implementation thereof.
[0039] In a sixth aspect, this application provides a computer-readable storage medium storing instructions that, when executed, cause the method described in the first aspect or any optional implementation thereof, or in the second aspect or any optional implementation thereof, to be implemented.
Attached Figure Description
[0040] Figure 1 is a flowchart of a model training method provided in an embodiment of this application;
[0041] Figure 2 is a schematic diagram of ResNet15 provided in an embodiment of this application;
[0042] Figure 3 is a schematic diagram of a sample set provided in an embodiment of this application;
[0043] Figure 4 is a schematic diagram illustrating a model training method provided in an embodiment of this application;
[0044] Figure 5 is a schematic diagram of ResNet6 provided in an embodiment of this application;
[0045] Figure 6 is a structural diagram of a model training device provided in an embodiment of this application;
[0046] Figure 7 is a structural diagram of a character recognition device provided in an embodiment of this application;
[0047] Figure 8 is a structural diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application. Unless otherwise specified, the embodiments and features in the embodiments of this application can be arbitrarily combined with each other. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.
[0049] The terms "first" and "second" in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the term "comprising" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or other steps or units inherent to such processes, methods, products, or devices. The term "multiple" in this application means at least two, for example, two, three, or more, and the embodiments of this application impose no limitation on this.
[0050] The term "and/or" in the embodiments of this application merely describes an association between related objects and indicates that three relationships may exist. For example, A and/or B may represent: A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it.
[0051] To facilitate understanding of the solutions in the embodiments of this application, the possible application scenarios of the embodiments of this application will be introduced below.
[0052] In text recognition scenarios, text recognition models are typically used to identify the characters in an image. However, these models require extensive training with large amounts of data before they can be used for image recognition.
[0053] In practice, because the labels in the training data for text recognition models have only a single dimension, existing training methods yield poor training results, and the trained text recognition models cannot adequately meet the needs of text recognition.
[0054] The technical solution of the embodiments of this application is therefore provided to improve the model training effect, so that the trained text recognition model better meets the needs of text recognition.
[0055] Refer to Figure 1, which is a flowchart of a model training method provided in an embodiment of this application.
[0056] This method can be executed by a computer device such as a laptop, desktop computer, or server, and can also be applied to other devices or chips with computing capabilities. These devices are merely illustrative examples, and this application does not impose any limitations.
[0057] The following example illustrates how this method is executed by a computer device. The method includes:
[0058] S101: Input the sample set into the feature recognition model to obtain the feature tensor of each of the at least one image.
[0059] The sample set includes at least one image and, for each image, first character direction information, first character length information, and first character content information, all of which are known in advance.
[0060] For example, the feature recognition model can be a deep residual network (ResNet15). The computer device inputs the sample set into ResNet15, which generates a feature tensor for each image. Referring to Figure 2, ResNet15 is mainly composed of residual units, 3*3 convolutions, and pooling layers. Each residual unit adds a direct cross-layer (shortcut) connection around two consecutive 3*3 convolutions, which improves gradient flow, mitigates the vanishing and exploding gradient problems in ResNet15, and accelerates its convergence.
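The residual unit described above can be illustrated with a simplified, single-channel sketch in plain Python. It is not the embodiment's ResNet15: it assumes a ReLU activation between and after the convolutions, zero padding, stride 1, and no batch normalization or multiple channels, none of which the text specifies. The point is the identity shortcut: the unit outputs relu(F(x) + x), so the input passes straight through even when the convolutional branch F contributes nothing.

```python
from typing import List

Matrix = List[List[float]]


def conv3x3(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel 3x3 convolution (cross-correlation), zero padding, stride 1."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # positions outside count as zero
                        s += x[ii][jj] * k[di + 1][dj + 1]
            out[i][j] = s
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual_unit(x: Matrix, k1: Matrix, k2: Matrix) -> Matrix:
    """Two consecutive 3x3 convolutions plus an identity shortcut: relu(F(x) + x)."""
    f = conv3x3(relu(conv3x3(x, k1)), k2)  # the convolutional branch F(x)
    return relu(add(f, x))                 # the shortcut adds the unchanged input
```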
[0061] In one possible implementation, when the computer device inputs the sample set into the feature recognition model for the first time, it obtains a feature tensor for each of the at least one image. The feature tensor of an image is represented as a matrix, and at this initial stage the numbers in the matrix are random values.
[0062] Optionally, before inputting the sample set into the feature recognition model, the computer device can use a preset script to rotate multiple different initial sample images to obtain the sample set.
[0063] For a specific example, see Figure 3. The initial sample image consists of one image whose text is parallel to the horizontal direction, meaning the character direction is 0°. The computer device can use the preset script to rotate the initial sample image counter-clockwise, obtaining four images with character directions of 0°, 90°, 180°, and 270°. The sample set then includes the four images and, for each image, its first character direction information, first character length information, and first character content information.
[0064] It is understandable that the above uses only one initial sample image and four character directions as an example. In practice, more images and directions may be used, and this application does not impose any restrictions.
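The rotation step in [0062] to [0064] could be scripted roughly as follows. This is a hypothetical stand-in for the unspecified preset script: images are modeled as 2D grids of pixel values, only multiples of 90° are supported, and the first character length information is assumed to be the number of characters in the content string (7 for AM10:00).

```python
from typing import List, Tuple

Image = List[List[int]]


def rotate_ccw(img: Image) -> Image:
    """Rotate a 2D grid 90 degrees counter-clockwise: transpose, then reverse the row order."""
    return [list(row) for row in zip(*img)][::-1]


def build_sample_set(img: Image, content: str,
                     directions: Tuple[int, ...] = (0, 90, 180, 270)
                     ) -> List[Tuple[Image, int, int, str]]:
    """Return (image, first direction, first length, first content) for each direction."""
    samples = []
    for d in directions:
        if d % 90 != 0:
            raise ValueError("only multiples of 90 degrees are supported in this sketch")
        rotated = img
        for _ in range((d // 90) % 4):  # apply one 90-degree CCW turn per step
            rotated = rotate_ccw(rotated)
        samples.append((rotated, d, len(content), content))
    return samples
```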
[0065] S102: Input the feature tensor of each image into the first auxiliary model, the second auxiliary model, and the character recognition model.
[0066] The first auxiliary model is used to identify the character direction information of an image based on the image's feature tensor. The second auxiliary model is used to identify the character length information of an image based on the image's feature tensor. The character recognition model is used to identify the character content information of an image based on the image's feature tensor.
[0067] After the computer device inputs the feature tensor of each image into the first auxiliary model, the second auxiliary model, and the character recognition model, the first auxiliary model outputs the second character direction information, the second auxiliary model outputs the second character length information, and the character recognition model outputs the second character content information.
[0068] See Figure 4. The sample set includes 4 images whose first character direction information is known in advance to be 0°, 90°, 180°, and 270°, respectively. The first character length information of each of the 4 images is 7, and the first character content information of each is AM10:00. The first auxiliary model can output second character direction information of 0°, 90°, 180°, and 270°, the second auxiliary model can output second character length information of 7, and the character recognition model can output second character content information of AM10:00.
[0069] In one possible implementation, the first auxiliary model uses a fully connected (FC) layer to output the second character direction information.
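A fully connected direction head can be sketched as one logit per candidate direction followed by an argmax. The sketch assumes the feature tensor has already been flattened into a vector and that the four directions of the sample-set example are the only classes; the weights in the usage test are made up for illustration, not trained values.

```python
from typing import List, Sequence

DIRECTIONS = (0, 90, 180, 270)  # assumed candidate character directions, in degrees


def fc_layer(features: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output (logit) per row of the weight matrix."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def predict_direction(features: Sequence[float], weights: Sequence[Sequence[float]],
                      bias: Sequence[float]) -> int:
    """Return the direction whose logit is largest."""
    logits = fc_layer(features, weights, bias)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return DIRECTIONS[best]
```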
[0070] In one possible implementation, the second auxiliary model uses a deep residual network (ResNet6) and fully connected layers to output the second character length information. Figure 5 is a schematic diagram of ResNet6 provided in an embodiment of this application. The second character length information may be one-hot encoded.
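One-hot encoding of the character length, as mentioned above, might look like the sketch below. The upper bound max_length is a hypothetical parameter (the text does not state the maximum supported length), and a predicted length is read back as the index of the largest score.

```python
from typing import List, Sequence


def one_hot_length(length: int, max_length: int) -> List[int]:
    """Encode a character count in [0, max_length] as a one-hot vector of size max_length + 1."""
    if not 0 <= length <= max_length:
        raise ValueError(f"length {length} outside [0, {max_length}]")
    vec = [0] * (max_length + 1)
    vec[length] = 1
    return vec


def decode_length(scores: Sequence[float]) -> int:
    """Read a predicted length back as the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])
```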
[0071] In one possible implementation, the character recognition model uses ResNet6 and a bidirectional long short-term memory (BLSTM) network to output the second character content information.
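The text does not say how the character recognition model's per-time-step outputs are turned into a string. Since CTC is named later as an example of the third loss function, one common choice, assumed here, is best-path (greedy) CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. The alphabet and the convention that class 0 is the blank are illustrative assumptions.

```python
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    probs[t][c] is the score of class c at time step t; class 0 is the blank
    and class c > 0 stands for alphabet[c - 1] (an assumed convention).
    """
    best_path = [max(range(len(p)), key=lambda c: p[c]) for p in probs]
    chars: List[str] = []
    prev = None
    for c in best_path:
        # Emit a character only when it differs from the previous step and is not blank.
        if c != prev and c != 0:
            chars.append(alphabet[c - 1])
        prev = c
    return "".join(chars)
```

With the alphabet "AM10:", the path A, A, M, blank, 1, 0, :, 0, blank, 0 decodes to AM10:00; the blank between the last two 0s is what keeps them from being merged.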
[0072] S103: Adjust the feature recognition model and the character recognition model according to the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model to obtain the trained feature recognition model and the trained character recognition model.
[0073] In one possible implementation, if the second character direction information output by the first auxiliary model does not satisfy the first condition with the first character direction information, the computer device determines the first error based on the second character direction information output by the first auxiliary model and the first character direction information, and then adjusts the neural network parameters of the feature recognition model based on the first error.
[0074] In a specific example, the sample set includes four images with identical content but different first character direction information, denoted A1, A2, A3, and A4. Based on the feature tensors of the four images, the first auxiliary model outputs their second character direction information, denoted B1, B2, B3, and B4. The first condition is that at least 75% of the images in the sample set have identical first and second character direction information; for four images, that means at least three. If A1 = B1, A2 = B2, A3 ≠ B3, and A4 ≠ B4, only two of the four images match, so the first character direction information A1 to A4 and the second character direction information B1 to B4 do not satisfy the first condition. The computer device can then feed A1 to A4 and B1 to B4 into a first loss function (e.g., the cross-entropy loss function) to obtain the first error, and adjust the neural network parameters of the feature recognition model based on the first error.
[0075] Understandably, the first condition and the number of images in the sample set can be specified according to actual needs, and this application does not impose any restrictions.
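The first condition and the first error from [0074] can be sketched in Python as follows. The helper names are hypothetical, direction labels are mapped to class indices 0 to 3, and averaging the cross-entropy over the images is an assumption; the text only names cross-entropy as an example of the first loss function. Because the threshold is a parameter, the same helpers could also check the second and third conditions.

```python
import math
from typing import Sequence

DIRECTIONS = (0, 90, 180, 270)  # assumed direction classes


def direction_to_class(direction: int) -> int:
    """Map a direction in degrees to its class index."""
    return DIRECTIONS.index(direction)


def match_ratio(labels: Sequence, preds: Sequence) -> float:
    """Fraction of images whose predicted (second) info equals the label (first) info."""
    return sum(1 for a, b in zip(labels, preds) if a == b) / len(labels)


def satisfies_condition(labels: Sequence, preds: Sequence, threshold: float) -> bool:
    """E.g. threshold=0.75 for the first condition in the example."""
    return match_ratio(labels, preds) >= threshold


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the true class of each image."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)
```

For the example, A = (0°, 90°, 180°, 270°) and B = (0°, 90°, 0°, 90°) give a match ratio of 0.5, which falls below 0.75, so the first error would be computed.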
[0076] In this method, the computer device determines the first error based on the second character direction information and the first character direction information. Because the first error reflects the character direction information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the first error makes the feature tensors output by the adjusted feature recognition model reflect the information of the sample images increasingly faithfully. This yields a good training effect and ensures the soundness of the scheme.
[0077] Optionally, the computer device may also adjust the neural network parameters of the first auxiliary model based on the first error.
[0078] In this way, the computer device can also adjust the neural network parameters of the first auxiliary model according to the first error, so that the second character direction information output by the adjusted first auxiliary model comes closer to the first character direction information, which improves the training effect and ensures the soundness of the solution.
[0079] In one possible implementation, if the second character length information output by the second auxiliary model does not satisfy the second condition with the first character length information, the computer device determines the second error based on the second character length information output by the second auxiliary model and the first character length information, and then adjusts the neural network parameters of the feature recognition model based on the second error.
[0080] In a specific example, the sample set includes four images with identical content but different first character direction information; their first character length information is C1, C2, C3, and C4, respectively. Based on the feature tensors of the four images, the second auxiliary model outputs their second character length information D1, D2, D3, and D4. The second condition is that at least 75% of the images in the sample set have identical first and second character length information; for four images, that means at least three. If C1 ≠ D1, C2 = D2, C3 ≠ D3, and C4 ≠ D4, only one of the four images matches, so the first character length information C1 to C4 and the second character length information D1 to D4 do not satisfy the second condition. The computer device can then feed C1 to C4 and D1 to D4 into a second loss function (e.g., the cross-entropy loss function) to obtain the second error, and adjust the neural network parameters of the feature recognition model based on the second error.
[0081] It is understood that the second condition and the number of images in the sample set can be specified according to actual needs; this application does not impose any restrictions on them.
[0082] In this method, the computer device determines the second error based on the second character length information and the first character length information. Since the second error reflects the character length information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the second error makes the feature tensors output by the adjusted feature recognition model reflect the information of those images increasingly faithfully. This improves the training effect and ensures that the scheme is reasonable.
[0083] Optionally, the computer device can also adjust the neural network parameters of the second auxiliary model based on the second error.
[0084] In this way, the second character length information output by the adjusted second auxiliary model moves closer to the first character length information, which further improves the training effect.
[0085] In one possible implementation, if the second character content information output by the character recognition model and the first character content information do not satisfy the third condition, the computer device determines the third error based on the second character content information output by the character recognition model and the first character content information, and then adjusts the neural network parameters of the feature recognition model based on the third error.
[0086] In a specific example, the sample set includes four images. The four images have identical content but different first character direction information, and their first character content information is E1, E2, E3, and E4, respectively. Based on the feature tensors of the four images, the character recognition model outputs their second character content information F1, F2, F3, and F4. The third condition is that at least 50% of the images in the sample set have matching first and second character content information, that is, at least two of the four images. If E1=F1, E2≠F2, E3≠F3, and E4≠F4, only one image matches, so the first character content information E1, E2, E3, E4 and the second character content information F1, F2, F3, F4 do not satisfy the third condition. The computer device can then feed E1 to E4 and F1 to F4 into a third loss function (e.g., the Connectionist Temporal Classification (CTC) loss) to obtain the third error, and adjust the neural network parameters of the feature recognition model based on the third error.
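The text names CTC only as an example third loss function. The sketch below is a plain-Python version of the standard CTC forward algorithm for a single image. It assumes that the character recognition model outputs per-frame softmax probabilities with the blank at index 0; the function name `ctc_loss` and this input layout are assumptions. A real implementation would normally use a framework's built-in CTC loss (for example PyTorch's `torch.nn.CTCLoss`) and work in log space.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank


def ctc_loss(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """Negative log-likelihood of `label` under CTC for one image.

    probs[t][k]: softmax probability of class k at frame t.
    label: non-blank class ids of the first character content information.
    Plain probabilities are used for readability; real code works in log space.
    """
    if not probs:
        raise ValueError("need at least one frame")
    # Extended sequence with blanks around every symbol: [b, l1, b, l2, ..., b]
    ext: List[int] = [BLANK]
    for symbol in label:
        ext += [symbol, BLANK]
    S = len(ext)

    # alpha[s]: total probability of all paths ending in ext[s] at the current frame.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # A valid path ends on the last symbol or on the trailing blank.
    likelihood = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    if likelihood <= 0.0:
        raise ValueError("label cannot be emitted in this many frames")
    return -math.log(likelihood)


# Two frames over classes [blank, "a"]; label "a" (class 1).
frames = [[0.4, 0.6], [0.3, 0.7]]
loss = ctc_loss(frames, [1])  # paths "aa", "a-", "-a": p = 0.42 + 0.18 + 0.28 = 0.88
```

CTC fits this step because the content label is a character string without per-frame alignment. The loss sums over every frame alignment that collapses to that string.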
[0087] It is understood that the third condition and the number of images in the sample set can be specified according to actual needs; this application does not impose any restrictions on them.
[0088] In this method, the computer device determines the third error based on the second character content information and the first character content information. Since the third error reflects the character content information of the images in the sample set, adjusting the neural network parameters of the feature recognition model according to the third error makes the feature tensors output by the adjusted feature recognition model reflect the information of those images increasingly faithfully. This improves the training effect and ensures that the scheme is reasonable.
[0089] Optionally, the computer device can also adjust the neural network parameters of the character recognition model based on the third error.
[0090] In this way, the second character content information output by the adjusted character recognition model moves closer to the first character content information, which further improves the training effect.
[0091] It is understood that the above-described embodiments can be implemented individually or in combination, and this application does not impose any restrictions.
[0092] In the above scheme, the computer device adjusts the neural network parameters of the feature recognition model based on at least one of the first error, the second error, and the third error. It then repeats the steps above: obtaining the feature tensor of each image with the adjusted feature recognition model, inputting each feature tensor into the first auxiliary model, the second auxiliary model, and the character recognition model, and adjusting the feature recognition model based on at least one of the three errors. Over these iterations, the feature tensors output by the feature recognition model increasingly reflect the information of the images in the sample set. The iterations continue until the first and second character direction information satisfy the first condition, the first and second character length information satisfy the second condition, and the first and second character content information satisfy the third condition. At that point, training of the feature recognition model, the first auxiliary model, the second auxiliary model, and the character recognition model stops, yielding a trained feature recognition model and a trained character recognition model, as well as a trained first auxiliary model and a trained second auxiliary model.
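The stopping rule described above can be sketched as a loop. In this minimal sketch, `forward`, `adjust`, and the head names are hypothetical. Each round runs the models on the sample set and stops once every head reaches its match ratio; otherwise it calls `adjust` for all heads, in line with [0093] and [0094]. The 75% threshold for the first (direction) condition is an assumption, since this section does not specify it; the length and content thresholds reuse the 75% and 50% from the examples. The toy `adjust` simply corrects one wrong prediction per head per round in place of a real gradient update.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# head name -> (first information from labels, second information predicted)
Outputs = Dict[str, Tuple[Sequence, Sequence]]


def match_ratio(first: Sequence, second: Sequence) -> float:
    return sum(a == b for a, b in zip(first, second)) / len(first)


def train_until_conditions_met(
    forward: Callable[[], Outputs],
    adjust: Callable[[Outputs], None],
    thresholds: Dict[str, float],
    max_rounds: int = 1000,
) -> int:
    """Alternate forward passes and adjustments; return the number of rounds."""
    for rounds in range(max_rounds):
        outputs = forward()
        if all(match_ratio(*outputs[h]) >= thresholds[h] for h in thresholds):
            return rounds                 # all conditions hold: stop training
        adjust(outputs)                   # some condition fails: keep adjusting every head
    raise RuntimeError("conditions not met within max_rounds")


# Toy data mirroring the examples: same content, four directions.
labels: Dict[str, List] = {"direction": [0, 90, 180, 270], "length": [5] * 4, "content": ["ab"] * 4}
preds: Dict[str, List] = {"direction": [90, 180, 270, 0], "length": [4, 5, 6, 3], "content": ["ab", "a", "b", "ba"]}


def forward() -> Outputs:
    return {h: (labels[h], list(preds[h])) for h in labels}


def adjust(outputs: Outputs) -> None:
    # Stand-in for a gradient update: fix the first wrong prediction of each head.
    for h, (first, second) in outputs.items():
        for i, (a, b) in enumerate(zip(first, second)):
            if a != b:
                preds[h][i] = a
                break


rounds = train_until_conditions_met(
    forward, adjust, {"direction": 0.75, "length": 0.75, "content": 0.5}
)
```

In this toy run the length and content heads meet their conditions first. Training still continues, and all heads keep being adjusted, until the direction head also reaches 3 of 4 after three rounds.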
[0093] Optionally, if any of the following is not met (the first condition between the first and second character direction information, the second condition between the first and second character length information, or the third condition between the first and second character content information), the computer device does not stop adjusting the first auxiliary model, the second auxiliary model, and the character recognition model.
[0094] In a specific example, if the first character direction information and the second character direction information satisfy the first condition, but the first character length information and the second character length information do not satisfy the second condition, the computer device will adjust the first auxiliary model based on the first error obtained from the first character direction information and the second character direction information; and adjust the second auxiliary model based on the second error obtained from the first character length information and the second character length information.
[0095] Optionally, the computer device can perform text recognition on the image to be recognized based on a trained feature recognition model and a trained character recognition model. Specifically, the computer device first inputs the image to be recognized into the trained feature recognition model to obtain the feature tensor of the image; then, it inputs the feature tensor of the image into the trained character recognition model to obtain the character content information of the image.
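The two-stage inference in [0095] can be sketched as shown below. Beyond what the text states, the sketch assumes that the character recognition model emits per-frame class probabilities trained with CTC. Its output is therefore turned into a string by greedy (best-path) decoding: take the most likely class per frame, merge repeats, and drop blanks. `recognize`, `greedy_ctc_decode`, the blank-at-index-0 convention, and the `alphabet` mapping are all hypothetical.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank; class k >= 1 maps to alphabet[k - 1]


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    chars: List[str] = []
    prev = None
    for frame in frame_probs:
        k = max(range(len(frame)), key=lambda i: frame[i])
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def recognize(image, feature_model: Callable, char_model: Callable, alphabet: str) -> str:
    """Two-stage inference: image -> feature tensor -> character content."""
    feature_tensor = feature_model(image)      # trained feature recognition model
    frame_probs = char_model(feature_tensor)   # trained character recognition model
    return greedy_ctc_decode(frame_probs, alphabet)


# Identity stand-ins for the two trained models, just to exercise the pipeline.
frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]
text = recognize(frames, lambda x: x, lambda f: f, "ab")
```

Here `text` is "aab". The blank frame between the two runs of class 1 keeps them as separate characters, while the adjacent repeat of class 1 is merged.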
[0096] In this method, the computer device inputs the image to be recognized into the trained feature recognition model, whose output feature tensor reflects the information of that image more faithfully. It then inputs this feature tensor into the trained character recognition model, whose output character content information reflects the characters in the image more accurately, which makes the solution more complete.
[0097] In this scheme, before determining the trained feature recognition model and the trained character recognition model, the computer device inputs the feature tensors into the first auxiliary model, the second auxiliary model, and the character recognition model. It then adjusts the feature recognition model and the character recognition model based on the information in the sample set and on the second character direction, length, and content information. This process is repeated until the trained feature recognition model and the trained character recognition model are determined. Because both models are continuously adjusted along the direction, length, and content dimensions of the images in the sample set, that is, because the label data has richer dimensions, the trained models are more accurate. Furthermore, this scheme uses the first and second auxiliary models to continuously adjust the feature recognition model and the character recognition model. As a result, the trained feature recognition model more faithfully reflects the information of the images in the sample set, and the second character content information output by the trained character recognition model is closer to the first character content information.
[0098] The methods provided in the embodiments of this application have been described above. The apparatus provided in the embodiments of this application will be described below.
[0099] Figure 6 is a structural diagram of a model training device provided in an embodiment of this application. The device includes modules / units / technical means for executing the methods performed by the computer device in the above method embodiments.
[0100] For example, the device 600 includes:
[0101] The first input module 601 is configured to input a sample set into the feature recognition model, where the sample set includes at least one image and the first character direction information, the first character length information, and the first character content information of each image;
[0102] The processing module 602 is configured to obtain the feature tensor of each of the at least one image;
[0103] The first input module 601 is further configured to: input the feature tensor of each image into the first auxiliary model, the second auxiliary model, and the character recognition model; wherein, the first auxiliary model is configured to recognize the character direction information of the image based on the feature tensor of the image, the second auxiliary model is configured to recognize the character length information of the image based on the feature tensor of the image, and the character recognition model is configured to recognize the character content information of the image based on the feature tensor of the image.
[0104] The processing module 602 is further configured to: adjust the feature recognition model and the character recognition model according to the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model, so as to obtain the trained feature recognition model and the trained character recognition model.
[0105] Figure 7 is a structural diagram of a character recognition device provided in an embodiment of this application. The device includes modules / units / technical means for performing the methods executed by the computer device in the above method embodiments.
[0106] For example, the device 700 includes:
[0107] The second input module 701 is used to input the image to be recognized into the trained feature recognition model;
[0108] The recognition module 702 is used to obtain the feature tensor of the image to be recognized;
[0109] The second input module 701 is also used to input the feature tensor of the image to be recognized into the trained character recognition model;
[0110] The recognition module 702 is also used to obtain the character content information of the image to be recognized; wherein the trained feature recognition model and the trained character recognition model are trained by the model training method described above.
[0111] As one possible product form of the aforementioned devices, and referring to Figure 8, this application also provides an electronic device 800, comprising:
[0112] At least one processor 801; and a communication interface 803 communicatively connected to the at least one processor 801; the at least one processor 801 causes the electronic device 800 to execute the method steps performed by any device in the above method embodiments through the communication interface 803 by executing instructions stored in the memory 802.
[0113] Optionally, the memory 802 is located outside the electronic device 800.
[0114] Optionally, the electronic device 800 includes the memory 802, which is connected to the at least one processor 801 and stores instructions executable by the at least one processor 801. The dashed line in Figure 8 indicates that the memory 802 is optional for the electronic device 800.
[0115] The processor 801 and the memory 802 can be coupled through an interface circuit or integrated together; no restriction is imposed here.
[0116] The embodiments of this application do not limit the specific connection medium between the processor 801, the memory 802, and the communication interface 803. In Figure 8, the processor 801, the memory 802, and the communication interface 803 are connected via a bus 804, which is drawn as a thick line; the way other components are connected is shown for illustration only and is not limiting. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in Figure 8 by a single thick line, but this does not mean that there is only one bus or only one type of bus.
[0117] It should be understood that the processor mentioned in the embodiments of this application can be implemented in hardware or software. When implemented in hardware, the processor can be a logic circuit, integrated circuit, etc. When implemented in software, the processor can be a general-purpose processor, implemented by reading software code stored in memory.
[0118] For example, the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.
[0119] It should be understood that the memory mentioned in the embodiments of this application can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate Synchronous DRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
[0120] It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA, or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) can be integrated into the processor.
[0121] It should be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
[0122] As another possible product form, this application embodiment also provides a computer-readable storage medium for storing instructions that, when executed, cause a computer to perform the method steps performed by the first device in the above method example.
[0123] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0124] This application is described with reference to flowcharts and / or block diagrams of methods, apparatus (systems), and computer program products according to this application. It should be understood that each process and / or block in the flowcharts and / or block diagrams, and combinations of processes and / or blocks in the flowcharts and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0125] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0126] These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0127] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A model training method, characterized in that it comprises: inputting a sample set into the feature recognition model to obtain the feature tensor of each image in at least one image, wherein the sample set includes the at least one image and the first character direction information, the first character length information, and the first character content information of each image; inputting the feature tensor of each image into the first auxiliary model, the second auxiliary model, and the character recognition model, wherein the first auxiliary model is used to recognize the character direction information of an image based on the image's feature tensor, the second auxiliary model is used to recognize the character length information of an image based on the image's feature tensor, and the character recognition model is used to recognize the character content information of an image based on the image's feature tensor; and adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model, to obtain a trained feature recognition model and a trained character recognition model, including: adjusting the feature recognition model based on the second character direction information, adjusting the feature recognition model based on the second character length information, and adjusting at least one of the feature recognition model and the character recognition model based on the second character content information.
2. The method as described in claim 1, characterized in that the step of adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character direction information output by the first auxiliary model and the first character direction information do not satisfy the first condition, determining the first error based on the second character direction information output by the first auxiliary model and the first character direction information; and adjusting the neural network parameters of the feature recognition model based on the first error.
3. The method as described in claim 2, characterized in that it further comprises: adjusting the neural network parameters of the first auxiliary model based on the first error.
4. The method as described in claim 1, characterized in that the step of adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character length information output by the second auxiliary model and the first character length information do not satisfy the second condition, determining the second error based on the second character length information output by the second auxiliary model and the first character length information; and adjusting the neural network parameters of the feature recognition model based on the second error.
5. The method as described in claim 4, characterized in that it further comprises: adjusting the neural network parameters of the second auxiliary model based on the second error.
6. The method as described in claim 1, characterized in that the step of adjusting the feature recognition model and the character recognition model based on the second character direction information output by the first auxiliary model, the second character length information output by the second auxiliary model, and the second character content information output by the character recognition model includes: if the second character content information output by the character recognition model and the first character content information do not satisfy the third condition, determining the third error based on the second character content information output by the character recognition model and the first character content information; and adjusting the neural network parameters of at least one of the feature recognition model and the character recognition model based on the third error.
7. The method according to any one of claims 1-6, characterized in that it further comprises: inputting the image to be recognized into the trained feature recognition model to obtain the feature tensor of the image to be recognized; and inputting the feature tensor of the image to be recognized into the trained character recognition model to obtain the character content information of the image to be recognized.
8. A character recognition method, characterized in that it comprises: inputting the image to be recognized into the trained feature recognition model to obtain the feature tensor of the image to be recognized; and inputting the feature tensor of the image to be recognized into the trained character recognition model to obtain the character content information of the image to be recognized; wherein the trained feature recognition model and the trained character recognition model are obtained by the model training method according to any one of claims 1-7.
9. An electronic device, characterized in that it comprises: at least one processor; and a memory and a communication interface that are communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor, by executing the instructions stored in the memory, causes the electronic device to perform, through the communication interface, the method according to any one of claims 1-7 or claim 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed on a computer, cause the computer to perform the method according to any one of claims 1-7 or claim 8.