Feature extraction model training method, feature extraction method, device and equipment
By training a handwritten character feature extraction model in an unsupervised manner, removing target strokes and generating reconstructed strokes, the problem of poor handwritten character feature extraction performance in existing technologies is solved, and rich, multi-layered feature extraction effects are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING YOUZHUJU NETWORK TECH CO LTD
- Filing Date
- 2021-07-28
- Publication Date
- 2026-06-16
AI Technical Summary
Existing handwriting critique algorithms require a large amount of manually labeled critique data to train their feature extraction models. Such data is difficult and costly to acquire and is often of poor quality, which results in poor handwritten feature extraction performance.
The feature extraction model is trained in an unsupervised manner: handwritten character samples are obtained and target strokes are removed from them. An encoding network extracts the features of the retained strokes and the target strokes, and a decoding network generates reconstructed strokes that are used to train the model. This achieves self-reconstruction and mutual reconstruction of strokes and yields rich, multi-layered stroke features.
Because no manually labeled critique data is required, the method can effectively enhance the richness and robustness of the features extracted by the model. The features capture multi-layered information, including the stroke itself, other strokes, and the overall structure of the character, which improves handwritten character feature extraction.
Figure CN115690437B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to a training method, feature extraction method, apparatus, device and medium for a feature extraction model.
Background Technology
[0002] Practicing calligraphy is a compulsory course for every school-aged child, and also a hobby for many adults. Writing is not only a skill needed in daily life, but it also rises to the level of aesthetic art. However, how to objectively and accurately evaluate whether handwriting is standardized and beautiful remains a difficult problem to address today.
[0003] Traditional handwriting critique relies on professional calligraphy teachers and therefore suffers from a shortage of teachers and the inherent subjectivity of human evaluation. With the development of artificial intelligence, algorithms have emerged that can critique handwriting automatically, evaluating its neatness and aesthetics by extracting handwriting features. Feature extraction is one of the most crucial parts of critique and directly affects the accuracy of the results. However, existing handwriting critique algorithms require labeled critique data to train the network model that extracts handwritten features. This supervised training method has several drawbacks: critique data is difficult to obtain, the data is often biased and subjective, and labeling it incurs high labor costs. These limitations restrict the quantity and quality of training samples, so the trained network model performs poorly at extracting handwritten features.
Summary of the Invention
[0004] To solve the above-mentioned technical problems, or at least partially solve them, this disclosure provides a training method, feature extraction method, apparatus, device, and medium for a feature extraction model.
[0005] This disclosure provides a method for training a feature extraction model, the method comprising: acquiring handwritten character samples; removing at least one target stroke from the handwritten character samples to obtain retained strokes in the handwritten character samples; acquiring stroke features of each retained stroke and stroke features of each target stroke through an encoding network in a neural network model to be trained; generating reconstructed strokes of each retained stroke and reconstructed strokes of each target stroke based on the stroke features through a decoding network in the neural network model; training the neural network model according to the reconstructed strokes generated by the decoding network; and using the encoding network in the neural network model at the end of training as a feature extraction model.
[0006] Optionally, the step of obtaining the stroke features of each retained stroke and the stroke features of each target stroke through the encoding network in the neural network model to be trained includes: obtaining first marker information corresponding to each retained stroke; the first marker information includes the stroke shape information of the retained stroke and the standard position information of the retained stroke in the handwritten character sample; obtaining second marker information corresponding to each target stroke; the second marker information includes the standard position information of the target stroke in the handwritten character sample; and extracting the stroke features of each retained stroke and the stroke features of each target stroke based on the first marker information and the second marker information through the encoding network.
[0007] Optionally, the step of extracting the stroke features of each of the retained strokes and the stroke features of each of the target strokes based on the first marker information and the second marker information through the encoding network includes: extracting the stroke features of the corresponding retained strokes based on each of the first marker information through the encoding network; and extracting the stroke features of each of the target strokes based on all of the first marker information and the second marker information through the encoding network.
[0008] Optionally, the step of generating the reconstructed strokes of each retained stroke and the target stroke based on the stroke features through the decoding network in the neural network model includes: independently parsing each stroke feature and jointly parsing multiple stroke features through the decoding network in the neural network model to obtain independent parsing results and joint parsing results; and generating the reconstructed strokes of each retained stroke and the target stroke based on the independent parsing results and the joint parsing results.
[0009] Optionally, the step of training the neural network model based on the reconstructed strokes generated by the decoding network includes: performing stroke self-reconstruction training on the neural network model based on the retained strokes and the reconstructed strokes of the retained strokes generated by the decoding network; and forming a reconstructed whole character by combining the reconstructed strokes of the retained strokes generated by the decoding network and the reconstructed strokes of the target strokes, and performing whole character reconstruction training on the neural network model based on the handwritten character sample and the reconstructed whole character.
[0010] Optionally, the neural network model further includes a feature fusion network. The step of training the neural network model based on the reconstructed strokes generated by the decoding network further includes: inputting the stroke features obtained by the encoding network into the feature fusion network, and obtaining the whole character features corresponding to the handwritten character sample through the feature fusion network based on the stroke features; and, after the stroke self-reconstruction training and the whole character reconstruction training, performing whole character recognition training on the neural network model based on the whole character features and the handwritten character sample.
[0011] Optionally, the step of training the neural network model for whole-character recognition based on the whole-character features and the handwritten character samples includes: obtaining whole-character recognition results based on the whole-character features; obtaining the true recognition results of the handwritten character samples; and training the neural network model for whole-character recognition based on the whole-character recognition results and the true recognition results.
[0012] Optionally, the step of obtaining handwritten character samples includes: obtaining a handwriting point sequence through an electronic writing tablet; the handwriting point sequence contains point information of multiple handwriting points, and the point information contains position coordinate information; and using the obtained handwriting point sequence as a handwritten character sample.
[0013] Optionally, the point information may also include writing pressure information and / or writing state information.
[0014] Optionally, the step of obtaining handwritten character samples includes: obtaining a text image containing handwritten characters; detecting and extracting handwritten characters from the text image, and using the extracted handwritten characters as handwritten character samples.
[0015] This disclosure also provides a feature extraction method, comprising: acquiring a target handwritten character whose features are to be extracted; using a pre-trained feature extraction model to extract features from the target handwritten character to obtain the features of the target handwritten character; wherein the feature extraction model is obtained using the training method for the feature extraction model according to any one of the above.
[0016] Optionally, the method further includes: providing feedback on the target handwritten character based on its features.
[0017] This disclosure also provides a training apparatus for a feature extraction model, comprising: a sample acquisition module for acquiring handwritten character samples; a stroke removal module for removing at least one target stroke from the handwritten character samples to obtain retained strokes from the handwritten character samples; a feature acquisition module for acquiring stroke features of each retained stroke and stroke features of each target stroke through an encoding network in a neural network model to be trained; a stroke reconstruction module for generating reconstructed strokes of each retained stroke and each target stroke based on the stroke features through a decoding network in the neural network model; and a model training module for training the neural network model according to the reconstructed strokes generated by the decoding network, and using the encoding network in the neural network model at the end of training as a feature extraction model.
[0018] This disclosure also provides a feature extraction device, including: a character acquisition module for acquiring a target handwritten character whose features are to be extracted; and a feature extraction module for extracting features from the target handwritten character using a pre-trained feature extraction model to obtain the features of the target handwritten character; wherein the feature extraction model is obtained using the training method of the feature extraction model described in any of the preceding claims.
[0019] This disclosure also provides an electronic device, the electronic device comprising: a processor; a memory for storing executable instructions of the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement a training method or feature extraction method for a feature extraction model as provided in this disclosure.
[0020] This disclosure also provides a computer-readable storage medium storing a computer program for executing a training method or feature extraction method for a feature extraction model as provided in this disclosure.
[0021] The technical solution provided in this disclosure first obtains a handwritten character sample and removes at least one target stroke from it to obtain the retained strokes in the handwritten character sample. Then, the encoding network in the neural network model to be trained obtains the stroke features of each retained stroke and the stroke features of each target stroke. Subsequently, the decoding network in the neural network model generates the reconstructed strokes of each retained stroke and each target stroke based on the stroke features. Finally, the neural network model is trained based on the reconstructed strokes generated by the decoding network, and the encoding network in the neural network model at the end of training is used as the feature extraction model. The above method eliminates the need to label handwritten characters with critique data, allowing unsupervised training (also known as self-supervised training) of the model directly on handwritten characters. Because the method is not limited by the quantity, quality, or labeling cost of critique data, a large number of handwritten characters can be used to train the model, effectively enhancing the richness and robustness of the extracted features. Furthermore, stroke self-reconstruction (generating reconstructed strokes of the retained strokes) and stroke mutual reconstruction (generating reconstructed strokes of the removed target strokes) are performed based on the stroke features extracted by the encoding network, and the model is trained on these reconstructed strokes. As a result, the stroke features extracted by the final feature extraction model (the encoding network) include not only the stroke's own information but also information about other strokes, the overall character structure, and the relationships between strokes. This richer, more multi-layered information effectively addresses the poor handwritten character feature extraction performance of existing technologies.
[0022] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Brief Description of the Drawings
[0023] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0024] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0025] Figure 1 is a flowchart of a training method for a feature extraction model provided in an embodiment of this disclosure;
[0026] Figure 2 is a schematic diagram of the structure of a neural network model provided in an embodiment of this disclosure;
[0027] Figure 3 is a schematic diagram of the training of a neural network model provided in an embodiment of this disclosure;
[0028] Figure 4 is a schematic diagram of the structure of a neural network model provided in an embodiment of this disclosure;
[0029] Figure 5 is a schematic diagram of the training of a neural network model provided in an embodiment of this disclosure;
[0030] Figure 6 is a flowchart of a feature extraction method provided in an embodiment of this disclosure;
[0031] Figure 7 is a schematic diagram of the structure of a training device for a feature extraction model provided in an embodiment of this disclosure;
[0032] Figure 8 is a schematic diagram of the structure of a feature extraction device provided in an embodiment of this disclosure;
[0033] Figure 9 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure.
Detailed Implementation
[0034] To better understand the above-mentioned objectives, features, and advantages of this disclosure, the solutions disclosed herein will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.
[0035] Numerous specific details are set forth in the following description in order to provide a full understanding of this disclosure, but this disclosure may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some, and not all, of the embodiments of this disclosure.
[0036] Existing handwriting critique algorithms mainly rely on extracting handwriting features for critique. Therefore, the effectiveness of handwriting feature extraction directly affects the critique results. Specifically, the richer the information contained in the extracted features, the richer and more accurate the critique results will be.
[0037] Currently, there are two main types of methods for extracting handwritten features. The first type relies on handcrafted features that mimic the dimensions of human evaluation. These handcrafted features depend on prior information from experts, the number of features that can be enumerated is limited, and each feature must be extracted according to preset rules and methods. For example, handwritten features are obtained only along human evaluation dimensions such as the size of the whole character, the structure of the whole character, the starting and ending points of each stroke, and the center point position, area, and perimeter of the whole character outline. Based on these preset dimensions, it is impossible to extract information that humans have not already summarized from handwritten characters, such as deep features that cannot be calculated directly. Therefore, the features extracted by this type of method are relatively shallow and limited.
[0038] The second type of extraction method mainly uses image-based deep learning to extract handwritten features. This requires supervised training to obtain the feature extraction model, that is, labeling the training samples with critique data and training the model in a supervised manner. However, reliable critique data is difficult to obtain. Although handwriting critique follows certain rules about what is right and wrong, it is inevitably influenced by subjective human judgment, so different people may critique the same character differently. Even with a large number of professional calligraphy teachers, it is difficult to obtain a large amount of stable critique data. In addition, the types of critique items in the data are relatively limited, so the critique data is generally one-sided and subjective. Furthermore, labeling critique data requires a large amount of manual labor. In summary, training samples labeled with critique data are few in number and the labels are of poor quality. A network model trained on samples that are limited in both quantity and quality therefore has poor reliability and robustness, and has difficulty extracting rich, multi-layered, detailed handwritten features.
[0039] To address at least one of the above problems, this disclosure provides a training method, feature extraction method, apparatus, device, and medium for a feature extraction model. The method does not require labeled training samples and trains the model directly in a self-supervised (unsupervised) manner. It is therefore not limited by the number of training samples and does not require critique data as sample labels (supervisory information). This avoids the difficulty of obtaining critique data, the high labor cost of labeling, and the limitations imposed by the quality of critique data. This disclosure enables training the model on a large number of training samples, and the resulting model can extract richer and more multi-layered handwritten character features. For ease of understanding, a detailed description follows:
[0040] First, this disclosure provides a method for training a feature extraction model. Figure 1 is a flowchart of this training method. The method can be executed by a training device for the feature extraction model, which can be implemented in software and / or hardware and is typically integrated into an electronic device. As shown in Figure 1, the method mainly includes the following steps S102 to S110:
[0041] Step S102: Obtain handwritten character samples. These handwritten character samples are the samples used to train the neural network model. They can be characters composed of strokes, such as Chinese, Japanese, or Korean characters. There are no restrictions on the language of the handwritten characters; they only need to be composed of strokes. The handwritten character samples can be represented directly as a handwriting point sequence or directly as a handwritten character image; there are no restrictions here.
[0042] In some implementations, a handwriting point sequence can be obtained using an electronic writing tablet. This sequence contains point information for multiple handwriting points, including position coordinates. Additionally, it may include writing pressure information and / or writing state information. The obtained handwriting point sequence is then used as a handwritten character sample. The position coordinates indicate the position of the handwriting point on the electronic writing tablet, the writing pressure information indicates the pressure applied by the user when writing the point, and the writing state information indicates whether the point is in a writing or lifting state. In specific implementation examples, handwriting points can be uniformly sampled during the user's writing process on the electronic writing tablet, making the handwriting point sequence used as a handwritten character sample more objective and accurate.
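As an illustration only, the point information described above could be held in a small record type and grouped into strokes using the writing state. The following is a minimal sketch; the names `HandwritingPoint` and `split_into_strokes` are hypothetical and do not come from this disclosure, and the convention that a pen-lift point closes the current stroke is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HandwritingPoint:
    x: float                 # position coordinate on the writing tablet
    y: float
    pressure: float = 1.0    # optional writing pressure information
    pen_down: bool = True    # writing state: True = writing, False = pen lifted


def split_into_strokes(points: List[HandwritingPoint]) -> List[List[HandwritingPoint]]:
    """Group consecutive pen-down points into strokes.

    Assumption: a point with pen_down=False marks a pen lift and
    closes the stroke currently being collected.
    """
    strokes: List[List[HandwritingPoint]] = []
    current: List[HandwritingPoint] = []
    for p in points:
        if p.pen_down:
            current.append(p)
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# A two-stroke sample: a horizontal stroke, a pen lift, then a vertical stroke.
sample = [
    HandwritingPoint(0, 0), HandwritingPoint(1, 0), HandwritingPoint(2, 0),
    HandwritingPoint(2, 0, pressure=0.0, pen_down=False),
    HandwritingPoint(1, -1), HandwritingPoint(1, 1),
]
strokes = split_into_strokes(sample)
print(len(strokes), [len(s) for s in strokes])
```

Keeping pressure and writing state as optional fields mirrors the text, where only the position coordinates are mandatory.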
[0043] In other implementations, a text image containing handwritten characters can be acquired; then, the handwritten characters can be detected and extracted from the text image, and the extracted handwritten characters can be used as handwritten character samples. In specific implementations, the handwritten characters extracted from the text image can also be segmented according to strokes to obtain the individual strokes that make up the handwritten character, so that they can be used directly later.
[0044] Of course, in practical applications, handwriting point sequences and text images can be converted to each other. For example, handwriting point sequences can be obtained by skeletonizing and point sampling the handwritten characters extracted from the text image; or, handwritten character images can be drawn based on handwriting point sequences. In summary, the embodiments of this disclosure do not limit the form of handwritten character samples. Subsequent processing can be performed directly on the obtained handwritten character samples, or the handwritten character samples can first be converted into another processable form, depending on the actual situation.
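The conversion from a handwriting point sequence to a character image can be sketched as a simple rasterization. This is an illustrative sketch under assumptions, not the conversion used in this disclosure: the function name `rasterize`, the binary grid, and the linear interpolation between sampled points are all hypothetical choices.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rasterize(strokes: List[List[Point]], size: int = 8) -> List[List[int]]:
    """Draw strokes onto a size x size binary grid (rows = y, columns = x)."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    # Use one scale for both axes so the character keeps its aspect ratio.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 1) / span
    grid = [[0] * size for _ in range(size)]

    def to_cell(p: Point) -> Tuple[int, int]:
        return round((p[0] - min_x) * scale), round((p[1] - min_y) * scale)

    for stroke in strokes:
        cells = [to_cell(p) for p in stroke]
        col, row = cells[0]
        grid[row][col] = 1
        # Fill the cells between consecutive sampled points.
        for (c0, r0), (c1, r1) in zip(cells, cells[1:]):
            steps = max(abs(c1 - c0), abs(r1 - r0), 1)
            for t in range(steps + 1):
                c = round(c0 + (c1 - c0) * t / steps)
                r = round(r0 + (r1 - r0) * t / steps)
                grid[r][c] = 1
    return grid


# A cross-shaped character: one horizontal and one vertical stroke.
grid = rasterize([[(0, 2), (4, 2)], [(2, 0), (2, 4)]], size=5)
for row in grid:
    print("".join("#" if v else "." for v in row))
```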
[0045] Step S104: Remove at least one target stroke from the handwritten character sample to obtain the retained strokes in the handwritten character sample.
[0046] It is understood that handwritten character samples typically contain multiple strokes. To achieve better feature extraction results, this embodiment of the disclosure can randomly remove one or more strokes from the handwritten character sample. The removed strokes are the target strokes, and the strokes that are not removed are the retained strokes. The purpose of removing some strokes from the handwritten character sample is to promote the utilization of information between strokes, making it easier to recover the removed strokes using other strokes. This ensures that the extracted stroke features not only contain their own stroke information but also information from other strokes or the entire character.
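The random removal described in step S104 can be sketched as follows. The function name `remove_target_strokes` is hypothetical, and the requirement that at least one stroke remains is an assumption made here so that there are always retained strokes to encode.

```python
import random
from typing import List, Optional, Sequence, Tuple


def remove_target_strokes(
    strokes: Sequence,
    num_targets: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[tuple], List[tuple]]:
    """Randomly split strokes into retained strokes and removed target strokes.

    Each returned item is (index, stroke), where index is the stroke's
    position in the original stroke order, so the standard position of a
    removed stroke is still known after removal.
    """
    # Assumption: remove at least one stroke and keep at least one.
    if not 1 <= num_targets < len(strokes):
        raise ValueError("must remove at least one stroke and retain at least one")
    rng = rng or random.Random()
    target_idx = set(rng.sample(range(len(strokes)), num_targets))
    retained = [(i, s) for i, s in enumerate(strokes) if i not in target_idx]
    targets = [(i, s) for i, s in enumerate(strokes) if i in target_idx]
    return retained, targets


# Illustrative stroke names for a four-stroke character.
retained, targets = remove_target_strokes(
    ["heng", "shu", "pie", "na"], num_targets=2, rng=random.Random(0)
)
print("retained:", retained)
print("targets:", targets)
```

Returning the original index with each stroke keeps the position of a removed stroke available for the later encoding step, even though its shape is withheld.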
[0047] Step S106: Obtain the stroke features of each retained stroke and the stroke features of each target stroke through the encoding network in the neural network model to be trained.
[0048] The encoding network in the neural network model to be trained is mainly used to extract features from (that is, to encode) each stroke, obtaining the stroke features (also known as stroke embeddings) of each stroke. The stroke features of the retained strokes can be extracted directly. When the features of the retained strokes also carry information about other strokes, the relationships between strokes, and the overall structure of the character, the stroke features of a target stroke can be extracted accordingly, even though the target stroke itself has been removed.
[0049] Step S108: Generate the reconstructed strokes of each retained stroke and the reconstructed strokes of each target stroke based on stroke features through the decoding network in the neural network model.
[0050] The decoding network in the neural network model to be trained is mainly used to parse and restore the stroke features output by the encoding network, thereby realizing stroke reconstruction. Specifically, the decoding network can realize stroke self-reconstruction (generating reconstructed strokes of the retained strokes) and stroke mutual reconstruction (generating reconstructed strokes of the target strokes). That is, in addition to restoring the retained strokes, it can also restore the previously removed target strokes based on the stroke features.
[0051] In some implementations, the aforementioned neural network model can be implemented using generative models such as VAEs (Variational Autoencoders). In generative models, the input and output formats are consistent; for example, if the input to the model is a stroke, the output after encoding and decoding the stroke is also a stroke. Furthermore, the processing of handwritten character samples with the target strokes removed by the neural network model can also be understood as a compression and reconstruction process. "Compression" essentially refers to feature compression, which reduces the information dimensionality and uses lower-dimensional information to express higher-dimensional information. "Reconstruction" essentially refers to image reconstruction, that is, restoring the compressed features to the original input information dimensionality.
[0052] Step S110: Train the neural network model based on the reconstructed strokes generated by the decoding network, and use the encoding network in the neural network model at the end of training as the feature extraction model.
[0053] The above-described model training process is essentially the process of adjusting the network parameters of the neural network model. In this embodiment, there is no need for supervised training with annotation data for handwritten samples; instead, a self-supervised approach can be used for model training. This means that the reconstructed strokes output by the neural network model are supervised by the handwritten samples themselves until the model outputs reconstructed strokes that meet expectations. At this point, the reconstructed strokes output by the neural network model can effectively reproduce the true strokes of the handwritten samples, demonstrating that the encoding network in the neural network model can effectively extract rich and multi-layered stroke features, enabling the decoding network to achieve expected stroke reconstruction based on these extracted features. After the neural network model training is complete, the encoding network can be directly used as the feature extraction model. In subsequent applications, this feature extraction model can be directly used for handwritten feature extraction.
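The idea that the handwritten sample supervises its own reconstruction can be made concrete with a reconstruction loss. The disclosure does not specify a loss function; the sketch below assumes a mean squared error over corresponding points, with every original and reconstructed stroke resampled to the same number of points. The names `stroke_mse` and `reconstruction_loss` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def stroke_mse(original: Stroke, reconstructed: Stroke) -> float:
    """Mean squared distance between corresponding points of two strokes."""
    if len(original) != len(reconstructed):
        raise ValueError("strokes must have the same number of points")
    total = sum(
        (ox - rx) ** 2 + (oy - ry) ** 2
        for (ox, oy), (rx, ry) in zip(original, reconstructed)
    )
    return total / len(original)


def reconstruction_loss(originals: List[Stroke], reconstructions: List[Stroke]) -> float:
    """Average per-stroke error over all strokes, retained and target alike.

    The originals act as the supervision signal, so no critique labels
    are involved.
    """
    errors = [stroke_mse(o, r) for o, r in zip(originals, reconstructions)]
    return sum(errors) / len(errors)


originals = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
# Second stroke reconstructed one unit too far to the right.
shifted = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 1.0)]]
print(reconstruction_loss(originals, originals))
print(reconstruction_loss(originals, shifted))
```

In a real training loop this value would be minimized by adjusting the network parameters; any differentiable distance between strokes could take its place.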
[0054] In summary, the training method for the feature extraction model provided in this embodiment of the present disclosure does not require handwritten characters to be labeled with critique data, so it is not limited by the quantity, quality, or labeling cost of critique data. A large number of handwritten characters can be used to train the model, thereby effectively enhancing the richness and robustness of the extracted features. In addition, based on the stroke features extracted by the encoding network, stroke self-reconstruction (generating reconstructed strokes of the retained strokes) and stroke mutual reconstruction (generating reconstructed strokes of the removed target strokes) are performed, and the model is trained based on the reconstructed strokes. As a result, the stroke features extracted by the final feature extraction model (the encoding network) contain not only the stroke's own information, but also information about other strokes or the whole character and about the relationships between strokes. This richer, more multi-layered information can effectively address the problem of poor handwritten character feature extraction in the prior art.
[0055] In some implementations, the steps of obtaining the stroke features of each retained stroke and the stroke features of each target stroke through the encoding network in the neural network model to be trained can be implemented with reference to the following steps a to c:
[0056] Step a: Obtain the first marker information corresponding to each retained stroke. The first marker information includes the stroke shape information of the retained stroke and the standard positional information of the retained stroke in the handwritten character sample. In practical applications, the first marker information can be represented directly by the stroke itself, by the initial stroke features obtained after preprocessing (such as preliminary simple encoding), or by a sequence number pre-assigned to the stroke (for example, different sequence numbers can be pre-assigned to different strokes, so that each sequence number corresponds to one stroke shape). Regardless of the form, it reflects the stroke shape information. In addition, the first marker information also carries the standard positional information of the retained stroke in the handwritten character sample. The standard positional information characterizes the standard position and / or standard writing order of the stroke in the handwritten character (that is, the standard stroke sequence number).
[0057] Step b: Obtain the second marker information corresponding to each target stroke; the second marker information includes the standard positional information of the target stroke in the handwritten character sample. It can be understood that since the target stroke has been removed, the second marker information corresponding to the target stroke does not contain stroke shape information, that is, the stroke shape information is empty, but it can still carry its standard positional information.
[0058] Step c: Extract the stroke features of each retained stroke and the stroke features of each target stroke based on the first marker information and the second marker information through the encoding network.
[0059] In one specific implementation, the aforementioned stroke shape information can be referred to as a token. That is, the handwritten character sample, after the target stroke is removed, can be divided into different tokens according to stroke shape. The token corresponding to the target stroke can be an empty token; that is, a specific empty token can be used to mark the stroke order position of the target stroke. Then, based on the standard positional information corresponding to each stroke, the stroke tokens are input into the encoding network in sequence. The encoding network can determine the standard positional information of each stroke token in the whole character from the input order of the stroke tokens. In another specific implementation, the stroke tokens do not need to be input into the encoding network in sequence. Instead, the stroke shape information is input into the encoding network together with the standard positional information. That is, the aforementioned first marker information and second marker information are input into the encoding network together.
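The token scheme described above can be sketched as follows. This is a minimal illustration: `EMPTY_TOKEN`, `build_token_sequence`, and the dictionary layout are hypothetical, and the stroke names are illustrative stand-ins for stroke shape information.

```python
from typing import Dict, List, Set

EMPTY_TOKEN = "<empty>"  # hypothetical placeholder for a removed target stroke


def build_token_sequence(stroke_shapes: List[str], target_indices: Set[int]) -> List[Dict]:
    """Build one token per stroke, in standard stroke order.

    Retained strokes keep their shape (first marker information).
    Target strokes carry only their position (second marker information):
    their shape is replaced by the empty token.
    """
    tokens = []
    for order, shape in enumerate(stroke_shapes):
        removed = order in target_indices
        tokens.append({
            "position": order,  # standard stroke sequence number
            "shape": EMPTY_TOKEN if removed else shape,
        })
    return tokens


# Four strokes in standard order, with the third stroke removed.
tokens = build_token_sequence(["heng", "shu", "pie", "na"], target_indices={2})
for t in tokens:
    print(t)
```

The list order covers the first variant in the text, where position is implied by input order. The explicit `position` field covers the second variant, where shape and position are supplied together and the input order no longer matters.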
[0060] In some implementations, the step of extracting the stroke features of each retained stroke and each target stroke using an encoding network based on the first marker information and the second marker information includes: extracting, through the encoding network, the stroke features of the corresponding retained stroke based on each piece of first marker information; and extracting, through the encoding network, the stroke features of each target stroke based on all the first marker information and the second marker information. That is, the encoding network can extract the features of a retained stroke from its own first marker information, and can fuse all the marker information to extract the features of a target stroke.
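The data flow of this step can be sketched with a toy stand-in for the encoding network. Everything here is an assumption for illustration: the fixed table `SHAPE_EMBEDDING`, the function name `encode`, and the use of a plain average as the fusion operation. A trained encoding network would learn both the embeddings and the fusion; the sketch only shows that a retained stroke is encoded from its own marker, while a target stroke is encoded from all markers plus its own position.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]

# Hypothetical fixed 2-D shape embeddings; a real encoder would learn these.
SHAPE_EMBEDDING: Dict[str, Vector] = {
    "heng": [1.0, 0.0],
    "shu": [0.0, 1.0],
    "pie": [1.0, 1.0],
}


def encode(markers: List[Tuple[int, Optional[str]]]) -> List[Vector]:
    """markers: (standard sequence number, shape) pairs; shape is None
    for a removed target stroke. Returns one feature vector per stroke."""
    # Retained strokes: feature from the stroke's own first marker information.
    retained = {
        order: SHAPE_EMBEDDING[shape] + [float(order)]
        for order, shape in markers
        if shape is not None
    }
    if not retained:
        raise ValueError("at least one retained stroke is required")

    # Context fused from all retained strokes (simple mean, an assumption).
    n = len(retained)
    context = [sum(f[k] for f in retained.values()) / n for k in range(2)]

    features = []
    for order, shape in markers:
        if shape is not None:
            features.append(retained[order])
        else:
            # Target stroke: fused context plus its own standard position.
            features.append(context + [float(order)])
    return features
```

With the input `[(0, "heng"), (1, None), (2, "shu")]`, the middle (removed) stroke receives a feature built only from its neighbours and its sequence number, which is the information a decoding network would then have available for mutual reconstruction.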
[0061] This disclosure does not limit the structure of the encoding network; any network structure capable of feature extraction (or feature encoding) is acceptable, such as various forms including sequence models, image models, or graph models. If a sequence model is used, the output stroke features are the corresponding output items at their input positions; that is, the order of the input strokes is consistent with the order of the output stroke features. If an image model is used, the input and output matrices need to ensure that their number of channels is consistent. If a graph model is used, each stroke can be treated as a node in a graph, constructing a graph of relationships between strokes, and using the intermediate output node features as stroke features.
[0062] After the stroke features are obtained through the encoding network in the neural network model, this disclosure provides a specific implementation for generating the reconstructed strokes of each retained stroke and each target stroke based on the stroke features through the decoding network in the neural network model. This includes: independently parsing each stroke feature and jointly parsing multiple stroke features through the decoding network to obtain independent parsing results and joint parsing results; and generating the reconstructed strokes of each retained stroke and each target stroke based on the independent parsing results and the joint parsing results. Joint parsing fuses multiple stroke features to extract as much information as possible beyond a stroke's own features, such as the correlation between different strokes and the overall character structure. In this way, retained strokes are reconstructed from their own stroke features (stroke self-reconstruction), and the removed target strokes are reconstructed from multiple stroke features (stroke mutual reconstruction). In other words, the decoding network can perform information fusion within and between strokes for each stroke to be reconstructed, obtaining the reconstruction result for each stroke (target stroke and retained stroke). Understandably, once both the retained and target strokes have been reconstructed, the entire character can be reconstructed by combining the standard positional information corresponding to each of the retained and target strokes.
[0063] Similarly, the present disclosure does not limit the structure of the decoding network. Any network structure that can realize feature parsing and reconstruction is acceptable, but it needs to be consistent with the structure of the encoding network. For example, when the encoding network is in the form of a sequence model, the decoding network is also in the form of a sequence model; when the encoding network is in the form of an image model, the decoding network is also in the form of an image model; when the encoding network is in the form of a graph model, the decoding network is also in the form of a graph model.
[0064] This disclosure provides a schematic diagram of the structure of a neural network model. As shown in Figure 2, the neural network model includes an encoding network and a decoding network. The encoding network is used to acquire the stroke features of each retained stroke and the stroke features of each target stroke, while the decoding network is used to generate the reconstructed strokes of each retained stroke and each target stroke based on the stroke features.
[0065] Based on this, the steps for training the neural network model according to the reconstructed strokes generated by the decoding network include the following (1) and (2):
[0066] (1) Based on the retained strokes and the reconstructed strokes of the retained strokes generated by the decoding network, stroke self-reconstruction training is performed on the neural network model.
[0067] In some implementations, the retained strokes can be used as supervisory information. By adjusting the parameters of the decoding network, the reconstructed strokes generated by the decoding network can be made to approximate the retained strokes, until a preset first training termination condition is met, at which point the stroke self-reconstruction training stops. In a specific implementation example, an L2 loss function can be used to measure the loss of the reconstructed strokes relative to the retained strokes. The stroke self-reconstruction training ends when the L2 loss function converges, indicating that the first training termination condition has been met.
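As a hedged illustration of the L2 loss and the convergence-based termination condition mentioned above, the following sketch assumes each stroke is represented as an equal-length list of (x, y) points. The function names, the mean-squared form of the loss, and the `tol` and `patience` parameters are assumptions; the disclosure does not specify how convergence is detected.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def l2_loss(reconstructed: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between corresponding points."""
    if len(reconstructed) != len(target) or not target:
        raise ValueError("strokes must be non-empty and of equal length")
    total = sum(
        (rx - tx) ** 2 + (ry - ty) ** 2
        for (rx, ry), (tx, ty) in zip(reconstructed, target)
    )
    return total / len(target)


def self_reconstruction_loss(
    reconstructed_strokes: Sequence[Sequence[Point]],
    retained_strokes: Sequence[Sequence[Point]],
) -> float:
    """Average L2 loss over all retained strokes (the supervision signal)."""
    losses = [l2_loss(r, t) for r, t in zip(reconstructed_strokes, retained_strokes)]
    return sum(losses) / len(losses)


def has_converged(losses: List[float], tol: float = 1e-4, patience: int = 3) -> bool:
    """One possible termination test: the loss changed by less than `tol`
    over each of the last `patience` consecutive steps."""
    if len(losses) < patience + 1:
        return False
    recent = losses[-(patience + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))
```

A training loop would append the value of `self_reconstruction_loss` after each update and stop once `has_converged` returns `True`.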
[0068] (2) A reconstructed whole character is formed by combining the reconstructed strokes of the retained strokes and the reconstructed strokes of the target strokes generated by the decoding network, and whole character reconstruction training is performed on the neural network model based on the handwritten character sample and the reconstructed whole character.
[0069] In some embodiments, the handwritten character sample (a complete character) can be used as supervision information. By adjusting the parameters of the decoding network, the reconstructed complete character, formed from the reconstructed strokes of the retained strokes and of the target strokes generated by the decoding network, can be made to approach the handwritten character sample, until a preset second training end condition is met, at which point the complete character reconstruction training stops. In a specific implementation example, a loss function such as the L2 loss function can be used to measure the loss of the reconstructed complete character relative to the handwritten character sample; the second training end condition is determined to be met when the L2 loss function converges, at which point the complete character reconstruction training ends. It can be understood that the complete character reconstruction training can also be regarded as stroke mutual reconstruction training. After the training ends, the removed target strokes can be accurately reconstructed from the retained strokes, and once the target strokes are accurately reconstructed, the complete character can in turn be accurately reconstructed.
[0070] In some embodiments, the steps of the above-mentioned stroke self-reconstruction training can be preferentially executed, and then the steps of the above-mentioned complete character reconstruction training can be executed. For example, the neural network model is preferentially trained for stroke self-reconstruction until the training is stopped when the neural network model can achieve a stroke self-reconstruction effect that meets expectations. Then, the neural network model after stroke self-reconstruction training is trained for complete character reconstruction, that is, the parameters of the neural network model are further adjusted until the neural network model can achieve a complete character reconstruction effect that meets expectations and the training is terminated. In addition, the stroke self-reconstruction training and the complete character reconstruction training can also be carried out simultaneously, which is not limited here.
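The two scheduling options described above (staged training, or both objectives at once) can be sketched as a choice of which loss is optimized at a given step. The function name `training_loss`, the mode strings, and the weighting factor `char_weight` are assumptions for illustration; the disclosure does not prescribe how the two losses are combined when trained simultaneously.

```python
def training_loss(
    self_loss: float,
    char_loss: float,
    mode: str,
    char_weight: float = 1.0,
) -> float:
    """Select the objective for the current training step.

    mode == "self":  stage 1, stroke self-reconstruction only
    mode == "char":  stage 2, complete character reconstruction only
    mode == "joint": both objectives optimized simultaneously
    """
    if mode == "self":
        return self_loss
    if mode == "char":
        return char_loss
    if mode == "joint":
        # Weighted sum is one common way to combine objectives (assumption).
        return self_loss + char_weight * char_loss
    raise ValueError(f"unknown training mode: {mode!r}")
```

In the staged variant, a training loop would use `mode="self"` until the first end condition is met and then switch to `mode="char"`; in the simultaneous variant it would use `mode="joint"` throughout.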
[0071] For ease of understanding, reference can be made to the training schematic diagram of a neural network model shown in Figure 3. In this example, the handwritten character sample is the Chinese character "yin", which includes 6 standard strokes; the target stroke "丿" is removed, and the remaining strokes are all used as retained strokes. The handwritten character sample with the target stroke removed is split into multiple tokens according to the strokes. The token corresponding to the removed stroke "丿" is empty and can be recorded as [emp token] at its corresponding stroke order position. Then, all tokens are input to the encoding network in the standard writing order. In this way, the encoding network knows the morphological information of each stroke (the morphological information of the removed stroke is empty) and the standard order information, and performs feature extraction on this basis to obtain the stroke embedding (that is, the stroke feature) corresponding to each stroke. The decoding network then parses and restores the stroke embeddings output by the encoding network to obtain the reconstructed stroke corresponding to each stroke. As shown in Figure 3, even for the stroke "丿" that was removed in advance, the neural network model can reconstruct "丿" based on the features of the remaining strokes. After the reconstructed stroke corresponding to each stroke is obtained, complete character reconstruction can be achieved using all the reconstructed strokes. In Figure 3, the supervision information is shown schematically by dashed lines: the handwritten character sample "yin" serves as the supervision information for the whole character reconstruction result, and the remaining strokes after splitting serve as the supervision information for the stroke reconstruction.
In the above process, it is necessary to perform self-reconstruction on individual strokes through the compression reconstruction process, and also to reconstruct the target strokes that have been removed based on the feature information of other strokes. Therefore, the stroke features extracted by the encoding network in the neural network model should contain both the morphological detail information of individual strokes and the relational information between strokes, the whole character structure information, etc. Thus, it can be ensured that the encoding network after the training ends can extract rich and multi-layered stroke features.
[0072] To prompt the encoding network in the neural network model to further extract stroke features containing the high-level semantic information of the whole character, the embodiments of the present disclosure also provide a structural schematic diagram of a neural network model. As shown in Figure 4, on the basis of Figure 2, the neural network model further includes a feature fusion network. The feature fusion network is used to perform feature fusion on the stroke features output by the encoding network to obtain the whole character features corresponding to the handwritten character sample. Based on this, the steps of training the neural network model, on the basis of the above (1) and (2), further include:
[0073] (3) Input the stroke features obtained by the encoding network into the feature fusion network, and obtain the whole character features corresponding to the handwritten character sample through the feature fusion network based on the stroke features; perform whole character recognition training on the neural network model based on the whole character features and the handwritten character sample. Among them, the whole character features can also be referred to as whole character embeddings (char embedding). When performing whole character recognition training on the neural network model based on the whole character features and the handwritten character sample, the whole character recognition result can be obtained based on the whole character features; the true recognition result of the handwritten character sample can be obtained; and whole character recognition training is performed on the neural network model based on the whole character recognition result and the true recognition result. The above process of whole character recognition can also be understood as a classification process, that is, determining which character the whole character features specifically correspond to.
[0074] In some embodiments, the true recognition result can be used as the supervision information. By adjusting the parameters of the neural network model, the whole character recognition result obtained based on the whole character features (which are mainly obtained by fusing the stroke features generated by the encoding network) is made to approach the true recognition result, until the third training end condition is met, at which point the whole character recognition training stops. In a specific implementation example, a softmax loss function can be used to measure the loss of the whole character recognition result relative to the true recognition result; when the softmax loss function converges, the third training end condition is determined to be met, and the whole character recognition training ends. It can be understood that if the whole character recognition result can approach the true recognition result, the stroke features extracted by the encoding network already capture the high-level semantic information of the whole character well.
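The feature fusion and softmax loss described in (3) can be sketched as follows. Mean pooling is used here as the fusion operation purely as an assumption, since the disclosure does not fix the structure of the feature fusion network; the function names are likewise hypothetical. The softmax loss is written in its usual form, the negative log of the softmax probability assigned to the true character class.

```python
import math
from typing import List, Sequence


def fuse(stroke_features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-stroke features into one whole character feature.
    Mean pooling stands in for a learned feature fusion network."""
    n = len(stroke_features)
    dim = len(stroke_features[0])
    return [sum(f[k] for f in stroke_features) / n for k in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits: Sequence[float], true_index: int) -> float:
    """Cross-entropy of the predicted distribution against the true class."""
    return -math.log(softmax(logits)[true_index])
```

In a full model, a classification layer would map the output of `fuse` to one logit per candidate character, and `softmax_loss` would compare those logits with the index of the true character of the handwritten character sample.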
[0075] The stroke self-reconstruction training and the whole character reconstruction training in (1) and (2) above can ensure that the stroke features extracted by the encoding network contain at least low-level morphological information. Specifically, this includes the morphological detail information of the strokes themselves, as well as the correlation information between strokes, the morphology and structure information of the whole character, etc. The whole character recognition training in (3) above can ensure that the stroke features extracted by the encoding network contain high-level semantic information, so that, taken together, the stroke features extracted by the encoding network are rich and multi-layered.
[0076] For ease of understanding, on the basis of Figure 3, reference can further be made to the training schematic diagram of a neural network model shown in Figure 5. Figure 5 additionally shows that the stroke embeddings output by the encoding network are all input to the feature fusion network to obtain the whole character embedding (i.e., the whole character feature) output by the feature fusion network, and recognition is then performed based on the whole character embedding to obtain the whole character recognition result. After that, the true recognition result (i.e., the true recognition result of the handwritten character 'yin') can be taken as the supervision information to adjust the parameters of the neural network model again, so that the stroke features extracted by the encoding network effectively contain high-level semantic information, thereby making the final whole character recognition result approach the true recognition result.
[0077] By performing the above-mentioned stroke self-reconstruction training, whole character reconstruction training and whole character recognition training on the neural network model, multi-level information such as the stroke detail information, the correlation information between strokes and the structure information of the whole character in handwritten characters can be encoded by the encoding network into the stroke embeddings (i.e., stroke features). After the training is completed, the encoding network alone can be taken from the neural network model as the feature extraction model to perform stroke encoding on handwritten characters, and the stroke embedding of each stroke is obtained as the feature expression of the stroke.
[0078] Based on the above content, the embodiments of the present disclosure also provide a feature extraction method. Referring to the flowchart of a feature extraction method shown in Figure 6, the method mainly includes the following steps S602 to S604:
[0079] Step S602, obtain the target handwritten character for which features are to be extracted;
[0080] Step S604, use the pre-trained feature extraction model to extract features from the target handwritten character to obtain the features of the target handwritten character; wherein, the feature extraction model is obtained by using the training method of the feature extraction model provided in any one of the foregoing embodiments of the present disclosure.
[0081] The feature extraction method provided in this embodiment adopts the feature extraction model obtained by the aforementioned training method. Therefore, it can effectively extract rich and multi-layered handwritten features. In addition to including low-level stroke detail features and structural features between strokes, the handwritten features can also include higher-level semantic features. This not only helps to score the target handwritten character as a whole, but can also be used to evaluate it on multiple dimensions such as stroke size and shape, and stroke position relationship. In other words, it supports multi-dimensional evaluation, thereby ensuring the reliability and accuracy of the evaluation results.
[0082] Furthermore, the above method also includes: commenting on the target handwriting based on its characteristics. The comment results can be represented directly as a score or a writing standard level, or they can provide comments such as whether the length of each stroke is appropriate, whether the overall structure is reasonable, etc. The comments can also point out the writing errors of the target handwriting, such as adding an extra "hook" to a certain stroke.
[0083] In summary, the training method and feature extraction method of the aforementioned feature extraction model provided in this disclosure have at least one of the following advantages:
[0084] 1) The self-supervised approach is used to obtain the feature representation of handwritten characters in both stroke and whole character dimensions. This eliminates the need for manual annotation of comment data for training samples. Feature extraction is achieved using only the information of the training samples themselves, and is not limited by the quantity, quality, or annotation cost of comment data. Therefore, it can greatly increase the number of available training samples, thereby supporting more complex models. It can also effectively enhance the richness and robustness of the model's feature extraction. In addition, it saves a huge amount of annotation work on handwritten character samples (i.e., saves manual annotation costs), and also avoids the instability caused by the subjectivity of the annotated comment data.
[0085] 2) The feature extraction model, trained using three supervised tasks—stroke self-reconstruction, whole-character reconstruction (stroke-to-stroke reconstruction), and whole-character recognition—extracts stroke features containing low-level stroke details, structural information between strokes, and higher-level semantic information. This means it can extract rich, multi-layered (multi-scale) stroke features. This feature extraction method does not rely on prior expert information or manually pre-defined rules, and can further extract deeper features beyond those manually summarized. The rich, multi-layered stroke features extracted can support multi-dimensional evaluation outputs of subsequent handwritten characters. Furthermore, it can be used not only for coarse-grained handwritten character evaluation methods such as whole-character scoring, but also for finer-grained methods such as writing standardization evaluation or error judgment evaluation.
[0086] 3) It can extract features from handwritten characters input by image and handwriting point input. That is, it is applicable to both handwritten character samples in the form of handwriting point sequence samples and text image samples. Therefore, it can be applied to various types of calligraphy practice application scenarios or handwritten character review scenarios, thus expanding the scope of application scenarios.
[0087] Corresponding to the aforementioned training method for the feature extraction model, this disclosure also provides a training apparatus for the feature extraction model. Figure 7 shows a structural schematic of the training apparatus. The apparatus can be implemented by software and / or hardware, and is generally integrated into an electronic device. As shown in Figure 7, the training apparatus for the feature extraction model includes:
[0088] Sample acquisition module 702 is used to acquire handwritten character samples;
[0089] The stroke removal module 704 is used to remove at least one target stroke from the handwritten character sample to obtain the retained strokes in the handwritten character sample.
[0090] The feature acquisition module 706 is used to acquire the stroke features of each retained stroke and the stroke features of each target stroke through the encoding network in the neural network model to be trained.
[0091] The stroke reconstruction module 708 is used to generate the reconstructed stroke of each retained stroke and the reconstructed stroke of each target stroke based on the stroke features through the decoding network in the neural network model.
[0092] The model training module 710 is used to train the neural network model based on the reconstructed strokes generated by the decoding network, and uses the encoding network in the neural network model at the end of training as the feature extraction model.
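As an illustration of the operation performed by the stroke removal module 704, the sketch below splits a stroke list into retained strokes and target strokes while keeping each stroke's standard sequence number. Selecting the target strokes at random, the function name `remove_target_strokes`, and the 0-based indices are assumptions; this section does not state how target strokes are chosen.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Stroke = TypeVar("Stroke")
Indexed = Tuple[int, Stroke]  # (standard sequence number, stroke)


def remove_target_strokes(
    strokes: Sequence[Stroke],
    num_targets: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Indexed], List[Indexed]]:
    """Return (retained, targets), each as (sequence number, stroke) pairs.

    At least one stroke is removed and at least one is retained, so that
    the removed strokes can later be reconstructed from the retained ones.
    """
    if not 1 <= num_targets < len(strokes):
        raise ValueError("need 1 <= num_targets < number of strokes")
    if rng is None:
        rng = random.Random()
    target_indices = set(rng.sample(range(len(strokes)), num_targets))
    retained = [(i, s) for i, s in enumerate(strokes) if i not in target_indices]
    targets = [(i, s) for i, s in enumerate(strokes) if i in target_indices]
    return retained, targets
```

Passing a seeded `random.Random` makes the selection reproducible, which can be convenient when comparing training runs.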
[0093] The training apparatus for the feature extraction model provided in this embodiment of the present disclosure is not limited by the quantity, quality, and annotation cost of the comment data because it does not require annotation of comment data for handwritten characters. It can train the model with a large number of handwritten characters, thereby effectively enhancing the richness and robustness of the extracted features. In addition, based on the stroke features extracted by the encoding network, stroke self-reconstruction (generating reconstructed strokes of the retained strokes) and stroke mutual reconstruction (generating reconstructed strokes of the removed target strokes) are performed. The model is trained based on the reconstructed strokes, so that the stroke features extracted by the final feature extraction model (encoding network) contain not only the stroke's own information, but also information about other strokes, the whole character, and the relationships between strokes, which makes the information richer and more multi-layered. This can effectively alleviate the problem of poor handwritten character feature extraction in the prior art.
[0094] In some implementations, the feature acquisition module 706 is specifically used to: acquire first marker information corresponding to each retained stroke; the first marker information includes the stroke shape information of the retained stroke and the standard position information of the retained stroke in the handwritten character sample; acquire second marker information corresponding to each target stroke; the second marker information includes the standard position information of the target stroke in the handwritten character sample; and extract the stroke features of each retained stroke and the stroke features of each target stroke based on the first marker information and the second marker information through an encoding network.
[0095] In some implementations, the feature acquisition module 706 is specifically used to: extract the stroke features of the corresponding retained strokes based on each of the first marker information through the encoding network; and extract the stroke features of each of the target strokes based on all the first marker information and the second marker information through the encoding network.
[0096] In some implementations, the stroke reconstruction module 708 is specifically used to: independently parse each of the stroke features and jointly parse multiple of the stroke features through the decoding network in the neural network model to obtain independent parsing results and joint parsing results; and generate reconstructed strokes of each of the retained strokes and the target stroke based on the independent parsing results and the joint parsing results.
[0097] In some implementations, the model training module 710 is specifically used to: perform stroke self-reconstruction training on the neural network model based on the retained strokes and the reconstructed strokes of the retained strokes generated by the decoding network; and form a reconstructed whole character by combining the reconstructed strokes of the retained strokes and the reconstructed strokes of the target strokes generated by the decoding network, and perform whole character reconstruction training on the neural network model based on the handwritten character sample and the reconstructed whole character.
[0098] In some embodiments, the neural network model further includes a feature fusion network, and the model training module 710 is further configured to: input the stroke features obtained by the encoding network into the feature fusion network, and obtain the whole character features corresponding to the handwritten character sample through the feature fusion network based on the stroke features; and train the neural network model for whole character recognition based on the whole character features and the handwritten character sample.
[0099] In some implementations, the model training module 710 is further specifically used for: obtaining whole character recognition results based on the whole character features; obtaining the true recognition results of the handwritten character sample; and training the neural network model for whole character recognition based on the whole character recognition results and the true recognition results.
[0100] In some implementations, the sample acquisition module 702 is specifically used to: acquire a handwriting point sequence via an electronic writing tablet; the handwriting point sequence includes point information of multiple handwriting points, and the point information includes position coordinate information; and use the acquired handwriting point sequence as a handwritten character sample. Optionally, the point information may also include writing pressure information and / or writing state information.
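One possible in-memory representation of such a handwriting point sequence is sketched below. The class name `HandwritingPoint`, its field names, and the rule of splitting strokes wherever the pen is not down are assumptions for illustration; the disclosure only states that point information includes position coordinates and may optionally include writing pressure and writing state.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class HandwritingPoint:
    x: float                          # position coordinate
    y: float                          # position coordinate
    pressure: Optional[float] = None  # optional writing pressure
    pen_down: Optional[bool] = None   # optional writing state


def split_into_strokes(
    points: Sequence[HandwritingPoint],
) -> List[List[HandwritingPoint]]:
    """Group consecutive pen-down points into strokes.

    Points whose pen_down is False or None act as separators and are
    not included in any stroke (an assumption of this sketch).
    """
    strokes: List[List[HandwritingPoint]] = []
    current: List[HandwritingPoint] = []
    for point in points:
        if point.pen_down:
            current.append(point)
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes
```

A sequence captured from a writing tablet could be converted into `HandwritingPoint` objects and then segmented with `split_into_strokes` before target strokes are removed.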
[0101] In some implementations, the sample acquisition module 702 is specifically used to: acquire a text image containing handwritten characters; detect and extract handwritten characters from the text image, and use the extracted handwritten characters as handwritten character samples.
[0102] Corresponding to the aforementioned feature extraction method, this disclosure also provides a feature extraction apparatus. Figure 8 shows a structural schematic of the feature extraction apparatus. The apparatus can be implemented by software and / or hardware, and is generally integrated into an electronic device. As shown in Figure 8, the feature extraction apparatus includes:
[0103] The character acquisition module 802 is used to acquire the target handwritten character whose features are to be extracted.
[0104] The feature extraction module 804 is used to extract features from the target handwritten character using a pre-trained feature extraction model to obtain the features of the target handwritten character; wherein, the feature extraction model is obtained by using any of the aforementioned training methods for the feature extraction model provided in the embodiments of this disclosure.
[0105] The feature extraction device provided in this embodiment uses the feature extraction model obtained by the aforementioned training method. Therefore, it can effectively extract rich and multi-layered handwritten features. In addition to including low-level stroke detail features and structural features between strokes, the handwritten features can also include higher-level semantic features. This not only helps to score the target handwritten character as a whole, but can also be used to evaluate it on multiple dimensions such as stroke size and shape, and stroke position relationship. In other words, it supports multi-dimensional evaluation, thereby ensuring the reliability and accuracy of the evaluation results.
[0106] In some embodiments, the above-described apparatus further includes a critique module for critiquing the target handwritten character based on its features.
[0107] The training apparatus and feature extraction apparatus of the feature extraction model provided in this disclosure can respectively execute the training method and feature extraction method of the feature extraction model provided in any embodiment of this disclosure, and have the corresponding functional modules and beneficial effects of the execution method.
[0108] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described device embodiments can be referred to the corresponding process in the method embodiments, and will not be repeated here.
[0109] This disclosure also provides an electronic device, which includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the above-described training methods for the feature extraction model or any of the above-described feature extraction methods.
[0110] Figure 9 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure. As shown in Figure 9, the electronic device 900 includes one or more processors 901 and a memory 902.
[0111] The processor 901 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and / or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
[0112] The memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and / or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 901 may execute the program instructions to implement the training method or feature extraction method of the feature extraction model of the embodiments of this disclosure described above, and / or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
[0113] In one example, the electronic device 900 may also include an input device 903 and an output device 904, which are interconnected via a bus system and / or other forms of connection mechanism (not shown).
[0114] In addition, the input device 903 may also include, for example, a keyboard, a mouse, etc.
[0115] The output device 904 can output various information to the outside, including determined distance information, direction information, etc. The output device 904 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, etc.
[0116] Of course, for the sake of simplicity, Figure 9 shows only some of the components of the electronic device 900 relevant to this disclosure, omitting components such as buses, input / output interfaces, etc. In addition, the electronic device 900 may include any other suitable components depending on the specific application.
[0117] In addition to the methods and devices described above, embodiments of this disclosure may also be computer program products, including computer program instructions that, when executed by a processor, cause the processor to perform the training method or feature extraction method of the feature extraction model described in the embodiments of this disclosure.
[0118] The program code of the computer program product for performing the operations of the embodiments of this disclosure can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on a user's computing device, partially on a user's computing device, as a standalone software package, partially on a user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.
[0119] Furthermore, embodiments of this disclosure may also be computer-readable storage media storing computer program instructions thereon, which, when executed by a processor, cause the processor to perform the training method or feature extraction method of the feature extraction model provided in the embodiments of this disclosure.
[0120] The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may, for example, include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0121] This disclosure also provides a computer program product, including a computer program / instruction, which, when executed by a processor, implements the training method and feature extraction method of the feature extraction model in this disclosure.
[0122] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0123] The above description is merely a specific embodiment of this disclosure, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A training method for a feature extraction model, characterized in that it comprises: obtaining a handwritten character sample; randomly removing at least one target stroke from the handwritten character sample to obtain the retained strokes in the handwritten character sample; obtaining the stroke features of each of the retained strokes and the stroke features of each of the target strokes through the encoding network in the neural network model to be trained; generating a reconstructed stroke for each of the retained strokes and a reconstructed stroke for each of the target strokes based on the stroke features through the decoding network in the neural network model; and training the neural network model based on the reconstructed strokes generated by the decoding network, and using the encoding network in the neural network model at the end of training as the feature extraction model; wherein the step of training the neural network model based on the reconstructed strokes generated by the decoding network includes: performing stroke self-construction training on the neural network model based on the retained strokes and the reconstructed strokes of the retained strokes generated by the decoding network; and forming a reconstructed whole character by combining the reconstructed strokes of the retained strokes and the reconstructed strokes of the target strokes generated by the decoding network, and performing whole character reconstruction training on the neural network model based on the handwritten character sample and the reconstructed whole character.
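The data flow of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `split_strokes`, `stroke_error`, and `training_losses` are hypothetical, the mean squared point distance is an assumed loss (the claim does not name one), and the reconstructed strokes are passed in directly as a stand-in for the output of the decoding network. The sketch also assumes that at least one stroke is retained.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def split_strokes(strokes: Sequence[Stroke], num_targets: int,
                  rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly choose `num_targets` stroke indices to remove (target strokes).

    Returns (retained indices, target indices). Assumption: at least one
    stroke must remain after removal.
    """
    if not 1 <= num_targets < len(strokes):
        raise ValueError("need at least one target and one retained stroke")
    target_idx = sorted(rng.sample(range(len(strokes)), num_targets))
    target_set = set(target_idx)
    retained_idx = [i for i in range(len(strokes)) if i not in target_set]
    return retained_idx, target_idx


def stroke_error(a: Stroke, b: Stroke) -> float:
    """Mean squared point distance between two strokes (assumed loss)."""
    if len(a) != len(b) or not a:
        raise ValueError("strokes must be non-empty and of equal length")
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def training_losses(sample: Sequence[Stroke], reconstructed: Sequence[Stroke],
                    retained_idx: Sequence[int]) -> Tuple[float, float]:
    """Return (retained-stroke loss, whole-character loss).

    The first term compares only the retained strokes with their
    reconstructions; the second compares every stroke of the sample with the
    reconstructed whole character (retained plus target reconstructions).
    """
    self_loss = sum(stroke_error(sample[i], reconstructed[i])
                    for i in retained_idx) / len(retained_idx)
    whole_loss = sum(stroke_error(s, r)
                     for s, r in zip(sample, reconstructed)) / len(sample)
    return self_loss, whole_loss
```

In an actual training loop the two terms would be combined and back-propagated through both networks; how they are weighted is not specified in the claim.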
2. The method according to claim 1, characterized in that the step of obtaining the stroke features of each of the retained strokes and the stroke features of each of the target strokes through the encoding network in the neural network model to be trained includes: obtaining first marker information corresponding to each of the retained strokes, the first marker information including the stroke shape information of the retained stroke and the standard position information of the retained stroke in the handwritten character sample; obtaining second marker information corresponding to each of the target strokes, the second marker information including the standard position information of the target stroke in the handwritten character sample; and extracting, through the encoding network, the stroke features of each of the retained strokes and the stroke features of each of the target strokes based on the first marker information and the second marker information.
3. The method according to claim 2, characterized in that the step of extracting, through the encoding network, the stroke features of each of the retained strokes and the stroke features of each of the target strokes based on the first marker information and the second marker information includes: extracting, through the encoding network, the stroke features of the corresponding retained stroke based on each piece of first marker information; and extracting, through the encoding network, the stroke features of each of the target strokes based on all of the first marker information and the second marker information.
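The marker information of claims 2 and 3 can be illustrated with a small data structure. This is a sketch under stated assumptions: the class `Marker` and the functions `build_markers` and `encoder_context` are hypothetical names, the standard position is represented as the stroke's index within the character (the claims do not fix a representation), and the sketch only shows which markers each stroke feature is computed from, not the encoding network itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Marker:
    position: int                        # assumed: stroke index in the character
    shape: Optional[Tuple[Point, ...]]   # None for a removed (target) stroke


def build_markers(strokes: Sequence[Sequence[Point]],
                  target_idx: Sequence[int]) -> Tuple[List[Marker], List[Marker]]:
    """Build first markers (shape + position) and second markers (position only)."""
    targets = set(target_idx)
    first: List[Marker] = []
    second: List[Marker] = []
    for i, stroke in enumerate(strokes):
        if i in targets:
            second.append(Marker(position=i, shape=None))
        else:
            first.append(Marker(position=i, shape=tuple(stroke)))
    return first, second


def encoder_context(first: Sequence[Marker],
                    second: Sequence[Marker]) -> Dict[int, List[Marker]]:
    """Map each stroke position to the markers its feature is derived from.

    A retained stroke uses only its own first marker; a target stroke uses
    all first markers together with the second markers (claim 3).
    """
    context: Dict[int, List[Marker]] = {}
    for m in first:
        context[m.position] = [m]
    for m in second:
        context[m.position] = list(first) + list(second)
    return context
```

Because a target stroke has no shape of its own in the input, its feature can only be inferred from the other strokes and its position, which is what lets the encoder learn inter-stroke structure.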
4. The method according to claim 1, characterized in that the step of generating the reconstructed stroke of each of the retained strokes and the reconstructed stroke of each of the target strokes based on the stroke features through the decoding network in the neural network model includes: independently parsing each stroke feature and jointly parsing multiple stroke features through the decoding network in the neural network model to obtain independent parsing results and joint parsing results; and generating the reconstructed stroke of each of the retained strokes and the reconstructed stroke of each of the target strokes based on the independent parsing results and the joint parsing results.
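The difference between independent and joint parsing in claim 4 can be shown with a toy example. Everything here is an assumption made for illustration: the function names are hypothetical, independent parsing is modeled as a per-feature pass-through, joint parsing as mean pooling over all stroke features, and the two results are combined by concatenation. The claim does not specify any of these choices.

```python
from typing import List, Sequence

Vector = List[float]


def parse_independently(features: Sequence[Vector]) -> List[Vector]:
    """Process each stroke feature on its own (here: a plain copy)."""
    return [list(f) for f in features]


def parse_jointly(features: Sequence[Vector]) -> Vector:
    """Process all stroke features together (here: element-wise mean)."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def decoder_inputs(features: Sequence[Vector]) -> List[Vector]:
    """Combine each independent result with the shared joint result."""
    joint = parse_jointly(features)
    return [own + joint for own in parse_independently(features)]
```

Each combined vector carries both the stroke's own information and context from the other strokes, from which a reconstructed stroke would then be generated.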
5. The method according to claim 1, characterized in that the neural network model further includes a feature fusion network, and the step of training the neural network model based on the reconstructed strokes generated by the decoding network further includes: inputting the stroke features obtained by the encoding network into the feature fusion network, and obtaining, through the feature fusion network, the whole character features corresponding to the handwritten character sample based on the stroke features; and performing whole-character recognition training, based on the whole character features and the handwritten character sample, on the neural network model that has undergone the stroke self-construction training and the whole character reconstruction training.
6. The method according to claim 5, characterized in that the step of performing whole-character recognition training on the neural network model based on the whole character features and the handwritten character sample includes: obtaining a whole-character recognition result based on the whole character features; obtaining the actual recognition result of the handwritten character sample; and performing whole-character recognition training on the neural network model based on the whole-character recognition result and the actual recognition result.
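The whole-character recognition step of claims 5 and 6 can be sketched as below. The specifics are assumptions rather than details from the claims: `fuse` uses mean pooling as a stand-in for the feature fusion network, the classifier is a single linear layer given by `weights`, and the recognition result is compared with the actual result through a cross-entropy loss.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse(stroke_features: Sequence[Vector]) -> Vector:
    """Fuse stroke features into one whole-character feature (mean pooling)."""
    n = len(stroke_features)
    dim = len(stroke_features[0])
    return [sum(f[d] for f in stroke_features) / n for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognition_loss(whole_feature: Vector, weights: Sequence[Vector],
                     label: int) -> Tuple[float, int]:
    """Return (cross-entropy loss against the actual label, predicted class)."""
    logits = [sum(w * x for w, x in zip(row, whole_feature)) for row in weights]
    probs = softmax(logits)
    predicted = probs.index(max(probs))
    return -math.log(probs[label]), predicted
```

The loss is small when the predicted class matches the actual recognition result and grows as the prediction diverges from it, which is the signal used for the recognition training.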
7. The method according to claim 1, characterized in that the step of obtaining a handwritten character sample includes: obtaining a handwriting point sequence using an electronic writing tablet, the handwriting point sequence containing point information of multiple handwriting points, and the point information including position coordinate information; and using the obtained handwriting point sequence as the handwritten character sample.
8. The method according to claim 7, characterized in that the point information also includes writing pressure information and / or writing status information.
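A handwriting point sequence as described in claims 7 and 8 might be represented as follows. The names `HandwritingPoint` and `split_into_strokes` are hypothetical, and the sketch assumes that the writing status information is a pen-down / pen-up flag, which the claims do not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class HandwritingPoint:
    x: float                          # position coordinate information
    y: float
    pressure: Optional[float] = None  # writing pressure information (optional)
    pen_down: bool = True             # assumed form of writing status information


def split_into_strokes(points: Sequence[HandwritingPoint]) -> List[List[HandwritingPoint]]:
    """Group consecutive pen-down points into strokes; pen-up points end a stroke."""
    strokes: List[List[HandwritingPoint]] = []
    current: List[HandwritingPoint] = []
    for p in points:
        if p.pen_down:
            current.append(p)
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes
```

Splitting the sequence this way yields the per-stroke units from which target strokes can later be removed.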
9. The method according to claim 1, characterized in that the step of obtaining a handwritten character sample includes: obtaining a text image containing handwritten characters; and detecting and extracting the handwritten characters from the text image, and using the extracted handwritten characters as handwritten character samples.
10. A feature extraction method, characterized in that it comprises: obtaining a target handwritten character whose features are to be extracted; and performing feature extraction on the target handwritten character using a pre-trained feature extraction model to obtain the features of the target handwritten character; wherein the feature extraction model is obtained by the training method for a feature extraction model according to any one of claims 1 to 9.
11. The method according to claim 10, characterized in that the method further includes: evaluating the target handwritten character based on the features of the target handwritten character.
12. A training device for a feature extraction model, characterized in that it comprises: a sample acquisition module, used to acquire a handwritten character sample; a stroke removal module, used to randomly remove at least one target stroke from the handwritten character sample to obtain the retained strokes in the handwritten character sample; a feature acquisition module, used to acquire the stroke features of each of the retained strokes and the stroke features of each of the target strokes through the encoding network in the neural network model to be trained; a stroke reconstruction module, used to generate a reconstructed stroke for each of the retained strokes and a reconstructed stroke for each of the target strokes based on the stroke features through the decoding network in the neural network model; and a model training module, used to train the neural network model based on the reconstructed strokes generated by the decoding network, and to use the encoding network in the neural network model at the end of training as the feature extraction model; wherein the model training module is specifically used to: perform stroke self-construction training on the neural network model based on the retained strokes and the reconstructed strokes of the retained strokes generated by the decoding network; and form a reconstructed whole character by combining the reconstructed strokes of the retained strokes and the reconstructed strokes of the target strokes generated by the decoding network, and perform whole character reconstruction training on the neural network model based on the handwritten character sample and the reconstructed whole character.
13. A feature extraction device, characterized in that it comprises: a character acquisition module, used to acquire a target handwritten character whose features are to be extracted; and a feature extraction module, used to perform feature extraction on the target handwritten character using a pre-trained feature extraction model to obtain the features of the target handwritten character; wherein the feature extraction model is obtained by the training method for a feature extraction model according to any one of claims 1 to 9.
14. An electronic device, characterized in that the electronic device includes: a processor; and a memory used to store instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the training method for a feature extraction model according to any one of claims 1 to 9 or the feature extraction method according to claim 10 or 11.
15. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the training method for a feature extraction model according to any one of claims 1 to 9 or the feature extraction method according to claim 10 or 11.