Chinese ancient book character recognition method, character segmentation and layout reconstruction method, medium, and device
A character recognition and character classification technology, applied to character recognition, character and pattern recognition, neural learning methods, etc. It can solve problems such as misjudgment, omission, and uneven distribution of character categories, and achieves the effects of a uniform character size distribution, reduced negative interference, and improved accuracy.
Example Embodiment
[0080] Example 1
[0081] This embodiment discloses a Chinese ancient book character recognition method, which can be performed by a smart device such as a computer. As shown in Figure 1, the method specifically includes the following steps:
[0082] Step 1. Obtain Chinese ancient book images annotated with character bounding boxes and character categories as the original training samples; at the same time, obtain the label file of the original training samples, which records the character bounding box size, character position, and character category.
[0083] The above character position can be obtained from the character bounding box. Specifically, the character position is given by the coordinates of two diagonally opposite corners of the bounding box, for example (x_left, y_top) and (x_right, y_bottom), where (x_left, y_top) is the coordinate of the upper-left corner and (x_right, y_bottom) is the coordinate of the lower-right corner of the bounding box.
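The two-corner representation above determines a box's size and position completely. A minimal sketch of this representation and the geometry derivable from it follows; the tuple layout and helper names are illustrative assumptions, not the patent's exact data format.

```python
# Sketch of the character bounding-box representation described above.
# A box is (x_left, y_top, x_right, y_bottom); width, height, and
# center follow directly from the two opposite corners.

def box_geometry(box):
    """Return (width, height, center) for a bounding box given as
    (x_left, y_top, x_right, y_bottom)."""
    x_left, y_top, x_right, y_bottom = box
    width = x_right - x_left
    height = y_bottom - y_top
    center = ((x_left + x_right) / 2.0, (y_top + y_bottom) / 2.0)
    return width, height, center

w, h, c = box_geometry((10, 20, 40, 80))  # width 30, height 60
```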
[0084] The above character category refers to a specific ...
Example Embodiment
[0117] Example 2
[0118] This embodiment discloses a Chinese ancient book character grouping method, including the following steps:
[0119] Step 7. For an acquired Chinese ancient book document image, obtain the predicted bounding box and predicted category of each character by the method described in Embodiment 1;
[0120] Step 8. Group the predicted bounding boxes of the characters by clustering and restore them in reading order according to the ancient book layout order and the character sequence of the language, obtaining ancient book text content without punctuation. As shown in Figure 2, the specific steps are as follows:
[0121] S1. Sort the predicted bounding boxes of the characters according to the customary reading order of ancient books, and calculate the geometric feature information of the character bounding boxes. The details are as follows:
[0122] S1A. The predicted bounding box of each character is sorted in the ancient book reading order, from...
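The column-wise sorting in S1 can be sketched as follows. This is a hypothetical implementation assuming the traditional vertical layout of Chinese ancient books (columns read top to bottom, columns ordered right to left); the `column_gap` threshold is an invented parameter, and the patent's actual clustering rule may differ.

```python
# Hypothetical sketch of S1: arranging character bounding boxes into
# reading order. Boxes are (x_left, y_top, x_right, y_bottom).
# Assumption: vertical columns, read top-to-bottom, right-to-left.

def sort_reading_order(boxes, column_gap=15):
    """Group boxes into vertical columns by x-center proximity,
    then emit columns right-to-left and characters top-to-bottom."""
    if not boxes:
        return []
    # Sort by x-center, rightmost first.
    by_x = sorted(boxes, key=lambda b: -(b[0] + b[2]) / 2.0)
    columns, current = [], [by_x[0]]
    for box in by_x[1:]:
        prev_cx = (current[-1][0] + current[-1][2]) / 2.0
        cx = (box[0] + box[2]) / 2.0
        if prev_cx - cx <= column_gap:   # close enough: same column
            current.append(box)
        else:                            # large gap: new column to the left
            columns.append(current)
            current = [box]
    columns.append(current)
    # Within each column, read top to bottom (smaller y_top first).
    return [b for col in columns for b in sorted(col, key=lambda b: b[1])]
```

With two boxes in a right-hand column and one in a left-hand column, the right column is emitted first, top to bottom, then the left column.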
Example Embodiment
[0170] Example 3
[0171] This embodiment discloses a Chinese ancient book layout reconstruction method, including the following steps:
[0172] Step 9. For an acquired Chinese ancient book image, first restore the characters recognized in the image using the Chinese ancient book character grouping method according to Embodiment 2, obtaining ancient book text content without punctuation;
[0173] Step 10. Build ancient book reconstruction algorithms based on language models, including an error-correction language model and a sentence-segmentation language model, and perform error correction and selection on the ancient book content without punctuation. As shown in Figure 5, the specifics are as follows:
[0174] (1) Based on the modern-text Bert-base-chinese language model, use the 殆知阁 ancient text dataset as the domain corpus, and perform unsupervised fine-tuning based on the masked language modeling objective to obtain the error-correction language model.
[0175] (2) Based on the error-correction language model acquired above, the ancient...
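The masked-language-model error correction described in steps (1) and (2) can be sketched without the actual BERT weights: mask a suspect position, ask a scoring model for the likelihood of each candidate character in that context, and keep the best-scoring candidate. In the sketch below, `score_fn` stands in for the fine-tuned bert-base-chinese model, and the toy scorer at the bottom is invented purely for illustration; neither is the patent's actual implementation.

```python
# Hedged sketch of LM-based character error correction: for each
# uncertain position, mask it and pick the candidate character the
# language model scores highest in context. `score_fn` stands in for
# the fine-tuned masked language model (e.g. bert-base-chinese).

MASK = "[MASK]"

def correct(text, candidates_per_pos, score_fn):
    """text: recognized character string.
    candidates_per_pos: {index: [candidate chars]} from the recognizer.
    score_fn(masked_text, index, candidate) -> float (higher = better)."""
    chars = list(text)
    for i, candidates in candidates_per_pos.items():
        masked = chars[:i] + [MASK] + chars[i + 1:]
        best = max(candidates, key=lambda c: score_fn("".join(masked), i, c))
        chars[i] = best
    return "".join(chars)

# Toy scorer for demonstration only: prefers the candidate that
# matches a tiny reference "corpus" at the masked position.
corpus = "学而时习之"
def toy_score(masked_text, index, candidate):
    return 1.0 if corpus[index] == candidate else 0.0

fixed = correct("学而时刁之", {3: ["刁", "习"]}, toy_score)
```

In a real pipeline, `score_fn` would feed the masked sentence through the fine-tuned model and read the softmax probability of each candidate token at the masked position.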