Text detection method and device, electronic equipment and storage medium

By constructing a feature vector library of rare characters and combining it with OCR technology, the problem of poor recognition effect of OCR in recognizing rare characters has been solved, achieving efficient and accurate recognition of ancient books and other texts, and improving the accuracy and reliability of digitized ancient books and archaeological documents.

CN122244888A · Pending · Publication Date: 2026-06-19 · BEIJING ZITIAO NETWORK TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date: 2024-12-17
Publication Date: 2026-06-19


IPC: G06V30/42; G06V30/19; G06V30/30

AI Tagging

Application Domain

Instruments


AI Technical Summary

Technical Problem

Existing OCR technology is not effective in recognizing texts containing rare characters, such as those in ancient books, and cannot meet the requirements for efficient and accurate recognition, resulting in insufficient accuracy and reliability of text recognition during the digitization of ancient books.

Method used

By constructing a pre-defined feature vector library for rare characters and performing vector retrieval, combined with optical character recognition (OCR) operations, the target character recognition result is determined by comprehensively utilizing the vector retrieval recall and OCR recognition results from the rare character feature vector library.

Benefits of technology

It improves the accuracy of rare character recognition, enhances the processing capability of images containing rare characters in ancient books, provides technical support for the digitization of ancient books and the identification of archaeological documents, and improves the comprehensiveness and reliability of recognition.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a text detection method, apparatus, electronic device, and storage medium. The method includes: obtaining a first text recognition result corresponding to a text image to be recognized by performing vector retrieval in a preset rare character feature vector library, wherein the preset rare character feature vector library records at least one rare character feature vector and the text corresponding to the rare character vector; obtaining a second text recognition result corresponding to the text image to be recognized by performing optical character recognition (OCR) on the text image to be recognized; and determining a target text recognition result for the text image to be recognized based on the first text recognition result and the second text recognition result. This disclosure, by combining vector retrieval based on a rare character feature vector library with OCR recognition results, can effectively recognize text images containing rare characters, improving the overall accuracy and comprehensiveness of text recognition.


Description

Technical Field

[0001] This disclosure relates to the field of data processing technology, and in particular to a text detection method, apparatus, electronic device, and storage medium.

Background Technology

[0002] Ancient books, as precious cultural heritage, carry profound historical and cultural value. When they exist in physical form, they are easily subject to natural wear and tear and human damage, facing a severe preservation dilemma. The digitization of ancient books has emerged as a solution.

[0003] The digitization of ancient books involves many key technological aspects, among which high-definition scanning, image processing, character recognition, and data encoding are particularly important. These ensure the proper storage, rapid transmission, and accurate retrieval of digitized results. OCR technology, as the core of character recognition and digital encoding, has undergone different stages of development, gradually transitioning from traditional template matching methods to the widely used deep neural network technology of today. However, in practical applications, OCR recognition results vary considerably. For example, in the recognition of ancient texts, due to the diverse handwriting styles, blurred characters, and complex layouts, extremely high demands are placed on OCR technology. Existing technical solutions still struggle to fully meet the needs for efficient and accurate recognition, resulting in insufficient accuracy and reliability of character recognition in the digitization process of ancient books.

Summary of the Invention

[0004] This disclosure provides a text detection method, apparatus, electronic device, and storage medium to achieve effective text recognition of text images containing rare characters by combining vector retrieval based on a rare character feature vector library with OCR recognition results. This solves the problem of poor recognition performance of OCR recognition methods when performing text recognition on rare characters.

[0005] In a first aspect, embodiments of this disclosure provide a text detection method, the method comprising:

[0006] The first character recognition result corresponding to the character image to be recognized is obtained by performing vector retrieval in a preset rare character feature vector library. The preset rare character feature vector library records at least one rare character feature vector and the character corresponding to the rare character vector.

[0007] The second character recognition result corresponding to the character image to be recognized is obtained by performing optical character recognition operation on the character image to be recognized.

[0008] Based on the first character recognition result and the second character recognition result, the target character recognition result of the character image to be recognized is determined.

[0009] In a second aspect, embodiments of this disclosure also provide a text detection device, the device comprising:

[0010] The first recognition module is used to obtain the first character recognition result corresponding to the character image to be recognized by performing vector retrieval in a preset rare character feature vector library. The preset rare character feature vector library records at least one rare character feature vector and the character corresponding to the rare character vector.

[0011] The second recognition module is used to obtain the second character recognition result corresponding to the character image to be recognized by performing optical character recognition operation on the character image to be recognized.

[0012] The text detection module is used to determine the target text recognition result of the text image to be recognized based on the first text recognition result and the second text recognition result.

[0013] In a third aspect, this disclosure also provides an electronic device, the electronic device comprising:

[0014] At least one processor; and

[0015] A memory communicatively connected to the at least one processor; wherein,

[0016] The memory stores a computer program that can be executed by the at least one processor, which enables the at least one processor to perform the text detection method described in any of the above embodiments.

[0017] In a fourth aspect, this disclosure also provides a computer-readable medium storing computer instructions that, when executed by a processor, implement the text detection method described in any of the above embodiments.

[0018] The technical solution of this disclosure embodiment utilizes a preset rare character feature vector library during character recognition, providing a dedicated resource for the recognition of rare characters. Vector retrieval can quickly locate similar rare character feature vectors in the preset rare character feature vector library, efficiently recalling possible first character recognition results. This is particularly advantageous when processing images such as ancient books and professional documents containing rare characters. In addition to vector retrieval in the preset rare character feature vector library, optical character recognition (OCR) is performed on the image of the text to be recognized to obtain a second character recognition result. OCR can comprehensively process the text in the image, compensating for the shortcomings of traditional OCR methods in recognizing rare characters. By combining the first and second character recognition results, the target character recognition result is determined. Through mutual verification and supplementation, the accuracy of recognition can be effectively improved, reducing misjudgments that may occur due to insufficient performance of OCR in the presence of rare characters. This enhances the processing capability and reliability of the entire character recognition system for various types of text images, especially those containing rare characters, thus providing strong technical support in areas such as ancient book digitization and archaeological document recognition.

[0019] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0020] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and the components and elements are not necessarily drawn to scale.

[0021] Figure 1 is a schematic flowchart of a text detection method provided in an embodiment of this disclosure;

[0022] Figure 2 is a schematic diagram of a text detection process for character recognition provided in an embodiment of this disclosure;

[0023] Figure 3 is a schematic diagram of the process of constructing a preset rare character feature vector library during text detection provided in an embodiment of this disclosure;

[0024] Figure 4 is a schematic diagram of another text detection method provided in an embodiment of this disclosure;

[0025] Figure 5 is a flowchart illustrating the determination of the reliability threshold for vector retrieval and OCR during text detection, as provided in an embodiment of this disclosure;

[0026] Figure 6 is a flowchart illustrating the determination of the baseline effect of vector retrieval and OCR during text detection, as provided in an embodiment of this disclosure;

[0027] Figure 7 is a schematic diagram of the structure of a text detection device provided in an embodiment of this disclosure;

[0028] Figure 8 is a schematic diagram of the structure of an electronic device that implements a text detection method according to an embodiment of this disclosure.

Detailed Implementation

[0029] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0030] It should be understood that the steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect.

[0031] The term "comprising" and its variations as used herein are open-ended inclusions, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.

[0032] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0033] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0034] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0035] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0036] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.

[0037] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.

[0038] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0039] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.

[0040] Figure 1 is a flowchart illustrating a text detection method provided in an embodiment of the present disclosure. This embodiment applies to text recognition of text images containing rare characters, in particular to recognizing text images of ancient books or archaeological materials that contain rare characters during the digitization of ancient books. The text detection method can be executed by a text detection device, which can be implemented in the form of software and / or hardware, and is generally integrated on any electronic device with network communication function, such as a mobile terminal, PC, or server.

[0041] As shown in Figure 1, the text detection method of this embodiment of the disclosure may include the following process:

[0042] S110. The first character recognition result corresponding to the character image to be recognized is obtained by performing vector retrieval in the preset rare character feature vector library. The preset rare character feature vector library records at least one rare character feature vector and the character corresponding to the rare character vector.

[0043] The preset rare Chinese character feature vector library is a pre-constructed database that stores the feature vectors of rare Chinese characters together with the characters they correspond to. A rare Chinese character feature vector encodes specific features of the character (such as variation in stroke thickness, stroke shape, and the spatial layout of the strokes) in vector form through a mathematical transformation. For example, the feature vector of the character "龘" may represent information such as the number of strokes, the stroke directions, and the relative positions of its components. These feature vectors make retrieval and matching operations possible within the library.

[0044] Vector retrieval recall can work as follows: the text image to be recognized is first transformed into a feature vector, and the preset rare Chinese character feature vector library is then searched for rare Chinese character feature vectors similar to it. Similarity is measured by a vector metric (such as Euclidean distance or cosine similarity) computed between each rare Chinese character feature vector and the feature vector of the image to be recognized; the smaller the vector distance, the higher the similarity. The character whose feature vector is most similar to that of the text image is recalled as the first text recognition result. For example, if the feature vector of the processed text image is closest to the feature vector of the character "龘" in the library, then "龘" can be recalled as the first text recognition result.
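The two similarity measures named above can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation: the three-dimensional vectors, the `LIBRARY` list, and the function names are all made up for the example, and real feature vectors would come from an image feature extractor.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy library: (character, feature vector) pairs.
LIBRARY = [
    ("龘", [0.9, 0.1, 0.3]),
    ("鱻", [0.2, 0.8, 0.5]),
    ("靐", [0.4, 0.4, 0.9]),
]

def nearest_character(query, library):
    """Recall the character whose vector has the smallest Euclidean distance."""
    character, _vector = min(
        library, key=lambda entry: euclidean_distance(query, entry[1])
    )
    return character
```

With Euclidean distance a smaller value means a closer match, while with cosine similarity a larger value does, so the comparison direction has to follow the chosen measure.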

[0045] In an optional example, the text image to be recognized can be first subjected to feature extraction to be transformed into a feature vector, then vector retrieval is performed in the preset rare Chinese character feature vector library, and by calculating the similarity between the feature vector of the text image to be recognized and each rare Chinese character feature vector in the preset rare Chinese character feature vector library, according to a certain similarity threshold or sorting rule, the rare Chinese characters that meet the similarity conditions are recalled as the first text recognition result. This step makes full use of the targeted resources of the rare Chinese character feature vector library and improves the professionalism and efficiency of the recognition of rare Chinese characters in the process of text recognition.
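A threshold-based recall of this kind might look like the sketch below. The threshold value `0.9`, the two-dimensional vectors, and the function names are assumptions chosen for illustration; the disclosure does not specify them.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_by_threshold(query, library, min_similarity=0.9):
    """Return the characters whose similarity to the query reaches the
    threshold, ordered from most to least similar."""
    scored = [(cosine_similarity(query, vector), character)
              for character, vector in library]
    hits = [pair for pair in scored if pair[0] >= min_similarity]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [character for _score, character in hits]

# Hypothetical library entries: (character, feature vector).
library = [
    ("龘", [1.0, 0.0]),
    ("鱻", [0.0, 1.0]),
    ("靐", [1.0, 0.5]),
]
recalled = recall_by_threshold([1.0, 0.1], library)
```

An empty list is a valid outcome here: it signals that nothing in the library is similar enough to the query.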

[0046] As an optional but non-limiting implementation method, the construction process of the preset rare Chinese character feature vector library includes the following steps A1 - A3:

[0047] Step A1: Obtain multiple reference font files that contain a text expansion area. The text expansion area of each reference font file includes multiple rare Chinese characters, and any two reference font files differ in at least one of the following attributes: stroke width, stroke direction, or stroke connection.

[0048] Step A2: Determine each font glyph corresponding to the rare Chinese character in the reference font file, and determine each font text image corresponding to the rare Chinese character according to each font glyph corresponding to the rare Chinese character.

[0049] Step A3: Extract image features from each font text image corresponding to rare Chinese characters to obtain the feature vectors of rare Chinese characters corresponding to the rare Chinese characters and the texts corresponding to the feature vectors of rare Chinese characters, so as to obtain a preset feature vector library of rare Chinese characters.

[0050] The text expansion area of a reference font file is an area that extends the conventional character encoding range in order to cover more special characters (such as rare Chinese characters and ancient Chinese characters). A reference font file is a font file containing such a text expansion area, which means that the font can display and process special characters, such as rare Chinese characters, that are difficult to present in conventional fonts. For example, some font files used specifically for typesetting ancient books contain a large number of rare Chinese characters in the text expansion area so that the content of ancient books can be presented completely and accurately. The reference font files must differ from one another in at least one of stroke width, stroke direction, or stroke connection; these attributes cannot all be the same. Stroke width refers to the thickness of the stroke line when text is written or drawn; stroke direction refers to the writing direction and trajectory of a stroke; stroke connection refers to how strokes are joined and how they transition into one another during writing.

[0051] Suitable reference font files are collected first. The key requirement is that each file contains a text expansion area with multiple rare Chinese characters, and that the files differ from one another in at least one of stroke width, stroke direction, or stroke connection. This yields rich and varied renderings of the rare Chinese characters with different visual characteristics, covering more of their feature variations and laying the foundation for a comprehensive and accurate feature vector library of rare Chinese characters. For example, qualifying font files can be searched for and downloaded from font design websites and font resource libraries. The collection can include traditional styles such as Song typeface and Kai typeface, which are fairly regular but render rare Chinese characters with different attributes, as well as font files with artistic styles whose design is distinctive in the above attributes.

[0052] A font glyph can refer to the specific shape and appearance of a character as presented in a specific font. The same rare Chinese character has different glyphs in different reference font files because of differences in font style and design concept. For example, the glyph of the character "龘" in the Song typeface differs from its glyph in the Kai typeface or in an artistic font in stroke form, structural layout, and so on. A font text image can be an image that presents the character as rendered in a font; it retains visual information such as the shape and structure of the character in that font and is represented in image form, for example as a pixel matrix. Each font text image corresponds to a specific character in a specific font.

[0053] After obtaining the reference font file, for the rare characters in the reference font file, it is necessary to determine the font glyphs corresponding to each rare character. This is because the same rare character has different appearance shapes in different fonts. It is necessary to find these different glyphs and then, based on these glyphs, convert them into corresponding font text images. In other words, the glyphs of the characters are presented in the form of images, which facilitates subsequent processing using computer vision technology.

[0054] In one optional example, see Figure 2. Determining the glyphs of the rare characters in a reference font file includes the following steps: read the character mapping table of the reference font file; remove the symbols, punctuation marks, radicals, and compatibility characters it contains; and obtain the encoding and graphic representation of all remaining characters. Then, for each character, initialize a white background image of a specified size and render a black glyph at a uniform starting coordinate on it. The rendered result serves as the glyph of that rare character in the reference font file.

[0055] In one optional example, see Figure 2. Determining the font text image of each rare character from its font glyphs includes the following steps: take the black glyph rendered at a uniform starting point on a white background image as the reference font glyph of the rare character; use the black-and-white color difference to find the smallest rectangle covered by the glyph and crop to it, keeping only the area with actual content as a pure glyph image; then, taking the longer side of the cropped rectangle as the standard, extend the shorter dimension (width or height) outward from the image center until the width and height are equal and the glyph is centered. The resulting uniformly sized, centered text image is the font text image of the rare character.
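The filtering of the character mapping table can be approximated with Python's standard `unicodedata` module. This is a hedged sketch: the disclosure does not say how symbols, punctuation, radicals, and compatibility characters are identified, so the general-category test and the block ranges in `EXCLUDED_BLOCKS` are assumptions, and the `cmap` dictionary stands in for a mapping table that would normally be read from a font file.

```python
import unicodedata

# Assumed Unicode blocks treated as radicals or compatibility characters.
EXCLUDED_BLOCKS = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x2F00, 0x2FDF),    # Kangxi Radicals
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

def keep_code_point(code_point):
    """Return True if the code point should stay in the mapping table."""
    if any(low <= code_point <= high for low, high in EXCLUDED_BLOCKS):
        return False
    # General categories starting with P are punctuation, with S symbols.
    category = unicodedata.category(chr(code_point))
    return category[0] not in ("P", "S")

def filter_character_map(cmap):
    """cmap maps code point -> glyph name, as in a font's mapping table."""
    return {cp: name for cp, name in cmap.items() if keep_code_point(cp)}

# Hypothetical mapping table; the glyph names are made up for the example.
cmap = {
    0x9F98: "uni9F98",  # 龘
    0x3002: "uni3002",  # ideographic full stop (punctuation)
    0x2F00: "uni2F00",  # Kangxi radical
    0xF900: "uniF900",  # compatibility ideograph
    0x002B: "plus",     # plus sign (symbol)
}
filtered = filter_character_map(cmap)
```

Rendering the surviving glyphs onto a white canvas would need an imaging library and is left out of this sketch.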
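The crop-and-square step can be sketched on a plain pixel matrix. The representation (a list of rows, `0` for black ink and `255` for white background) and the function names are assumptions made for the example.

```python
WHITE, BLACK = 255, 0

def crop_to_glyph(image):
    """Crop to the smallest rectangle that contains every black pixel."""
    rows = [r for r, row in enumerate(image) if BLACK in row]
    if not rows:
        raise ValueError("image contains no glyph pixels")
    cols = [c for c in range(len(image[0]))
            if any(row[c] == BLACK for row in image)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]

def pad_to_square(image):
    """Extend the shorter dimension with white pixels, split evenly on both
    sides, so that width equals height and the glyph stays centered."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    pad_top = (side - height) // 2
    pad_bottom = side - height - pad_top
    pad_left = (side - width) // 2
    pad_right = side - width - pad_left
    padded = [[WHITE] * pad_left + row + [WHITE] * pad_right for row in image]
    blank_row = [WHITE] * side
    return ([blank_row[:] for _ in range(pad_top)]
            + padded
            + [blank_row[:] for _ in range(pad_bottom)])

# A 5x6 white canvas with a three-pixel vertical stroke in column 2.
canvas = [[WHITE] * 6 for _ in range(5)]
for r in (1, 2, 3):
    canvas[r][2] = BLACK
square = pad_to_square(crop_to_glyph(canvas))
```

When the padding amount is odd, this sketch puts the extra pixel on the bottom or right side; the source does not say how that case is handled.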

[0056] This process involves extracting image features from the previously obtained images of rare characters in various fonts. Using relevant image feature extraction algorithms (such as deep learning algorithms based on convolutional neural networks or traditional algorithms based on edge and contour detection), key information representing the features of rare characters is extracted from the images. This feature information is then converted into rare character feature vectors, and the corresponding text content is recorded. Through processing rare characters in numerous different reference font files, a pre-defined rare character feature vector library is constructed. This library stores a large number of feature vectors for rare characters and their corresponding text, allowing for subsequent operations such as retrieval and recognition of rare characters.
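How the library might be assembled from per-font images can be sketched as follows. The extractor here is a deliberately simple stand-in (row and column ink projections), not the convolutional or contour-based extractor the text mentions; `projection_features`, `build_library`, the font names, and the 2x2 images are all hypothetical.

```python
def projection_features(image):
    """Stand-in feature extractor: fraction of black pixels (value 0) in each
    row followed by the fraction in each column. A real system would use a
    learned or contour-based extractor instead."""
    height, width = len(image), len(image[0])
    row_ink = [sum(1 for p in row if p == 0) / width for row in image]
    col_ink = [sum(1 for row in image if row[c] == 0) / height
               for c in range(width)]
    return row_ink + col_ink

def build_library(font_images):
    """font_images maps font name -> {character: pixel matrix}.
    Returns one record per (font, character) pair."""
    library = []
    for font_name, glyph_images in font_images.items():
        for character, image in glyph_images.items():
            library.append({
                "character": character,
                "font": font_name,
                "vector": projection_features(image),
            })
    return library

# Hypothetical input: the same character rendered by two fonts.
font_images = {
    "font_a": {"龘": [[0, 255], [0, 255]]},
    "font_b": {"龘": [[0, 0], [255, 255]]},
}
library = build_library(font_images)
```

Keeping one record per font, rather than averaging them, preserves the stroke-width and stroke-connection variation that the differing reference fonts are meant to supply.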

[0057] By collecting reference font files with varying attributes and extracting feature vectors for rare characters, the feature representation of rare characters becomes richer and more diverse. Different stroke widths, stroke directions, and stroke connections bring multi-dimensional feature information, enabling a more comprehensive characterization of rare characters. The constructed pre-defined rare character feature vector library can be used in rare character recognition scenarios. For example, in a text image recognition system, when encountering an image of a rare character, vector retrieval can be performed in this library to find the text corresponding to the matching rare character feature vector. This effectively compensates for the shortcomings of conventional text recognition methods in recognizing rare characters, improving the overall ability to recognize rare characters.

[0058] As an optional but not limited implementation, the first character recognition result corresponding to the image of the character to be recognized is obtained by vector retrieval in a preset rare character feature vector library, including the following steps B1-B3:

[0059] Step B1: Determine the feature vector of the image to be recognized, and perform vector retrieval to retrieve multiple first candidate characters in the preset rare character feature vector library based on the feature vector of the image to be recognized.

[0060] Step B2: Determine the first target text from multiple first candidate texts. The vector distance between the first target text and the feature vector of the image to be recognized is less than the vector distance between the remaining texts (excluding the first target text) and the feature vector of the image to be recognized.

[0061] Step B3: Based on multiple first candidate characters and the first target character, determine the first character recognition result corresponding to the image of the character to be recognized.
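Steps B1 to B3 can be sketched as a top-k retrieval followed by selection of the closest candidate. The value `k=3`, the record layout of the returned result, and the de-duplication of characters that appear under several fonts are assumptions; the disclosure leaves the exact form of the first recognition result open.

```python
import math

def retrieve_candidates(query, library, k=3):
    """B1: rank library characters by Euclidean distance to the query and
    keep the k closest. A character stored under several fonts is counted
    once, using its smallest distance."""
    best = {}
    for character, vector in library:
        distance = math.dist(query, vector)
        if character not in best or distance < best[character]:
            best[character] = distance
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:k]

def first_recognition_result(query, library, k=3):
    """B2 and B3: take the closest candidate as the first target character
    and report it together with the full candidate list."""
    candidates = retrieve_candidates(query, library, k)
    target, distance = candidates[0]
    return {
        "target": target,
        "distance": distance,
        "candidates": [character for character, _ in candidates],
    }

# Hypothetical library: (character, feature vector) pairs from several fonts.
library = [
    ("龘", [1.0, 0.0]),
    ("龘", [0.9, 0.1]),
    ("鱻", [0.0, 1.0]),
    ("靐", [0.5, 0.5]),
    ("麤", [-1.0, -1.0]),
]
result = first_recognition_result([0.95, 0.05], library)
```

Returning the distance alongside the target leaves room for a later stage to judge how far the retrieval result can be trusted when it is combined with the OCR result.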

[0062] Among them, the feature vector of the image to be recognized is a mathematical representation obtained after feature extraction of the text image to be recognized. By using image feature extraction algorithms (such as deep learning methods based on convolutional neural networks or traditional algorithms based on texture and shape feature extraction), key features such as the stroke structure, outline shape, and spatial layout of the text are extracted from the text image to be recognized, and these features are expressed in the form of vectors.

[0063] Among them, multiple rare Chinese characters that may match the characters in the text image to be recognized obtained after the vector retrieval and text recall operation are used as the first candidate characters. Since vector retrieval is filtered based on similarity, these first candidate characters recalled by vector retrieval all have a certain possibility of being the characters actually presented in the text image to be recognized. However, further screening is still needed to determine the final result. For example, rare Chinese characters such as "鱻", "靐", and "龘" are recalled by vector retrieval as the first candidate characters, which have a certain similarity with the feature vector of the image to be recognized in terms of the feature vector. The first target character can be a rare Chinese character selected from multiple first candidate characters and having the smallest vector distance from the feature vector of the image to be recognized.

[0064] In an optional example, refer to Figure 3 , for any given text image to be recognized, perform grayscale and binarization processing operations on the text image to be recognized to achieve the effect of color normalization. Furthermore, perform normalization processing on the size layout of the text image to be recognized so that the size of the text image to be recognized is consistent with the size of the font text image to which the rare Chinese characters in the preset rare Chinese character feature vector library belong. Then, apply image feature extraction technology to the text image to be recognized to extract the feature vector of the image to be recognized that can represent the characters in this text image, and use this feature vector of the image to be recognized as a vector retrieval condition to perform vector retrieval and text recall operations in the preset rare Chinese character feature vector library.
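The preprocessing described above (grayscale conversion, binarization, and size normalization) can be sketched on plain pixel matrices. The luma weights are the common ITU-R BT.601 coefficients; the fixed threshold of 128 and the nearest-neighbor resize are assumptions, since the source names neither a thresholding method nor an interpolation method.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Map each pixel to black (0) or white (255) around a fixed threshold."""
    return [[0 if pixel < threshold else 255 for pixel in row]
            for row in gray_image]

def resize_nearest(image, size):
    """Resize to size x size by nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [[image[r * height // size][c * width // size]
             for c in range(size)]
            for r in range(size)]

# A hypothetical 2x2 color image, normalized to a 4x4 binary image.
rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(250, 250, 250), (10, 10, 10)],
]
normalized = resize_nearest(binarize(to_grayscale(rgb)), 4)
```

The target size passed to `resize_nearest` would be whatever size the library's font text images use, so that query and library vectors are extracted from comparable inputs.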

[0065] In an optional example, referring to Figure 3, determining the first target character from the multiple first candidate characters includes: after the multiple first candidate characters are obtained, calculating the vector distance between the rare Chinese character feature vector corresponding to each first candidate character and the feature vector of the image to be recognized, comparing these distances, and taking the first candidate character with the smallest distance as the first target character. The first text recognition result corresponding to the text image to be recognized is then determined from the multiple first candidate characters and the selected first target character; in actual applications, factors such as the similarity weight and appearance frequency of different candidate characters can also be taken into account, although these are not elaborated here. This result is a preliminary recognition judgment on the character in the text image to be recognized, obtained through vector retrieval and screening.

[0066] As an optional but non-limiting implementation manner, determining the feature vector of the image to be recognized of the text image to be recognized includes the following steps: performing image feature extraction on the image to be recognized based on an inverted residual structure and a depthwise separable neural network model to obtain the feature vector of the image to be recognized corresponding to the image to be recognized.

[0067] As an optional but non-limiting implementation, performing a vector retrieval and recall operation in a preset rare character feature vector library based on the feature vector of the image to be recognized to obtain multiple first candidate characters includes the following steps C1-C2:

[0068] Step C1: Calculate the vector distance between the feature vector of the image to be recognized and the feature vectors of each rare character in the preset rare character feature vector library, and sort the feature vectors of each rare character according to the vector distance.

[0069] Step C2: According to the ordering of the rare character feature vectors by vector distance, retrieve multiple target rare character feature vectors from the rare character feature vectors, and determine the multiple first candidate characters based on the characters corresponding to the multiple target rare character feature vectors.

[0070] Referring to Figure 3, after the rare character feature vectors in the preset rare character feature vector library are sorted by vector distance, the target rare character feature vectors are selected according to a specific recall rule (such as selecting the several closest). These are the feature vectors, selected from the entire library, that are most similar to the feature vector of the image to be recognized, and the rare characters corresponding to them form the basis for determining the first candidate characters. For example, after sorting by vector distance from smallest to largest, the top 100 rare character feature vectors are selected as the target rare character feature vectors, and the rare characters corresponding to these 100 vectors become the multiple first candidate characters for further analysis.

[0071] For example, referring to Figure 3, the feature vector of the image to be recognized is used to perform a vector retrieval and recall operation in the preset rare character feature vector library. The multiple target rare character feature vectors returned by the retrieval are sorted by vector distance, and each result carries a text character label. The top 100 target rare character feature vectors after sorting are then deduplicated by text character encoding, and the top 10 of the deduplicated results are selected as the multiple first candidate characters, which constitute the vector retrieval candidate set. Each first candidate character includes the text character and its corresponding vector distance, and the first-ranked first candidate character is the first target character.
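The recall procedure of steps C1-C2 and the example above (sort by distance, keep the top 100, deduplicate by character, keep the top 10) can be sketched as follows. This is a brute-force illustration under stated assumptions: the library is assumed to be a list of (character, feature vector) pairs that may hold several vectors per character, Euclidean distance is assumed as the vector distance, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_candidates(query, library, recall_k=100, keep_k=10):
    """Return up to keep_k (character, distance) pairs, nearest first.

    library: list of (character, feature_vector) pairs (assumed layout).
    """
    # Step C1: distance from the query to every library vector, sorted ascending.
    scored = sorted(((euclidean(query, vec), ch) for ch, vec in library),
                    key=lambda item: item[0])
    # Step C2: keep the nearest recall_k vectors, then deduplicate by character.
    seen = set()
    candidates = []
    for dist, ch in scored[:recall_k]:
        if ch in seen:
            continue
        seen.add(ch)
        candidates.append((ch, dist))
        if len(candidates) == keep_k:
            break
    return candidates


library = [("鱻", [0.0, 1.0]), ("靐", [1.0, 0.0]),
           ("龘", [0.9, 0.1]), ("龘", [0.8, 0.2])]
candidates = recall_candidates([1.0, 0.0], library)
first_target, first_distance = candidates[0]  # first-ranked candidate
```

A production system would more likely use an approximate nearest-neighbour index than a full scan; the scan is used here only to keep the sketch self-contained.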

[0072] By calculating the vector distance and sorting, it is possible to accurately determine the rare Chinese character feature vector in the preset rare Chinese character feature vector library that is most similar to the feature vector of the image to be recognized, thereby effectively recalling the target rare Chinese character feature vector and its corresponding character that are most likely to match the image of the text to be recognized. This greatly improves the pertinence and accuracy of rare Chinese character recognition, avoids blind searching in the entire library, reduces unnecessary calculations and incorrect judgments. Based on the vector distance sorting for vector retrieval and recall, it is possible to quickly screen out a few of the most promising first candidate characters from a large number of rare Chinese character feature vectors, rather than comparing each rare Chinese character in the library in detail one by one. This targeted screening method significantly reduces the computational amount and processing time in the recognition process, and improves the operating efficiency of the entire rare Chinese character recognition system.

[0073] S120. Obtain a second text recognition result corresponding to the image of the text to be recognized by performing an optical character recognition operation on the image of the text to be recognized.

[0074] An optical character recognition (OCR) operation is performed on the image of the text to be recognized. The OCR operation can include a series of image processing operations (such as grayscale conversion, noise reduction, binarization, and character segmentation) followed by feature extraction. The extracted features are matched against a character model to recognize the characters in the image, yielding the second text recognition result. For example, the OCR operation can be performed on a text image containing the character "dá"; after OCR processing, the character is recognized and output as the second text recognition result.

[0075] In an optional example, perform optical character recognition on the image of the text to be recognized. From image preprocessing to character recognition, it is not limited to rare Chinese characters, and all characters in the image of the text to be recognized are recognized to obtain a second text recognition result. This step can cover common characters and some rare Chinese characters in the image, providing a wider range of text recognition and avoiding omissions. Among them, the optical character recognition operation and the vector retrieval and recall operation can be executed in parallel or sequentially.

[0076] As an optional but non-limiting implementation, obtaining a second text recognition result corresponding to the image of the text to be recognized by performing an optical character recognition operation on the image of the text to be recognized includes the following steps D1 - D2:

[0077] Step D1. Perform an optical character recognition operation on the image of the text to be recognized to obtain multiple second candidate characters. The multiple second candidate characters are determined by screening according to the recognition confidence of each text recognition result when performing the optical character recognition operation on the image of the text to be recognized.

[0078] Step D2: Determine the second target text corresponding to the text image to be recognized from multiple second candidate texts, where the recognition confidence of the second target text is greater than the recognition confidence of the remaining texts other than the second target text among the multiple second candidate texts.

[0079] Among them, the second candidate texts are multiple possible text results determined after performing an optical character recognition operation on the text image to be recognized and passing certain screening conditions. Here, the screening conditions are based on the recognition confidence of each text recognition result. The recognition confidence reflects an evaluation index for the accuracy of the text recognition result in OCR recognition. Generally, the higher the value, the more reliable the recognition result. For example, after performing OCR recognition on a text image containing rare characters, multiple texts and their corresponding recognition confidences are obtained. Select the texts whose confidences meet certain requirements as the second candidate texts. For example, the character "龘" is recognized with a confidence of 0.7, and the character "鱻" is recognized with a confidence of 0.6. If the set screening condition is that the confidence is greater than 0.5, then these two characters will become the second candidate texts. The second target text can be the result further screened out from the multiple second candidate texts and is considered to be the most likely actual text in the text image to be recognized. The basis for screening is the recognition confidence corresponding to each text. The recognition confidence of the second target text should be higher than the recognition confidences of all other second candidate texts.
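Steps D1-D2 can be illustrated with a short sketch. The OCR engine itself is not modeled; its output is assumed to be a list of (character, confidence) pairs, and the function names and the 0.5 screening threshold (taken from the example above) are illustrative assumptions.

```python
def select_ocr_candidates(ocr_results, min_confidence=0.5):
    """Step D1: keep results whose confidence exceeds the screening threshold,
    ordered from most to least confident."""
    kept = [(ch, conf) for ch, conf in ocr_results if conf > min_confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


def pick_second_target(candidates):
    """Step D2: the candidate with the highest confidence, or None if empty."""
    return candidates[0] if candidates else None


ocr_results = [("鱻", 0.6), ("龘", 0.7), ("靐", 0.3)]
second_candidates = select_ocr_candidates(ocr_results)
second_target = pick_second_target(second_candidates)
```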

[0080] For the above method, by performing an optical character recognition operation on the text image to be recognized, a relatively comprehensive recognition process can be carried out on the text in the image, not limited to rare characters or specific types of texts. Whether it is common characters, rare characters, handwritten or printed texts, there is a chance to be recognized, expanding the scope of text recognition and providing more possibilities for finally determining an accurate recognition result.

[0081] S130: Determine the target text recognition result of the text image to be recognized according to the first text recognition result and the second text recognition result.

[0082] After the first text recognition result and the second text recognition result are obtained, they can be analyzed together, and several methods can be used to determine the target text recognition result. For example, the first text recognition result and the second text recognition result are compared for consistency. If they are consistent, either one is directly determined as the target text recognition result of the text image to be recognized. If they are inconsistent, a method such as weighted summation can be applied according to preset weight rules (for example, the vector retrieval result is given a higher weight because it is specialized for rare characters, or weights are assigned according to the respective confidences) to determine the target text recognition result of the text image to be recognized. This combined judgment improves the accuracy and reliability of the recognition.
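One possible reading of the consistency check and weighted summation described above is sketched below. The source does not specify how scores are normalized or what the weights are, so everything here is an assumption: both inputs are taken to be non-empty mappings from character to a score in [0, 1] where higher is better (a retrieval distance would first have to be converted to such a score), and the weights 0.6 and 0.4 are placeholders.

```python
def fuse(retrieval_scores, ocr_scores, w_retrieval=0.6, w_ocr=0.4):
    """Pick a character from two non-empty {character: score} mappings."""
    best_retrieval = max(retrieval_scores, key=retrieval_scores.get)
    best_ocr = max(ocr_scores, key=ocr_scores.get)
    # Consistent results: either one can be returned directly.
    if best_retrieval == best_ocr:
        return best_retrieval
    # Inconsistent results: weighted sum of both scores per character.
    combined = {
        ch: w_retrieval * retrieval_scores.get(ch, 0.0)
            + w_ocr * ocr_scores.get(ch, 0.0)
        for ch in set(retrieval_scores) | set(ocr_scores)
    }
    return max(combined, key=combined.get)


agreed = fuse({"靐": 0.8}, {"靐": 0.6})
disputed = fuse({"龘": 0.9, "鱻": 0.5}, {"鱻": 0.7, "龘": 0.6})
```

In the second call the combined scores are 0.78 for "龘" and 0.58 for "鱻", so the retrieval result prevails.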

[0083] The above methods include a pre-set feature vector library for rare characters and vector retrieval specifically for rare characters, which can effectively recall the recognition results of rare characters. OCR operation can also supplement the recognition of common characters and some rare characters. The combination of the two greatly improves the success rate of rare characters in the entire text image recognition, especially when processing text images rich in rare characters such as ancient books, professional documents, and cultural heritage materials. The text recognition results obtained by the two different methods can be mutually verified and supplemented.

[0084] As an optional but non-limiting implementation, determining the target character recognition result of the image to be recognized based on the first character recognition result and the second character recognition result includes the following steps E1-E2:

[0085] Step E1: Determine the first target character in the first character recognition result and the second target character in the second character recognition result.

[0086] Step E2: By performing reliability testing on the first target text and the second target text, the text recognition result of the image to be recognized is determined.

[0087] Referring to Figure 3, reliability detection of the first and second target texts is a process that evaluates their accuracy and credibility, for example by comparing the degree of feature matching between each target text and the text image to be recognized and by determining whether each target text conforms to semantic logic. Its purpose is to decide which of the two target texts should be used, or whether the two need to be combined, to determine the final text recognition result of the text image to be recognized, thereby improving the accuracy and reliability of the overall text recognition.

[0088] In one optional example, referring to Figure 3, the degree of visual feature matching between each of the first and second target characters and the text image to be recognized, in aspects such as stroke shape, structural layout, and overall outline, is assessed to determine whether it meets a preset matching degree. If the text image to be recognized contains contextual information, it is also analyzed whether the first and second target characters conform to semantic logic and language usage when placed in that context. The reliability of the first and second target characters is judged from these checks together. If one target character has a significant advantage in all aspects, it can be determined as the final text recognition result of the text image to be recognized; if each has advantages and disadvantages, or some uncertainty remains, a further fusion strategy can be used to determine the final text recognition result.

[0089] By identifying the first and second target texts and performing reliability checks, the advantages of both vector retrieval recognition and OCR recognition can be fully utilized. Vector retrieval recognition may have high accuracy when handling rare characters or characters with specific font styles, while OCR recognition has good coverage and recognition capabilities for common characters and general image text. Combining the two can complement each other, improving the comprehensiveness and accuracy of recognizing various types of text images. By performing reliability checks on the two target texts and evaluating and verifying them from multiple dimensions, erroneous recognition results caused by image quality issues, recognition algorithm errors, etc., can be effectively eliminated, improving the accuracy and reliability of the final text recognition result.

[0090] The technical solution of this disclosure embodiment uses a preset rare character feature vector library during character recognition, providing a dedicated resource for recognizing rare characters. Vector retrieval can quickly locate similar rare character feature vectors in the preset rare character feature vector library and efficiently recall possible first character recognition results, which is particularly advantageous when processing images that contain rare characters, such as ancient books and professional documents. In addition to vector retrieval in the preset rare character feature vector library, optical character recognition (OCR) is performed on the image of the text to be recognized to obtain a second character recognition result. OCR can process the text in the image comprehensively, while vector retrieval compensates for the shortcomings of traditional OCR methods in recognizing rare characters. The target character recognition result is determined by combining the first and second character recognition results. Through mutual verification and supplementation, recognition accuracy can be improved and the misjudgments that OCR may make on rare characters can be reduced. This improves the processing capability and reliability of the character recognition system for various types of text images, especially those containing rare characters, and provides technical support in areas such as ancient book digitization and archaeological document recognition.

[0091] Figure 4 is a flowchart illustrating another text detection method provided in this embodiment. The technical solution of this embodiment further optimizes the process, described in the aforementioned embodiments, of determining the text recognition result of the text image to be recognized by performing reliability detection on the first target text and the second target text. This embodiment can be combined with various optional solutions in one or more of the above embodiments.

[0092] As shown in Figure 4, the text detection method of this disclosure embodiment may include the following process:

[0093] S410. Retrieve the first text recognition result corresponding to the text image to be recognized by performing vector retrieval and recall in a preset rare Chinese character feature vector library. The preset rare Chinese character feature vector library records at least one rare Chinese character feature vector and the text corresponding to the rare Chinese character vector.

[0094] S420. Obtain the second text recognition result corresponding to the text image to be recognized by performing an optical character recognition operation on the text image to be recognized.

[0095] S430. Determine the first target text in the first text recognition result and the second target text in the second text recognition result.

[0096] S440. Determine the vector distance of the first target text. The vector distance of the first target text is based on the vector distance between the first target text and the feature vector of the image to be recognized corresponding to the text image to be recognized.

[0097] Referring to Figure 3, the vector distance between the rare Chinese character feature vector corresponding to the first target text and the feature vector of the image to be recognized corresponding to the text image to be recognized can be calculated and used as the vector distance of the first target text, in order to judge how similar the first target text and the text image to be recognized are in terms of features. The smaller the vector distance, the more similar they are; the larger the vector distance, the lower the similarity. For example, if the Euclidean distance between the feature vector corresponding to the first target text "dá" and the feature vector of the image to be recognized is 0.1, the two are relatively close in features; if it is 0.5, the similarity is relatively weak.

[0098] The preset vector distance is a preset numerical threshold and is the key condition for judging the reliability of the text recalled by vector retrieval in the preset rare Chinese character feature vector library. When the vector distance between the first target text and the feature vector of the image to be recognized is less than the preset vector distance, the first target text is considered sufficiently reliable from the perspective of vector retrieval; when the vector distance is greater than or equal to the preset vector distance, the match does not reach the reliability standard, and other factors are combined to determine the final text recognition result. For example, with the preset vector distance set to 0.3, a first target text with a vector distance of 0.4 exceeds the threshold and its reliability is in doubt, whereas one with a vector distance of 0.2 is within the threshold and is relatively reliable.

[0099] S450. If the vector distance of the first target text is less than the preset vector distance, then the first target text is determined as the text recognition result of the text image to be recognized. The preset vector distance is a threshold condition used to make a reliability judgment on the text retrieved by vector retrieval in the preset rare character feature vector library.

[0100] Referring to Figure 3, after the vector distance of the first target text is obtained, it is compared with the preset vector distance. If the vector distance of the first target text is less than the preset vector distance, the first target text and the image to be recognized match closely in features from the perspective of vector retrieval, and the preset reliability standard is met. The first target text can therefore be taken directly as the text recognition result of the image to be recognized, that is, the first target text is considered likely to be the actual text presented in the image, which completes the text recognition judgment.

[0101] S460. If the vector distance of the first target text is not less than the preset vector distance, then the recognition confidence of the second target text and the preset recognition confidence are determined, and the text recognition result of the text image to be recognized is determined based on them. The recognition confidence of the second target text is the confidence assigned to the second target text when the optical character recognition operation is performed on the text image to be recognized. The preset recognition confidence is a threshold condition used to judge the reliability of the text obtained by performing the optical character recognition operation on the text image to be recognized.

[0102] The recognition confidence score of the second target text is used to characterize the probability that the text recognition result of the second target text is correct when performing Optical Character Recognition (OCR) on the image to be recognized. The preset recognition confidence score can be a pre-defined value, a threshold condition used to judge the reliability of the text obtained from the OCR operation on the image to be recognized. When the recognition confidence score of the second target text is greater than this preset confidence score, it indicates that it is relatively reliable at the OCR recognition level; conversely, when the recognition confidence score of the second target text is not greater than this preset confidence score, further consideration or combination of other factors is needed to determine the final text recognition result. For example, the preset recognition confidence score can be set to 0.6 to measure the reliability of the OCR recognition result.

[0103] Referring to Figure 3, when the vector distance of the first target text is not less than the preset vector distance, the first target text recalled by vector retrieval is not reliable enough, and the optical character recognition (OCR) result must be consulted. First, the recognition confidence of the second target text is determined; this value reflects the probability that the second target text is correct when the OCR operation is performed on the image of the text to be recognized. The preset recognition confidence is also obtained as a preset threshold condition. The recognition confidence of the second target text is then compared with the preset recognition confidence. If the recognition confidence of the second target text is greater than the preset recognition confidence, the second target text can be taken as the text recognition result of the image of the text to be recognized, since the OCR result is considered more reliable in this case. If the recognition confidence of the second target text is not greater than the preset recognition confidence, the text content in the image of the text to be recognized can be further determined by executing the following steps F2-F4, which are described in detail later. Alternatively, a preset auxiliary judgment method (such as combining more contextual information) can be used to further determine the text content in the image of the text to be recognized.
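The two-threshold decision of S450 and S460 can be summarized in a few lines. This is a sketch only: the default thresholds 0.3 and 0.6 are the example values given above rather than calibrated ones, the function name is hypothetical, and the `fallback` callable merely stands in for steps F2-F4 or an auxiliary judgment method, neither of which is modeled here.

```python
def decide(first_target, first_distance, second_target, second_confidence,
           preset_distance=0.3, preset_confidence=0.6, fallback=None):
    """Return the recognized text, or the fallback's result if neither
    the retrieval result nor the OCR result passes its threshold."""
    # S450: retrieval result is reliable when its distance is below the threshold.
    if first_distance < preset_distance:
        return first_target
    # S460: otherwise trust OCR when its confidence exceeds the threshold.
    if second_confidence > preset_confidence:
        return second_target
    # Neither is reliable: defer to a further procedure (placeholder).
    if fallback is not None:
        return fallback(first_target, second_target)
    return None


by_retrieval = decide("龘", 0.2, "鱻", 0.9)   # distance 0.2 < 0.3
by_ocr = decide("龘", 0.4, "鱻", 0.7)         # distance too large, OCR 0.7 > 0.6
unresolved = decide("龘", 0.4, "鱻", 0.5)     # neither threshold is met
```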

[0104] By setting two threshold conditions—preset vector distance and preset recognition confidence—the reliability of two different text recognition methods, vector retrieval and optical character recognition, is assessed separately. This avoids the limitations of relying solely on a single recognition method or judgment standard to determine the text recognition result. It comprehensively considers the accuracy of text recognition from multiple perspectives, improving the reliability of the entire text recognition process. Based on the comparison between the vector distance of the first target text and the preset vector distance, the system flexibly decides whether to directly use the result of vector retrieval or further refer to OCR recognition to determine the final text recognition result. This mechanism allows the entire text recognition system to better handle various complex text image situations. By comprehensively considering the two key factors of vector distance in vector retrieval and recognition confidence in OCR recognition, it can more accurately select the result most likely to be the actual text in the text image to be recognized, especially in handling complex text images containing rare characters, diverse writing styles, or poor image quality that may lead to misrecognition.

[0105] Based on the above embodiments, optionally, the text detection method of this disclosure may further include the following steps K1-K2:

[0106] Step K1: Within the range of vector distance values in vector retrieval, the vector distance is progressively decreased by a preset step size. At each progressive decrease, reference verification data with vector distances greater than the corresponding vector distances after each progressive decrease are selected from the reference verification data set as the first target data set. The first accuracy and the second accuracy of the first target data set are determined. The first accuracy is the statistical result of the accuracy of the text retrieved by vector retrieval for each preset text image in the first target data set. The second accuracy is the statistical result of the accuracy of the text obtained by optical character recognition for each preset text image in the first target data set.

[0107] Step K2: Based on the first accuracy and second accuracy of the first target data set at each progressive decrease, determine the critical value at which the first state transitions to the second state as the preset vector distance. The first state is the vector distance state in which the first accuracy of the first target data set is greater than the second accuracy, and the second state is the vector distance state in which the first accuracy of the first target data set is less than the second accuracy.

[0108] Each reference verification data item in the reference verification data set includes the actual text corresponding to the preset text image, the first preset recognition error of the preset text image, the reference vector distance of the preset text image, the second preset recognition error of the preset text image, and the reference recognition confidence of the preset text image. The first preset recognition error indicates whether the text retrieved by vector retrieval of the preset text image is correct. The reference vector distance indicates the vector distance between the text retrieved by vector retrieval of the preset text image and the actual text corresponding to the preset text image. The second preset recognition error indicates whether the text obtained by optical character recognition of the preset text image is correct. The reference recognition confidence indicates the confidence with which that text is obtained by optical character recognition of the preset text image.

[0109] For example, referring to Figure 5, within the range of vector distance values used in vector retrieval, the vector distance is decreased progressively with a preset step size of 1e-2. At each step, the reference verification data whose vector-retrieval vector distance is greater than the vector distance at that step are selected from the reference verification data set to form the first target data set. The accuracy of vector retrieval and the accuracy of OCR are then calculated from the correctness of the vector-retrieval character and the OCR character of each reference verification data item in the first target data set. Comparing the two accuracies determines whether vector retrieval or OCR is the better recognition method at each step. Across the steps, as the vector distance decreases, the better recognition method shifts from vector retrieval to OCR. The vector distance at which this shift occurs is taken as the reliability threshold for vector retrieval, that is, the preset vector distance. Thus, when the vector distance of vector retrieval exceeds the preset vector distance, vector retrieval recognizes better than OCR, and in practical applications the threshold allows the more suitable recognition method to be selected.
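The threshold sweep of steps K1-K2 can be sketched as below, following the selection rule exactly as worded above (records whose value is greater than the current threshold). Several things are assumptions: each reference record is reduced to a tuple (vector distance, retrieval correct, OCR correct), the sweep range is taken to be [0, 1], the function name is hypothetical, and the toy records are invented solely to exercise the loop and carry no empirical meaning.

```python
def calibrate_preset_distance(records, lo=0.0, hi=1.0, step=1e-2):
    """Sweep the threshold downward from hi to lo and return the first value
    at which the first state (retrieval accuracy > OCR accuracy) has given
    way to the second state (retrieval accuracy < OCR accuracy)."""
    n_steps = int(round((hi - lo) / step))
    seen_first_state = False
    for i in range(n_steps + 1):
        t = round(hi - i * step, 10)  # avoid accumulated float drift
        # First target data set: records whose value is greater than t.
        subset = [r for r in records if r[0] > t]
        if not subset:
            continue
        first_acc = sum(r[1] for r in subset) / len(subset)   # vector retrieval
        second_acc = sum(r[2] for r in subset) / len(subset)  # OCR
        if first_acc > second_acc:
            seen_first_state = True
        elif first_acc < second_acc and seen_first_state:
            return t
    return None  # no transition observed in the sweep range


# Invented records: (vector distance, retrieval correct, OCR correct).
toy_records = [(0.9, True, False), (0.8, True, False),
               (0.5, False, True), (0.5, False, True), (0.5, False, True)]
preset_distance = calibrate_preset_distance(toy_records, step=0.1)
```

With these records the retrieval accuracy leads until the threshold reaches 0.4, where the three additional records tip the comparison, so 0.4 is returned.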

[0110] Based on the above embodiments, optionally, the text detection method of this disclosure may further include the following steps K3-K4:

[0111] Step K3: Within the range of recognition confidence values for optical character recognition, the recognition confidence is progressively decreased by a preset step size. At each progressive decrease, reference verification data with recognition confidence values lower than the recognition confidence values corresponding to each progressive decrease are selected from the reference verification data set as the second target data set. The third accuracy and the fourth accuracy of the second target data set are determined. The third accuracy is the statistical result of the accuracy of obtaining text by optical character recognition of each preset text image in the second target data set. The fourth accuracy is the statistical result of the accuracy of retrieving text by vector retrieval of each preset text image in the second target data set.

[0112] Step K4: Based on the third accuracy and the fourth accuracy of the second target data set at each progressive decrease, determine the critical value at which the third state transitions to the fourth state as the preset recognition confidence. The third state is a recognition confidence at which the third accuracy of the second target data set is greater than the fourth accuracy, and the fourth state is a recognition confidence at which the third accuracy of the second target data set is less than the fourth accuracy.

[0113] For example, referring to Figure 5, within the range of OCR confidence values, the recognition confidence is progressively decreased with a preset step size of 1e-6. At each progressive decrease, the reference verification data items in the reference verification dataset whose OCR recognition confidence is lower than the recognition confidence corresponding to that step are selected to construct the corresponding second target dataset. Based on the correctness of the OCR-recognized character and of the vector-retrieval recognized character for each reference verification data item in the second target dataset, the accuracy of OCR and the accuracy of vector retrieval are calculated separately. By comparing the two accuracies, it is determined whether the better recognition method at each progressively decreased recognition confidence is OCR or vector retrieval. Collecting the better recognition method at every step shows that, as the recognition confidence decreases, the better recognition method changes from OCR to vector retrieval. The recognition confidence at which this change occurs is taken as the reliability threshold of OCR, that is, the preset recognition confidence. When the confidence of the OCR-recognized character is less than the preset recognition confidence, the recognition effect of OCR can be considered worse than that of vector retrieval. This provides a basis for selecting the recognition method in different situations and helps improve the accuracy and reliability of the overall text recognition.

[0114] As an optional, non-limiting implementation, determining the text recognition result of the text image to be recognized based on the recognition confidence of the second target text and the preset recognition confidence includes the following steps F1-F4:

[0115] Step F1: If the recognition confidence of the second target text is greater than the preset recognition confidence, then the second target text is determined as the text recognition result of the text image to be recognized.

[0116] Step F2: If the recognition confidence of the second target text is not greater than the preset recognition confidence, then determine the first reference attribute information of the first target text. The first reference attribute information is used to indicate whether there is a first reference text among the multiple second candidate texts and the sorting position of the first reference text among the multiple second candidate texts when the first reference text exists. The first reference text is the same as the first target text. The multiple second candidate texts are sorted according to their respective recognition confidence.

[0117] Step F3: Determine the second reference attribute information of the second target text. The second reference attribute information is used to indicate whether there is a second reference text among the multiple first candidate texts and the sorting position of the second reference text among the multiple first candidate texts when the second reference text exists. The second reference text is the same as the second target text. The multiple first candidate texts are sorted according to their respective vector distances.

[0118] Step F4: Determine the text recognition result of the text image to be recognized based on the first reference attribute information and the second reference attribute information.

[0119] The first reference attribute information is auxiliary judgment information related to the first target text. It indicates whether a text identical to the first target text (that is, the first reference text) exists among the multiple second candidate texts (the candidate text set obtained by OCR) and, if so, the sorting position of the first reference text among the multiple second candidate texts. The second candidate texts are usually sorted in descending order of their recognition confidence. This attribute information shows how the first target text appears in the OCR recognition result. For example, the first target text "龘" exists among the multiple second candidate texts and ranks 3rd. This information forms part of the first reference attribute information and helps in comprehensively judging the text recognition result.

[0120] The second reference attribute information is auxiliary judgment information related to the second target text. It indicates whether a text identical to the second target text (that is, the second reference text) exists among the multiple first candidate texts (the candidate text set obtained by vector retrieval) and, if so, its sorting position among the multiple first candidate texts. Here the first candidate texts are sorted by their vector distance from small to large, which reflects how the second target text appears in the vector retrieval process. For example, the second target text "鱻" exists among the multiple first candidate texts and ranks 5th. This is an instance of the second reference attribute information and provides a further basis for finally determining the text recognition result.

[0121] Referring to Figure 3, the recognition confidence of the second target text is obtained and compared with the preset recognition confidence. If the recognition confidence of the second target text is greater than the preset recognition confidence, the text is considered reliable and accurate at the OCR level. In this case, the second target text is directly determined as the text recognition result of the text image to be recognized, because the OCR result for this text is highly credible and can serve as the final recognition judgment, which completes the text recognition process.

[0122] Referring to Figure 3, when the recognition confidence of the second target text is not greater than the preset recognition confidence, it is necessary to further consider how the first target text appears in the OCR recognition result, that is, to determine the first reference attribute information. First, it is checked whether a text identical to the first target text, that is, the first reference text, exists among the multiple second candidate texts (the candidate texts generated during OCR). If it exists, the position of the first reference text among the multiple second candidate texts sorted by recognition confidence is determined. For example, if the multiple second candidate texts are arranged from high to low confidence as "dá" (confidence 0.7), "xiān" (confidence 0.6), "bìng" (confidence 0.5), and so on, and the first target text is "xiān", then the sorting position of the first target text is 2nd. This information is recorded as the first reference attribute information and provides a basis for the subsequent comprehensive judgment.

[0123] Similarly, when the OCR result of the second target text is not reliable (that is, the recognition confidence of the second target text is not greater than the preset recognition confidence), it is also necessary to determine the second reference attribute information. Specifically, it is checked whether a text identical to the second target text, that is, the second reference text, exists among the multiple first candidate texts (the candidate texts obtained by vector retrieval). If it exists, its position among the multiple first candidate texts sorted by vector distance is determined. For example, if the multiple first candidate texts are arranged from small to large vector distance as "bìng" (distance 0.1), "dá" (distance 0.2), "xiān" (distance 0.3), and so on, and the second target text is "dá", then its sorting position is 2nd. This information is the second reference attribute information, which helps evaluate the text recognition result from another perspective.
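The two lookups in the examples above can be sketched in Python as follows. The helper name `rank_in_candidates` and the representation of a candidate as a `(text, score)` pair are assumptions for illustration only; the candidate values are those given in the examples.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (text, score); hypothetical representation


def rank_in_candidates(target: str, candidates: List[Candidate]) -> Optional[int]:
    """Return the 1-based sorting position of `target` in an already sorted
    candidate list, or None if no identical text (reference text) exists."""
    for position, (text, _score) in enumerate(candidates, start=1):
        if text == target:
            return position
    return None


# Second candidate texts (OCR), sorted by recognition confidence, high to low.
ocr_candidates = sorted(
    [("dá", 0.7), ("xiān", 0.6), ("bìng", 0.5)],
    key=lambda c: c[1], reverse=True,
)

# First candidate texts (vector retrieval), sorted by vector distance, small to large.
vec_candidates = sorted(
    [("bìng", 0.1), ("dá", 0.2), ("xiān", 0.3)],
    key=lambda c: c[1],
)

# First reference attribute information: first target text among OCR candidates.
first_ref_rank = rank_in_candidates("xiān", ocr_candidates)   # 2

# Second reference attribute information: second target text among vector candidates.
second_ref_rank = rank_in_candidates("dá", vec_candidates)    # 2
```

A return value of `None` represents the case in which the reference text does not exist in the other method's candidate set.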

[0124] The text recognition result of the text image to be recognized is determined by jointly considering the first reference attribute information and the second reference attribute information obtained above. If the first reference attribute information shows that the first reference text exists among the multiple second candidate texts and the second reference attribute information shows that the second reference text exists among the multiple first candidate texts, step H3 described later can be executed: the sorting position of the first reference text among the multiple second candidate texts is compared with the sorting position of the second reference text among the multiple first candidate texts, and the first target text or the second target text, whichever ranks higher, is determined as the text recognition result of the text image to be recognized. A preset auxiliary judgment method (such as combining context information) can also be used to further determine the text content in the image to be recognized. If one piece of reference attribute information shows that the corresponding target text ranks very low or does not exist in the candidate texts of the other recognition method, the target text with the relatively better ranking is used as the final recognition result. This joint consideration allows the text content in the image to be recognized to be determined more accurately and reliably. If the first reference attribute information shows that the first reference text does not exist among the multiple second candidate texts and the second reference attribute information shows that the second reference text does not exist among the multiple first candidate texts, step H4 described later can be adopted: the vector retrieval effect baseline score in which the vector distance of the first target text falls and the optical character recognition effect baseline score in which the recognition confidence of the second target text falls are used to select the first target text or the second target text as the text recognition result of the text image to be recognized. This is described in detail later.

[0125] By considering both the first and the second reference attribute information, the results of vector retrieval and optical character recognition (OCR) are combined and correlated. The relationships between the candidate characters generated by the two recognition processes can then be analyzed, which provides a more comprehensive basis for determining the final character recognition result and improves its accuracy and reliability. In judging the character recognition result, in addition to key indicators such as recognition confidence and vector distance, the position of each target character within the candidate set of the other recognition method is also taken into account. This multi-dimensional judgment reduces misrecognition caused by the limitations or errors of a single recognition method, and allows the character recognition system to cope more flexibly with varied character image recognition scenarios.

[0126] As an optional, non-limiting implementation, determining the text recognition result of the text image to be recognized based on the first reference attribute information and the second reference attribute information includes the following steps H1-H5:

[0127] Step H1: If the first reference attribute information indicates that a first reference text exists among the multiple second candidate texts and the second reference attribute information indicates that no second reference text exists among the multiple first candidate texts, then the first target text is determined as the text recognition result of the text image to be recognized.

[0128] Step H2: If the first reference attribute information indicates that there is no first reference text among the multiple second candidate texts and the second reference attribute information indicates that there is a second reference text among the multiple first candidate texts, then the second target text is determined as the text recognition result of the text image to be recognized.

[0129] Step H3: If the first reference attribute information indicates that a first reference text exists among multiple second candidate texts and the second reference attribute information indicates that a second reference text exists among multiple first candidate texts, then by comparing the sorting position of the first reference text indicated by the first reference attribute information among multiple second candidate texts with the sorting position of the second reference text indicated by the second reference attribute information among multiple first candidate texts, the first target text or the second target text is selected as the text recognition result of the text image to be recognized.

[0130] Step H4: If the first reference attribute information indicates that there is no first reference text among the multiple second candidate texts and the second reference attribute information indicates that there is no second reference text among the multiple first candidate texts, then determine the vector retrieval effect baseline score where the vector distance of the first target text is located and the optical character recognition effect baseline score where the recognition confidence of the second target text is located.

[0131] Step H5: Based on the vector retrieval baseline score of the vector distance of the first target text and the optical character recognition baseline score of the recognition confidence of the second target text, select the first target text or the second target text as the text recognition result of the image to be recognized.
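Steps H1-H5 can be read as a small decision function. The Python sketch below is one possible reading, not a definitive implementation. The function name and parameters are hypothetical, a rank of `None` stands for "reference text does not exist", and a smaller rank is treated as the better sorting position. Two tie rules are assumptions: equal ranks in step H3 fall through to the baseline score comparison, and equal baseline scores favour the first target text.

```python
from typing import Optional


def choose_by_reference_info(
    first_target: str,               # vector retrieval result
    second_target: str,              # OCR result
    first_ref_rank: Optional[int],   # rank of first target among OCR candidates
    second_ref_rank: Optional[int],  # rank of second target among vector candidates
    vec_baseline_score: float,       # baseline score for the vector distance
    ocr_baseline_score: float,       # baseline score for the OCR confidence
) -> str:
    # Step H1: only the first reference text exists.
    if first_ref_rank is not None and second_ref_rank is None:
        return first_target
    # Step H2: only the second reference text exists.
    if first_ref_rank is None and second_ref_rank is not None:
        return second_target
    # Step H3: both exist, compare sorting positions (smaller is better).
    if first_ref_rank is not None and second_ref_rank is not None:
        if first_ref_rank < second_ref_rank:
            return first_target
        if second_ref_rank < first_ref_rank:
            return second_target
        # Equal ranks: assumed to fall through to the baseline comparison.
    # Steps H4-H5: compare the effect baseline scores.
    # Assumption: a tie favours the vector retrieval result.
    if vec_baseline_score >= ocr_baseline_score:
        return first_target
    return second_target
```

For instance, with the first target text "龘" ranked 3rd among the OCR candidates and the second target text "鱻" absent from the vector retrieval candidates, the sketch returns "龘" under step H1.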

[0132] Referring to Figure 6, the vector retrieval performance baseline score is a quantitative evaluation index assigned to the different segments into which the vector distance of vector retrieval is divided. As described above, the vector distance performance baseline of vector retrieval is determined by progressively decreasing the vector distance over its value range with a certain step size and comparing the accuracy of vector retrieval and OCR at the different progressive distance values. The accuracy is divided into different segments, and each segment corresponds to a score. The segment in which the vector distance of the first target text falls determines the corresponding vector retrieval performance baseline score. This score reflects the relative performance of vector retrieval at the current vector distance and can be compared with the OCR recognition performance.

[0133] Referring to Figure 6, the optical character recognition (OCR) performance baseline score is likewise a quantitative evaluation index. The OCR confidence is progressively decreased over its value range with a specific step size, and the OCR confidence performance baseline is determined by comparing the accuracy of the two recognition methods at the different progressive values, dividing the accuracy into segments, and assigning a score to each segment. The segment in which the recognition confidence of the second target text falls determines the corresponding OCR performance baseline score. This score measures the relative performance of OCR at the current OCR confidence and can be compared with the vector retrieval performance when selecting between the two.

[0134] Referring to Figure 3, the first and second reference attribute information are obtained for judgment. If the first reference attribute information indicates that the first reference text exists among the multiple second candidate texts, the first target text is present within the candidate range of OCR; if the second reference attribute information shows that the second reference text does not exist among the multiple first candidate texts, the second target text does not appear in the candidate set of vector retrieval. In this case, the first target text has the relative advantage when the two recognition methods are considered together, since it is supported by both. Therefore, the first target text is determined as the text recognition result of the text image to be recognized, which completes the final judgment of this text recognition task and yields a text result that can be used in subsequent text processing and other operations.

[0135] Referring to Figure 3, when the first reference attribute information shows that the first reference text is not present among the multiple second candidate texts (meaning the first target text does not appear in the candidate texts recognized by OCR), while the second reference attribute information indicates that the second reference text is present among the multiple first candidate texts (meaning the second target text appears in the candidate set of vector retrieval), the second target text has the advantage and is relatively more reliable when both recognition methods are considered. Therefore, the second target text is determined as the text recognition result of the text image to be recognized and is used as the final text recognition judgment, providing text information for subsequent related processes.

[0136] Referring to Figure 3, if the first reference attribute information indicates that a first reference text exists among the multiple second candidate texts, and the second reference attribute information indicates that a second reference text exists among the multiple first candidate texts, it is necessary to further compare the sorting positions of the two texts in their respective candidate sets, because the sorting position reflects, to some extent, the relative merit of the text under the corresponding recognition method. For example, a first reference text that ranks high among the multiple second candidate texts is close to the optimal result in the OCR process, and a second reference text that ranks high among the multiple first candidate texts likewise performs well in vector retrieval. By comparing these two sorting positions, the target text with the better ranking (either the first or the second target text) is selected as the text recognition result of the text image to be recognized, thereby combining the information from the two recognition methods.

[0137] Referring to Figure 3, when the first reference attribute information indicates that the first reference text does not exist among the multiple second candidate texts, and the second reference attribute information indicates that the second reference text does not exist among the multiple first candidate texts, no judgment can be made from the existence and sorting position of the reference texts in the candidate sets, so the vector retrieval effect baseline score and the optical character recognition effect baseline score are used instead. First, the vector retrieval effect baseline score in which the vector distance of the first target text falls is determined by finding the corresponding segment score according to the previously established division of the vector distance effect baseline of vector retrieval. At the same time, the optical character recognition effect baseline score in which the recognition confidence of the second target text falls is determined according to the division of the OCR effect baseline. The two scores provide a quantitative basis for the next step of the judgment.

[0138] Referring to Figure 3, the vector retrieval baseline score of the first target text is compared with the optical character recognition baseline score of the second target text. If the vector retrieval baseline score is higher, vector retrieval performs better than OCR at the current vector distance, and the first target text is selected as the text recognition result of the text image to be recognized. Conversely, if the optical character recognition baseline score is higher, OCR performs better at the current OCR confidence, and the second target text is determined as the text recognition result. Comparing the baseline scores allows a relatively accurate final decision to be made in complex recognition situations, improving the reliability and accuracy of the text recognition system as a whole.

[0139] By comprehensively utilizing the first and second reference attribute information, as well as the baseline scores of vector retrieval and optical character recognition, this approach fully integrates information from multiple levels of both methods, including the existence and ranking of candidate characters, and baseline scores based on accuracy. It moves beyond simply comparing results from a single recognition method; instead, it comprehensively considers the correlation and differences between the two methods from multiple dimensions. This provides a richer, deeper, and more comprehensive basis for determining character recognition results, significantly improving the accuracy and reliability of the results.

[0140] Based on the above embodiments, optionally, the text detection method of this disclosure may further include the following steps M1-M2:

[0141] Step M1: Within the range of vector distance values in vector retrieval, the vector distance is progressively decreased by a preset step size. At each progressive decrease, reference verification data with a vector distance greater than the corresponding vector distance after each progressive decrease are selected from the reference verification data set as the first target data set. The first accuracy of the first target data set is determined. The first accuracy is the statistical result of the accuracy of the text retrieved by vector retrieval for each preset text image in the first target data set.

[0142] Step M2: Based on the first accuracy of the first target data set during each decreasing progressive step, multiple accuracy segments corresponding to vector retrieval are obtained. The upper and lower limits of each accuracy segment corresponding to vector retrieval are determined according to the vector distance of the first accuracy during the decreasing progressive step corresponding to the upper and lower limits of each accuracy segment. Different accuracy segments corresponding to vector retrieval correspond to different vector retrieval performance baselines.

[0143] Each item of reference verification data in the reference verification data set includes the actual text corresponding to the preset text image, the first preset recognition error of the preset text image, the reference vector distance of the preset text image, the second preset recognition error of the preset text image, and the reference recognition confidence of the preset text image. The first preset recognition error is used to indicate whether the text retrieved by vector retrieval of the preset text image is correct. The reference vector distance is used to indicate the vector distance between the text retrieved by vector retrieval of the preset text image and the actual text corresponding to the preset text image. The second preset recognition error is used to indicate whether the text obtained by optical character recognition of the preset text image is correct. The reference recognition confidence is used to indicate the confidence with which the text is obtained by optical character recognition of the preset text image.

[0144] The processing uses the same progressive approach to the vector distance of vector retrieval as described above. Specifically, the reference verification dataset is filtered: for the vector distance at each progressive decrease, the data items whose vector-retrieval recognized character has a vector distance greater than that vector distance are selected. The selected data items together form the required first target dataset.

[0145] After the first target dataset is constructed, the accuracy of vector retrieval is calculated. The calculation is based on the correctness of the vector-retrieval recognized character of each reference verification data item in the first target dataset: each data item is checked to see whether the character obtained through vector retrieval is correct, and the vector retrieval accuracy is the number of correct items divided by the total number of data items. To measure the performance of vector retrieval at different vector distances, the calculated accuracy is further divided into three intervals: [0, 0.45], (0.45, 0.9), and [0.9, 1]. The progressive vector distance values corresponding to the boundaries between these intervals are defined as the vector distance performance baselines of vector retrieval. These baselines serve as markers of the performance level of vector retrieval at different vector distances and provide a reference for subsequent analysis, comparison, and decision-making.
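The segmentation described above can be illustrated with the following Python sketch, which sweeps the vector distance, computes the vector retrieval accuracy at each step, and reports the swept distances at which the accuracy moves from one interval to another. The function names, the `(vec_distance, vec_correct)` record layout, and the use of the segment indices 0, 1 and 2 as stand-ins for baseline scores are assumptions; the disclosure states that each segment has a score but does not give the values.

```python
from typing import List, Tuple


def accuracy_segment(acc: float) -> int:
    """Map an accuracy to one of the three intervals given in the text:
    0 for [0, 0.45], 1 for (0.45, 0.9), 2 for [0.9, 1]."""
    if acc <= 0.45:
        return 0
    if acc < 0.9:
        return 1
    return 2


def sweep_vector_accuracy(
    records: List[Tuple[float, bool]],  # (vec_distance, vec_correct), hypothetical layout
    high: float, low: float, step: float,
) -> List[Tuple[float, float, int]]:
    """Return (distance, accuracy, segment) for each swept distance whose
    first target dataset (records with distance greater than it) is non-empty."""
    rows = []
    n_steps = int(round((high - low) / step))
    for i in range(n_steps + 1):
        d = round(high - i * step, 10)  # rounding limits float drift
        subset = [ok for dist, ok in records if dist > d]
        if subset:
            acc = sum(subset) / len(subset)
            rows.append((d, acc, accuracy_segment(acc)))
    return rows


def segment_boundaries(rows: List[Tuple[float, float, int]]) -> List[float]:
    """Swept distances at which the accuracy segment changes; these play the
    role of the vector distance performance baselines."""
    return [rows[i][0] for i in range(1, len(rows)) if rows[i][2] != rows[i - 1][2]]
```

The same sweep applies to OCR by replacing the vector distance with the recognition confidence and selecting items whose confidence is lower than the swept value.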

[0146] Based on the above embodiments, optionally, the text detection method of this disclosure may further include the following steps M3-M4:

[0147] Step M3: Within the range of recognition confidence values for optical character recognition, the recognition confidence is progressively decreased by a preset step size. At each progressive decrease, reference verification data with recognition confidence values lower than the recognition confidence values corresponding to each progressive decrease are selected from the reference verification data set as the second target data set. The third accuracy of the second target data set is determined. The third accuracy is the statistical result of the accuracy of optical character recognition of each preset text image in the second target data set.

[0148] Step M4: Based on the third accuracy of the second target data set at each progressive decrease, multiple accuracy segments corresponding to optical character recognition are obtained. The upper and lower limits of each accuracy segment corresponding to optical character recognition are determined according to the recognition confidence at the progressive decrease at which the third accuracy reaches those limits. Different accuracy segments corresponding to optical character recognition correspond to different baselines of optical character recognition performance.

[0149] Similarly, the same progressive approach described above is applied to the OCR recognition confidence. For each progressively decreased recognition confidence, the reference verification data items in the reference verification dataset whose OCR recognition confidence is lower than that recognition confidence are selected and aggregated to form the corresponding second target data set.

[0150] After the second target dataset is obtained, the OCR accuracy is calculated. This accuracy is determined by examining the correctness of the OCR-recognized character of each reference verification data item in the second target dataset: the number of data items with a correctly recognized character is counted and divided by the total number of data items to obtain the OCR accuracy. To provide a benchmark for OCR performance at different confidence levels, the OCR accuracy is divided into three intervals: [0, 0.45], (0.45, 0.9), and [0.9, 1]. As with the vector distance performance baseline of vector retrieval, the progressive recognition confidence values corresponding to the boundaries of these intervals are defined as the OCR confidence performance baselines. These baselines provide a reference framework for assessing OCR performance at different confidence levels.

[0151] The technical solution of this disclosure uses two methods, vector retrieval and OCR, to recognize any text image to be recognized. The vector retrieval result data include the vector-retrieved character (corresponding to the first target text) and a vector retrieval candidate set (corresponding to the multiple first candidate texts), each data item containing a text character and a vector distance. The OCR result data include the OCR-recognized character (corresponding to the second target text) and an OCR candidate set (corresponding to the multiple second candidate texts), each data item containing a text character and a confidence score. Combining the result data of the two recognition methods, a reliability assessment, a candidate priority index assessment (corresponding to the description using the first and second reference attribute information), and a performance baseline score assessment are performed in sequence. In each round of assessment, either the vector retrieval result or the OCR result is determined as the final recognized character, or the process moves on to the next round. First, the reliability assessment is performed. Given the known reliability thresholds of vector retrieval and OCR, the vector distance of the vector-retrieved character for the image to be recognized is compared with the vector retrieval reliability threshold, and the confidence score of the OCR-recognized character is compared with the OCR reliability threshold. This determines whether the vector-retrieved character or the OCR-recognized character is used as the text recognition result, or whether to proceed to the next step. If the reliability assessment does not yield a result, the candidate priority index assessment is performed. By comparing the candidate priority indices of vector retrieval and OCR, it is determined whether the vector-retrieved character or the OCR-recognized character is used as the text recognition result, or whether to proceed to the next step. If the candidate priority index assessments of vector retrieval and OCR show no difference, the performance baseline score assessment is performed. Using the performance baselines of vector retrieval and OCR, the performance scores of vector retrieval and OCR are obtained and the final text recognition result is determined.
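The three rounds of assessment can be viewed as a cascade in which each stage either returns a final character or defers to the next stage. The Python sketch below shows that control flow only. The names `reliability_stage` and `run_cascade`, the use of `None` to mean "defer", and the order in which the two reliability thresholds are checked (vector retrieval first) are assumptions for illustration.

```python
from typing import Callable, Optional

# A stage returns the final character, or None to defer to the next stage.
Stage = Callable[[], Optional[str]]


def reliability_stage(
    vec_char: str, vec_distance: float, vec_threshold: float,
    ocr_char: str, ocr_confidence: float, ocr_threshold: float,
) -> Optional[str]:
    """Round 1: reliability assessment against the two preset thresholds."""
    if vec_distance > vec_threshold:    # vector retrieval considered reliable
        return vec_char
    if ocr_confidence > ocr_threshold:  # OCR considered reliable
        return ocr_char
    return None                         # defer to candidate priority assessment


def run_cascade(*stages: Stage) -> Optional[str]:
    """Run the stages in order and return the first non-None decision."""
    for stage in stages:
        result = stage()
        if result is not None:
            return result
    return None


# Usage with placeholder stages for rounds 2 and 3 (hypothetical stand-ins for
# the candidate priority index and baseline score assessments).
decision = run_cascade(
    lambda: reliability_stage("龘", 0.2, 0.5, "鱻", 0.3, 0.8),
    lambda: None,   # candidate priority indices show no difference
    lambda: "龘",   # baseline score assessment picks the vector result
)
```

In this usage neither threshold is exceeded and the second stage defers, so the decision comes from the third stage.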

[0152] Figure 7 is a schematic diagram of the structure of a text detection device provided in an embodiment of the present disclosure. The embodiments of the present disclosure are applicable to text recognition of text images containing rare characters, in particular text images of ancient books or archaeological materials processed during the digitization of ancient books. The text detection device can be implemented in software and/or hardware, and is generally integrated into any electronic device with a network communication function, such as a mobile terminal, PC, or server.

[0153] As shown in Figure 7, the text detection device of this disclosure embodiment may include the following:

[0154] The first recognition module 710 is used to obtain the first character recognition result corresponding to the character image to be recognized by performing vector retrieval in a preset rare character feature vector library. The preset rare character feature vector library records at least one rare character feature vector and the character corresponding to each rare character feature vector.

[0155] The second recognition module 720 is used to obtain the second character recognition result corresponding to the character image to be recognized by performing optical character recognition operation on the character image to be recognized.

[0156] The text detection module 730 is used to determine the target text recognition result of the text image to be recognized based on the first text recognition result and the second text recognition result.

[0157] Based on the above embodiments, optionally, a first character recognition result corresponding to the character image to be recognized is obtained by vector retrieval in a preset rare character feature vector library, including:

[0158] Determine the feature vector of the image to be identified, and perform a vector retrieval operation in a preset rare character feature vector library based on the feature vector of the image to be identified to obtain multiple first candidate characters;

[0159] A first target character is determined from a plurality of first candidate characters, wherein the vector distance between the first target character and the feature vector of the image to be identified is less than the vector distance between the remaining characters in the plurality of first candidate characters other than the first target character and the feature vector of the image to be identified;

[0160] Based on the plurality of first candidate characters and the first target character, the first character recognition result corresponding to the character image to be recognized is determined.

[0161] Based on the above embodiments, optionally, determining the feature vector of the text image to be recognized includes:

[0162] Image features are extracted from the text image to be recognized using a neural network model based on an inverted residual structure and depthwise separable convolutions, yielding the feature vector of the text image to be recognized.

[0163] Based on the above embodiments, optionally, a vector retrieval and recall operation is performed in a preset rare character feature vector library according to the feature vector of the image to be identified to obtain multiple first candidate characters, including:

[0164] Calculate the vector distance between the feature vector of the image to be identified and each rare character feature vector in the preset rare character feature vector library, and sort each rare character feature vector according to the vector distance;

[0165] Based on the sorting result, recall multiple target rare character feature vectors from the rare character feature vectors, and determine the multiple first candidate characters based on the characters corresponding to the multiple target rare character feature vectors.
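
The recall step in paragraphs [0164] and [0165] can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the Euclidean metric, the name `recall_candidates`, the parameter `k`, and the step that keeps only the closest entry per character (a library built from several fonts may hold several vectors for one character) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[str, Sequence[float]]  # (character, feature vector)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_candidates(query: Sequence[float],
                      library: List[Entry],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (character, distance) pairs, closest first."""
    # Distance from the query vector to every library vector.
    scored = [(ch, euclidean(query, vec)) for ch, vec in library]
    # Sort ascending: a smaller distance means a more similar glyph.
    scored.sort(key=lambda item: item[1])
    # Keep only the closest hit per character (assumption, see lead-in).
    seen = set()
    candidates: List[Tuple[str, float]] = []
    for ch, dist in scored:
        if ch in seen:
            continue
        seen.add(ch)
        candidates.append((ch, dist))
        if len(candidates) == k:
            break
    return candidates
```

Under these assumptions, the first element of the returned list plays the role of the first target character, since it has the smallest vector distance among the first candidate characters.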

[0166] Based on the above embodiments, optionally, a second character recognition result corresponding to the character image to be recognized is obtained by performing optical character recognition operation on the character image to be recognized, including:

[0167] Optical character recognition (OCR) is performed on the image of the text to be recognized to obtain multiple second candidate characters. These multiple second candidate characters are determined by filtering based on the recognition confidence of each character recognition result when performing OCR on the image of the text to be recognized.

[0168] The second target text corresponding to the text image to be recognized is determined from a plurality of second candidate texts, and the recognition confidence of the second target text is greater than the recognition confidence of the remaining texts in the plurality of second candidate texts other than the second target text.
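
A minimal sketch of the candidate filtering in paragraphs [0167] and [0168] follows. The source states only that candidates are filtered by recognition confidence; the minimum-confidence cutoff, the top-`k` limit, and the function name `select_ocr_candidates` are assumptions, and the raw `(character, confidence)` pairs are assumed to come from whatever OCR engine is in use.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, float]  # (character, recognition confidence)


def select_ocr_candidates(raw_results: List[Scored],
                          min_confidence: float = 0.05,
                          k: int = 5) -> Tuple[List[Scored], Optional[Scored]]:
    """Filter raw OCR outputs into second candidates and pick the second target."""
    # Drop outputs whose confidence is below the (assumed) cutoff.
    kept = [(ch, conf) for ch, conf in raw_results if conf >= min_confidence]
    # Sort descending: a higher confidence means a more likely character.
    kept.sort(key=lambda item: item[1], reverse=True)
    candidates = kept[:k]
    # The second target is the candidate with the highest confidence.
    target = candidates[0] if candidates else None
    return candidates, target
```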

[0169] Based on the above embodiments, optionally, the construction process of the preset rare character feature vector library includes:

[0170] Obtain various reference font files containing extended text regions. The extended text regions of these reference font files include multiple uncommon characters. At least one of the following attributes differs between the various reference font files: stroke width, stroke direction, or stroke connection.

[0171] Determine the glyphs of each font corresponding to the rare characters in the reference font file, and determine the font text images corresponding to each rare character based on the glyphs of each font corresponding to the rare characters;

[0172] Image features are extracted from the font images corresponding to the rare characters to obtain the rare character feature vectors and the characters corresponding to the rare character feature vectors, so as to obtain a preset rare character feature vector library.
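
The construction process in paragraphs [0170] to [0172] can be outlined as the loop below. It is a sketch only: `render_glyph` and `extract_features` are hypothetical callables supplied by the caller (for example, a font rasterizer and the feature extraction model), and the convention that `render_glyph` returns `None` when a font lacks a glyph is an assumption.

```python
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple


def build_feature_library(
    fonts: Iterable[Any],
    rare_chars: Sequence[str],
    render_glyph: Callable[[Any, str], Optional[Any]],
    extract_features: Callable[[Any], Sequence[float]],
) -> List[Tuple[str, Sequence[float]]]:
    """Build a list of (character, feature vector) entries across all fonts."""
    library: List[Tuple[str, Sequence[float]]] = []
    for font in fonts:
        for ch in rare_chars:
            # Render the glyph of this character in this font as an image.
            image = render_glyph(font, ch)
            if image is None:
                # Assumed convention: the font has no glyph for this character.
                continue
            # One entry per (font, character) pair, so a character may appear
            # several times with different stroke styles.
            library.append((ch, extract_features(image)))
    return library
```

Keeping one entry per font, rather than averaging the vectors, preserves the variation in stroke width, direction, and connection that the reference fonts are chosen for.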

[0173] Based on the above embodiments, optionally, the target character recognition result of the character image to be recognized is determined according to the first character recognition result and the second character recognition result, including:

[0174] Identify the first target character in the first character recognition result and the second target character in the second character recognition result;

[0175] The character recognition result of the image to be recognized is determined by performing reliability testing on the first target text and the second target text.

[0176] Based on the above embodiments, optionally, the character recognition result of the character image to be recognized is determined by performing reliability detection on the first target character and the second target character, including:

[0177] The vector distance of the first target text is determined, which is based on the vector distance between the first target text and the feature vector of the image to be identified corresponding to the image of the text to be identified.

[0178] If the vector distance of the first target character is less than the preset vector distance, then the first target character is determined as the character recognition result of the character image to be recognized. The preset vector distance is a threshold condition for making a reliability judgment on the characters retrieved by vector retrieval in the preset rare character feature vector library.

[0179] If the vector distance of the first target character is not less than the preset vector distance, then the recognition confidence of the second target character and the preset recognition confidence are determined, and the character recognition result of the character image to be recognized is determined based on the recognition confidence of the second target character and the preset recognition confidence. The recognition confidence of the second target character is determined based on the recognition confidence of the second target character when performing a character recognition operation on the character image to be recognized. The preset recognition confidence is a threshold condition used to judge the reliability of the characters obtained by performing optical character recognition operation on the character image to be recognized.
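
The reliability detection in paragraphs [0177] to [0179], together with the confidence comparison spelled out in paragraph [0181], can be sketched as below. The function name, the tuple layout, and the use of `None` to signal that neither threshold was met are assumptions made for illustration.

```python
from typing import Optional, Tuple


def reliability_check(first_target: Tuple[str, float],
                      second_target: Tuple[str, float],
                      preset_distance: float,
                      preset_confidence: float) -> Optional[str]:
    """Return the recognized character, or None if neither result is reliable.

    first_target:  (character, vector distance) from vector retrieval
    second_target: (character, recognition confidence) from OCR
    """
    first_char, distance = first_target
    second_char, confidence = second_target
    # Vector retrieval is checked first: a distance strictly below the
    # preset vector distance counts as reliable.
    if distance < preset_distance:
        return first_char
    # Otherwise fall back to OCR: a confidence strictly above the preset
    # recognition confidence counts as reliable.
    if confidence > preset_confidence:
        return second_char
    # Neither passed; later steps must decide.
    return None
```

For example, `reliability_check(("a", 0.5), ("b", 0.9), 0.3, 0.8)` returns `"b"`, because the vector distance 0.5 is not below 0.3 while the OCR confidence 0.9 exceeds 0.8.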

[0180] Based on the above embodiments, optionally, the character recognition result of the image of the character to be recognized is determined based on the recognition confidence of the second target character and a preset recognition confidence, including:

[0181] If the recognition confidence of the second target text is greater than the preset recognition confidence, then the second target text is determined as the text recognition result of the text image to be recognized;

[0182] If the recognition confidence of the second target text is not greater than the preset recognition confidence, then the first reference attribute information of the first target text is determined. The first reference attribute information is used to indicate whether there is a first reference text among the multiple second candidate texts and the sorting position of the first reference text among the multiple second candidate texts when the first reference text exists. The first reference text is the same as the first target text. The multiple second candidate texts are sorted according to their respective recognition confidence.

[0183] The second reference attribute information of the second target text is determined. The second reference attribute information is used to indicate whether there is a second reference text among a plurality of first candidate texts and the sorting position of the second reference text among the plurality of first candidate texts when the second reference text exists. The second reference text is the same as the second target text. The plurality of first candidate texts are sorted according to their respective corresponding vector distances.

[0184] The text recognition result of the image to be recognized is determined based on the first reference attribute information and the second reference attribute information.
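
The reference attribute information in paragraphs [0182] and [0183] amounts to looking up one method's target character in the other method's sorted candidate list. A minimal sketch, in which the name `reference_rank` and the encoding (a 0-based position, or `None` when the character is absent) are assumptions:

```python
from typing import List, Optional, Tuple


def reference_rank(target_char: str,
                   other_candidates: List[Tuple[str, float]]) -> Optional[int]:
    """Position of target_char in the other method's sorted candidates.

    other_candidates must already be sorted best-first: by descending
    confidence for OCR candidates, by ascending distance for vector
    retrieval candidates. Returns None if the character is not present.
    """
    for position, (ch, _score) in enumerate(other_candidates):
        if ch == target_char:
            return position
    return None
```

Calling it once with the first target character and the second candidate characters, and once with the second target character and the first candidate characters, yields the first and second reference attribute information respectively.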

[0185] Based on the above embodiments, optionally, determining the character recognition result of the character image to be recognized according to the first reference attribute information and the second reference attribute information includes:

[0186] If the first reference attribute information indicates that a first reference character exists among multiple second candidate characters and the second reference attribute information indicates that a second reference character does not exist among multiple first candidate characters, then the first target character is determined as the character recognition result of the character image to be recognized.

[0187] If the first reference attribute information indicates that there is no first reference character among the multiple second candidate characters and the second reference attribute information indicates that there is a second reference character among the multiple first candidate characters, then the second target character is determined as the character recognition result of the character image to be recognized.

[0188] If the first reference attribute information indicates that a first reference text exists among multiple second candidate texts and the second reference attribute information indicates that a second reference text exists among multiple first candidate texts, then by comparing the sorting position of the first reference text indicated by the first reference attribute information among multiple second candidate texts with the sorting position of the second reference text indicated by the second reference attribute information among multiple first candidate texts, the first target text or the second target text is selected as the text recognition result of the text image to be recognized.

[0189] If the first reference attribute information indicates that there is no first reference text among the multiple second candidate texts and the second reference attribute information indicates that there is no second reference text among the multiple first candidate texts, then determine the vector retrieval effect baseline score where the vector distance of the first target text is located and the optical character recognition effect baseline score where the recognition confidence of the second target text is located.

[0190] Based on the vector retrieval baseline score of the vector distance of the first target character and the optical character recognition baseline score of the recognition confidence of the second target character, the first target character or the second target character is selected as the character recognition result of the image to be recognized.
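
The four cases in paragraphs [0186] to [0190] can be sketched as a single decision function. Several points are assumptions, because the source does not fix them: ranks are 0-based positions with `None` meaning absent, the target ranked nearer the top of the other method's candidate list wins, equal ranks fall through to the baseline comparison (following the summary in paragraph [0151]), the higher baseline score wins, and a baseline tie goes to vector retrieval.

```python
from typing import Optional


def decide_by_reference(first_char: str,
                        second_char: str,
                        first_rank: Optional[int],
                        second_rank: Optional[int],
                        vector_score: float,
                        ocr_score: float) -> str:
    """Pick the first (vector) or second (OCR) target character.

    first_rank:  position of the first target among the OCR candidates
    second_rank: position of the second target among the vector candidates
    vector_score / ocr_score: baseline scores already looked up for the
    first target's distance and the second target's confidence
    """
    # Only the vector-retrieved character is confirmed by the other method.
    if first_rank is not None and second_rank is None:
        return first_char
    # Only the OCR character is confirmed by the other method.
    if first_rank is None and second_rank is not None:
        return second_char
    # Both confirmed: compare sorting positions (assumed: smaller wins).
    if first_rank is not None and second_rank is not None:
        if first_rank != second_rank:
            return first_char if first_rank < second_rank else second_char
    # Neither confirmed, or equal positions: compare baseline scores
    # (assumed: higher wins, ties go to vector retrieval).
    return first_char if vector_score >= ocr_score else second_char
```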

[0191] Optionally, based on the above embodiments, the method further includes:

[0192] Within the range of vector distance values in vector retrieval, the vector distance is progressively decreased by a preset step size. At each decrease, the reference verification data whose vector distances are greater than the vector distance reached at that decrease are selected from the reference verification data set as the first target data set, and a first accuracy and a second accuracy of the first target data set are determined. The first accuracy is the statistical accuracy of the text retrieved by vector retrieval for the preset text images in the first target data set. The second accuracy is the statistical accuracy of the text obtained by optical character recognition for the preset text images in the first target data set.

[0193] Based on the first accuracy and second accuracy of the first target data set at each decrease, the critical value at the transition from the first state to the second state is determined as the preset vector distance. The first state is a vector distance state in which the first accuracy of the first target data set is greater than the second accuracy, and the second state is a vector distance state in which the first accuracy of the first target data set is less than the second accuracy.

[0194] Within the range of recognition confidence values for optical character recognition, the recognition confidence is progressively decreased by a preset step size. At each progressive decrease, reference verification data with recognition confidence values lower than the recognition confidence values corresponding to each progressive decrease are selected from the reference verification data set as the second target data set. A third accuracy and a fourth accuracy are determined for the second target data set. The third accuracy is the statistical result of the accuracy of obtaining text by optical character recognition of each preset text image in the second target data set, and the fourth accuracy is the statistical result of the accuracy of retrieving text by vector retrieval of each preset text image in the second target data set.

[0195] Based on the third accuracy and fourth accuracy of the second target data set at each decrease, the critical value at the transition from the third state to the fourth state is determined as the preset recognition confidence. The third state is a recognition confidence state in which the third accuracy of the second target data set is greater than the fourth accuracy, and the fourth state is a recognition confidence state in which the third accuracy of the second target data set is less than the fourth accuracy.

[0196] Each reference verification data in the reference verification data set includes the actual text corresponding to the preset text image, the first preset recognition error of the preset text image, the reference vector distance of the preset text image, the second preset recognition error of the preset text image, and the reference recognition confidence level of the preset text image. The first preset recognition error is used to indicate whether the text retrieved by vector retrieval of the preset text image is correct. The reference vector distance is used to indicate the vector distance between the text retrieved by vector retrieval of the preset text image and the actual text corresponding to the preset text image. The second preset recognition error is used to indicate whether the text obtained by optical character recognition of the preset text image is correct. The reference recognition confidence level is used to indicate the confidence level when obtaining the second preset recognition character by optical character recognition of the preset text image.
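
The threshold calibration in paragraphs [0192] and [0193] can be sketched as a sweep over the verification data. This is an illustrative reading, not the disclosed procedure: the record fields, the function names, and the sweep bounds are assumptions, and the code reports the first threshold at which the comparison between the two accuracies flips in either direction, because the source does not make the direction of the transition unambiguous.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VerificationRecord:
    vector_correct: bool    # was the vector-retrieved character correct?
    vector_distance: float  # reference vector distance
    ocr_correct: bool       # was the OCR character correct?
    ocr_confidence: float   # reference recognition confidence


def accuracy(flags: List[bool]) -> float:
    """Fraction of True values in a non-empty list."""
    return sum(flags) / len(flags)


def calibrate_distance_threshold(records: List[VerificationRecord],
                                 start: float,
                                 stop: float,
                                 step: float) -> Optional[float]:
    """Sweep the distance threshold downward and return the critical value."""
    steps = int(round((start - stop) / step))
    previous_state: Optional[bool] = None
    for i in range(steps + 1):
        threshold = start - i * step
        # First target data set: records whose distance exceeds the threshold.
        subset = [r for r in records if r.vector_distance > threshold]
        if not subset:
            continue  # nothing to measure at this threshold
        first = accuracy([r.vector_correct for r in subset])
        second = accuracy([r.ocr_correct for r in subset])
        state = first > second
        # A change of state between consecutive measured steps marks the
        # critical value (direction-agnostic, see lead-in).
        if previous_state is not None and state != previous_state:
            return threshold
        previous_state = state
    return None  # no transition found in the swept range
```

The preset recognition confidence in paragraphs [0194] and [0195] could be calibrated with the same loop by filtering on `ocr_confidence` below the threshold and swapping the roles of the two accuracies.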

[0197] Optionally, based on the above embodiments, the method further includes:

[0198] Within the range of vector distance values in vector retrieval, the vector distance is progressively decreased by a preset step size. At each decrease, the reference verification data whose vector distances are greater than the vector distance reached at that decrease are selected from the reference verification data set as the first target data set, and the first accuracy of the first target data set is determined. The first accuracy is the statistical accuracy of the text retrieved by vector retrieval for the preset text images in the first target data set.

[0199] Based on the first accuracy of the first target data set at each decrease, multiple accuracy segments corresponding to vector retrieval are obtained. The upper and lower limits of each accuracy segment are determined according to the vector distances at the decreases where the first accuracy reaches the upper and lower limits of that segment. Different accuracy segments corresponding to vector retrieval correspond to different vector retrieval performance baselines.

[0200] Within the range of recognition confidence values for optical character recognition, the recognition confidence is progressively decreased by a preset step size. At each progressive decrease, reference verification data with recognition confidence values lower than the recognition confidence values corresponding to each progressive decrease are selected from the reference verification data set as the second target data set. The third accuracy of the second target data set is determined. The third accuracy is the statistical result of the accuracy of obtaining text by optical character recognition of each preset text image in the second target data set.

[0201] Based on the third accuracy of the second target data set at each decrease, multiple accuracy segments corresponding to optical character recognition are obtained. The upper and lower limits of each accuracy segment are determined according to the recognition confidence values at the decreases where the third accuracy reaches the upper and lower limits of that segment. Different accuracy segments corresponding to optical character recognition correspond to different optical character recognition performance baselines.

[0202] Each reference verification data in the reference verification data set includes the actual text corresponding to the preset text image, the first preset recognition error of the preset text image, the reference vector distance of the preset text image, the second preset recognition error of the preset text image, and the reference recognition confidence level of the preset text image. The first preset recognition error is used to indicate whether the text retrieved by vector retrieval of the preset text image is correct. The reference vector distance is used to indicate the vector distance between the text retrieved by vector retrieval of the preset text image and the actual text corresponding to the preset text image. The second preset recognition error is used to indicate whether the text obtained by optical character recognition of the preset text image is correct. The reference recognition confidence level is used to indicate the confidence level when obtaining the second preset recognition character by optical character recognition of the preset text image.
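
One possible reading of the accuracy segments in paragraphs [0198] and [0199] is sketched below for vector retrieval. The accuracy bands, the integer baseline scores, the function names, and the rule that adjacent segments share a boundary are all assumptions chosen to make the idea concrete. The `bands` list is assumed to be sorted by descending accuracy floor and to end with a floor of 0.0.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, int]  # (lower distance, upper distance, score)


def sweep_accuracy(records: List[Tuple[float, bool]],
                   start: float, stop: float,
                   step: float) -> List[Tuple[float, float]]:
    """records: (vector distance, correct?) pairs. Returns (threshold, accuracy)."""
    curve = []
    steps = int(round((start - stop) / step))
    for i in range(steps + 1):
        threshold = start - i * step
        # First target data set: records whose distance exceeds the threshold.
        subset = [ok for dist, ok in records if dist > threshold]
        if subset:
            curve.append((threshold, sum(subset) / len(subset)))
    return curve


def build_segments(curve: List[Tuple[float, float]],
                   bands: List[Tuple[float, int]]) -> List[Segment]:
    """Group consecutive thresholds whose accuracy falls in the same band."""
    segments: List[Segment] = []
    for threshold, acc in curve:
        # First band whose accuracy floor is met (bands end with floor 0.0).
        score = next(s for floor, s in bands if acc >= floor)
        if segments and segments[-1][2] == score:
            # Thresholds decrease, so extend the segment's lower limit.
            _low, high, _ = segments[-1]
            segments[-1] = (threshold, high, score)
        else:
            # New segment starts where the previous one ended.
            high = segments[-1][0] if segments else threshold
            segments.append((threshold, high, score))
    return segments


def baseline_score(distance: float, segments: List[Segment]) -> Optional[int]:
    """Look up the baseline score of the segment containing the distance."""
    if not segments:
        return None
    for low, _high, score in segments:  # ordered from large to small distance
        if distance >= low:
            return score
    return segments[-1][2]  # below every lower limit: use the last segment
```

The same structure would apply to optical character recognition in paragraphs [0200] and [0201], with recognition confidence in place of vector distance.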

[0203] The technical solution of this disclosure embodiment utilizes a preset rare character feature vector library during character recognition, providing a dedicated resource for the recognition of rare characters. Vector retrieval can quickly locate similar rare character feature vectors in the preset rare character feature vector library, efficiently recalling possible first character recognition results. This is particularly advantageous when processing images such as ancient books and professional documents containing rare characters. In addition to vector retrieval in the preset rare character feature vector library, optical character recognition (OCR) is performed on the image of the text to be recognized to obtain a second character recognition result. OCR can comprehensively process the text in the image, compensating for the shortcomings of traditional OCR methods in recognizing rare characters. By combining the first and second character recognition results, the target character recognition result is determined. Through mutual verification and supplementation, the accuracy of recognition can be effectively improved, reducing misjudgments that may occur due to insufficient performance of OCR in the presence of rare characters. This enhances the processing capability and reliability of the entire character recognition system for various types of text images, especially those containing rare characters, thus providing strong technical support in areas such as ancient book digitization and archaeological document recognition.

[0204] The text detection device provided in this disclosure can execute the text detection method provided in any embodiment of this disclosure, and has the corresponding functional modules and beneficial effects for executing the text detection method.

[0205] It is worth noting that the various units and modules included in the above-mentioned device are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the protection scope of the embodiments of this disclosure.

[0206] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure. Referring to Figure 8, it shows a schematic structural diagram of an electronic device (e.g., a terminal device or server) 800 suitable for implementing embodiments of the present disclosure. The terminal device in this embodiment may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 8 is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein.

[0207] As shown in Figure 8, the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

[0208] Typically, the following devices can be connected to the I/O interface 805: input devices 806 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 807 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 808 including, for example, magnetic tapes, hard disks, etc.; and communication devices 809. The communication device 809 allows the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 8 shows an electronic device 800 with various devices, it should be understood that not all of the devices shown are required to be implemented or present. More or fewer devices may alternatively be implemented or present.

[0209] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 809, or installed from a storage device 808, or installed from a ROM 802. When the computer program is executed by a processing device 801, it performs the functions defined in the methods of embodiments of this disclosure.

[0210] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0211] The electronic device provided in this embodiment and the text detection method provided in the above embodiments belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.

[0212] This disclosure provides a computer storage medium storing a computer program that, when executed by a processor, implements the text detection method provided in the above embodiments.

[0213] It should be noted that the computer-readable medium described in this disclosure can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0214] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol) and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0215] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device.

[0216] The aforementioned computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: obtain a first character recognition result corresponding to a character image to be recognized by performing vector retrieval in a preset rare character feature vector library, wherein the preset rare character feature vector library records at least one rare character feature vector and the character corresponding to each rare character feature vector; perform optical character recognition on the character image to be recognized to obtain a second character recognition result corresponding to the character image to be recognized; and determine a target character recognition result for the character image to be recognized based on the first character recognition result and the second character recognition result.

[0217] Computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0218] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0219] The units described in the embodiments of this disclosure can be implemented in software or in hardware. The name of a unit does not necessarily limit the unit itself; for example, the first acquisition unit can also be described as "a unit that acquires at least two Internet Protocol addresses".

[0220] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0221] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0222] The above description is merely a preferred embodiment of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described concept. For example, this includes technical solutions formed by replacing the above features with technical features that have similar functions, including but not limited to those disclosed in this disclosure.

[0223] Furthermore, while the operations are described in a specific order, this should not be construed as requiring these operations to be performed in the specific order shown or in a sequential order. In certain environments, multitasking and parallel processing may be advantageous. Similarly, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of this disclosure. Certain features described in the context of individual embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented individually or in any suitable sub-combination in multiple embodiments.

[0224] Although the subject matter has been described using language specific to structural features and / or methodological logic, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely illustrative examples of implementing the claims.

Claims

1. A text detection method, characterized in that the method comprises: obtaining a first character recognition result corresponding to a character image to be recognized by performing vector retrieval in a preset rare character feature vector library, wherein the preset rare character feature vector library records at least one rare character feature vector and the character corresponding to each rare character feature vector; obtaining a second character recognition result corresponding to the character image to be recognized by performing an optical character recognition (OCR) operation on the character image to be recognized; and determining a target character recognition result of the character image to be recognized based on the first character recognition result and the second character recognition result.

2. The method according to claim 1, characterized in that obtaining the first character recognition result corresponding to the character image to be recognized by performing vector retrieval in the preset rare character feature vector library comprises: determining a feature vector of the character image to be recognized, and performing a vector retrieval operation in the preset rare character feature vector library based on that feature vector to obtain a plurality of first candidate characters; determining a first target character from the plurality of first candidate characters, wherein the vector distance between the first target character and the feature vector of the character image to be recognized is less than the vector distance between each of the remaining first candidate characters and that feature vector; and determining the first character recognition result corresponding to the character image to be recognized based on the plurality of first candidate characters and the first target character.

3. The method according to claim 2, characterized in that performing the vector retrieval operation in the preset rare character feature vector library based on the feature vector of the character image to be recognized to obtain the plurality of first candidate characters comprises: calculating the vector distance between the feature vector of the character image to be recognized and each rare character feature vector in the preset rare character feature vector library, and sorting the rare character feature vectors by vector distance; and, according to the sorting result, retrieving a plurality of target rare character feature vectors from the rare character feature vectors, and determining the plurality of first candidate characters based on the characters corresponding to the plurality of target rare character feature vectors.
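
By way of non-limiting illustration only, the retrieval described in claims 2 and 3 may be sketched as follows. The Euclidean metric, the names `euclidean` and `retrieve_candidates`, the parameter `k`, and the toy two-dimensional vectors are assumptions of this sketch; the disclosure does not fix a distance metric or a data layout.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_candidates(query, library, k=3):
    """Return the k library entries closest to `query`.

    `library` is a list of (character, feature_vector) pairs. The result is a
    list of (character, distance) pairs sorted by ascending vector distance.
    """
    scored = [(ch, euclidean(query, vec)) for ch, vec in library]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Toy library with made-up 2-D vectors; real feature vectors would come from
# an image feature extractor applied to rendered glyph images.
library = [("A", [0.0, 0.0]), ("B", [1.0, 0.0]), ("C", [3.0, 4.0])]
candidates = retrieve_candidates([0.9, 0.1], library, k=2)
first_target = candidates[0]  # candidate with the smallest vector distance
print(candidates)
```

In this sketch the first target character is simply the head of the sorted candidate list, which satisfies the "smallest vector distance" condition of claim 2.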

4. The method according to claim 1, characterized in that obtaining the second character recognition result corresponding to the character image to be recognized by performing the optical character recognition (OCR) operation on the character image to be recognized comprises: performing the OCR operation on the character image to be recognized to obtain a plurality of second candidate characters, wherein the plurality of second candidate characters are selected by filtering on the recognition confidence of each character recognition result produced by the OCR operation; and determining a second target character corresponding to the character image to be recognized from the plurality of second candidate characters, wherein the recognition confidence of the second target character is greater than the recognition confidence of each of the remaining second candidate characters.
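
By way of non-limiting illustration only, the confidence-based filtering of claim 4 may be sketched as follows. The function name `select_ocr_candidates`, the cut-off `min_confidence`, the limit `k`, and the sample scores are assumptions of this sketch; the raw (character, confidence) pairs stand in for the output of whatever OCR engine is used.

```python
def select_ocr_candidates(raw_results, min_confidence=0.05, k=3):
    """Filter OCR outputs by confidence and keep the k most confident.

    `raw_results` is a list of (character, confidence) pairs. The result is
    sorted by descending confidence, so its first entry plays the role of
    the second target character.
    """
    kept = [(ch, conf) for ch, conf in raw_results if conf >= min_confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]


# Made-up OCR scores for a single character image.
raw = [("A", 0.10), ("B", 0.70), ("C", 0.01), ("D", 0.15)]
second_candidates = select_ocr_candidates(raw, min_confidence=0.05, k=2)
second_target = second_candidates[0]
print(second_candidates)
```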

5. The method according to claim 1, characterized in that the construction process of the preset rare character feature vector library comprises: obtaining a plurality of reference font files that contain an extended character region, wherein the extended character region of each reference font file includes a plurality of rare characters, and the reference font files differ from one another in at least one of the following attributes: stroke width, stroke direction, or stroke connection; determining, in the reference font files, the glyph of each rare character in each font, and determining the font character images corresponding to each rare character based on those glyphs; and extracting image features from the font character images corresponding to each rare character to obtain the rare character feature vectors and the characters corresponding to the rare character feature vectors, thereby obtaining the preset rare character feature vector library.

6. The method according to claim 1, characterized in that determining the target character recognition result of the character image to be recognized based on the first character recognition result and the second character recognition result comprises: identifying a first target character in the first character recognition result and a second target character in the second character recognition result; and determining the character recognition result of the character image to be recognized by performing reliability detection on the first target character and the second target character.

7. The method according to claim 6, characterized in that determining the character recognition result of the character image to be recognized by performing reliability detection on the first target character and the second target character comprises: determining the vector distance of the first target character, which is determined based on the vector distance between the first target character and the feature vector of the character image to be recognized; if the vector distance of the first target character is less than a preset vector distance, determining the first target character as the character recognition result of the character image to be recognized, wherein the preset vector distance is a threshold for judging the reliability of characters retrieved by vector retrieval in the preset rare character feature vector library; and if the vector distance of the first target character is not less than the preset vector distance, determining the recognition confidence of the second target character and a preset recognition confidence, and determining the character recognition result of the character image to be recognized based on the recognition confidence of the second target character and the preset recognition confidence, wherein the recognition confidence of the second target character is the confidence obtained for the second target character when the optical character recognition operation is performed on the character image to be recognized, and the preset recognition confidence is a threshold for judging the reliability of characters obtained by performing the optical character recognition operation on the character image to be recognized.

8. The method according to claim 7, characterized in that determining the character recognition result of the character image to be recognized based on the recognition confidence of the second target character and the preset recognition confidence comprises: if the recognition confidence of the second target character is greater than the preset recognition confidence, determining the second target character as the character recognition result of the character image to be recognized; and if the recognition confidence of the second target character is not greater than the preset recognition confidence: determining first reference attribute information of the first target character, wherein the first reference attribute information indicates whether a first reference character exists among the plurality of second candidate characters and, when the first reference character exists, its sorting position among the plurality of second candidate characters, the first reference character being the same as the first target character, and the plurality of second candidate characters being sorted by their respective recognition confidences; determining second reference attribute information of the second target character, wherein the second reference attribute information indicates whether a second reference character exists among the plurality of first candidate characters and, when the second reference character exists, its sorting position among the plurality of first candidate characters, the second reference character being the same as the second target character, and the plurality of first candidate characters being sorted by their respective vector distances; and determining the character recognition result of the character image to be recognized based on the first reference attribute information and the second reference attribute information.

9. The method according to claim 8, characterized in that determining the character recognition result of the character image to be recognized based on the first reference attribute information and the second reference attribute information comprises: if the first reference attribute information indicates that the first reference character exists among the plurality of second candidate characters and the second reference attribute information indicates that the second reference character does not exist among the plurality of first candidate characters, determining the first target character as the character recognition result of the character image to be recognized; if the first reference attribute information indicates that the first reference character does not exist among the plurality of second candidate characters and the second reference attribute information indicates that the second reference character exists among the plurality of first candidate characters, determining the second target character as the character recognition result of the character image to be recognized; if the first reference attribute information indicates that the first reference character exists among the plurality of second candidate characters and the second reference attribute information indicates that the second reference character exists among the plurality of first candidate characters, selecting the first target character or the second target character as the character recognition result of the character image to be recognized by comparing the sorting position of the first reference character among the plurality of second candidate characters, as indicated by the first reference attribute information, with the sorting position of the second reference character among the plurality of first candidate characters, as indicated by the second reference attribute information; and if the first reference attribute information indicates that the first reference character does not exist among the plurality of second candidate characters and the second reference attribute information indicates that the second reference character does not exist among the plurality of first candidate characters, determining the vector retrieval effect baseline score of the segment in which the vector distance of the first target character falls and the optical character recognition effect baseline score of the segment in which the recognition confidence of the second target character falls, and selecting the first target character or the second target character as the character recognition result of the character image to be recognized based on those two baseline scores.
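
By way of non-limiting illustration only, the decision cascade of claims 7 to 9 may be sketched as follows. The function names, the two threshold parameters, and the callables `retrieval_score` and `ocr_score` are hypothetical. The claims do not state how the two sorting positions are compared or how ties are resolved, so the rule used here (prefer the target that ranks at least as high in the other list, and prefer vector retrieval on equal baseline scores) is an assumption of this sketch.

```python
def rank_of(char, ranked_chars):
    """0-based position of `char` in `ranked_chars`, or None if absent."""
    for position, candidate in enumerate(ranked_chars):
        if candidate == char:
            return position
    return None


def fuse(retrieval, ocr, max_distance, min_confidence,
         retrieval_score, ocr_score):
    """Pick the final character from the two ranked candidate lists.

    retrieval: (char, distance) pairs sorted by ascending distance.
    ocr:       (char, confidence) pairs sorted by descending confidence.
    """
    first_char, first_dist = retrieval[0]
    second_char, second_conf = ocr[0]

    # Claim 7: a close enough retrieval hit is trusted directly.
    if first_dist < max_distance:
        return first_char
    # Claim 8: otherwise a confident enough OCR result is trusted.
    if second_conf > min_confidence:
        return second_char

    # Claims 8-9: look each target up in the other method's candidates.
    first_in_ocr = rank_of(first_char, [c for c, _ in ocr])
    second_in_retrieval = rank_of(second_char, [c for c, _ in retrieval])

    if first_in_ocr is not None and second_in_retrieval is None:
        return first_char
    if first_in_ocr is None and second_in_retrieval is not None:
        return second_char
    if first_in_ocr is not None and second_in_retrieval is not None:
        # Assumed comparison rule: the better cross-rank wins.
        return first_char if first_in_ocr <= second_in_retrieval else second_char

    # Neither target appears in the other list: compare baseline scores.
    if retrieval_score(first_dist) >= ocr_score(second_conf):
        return first_char
    return second_char


result = fuse(
    retrieval=[("X", 0.6), ("Y", 0.7)],
    ocr=[("Z", 0.5), ("X", 0.3)],
    max_distance=0.2, min_confidence=0.9,
    retrieval_score=lambda d: 1, ocr_score=lambda c: 1,
)
print(result)  # "X" appears among the OCR candidates, "Z" is not retrieved
```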

10. The method according to claim 7, characterized in that the method further comprises: within the range of vector distance values of vector retrieval, progressively decreasing the vector distance by a preset step size and, at each decrease, selecting from a reference verification data set the reference verification data whose vector distances are greater than the vector distance reached at that decrease as a first target data set; determining a first accuracy and a second accuracy of the first target data set, wherein the first accuracy is the statistical accuracy of the characters retrieved by vector retrieval for the preset character images in the first target data set, and the second accuracy is the statistical accuracy of the characters obtained by optical character recognition for the preset character images in the first target data set; determining, based on the first accuracy and the second accuracy of the first target data set at each decrease, the critical value at the transition from a first state to a second state as the preset vector distance, wherein the first state is a vector distance state in which the first accuracy of the first target data set is greater than the second accuracy, and the second state is a vector distance state in which the first accuracy of the first target data set is not greater than the second accuracy; within the range of recognition confidence values of optical character recognition, progressively decreasing the recognition confidence by a preset step size and, at each decrease, selecting from the reference verification data set the reference verification data whose recognition confidences are lower than the recognition confidence reached at that decrease as a second target data set; determining a third accuracy and a fourth accuracy of the second target data set, wherein the third accuracy is the statistical accuracy of the characters obtained by optical character recognition for the preset character images in the second target data set, and the fourth accuracy is the statistical accuracy of the characters retrieved by vector retrieval for the preset character images in the second target data set; and determining, based on the third accuracy and the fourth accuracy of the second target data set at each decrease, the critical value at the transition from a third state to a fourth state as the preset recognition confidence, wherein the third state is a recognition confidence state in which the third accuracy of the second target data set is greater than the fourth accuracy, and the fourth state is a recognition confidence state in which the third accuracy of the second target data set is not greater than the fourth accuracy; wherein each reference verification data in the reference verification data set includes the actual character corresponding to a preset character image, a first preset recognition error of the preset character image, a reference vector distance of the preset character image, a second preset recognition error of the preset character image, and a reference recognition confidence of the preset character image; the first preset recognition error indicates whether the character retrieved by vector retrieval for the preset character image is correct; the reference vector distance indicates the vector distance between the character retrieved by vector retrieval for the preset character image and the actual character corresponding to the preset character image; the second preset recognition error indicates whether the character obtained by optical character recognition of the preset character image is correct; and the reference recognition confidence indicates the confidence with which the character is obtained by optical character recognition of the preset character image.
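
By way of non-limiting illustration only, the vector distance sweep of claim 10 may be sketched as follows. The record layout (`distance`, `retrieval_ok`, `ocr_ok`), the function name, and the sample data are assumptions of this sketch. The sketch reports the first swept value at which the comparison between the two accuracies flips, in either direction, which is one possible reading of the "critical value" at the state transition; the recognition confidence sweep would be symmetric, selecting records whose confidence is lower than the swept value.

```python
def calibrate_distance_threshold(records, start, stop, step):
    """Sweep the distance threshold downward and return the critical value.

    Each record is a dict with keys:
      'distance'     - reference vector distance of the verification image
      'retrieval_ok' - True if vector retrieval gave the correct character
      'ocr_ok'       - True if OCR gave the correct character
    Returns None if the comparison never flips during the sweep.
    """
    previous_state = None
    n_steps = int(round((start - stop) / step))
    for i in range(n_steps + 1):
        threshold = start - i * step
        # First target data set: records farther away than the threshold.
        subset = [r for r in records if r["distance"] > threshold]
        if not subset:
            continue
        retrieval_acc = sum(r["retrieval_ok"] for r in subset) / len(subset)
        ocr_acc = sum(r["ocr_ok"] for r in subset) / len(subset)
        state = retrieval_acc > ocr_acc
        if previous_state is not None and state != previous_state:
            return threshold
        previous_state = state
    return None


# Made-up verification data: retrieval fails only on the most distant image.
records = [
    {"distance": 0.90, "retrieval_ok": False, "ocr_ok": True},
    {"distance": 0.60, "retrieval_ok": True, "ocr_ok": False},
    {"distance": 0.55, "retrieval_ok": True, "ocr_ok": False},
    {"distance": 0.30, "retrieval_ok": True, "ocr_ok": False},
    {"distance": 0.10, "retrieval_ok": True, "ocr_ok": False},
]
preset_distance = calibrate_distance_threshold(records, 1.0, 0.0, 0.25)
print(preset_distance)
```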

11. The method according to claim 9, characterized in that the method further comprises: within the range of vector distance values of vector retrieval, progressively decreasing the vector distance by a preset step size and, at each decrease, selecting from a reference verification data set the reference verification data whose vector distances are greater than the vector distance reached at that decrease as a first target data set; determining a first accuracy of the first target data set, wherein the first accuracy is the statistical accuracy of the characters retrieved by vector retrieval for the preset character images in the first target data set; obtaining, based on the first accuracy of the first target data set at each decrease, a plurality of accuracy segments corresponding to vector retrieval, wherein the upper and lower limits of each accuracy segment corresponding to vector retrieval are determined according to the vector distances at the decreases whose first accuracies correspond to the upper and lower limits of that accuracy segment, and different accuracy segments corresponding to vector retrieval correspond to different vector retrieval effect baseline scores; within the range of recognition confidence values of optical character recognition, progressively decreasing the recognition confidence by a preset step size and, at each decrease, selecting from the reference verification data set the reference verification data whose recognition confidences are lower than the recognition confidence reached at that decrease as a second target data set; determining a third accuracy of the second target data set, wherein the third accuracy is the statistical accuracy of the characters obtained by optical character recognition for the preset character images in the second target data set; and obtaining, based on the third accuracy of the second target data set at each decrease, a plurality of accuracy segments corresponding to optical character recognition, wherein the upper and lower limits of each accuracy segment corresponding to optical character recognition are determined according to the recognition confidences at the decreases whose third accuracies correspond to the upper and lower limits of that accuracy segment, and different accuracy segments corresponding to optical character recognition correspond to different optical character recognition effect baseline scores; wherein each reference verification data in the reference verification data set includes the actual character corresponding to a preset character image, a first preset recognition error of the preset character image, a reference vector distance of the preset character image, a second preset recognition error of the preset character image, and a reference recognition confidence of the preset character image; the first preset recognition error indicates whether the character retrieved by vector retrieval for the preset character image is correct; the reference vector distance indicates the vector distance between the character retrieved by vector retrieval for the preset character image and the actual character corresponding to the preset character image; the second preset recognition error indicates whether the character obtained by optical character recognition of the preset character image is correct; and the reference recognition confidence indicates the confidence with which the character is obtained by optical character recognition of the preset character image.
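
By way of non-limiting illustration only, the mapping from accuracy segments to baseline scores in claim 11 may be sketched as a lookup over segment boundaries. The helper `make_baseline_lookup`, the boundary values, and the integer scores are invented for this sketch; in practice the boundaries would be derived from the sweeps over the reference verification data set.

```python
import bisect


def make_baseline_lookup(boundaries, scores):
    """Build a function mapping a value to the score of its segment.

    `boundaries` is an ascending list of segment limits and `scores` has one
    more entry than `boundaries` (one score per segment). A value equal to a
    boundary is assigned to the segment above it.
    """
    if len(scores) != len(boundaries) + 1:
        raise ValueError("need exactly one score per segment")

    def lookup(value):
        return scores[bisect.bisect_right(boundaries, value)]

    return lookup


# Hypothetical segments: a smaller distance earns a higher retrieval score,
# and a higher confidence earns a higher OCR score.
retrieval_score = make_baseline_lookup([0.2, 0.5], [3, 2, 1])
ocr_score = make_baseline_lookup([0.5, 0.8], [1, 2, 3])

print(retrieval_score(0.35), ocr_score(0.9))
```

Placing both methods on a common score scale in this way allows a vector distance and a recognition confidence, which are otherwise not directly comparable, to be compared when neither target character appears among the other method's candidates.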

12. A text detection device, characterized in that the device comprises: a first recognition module, configured to obtain a first character recognition result corresponding to a character image to be recognized by performing vector retrieval in a preset rare character feature vector library, wherein the preset rare character feature vector library records at least one rare character feature vector and the character corresponding to each rare character feature vector; a second recognition module, configured to obtain a second character recognition result corresponding to the character image to be recognized by performing an optical character recognition operation on the character image to be recognized; and a text detection module, configured to determine a target character recognition result of the character image to be recognized based on the first character recognition result and the second character recognition result.

13. An electronic device, characterized in that the electronic device comprises: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the text detection method according to any one of claims 1-11.

14. A storage medium containing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a computer processor, are used to perform the text detection method according to any one of claims 1-11.