Picture character recognition method and device, and computer-readable storage medium

A text recognition technology applied in the computer field, which can solve problems such as garbled recognition results, inconvenient access to editable text, and poor user experience.

Pending Publication Date: 2020-10-23
邓兴尧
0 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

If some areas of the entire picture contain text while other areas are images, then after the entire picture is recognized with OCR recognition technology, the recognition result contains garbled characters and the recognition accuracy of the picture text is low.

Method used

By adopting the above technical solution, the picture text to be recognized can first be intercepted, and text recognition is then performed on the smaller intercepted picture. In this way, the drawback of garbled recognition results caused by performing text recognition on images within the picture is avoided and the accuracy of text recognition is improved; moreover, only the picture text that needs to be edited is recognized, which effectively reduces the workload of picture text recognition and improves its efficiency.
If the picture text recognition technology recognizes the entire picture, the following problems may arise: 1. If the picture contains images, the recognition result will contain garbled characters and the recognition accuracy of the picture text is low; 2. Where the user only needs to edit part of the picture text, performing text recognition on the entire picture adds useless recognition workload. Based on the above considerations, the inventor proposes that the picture text to be recognized can first be cut out, after which the smaller cut-out picture is recognized.

Abstract

The invention relates to a picture character recognition method and device, and a computer-readable storage medium. The method comprises the steps of: obtaining a target picture intercepted by a user in an original picture, wherein the target picture comprises the picture characters to be recognized; recognizing the picture characters to be recognized; and outputting the editable characters recognized this time. In this way, the picture characters to be recognized can be intercepted first, and character recognition is then performed on the smaller intercepted picture. This avoids the defect of garbled recognition results caused by performing character recognition on images within the picture, improves the accuracy of character recognition, and allows only the picture characters that need to be edited to be recognized, which effectively reduces the workload of picture character recognition and improves its efficiency.

Application Domain

Character and pattern recognition

Technology Topic

Character recognition, Engineering (+3 more)


Examples

  • Experimental program (1)

Example Embodiment

[0061] The exemplary embodiments will be described in detail here, and examples thereof are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
[0062] If the picture text recognition technology recognizes the entire picture, the following problems may arise: 1. If the picture contains images, the recognition result will contain garbled characters and the recognition accuracy of the picture text is low; 2. Where the user only needs to edit part of the picture text, performing text recognition on the entire picture adds useless recognition workload. Based on the above considerations, the inventor proposes that the picture text to be recognized can first be cut out, after which the smaller cut-out picture is recognized. In this way, the drawback of garbled recognition results caused by recognizing images within the picture is avoided and the accuracy of text recognition is improved; moreover, only the picture text that needs to be edited is recognized, which effectively reduces the workload of picture text recognition and improves its efficiency.
[0063] Figure 1 is a flowchart of a method for recognizing characters in pictures according to an exemplary embodiment. As shown in Figure 1, the method can be applied to terminals and servers, and includes the following steps.
[0064] In step 11, the target picture captured by the user in the original picture is obtained, and the target picture includes the text of the picture to be recognized.
[0065] In the present disclosure, the original picture may be a picture that includes only picture text, or a picture that includes both picture text and images. According to the actual text-editing needs, the user can use a relevant screenshot technology to intercept the part of the picture text that needs to be edited. For example, if the original picture contains both picture text and images and the user needs to edit a certain part of the picture text, the screenshot technology can be used to take a screenshot of that part of the picture text to obtain the target picture. In practical applications, the intercepted target picture includes at least the picture text to be recognized; that is, the size of the intercepted target picture is greater than or equal to the size of the area of the picture text to be recognized.
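For illustration only, a minimal sketch of obtaining the target picture as a rectangular crop of the original picture is given below. The Pillow library and the box coordinates are assumptions not named in the disclosure; any screenshot technology that yields a sub-picture at least covering the text area would serve.

```python
from PIL import Image  # Pillow is assumed here purely for illustration


def crop_target_picture(original_path: str, box: tuple) -> Image.Image:
    """Intercept the user-selected region from the original picture.

    `box` is (left, upper, right, lower) in pixels and should be at least
    as large as the area of the picture text to be recognized.
    """
    original = Image.open(original_path)
    return original.crop(box)


# Hypothetical usage: the user drags a rectangle around the text to edit.
# target = crop_target_picture("original.png", (100, 200, 620, 360))
```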
[0066] In an embodiment, a default screenshot tool can be used to capture the target picture. Normally, the default screenshot tool is a square screenshot tool.
[0067] However, in practical applications, considering that the picture text to be recognized may have an irregular shape, and in order to avoid intercepting into the target picture other picture text that the user does not need to edit, which would increase the recognition workload, in another embodiment different screenshot tools can be provided to the user, so that the user can accurately intercept the original picture according to the actual text-editing needs and obtain a more accurate target picture.
[0068] For example, please refer to Figure 2, which is a flowchart showing a method for obtaining a target picture according to an exemplary embodiment. As shown in Figure 2, the method for obtaining the target picture may include the following steps.
[0069] In step 111, in response to receiving a screenshot request input by the user, multiple screenshot tool identifiers are output.
[0070] In this disclosure, the screenshot request is used to request that the screenshot tool be called to take a screenshot. For example, when it is detected that the user has pressed a screenshot shortcut button on an external device (for example, an external keyboard), or when it is detected that the user has clicked the screenshot tool application icon, the device executing the method determines that it has received the screenshot request input by the user.
[0071] After it is determined that the screenshot request has been received, the device executing the method can output multiple screenshot tool identifiers. It is worth noting that if the device executing the method is a terminal, the terminal can directly display the multiple screenshot tool identifiers; if the device executing the method is a server, the server can output the multiple screenshot tool identifiers to the user terminal so that the user terminal displays them.
[0072] The multiple screenshot tool identifiers include: a square screenshot tool identifier, a circular screenshot tool identifier, and a magnetic lasso screenshot tool identifier.
[0073] In step 112, the screenshot tool corresponding to the screenshot tool identifier selected by the user is determined as the target screenshot tool.
[0074] The user can select, from the multiple screenshot tool identifiers, the screenshot tool to be used according to the shape of the picture text to be recognized. For example, assuming that the picture text to be recognized is circular, the user can select the circular screenshot tool identifier, and the circular screenshot tool is then determined as the target screenshot tool.
[0075] In step 113, according to the interception range information input by the user on the original picture, a target screenshot tool is used to intercept the original picture to obtain the target picture.
[0076] In one embodiment, after the target screenshot tool is determined, the target screenshot tool is called and is used to delineate the interception range information and perform the interception. For example, the screenshot can be taken with reference to screenshot methods in the related technology.
[0077] In another embodiment, the original picture is displayed on the screen, and the user can use a drawing tool to delineate the interception range information. After that, the device executing the method calls the target screenshot tool and performs the interception according to the delineated interception range information to obtain the target picture. If the screen is a touch screen, the interception range information can be delineated by the user sliding a finger on the screen, and the target picture is then obtained using the target screenshot tool.
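As a sketch of steps 111 to 113 under stated assumptions (Pillow again, and a bounding box supplied by the user's delineation), the selected screenshot tool can be applied as a shape mask; the magnetic lasso case, which needs an arbitrary polygon, is omitted here.

```python
from PIL import Image, ImageDraw


def intercept_with_tool(original: Image.Image, box: tuple, tool: str) -> Image.Image:
    """Intercept `box` from `original` with the target screenshot tool.

    `tool` is "square" or "circle"; the names mirror the screenshot tool
    identifiers in the disclosure, but the implementation is illustrative.
    """
    region = original.crop(box)
    if tool == "square":
        return region
    if tool == "circle":
        # Keep only the ellipse inscribed in the box; everything else becomes
        # transparent so unneeded picture text is not sent to recognition.
        mask = Image.new("L", region.size, 0)
        ImageDraw.Draw(mask).ellipse(
            (0, 0, region.size[0] - 1, region.size[1] - 1), fill=255
        )
        region = region.convert("RGBA")
        region.putalpha(mask)
        return region
    raise ValueError(f"unknown screenshot tool: {tool}")
```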
[0078] In step 12, the text of the picture to be recognized is recognized.
[0079] In an embodiment, after the target picture is cut out, the target picture may be input into the text recognition model to recognize the text in the picture to be recognized.
[0080] In another embodiment, the screenshot tool is coupled with a text recognition technology, such as OCR recognition technology. For example, a function button for the OCR recognition technology is displayed in the screenshot tool interface. After the target picture is intercepted, if it is detected that the user clicks the function button of the OCR recognition technology, the OCR recognition technology is activated and the picture text in the target picture is directly recognized. In this way, the efficiency of picture character recognition is improved.
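A minimal sketch of step 12 is shown below. It assumes the Tesseract OCR engine via pytesseract and hypothetical language packs; the disclosure only requires that the intercepted target picture, rather than the whole original picture, is fed to a text recognition model.

```python
import pytesseract  # assumed OCR backend; any text recognition model could be used
from PIL import Image


def recognize_picture_text(target: Image.Image, lang: str = "chi_sim+eng") -> str:
    """Recognize only the intercepted target picture, so that images elsewhere
    in the original picture never reach the OCR engine and cannot garble the
    recognition result."""
    return pytesseract.image_to_string(target, lang=lang)
```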
[0081] In step 13, output the editable text recognized this time.
[0082] In one embodiment, after the picture text to be recognized is recognized this time, a window pops up in the display interface, and the editable text recognized this time is displayed in the window for the user to edit further. The window may be a preset document interface, and the preset document interface may include one of the following: a TXT document, a Word document, a WPS document, or an Excel document. The present disclosure does not specifically limit the preset document interface, as long as the text is editable in the document interface.
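For step 13, one possible sketch of outputting the recognized text to a preset document interface is given below; python-docx and the file name are assumptions, and a TXT or WPS document could be used instead, as the disclosure notes.

```python
from docx import Document  # python-docx stands in for one "preset document interface"


def output_editable_text(text: str, path: str = "recognized.docx") -> None:
    """Place the editable text recognized this time into a Word document,
    which here plays the role of the pop-up editing window."""
    doc = Document()
    doc.add_paragraph(text)
    doc.save(path)
```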
[0083] With the above technical solution, the picture text to be recognized can first be cut out, and text recognition is then performed on the smaller cut-out picture. In this way, the drawback of garbled recognition results caused by performing text recognition on images within the picture is avoided and the accuracy of text recognition is improved; in addition, only the picture text that needs to be edited is recognized, which effectively reduces the workload of picture text recognition and improves its efficiency.
[0084] In addition, in an embodiment, the recognized text may also be converted into another language. For example, the aforementioned preset document interface is coupled with translation software, and the preset document interface includes a language conversion function button. The language conversion function button may include multiple buttons, each corresponding to a language. The above picture character recognition method may further include the following steps.
[0085] Detect the user's operation behavior on the language conversion function button. For example, it can be detected which of the buttons included in the language conversion function button is clicked by the user.
[0086] Determine the target language to be converted according to the operation behavior.
[0087] Convert the recognized editable text into the target text in the target language.
[0088] Output the target text.
[0089] For example, button 1 corresponds to English, button 2 corresponds to Japanese, button 3 corresponds to Chinese, button 4 corresponds to Korean, and so on. When it is detected that the user clicks button 1, English is determined as the target language to be converted, and the recognized editable text is then translated into English and the English text is displayed.
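A sketch of the language conversion flow follows. The button numbering mirrors the example above, while the `translate` callable is hypothetical; any translation software coupled to the document interface could supply it.

```python
# Illustrative mapping from language conversion buttons to target languages.
BUTTON_TO_LANGUAGE = {1: "en", 2: "ja", 3: "zh", 4: "ko"}


def convert_recognized_text(text: str, clicked_button: int, translate) -> str:
    """Determine the target language from the clicked button and convert the
    recognized editable text into target text in that language.

    `translate` is a hypothetical callable (text, target_language) -> str.
    """
    target_language = BUTTON_TO_LANGUAGE[clicked_button]
    return translate(text, target_language)
```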
[0090] In practical applications, multiple pieces of picture text may be recognized continuously, and the editable text recognized each time is saved. Whether to save continuously or in segments can be chosen according to user needs. For example, as shown in Figure 3, the method in Figure 1 may further include the following steps.
[0091] In step 14, the save type selected by the user is obtained, where the save type includes continuous save or segmented save.
[0092] For example, the preset document interface that displays the editable text includes the save type options, and the user can make a selection according to actual needs.
[0093] In step 15, if the save type is continuous save, the editable text recognized this time and the editable text recognized last time are saved in the same interface.
[0094] It is worth noting that each time the editable text is displayed, after the user clicks to close it, the editable text recognized this time can be saved automatically. In the current screenshot operation, after the screenshot tool is called, the function button of the text recognition technology in the screenshot tool interface can again trigger the pop-up of the interface that displays the editable text. If the save type selected by the user is continuous save, the interface that pops up this time includes both the editable text recognized last time and the editable text recognized this time. For example, the editable text recognized this time is displayed directly below the editable text recognized last time.
[0095] In step 16, when the save type is segmented save, the editable text recognized last time is saved in the history window of the interface that displays the editable text recognized this time.
[0096] In this embodiment, the editable text recognized each time is not displayed in the same interface. For example, the editable text recognized last time is displayed alone in an interface, and after the user closes that interface, that editable text can be placed in the history window of the interface. In this way, when the picture text is recognized this time, the previously recognized editable text can be viewed from the history window in the interface.
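The two save types can be sketched as follows; the in-memory `session` record and its keys are assumptions standing in for the display interface and its history window.

```python
def save_recognized_text(current_text: str, session: dict, save_type: str) -> dict:
    """Arrange the editable text recognized this time according to the save type."""
    if save_type == "continuous":
        # Continuous save: show this result below the previously recognized text
        # in the same interface.
        previous = session.get("interface_text", "")
        session["interface_text"] = (previous + "\n" + current_text) if previous else current_text
    elif save_type == "segmented":
        # Segmented save: move the previously recognized text into the history
        # window and show only the newly recognized text in the interface.
        if session.get("interface_text"):
            session.setdefault("history", []).append(session["interface_text"])
        session["interface_text"] = current_text
    else:
        raise ValueError(f"unknown save type: {save_type}")
    return session
```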
[0097] By adopting the above technical solution, the user can select the saving type according to actual needs, and thus, the flexibility of image character recognition is improved.
[0098] Figure 4 is a block diagram showing a device for recognizing characters in pictures according to an exemplary embodiment. Referring to Figure 4, the picture character recognition device 400 may include:
[0099] The first obtaining module 401 is configured to obtain a target picture intercepted by a user in an original picture, where the target picture includes the text of the picture to be recognized;
[0100] The recognition module 402 is configured to recognize the text of the picture to be recognized;
[0101] The first output module 403 is used to output the editable text recognized this time.
[0102] Optionally, the first output module 403 is configured to output the editable text recognized this time on a preset document interface, and the preset document interface includes one of the following: a TXT document, a Word document, a WPS document, or an Excel document.
[0103] Optionally, the preset document interface includes a language conversion function button, and the device further includes:
[0104] The detection module is used to detect the user's operation behavior on the language conversion function button;
[0105] The determining module is used to determine the target language to be converted according to the operation behavior;
[0106] A conversion module for converting the recognized editable text into a target text in the target language;
[0107] The second output module is used to output the target text.
[0108] Optionally, the device further includes:
[0109] The second obtaining module is used to obtain the save type selected by the user, where the save type includes continuous save or segmented save;
[0110] The first saving module is configured to save the editable text recognized this time and the editable text recognized last time in the same interface when the save type is continuous save;
[0111] The second saving module is configured to save, when the save type is segmented save, the editable text recognized last time in the history window of the interface that displays the editable text recognized this time.
[0112] Optionally, the first obtaining module includes:
[0113] The output sub-module is used to output multiple screenshot tool identifiers in response to receiving a screenshot request input by the user;
[0114] The determining sub-module is used to determine the screenshot tool corresponding to the screenshot tool identifier selected by the user as the target screenshot tool;
[0115] The interception sub-module is configured to use the target screenshot tool to intercept the original picture according to the interception range information input by the user on the original picture to obtain the target picture.
[0116] Optionally, the multiple screenshot tool identifiers include: a square screenshot tool identifier, a circular screenshot tool identifier, and a magnetic lasso screenshot tool identifier.
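For orientation only, the module arrangement of the device 400 can be pictured as the following sketch; the module names follow the disclosure, while the injected callables are placeholders rather than the actual implementation.

```python
class PictureCharacterRecognitionDevice:
    """Illustrative wiring of the modules of the device 400."""

    def __init__(self, obtain_fn, recognize_fn, output_fn):
        self.first_obtaining_module = obtain_fn    # obtains the intercepted target picture
        self.recognition_module = recognize_fn     # recognizes the picture text to be recognized
        self.first_output_module = output_fn       # outputs the editable text recognized this time

    def run(self, original_picture, interception_box):
        target = self.first_obtaining_module(original_picture, interception_box)
        editable_text = self.recognition_module(target)
        self.first_output_module(editable_text)
        return editable_text
```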
[0117] Regarding the device in the foregoing embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment of the method, and detailed description will not be given here.
[0118] The present disclosure also provides a computer-readable storage medium on which computer program instructions are stored, and when the program instructions are executed by a processor, the steps of the image character recognition method provided in the present disclosure are realized.
[0119] Figure 5 is a block diagram showing a device for recognizing characters in pictures according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
[0120] Referring to Figure 5, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
[0121] The processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the image character recognition method. In addition, the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
[0122] The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
[0123] The power component 806 provides power to various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
[0124] The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
[0125] The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), and when the device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
[0126] The I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module. The peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include but are not limited to: home button, volume button, start button, and lock button.
[0127] The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, for example the display and keypad of the device 800. The sensor component 814 can also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of contact between the user and the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
[0128] The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
[0129] In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the picture character recognition method.
[0130] In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which can be executed by the processor 820 of the device 800 to complete the image character recognition method. For example, the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
[0131] In another exemplary embodiment, a computer program product is further provided. The computer program product includes a computer program that can be executed by a programmable device, and the computer program has a code portion that, when executed by the programmable device, performs the above-mentioned picture character recognition method.
[0132] Figure 6 is a block diagram of a picture character recognition device according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Figure 6, the device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by the memory 1932 for storing instructions that can be executed by the processing component 1922, such as application programs. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the picture character recognition method.
[0133] The device 1900 may also include a power supply component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
[0134] Those skilled in the art will easily conceive of other embodiments of the present disclosure after considering the specification and practicing the present disclosure. This application is intended to cover any variations, uses, or adaptive changes of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field not disclosed in the present disclosure. The description and the embodiments are regarded as exemplary only, and the true scope and spirit of the present disclosure are pointed out by the following claims.
[0135] It should be understood that the present disclosure is not limited to the precise structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is only limited by the appended claims.

