Text processing method and apparatus

A text processing method and apparatus, applied in the field of text processing, that address the high recognition error rate and the complex, redundant recognition process of prior approaches, thereby simplifying the text recognition process and improving recognition accuracy.

Status: Inactive
Publication Date: 2018-10-16
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a text processing method and device to solve two problems of the prior art when performing text recognition on paper certificates without a fixed format: a high recognition error rate and a complex, redundant recognition process.

Detailed Description of the Embodiments

[0026] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0027] Refer to Figure 1, which shows a flowchart of the steps of an embodiment of the text processing method of the present invention. The method may include the following steps:

[0028] Step 101: use OCR technology to perform character recognition on the paper text to be detected, which belongs to a preset certificate type, and determine the plurality of text lines that are recognized;

[0029] Here, the preset certificate type refers to a certificate without a fixed format, such as a business card, business license, award certificate or qualification certificate.
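As a rough illustration of the output of Step 101, the sketch below turns raw OCR output into a list of text lines. It assumes the OCR engine returns a single newline-separated string; the function name `split_into_text_lines` and the sample business-card text are hypothetical and do not come from the patent.

```python
def split_into_text_lines(ocr_output: str) -> list[str]:
    """Split raw OCR output into non-empty text lines.

    Assumption: the OCR engine returns one newline-separated string.
    Runs of whitespace inside a line are collapsed to single spaces.
    """
    lines = []
    for raw_line in ocr_output.splitlines():
        normalized = " ".join(raw_line.split())
        if normalized:  # skip lines that are empty after normalization
            lines.append(normalized)
    return lines


# Hypothetical OCR output for a business card.
sample = "Zhang San\n\n  Tel:  010-1234 \nE-mail : zhang@example.com\n"
text_lines = split_into_text_lines(sample)
```

Each entry of `text_lines` can then be handled independently by the later steps.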

[0030] Here, the business card shown in Figure 2 is used as an example for text recognition. OCR (Optical Character Recognition) technology can be used t...

Abstract

The invention provides a text processing method and device. The method comprises: using OCR technology to perform character recognition on the paper text to be detected, which belongs to a preset certificate type; matching each text line against the first-level keywords of each item type in a preset configuration file for the preset certificate type, to determine the first target item type corresponding to each first target text line that matches a first-level keyword; determining and deleting invalid content in the first target text line according to a preset rule; and formatting the first target text line, with the invalid content deleted, according to the preset text format of its first target item type. The invention can accurately determine the item type corresponding to each text line in the paper text, and it formats and standardizes the content of each text line, so that item types and valid text content are obtained in a uniform format. This improves the accuracy of text recognition and simplifies the text recognition process.
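The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the configuration structure (`ItemType`, `CONFIG`), the keywords, the regular expressions used as the "preset rule" for invalid content, and the output templates are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ItemType:
    name: str              # item type, e.g. "phone"
    keywords: list[str]    # first-level keywords (assumed values)
    invalid_pattern: str   # assumed "preset rule": regex of content to delete
    template: str          # assumed "preset text format" for this item type


# Hypothetical configuration for the "business card" certificate type.
CONFIG = [
    ItemType("phone", ["Tel", "Phone", "Mobile"], r"[^0-9+]", "Phone: {value}"),
    ItemType("email", ["E-mail", "Email"], r"\s", "Email: {value}"),
]


def process_line(line: str, config: list[ItemType] = CONFIG):
    """Return (item type name, formatted text) or None if no keyword matches."""
    lowered = line.lower()
    for item in config:
        for keyword in item.keywords:
            # Naive case-insensitive substring match; a real system would
            # need stricter matching (e.g. "tel" also occurs in "hotel").
            index = lowered.find(keyword.lower())
            if index == -1:
                continue
            # Keep what follows the keyword, drop separators after it.
            value = line[index + len(keyword):].lstrip(" :：.")
            # Delete invalid content according to the item type's rule.
            value = re.sub(item.invalid_pattern, "", value)
            # Format the cleaned value with the item type's template.
            return item.name, item.template.format(value=value)
    return None


print(process_line("Tel: (010) 6263-0000"))
print(process_line("E-mail : zhang@example.com"))
```

Keeping keywords, cleanup rules and output formats in one configuration per certificate type means a new certificate type needs only a new configuration, which matches the role the abstract assigns to the preset configuration file.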

Description

Technical field

[0001] The present invention relates to the technical field of text processing, in particular to a text processing method and device.

Background

[0002] OCR (Optical Character Recognition) is the process by which an electronic device (such as a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses a character recognition method to translate those shapes into computer text. That is, for printed characters, the text in a paper document is converted by optical means into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word processing software can edit further. OCR technology enables machines to read characters from images and supports efficient information entry, storage and retrieval.

[0003] OCR application scenarios, except...

Application Information

IPC(8): G06K9/34
CPC: G06V10/26, G06V30/153
Inventors: 伍更新, 李健, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD