Method and system for identifying spaces in document

A space recognition method in the field of information recognition. It addresses the low recognition rate of spaces and the resulting inaccurate recognition results, improving space recognition and making the results more accurate and reliable.

Active Publication Date: 2017-05-10
WONDERSHARE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] In view of this, embodiments of the present invention provide a method and system for identifying spaces in a document, to solve the prior-art problems that spaces in documents are recognized poorly and that the recognition results are inaccurate.



Examples


Embodiment Construction

[0049] In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present invention. It will be apparent to one skilled in the art, however, that the invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

[0050] According to the embodiment of the present invention, lines or paragraphs in the document are taken as basic units, and the gap width values between all adjacent characters in each basic unit are collected to obtain the initial gap width set corresponding to that basic unit; using the initial gap width set as an input set, processing the input set by a space threshold calc...
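The collection step in [0050] can be illustrated with a minimal Python sketch. It assumes each character in a line is available as a horizontal extent; the `CharBox` type, the `gap_widths` helper, and the choice to clamp overlapping (kerned) characters to a zero gap are illustrative assumptions, not details taken from the patent text.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """Hypothetical horizontal extent of one character in a line."""
    char: str
    x0: float  # left edge
    x1: float  # right edge


def gap_widths(unit: List[CharBox]) -> List[float]:
    """Collect the gap width between every pair of adjacent characters
    in one basic unit (here, one line), in reading order."""
    ordered = sorted(unit, key=lambda c: c.x0)
    # Assumption: overlapping characters (negative gap) count as gap 0.
    return [max(0.0, right.x0 - left.x1)
            for left, right in zip(ordered, ordered[1:])]


line = [CharBox("a", 0.0, 5.0), CharBox("b", 6.0, 10.0), CharBox("c", 14.0, 18.0)]
initial_gap_set = gap_widths(line)
```

Each basic unit yields its own gap width set, so a threshold derived from that set reflects the spacing of that line or paragraph only.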



Abstract

The invention applies to the field of character recognition and provides a method and a system for identifying spaces in a document. The method comprises: taking lines or paragraphs in the document as basic units, acquiring the width values of the gaps between all adjacent characters in each basic unit, and obtaining an initial gap width set corresponding to each basic unit; taking the initial gap width set as an input set, processing the input set through a space threshold calculation method, and taking the resulting space threshold as a first space threshold; and judging in sequence whether the width value of the gap between each pair of adjacent characters in the basic unit is greater than the first space threshold. If it is greater, a space is judged to exist between the adjacent characters; if it is not greater, no space is judged to exist between them. Because no fixed space width is needed to judge spaces, the precision of identifying spaces in the document is improved and the identification result is more accurate and reliable.
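The threshold-and-judge steps of the abstract can be sketched as follows. The text visible here does not disclose how the space threshold calculation works, so `space_threshold` below substitutes a simple stand-in heuristic (the midpoint of the largest jump between sorted gap widths); that heuristic and all function names are assumptions for illustration, not the patented calculation.

```python
from typing import List


def space_threshold(gaps: List[float]) -> float:
    """Stand-in threshold calculation (an assumption, not the patent's):
    the midpoint of the largest jump between distinct sorted gap widths."""
    widths = sorted(set(gaps))
    if len(widths) < 2:
        # All gaps are equal (or there are none): treat the unit as space-free.
        return float("inf")
    jumps = [(b - a, (a + b) / 2) for a, b in zip(widths, widths[1:])]
    return max(jumps)[1]


def judge_spaces(gaps: List[float], threshold: float) -> List[bool]:
    """A space exists exactly where the gap is greater than the threshold."""
    return [gap > threshold for gap in gaps]


def join_with_spaces(chars: str, gaps: List[float]) -> str:
    """Rebuild the text of one basic unit, inserting the judged spaces."""
    flags = judge_spaces(gaps, space_threshold(gaps))
    out = [chars[0]] if chars else []
    for ch, is_space in zip(chars[1:], flags):
        out.append(" " + ch if is_space else ch)
    return "".join(out)


gaps = [1.0, 1.2, 4.0, 0.9, 4.2]       # gap widths within one line
first_threshold = space_threshold(gaps)
text = join_with_spaces("abcdef", gaps)
```

Because the threshold is computed from each unit's own gap widths, a line set with wide character spacing and a line set tightly are each judged against their own scale rather than a single fixed space width.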

Description

Technical field

[0001] The invention belongs to the technical field of information identification, and in particular relates to a method and system for identifying spaces in documents.

Background

[0002] When converting documents in formats such as PDF (Portable Document Format) to documents in other formats (such as WORD, TXT, etc.), the characters in the document must be recognized, and in particular it must be judged whether a space exists between adjacent characters, so that words and sentences can be formed correctly.

[0003] In a document, a gap between adjacent characters can have many causes, such as an actual space, character spacing set by the layout, kerning adjustments in the text settings, and independent text objects.

[0004] In the prior art, based on the minimum distance between adjacent characters in the full text of the document, it is determined whether the distance between all adjacent characters is less than the predetermined space wi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27, G06F17/30
CPC: G06F16/1794, G06F40/163, G06F40/279
Inventor: 李云生, 晏检平
Owner: WONDERSHARE TECH CO LTD