Paragraph recognition method, device and terminal equipment
A paragraph recognition method and related technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses the problem of paragraphs not being recognized and improves efficiency and accuracy.
Examples
Embodiment 1
[0021] Referring to Figure 1, a flow chart of the steps of a paragraph recognition method according to Embodiment 1 of the present invention is shown.
[0022] The paragraph recognition method of the present embodiment comprises the following steps:
[0023] Step S102: Perform paragraph recognition on the same document content using multiple paragraph recognition rules.
[0024] The document content includes multiple paragraphs. In the embodiments of the present invention, unless otherwise specified, document content refers to content in a text page that carries no paragraph information, such as fixed-layout content. A fixed layout does not change: the original editing layout is always displayed during reading, and the content is not automatically rearranged to fit the page width after zooming. Examples include PDF files made from scanned picture manuscripts, and PDF graphics and plain text files produced in a fixed layout.
[0025] In the em...
Embodiment 2
[0035] Referring to Figure 2, a flow chart of the steps of a paragraph recognition method according to Embodiment 2 of the present invention is shown.
[0036] The paragraph recognition method of the present embodiment comprises the following steps:
[0037] Step S202: Obtain multiple paragraph recognition rules.
[0038] The multiple paragraph recognition rules may include one or more of ordinary paragraph recognition rules, hanging paragraph recognition rules and poetry paragraph recognition rules. In this embodiment, the paragraph recognition rules that are set and used include all three types.
[0039] The ordinary paragraph recognition rule is used to recognize paragraphs according to the settings of ordinary paragraphs. The settings include but are not limited to: the first character of the first line of the paragraph is indented, for example by two characters; the last character of the last line of the paragraph and the document boundary have...
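The first-line indent setting mentioned above can be illustrated with a minimal Python sketch. It covers only that one setting and assumes the document content is available as plain-text lines with leading whitespace preserved. The function name `split_by_first_line_indent` and the `indent_chars` parameter are hypothetical and do not come from the patent text; a real fixed-layout document would more likely compare character coordinates than count whitespace.

```python
from typing import List


def split_by_first_line_indent(lines: List[str], indent_chars: int = 2) -> List[List[str]]:
    """Group lines into paragraphs using first-line indentation only.

    Assumption: a line whose leading whitespace is at least `indent_chars`
    characters wide starts a new paragraph; every other non-blank line
    continues the current paragraph.
    """
    paragraphs: List[List[str]] = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no text and are skipped
        # str.lstrip() also removes full-width spaces (U+3000)
        indent = len(line) - len(line.lstrip())
        if indent >= indent_chars or not paragraphs:
            paragraphs.append([line.strip()])   # start a new paragraph
        else:
            paragraphs[-1].append(line.strip())  # continue the current one
    return paragraphs


sample = [
    "  Alpha begins",
    "alpha continues",
    "  Beta begins",
    "beta ends",
]
paragraphs = split_by_first_line_indent(sample)
```

Here `paragraphs` holds two groups of lines, one per indented first line.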
Embodiment 3
[0073] Referring to Figure 6, a structural block diagram of a paragraph recognition device according to Embodiment 3 of the present invention is shown.
[0074] The paragraph recognition device in this embodiment includes: an identification module 302, configured to perform paragraph recognition on the same document content using multiple paragraph recognition rules, wherein the document content includes multiple paragraphs; an acquisition module 304, configured to acquire a recognition result corresponding to each paragraph recognition rule; and a determination module 306, configured to determine the paragraph information of the document content according to the recognition results.
[0075] The paragraph recognition device of this embodiment is used to implement the corresponding paragraph recognition methods of the foregoing method embodiments and has the beneficial effects of those method embodiments, which are not repeated here.
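The division of work among the three modules can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the excerpt does not say how the determination module chooses among results, so the `score` callback is a hypothetical selection policy, and the two rules `every_line_rule` and `blank_line_rule` are placeholder rules invented for the example.

```python
from typing import Callable, Dict, List

Paragraphs = List[List[str]]               # each paragraph is a list of lines
Rule = Callable[[List[str]], Paragraphs]   # a rule maps document lines to paragraphs


def run_rules(lines: List[str], rules: Dict[str, Rule]) -> Dict[str, Paragraphs]:
    """Apply every rule to the same content and keep one result per rule
    (roughly the roles of the identification and acquisition modules)."""
    return {name: rule(lines) for name, rule in rules.items()}


def determine_paragraphs(results: Dict[str, Paragraphs],
                         score: Callable[[Paragraphs], float]) -> Paragraphs:
    """Return the highest-scoring result (hypothetical policy standing in
    for the determination module)."""
    best_rule = max(results, key=lambda name: score(results[name]))
    return results[best_rule]


# Hypothetical placeholder rules, for illustration only.
def every_line_rule(lines: List[str]) -> Paragraphs:
    return [[line] for line in lines if line.strip()]


def blank_line_rule(lines: List[str]) -> Paragraphs:
    paragraphs: Paragraphs = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


content = ["a", "b", "", "c"]
results = run_rules(content, {"every_line": every_line_rule,
                              "blank_line": blank_line_rule})
# Assumed policy for the demo: prefer the result with fewer paragraphs.
chosen = determine_paragraphs(results, score=lambda paras: -len(paras))
```

Passing the rules as a dictionary keeps one result per rule name, so any selection policy can inspect all results before the paragraph information is fixed.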