Text extraction processing method and device, terminal and storage medium

A text extraction processing method, applied in the field of text processing, that addresses problems such as jumbled text content, incomplete semantics, and the resulting impact on subsequent processing of the extracted text.

Pending Publication Date: 2020-10-30
BEIJING BYTEDANCE NETWORK TECH CO LTD
0 Cites, 2 Cited by
AI Technical Summary

Problems solved by technology

[0002] In many business fields, it is necessary to extract the text content of documents, for example to classify, cluster, extract information from, or mine documents such as resumes, public company announcements, and papers. For PDF files typeset in columns, the text content extracted by tools may be jumbled, leaving the extracted text incoherent and semantically incomplete, which greatly affects its subsequent processing.

Embodiment Construction

[0026] Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.

[0027] It should be understood that the various steps described in the method embodiments of the present disclosure may be executed in sequence and/or in parallel. Additionally, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this regard.

[0028] As use...

Abstract

The invention provides a text extraction processing method and device, a terminal, and a storage medium. The method comprises: extracting an original text to obtain a plurality of fields, wherein text lines in the original text extend in a first direction and are arranged in a second direction; selecting title fields from the fields and grouping the title fields according to their positions in the first direction; assigning each non-title field to one of the groups according to its position in the first direction; sorting the fields within each group to obtain grouped texts; and combining the grouped texts to obtain a target text. Embodiments of the invention address the incoherence, semantic incompleteness, and semantic disorder that arise in the prior art when the contents of different text columns interfere with one another after column-formatted text is extracted.
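The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Field` class, the `merge_columns` function, and the `x_tolerance` parameter are hypothetical names introduced here, the first direction is assumed to be horizontal (`x`) and the second vertical (`y`), title detection is assumed to have been done already, and the rule "assign a non-title field to the nearest title group starting at or to the left of it" is only one plausible reading of "according to the position in the first direction".

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One extracted text fragment (hypothetical structure for illustration)."""
    text: str
    x: float                # position along the first direction (assumed horizontal)
    y: float                # position along the second direction (assumed vertical)
    is_title: bool = False  # assumed to be known; the abstract does not say how titles are found


def merge_columns(fields, x_tolerance=20.0):
    """Group fields into columns anchored by title fields, then join the columns in order."""
    titles = sorted((f for f in fields if f.is_title), key=lambda f: f.x)
    if not titles:
        # No titles to anchor columns on: fall back to plain top-to-bottom order.
        return "\n".join(f.text for f in sorted(fields, key=lambda f: (f.y, f.x)))

    # Step 1: group title fields whose x positions lie within the tolerance.
    groups = []  # each group: {"x": anchor position, "fields": [...]}
    for title in titles:
        if groups and abs(title.x - groups[-1]["x"]) <= x_tolerance:
            groups[-1]["fields"].append(title)
        else:
            groups.append({"x": title.x, "fields": [title]})

    # Step 2: assign each non-title field to the rightmost group that starts
    # at or to the left of it (assumed assignment rule).
    for f in fields:
        if f.is_title:
            continue
        candidates = [g for g in groups if g["x"] <= f.x + x_tolerance]
        target = max(candidates, key=lambda g: g["x"]) if candidates else groups[0]
        target["fields"].append(f)

    # Step 3: sort within each group; step 4: combine groups left to right.
    lines = []
    for g in groups:
        lines.extend(f.text for f in sorted(g["fields"], key=lambda f: (f.y, f.x)))
    return "\n".join(lines)


# A made-up two-column page: naive top-to-bottom ordering would interleave the columns.
sample = [
    Field("Education", x=50, y=10, is_title=True),
    Field("Skills", x=300, y=10, is_title=True),
    Field("BSc 2010", x=50, y=30),
    Field("Python", x=300, y=30),
    Field("MSc 2012", x=60, y=50),
]
print(merge_columns(sample))
```

With the made-up sample above, the left column (Education and its entries) is emitted in full before the right column (Skills), rather than line by line across both columns. A real page would need a tolerance tuned to its layout and a strategy for fields that span several columns, neither of which the abstract specifies.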

Description

Technical field

[0001] The present disclosure relates to the technical field of text processing, and in particular to a text extraction processing method, device, terminal and storage medium.

Background

[0002] In many business fields, it is necessary to extract the text content of documents, for example to classify, cluster, extract information from, or mine documents such as resumes, public company announcements, and papers. For PDF files typeset in columns, the text content extracted by tools may be jumbled, leaving the extracted text incoherent and semantically incomplete, which greatly affects its subsequent processing.

Contents of the invention

[0003] In order to solve the existing problems, the present disclosure provides a text extraction processing method, device, terminal and storage medium.

[0004] The present disclosure adopts the following technical solutions.

[0005] I...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/258, G06F40/205, G06F40/189
CPC: G06F40/258, G06F40/205, G06F40/189
Inventor: 罗强
Owner: BEIJING BYTEDANCE NETWORK TECH CO LTD