
A Method for Identifying Scientific Formulas in Format Documents

A technology for format files and formulas, applied in the field of file processing

Active Publication Date: 2018-09-07
同方知网数字出版技术股份有限公司

AI Technical Summary

Problems solved by technology

Existing research on layout files focuses mainly on table recognition and blank-space recognition; no related method exists for recognizing formulas in layout files.



Examples


Embodiment Construction

[0018] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.

[0019] As shown in Figure 1, the process of identifying scientific formulas in a format file includes:

[0020] In step 101, the character stream information extracted from the format file is traversed, and content-based preprocessing is performed on the character stream.

[0021] The extracted character stream information is preprocessed to remove redundant spaces and other redundant characters that interfere with layout analysis and with merging operations such as column merging. A content-based method is used to remove the redundant characters, and a structure tree is designed to store the encoding information, coordinate information, and font size information of each character.
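
The preprocessing in step 101 can be illustrated with a minimal Python sketch. The source does not give the shape of the structure tree or the rules for deciding which characters are redundant, so everything here is an assumption made for illustration: the names `CharNode`, `LineNode`, `PageNode`, `build_page`, and `line_tolerance` are hypothetical, the tree is a simple page/line/character hierarchy, and "redundant" is narrowed to leading, trailing, and repeated spaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CharNode:
    """Leaf of the tree: one character with the three attributes named in [0021]."""
    code: int         # encoding information (here: a Unicode code point)
    x: float          # coordinate information
    y: float
    font_size: float  # font size information


@dataclass
class LineNode:
    chars: List[CharNode] = field(default_factory=list)


@dataclass
class PageNode:
    lines: List[LineNode] = field(default_factory=list)


def strip_redundant_spaces(chars: List[CharNode]) -> List[CharNode]:
    """Drop leading and trailing spaces and collapse runs of spaces (assumed rule)."""
    out: List[CharNode] = []
    for ch in chars:
        is_space = chr(ch.code).isspace()
        if is_space and (not out or chr(out[-1].code).isspace()):
            continue  # leading space, or a repeat of the previous space
        out.append(ch)
    while out and chr(out[-1].code).isspace():
        out.pop()  # trailing space
    return out


def build_page(chars: List[CharNode], line_tolerance: float = 2.0) -> PageNode:
    """Group characters into lines by y coordinate, then clean each line."""
    page = PageNode()
    for ch in sorted(chars, key=lambda c: (c.y, c.x)):
        # Compare against the first character of the current line to avoid drift.
        if page.lines and abs(page.lines[-1].chars[0].y - ch.y) <= line_tolerance:
            page.lines[-1].chars.append(ch)
        else:
            page.lines.append(LineNode([ch]))
    for line in page.lines:
        line.chars.sort(key=lambda c: c.x)
        line.chars = strip_redundant_spaces(line.chars)
    return page


def line_text(line: LineNode) -> str:
    """Rebuild the visible text of a line from its character nodes."""
    return "".join(chr(c.code) for c in line.chars)
```

Keeping coordinates and font size on every character node means that later stages, such as layout analysis, can work from the tree without going back to the original file.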

[0022] Step 102 generates a document layout through a layout analysis algorithm on the processed c...


Abstract

The invention discloses a method for recognizing scientific formulas in a layout file. The method includes: traversing the character stream information extracted from the layout file and preprocessing it; applying a layout analysis algorithm to the preprocessed character stream to generate a file layout; extracting spatial layout features and content features of the layout as combined features; using a classification algorithm based on dynamic weighting of the combined features to locate and extract the scientific formulas according to the spatial layout features and the content features; and merging formulas that span multiple rows before processing. With this method, the scientific formulas in the layout file can be recognized quickly and accurately.
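
The abstract does not define how the combined features are weighted dynamically, so the following Python sketch is only one possible reading, not the patented algorithm. It assumes that each feature group has already been reduced to a score in [0, 1] and that a group's weight grows with how far its score lies from the undecided value 0.5. The function names, the weighting rule, and the `threshold` parameter are all hypothetical.

```python
from typing import Tuple


def dynamic_weights(layout_score: float, content_score: float,
                    eps: float = 1e-6) -> Tuple[float, float]:
    """Return normalized weights for the two feature groups.

    Assumed rule: a score far from 0.5 is treated as more decisive and
    receives a larger weight. eps avoids division by zero when both are 0.5.
    """
    conf_layout = abs(layout_score - 0.5) + eps
    conf_content = abs(content_score - 0.5) + eps
    total = conf_layout + conf_content
    return conf_layout / total, conf_content / total


def combined_score(layout_score: float, content_score: float) -> float:
    """Weighted sum of the two scores using the per-sample weights."""
    w_layout, w_content = dynamic_weights(layout_score, content_score)
    return w_layout * layout_score + w_content * content_score


def is_formula(layout_score: float, content_score: float,
               threshold: float = 0.5) -> bool:
    """Classify a candidate region as a formula (hypothetical threshold)."""
    return combined_score(layout_score, content_score) >= threshold
```

Under this assumed rule, a region with a strong layout cue (0.9) and a weak content cue (0.4) is still classified as a formula, because the more decisive score dominates the sum.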

Description

Technical field

[0001] The invention relates to the technical field of file processing, in particular to a method for identifying scientific formulas in format files based on dynamic weighting of combined features.

Background technique

[0002] With the rapid development of science and technology, layout documents are widely used in various disciplines and various fields of life and production, and the number is huge. As a special information carrier, scientific formulas also widely exist in layout documents.

[0003] In electronic format files, the storage formats of formulas are mainly divided into three types: text formulas, picture formulas, and text-picture mixed formulas. Among them, text-type formulas refer to formulas stored and displayed in character format; picture-type formulas refer to formulas stored and displayed in image format; text-picture mixed formulas mean that part of the formula is stored in image format and the other part is stored in character format...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/20
Inventor: 薛蓓, 邹季英, 袁仁慧
Owner: 同方知网数字出版技术股份有限公司