Method and software for digitalizing full text of standard document

This standard-document full-text technology, applied in the field of Extensible Markup Language (XML) technology, addresses three problems: the very large number of standard documents, incomplete search results, and the lack of full-text retrieval.

Inactive Publication Date: 2010-08-04
广东省标准化研究院
Cites: 0; Cited by: 29

AI Technical Summary

Problems solved by technology

Because a standard bibliographic database holds very little of each standard's content, it cannot reflect everything the standard contains. It therefore struggles to support searches for standards on a given topic, and search results are often incomplete. Even when relevant standards are found, the relevant specification content is hard to view, and search accuracy is low.
[0003] At present, although there are software tools that can search the full-text files of standard documents in Word, PDF and other formats, they do not meet users' needs for standard information. The main reason...



Examples


Embodiment Construction

[0193] Figure 1 is a schematic diagram of the framework of the standard document full-text digitization software of the present invention.

[0194] The standard document digital processing system consists of an image module, a character module, a structuring module, a standard document processing task scheduler, and a quality inspection and submission program. The image module consists mainly of an image scanning program and covers the scanning and importing processes. The character module consists mainly of an OCR recognition program and covers layout analysis, OCR, proofreading, and exporting. The structuring module consists of a bibliographic information editor, a structured full-text data editor, and a batch bibliographic editor; it covers bibliographic information entry, structured full-text data production, batch bibliographic data sorting, and batch bibliographic data quality inspection and storage.
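The module and process breakdown in [0194] can be read as an ordered pipeline driven by the task scheduler. The following is a minimal sketch under that assumption, not the patented implementation: the names `MODULE_STAGES`, `StandardDocumentTask`, and `schedule`, the stage identifiers, and the sample ID are all hypothetical, and the quality inspection and submission program is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Modules and their processes as listed in [0194]. The identifiers and the
# strict top-to-bottom ordering are assumptions made for illustration.
MODULE_STAGES: Dict[str, List[str]] = {
    "image": ["scan", "import"],
    "character": ["layout_analysis", "ocr", "proofread", "export"],
    "structuring": [
        "bibliographic_entry",
        "structured_full_text_production",
        "batch_bibliographic_sorting",
        "batch_bibliographic_quality_inspection_and_storage",
    ],
}


@dataclass
class StandardDocumentTask:
    """One standard document moving through the pipeline (hypothetical type)."""
    standard_id: str
    completed: List[str] = field(default_factory=list)


Handler = Callable[[StandardDocumentTask], None]


def schedule(task: StandardDocumentTask,
             handlers: Optional[Dict[str, Handler]] = None) -> StandardDocumentTask:
    """Run every stage of every module in order and record what was done.

    `handlers` maps a stage name to a callable; stages without a handler
    are only recorded, so the sketch runs without real scanning or OCR.
    """
    handlers = handlers or {}
    for module, stages in MODULE_STAGES.items():
        for stage in stages:
            handler = handlers.get(stage)
            if handler is not None:
                handler(task)
            task.completed.append(f"{module}.{stage}")
    return task


if __name__ == "__main__":
    done = schedule(StandardDocumentTask("EXAMPLE-0001"))
    print("\n".join(done.completed))
```

Keeping the stage list as data rather than hard-coded calls lets a scheduler resume a task from its last completed stage, which is one plausible reason to have a separate task scheduler component.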

[0195] The database system stores and manages the data in the...



Abstract

The invention discloses a method and software for digitizing the full text of a standard document. It belongs to the technical field of standard documents and information technology, solves the problems of full-text retrieval and detailed retrieval of standard documents, and enables text mining of standard information. Starting from the application prospects of standard documents, the method carries out imaging, character recognition, and structuring. The digitization is performed by a scanned-image processing module, an OCR recognition and correction module, a standard title recording module, a structured full-text production module, and the like. A definition file for the standard full-text XML record format and a standard full-text XML file are defined. Based on these two files, the method and software define a schema file and develop software that processes standard titles, single-layer PDF files, double-layer PDF files, full-text XML files, tables, images, and the like, and that supports image and table retrieval and data export within specified ranges, such as a standard's preface, foreword, scope, referenced documents, and terms.
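Retrieval "within specified ranges" becomes straightforward once the full text is stored as structured XML. The sketch below illustrates the idea with Python's standard `xml.etree.ElementTree`. The element names (`standard`, `foreword`, `scope`, `references`, `terms`), the sample content, and the function `search_in_ranges` are assumptions for illustration; the patent's actual schema is not shown in this excerpt.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical structured full text; tag names and content are invented.
SAMPLE_XML = """<standard id="EXAMPLE-0001">
  <title>Example standard title</title>
  <foreword>This example standard replaces an earlier example edition.</foreword>
  <scope>This example standard specifies requirements for sample widgets.</scope>
  <references>
    <ref>EXAMPLE-0002 Sample terminology</ref>
  </references>
  <terms>
    <term>widget: a sample item used for illustration.</term>
  </terms>
</standard>"""


def search_in_ranges(xml_text: str, keyword: str,
                     ranges: List[str]) -> List[Tuple[str, str]]:
    """Return (section name, section text) pairs whose text contains keyword.

    Only sections whose tag is listed in `ranges` are searched, which is the
    range-limited retrieval that a flat PDF or Word file cannot offer.
    """
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    hits = []
    for name in ranges:
        for section in root.iter(name):
            # Collect nested text and collapse whitespace left by indentation.
            text = " ".join(" ".join(section.itertext()).split())
            if needle in text.lower():
                hits.append((name, text))
    return hits


if __name__ == "__main__":
    for name, text in search_in_ranges(SAMPLE_XML, "widget", ["scope", "terms"]):
        print(f"{name}: {text}")
```

Searching for "widget" in the `foreword` range alone returns nothing here, while the `scope` and `terms` ranges each match, which is the kind of distinction a bibliographic-only database cannot make.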

Description

Technical field

[0001] The invention belongs to the field of standard documents and information technology, and specifically relates to standard documents, information structuring technology, document typesetting structure, and Extensible Markup Language (XML) technology.

Background

[0002] A standard is the crystallization of accumulated technology. A standard document is a kind of scientific and technological document. It is an essential technical document for modern enterprises to organize production, improve product quality, and promote the import and export of products, and it is the legal basis on which technical supervision departments and commodity inspection departments inspect products. Especially with today's rapid development of science and technology, the latest standards are often the carriers of new technologies. In today's intense global competition, standards have become a prerequisite for companies to compete. At present, the relevant standards are...


Application Information

IPC(8): G06F17/22, G06F17/30
Inventor 刘华, 陈洪江, 黎东初, 毛君浩, 张晓丹
Owner 广东省标准化研究院