Word document information extraction method and device, electronic equipment and medium

A document information extraction technology, applied to electronic digital data processing, unstructured text data retrieval, and special data processing applications. It addresses the high maintenance cost, poor compatibility, and text merging errors of existing approaches, improving compatibility and reducing maintenance costs.

Inactive Publication Date: 2021-02-26
SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0005] The applicant found that in the prior art, the code must be modified again whenever data columns in the text are added or removed, and line breaks and column separation in the text are not taken into account, which causes many text merging errors. The prior-art solution therefore has poor compatibility and high maintenance costs.


Examples

Embodiment Construction

[0034] To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without...

Abstract

The invention provides a Word document information extraction method and device, electronic equipment and a medium. The method comprises the steps of detecting whether the file extension of a current Word document is docx; if the file extension of the current Word document is docx, converting the current Word document into an xml file; extracting files of different file types in the xml file; and, for each file type, performing information extraction on the files of that type using an extraction mode corresponding to the file type. According to the Word document information extraction scheme provided by the embodiment of the invention, a large number of rules do not need to be manually maintained, compatibility can be improved, and the maintenance cost is greatly reduced.
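The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. It assumes that "converting to an xml file" can be approximated by reading the XML parts directly out of the .docx ZIP package. The function names (`extract_docx`, `extract_xml_text`, `extract_binary`) and the `EXTRACTORS` mapping are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# WordprocessingML namespace used by the XML parts of a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_xml_text(data: bytes) -> list:
    """Return one string per <w:p> paragraph, joining its <w:t> text runs."""
    root = ET.fromstring(data)
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        if text:
            paragraphs.append(text)
    return paragraphs


def extract_binary(data: bytes) -> int:
    """Placeholder extraction mode for media parts: report the size in bytes."""
    return len(data)


# Hypothetical mapping from file type (suffix) to extraction mode.
EXTRACTORS = {
    ".xml": extract_xml_text,
    ".png": extract_binary,
    ".jpeg": extract_binary,
}


def extract_docx(path: str) -> dict:
    # Step 1: detect whether the file extension is docx.
    if Path(path).suffix.lower() != ".docx":
        raise ValueError(f"not a .docx file: {path}")

    results = defaultdict(dict)
    # Step 2 (assumption): a .docx is a ZIP package, so open it to reach its XML parts.
    with zipfile.ZipFile(path) as package:
        # Step 3: sort the contained files by file type.
        for name in package.namelist():
            suffix = Path(name).suffix.lower()
            extractor = EXTRACTORS.get(suffix)
            if extractor is None:
                continue  # no extraction mode registered for this type
            # Step 4: apply the extraction mode that corresponds to the type.
            results[suffix][name] = extractor(package.read(name))
    return dict(results)
```

Because the dispatch is keyed on file type rather than on hand-written layout rules, supporting a new part type in this sketch means adding one entry to `EXTRACTORS`.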

Description

Technical field

[0001] The present application relates to the technical field of document extraction, in particular to a Word document information extraction method, device, electronic equipment and medium.

Background

[0002] Microsoft Word is currently a dominant word processor, which has made Word's own file format (.doc) the de facto standard. Details of the Word file format are not publicly available. There is more than one Word file format: as the Word software is updated, the file format is revised to a greater or lesser degree, and a newer format may not be readable by an older version of the program (presumably because the older version has no built-in support for the newer format).

[0003] Current Word files basically use docx as the file extension. docx is the file extension of Microsoft Word documents and is used in Microsoft Office 2007 and later versions. Its compressed file format bas...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/151, G06F40/295, G06F16/36
CPC: G06F16/367, G06F40/151, G06F40/295
Inventor: 祝彦森, 孙靖文, 孙泽懿, 徐凯波
Owner SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD