Engineering archive structured data extraction method, system and device and storage medium

A structured data extraction technology, applied in structured data browsing, structured data retrieval, visual data mining, etc., that improves the efficiency of archive digitization

Pending Publication Date: 2022-07-15
STATE GRID FUJIAN ELECTRIC POWER CO LTD

AI Technical Summary

Problems solved by technology

However, the existing technology requires a large number of template libraries to be prepared manually in advance. It also selects a template library by simple keyword matching alone, so it is difficult to avoid cases where the selected template library is not applicable.

Examples

Embodiment 1

[0056] Referring to Figure 1, a method for extracting structured data from engineering archives comprises the following steps:

[0057] An engineering archive rule base is constructed according to the historical engineering archives and the engineering archive management method. The engineering archives in this embodiment are the engineering project archives of the State Grid. Historical power grid engineering archives are therefore collected, and the rule base is constructed with reference to the power grid engineering project management method and the power grid management method. Structured data is extracted from a large volume of historical power grid engineering archive data, and a large number of rule meta-attributes are generated through manual verification and multiple rounds of iteration;

[0058] Pre-train the text extraction model: collect the structured data of the professional vocabulary of historical engineering archives as the original data, use the data minin...
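The rule base step in [0057] can be pictured with a small data structure. The sketch below is only an illustration: the excerpt does not say what a rule meta-attribute contains, so the fields `name`, `keywords` and `entity`, the `verify` callback that stands in for manual verification, and the sample values are all assumptions. It also performs a single verification pass rather than multiple rounds of iteration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RuleMetaAttribute:
    """One rule base entry. All fields are illustrative assumptions."""
    name: str        # hypothetical rule name, e.g. "voltage_level"
    keywords: tuple  # character attributes that trigger the rule
    entity: str      # entity associated with matching text metadata


@dataclass
class RuleBase:
    rules: dict = field(default_factory=dict)

    def add(self, rule: RuleMetaAttribute, verified: bool) -> bool:
        """Admit a candidate rule only if it passed verification."""
        if verified:
            self.rules[rule.name] = rule
        return verified


def build_rule_base(candidates, verify) -> RuleBase:
    """Keep the candidate rules that the verify callback accepts."""
    base = RuleBase()
    for rule in candidates:
        base.add(rule, verify(rule))
    return base


# Sample candidates (invented for illustration only).
candidates = [
    RuleMetaAttribute("voltage_level", ("kV",), "VoltageLevel"),
    RuleMetaAttribute("noise", ("???",), "Unknown"),
]
rule_base = build_rule_base(candidates, verify=lambda r: r.entity != "Unknown")
print(sorted(rule_base.rules))
```

In practice the `verify` callback would be a human review step; a lambda is used here only so the sketch runs on its own.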

Embodiment 2

[0074] This embodiment provides a system for generating structured data of engineering archives, including:

[0075] The engineering file rule base building module is used for constructing the engineering file rule base according to the historical engineering file and the engineering file management method, and the engineering file rule base includes several rule meta-attributes;

[0076] The text extraction model training module is used to collect the structured data of the professional vocabulary of historical engineering archives as the original data, use data mining technology to extract rules from the original data to form a benchmark model, use the extracted rules for unsupervised learning, and label the data not covered by the rules so as to iteratively train the benchmark model and obtain a text extraction model;

[0077] The input module is used to obtain the input data from the project file, preprocess the input data, and input the preprocessed data to the pre-training ...
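The training module in [0076] describes mining rules from the original data and then iteratively labeling the data the rules do not cover. A toy bootstrapping loop gives a rough feel for that idea. It is a deliberately simplified stand-in, not the model described in the patent: the frequency threshold, the `label_fn` callback, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def mine_seed_terms(documents, min_count=2):
    """Assumed data mining step: tokens found in several documents become seed rules."""
    counts = Counter(tok for doc in documents for tok in set(doc.split()))
    return {tok for tok, n in counts.items() if n >= min_count}


def bootstrap_lexicon(documents, seeds, label_fn, max_rounds=3):
    """Grow the lexicon by labeling tokens the current rules do not cover."""
    lexicon = set(seeds)
    vocabulary = {tok for doc in documents for tok in doc.split()}
    for _ in range(max_rounds):
        uncovered = vocabulary - lexicon
        accepted = {tok for tok in uncovered if label_fn(tok, lexicon)}
        if not accepted:
            break  # nothing new was labeled, so further rounds change nothing
        lexicon |= accepted
    return lexicon


# Sample documents (invented for illustration only).
docs = [
    "substation 110kV line",
    "substation 220kV transformer",
    "line 35kV",
]
seeds = mine_seed_terms(docs)
# Hypothetical labeling rule: accept tokens that look like voltage levels.
lexicon = bootstrap_lexicon(docs, seeds, label_fn=lambda tok, lex: tok.endswith("kV"))
print(sorted(lexicon))
```

A real system would replace `label_fn` with the benchmark model's own predictions and retrain the model after each round.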

Embodiment 3

[0094] The present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the engineering archive structured data extraction method described in any of the embodiments of the present invention.

Abstract

The invention relates to an engineering archive structured data extraction method, which comprises the following steps: constructing an engineering archive rule base which comprises a plurality of rule meta-attributes; pre-training a text extraction model by collecting structured data of the professional vocabulary of historical engineering archives as original data and performing iterative training with unsupervised learning to obtain the text extraction model; obtaining input data from the engineering archive, preprocessing the input data, inputting the preprocessed data into the pre-trained text extraction model, and extracting text vocabulary; performing feature association and data cleaning on the text vocabulary to obtain text metadata; performing character matching on the text metadata to obtain a plurality of character attributes in the text metadata; performing rule matching in the engineering archive rule base using the character attributes and determining the rule meta-attributes matched with the text metadata, so as to determine the entities associated with the text metadata; and generating structured data according to the entities associated with the text metadata.
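The later stages of the abstract (data cleaning, character matching, rule matching, structured output) can be sketched as a short pipeline. This is a minimal illustration under stated assumptions: the contents of `RULE_BASE`, the record fields, and the sample vocabulary are hypothetical, substring matching stands in for whatever character matching the method actually uses, and the feature association step is omitted.

```python
import re

# Hypothetical rule base: character attribute -> (rule meta-attribute, entity).
RULE_BASE = {
    "kV": ("voltage_level", "VoltageLevel"),
    "substation": ("facility_type", "Facility"),
}


def clean(vocabulary):
    """Assumed cleaning step: trim whitespace, drop empties and duplicates, keep order."""
    seen, cleaned = set(), []
    for term in vocabulary:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    return cleaned


def character_attributes(term):
    """Character matching: rule base keys that occur in the term, ignoring case."""
    return [key for key in RULE_BASE
            if re.search(re.escape(key), term, re.IGNORECASE)]


def to_structured(vocabulary):
    """Rule matching: map each cleaned term to its entity and emit one record per match."""
    records = []
    for term in clean(vocabulary):
        for attr in character_attributes(term):
            rule, entity = RULE_BASE[attr]
            records.append({"text": term, "rule": rule, "entity": entity})
    return records


# Sample extracted vocabulary (invented for illustration only).
records = to_structured([" 110kV ", "North Substation", "110kV", ""])
for record in records:
    print(record)
```

Keeping the rule base as plain data means new rule meta-attributes can be added without touching the matching code.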

Description

Technical field

[0001] The invention relates to a method, system, device and storage medium for extracting structured data from engineering archives, and belongs to the technical field of digital processing of engineering archives.

Background

[0002] With the rise and spread of artificial intelligence, the amount of repetitive work has been greatly reduced. Deep learning can be used to train models that handle everyday repetitive tasks, which reduces the cost of manual input and increases the value of the work. Artificial intelligence is widely applied. Archive work usually needs to preserve important historical data, and the way the data is stored changes with the technology of the time. From the original paper documents to today's digital data, a large amount of data needs to be analyzed and described manually, and engineering archives are one example. They contain important information in the process of eng...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/26, G06F16/215, G06K9/62, G06N3/04, G06N3/08, G06V30/422
CPC: G06F16/26, G06F16/215, G06N3/08, G06N3/044, G06F18/2415
Inventor: 邹永增, 魏宏俊, 翁非, 张望华, 黄云飞, 林衍
Owner: STATE GRID FUJIAN ELECTRIC POWER CO LTD