Construction method of power grid equipment word segmentation dictionary and fault case database

A technology for power grid equipment word segmentation dictionaries, applied in areas such as neural learning methods, neural architectures, and semantic tool creation. It addresses problems such as insufficient mining of related information, insufficient support for maintenance decision-making, and low retrieval and browsing efficiency, with the effects of facilitating intuitive understanding, increasing application value, and improving word segmentation accuracy.

Active Publication Date: 2022-05-27
ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the above technologies, the present invention addresses the low retrieval and browsing efficiency, insufficient mining of related information, and insufficient support for maintenance decision-making that affect power grid equipment fault case text data. It proposes a solution covering data preprocessing, data mining, data persistence, and data application, and designs and implements a construction method for a power grid equipment word segmentation dictionary and fault case library.


Examples


Embodiment 1

[0054] The text of power grid equipment fault and defect cases contains a large number of specialized terms that are usually absent from the lexicons of existing general-purpose word segmentation tools. If a general-purpose tool is used to segment text in the power grid field, many professional terms will be segmented incorrectly, which reduces the reliability of subsequent word vector training and text classification. Therefore, before word segmentation, extending the public general-domain dictionary of a mature word segmentation tool with domain-specific words to construct a power grid word segmentation dictionary is crucial to the accuracy of the subsequent steps.
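The effect described in [0054] can be illustrated with a minimal sketch. It uses forward maximum matching, a simple dictionary-based segmentation technique chosen here only for illustration; the patent does not state which segmentation algorithm or tool is used. The two lexicons and the sample sentence are hypothetical.

```python
def segment(text, dictionary, max_len=8):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if nothing matches
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical lexicons: a general one, and the same one extended with grid terms.
general_dict = {"变压", "套管", "渗油"}
grid_dict = general_dict | {"变压器", "主变压器"}

# Sample sentence: "main transformer bushing oil seepage".
sentence = "主变压器套管渗油"
general_tokens = segment(sentence, general_dict)
grid_tokens = segment(sentence, grid_dict)

print(general_tokens)  # ['主', '变压', '器', '套管', '渗油']
print(grid_tokens)     # ['主变压器', '套管', '渗油']
```

With the general lexicon alone, the term for "main transformer" is broken into three fragments; once the domain terms are added, it survives as a single token.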

[0055] A semi-supervised method that combines automatic labeling by a named entity recognition model with manual screening is used to construct the power grid domain dictionary. The process is shown in Figure 2. Solving the professional compliance of the id...
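The semi-supervised loop in [0055] might be sketched as follows. This is an assumption-laden illustration, not the patented procedure: the NER model is replaced by a plain list of already-extracted spans, the frequency threshold `min_count` is invented for the example, and the manual screening step is modeled as a callback named `approve`.

```python
from collections import Counter


def rank_candidates(ner_spans, known_terms, min_count=2):
    """Count NER-proposed spans not yet in the dictionary; keep the frequent ones."""
    counts = Counter(span for span in ner_spans if span not in known_terms)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


def merge_approved(known_terms, ranked, approve):
    """Add only the candidates that the human reviewer accepts."""
    return known_terms | {term for term, _ in ranked if approve(term)}


# Hypothetical NER output over a batch of fault reports.
ner_spans = ["避雷器", "套管", "避雷器", "油色谱", "油色谱", "油色谱", "今天"]
known_terms = {"套管"}

ranked = rank_candidates(ner_spans, known_terms)

# Stand-in for manual screening: the reviewer accepts these two terms.
reviewer_accepts = {"油色谱", "避雷器"}
new_dictionary = merge_approved(known_terms, ranked, lambda t: t in reviewer_accepts)

print(ranked)  # [('油色谱', 3), ('避雷器', 2)]
```

Sorting candidates by frequency puts the terms most likely to matter in front of the reviewer first, which is one plausible way to keep the manual step small.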

Embodiment 2

[0067] Further, in step b), before text information is extracted, the pictures, file names, and author information are extracted and stored, and noise such as labels and typos is filtered out. The extracted and filtered text is then segmented by the word segmentation tool into which the power grid dictionary has been imported, which completes the text preprocessing. Actual fault and defect cases are usually written manually and are rich text files containing tables, pictures, text, and labels, in formats such as PDF and Word.
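A minimal sketch of the filtering step in [0067], assuming the rich text has already been converted to an HTML-like string. The tag pattern, the image pattern, and the typo table `TYPO_FIXES` are hypothetical; the patent does not specify how labels or typos are detected.

```python
import re

IMG_RE = re.compile(r"<img[^>]*src=['\"]([^'\"]+)['\"]")
TAG_RE = re.compile(r"<[^>]+>")
TYPO_FIXES = {"变压气": "变压器"}  # hypothetical typo table


def preprocess(raw):
    """Keep picture references, then strip labels, fix typos, and tidy whitespace."""
    images = IMG_RE.findall(raw)   # retained for later storage with the case
    text = TAG_RE.sub("", raw)     # filter markup labels
    for wrong, right in TYPO_FIXES.items():
        text = text.replace(wrong, right)
    text = re.sub(r"\s+", " ", text).strip()
    return {"text": text, "images": images}


raw = "<p>主变压气  套管渗油</p>\n<img src='a.png'>"
result = preprocess(raw)

print(result["text"])    # 主变压器 套管渗油
print(result["images"])  # ['a.png']
```

The cleaned `text` field is what would then be passed to the segmenter that uses the power grid dictionary.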

Embodiment 3

[0069] Further, step c) comprises the steps:

[0070] c-1) The purpose of information extraction from power grid equipment fault text data is to analyze and process the unstructured text, extract the information that is meaningful for describing equipment faults and defects, and form structured data that supports accurate retrieval of specific content later. Considering the diversity of power grid fault text descriptions, a unified attribute template is used to extract attributes from the text data. The attribute types are divided into numeric, phrase-type, and sentence-type state quantity attributes. Numeric state quantity attributes are extracted by a rule-based method, phrase-type state quantity attributes are extracted by an entity matching method based on grammar rules, and sentence-type state quantity attributes are classified using distributed text representation and a neural network model. I...
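The rule-based branch of c-1) can be sketched with regular expressions. The attribute names, the patterns, and the sample sentence are assumptions made for illustration; the phrase-type and sentence-type branches are not covered here.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"

# Hypothetical rules for numeric state quantity attributes.
NUMERIC_RULES = {
    "voltage_kv": re.compile(NUMBER + r"\s*kV", re.IGNORECASE),
    "oil_temp_c": re.compile(r"油温\s*" + NUMBER + r"\s*℃"),
}
DATE_RULE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")


def extract_numeric_attributes(text):
    """Fill a flat attribute record from the first match of each rule."""
    record = {}
    for name, pattern in NUMERIC_RULES.items():
        match = pattern.search(text)
        if match:
            record[name] = float(match.group(1))
    date = DATE_RULE.search(text)
    if date:
        year, month, day = map(int, date.groups())
        record["fault_date"] = f"{year:04d}-{month:02d}-{day:02d}"
    return record


# Sample: "On 5 March 2021, 220 kV main transformer oil temperature 85 ℃, bushing oil seepage."
sample = "2021年3月5日，220kV主变压器油温85℃，套管渗油。"
attributes = extract_numeric_attributes(sample)

print(attributes)
# {'voltage_kv': 220.0, 'oil_temp_c': 85.0, 'fault_date': '2021-03-05'}
```

Rules that find no match simply leave their key out of the record, so the same template can be applied to reports that describe faults in different ways.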


Abstract

A method for constructing a word segmentation dictionary and a fault case library for power grid equipment. A word segmentation dictionary for the power grid field is constructed first. Fault case data is then preprocessed, including format conversion and word segmentation, and various technical means are used to analyze the text data and generate structured power grid equipment failure cases, feature tags, keyword clouds, and association rules. A relational database schema is designed for this information, with the report as the primary key, and the text information is stored together with the pictures and author information retained during preprocessing to form a power grid equipment failure case database. The method improves word segmentation accuracy for text in the power grid field, and the structured case database makes content-based retrieval of cases more accurate. The feature labels in the fault case database are used as item sets to mine effective association rules between faults, which can be used for fault early warning. The method fills a gap in the application of text analysis technology in the power grid field, increases the application value of the power grid corpus, and reduces the cost of consulting it.
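As an illustration of treating feature labels as item sets, the sketch below mines pairwise association rules by brute-force counting of support and confidence. This is a simplified stand-in: the abstract does not name the mining algorithm, and the tags, thresholds, and function name `mine_pair_rules` are hypothetical.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) between pairs of tags."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for tags in transactions:
        unique = sorted(set(tags))
        for item in unique:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(unique, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of cases containing both tags
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical feature tags, one list per fault case.
cases = [
    ["transformer", "oil_leak", "seal_aging"],
    ["transformer", "oil_leak", "seal_aging"],
    ["transformer", "overheating"],
    ["breaker", "oil_leak", "seal_aging"],
]
rules = mine_pair_rules(cases)

for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
# oil_leak -> seal_aging  support=0.75  confidence=1.00
# seal_aging -> oil_leak  support=0.75  confidence=1.00
```

A rule such as `seal_aging -> oil_leak` is the kind of association that could feed an early-warning check, provided it holds up on a realistically sized case library.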

Description

Technical field

[0001] The invention relates to the technical field of industrial data and Internet informatization, and in particular to a method for constructing a word segmentation dictionary and a fault case database for power grid equipment.

Background

[0002] With the development of intelligent technologies such as the mobile Internet, the Internet of Things, artificial intelligence, and deep learning, their applications in the power field are becoming increasingly common. Building a smart grid and achieving a high degree of integration of "power flow, information flow, and business flow" is an integral part of this technological development. In the electric power field, the text data accumulated over many years, especially power grid equipment failure cases, has research value and can provide suggestions and experience for actual equipment maintenance work. Due to the complexity and uncertainty of its maintenance scenarios, how to construct a power grid word se...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/36, G06F16/35, G06F16/33, G06F40/211, G06F40/242, G06F40/247, G06F40/295, G06N3/04, G06N3/08
CPC: G06F16/374, G06F16/35, G06F16/3344, G06F40/295, G06F40/242, G06F40/247, G06F40/211, G06N3/08, G06N3/088, G06N3/045, Y04S10/50
Inventor: 杨祎, 秦佳峰, 闫丹凤, 秦晔, 辜超, 林颖, 白德盟, 郑文杰, 刘萌, 朱庆东, 李杰, 朱文兵, 朱孟兆
Owner ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY