Power grid equipment word segmentation dictionary and fault case library construction method

A power grid equipment word segmentation dictionary technology, applied in the fields of neural learning methods, neural architectures, semantic tool creation, etc. It addresses problems such as low retrieval and browsing efficiency, insufficient maintenance decision-making support, and insufficient mining of related information, with the effects of facilitating intuitive understanding, improving application value, and improving the accuracy of word segmentation.

Active Publication Date: 2021-04-30
ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2
17 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the above technologies, the present invention addresses problems in power grid equipment failure case text data such as low retrieval and browsing efficiency, insufficient mining of related information, and insufficient support for maintenance decision-making. Working from the aspects of data preprocessing, data mining, data persistence, and data application, it designs and implements a construction method for a power grid equipment word segmentation dictionary and fault case library.

Examples

Embodiment 1

[0054] The case texts of power grid equipment faults and defects contain a large number of technical terms that are usually absent from the dictionaries of existing general-purpose word segmentation tools. If a general-purpose tool is used to segment text in the power grid domain, many technical terms will be segmented incorrectly, which undermines the reliability of subsequent word vector training and text classification. Therefore, before word segmentation, it is important to extend the public-domain dictionary of a mature word segmentation tool with domain-specific words, building a power grid word segmentation dictionary that improves the accuracy of subsequent steps.
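The effect of a domain dictionary can be illustrated with a toy segmenter. The sketch below uses forward maximum matching, which is a simple stand-in chosen for illustration; the source does not name the segmentation tool or algorithm it uses. The dictionary contents (`GENERAL_DICT`, `GRID_TERMS`) and the sample sentence are hypothetical.

```python
def forward_max_match(text, dictionary):
    """Greedy longest-match segmentation (illustrative stand-in, not the patent's tool)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general-purpose vocabulary and hypothetical power grid terms.
GENERAL_DICT = {"变压器", "油", "温度", "异常", "断路器"}
GRID_TERMS = {"主变压器", "油色谱", "局部放电"}

sentence = "主变压器油色谱异常"
general_tokens = forward_max_match(sentence, GENERAL_DICT)
domain_tokens = forward_max_match(sentence, GENERAL_DICT | GRID_TERMS)
print(general_tokens)
print(domain_tokens)
```

With only the general vocabulary, the domain terms are broken into fragments; once the grid terms are added, each is kept as a single token, which is the kind of improvement the paragraph above describes.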

[0055] Method: A semi-supervised method that combines automatic labeling by a named entity recognition model with manual screening is used to construct the power grid domain dictionary. The process is shown in Figure 2. Solving the professional compliance of the identified...
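A minimal sketch of this semi-supervised loop follows. It assumes the named entity recognition model emits one B/I/O tag per character; the tag scheme, the frequency threshold `min_count`, and all function names are assumptions made for illustration, and the model itself is not shown. Manual screening is represented by a set of terms that a reviewer has approved.

```python
from collections import Counter


def decode_bio(chars, tags):
    """Collect entity strings from per-character B/I/O tags (assumed tag scheme)."""
    terms, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "B":
            if current:
                terms.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


def propose_candidates(labeled_sentences, general_dict, min_count=2):
    """Automatic step: frequent model-labeled terms missing from the general dictionary."""
    counts = Counter(
        term
        for chars, tags in labeled_sentences
        for term in decode_bio(chars, tags)
    )
    return [t for t, c in counts.most_common() if c >= min_count and t not in general_dict]


def build_domain_dict(general_dict, candidates, approved_by_reviewer):
    """Manual step: keep only the candidates a human reviewer approved."""
    return set(general_dict) | {t for t in candidates if t in approved_by_reviewer}
```

For example, `build_domain_dict(general, propose_candidates(labeled, general), approved)` would return the general dictionary extended with the approved terms only, so that model errors do not enter the dictionary unchecked.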

Embodiment 2

[0066] Further, in step b), before text information is extracted, the pictures, file names, and author information are extracted and stored, and noise such as labels and typos is filtered out. Fault and defect cases handled in practice are usually written by hand and stored as rich text files that contain tables, pictures, text, and labels, in formats such as PDF and Word. The extracted and filtered text is then imported into the word segmentation tool that uses the power grid word segmentation dictionary described above for precise word segmentation, which completes the text preprocessing.
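The preprocessing order described above can be sketched as follows. This is a simplified illustration under stated assumptions: labels are taken to be angle-bracket markup, typos are corrected from a hypothetical lookup table (`TYPO_MAP`), and the segmenter is passed in as a plain function. Parsing of real PDF or Word files is out of scope here.

```python
import re
from dataclasses import dataclass, field

TAG_PATTERN = re.compile(r"<[^>]+>")          # assumed label format: <...> markup
TYPO_MAP = {"transfomer": "transformer"}      # hypothetical typo corrections


@dataclass
class CaseDocument:
    """Metadata kept aside before the text is cleaned (field names are assumptions)."""
    file_name: str
    author: str
    pictures: list = field(default_factory=list)
    text: str = ""


def clean_text(raw, typo_map=TYPO_MAP):
    """Strip label markup, fix known typos, and normalize whitespace."""
    text = TAG_PATTERN.sub("", raw)
    for wrong, right in typo_map.items():
        text = text.replace(wrong, right)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(file_name, author, pictures, raw_text, segment):
    """Store metadata first, then clean the text and hand it to the segmenter."""
    doc = CaseDocument(file_name, author, list(pictures), clean_text(raw_text))
    return doc, segment(doc.text)
```

Here `segment` stands for whatever segmentation function is built on the power grid dictionary; for English test input, `str.split` is enough to exercise the flow.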

Embodiment 3

[0068] Further, step c) includes the following steps:

[0069] c-1) The purpose of extracting information from power grid equipment fault text data is to analyze and process the unstructured text, extract meaningful information about equipment faults and defects, and form structured data that supports accurate retrieval of content later. Because power grid fault descriptions vary widely, a unified attribute template is used for attribute extraction. The attribute types are divided into numeric state quantity attributes, phrase-type state quantity attributes, and sentence-type state quantity attributes. Numeric state quantity attributes are extracted using a rule-based method, phrase-type state quantity attributes are extracted using an entity matching method based on grammatical rules, and sentence-type state quantity attributes are proposed to be classified using a distr...
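As a rough illustration of the rule-based branch for numeric state quantities, the sketch below pairs each attribute in a small template with a regular expression. The attribute names, units, and patterns are assumptions chosen for the example; the source does not list the actual template or rules.

```python
import re

# Hypothetical attribute template: attribute name -> pattern capturing the number.
NUMERIC_RULES = {
    "voltage_kv": re.compile(r"(\d+(?:\.\d+)?)\s*kV"),
    "temperature_c": re.compile(r"(\d+(?:\.\d+)?)\s*(?:°C|℃)"),
    "current_a": re.compile(r"(\d+(?:\.\d+)?)\s*A\b"),
}


def extract_numeric_attributes(text):
    """Return every numeric value matched by each rule, keyed by attribute name."""
    found = {}
    for name, pattern in NUMERIC_RULES.items():
        values = [float(match) for match in pattern.findall(text)]
        if values:
            found[name] = values
    return found


report = (
    "The 220 kV main transformer reached an oil temperature of 85.5 °C "
    "with a load current of 310 A."
)
print(extract_numeric_attributes(report))
```

Keeping the rules in one table means a new numeric attribute can be added to the template without touching the extraction function.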

Abstract

The invention discloses a method for constructing a power grid equipment word segmentation dictionary and a fault case library. The method comprises: constructing a word segmentation dictionary for the power grid field; performing format conversion and word segmentation on fault case data; using several techniques to analyze the text data and generate structured power grid equipment fault cases, feature tags, keyword clouds, and association rules; and designing a relational database schema for this information, with the report as the primary key, and storing the text information together with the pictures, authors, and other information retained during preprocessing to form a power grid equipment fault case library. The method improves word segmentation accuracy for text in the power grid field. The structured case library makes retrieval by case content more accurate. The feature tags in the fault case library serve as item sets from which effective fault association rules are sorted and mined, and these rules can be used for fault early warning. The method fills a gap in the application of text analysis technology in the power grid field, improves the application value of corpora in that field, and reduces consulting cost.
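To make the storage and mining steps concrete, the following sketch stores cases in SQLite with the report identifier as primary key and then mines pairwise association rules from the feature tags. The table and column names, the thresholds, and the restriction to two-item rules are all assumptions for illustration; the source does not specify the schema or the mining algorithm.

```python
import sqlite3
from itertools import combinations

# Hypothetical schema: one row per report, plus one row per (report, tag) pair.
SCHEMA = """
CREATE TABLE fault_case (
    report_id TEXT PRIMARY KEY,
    author    TEXT,
    body      TEXT
);
CREATE TABLE case_tag (
    report_id TEXT REFERENCES fault_case(report_id),
    tag       TEXT,
    PRIMARY KEY (report_id, tag)
);
"""


def build_library(cases):
    """Create an in-memory library from (report_id, author, body, tags) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for report_id, author, body, tags in cases:
        conn.execute("INSERT INTO fault_case VALUES (?, ?, ?)", (report_id, author, body))
        conn.executemany(
            "INSERT INTO case_tag VALUES (?, ?)", [(report_id, tag) for tag in tags]
        )
    conn.commit()
    return conn


def load_item_sets(conn):
    """Group feature tags by report so each report becomes one item set."""
    item_sets = {}
    for report_id, tag in conn.execute("SELECT report_id, tag FROM case_tag"):
        item_sets.setdefault(report_id, set()).add(tag)
    return list(item_sets.values())


def mine_pair_rules(item_sets, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for tag pairs above both thresholds."""
    total = len(item_sets)

    def support(items):
        return sum(1 for s in item_sets if items <= s) / total

    rules = []
    for a, b in combinations(sorted(set().union(*item_sets)), 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules
```

A rule such as "oil leak implies seal aging" with high confidence is the sort of result that could feed an early-warning check, although how the rules are applied in practice is not described in the text shown here.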

Description

Technical field

[0001] The invention relates to the technical field of industrial data and Internet informatization, in particular to a method for constructing a power grid equipment word segmentation dictionary and a fault case database.

Background

[0002] With the development of intelligent technologies such as the mobile Internet, the Internet of Things, artificial intelligence, and deep learning, their applications in the electric power field are becoming increasingly common. Building a smart grid and achieving a high degree of integration of "power flow, information flow, and business flow" is an indispensable part of technological development. In the electric power field, the text data accumulated over the years, especially power grid equipment failure cases, has research value and can provide suggestions and experience for actual equipment maintenance work. Due to the complexity and uncertainty of its maintenance scenarios, how t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/35, G06F16/33, G06F40/211, G06F40/242, G06F40/247, G06F40/295, G06N3/04, G06N3/08
CPC: G06F16/374, G06F16/35, G06F16/3344, G06F40/295, G06F40/242, G06F40/247, G06F40/211, G06N3/08, G06N3/088, G06N3/045, Y04S10/50
Inventor: 杨祎, 秦佳峰, 闫丹凤, 秦晔, 辜超, 林颖, 白德盟, 郑文杰, 刘萌, 朱庆东, 李杰, 朱文兵, 朱孟兆
Owner: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY