Multi-field-oriented power lexicon construction method

A multi-field-oriented construction method, applied to electric power lexicon construction, addresses the lack of professional vocabulary and the lack of a power lexicon production and operation management mechanism covering identification, error correction, generation, and service application. It accelerates the accumulation of professional corpora, improves research, development, and application capabilities, and promotes innovation and development.

Pending Publication Date: 2021-07-23
STATE GRID ZHEJIANG ELECTRIC POWER +1
Cites: 2. Cited by: 4.

AI Technical Summary

Problems solved by technology

[0004] The power industry has accumulated a large amount of text data, including text fragments in power grid databases and power-related documents on internal and external networks, such as power science and technology papers, project reports, power regulations, and power operation manuals. This text data and other unstructured data have not yet been fully utilized.
[0005] (2) Artificial intelligence applications lack the support of a professional electric power subject database.
[0007] (3) There is no power lexicon production and operation management mechanism covering identification, error correction, generation, and service application.
At present, professional lexicons are accumulated largely by having experts sort out and confirm entries. There is no online mechanism for generating, managing, and serving professional vocabulary, from identification and error correction through generation to service application, that would accelerate the accumulation of professional corpora and facilitate their use in artificial intelligence applications.

Examples

Embodiment

[0034] A multi-field-oriented power lexicon construction method, as shown in Figure 1, includes the following steps:

[0035] Step 1: collect power-related documents, extract the text information of the power-related documents, and enumerate all text fragments in the text information, where the length of each text fragment is less than a set threshold;

[0036] Step 2: filter the text fragments according to lexical-related statistical indicators; the text fragments that pass the filter are candidate new words, and all candidate new words form a candidate lexicon;

[0037] Step 3: compare each candidate new word in the candidate lexicon with the common vocabulary; if the candidate new word is in the common vocabulary, discard it, and if it is not, define it as a formal new word;

[0038] Step 4: all formal new words form the professional lexicon.
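
The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the source does not name the "lexical-related statistical indicators", so the sketch assumes two common ones, raw frequency and a cohesion ratio (how much more often a fragment occurs than its parts would by chance). The function names, the parameters `max_len`, `min_count`, and `min_cohesion`, and their default values are all hypothetical.

```python
from collections import Counter


def enumerate_fragments(text, max_len):
    """Step 1: count every substring whose length is less than max_len."""
    counts = Counter()
    for start in range(len(text)):
        for length in range(1, max_len):
            if start + length > len(text):
                break
            counts[text[start:start + length]] += 1
    return counts


def cohesion(fragment, counts, total_chars):
    """Assumed indicator: min over binary splits of P(w) / (P(left) * P(right))."""
    p_whole = counts[fragment] / total_chars
    return min(
        p_whole
        / ((counts[fragment[:k]] / total_chars) * (counts[fragment[k:]] / total_chars))
        for k in range(1, len(fragment))
    )


def build_lexicon(documents, common_vocabulary,
                  max_len=5, min_count=2, min_cohesion=2.0):
    # Step 1: enumerate fragments per document so none spans two documents.
    counts = Counter()
    for doc in documents:
        counts.update(enumerate_fragments(doc, max_len))
    total_chars = sum(len(doc) for doc in documents)

    # Step 2: keep multi-character fragments that pass both assumed indicators.
    candidates = {
        frag for frag, count in counts.items()
        if len(frag) >= 2
        and count >= min_count
        and cohesion(frag, counts, total_chars) >= min_cohesion
    }

    # Steps 3 and 4: drop common vocabulary; the rest is the professional lexicon.
    return {word for word in candidates if word not in common_vocabulary}


# Toy corpus: "we overhaul the transformer", "we replace the transformer".
docs = ["我们检修变压器", "我们更换变压器"]
lexicon = build_lexicon(docs, common_vocabulary={"我们"}, max_len=4)
```

In this toy run the common word 我们 ("we") is discarded in Step 3, while the partial fragments 变压 and 压器 survive alongside 变压器 ("transformer"). Frequency and cohesion alone cannot separate a word from its sub-fragments, so a practical filter would likely need further indicators, which the source leaves unspecified.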

[0039] In the early stage of lexicon construction, due to the sm...


Abstract

The invention discloses a multi-field-oriented electric power lexicon construction method that overcomes the defects of the prior art and comprises the following steps: step 1, collecting electric power related documents, extracting the text information of the documents, and enumerating all text fragments in the text information, the length of each text fragment being less than a set threshold; step 2, filtering the text fragments according to lexical-related statistical indicators, the text fragments that pass the filter being candidate new words, and all candidate new words forming a candidate lexicon; step 3, comparing each candidate new word in the candidate lexicon with the common vocabulary, discarding the candidate new word if it is in the common vocabulary, and defining it as a formal new word if it is not; and step 4, forming a professional lexicon from all formal new words.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a multi-field-oriented electric power lexicon construction method.

Background

[0002] Existing electric power lexicons are generally built by manual screening, which has the following problems:

[0003] (1) A large body of professional electric power text data has not been utilized

[0004] The power industry has accumulated a large amount of text data, including text fragments in power grid databases and power-related documents on internal and external networks, such as power science and technology papers, project reports, power regulations, and power operation manuals. This text data and other unstructured data have not yet been fully utilized.

[0005] (2) Artificial intelligence applications lack the support of a professional electric power subject database

[0006] Thesaurus is a colle...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/335, G06F16/31, G06F40/216, G06F40/284, G06F40/295
CPC: G06F16/335, G06F16/31, G06F40/216, G06F40/284, G06F40/295
Inventor: 王红凯, 冯珺, 刘瀚琳, 潘思辰, 王嘉琦, 赵帅, 彭梁英, 王仲锋, 丁雪花, 王永平, 汪娟玉, 蒋斌, 刘晓枫
Owner: STATE GRID ZHEJIANG ELECTRIC POWER