
Entity classification method and device for corpus data in a thermal power generation field

A classification method and entity technology, applied in text database clustering/classification, electrical digital data processing, natural language data processing, etc., which addresses problems such as records that cannot be classified.

Active Publication Date: 2019-04-05
YGSOFT INC
Cites: 5; Cited by: 3

AI Technical Summary

Problems solved by the technology

When entity classification is performed on power generation corpus data, the equipment names in daily records are often expressed differently depending on each writer's habits, so the corresponding records cannot be classified correctly when classification relies on standard equipment names.

Detailed Description of the Embodiments

[0045] Preferred embodiments of the present invention will be specifically described below in conjunction with the accompanying drawings, wherein the accompanying drawings constitute a part of the application and are used together with the embodiments of the present invention to explain the principles of the present invention.

[0046] An embodiment of the present invention discloses a method for entity classification of corpus data in the thermal power generation field which, as shown in Figure 1, includes the following steps:

[0047] Step S1: perform initial classification on the to-be-classified text set S, which contains corpus data from the thermal power generation field;

[0048] 1) Create the input data for classification;

[0049] The input data specifically includes:

[0050] Text set S to be classified: $S = \{s_1, s_2, \ldots, s_i, \ldots, s_m\}$, where $s_i$ is a text record in the set that corresponds to one of the equipment entities, and $m$ is the...
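The visible text does not say how the initial classification in Step S1 is carried out. As a minimal sketch, assuming Step S1 matches each record $s_i$ against a dictionary of standard equipment names (the approach the problem statement says breaks down on name variants), the split of S into the successfully classified set S1 and the unsuccessfully classified set S2 named in the abstract could look like the following. The dictionary `STANDARD_NAMES`, the function `initial_classify`, and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical standard-name dictionary (name -> entity type); illustrative only,
# not taken from the patent.
STANDARD_NAMES: Dict[str, str] = {
    "boiler feed pump": "pump",
    "induced draft fan": "fan",
    "high-pressure heater": "heater",
}


def initial_classify(
    texts: List[str], standard_names: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split the text set S into S1 (classified) and S2 (unclassified).

    Assumption: a record is classified when it contains a standard
    equipment name (case-insensitive substring match).
    """
    s1: List[Tuple[str, str]] = []  # (record, entity type)
    s2: List[str] = []              # records matching no standard name
    # Check longer names first so the most specific name wins when names overlap.
    names = sorted(standard_names, key=len, reverse=True)
    for record in texts:
        lowered = record.lower()
        match = next((n for n in names if n in lowered), None)
        if match is None:
            s2.append(record)
        else:
            s1.append((record, standard_names[match]))
    return s1, s2


# Hypothetical sample records.
S = [
    "Boiler feed pump A vibration high",
    "IDF-B bearing temperature alarm",
    "High-pressure heater #2 tube leak",
]
s1, s2 = initial_classify(S, STANDARD_NAMES)
print(s1)
print(s2)
```

In this sample, the abbreviated record "IDF-B bearing temperature alarm" lands in S2 even though it plausibly refers to an induced draft fan, which is the kind of name variant the later steps are meant to recover.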

Abstract

The invention relates to an entity classification method and device for corpus data in the thermal power generation field, and belongs to the technical field of thermal power generation. The method comprises the following steps: performing primary classification on a to-be-classified text set S containing thermal power generation corpus data to obtain a successfully classified text set S1 and an unsuccessfully classified text set S2; extracting new entity words from the unsuccessfully classified text set S2 and building a new entity word list E; and aligning the new entity words in the list E one by one with the successfully classified text set S1 to confirm the entity type of each new word. Using text data from the thermal power generation field, the method combines an unsupervised professional-vocabulary discovery algorithm with a text classification algorithm to achieve entity classification of power generation corpus data, and the resulting thermal power generation professional lexicon can also provide corpus support for text data mining in the field.

Description

Technical field

[0001] The invention relates to the technical field of thermal power generation, in particular to an entity classification method and device for corpus data in the thermal power generation field.

Background art

[0002] As typical unstructured or semi-structured data, text data has long been one of the hotspots of data mining research.

[0003] Analysis and mining of text data in the thermal power generation field is of great significance for the routine defect inventories of thermal power generation companies and for building enterprise knowledge graphs as part of long-term enterprise informatization: it helps enterprises understand the operation and health status of production equipment at an overall level and supports multi-dimensional data fusion for mining deep knowledge.

[0004] At present, text data analysis and mining in the thermal power generation field is still in its infancy. The main reason is that the document data accumu...

Claims

Application Information

IPC (8): G06F16/35; G06F16/36; G06F17/27
CPC: G06F40/295
Inventors: 唐静, 彭一轩, 解来甲
Owner: YGSOFT INC