Unsupervised power document theme generation method and system

An unsupervised document topic generation technology, applied in text database querying, electronic digital data processing, and unstructured text data retrieval.

Active Publication Date: 2019-11-01
STATE GRID INFORMATION & TELECOMM GRP +3

AI Technical Summary

Problems solved by technology

Neither of the two existing methods can perform topic extraction on data from a specific field. With the first method, after the documents are classified, the sentences selected as suitable for the topic may include many sentences unrelated to the specific field. The second method, which uses LDA for topic modeling, likewise ends up modeling domain-independent data.


Embodiment Construction

[0055] To make the above features and advantages of the present invention easier to understand, specific embodiments of the invention are described further below with reference to the accompanying drawings.

[0056] This embodiment provides an unsupervised power document topic generation method for quickly generating power document topics without supervision. As shown in Figure 1, it includes the following steps:

[0057] 1. Match the data in the original data that is relevant to the electric power field. The specific steps are as follows:

[0058] 1.1 Raw data is collected from the State Grid Public Opinion Monitoring System. Collection sources include text publishing platforms such as WeChat official accounts, Sina Weibo, Tieba, forums, and news sites;

[0059] 1.2 Organize the collected raw data, which includes the title and content of each document;

[0060] 1.3 Randomly take some original data from the document,...
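The remaining sub-steps of step 1 are not shown in this excerpt, so the following is only a rough sketch of what matching power-related data could look like, not the procedure the patent describes. It scores each document by the share of its tokens that appear in a seed vocabulary. The vocabulary `POWER_TERMS`, the `threshold` value, and the `title`/`content` dictionary layout are all assumptions made for illustration. The real sources are Chinese-language platforms, which would need a word segmenter instead of the simple English tokenizer used here.

```python
import re

# Hypothetical seed vocabulary for the electric power domain
# (an assumption for illustration, not taken from the patent).
POWER_TERMS = {"power", "grid", "electricity", "outage", "substation", "voltage"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(doc):
    """Return the fraction of title + content tokens that are domain terms."""
    tokens = tokenize(doc["title"] + " " + doc["content"])
    if not tokens:
        return 0.0
    return sum(t in POWER_TERMS for t in tokens) / len(tokens)


def filter_relevant(docs, threshold=0.1):
    """Keep documents whose relevance score reaches the (assumed) threshold."""
    return [d for d in docs if relevance(d) >= threshold]


docs = [
    {"title": "Power outage downtown",
     "content": "The grid operator restored electricity overnight."},
    {"title": "New cafe opens",
     "content": "Locals enjoy coffee and cake."},
]
relevant = filter_relevant(docs)
```

With these two sample documents, only the first one passes the filter, since the second contains no domain terms.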


Abstract

The invention provides an unsupervised power document topic generation method and system for quickly generating document topics in the electric power field. The method first uses correlation analysis to screen for document data related to the specific field, then uses clustering to find documents of the same category, and finally performs topic extraction on those documents. Applied in a topic extraction system, this makes topic extraction for a specific field more feasible.
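The clustering and topic extraction stages named in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm or topic extractor is used, so this example substitutes a single-pass clustering over bag-of-words vectors with cosine similarity, and takes the most frequent terms of each cluster as its topic. The stopword list, the `threshold` of 0.3, and all function names are assumptions.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "of", "in", "and", "to", "after", "for", "on", "is"}


def vectorize(text):
    """Return bag-of-words term counts, ignoring stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Single-pass clustering: a text joins the first cluster whose summed
    term vector is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["centroid"].update(vec)  # add the new member's counts
                c["members"].append(i)
                break
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def theme(c, k=3):
    """Use the k most frequent terms of a cluster as its topic."""
    return [term for term, _ in c["centroid"].most_common(k)]


texts = [
    "Storm causes power outage in the city",
    "Power outage after storm hits city grid",
    "Electricity price reform announced",
    "Regulator announced electricity price changes",
]
clusters = cluster(texts)
```

Here the two outage reports end up in one cluster and the two pricing reports in another. A single-pass scheme is order-dependent, so a production system would more likely use an established clustering algorithm.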

Description

Technical field

[0001] The invention relates to document topic extraction, in particular to an unsupervised method and system for generating power document topics, and belongs to the fields of natural language processing and computer software systems.

Background technique

[0002] In recent years, with the rapid development of the Internet, the data on various news publishing platforms has grown exponentially. How to compress and extract massive, messy data with high quality, so that users can efficiently find useful information in it, has become a research focus in the field of natural language processing. Data compression and extraction mainly involve document topic technology, and document topic methods are divided into extractive and generative approaches. The extractive topic method evaluates and scores the sentences in the original text and selects the few sentences that best represent its gist as the full-text topic. The generati...
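The extractive approach described in the background (score each sentence, keep the best few) can be sketched as follows. The scoring rule used here, the average corpus frequency of a sentence's non-stopword tokens, is one common heuristic chosen for illustration and is an assumption, as are the stopword list and the function names.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "after", "and", "of", "in"}


def content_tokens(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS]


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_topic(text, n=1):
    """Return the n sentences with the highest average term frequency."""
    freq = Counter(content_tokens(text))

    def score(sentence):
        tokens = content_tokens(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    return sorted(split_sentences(text), key=score, reverse=True)[:n]


sample = "The grid failed. The grid failed after the storm hit the grid. Birds sing."
top = extractive_topic(sample)
```

Averaging by sentence length keeps long sentences from winning simply because they contain more words, at the cost of favoring short sentences made up of frequent terms.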


Application Information

IPC IPC(8): G06F17/27, G06F16/33, G06F16/35
Inventor 刘迪, 陈静, 崔迎宝, 陈薇, 邱镇, 王腾蛟, 刘园园
Owner STATE GRID INFORMATION & TELECOMM GRP