New event theme extraction method

A new event theme extraction method, applied in the field of network information, that addresses the problems of heavy resource expenditure, dependence on professional expertise, and poor model performance on new data, while keeping the method simple and the text representation accurate.

Active Publication Date: 2020-08-28
QINGDAO UNIV

AI Technical Summary

Problems solved by technology

However, this method still has the following disadvantages: first, this method can only be used for specific domain data sets, and is not suitable for general data sets in various fields; Professiona



Examples


Example Embodiment

[0025] Example:

[0026] In the embodiment of the present invention, the process of realizing new event theme extraction includes the following steps:

[0027] Step 1: Obtain the news event text data stream according to the event keywords, and construct the news event text data set from it. Each record in the data set includes the event type label of the news text and the specific text description of the event. The news event text data set is then divided into a training set Train, a validation set Val, and a test set Test, specifically:

[0028] Step 1.1: Determine the keywords of specific news events according to the acquisition requirements of news event text data;

[0029] Step 1.2: For the determined news event keywords, build a data crawler system based on the Scrapy framework, use it to collect links to news event texts through the Baidu search engine, and obtain the news event text data stream;

[0030] Step 1.3: Standardize the text co...
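The division into Train, Val, and Test described in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_dataset`, the 80/10/10 ratios, the fixed seed, and the record fields `label` and `text` are all assumptions, since the visible text does not state how the split is performed.

```python
import random


def split_dataset(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle records and split them into train / validation / test lists.

    The fractions and the seed are illustrative assumptions; the source
    only says that the data set is divided into Train, Val and Test.
    """
    shuffled = list(records)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


# Hypothetical records: an event type label plus the event text description.
records = [{"label": f"type_{i % 3}", "text": f"news text {i}"} for i in range(20)]
train, val, test = split_dataset(records)
```

Shuffling before slicing keeps the three sets from inheriting the crawl order, which would otherwise group records by keyword or by time.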



Abstract

The invention belongs to the technical field of network information and relates to a new event theme extraction method. The news event text data set is given a vectorized representation based on BERT, so that context is connected more closely and the representation is more accurate. A bidirectional long short-term memory network with an attention mechanism is used to learn from the large volume of news texts on the network in order to discover new events, so that the data is used efficiently and accurately. Combining a supervised method with an unsupervised method is more efficient than using either mode alone. The method is simple and can extract semantic information deeply; it can analyze and mine news texts on the network and discover new events, which helps supervision departments and individual users keep track of new events in real time and facilitates subsequent work.
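To make the attention step concrete, the sketch below pools a sequence of hidden vectors into a single vector using softmax attention weights. It is a generic illustration in pure Python, not the patent's model: the scoring function (a dot product between a context vector and the tanh of each hidden state), the name `attention_pool`, and the toy numbers are assumptions, and in practice the hidden states would come from a bidirectional LSTM run over BERT embeddings.

```python
import math


def attention_pool(hidden_states, context):
    """Collapse a sequence of hidden vectors into one vector.

    score_t = context . tanh(h_t); weights = softmax(scores);
    output  = sum_t weights_t * h_t.
    This scoring form is an assumed, commonly used variant.
    """
    scores = [
        sum(c * math.tanh(x) for c, x in zip(context, h))
        for h in hidden_states
    ]
    # Numerically stable softmax over the time steps.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states, one output value per dimension.
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights


# Toy stand-ins for three time steps of 2-dimensional hidden states.
hidden = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
context = [1.0, 0.5]  # hypothetical learned context vector
pooled, weights = attention_pool(hidden, context)
```

The weights form a probability distribution over time steps, so the pooled vector leans toward the positions the context vector scores highest.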

Description

Technical field:

[0001] The invention belongs to the field of network information technology and relates to a method for extracting new event topics, in particular a method that trains a new event discovery model using a bidirectional long short-term memory network based on BERT and an attention mechanism, and extracts new event topics through multi-feature fusion topic modeling analysis.

Background art:

[0002] With the development of the Internet in the era of big data, people are surrounded by a large amount of news information from a wide range of sources, such as newspapers and the Internet. The most common carrier of news is text, which is the easiest way to obtain valuable information. Because news information is generated by many different sources, the format of news texts and the information they contain are often messy, and the amount of news information generated is extremely large. It is almost impossible to completely rely on manual detection of Chinese news events....


Application Information

IPC(8): G06F16/34, G06F16/33, G06F40/242, G06F40/258, G06F40/289, G06F40/295, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/345, G06F16/3344, G06N3/08, G06N3/045, G06F18/23213
Inventor: 云红艳, 贺英, 张秀华, 李正民
Owner: QINGDAO UNIV