Word vector analysis-based online article belonging event detection method and device

A detection method and word vector technology, applied in network data retrieval, network data indexing, unstructured text data retrieval, etc.

Status: Inactive
Publication Date: 2016-09-28
BEIJING JIAOTONG UNIV +1
2 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

However, as the pace of life accelerates, people have little time to follow current hot events. Because information spreads rapidly on microblogs, negative news also spreads quickly, and there is currently no effective mechanism for detecting the emergence of negative news on the Internet.

Examples

Embodiment 1

[0066] The flow of the method provided by this embodiment, which detects the event to which a network article belongs based on word vector analysis, is shown in Figure 1. The method includes the following steps:

[0067] Step S110: Establish a training set with event labels;

[0068] Collect network article samples from the network using web crawler technology and assemble them into a training set. A set number of users then label the event to which each network article sample belongs. If more than a set proportion of those users give inconsistent labels for a sample, that sample is removed from the training set. The result is an optimized typical training set in which every network article sample carries a corresponding event label.

[0069] For example, let 7 users label the event of each network article sample; if more than 3 users...
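The filtering in step S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `filter_by_agreement`, the parameter `max_disagree_ratio`, the threshold of 1/3, and the toy labels are all hypothetical, and the exact rule in the truncated example above is not reproduced. The sketch assumes that "inconsistent" means "differs from the majority label".

```python
from collections import Counter
from fractions import Fraction

def filter_by_agreement(samples, max_disagree_ratio):
    """Keep samples whose annotators mostly agree and tag each with the majority label.

    samples: list of (text, [one label per annotator]) pairs.
    max_disagree_ratio: hypothetical threshold; a sample is dropped when the
    share of annotators who differ from the majority label exceeds it.
    """
    kept = []
    for text, labels in samples:
        majority_label, majority_count = Counter(labels).most_common(1)[0]
        # Exact rational arithmetic avoids float rounding at the threshold.
        disagree_ratio = Fraction(len(labels) - majority_count, len(labels))
        if disagree_ratio <= max_disagree_ratio:
            kept.append((text, majority_label))
    return kept

# Toy data: 7 annotators per sample, labels are made up for illustration.
raw_samples = [
    ("article A", ["flood"] * 6 + ["strike"]),                     # 1 of 7 disagree
    ("article B", ["flood"] * 3 + ["strike"] * 2 + ["fire"] * 2),  # 4 of 7 disagree
]
training_set = filter_by_agreement(raw_samples, max_disagree_ratio=Fraction(1, 3))
print(training_set)
```

Only "article A" survives here, because four of the seven annotators of "article B" disagree with its most common label.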

Embodiment 2

[0095] This embodiment provides a device for detecting the event to which a network article belongs, based on word vector analysis. The specific structure of the device is shown in Figure 4 and includes:

[0096] A typical training set building module 41, used to build a typical training set from network article samples with event labels;

[0097] A normalized network article sample text acquisition module 42, used to preprocess each network article sample in the typical training set by word segmentation and useless-word removal, obtaining normalized network article sample text;

[0098] A multi-dimensional word vector acquisition module 43, used to extract features from each normalized network article sample text using the word2vec algorithm and the LDA algorithm, and to fuse the extracted word2vec features and LDA features to obtain the multi-dimensional word vector corresponding to each network article samp...
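The work of modules 42 and 43 might look roughly like the sketch below. Several things here are assumptions rather than details from the source: regex tokenization stands in for a real word segmenter, the document-level word2vec feature is taken as the mean of the word vectors, and fusion is done by simple concatenation. The stop-word list, the 2-dimensional embeddings, and the 3-topic distribution are toy stand-ins for trained word2vec and LDA models.

```python
import re

# Hypothetical stop-word ("useless word") list; a real one would be far larger.
STOP_WORDS = {"the", "a", "of", "in", "and", "is"}

def preprocess(text):
    """Split text into tokens and drop stop words.

    Regex tokenization is a placeholder for a proper word segmenter,
    which the source does not name.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def mean_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens (all zeros if none are known)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]

def fuse(word2vec_part, lda_part):
    """Fuse the two feature sets by concatenation (an assumption)."""
    return list(word2vec_part) + list(lda_part)

# Toy stand-ins for trained model outputs.
toy_embeddings = {"flood": [1.0, 0.0], "river": [0.0, 1.0]}
toy_topic_distribution = [0.5, 0.25, 0.25]

tokens = preprocess("The flood of the river")
features = fuse(mean_vector(tokens, toy_embeddings, dim=2), toy_topic_distribution)
print(tokens)
print(features)
```

Concatenation keeps both feature sets intact and leaves it to the downstream classifier to weigh them; other fusion schemes would fit the same interface.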

Abstract

Embodiments of the invention provide a method and device, based on word vector analysis, for detecting the event to which an online article belongs. The method mainly comprises the following steps: establishing a typical training set; preprocessing each online article sample in the typical training set, including word segmentation and useless-word removal, to obtain normalized online article sample texts; extracting features from each normalized online article sample text using the word2vec algorithm and the LDA algorithm to obtain a multi-dimensional word vector corresponding to each sample text; inputting the multi-dimensional word vector and event label of each sample text into a random forest algorithm, which outputs a classification model for events; and applying the classification model to online article texts to be recognized in order to determine the events to which they belong. The method and device make full use of the information in the online text samples and improve the accuracy of classifying the events to which they belong.
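The final training and recognition steps can be illustrated with a short sketch. It assumes scikit-learn's `RandomForestClassifier` as the random forest implementation, which the source does not specify. The feature vectors, event labels, and hyperparameters are made-up toy values that stand in for fused word2vec and LDA features.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy fused feature vectors (hypothetical values) with their event labels.
X_train = [
    [0.9, 0.1, 0.8, 0.1, 0.1],
    [0.8, 0.2, 0.7, 0.1, 0.2],
    [0.1, 0.9, 0.1, 0.1, 0.8],
    [0.2, 0.8, 0.2, 0.1, 0.7],
]
y_train = ["flood", "flood", "strike", "strike"]

# Train the event classification model; the settings are illustrative only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Classify the feature vector of an article whose event is unknown.
new_article_features = [[0.85, 0.15, 0.75, 0.1, 0.15]]
predicted_event = model.predict(new_article_features)[0]
print(predicted_event)
```

In practice the training matrix would hold one fused vector per article in the typical training set, and the same preprocessing and feature extraction would be applied to each article to be recognized before calling `predict`.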

Description

Technical field

[0001] The present invention relates to the technical field of network article event detection, and in particular to a method and device for detecting network article events based on word vector analysis.

Background

[0002] With the rapid development of the Internet, and especially the growing popularity of Weibo, we can easily share the events we know of, hear, and see across regions. However, as the pace of life accelerates, people have little time to follow current hot events. Because information spreads rapidly on Weibo, negative news also spreads quickly, and there is currently no effective mechanism for detecting the emergence of negative news on the Internet. An effective Internet event detection system is therefore of great significance, both for helping people understand network events quickly and for containing the spread of negative news early.

[0003] In recent years, microblogging has become more and more popular, an...

Application Information

IPC(8) G06F17/30, G06F17/27
CPC G06F16/35, G06F16/951, G06F40/205
Inventor 郎丛妍, 于兆鹏, 何伟明, 王涛, 冯松鹤, 杜雪涛, 杜刚, 张晨
Owner BEIJING JIAOTONG UNIV