Method and device for detecting the event to which an online article belongs, based on word vector analysis
A detection method and word vector technology, applied in network data retrieval, network data indexing, unstructured text data retrieval, etc.
Examples
Embodiment 1
[0066] This embodiment of the present invention provides a method for detecting the event to which a network article belongs, based on word vector analysis. As shown in Figure 1, the method includes the following steps:
[0067] Step S110: Establish a training set with event labels;
[0068] Network article samples with event tags are collected from the network using web crawler technology, and all collected samples form a training set. A set number of users label the event to which each network article sample belongs. If more than a set ratio of users give inconsistent labels for a sample, that sample is removed from the training set. The result is an optimized typical training set in which each network article sample is labeled with a corresponding event label.
[0069] For example, have 7 users label the event of each network article sample; if more than 3 users...
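The filtering in Step S110 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_typical_training_set`, the parameter `max_disagreement_ratio`, and the sample data are hypothetical, and the rule "disagreement means differing from the majority label" is an assumption, since the text does not define how inconsistency is measured.

```python
from collections import Counter
from fractions import Fraction


def build_typical_training_set(annotations, max_disagreement_ratio):
    """Keep samples whose annotators mostly agree, labeled with the majority event.

    annotations: dict mapping sample id -> list of event labels, one per annotator.
    max_disagreement_ratio: a sample is dropped when the share of annotators
        who differ from the majority label exceeds this value (assumed rule).
    """
    training_set = {}
    for sample_id, labels in annotations.items():
        if not labels:
            continue  # no annotations, so no label can be assigned
        majority_label, majority_count = Counter(labels).most_common(1)[0]
        # Fraction avoids floating-point surprises at the exact threshold.
        disagreement = Fraction(len(labels) - majority_count, len(labels))
        if disagreement <= max_disagreement_ratio:
            training_set[sample_id] = majority_label
    return training_set


# Hypothetical annotations from 7 users per article.
annotations = {
    "article_a": ["quake"] * 6 + ["flood"],                      # 1 of 7 differ
    "article_b": ["quake"] * 3 + ["flood"] * 2 + ["fire"] * 2,   # 4 of 7 differ
}
typical = build_typical_training_set(annotations, Fraction(3, 7))
print(typical)
```

With a threshold of 3/7, `article_a` is kept with its majority label and `article_b` is discarded. Using an exact ratio rather than a float keeps the boundary case (exactly 3 of 7 differing) on the intended side of the comparison.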
Embodiment 2
[0095] This embodiment provides a device for detecting the event to which a network article belongs, based on word vector analysis. As shown in Figure 4, the device includes:
[0096] A typical training set building module 41, used to build a typical training set from network article samples with event tags;
[0097] A normalized network article sample text acquisition module 42, used to preprocess each network article sample in the typical training set by word segmentation and removal of useless words, obtaining a normalized network article sample text;
[0098] A multi-dimensional word vector acquisition module 43, used to extract features from each normalized network article sample text using the word2vec algorithm and the LDA algorithm, and to fuse the extracted word2vec features and LDA features to obtain the multi-dimensional word vector corresponding to each network article samp...
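The preprocessing and fusion performed by modules 42 and 43 might look like the sketch below. Several things here are assumptions rather than details from the text: the fusion is shown as L2-normalizing each feature vector and concatenating them (the text does not say how the features are fused), the stop-word list is a tiny hypothetical one, and the regex tokenizer only suits whitespace-delimited text (Chinese articles would need a dedicated word segmenter). The numeric vectors are placeholders standing in for real word2vec and LDA outputs.

```python
import math
import re

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}


def normalize_text(text, stop_words=STOP_WORDS):
    """Split text into lowercase tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_features(word2vec_features, lda_features):
    """Fuse two feature vectors by normalizing each and concatenating them
    (an assumed fusion rule, chosen for illustration)."""
    return l2_normalize(word2vec_features) + l2_normalize(lda_features)


tokens = normalize_text("The earthquake in the city damaged a bridge")
doc_w2v = [3.0, 4.0]        # placeholder for an averaged word2vec vector
doc_lda = [0.2, 0.8, 0.0]   # placeholder for an LDA topic distribution
fused = fuse_features(doc_w2v, doc_lda)
print(tokens)
print(len(fused))
```

Normalizing each part before concatenation keeps one feature family from dominating the other purely because of scale. The fused vector's length is the sum of the two input dimensions.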