News event mining method and device, computer equipment and storage medium

A computer and news technology, applied in the field of news event mining methods, devices, computer equipment and storage media, that addresses problems such as low mining accuracy and the failure to consider contextual relations between words.

Inactive Publication Date: 2018-06-15
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, the same word may have different meanings in different contexts, and sentences containing the same words may have different meanings when the word order differs. The BOW model represents a text as a vector based on which distinct words the text contains and how many times each word occurs, without considering the contextual relationships between the words in the text. As a result, the mining accuracy of news text mining methods based on the BOW model is low.
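The word-order limitation can be illustrated with a minimal sketch. It assumes whitespace tokenization of English sentences (the source describes Chinese word segmentation instead), and the helper name `bow_vector` is hypothetical.

```python
from collections import Counter


def bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two sentences built from the same words but with opposite meanings.
sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Shared vocabulary, sorted so that vector positions are stable.
vocabulary = sorted(set(sentence_a.split()) | set(sentence_b.split()))

vector_a = bow_vector(sentence_a, vocabulary)
vector_b = bow_vector(sentence_b, vocabulary)

print(vocabulary)            # ['bit', 'dog', 'man', 'the']
print(vector_a)              # [1, 1, 1, 2]
print(vector_b)              # [1, 1, 1, 2]
print(vector_a == vector_b)  # True: word order is lost
```

Because both sentences map to the same count vector, any clustering built only on these vectors cannot tell them apart.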


Examples

Embodiment Construction

[0038] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0039] The news event mining method, device, computer equipment and storage medium of the embodiments of the present invention will be described below with reference to the accompanying drawings.

[0040] In the Chinese language, the same word can express different meanings in different contexts, and different words can express the same meaning; that is, Chinese vocabulary exhibits both polysemy and synonymy. For example, "apple" can mean either a fruit or a technology company, and two different Chinese words (both rendered here as "taxi") both mean taxi.

[004...

Abstract

The invention discloses a news event mining method and device, computer equipment and a storage medium. The method comprises: acquiring a plurality of news texts; for each news text, obtaining a semantic vector of the news text according to the semantic vector of each word in the news text, wherein the semantic vector of each word carries semantic information of its context; clustering the plurality of news texts according to the similarity among their semantic vectors to obtain a plurality of clusters; determining a target cluster from the plurality of clusters according to the release time distribution of the news texts contained in each cluster; and determining the required news events according to the subject of the news texts contained in the target cluster. In this way, the content of the news texts can be effectively understood and represented as semantic vectors, and because the word vectors carry contextual semantic information and the contextual correlation among words is taken into account, mining accuracy can be improved.
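The steps in the abstract can be sketched end to end as follows. This is only an illustration under several assumptions not stated in the source: the word vectors are hand-written toy values (a real system would obtain context-aware vectors from a trained model), text vectors are formed by averaging, clustering is a greedy single-pass scheme with a cosine threshold, the target cluster is the one with the most texts released on a single day, and the subject is approximated by the most frequent word. All function names and parameters are hypothetical.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical toy word vectors standing in for context-aware embeddings.
WORD_VECTORS = {
    "earthquake": [0.9, 0.1], "rescue": [0.8, 0.2], "tremor": [0.85, 0.15],
    "football": [0.1, 0.9], "match": [0.2, 0.8], "goal": [0.15, 0.85],
}


def text_vector(words):
    """Average the word vectors of a text (assumed pooling rule)."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_texts(texts, threshold=0.95):
    """Greedy clustering: join the first cluster whose seed text is similar enough."""
    clusters = []
    for text in texts:
        for cluster in clusters:
            if cosine(text["vector"], cluster[0]["vector"]) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def pick_target_cluster(clusters):
    """Pick the cluster with the most texts released on one day (assumed rule)."""
    def peak(cluster):
        return max(Counter(t["released"] for t in cluster).values())
    return max(clusters, key=peak)


def cluster_topic(cluster):
    """Use the most frequent word as a stand-in for the cluster's subject."""
    counts = Counter(w for t in cluster for w in t["words"])
    return counts.most_common(1)[0][0]


news = [
    {"words": ["earthquake", "rescue"], "released": date(2018, 1, 5)},
    {"words": ["earthquake", "tremor"], "released": date(2018, 1, 5)},
    {"words": ["earthquake", "rescue", "tremor"], "released": date(2018, 1, 5)},
    {"words": ["football", "match"], "released": date(2018, 1, 2)},
    {"words": ["football", "goal"], "released": date(2018, 1, 9)},
]
for item in news:
    item["vector"] = text_vector(item["words"])

clusters = cluster_texts(news)
target = pick_target_cluster(clusters)
event = cluster_topic(target)
print(len(clusters), event)  # 2 earthquake
```

In this toy data the three earthquake reports share a release date, so their cluster is selected, while the two football reports are spread over different days.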

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a news event mining method, device, computer equipment and storage medium.

Background

[0002] Text mining can summarize, classify, cluster, and correlate the content of document collections. Mining news texts facilitates the automatic retrieval of news content and meets the needs of network users who search for news information.

[0003] The traditional news text mining method is based on the Bag of Words (BOW) model. After data preprocessing such as Chinese word segmentation and stop word removal is performed on the news text, the content of the preprocessed news text is analyzed using the BOW model. A clustering algorithm is then used to cluster the texts, and the required news events are obtained from the clustering results.

[0004] However, because the same word may have different meanings in different contexts, sentences containing the same word h...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/9535
Inventor: 徐敏, 王佳, 黄涛
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD