Method, device and storage medium for automatic construction of event corpus based on dual mode

An automatic corpus construction technology, applied in the field of data processing, that addresses problems such as low accuracy, incomplete coverage, and high cost, and achieves improved accuracy, lower cost, and more complete content.

Active Publication Date: 2022-07-05
EAST CHINA UNIV OF SCI & TECH +1
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] The inventors found at least the following problems in the prior art: when most news theme event corpora are constructed, experts must manually label the news information related to the theme events, which is both inefficient and costly. Moreover, a theme event in the news generally has many related sub-theme events, so it is difficult to collect all relevant event texts during manual labeling, which leaves the corpus incomplete in coverage and low in accuracy.

Detailed description of the embodiments

[0030] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are set forth in the embodiments to help the reader better understand the present application. The technical solutions claimed in the present application can nevertheless be realized without these technical details, and with various changes and modifications based on the following embodiments.

[0031] The first embodiment of the present invention relates to a method for automatically constructing an event corpus based on a dual mode. The specific process is shown in Figure 1 and is as follows:

[0032] Step 101: Obtain a first theme event keyword input by a user.

[0033] Wherein, in thi...

Abstract

The embodiments of the present invention relate to the field of data processing, and disclose a method, a device, and a storage medium for automatically constructing an event corpus based on a dual mode. The method includes: acquiring a first theme event keyword input by a user; performing retrieval according to the first theme event keyword to obtain a first theme event corpus, and expanding the first theme event corpus to obtain a second theme event corpus; and obtaining a third theme event corpus according to the correlation between the second theme event corpus and the theme, the third theme event corpus constituting the corpus. With this automatic construction method, experts are not needed to label news information related to theme events, which improves the efficiency of constructing the corpus and saves labor costs. Moreover, all relevant event texts can be collected automatically, making the corpus more complete and more accurate.
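The three stages named in the abstract (retrieve by keyword, expand, filter by correlation with the theme) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration, not the patented method: the in-memory `Doc` list, expansion by the most frequent co-occurring tokens, the term-overlap `relevance` score, and the `threshold` value are all hypothetical, and the sketch does not attempt to model what the patent means by "dual mode".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return {t for t in tokens if t}


def retrieve(docs: list[Doc], terms: set[str]) -> list[Doc]:
    """Keep documents that mention at least one of the given terms."""
    return [d for d in docs if tokenize(d.text) & terms]


def expand_terms(seed_docs: list[Doc], terms: set[str], top_n: int = 2) -> set[str]:
    """Assumed heuristic: add the top_n tokens that co-occur most often
    with the seed terms (ties broken alphabetically)."""
    counts = Counter()
    for d in seed_docs:
        counts.update(tokenize(d.text) - terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return terms | {word for word, _ in ranked[:top_n]}


def relevance(doc: Doc, terms: set[str]) -> float:
    """Assumed score: fraction of theme terms that appear in the document."""
    return len(tokenize(doc.text) & terms) / len(terms) if terms else 0.0


def build_corpus(docs: list[Doc], keywords: list[str], threshold: float = 0.5):
    seed = {k.lower() for k in keywords}
    first = retrieve(docs, seed)                # first theme event corpus
    terms = expand_terms(first, seed)
    second = retrieve(docs, terms)              # expanded (second) corpus
    third = [d for d in second if relevance(d, terms) >= threshold]
    return first, second, third                 # third forms the final corpus
```

In this sketch the expansion step widens recall by pulling in documents that never mention the user's keyword, and the final filter removes those that only share a single generic term with the theme.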

Description

Technical field

[0001] Embodiments of the present invention relate to the field of data processing, and in particular, to a method, device and storage medium for automatically constructing an event corpus based on a dual mode.

Background technique

[0002] In recent years, with the rapid development of network technology, Internet data has become the main source for people to obtain information due to the advantages of rapid updating, wide range, and easy access. According to statistics, the vast majority of network data is stored in the form of text, recording a large number of news events, and these news events often revolve around a certain theme. In the era of big data, all news events related to a certain topic are filtered out from massive data, and a corpus of news topic events is constructed, which is helpful for the mining and analysis of news events.

[0003] The inventor found that there are at least the following problems in the prior art: when constructing most...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/295, G06F40/30
CPC: G06F40/295, G06F40/30
Inventors: 过弋, 王志宏
Owner: EAST CHINA UNIV OF SCI & TECH