News automatic derivation association mechanism method and system

An automatic news derivation technology, applied in the Internet field, addresses three problems: the inability to effectively remove duplicate or redundant news and events, the low accuracy of event extraction, and the difficulty of event extraction. Its effects are to remove duplicate or redundant news and events, improve the accuracy of event extraction, and reduce the demands placed on syntactic analysis.

Inactive Publication Date: 2020-05-19
南京思通聚宝信息技术有限公司

AI Technical Summary

Problems solved by technology

For example, a single piece of news can involve multiple people, events, places, and institutions. As the amount of news information grows explosively over time, it is impossible to manually analyze every event, topic, and news item in depth or to uncover the implicit relationships among them. Some events with a long span are difficult to extract. Existing event extraction based on syntactic analysis places high demands on that analysis and has a low recall rate. At the same time, a single dependency parse has limited accuracy, and repeated or redundant news and events cannot be removed effectively, so the accuracy of event extraction is low.



Examples


Embodiment 1

[0053] Referring to Figure 1, a method for automatically deriving a news association mechanism comprises the following steps:

[0054] S1. Establishment of the basic corpus: set the data source collection scope for news information according to ongoing events, build a basic corpus, and collect information based on the current basic corpus;

[0055] S2. News information collection: use Internet data collection tools to collect the corresponding news sentences from news media, financial media, and financial institutions according to the basic corpus established in S1;

[0056] S3. News information analysis: after a large amount of news information data has been obtained through S2, perform text analysis on the collected news sentences;

[0057] S4. News information identification: identify multiple entity units in the news information after text analysis, and mark the identified entity units;

[0058] S5. News information association: judge and analyze t...
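
Steps S4 and S5 can be illustrated with a minimal sketch. The source does not say how entity units are identified or how associations are judged (the text of S5 is truncated above), so everything here is an assumption: a hypothetical dictionary lookup stands in for entity identification, and same-sentence co-occurrence counting stands in for association. The names GAZETTEER, tag_entities, and associate are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer (assumption): entity surface form -> entity type.
GAZETTEER = {
    "Acme Bank": "institution",
    "Jane Doe": "person",
    "Nanjing": "place",
}

def tag_entities(sentence):
    """S4 stand-in: return sorted (entity, type) pairs found by substring match."""
    return sorted((name, kind) for name, kind in GAZETTEER.items() if name in sentence)

def associate(sentences):
    """S5 stand-in: count how often two marked entities share a sentence."""
    pairs = Counter()
    for sentence in sentences:
        names = [name for name, _ in tag_entities(sentence)]
        # names is sorted, so each pair is always stored in the same order.
        for a, b in combinations(names, 2):
            pairs[(a, b)] += 1
    return pairs

news = [
    "Jane Doe was appointed chair of Acme Bank.",
    "Acme Bank opened a branch in Nanjing.",
    "Jane Doe visited the Acme Bank office in Nanjing.",
]
links = associate(news)
```

In this sketch, pairs with higher counts would be treated as more strongly associated. A real system would likely replace the substring lookup with a trained entity recognizer.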

Embodiment 2

[0061] Referring to Figure 2, another preferred embodiment of the present invention differs from Embodiment 1 as follows:

[0062] The formation of the basic corpus in step S1 includes the following steps:

[0063] S11. Build a learning model according to the basic corpus to be formed;

[0064] S12. Use the learning model to collect the corresponding news sentences within the collection scope;

[0065] S13. Analyze the collected news sentences to obtain an analysis result;

[0066] S14. Conduct attitude analysis on the analysis result to determine whether each news sentence has corpus value; if so, add the sentence to the basic corpus, thereby completing the formation of the basic corpus.
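
The filtering loop of S13 and S14 can be sketched as follows. The source does not define the learning model or the attitude analysis, so this sketch assumes a tiny hand-written polarity lexicon and a score threshold purely for illustration. LEXICON, analyze, attitude_score, build_corpus, and the threshold value are all hypothetical.

```python
# Hypothetical polarity lexicon (assumption) standing in for the learning model of S11.
LEXICON = {"surge": 1, "profit": 1, "growth": 1, "loss": -1, "fraud": -1, "default": -1}

def analyze(sentence):
    """S13 stand-in: lowercase tokens with surrounding punctuation stripped."""
    return [tok.strip(".,;:!?").lower() for tok in sentence.split()]

def attitude_score(tokens):
    """S14 stand-in: sum of lexicon polarities over the tokens."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)

def build_corpus(candidates, threshold=1):
    """Keep sentences whose absolute attitude score reaches the threshold."""
    corpus = []
    for sentence in candidates:
        if abs(attitude_score(analyze(sentence))) >= threshold:
            corpus.append(sentence)
    return corpus

candidates = [
    "Quarterly profit shows strong growth.",
    "The meeting was held on Tuesday.",
    "Regulators allege fraud at the lender.",
]
corpus = build_corpus(candidates)
```

Here the neutral second sentence is judged to have no corpus value and is dropped, while the first and third are added to the corpus.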

Embodiment 3

[0068] As another preferred embodiment of the present invention, the difference from Embodiment 1 is that the data collection performed by the Internet data collection tool in step S2 includes: predetermined initial crawl seed samples; a predetermined web page category directory and the seed samples corresponding to each category; data samples captured, displayed, and marked by simulating a user's browsing process; and search-style data capture from large vertical websites using preset keywords.
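
One way to picture how these collection inputs might combine is a small configuration object that expands into an initial fetch queue. This is only a sketch under stated assumptions: the CollectionPlan fields, the example.com URLs, and the "q" query parameter are hypothetical, and the simulated-browsing capture mode is not modeled.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode

@dataclass
class CollectionPlan:
    seeds: list = field(default_factory=list)           # initial crawl seeds
    category_seeds: dict = field(default_factory=dict)  # category -> seed URLs
    search_sites: list = field(default_factory=list)    # search endpoints of vertical sites
    keywords: list = field(default_factory=list)        # preset keywords

def initial_queue(plan):
    """Return de-duplicated (source, url) fetch tasks in a stable order."""
    tasks = [("seed", url) for url in plan.seeds]
    for category, urls in plan.category_seeds.items():
        tasks += [("category:" + category, url) for url in urls]
    for site in plan.search_sites:
        for kw in plan.keywords:
            # "q" is an assumed query parameter name, not one given by the source.
            tasks.append(("search", site + "?" + urlencode({"q": kw})))
    seen, queue = set(), []
    for source, url in tasks:
        if url not in seen:  # keep only the first task for each URL
            seen.add(url)
            queue.append((source, url))
    return queue

plan = CollectionPlan(
    seeds=["https://news.example.com/"],
    category_seeds={"finance": ["https://news.example.com/finance",
                                "https://news.example.com/"]},
    search_sites=["https://vertical.example.com/search"],
    keywords=["bond default"],
)
queue = initial_queue(plan)
```

Keeping the source label next to each URL makes it possible to trace a collected sentence back to the seed, category, or keyword that produced it.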


Abstract

The invention discloses a method for automatically deriving a news association mechanism, which comprises the following steps. Establishing a basic corpus: setting a data source collection scope for news information according to ongoing events, and establishing the basic corpus. News information collection: using an Internet data collection tool to collect the corresponding news sentences. News information analysis: performing text analysis on the collected news sentences after a large amount of news information data has been obtained. News information identification: identifying a plurality of entity units in the news information after text analysis, and marking them. News information association: associating the marked entity units. News result derivation: deriving a final news event according to the established association relationships. The invention further provides a system for automatically deriving a news association mechanism. The method addresses the difficulty of extracting some events with relatively long spans and increases the recall rate of event extraction; at the same time, it improves the accuracy of event extraction.

Description

Technical field

[0001] The invention belongs to the field of Internet technology, and in particular relates to a method and system for automatically deriving a news association mechanism.

Background

[0002] Valuable clues in news media, financial media, and financial institutions cannot be mined by hand, because manual methods cannot penetrate and analyze the direct, joint, and related relationships across an entire event. For example, a single piece of news can involve multiple people, events, places, and institutions. As the amount of news information grows explosively over time, it is impossible to manually analyze every event, topic, and news item in depth or to uncover the implicit relationships among them. Some events with a long span are difficult to extract. Existing event extraction based on syntactic analysis places high demands on that analysis and has a low recall rate; at the same time, there is a probl...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/9535, G06F40/117, G06F40/205, G06F40/289
CPC: G06F16/367, G06F16/9535
Inventor: 黄毅, 王涛, 王义
Owner: 南京思通聚宝信息技术有限公司