Mass data-based causal group extraction method and system, and computer readable storage medium

An extraction method and mass data technology, applied in reasoning methods, computer components, calculations, etc., to achieve high reliability, improve accuracy, and reduce noise data

Pending Publication Date: 2022-06-28

GUANGZHOU DATASTORY INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

The prior art does not specifically optimize the accuracy of the extracted causal relationships or reduce their redundancy.

Examples

Embodiment 1

[0036] As shown in Figure 1, a first aspect of the present invention provides a method for extracting causal groups based on massive data, comprising the following steps:

[0037] S1: Obtain network texts and store them separately by time period;

[0038] It should be noted that the network text obtained in the embodiments of the present application is content published on the Internet. The text is stored in the Hadoop distributed storage system HDFS.

[0039] S2: uniformly sample the acquired network text to obtain a sample set and pre-label the sample set;

[0040] The obtained network text is uniformly sampled to obtain a sample set. The sample set should not be too small; in a specific implementation, it can contain about 10,000 samples. The sample set is then pre-labeled: keyword and regular-expression matching is used to make a first-pass mark of whether each sample contains a causal relationship. For example, reg...
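The sampling and pre-labeling step in S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes "uniform" means simple random sampling without replacement, and the cue-word pattern `CAUSAL_CUES`, the function names, and the toy corpus are all hypothetical, since the source's own keyword and regular-expression examples are truncated.

```python
import random
import re

# Hypothetical causal cue words; the patent's actual keyword list is not shown.
CAUSAL_CUES = re.compile(
    r"\b(because|due to|leads? to|results? in|causes?|caused by)\b",
    re.IGNORECASE,
)

def uniform_sample(texts, k, seed=0):
    """Draw up to k texts uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(texts, min(k, len(texts)))

def pre_label(sample):
    """First-pass label: True if any causal cue matches the text."""
    return [(text, bool(CAUSAL_CUES.search(text))) for text in sample]

# Toy corpus standing in for the stored network text.
corpus = [
    "The fire was caused by a wire short circuit.",
    "Down jacket sales rose because a cold wave arrived.",
    "The store opens at nine.",
    "Heavy rain leads to flooding in low areas.",
]
labeled = pre_label(uniform_sample(corpus, k=4))
```

A pre-label of this kind only marks candidates; the text still describes a later manual labeling pass before model training.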

Embodiment 2

[0059] A second aspect of the present invention provides a system for extracting causal groups based on massive data. The system includes a memory and a processor, wherein the memory stores a program implementing the method for extracting causal groups based on massive data. When the program is executed by the processor, the following steps are implemented:

[0060] S1: Obtain network texts and store them separately by time period;

[0061] It should be noted that the network text obtained in the embodiments of the present application is content published on the Internet. The text is stored in the Hadoop distributed storage system HDFS.

[0062] S2: uniformly sample the acquired network text to obtain a sample set and pre-label the sample set;

[0063] The obtained network text is uniformly sampled to obtain a sample set. The sample set should not be too small; in a specific implementation, the number of samples can be about 10...

Embodiment 3

[0084] This embodiment illustrates the method of the present invention by processing specific triples. For example, in a specific embodiment, the above-mentioned BERT+CRF model is used to perform causal extraction on network text to obtain the following triples:

[0085] [("Cool down by 5 degrees today", 0.55, "It will rain tomorrow"), ("Cold wave is coming", 0.9, "Down jacket sales increase"), ("Wire short circuit", 0.7, "Fire broke out"), ("Double Eleven is Coming", 0.65, "The manufacturer's down jacket is out of stock"), ("Cable short circuit", 0.55, "Cause a fire"), ("Cold air goes south", 0.85, "Down jacket hot sale"), ("Temperature Dip", 0.55, "Down jacket in short supply"), ("Aging line", 0.9, "Fire hazard"), ("Down jacket in short supply", 0.55, "Temperature drop")]

[0086] Calculate the semantic vector of each of the above triples, compute the cosine distance between the semantic vectors, and use the cosine distance (cosine distance = 1 - cosine similarity) as the metric to clus...
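The distance metric named in [0086] can be illustrated with a short sketch. The three-dimensional vectors below are hypothetical stand-ins for real semantic vectors (the source does not say how they are computed), and the helper names are assumptions. Only the formula cosine distance = 1 - cosine similarity comes from the text.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy vectors (hypothetical values) keyed by a readable form of each triple.
vectors = {
    "Wire short circuit -> Fire broke out": [0.9, 0.1, 0.0],
    "Cable short circuit -> Cause a fire": [0.8, 0.2, 0.0],
    "Cold wave is coming -> Down jacket sales increase": [0.0, 0.1, 0.9],
}

def pairwise_distances(named_vectors):
    """Cosine distance for every unordered pair of named vectors."""
    names = list(named_vectors)
    return {
        (a, b): cosine_distance(named_vectors[a], named_vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

distances = pairwise_distances(vectors)
```

With these toy values the two fire-related triples end up close together and far from the cold-wave triple, which is the property a clustering step would rely on. The resulting distances could be passed to whichever clustering algorithm is chosen.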

Abstract

The invention discloses a causal group extraction method and system based on mass data, and a computer-readable storage medium. The method comprises the following steps: acquiring web text and storing it by time period; uniformly sampling the acquired web text to obtain a sample set, and pre-labeling the sample set; performing event labeling in BIO format and causal relationship labeling on the pre-labeled sample set; training a BERT+CRF model on the labeled data; performing causal extraction on the stored web text with the BERT+CRF model and forming triples in a preset format; clustering the triples with a clustering algorithm to obtain causal groups; and selecting and reducing the obtained causal groups and storing the reduced causal groups. The method improves the accuracy of causality extraction, reduces noise data, redundant data, and isolated data in the extraction result, and has relatively high reliability.
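To make the BIO labeling step concrete, the sketch below decodes a BIO tag sequence into labeled event spans. It is an illustration only: the tag names `B-CAUSE`, `I-CAUSE`, `B-EFFECT`, and `I-EFFECT`, the function name, and the whitespace tokenization are assumptions, as the source does not show its tag set or how the BERT+CRF output is turned into triples.

```python
def decode_bio(tokens, tags):
    """Collect contiguous B-/I- runs into (label, text) spans."""
    spans, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hypothetical tagged sentence in the style of the patent's examples.
tokens = ["Cold", "wave", "is", "coming", ",",
          "down", "jacket", "sales", "increase"]
tags = ["B-CAUSE", "I-CAUSE", "I-CAUSE", "I-CAUSE", "O",
        "B-EFFECT", "I-EFFECT", "I-EFFECT", "I-EFFECT"]
spans = decode_bio(tokens, tags)
```

A cause span and an effect span decoded this way could then be paired, together with whatever score the preset format calls for, to form one triple.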

Description

Technical field

[0001] The invention belongs to the technical field of event graphs in artificial intelligence natural language processing and, more particularly, relates to a method, system, and computer-readable storage medium for extracting causal event groups based on massive data.

Background art

[0002] Traditional causality extraction schemes mainly consider the extraction of events that contain causal relationships and pay little attention to optimizing the accuracy and redundancy of the extracted causal relationships. Existing rule-based or statistical rule-based methods usually need causal relation words to discover causal relations, so they cannot readily discover hidden causal relations. In contrast, the deep learning-based method adopted in this scheme uses the language model BERT, pre-trained on a large-scale corpus, so it can to a certain extent mine causality through semantic and contextual reasoning.

[0003] The prior art discloses a meth...

Application Information

IPC(8): G06N5/04, G06F40/30, G06K9/62

CPC: G06N5/04, G06F40/30, G06F18/22, G06F18/23213

Inventor: 杨俊波, 何宇轩, 牟昊, 李旭日, 徐亚波

Owner GUANGZHOU DATASTORY INFORMATION TECH CO LTD
