Entity coreference resolution method based on similarity

A similarity-based coreference resolution technique applied in the field of data fusion. It addresses duplicate names and synonyms in entity coreference resolution and an incomplete data quality assessment system, and it is described as practical, easy to popularize, and effective in processing.

Status: Inactive; Publication Date: 2017-01-25
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, although existing methods can effectively identify entities in many applications, they still have several deficiencies: (1) entity coreference resolution currently suffers from duplicate names and from differing names for the same entity; (2) traditional entity coreference resolution methods often obtain their results from similarity comparison of tuples; (3) the system for data quality evaluation is incomplete.

Examples

Embodiment

[0029] Embodiment: a similarity-based entity coreference resolution method which, with reference to Figure 1, includes the following steps:

[0030] Step 1: Preprocessing.

[0031] Treat the object that each data record describes as the entity. Preprocess the data in the data set: for each record, select k fields as the key and use the entire record as the value, forming a key-value pair. For example, given the multiple data records shown in the following table, select the four fields product name, product ID, product price, and color as the key, and the complete record as the value.

[0032]

[0033]
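
The key-value preprocessing of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field identifiers in `KEY_FIELDS`, the function name `to_key_value`, and the sample record are assumptions made for the example (the source names only the four fields and a record numbered 0201; the other values are invented).

```python
from typing import Any, Dict, Sequence, Tuple

# Assumed field identifiers for the four key fields named in the text (k = 4).
KEY_FIELDS = ("product_name", "product_id", "price", "color")


def to_key_value(
    record: Dict[str, Any], key_fields: Sequence[str] = KEY_FIELDS
) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    """Return (key, value): the key is the tuple of selected fields,
    the value is the complete, unchanged record."""
    key = tuple(record.get(field) for field in key_fields)
    return key, record


# Illustrative record only; every value except the number 0201 is made up.
sample_record = {
    "product_id": "0201",
    "product_name": "desk lamp",
    "price": 19.9,
    "color": "white",
    "stock": 12,
}

key, value = to_key_value(sample_record)
print(key)
```

Using a tuple as the key keeps it hashable, so the pairs can be stored in a dictionary or grouped by key in a later step.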

[0034] Step 2: Calculate the Cartesian product of the data set, that is, pair every record with every other record to form data pairs. For example, four records A, B, C, and D form six data pairs: AB, AC, AD, BC, BD, and CD. Referring to Table 1, the record numbered 0201 can be combined with all ...
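
A short sketch of the pairing in Step 2, assuming that the "Cartesian product" described here means unordered pairs of distinct records, which is what the A, B, C, D example lists. The helper name `make_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_pairs(items: Sequence[T]) -> List[Tuple[T, T]]:
    """Pair every item with every later item: n items give n*(n-1)/2 pairs."""
    return list(combinations(items, 2))


pairs = make_pairs(["A", "B", "C", "D"])
print(["".join(p) for p in pairs])
```

The pair count grows quadratically with the number of records, which is worth keeping in mind for large data sets.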


Abstract

The invention discloses an entity coreference resolution method based on similarity. The implementation process includes the following steps: first, the data in a data set is preprocessed to form data pairs, which are entity pairs; second, weights are set, and similarity values are calculated and compared with a set threshold; third, when the threshold is reached, entity unification is carried out, that is, all data pairs reaching the threshold are fused into one datum; when the threshold is not reached, data summarization is carried out, and the data are summarized to form a new data set, where the summarization result comprises the combined data and the data below the threshold. Compared with the prior art, the method, through its weights and similarity measures, achieves a good processing effect, can meet the requirements of entity coreference resolution in massive data processing, provides an effective guarantee for entity coreference resolution, is highly practical, and is easy to popularize.
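
The weighting, thresholding, and merging steps summarized in the abstract might look like the sketch below. Several things here are assumptions rather than details from the source: the per-field similarity measure (`difflib.SequenceMatcher` from the Python standard library), the weights, the threshold of 0.8, the keep-first-value merge rule, and all names and sample records.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Any, Dict, List

WEIGHTS = {"product_name": 0.5, "product_id": 0.3, "color": 0.2}  # assumed weights
THRESHOLD = 0.8  # assumed threshold


def field_similarity(a: Any, b: Any) -> float:
    """String similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def record_similarity(r1: Dict[str, Any], r2: Dict[str, Any]) -> float:
    """Weighted sum of per-field similarities."""
    return sum(
        w * field_similarity(r1.get(f, ""), r2.get(f, "")) for f, w in WEIGHTS.items()
    )


def resolve(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fuse records linked by pairs that reach the threshold; keep the rest as-is."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if record_similarity(records[i], records[j]) >= THRESHOLD:
            parent[find(j)] = find(i)  # entity unification

    groups: Dict[int, List[Dict[str, Any]]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)

    summary = []
    for members in groups.values():
        fused: Dict[str, Any] = {}
        for rec in members:
            for field, val in rec.items():
                fused.setdefault(field, val)  # assumed rule: first value wins
        summary.append(fused)
    return summary


# Illustrative records only.
sample = [
    {"product_name": "desk lamp", "product_id": "0201", "color": "white"},
    {"product_name": "desk lamp", "product_id": "0201", "color": "white", "price": 19.9},
    {"product_name": "office chair", "product_id": "0417", "color": "black"},
]
result = resolve(sample)
print(len(result))
```

The summary contains one fused record per group of matching records plus every record that matched nothing, mirroring the "combined datum and data smaller than the threshold" described above. Union-find is used so that chains of matching pairs end up in a single fused record.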

Description

Technical field

[0001] The invention relates to the technical field of data fusion, in particular to a highly practical similarity-based entity coreference resolution method.

Background

[0002] With the continuous progress and development of industrial automation and information technology, enterprises in the industrial field generate various types of large-scale data. Structured, semi-structured, and unstructured data are increasing exponentially, which makes it very difficult for enterprises to analyze, process, and make better use of the data. With the advent of the information age, all kinds of data are continuously generated, and entity coreference resolution faces new difficulties and challenges: (1) the amount of data has increased sharply, the amount and difficulty of computation have increased, and computational efficiency has become a problem that urgently needs to be solved; (2) data sources are diverse, there a...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/288
Inventors: 耿玉水 (Geng Yushui), 李鹏 (Li Peng), 赵晶 (Zhao Jing)
Owner: QILU UNIV OF TECH