Apparatus and method for name disambiguation clustering

A name disambiguation clustering technology, applied in special-purpose data processing and electronic digital data processing. It addresses the difficulty of achieving an ideal clustering effect across different text collections, with the stated effects of improving the final clustering result, improving adaptability, and reducing bias in the clustering result.

Inactive Publication Date: 2014-10-22
FUJITSU LTD

AI Technical Summary

Problems solved by technology

If a fixed threshold is used for clustering, it is difficult to achieve an ideal clustering effect for text sets with different similarity characteristics.

Embodiment Construction

[0018] Embodiments of the present invention will be described below with reference to the drawings. It should be noted that representation and description of components and processes that are not related to the present invention and known to those of ordinary skill in the art are omitted from the drawings and descriptions for the purpose of clarity.

[0019] Figure 1 is a block diagram showing the configuration of an apparatus for performing data processing on a name training set according to an embodiment of the present invention.

[0020] As shown in Figure 1, the apparatus 100 for performing data processing on the name training set includes a representative similarity determination unit 110, a preferred similarity threshold selection unit 120 and a function fitting unit 130.

[0021] Each of the name training sets to be processed by the apparatus 100 includes a plurality of texts for the same name, and the clustering relationship of the plurality of t...

Abstract

The invention provides a device and a method for name disambiguation clustering. The device for data processing on a name training set comprises the following units: a representative similarity determination unit for determining the representative similarity of the name training set, wherein the representative similarity is a representative value of the inter-textual similarity in the name training set; a preferable similarity threshold selection unit for clustering the name training set by using different similarity thresholds so as to select the similarity threshold which makes the clustering effect better as the preferable similarity threshold; and a function fitting unit for fitting a function which represents the corresponding relation between the representative similarity and the preferable similarity threshold according to the representative similarity and the preferable similarity threshold of each name training set in at least two name training sets.
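
The three units named in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, since the abstract does not fix these choices: the representative similarity is taken to be the mean pairwise similarity, clustering is single-link grouping at a threshold, the clustering effect is scored with pairwise F1 against known labels, and the fitted function is a straight line. All function names and the toy similarity matrices are hypothetical.

```python
from itertools import combinations


def representative_similarity(sim):
    """Mean pairwise similarity (an assumed choice of representative value)."""
    pairs = list(combinations(range(len(sim)), 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def cluster(sim, threshold):
    """Single-link grouping (assumed): join texts whose similarity >= threshold."""
    parent = list(range(len(sim)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(len(sim)), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(sim))]


def pairwise_f1(pred, gold):
    """Pairwise F1 between predicted and known clusterings (assumed metric)."""
    pairs = list(combinations(range(len(gold)), 2))
    tp = sum(1 for i, j in pairs if pred[i] == pred[j] and gold[i] == gold[j])
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for i, j in pairs if pred[i] == pred[j])
    recall = tp / sum(1 for i, j in pairs if gold[i] == gold[j])
    return 2 * precision * recall / (precision + recall)


def preferred_threshold(sim, gold, candidates):
    """Try each candidate threshold; keep the first one with the best score."""
    return max(candidates, key=lambda t: pairwise_f1(cluster(sim, t), gold))


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (assumed function family)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Two toy name training sets with known clusterings (hypothetical data).
gold = [0, 0, 1, 1]
sim_a = [[1.0, 0.9, 0.6, 0.6], [0.9, 1.0, 0.6, 0.6],
         [0.6, 0.6, 1.0, 0.9], [0.6, 0.6, 0.9, 1.0]]
sim_b = [[1.0, 0.4, 0.1, 0.1], [0.4, 1.0, 0.1, 0.1],
         [0.1, 0.1, 1.0, 0.4], [0.1, 0.1, 0.4, 1.0]]
candidates = [k / 10 for k in range(1, 10)]

reps = [representative_similarity(s) for s in (sim_a, sim_b)]
prefs = [preferred_threshold(s, gold, candidates) for s in (sim_a, sim_b)]
slope, intercept = fit_line(reps, prefs)
```

In this toy data the two sets prefer different thresholds, so a single fixed threshold cannot recover the known clustering of both; the fitted line maps a set's representative similarity to a threshold suited to that set.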

Description

Technical field

[0001] The invention relates to name disambiguation clustering, and in particular to a device and method for name disambiguation clustering using dynamic thresholds.

Background

[0002] Name disambiguation is a recently emerging research direction. It addresses the name ambiguity that arises when the same name (person name, place name, organization name, etc.) is used by multiple entities in reality. At present, most name disambiguation schemes use text clustering. For example, when a search engine is used to search for a certain name, a large number of webpages containing the name are returned as search results D = {d_1, d_2, ..., d_n}. The names in these webpages may point to different entities in reality. The purpose of clustering is to group the text collection composed of these webpages into several classes according to the different entities, C = {c_1, c_2, ..., c_m}, where each class c_i correspon...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 王新文, 夏迎炬, 孟遥, 张姝, 贾文杰, 于浩
Owner: FUJITSU LTD