Apparatus and method for name disambiguation clustering

A name disambiguation clustering technology, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the difficulty of achieving a good clustering effect on text collections, and aims to improve the final clustering effect, improve adaptability, and reduce bias in the clustering effect.
CN102654881B · Inactive · Publication Date: 2014-10-22 · FUJITSU LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: FUJITSU LTD
Publication Date: 2014-10-22
Estimated Expiration: Not applicable · inactive patent


Abstract

The invention provides a device and a method for name disambiguation clustering. The device, which performs data processing on name training sets, comprises the following units: a representative similarity determination unit, which determines the representative similarity of a name training set, where the representative similarity is a representative value of the similarity between texts in that training set; a preferable similarity threshold selection unit, which clusters the name training set using different similarity thresholds and selects the threshold that yields the better clustering effect as the preferable similarity threshold; and a function fitting unit, which fits a function describing the correspondence between representative similarity and preferable similarity threshold, based on the representative similarity and preferable similarity threshold of each of at least two name training sets.
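
The three units described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not stated in the patent text: the mean pairwise similarity stands in for the "representative value", single-link merging stands in for the clustering step, pairwise F1 against gold labels stands in for the "clustering effect", and a straight line stands in for the fitted function. The function names and the two toy training sets are hypothetical.

```python
from itertools import combinations


def representative_similarity(sim):
    """Mean pairwise similarity (assumed choice of 'representative value')."""
    pairs = list(combinations(range(len(sim)), 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def cluster(sim, threshold):
    """Assumed clustering: merge any two texts with similarity >= threshold."""
    parent = list(range(len(sim)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(len(sim)), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(sim))]


def pairwise_f1(pred, gold):
    """Assumed quality measure: F1 over pairs of texts placed together."""
    pairs = list(combinations(range(len(gold)), 2))
    tp = sum(1 for i, j in pairs if pred[i] == pred[j] and gold[i] == gold[j])
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for i, j in pairs if pred[i] == pred[j])
    recall = tp / sum(1 for i, j in pairs if gold[i] == gold[j])
    return 2 * precision * recall / (precision + recall)


def preferable_threshold(sim, gold, candidates):
    """Try each candidate threshold and keep the best-scoring one."""
    return max(candidates, key=lambda t: pairwise_f1(cluster(sim, t), gold))


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (assumed function form).

    Needs at least two distinct x values.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Two hypothetical name training sets: similarity matrix plus gold entity labels.
SET_A = [[1.0, 0.8, 0.3, 0.2],
         [0.8, 1.0, 0.25, 0.3],
         [0.3, 0.25, 1.0, 0.7],
         [0.2, 0.3, 0.7, 1.0]]
SET_B = [[1.0, 0.5, 0.1, 0.15],
         [0.5, 1.0, 0.1, 0.05],
         [0.1, 0.1, 1.0, 0.45],
         [0.15, 0.05, 0.45, 1.0]]
GOLD = [0, 0, 1, 1]
CANDIDATES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

rep = [representative_similarity(s) for s in (SET_A, SET_B)]
best = [preferable_threshold(s, GOLD, CANDIDATES) for s in (SET_A, SET_B)]
slope, intercept = fit_line(rep, best)


def dynamic_threshold(sim):
    """Predict a threshold for a new text collection from the fitted line."""
    return slope * representative_similarity(sim) + intercept
```

In this sketch the fitted line is what makes the threshold dynamic: a new collection of texts for an unseen name gets its own threshold from its representative similarity, rather than reusing one fixed value.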

Description

Technical field

[0001] The invention relates to name disambiguation clustering, and in particular to a device and method for name disambiguation clustering using dynamic thresholds.

Background art

[0002] Name disambiguation is a recently emerging research direction. It addresses the name ambiguity that arises when the same name (a person name, place name, organization name, etc.) is used by multiple entities in reality. At present, most name disambiguation schemes use text clustering. For example, when a search engine is used to search for a certain name, a large number of webpages containing the name are returned as search results $D = \{d_1, d_2, \ldots, d_n\}$. The names in these webpages may point to different entities in reality. The purpose of clustering is to group the text collection composed of these webpages into several classes according to the different entities, $C = \{c_1, c_2, \ldots, c_m\}$, where each class $c_i$ Correspon...
