Entity classification method for data space

A data space-oriented classification technology, applied in the fields of instruments, character and pattern recognition, computer components, etc., that addresses the problem that entities in an evolving environment cannot be classified under a static assumption

Inactive Publication Date: 2016-11-02
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to solve the problem that, in an evolutionary environment, entities cannot be classified by assuming that they are static, and to propose a data space-oriented entity classification method.


Examples


Specific Embodiment 1

[0023] The data space-oriented entity classification method of this embodiment, shown in the flowchart of Figure 1, is realized through the following steps:

[0024] Step 1. For evolving data space entities, propose an evolutionary K-Means clustering framework, that is, define an objective cost function based on contour (silhouette) values and KL-divergence;

[0025] Step 2. Design a method for measuring the similarity of entities in the data space;

[0026] Step 3. Propose an evolutionary K-Means clustering algorithm that solves both the initial point selection problem and the evolving data space entity classification problem;

[0027] Step 4. Extend the evolutionary K-Means clustering framework of Step 1 to handle cases where the number of clusters changes over time or snapshot entities are added or removed over time.
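
As a rough illustration of clustering a sequence of snapshots, the sketch below runs plain K-Means on each time step and seeds it with the centroids of the previous step. This warm start is an assumed, commonly used heuristic for the initial point selection problem; the text above does not say which rule the invention uses. The names `kmeans`, `cluster_snapshots`, and `first_init` are hypothetical.

```python
def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd iterations on tuples of coordinates, from given centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def cluster_snapshots(snapshots, first_init):
    """Cluster each snapshot, warm-starting from the previous centroids.

    ASSUMPTION: the warm start is an illustrative heuristic, not the
    patent's documented initial point selection rule.
    """
    results = []
    init = first_init
    for points in snapshots:
        centroids, labels = kmeans(points, init)
        results.append((centroids, labels))
        init = centroids  # carry the cluster structure into the next step
    return results


snapshots = [
    [(0, 0), (0, 1), (10, 10), (10, 11)],   # time step 1
    [(1, 0), (1, 1), (11, 10), (11, 11)],   # time step 2: entities drifted
]
results = cluster_snapshots(snapshots, first_init=[(0, 0), (10, 10)])
```

Seeding from the previous step keeps cluster indices stable across snapshots, which makes it easier to compare the current clustering with the historical one.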

Specific Embodiment 2

[0028] This embodiment differs from Specific Embodiment 1 in that the process of proposing the evolutionary K-Means clustering framework described in Step 1, that is, of defining the objective cost function, is as follows:

[0029] Step 11. Define the total objective cost function as a linear combination:

[0030] The cost function consists of two parts: the snapshot cost of the current time step and the historical cost of the historical time steps, denoted Cost_snapshot and Cost_temporal respectively. The former measures only the snapshot quality of the current clustering result with respect to the current entity information, reflecting the clustering algorithm's own metrics; the higher the snapshot cost, the lower the snapshot quality. The latter is based on the degree of fit between the current cluster structure and the historical cluster structure and is used to measure the ti...
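
The two-part cost described above can be sketched as follows. The exact formula is not reproduced in this excerpt, so everything concrete here is an assumption made for illustration: the weight `alpha`, the mapping from a mean contour (silhouette) value to a snapshot cost, and the use of cluster-size proportions as the distributions compared by KL-divergence.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def snapshot_cost_from_silhouette(mean_silhouette):
    """ASSUMPTION: map a mean silhouette in [-1, 1] to a cost in [0, 1],
    so that a better silhouette gives a lower snapshot cost."""
    return (1.0 - mean_silhouette) / 2.0


def total_cost(cost_snapshot, cost_temporal, alpha=0.5):
    """Linear combination of the two costs.

    ASSUMPTION: alpha in [0, 1] is a hypothetical trade-off weight; the
    source only states that the combination is linear.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cost_snapshot + (1.0 - alpha) * cost_temporal


# Hypothetical cluster-size proportions at the current and previous step.
current = [0.9, 0.1]
history = [0.5, 0.5]
cost = total_cost(
    snapshot_cost_from_silhouette(0.6),   # illustrative silhouette value
    kl_divergence(current, history),
    alpha=0.75,
)
```

A larger `alpha` favors fitting the current snapshot, while a smaller one favors staying close to the historical cluster structure.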

Specific Embodiment 3

[0066] This embodiment differs from Specific Embodiment 1 or 2 in that the process of designing the data space entity similarity measurement method described in Step 2 is as follows:

[0067] On the one hand, a snapshot entity itself contains rich information, such as structured attribute information and unstructured content information. On the other hand, in the data space environment, entities reappear over time, and this historical occurrence pattern also helps in judging whether two entities are similar. For this reason, the data space entity is taken to be the snapshot entity, and the similarity of snapshot entities is measured from both the entity's own information and its historical occurrence pattern; that is, the similarity function of the snapshot entity is composed of two parts, self-similarity and historical similarity, the ...
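
A minimal sketch of a two-part similarity of this kind is shown below. The component measures and the way they are combined are not given in this excerpt, so the choices here are assumptions: Jaccard overlap of attribute sets stands in for self-similarity, cosine similarity of per-time-step occurrence vectors stands in for historical similarity, and `beta` is a hypothetical mixing weight.

```python
import math


def jaccard(a, b):
    """Overlap of two attribute sets (stand-in for self-similarity)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two occurrence vectors (stand-in for
    historical similarity); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def snapshot_similarity(attrs_a, attrs_b, hist_a, hist_b, beta=0.5):
    """ASSUMPTION: weighted sum of the two parts with weight beta in [0, 1]."""
    return beta * jaccard(attrs_a, attrs_b) + (1.0 - beta) * cosine(hist_a, hist_b)


# Two hypothetical snapshot entities: attribute sets plus a 0/1 vector
# marking the time steps in which each entity appeared.
sim = snapshot_similarity(
    {"name", "email"}, {"email", "phone"},
    [1, 0, 1], [1, 0, 1],
)
```

Here the two entities share one of three attributes but have identical occurrence histories, so the historical part raises the combined score well above the attribute overlap alone.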


Abstract

The invention relates to an entity classification method for a data space and belongs to the field of natural language processing. It addresses the problem that, in an evolutionary environment, entities cannot be classified by assuming that they are static. The method includes the following steps: an improved evolutionary K-Means clustering framework is proposed for evolving data space entities, that is, an objective cost function based on contour values and KL-divergence is defined; a novel data space entity similarity measure is proposed; an evolutionary K-Means clustering algorithm is proposed according to a heuristic rule; and the evolutionary clustering framework is further extended to handle cases where the number of clusters changes over time or snapshot entities are added or deleted over time. With this method, the clustering result of the current entities can be captured with high quality, and historical clustering can be reflected robustly.

Description

Technical field

[0001] The invention relates to a data space-oriented entity classification method.

Background technique

[0002] Data space integration is one of the important ways to construct a data space. Since a data space faces large-scale data with diverse structures, complex semantic relations, and distributed storage, data space integration mainly includes two aspects of work: (1) entity integration; (2) entity-relationship integration. At present, existing data space integration work mainly focuses on entity-relationship integration and has proposed some effective strategies or methods. However, research on entity integration [44] is relatively scarce. Therefore, it is of great significance to study data space integration (especially entity integration). As an important step of entity integration, entity classification has a wide range of applications, for example, question answering systems, relation extraction, data space query, machine translati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/24
Inventors: 王念滨, 王红滨, 周连科, 祝官文, 何鸣, 王瑛琦, 宋奎勇
Owner: HARBIN ENG UNIV