Semantic integration method for multi-source heterogeneous database

A semantic integration technique for multi-source heterogeneous databases, applied in fields such as unstructured text data retrieval, semantic tool creation, and special data processing applications. It addresses the problem that existing integration of heterogeneous data cannot keep up with application requirements.

Pending Publication Date: 2021-03-26
CETC BIGDATA RES INST CO LTD

AI Technical Summary

Problems solved by technology

Current integration of heterogeneous data can no longer meet increasingly complex application requirements.



Examples


Embodiment 1

[0055] The above scheme specifically includes the following steps:

[0056] S1: Using an entity extraction model based on a deep learning algorithm, extract domain-related entities from the unstructured text, obtain the start and end positions of each entity, and identify its category;

[0057] S2: Match the identified category of the entity to be aligned against the ontology concepts in the knowledge graph to obtain a candidate set of entities with the same category as the entity to be aligned;

[0058] S3: Obtain the graph representation of the entity to be aligned from its context information in the unstructured text, and obtain the graph representation of each candidate entity from the neighborhood relationships of its node in the knowledge graph;

[0059] S4: Use the deep reinforcement learning model to compare the graph representation of each candidate entity in the candidate set with the graph representation of ...
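
Steps S2 and S3 can be pictured with a small sketch. It assumes a toy knowledge graph stored as triples and treats a "graph representation" as nothing more than a set of one-hop neighbors; the source does not specify the data structures, so every entity name, category, and function name here (candidate_set, build_neighborhoods, context_graph) is hypothetical.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. All names are hypothetical.
TRIPLES = [
    ("Acme Corp", "located_in", "Guiyang"),
    ("Acme Corp", "produces", "Sensor X"),
    ("Acme Labs", "located_in", "Beijing"),
    ("Guiyang", "part_of", "Guizhou"),
]

# Ontology concept (category) assigned to each knowledge-graph entity.
ENTITY_CATEGORY = {
    "Acme Corp": "Organization",
    "Acme Labs": "Organization",
    "Guiyang": "Location",
    "Beijing": "Location",
    "Guizhou": "Location",
    "Sensor X": "Product",
}


def candidate_set(category):
    """S2: keep only the graph entities whose concept equals the given category."""
    return sorted(e for e, c in ENTITY_CATEGORY.items() if c == category)


def build_neighborhoods(triples):
    """S3, graph side: map each node to the set of its one-hop neighbors."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def context_graph(mention, co_mentions):
    """S3, text side: a star graph linking the mention to entities in its context."""
    return {mention: set(co_mentions) - {mention}}


# The mention "Acme" was tagged as an Organization and appears near two other entities.
candidates = candidate_set("Organization")
neighborhoods = build_neighborhoods(TRIPLES)
mention_graph = context_graph("Acme", ["Guiyang", "Sensor X"])
```

Filtering by category first keeps the later comparison small: only same-category entities are ever compared with the mention.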

Embodiment 2

[0062] The above scheme specifically includes the following steps:

[0063] S11: Establish a label system, annotate the unstructured text according to the label system, and construct a data set for the entity extraction task;

[0064] S12: Using the pre-trained language model BERT, construct a sequence labeling model that combines BERT with a conditional random field (CRF). Use this model to extract entities from the remaining unstructured text, obtaining the start and end positions of each entity and identifying its category;

[0065] Here, to improve the performance of the BERT language model on the entity extraction task: for a specific domain application, BERT can first undergo domain adaptation on a large-scale unlabeled domain-related text corpus; for a specific application task, task adaptation can be performed on a task-related text corpus.

[0066] S21: According to the entity extraction result o...
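
The output of S12 (start position, end position, and category per entity) can be illustrated with a short decoding sketch. It assumes the sequence labeling model emits one BIO tag per token, a common convention that the source does not confirm; positions here are token indices with an exclusive end, and the function name decode_bio and the sample sentence are hypothetical. The BERT and CRF layers themselves are not shown.

```python
def decode_bio(tokens, tags):
    """Turn per-token BIO tags into (start, end, category, text) tuples.

    `end` is exclusive. A stray I- tag with no matching open entity is ignored.
    """
    spans = []
    start, category = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = (
            tag.startswith("I-") and start is not None and tag[2:] == category
        )
        if continues:
            continue
        if start is not None:
            spans.append((start, i, category, " ".join(tokens[start:i])))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return spans


tokens = ["Acme", "Corp", "opened", "in", "Guiyang"]
tags = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]
entities = decode_bio(tokens, tags)
```

With this sample input, `entities` holds one ORG span covering tokens 0 to 2 and one LOC span covering tokens 4 to 5.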


Abstract

The invention provides a semantic integration method for multi-source heterogeneous databases. The method comprises the following steps: (1) entity extraction: extracting domain-related entities from unstructured text with an entity extraction model and identifying their categories; (2) concept matching: matching those categories against ontology concepts in the knowledge graph to obtain a candidate entity set of the same category; (3) neighborhood matching: obtaining the graph representation of the entity to be aligned from the context information of the related entities, and obtaining the graph representations of the candidate entities from the neighborhood relationships of the candidate entity set in the knowledge graph; and (4) comparison decision making: comparing the graph representation of the entity to be aligned with the candidate graph representations to obtain a ranking of the best-matching candidate entities as the matching result. The method combines deep reinforcement learning with semantic integration of multi-source heterogeneous databases, establishes semantic mapping relationships between knowledge in different forms, and can better support applications built on semantic integration, such as semantic retrieval and intelligent question answering.
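
The comparison decision in step (4) is performed in the source by a deep reinforcement learning model, which is not reproduced here. As a deliberately simpler stand-in, the sketch below ranks candidates by Jaccard overlap between neighbor sets, only to show the shape of the input and of the ranked output. The function names and all data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(mention_neighbors, candidate_neighbors):
    """Return (score, name) pairs, best match first, ties broken by name."""
    scored = [
        (jaccard(mention_neighbors, neighbors), name)
        for name, neighbors in candidate_neighbors.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Entities found near the mention in the text (hypothetical).
mention_neighbors = {"Guiyang", "Sensor X"}

# One-hop neighbors of each same-category candidate in the knowledge graph (hypothetical).
candidate_neighbors = {
    "Acme Corp": {"Guiyang", "Sensor X", "Big Data Expo"},
    "Acme Labs": {"Beijing"},
}

ranking = rank_candidates(mention_neighbors, candidate_neighbors)
best_score, best_match = ranking[0]
```

A learned model would replace `jaccard` with a trained scoring or decision policy; the ranked list returned to the caller would keep the same form.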

Description

Technical field

[0001] The invention relates to a semantic integration method for multi-source heterogeneous databases.

Background technique

[0002] With the development of the information society, the fragmentation of multi-source heterogeneous databases is becoming increasingly prominent. In the era of big data, the way information resources are used is shifting from information management that relies on homogeneous structured data to integrated information management that shares multi-source heterogeneous resources. However, current integration of heterogeneous data can no longer meet increasingly complex application requirements. Making effective use of structured and unstructured data, and combining these independent and disparate databases, is of great significance for opening up information sharing and increasing the value of data. How to realize the semantic fusion of unstructured text databases and structured ...


Application Information

IPC(8): G06F16/36, G06K9/62
CPC: G06F16/367, G06F18/214, Y02D10/00
Inventor: 蔡惠民程序刘汪洋王胜漪
Owner: CETC BIGDATA RES INST CO LTD