Ontology fusion preprocessing method for multi-source heterogeneous resources

An ontology technology for multi-source heterogeneous resources, applied in the field of computer services, that addresses problems such as the inability to model such resources and improves extraction efficiency.

Active Publication Date: 2020-11-27
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] At present, there are data in RDF, RDFS, OWL and XSD formats, and these formats can be conv

Examples

Specific Embodiment 1

[0053] Specific Embodiment 1. This embodiment provides a method for converting an ontology model into a JSON description. As shown in Figure 2, the method includes the following steps:

[0054] Step S1, extracting entity concepts in the ontology model:

[0055] This step targets the entity concepts in the ontology model. The ontology model is essentially a graph and can be processed as a graph data structure. Traverse all tags in the ontology model, select those labeled owl:Class, extract the corresponding entity concepts, and store them in the Entity table of the relational database MySQL. The stored entities are then sorted by concept name in GBK encoding order, and duplicate entities are removed.
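Step S1 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes the ontology is serialized as RDF/XML, that each class is named through an rdf:about attribute, and it keeps the result in a Python list instead of writing to the MySQL Entity table. The sample document and the function name extract_entities are hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML fragment; a real ontology file would be read from disk.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Professor"/>
  <owl:Class rdf:about="#Course"/>
  <owl:Class rdf:about="#Professor"/>
  <owl:ObjectProperty rdf:about="#involves"/>
</rdf:RDF>"""


def extract_entities(owl_xml: str) -> list[str]:
    """Collect owl:Class names, drop duplicates, sort by GBK byte order."""
    root = ET.fromstring(owl_xml)
    names = set()
    # iter() walks the whole tree, so nested class declarations are found too.
    for elem in root.iter(f"{{{OWL}}}Class"):
        about = elem.get(f"{{{RDF}}}about")
        if about:
            names.add(about.split("#")[-1])
    # Sorting on the GBK-encoded bytes mirrors the ordering named in the text.
    return sorted(names, key=lambda name: name.encode("gbk"))


entities = extract_entities(SAMPLE_OWL)
print(entities)
```

The owl:ObjectProperty element is ignored here because only owl:Class tags are selected in this step.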

[0056] Step S2, extracting relational concepts in the ontology model:

[0057] This step is mainly aimed at the relationship concepts in the ontology model. Traversing all the tags in the onto...

Specific Embodiment 2

[0062] Specific Embodiment 2. This embodiment provides a method for defining a grammatical structure for the uniform fusion of heterogeneous (semi-)structured data. As shown in Figure 3, the method includes the following steps:

[0063] Step S1, finding the obvious structure of the data to be obtained:

[0064] Heterogeneous (semi-)structured data is chaotic, but latent regularities can still be found in it. The present invention uses regular-expression matching rules and proposes two matching patterns. The first is "character + colon + character", represented by '[a-zA-Z0-9]+\\:[a-zA-Z0-9]+'; the second is "character + equal sign + character", represented by '[a-zA-Z0-9]+\\=[a-zA-Z0-9]+'.
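The two patterns can be tried out with Python's re module, as in the sketch below. The doubled backslashes in the text look like string-literal escaping; in a Python raw string a single backslash is enough. The sample string and the names COLON_PAIR and EQUALS_PAIR are illustrative assumptions.

```python
import re

# "character + colon + character" and "character + equal sign + character"
COLON_PAIR = re.compile(r"[a-zA-Z0-9]+\:[a-zA-Z0-9]+")
EQUALS_PAIR = re.compile(r"[a-zA-Z0-9]+\=[a-zA-Z0-9]+")

# Hypothetical semi-structured input.
sample = "name:Alice; age=30; note: free text"

colon_matches = COLON_PAIR.findall(sample)
equals_matches = EQUALS_PAIR.findall(sample)

print(colon_matches)
print(equals_matches)
```

Note that "note: free text" is not matched, because the patterns require an alphanumeric character immediately after the colon or equal sign.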

[0065] Step S2, return the subscript of the content satisfying the regular expression structure in the source data:

[0066] In order to extract the data that satisfies the regular expression in step S1, use the findIndex(pattern, str) method to obtain the subscript of the matching string,...
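The source does not show how findIndex(pattern, str) is implemented. A hypothetical Python analogue, here called find_index, could return the start and end subscripts of every match so that the matching substrings can be sliced out afterwards. The function name, return shape, and sample text are assumptions.

```python
import re


def find_index(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return (start, end) subscripts of each non-overlapping match in text."""
    return [match.span() for match in re.finditer(pattern, text)]


# Hypothetical source data containing two "key:value" fragments.
source = "id:42 and phone=555 then id:43"

spans = find_index(r"[a-zA-Z0-9]+:[a-zA-Z0-9]+", source)
# Slice the source with the subscripts to recover the matching substrings.
substrings = [source[start:end] for start, end in spans]

print(spans)
print(substrings)
```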

Specific Embodiment 3

[0084] Specific Embodiment 3. Figure 5 describes the ontology model of the faculty. Assistant_Professor, Staff_Member, Professor, Associate_Professor, and Staff_Member are subclasses of Academic_Staff_Member.

[0085] The steps for converting the ontology model into a JSON description are as follows. The first step is to extract the entity concepts in the model: select the tags labeled owl:Class, extract the entities Course, Literal, Professor, Assistant_Professor, Staff_Member, Academic_Staff_Member and Staff_Member, and store these concepts in the Entity table of MySQL. The second step is to extract the relationship concepts in the model: select the tags labeled owl:ObjectProperty, take out the triples (Course, involves, Academic_Staff_Member), (staff_Member, id, Literal) and (staff_Member, phone, Literal), and store them in the TDB database. Because these relationships do not declare characteristics such as reflexivity, they need no additional marking. The third step is to extract the attribute co...
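To make the first two steps of this example concrete, the sketch below takes the entity list and relation triples exactly as listed above and serializes them to JSON. The layout of the JSON document (the keys "entities", "relations", "subject", "predicate", "object") is an assumption, since the source does not show the exported file, and the MySQL and TDB storage steps are left out.

```python
import json

# Entities as listed in the example, including the repeated Staff_Member.
entities = [
    "Course", "Literal", "Professor", "Assistant_Professor",
    "Staff_Member", "Academic_Staff_Member", "Staff_Member",
]

# Relation triples as listed in the example, spelling kept as in the source.
relations = [
    ("Course", "involves", "Academic_Staff_Member"),
    ("staff_Member", "id", "Literal"),
    ("staff_Member", "phone", "Literal"),
]


def to_json_description(entities, relations) -> str:
    """Serialize entities and triples using a hypothetical JSON layout."""
    doc = {
        # Duplicates are removed and names sorted, as in the entity step.
        "entities": sorted(set(entities)),
        "relations": [
            {"subject": s, "predicate": p, "object": o}
            for s, p, o in relations
        ],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


description = to_json_description(entities, relations)
print(description)
```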

Abstract

The invention discloses an ontology fusion preprocessing method for multi-source heterogeneous resources. The method comprises the following steps: S1, extracting the entity concepts in an ontology model; S2, extracting the relation concepts in the ontology model; S3, extracting the attribute concepts in the ontology model; S4, exporting a JSON data file; S5, discovering the obvious structures of the data to be acquired and expressing the two structures as regular expressions; S6, returning the subscripts of the content in the source data that matches the regular-expression structures; S7, acquiring the substrings that meet the conditions; S8, performing further string matching on the substrings; S9, fusing the labels of the source data; and S10, storing the data in a structured format. With this method, the result of ontology modeling can be converted into a JSON description, and a syntax structure definition is given for the consistent fusion of heterogeneous (semi-)structured data.

Description

Technical field

[0001] The invention belongs to the technical field of computer services and relates to an ontology fusion preprocessing method for multi-source heterogeneous resources, in particular to a method for converting unstructured and semi-structured resources into structured resources.

Background art

[0002] In recent years, with the rapid development of the Internet, knowledge graph technology has gradually been applied in various fields. An ontology is a formalized, explicit and detailed description of a shared conceptual system. It provides a shared vocabulary covering the object types or concepts that exist in a specific domain, together with their attributes and the relationships between them.

[0003] At present, the resources distributed on the Internet often exist in a decentralized and heterogeneous form, and are also redundant, noisy and incomplete. Internet resources can be divided into three categories: unstructured resources...

Application Information

IPC(8): G06F16/25, G06F16/28, G06F16/903, G06F16/36
CPC: G06F16/258, G06F16/254, G06F16/285, G06F16/288, G06F16/90344, G06F16/367, Y02D10/00
Inventor: 张凯, 涂志莹, 初佃辉, 张麟宇, 申义, 黎阳
Owner: HARBIN INST OF TECH