OWL Horst rule distributed parallel reasoning algorithm combined with the Spark platform

A distributed, rule-based technique for Semantic Web reasoning. It addresses the long job-startup times of earlier engines, their failure to check whether a rule can be activated before running it, and their repeated redundant computation, thereby reducing overhead.

Active Publication Date: 2017-08-04
FUZHOU UNIV

AI Technical Summary

Problems solved by technology

J. Urbani et al. used WebPIE to reason over RDFS/OWL rule sets, which supports parallel reasoning over big data. However, the algorithm launches one or more MapReduce jobs for each rule, and because job startup is relatively time-consuming, overall reasoning efficiency becomes limited as the number of RDFS/OWL rules grows.
Gu Rong et al. proposed YARM, an efficient and scalable MapReduce-based semantic reasoning engine that completes RDFS rule reasoning within a single MapReduce job. However, it is not suitable for reasoning over complex OWL rules.
In addition, when a rule generates a triple that already exists, YARM performs excessive redundant computation and produces useless data.
Wang Jingbing et al. proposed a Rete-based distributed parallel reasoning algorithm for RDF data. It uses the RDF ontology to build a list of schema triples and a rule-tag model, and in the RDFS/OWL reasoning stage it uses MapReduce to implement the alpha and beta stages, realizing distributed Rete inference. However, joining through the beta network consumes considerable memory and is inefficient over multiple iterations, so the algorithm is constrained by cluster memory and by the platform.
Gu Rong et al. proposed Cichlid, an efficient Spark-based parallel reasoning engine that optimizes parallel reasoning with the RDD programming model. However, it does not check whether a rule can be activated before reasoning with it, which wastes inference work and causes redundant data transmission.
[0004] Because Semantic Web data is growing rapidly, the memory limits of centralized environments make them unsuitable for reasoning over large-scale data.
Although existing distributed inference engines can perform data-parallel inference, they launch many time-consuming MapReduce jobs, cannot reason with complex OWL Horst rules, perform excessive redundant computation that produces useless data, consume large amounts of memory, and are inefficient over multiple iterations. As a result, RDFS/OWL rule reasoning cannot be carried out both efficiently and correctly as the data volume grows.


Examples


Embodiment Construction

[0015] The present invention will be further explained below in conjunction with the accompanying drawings and specific embodiments.

[0016] The present invention provides an OWL Horst rule distributed parallel reasoning algorithm combined with the Spark platform (DPRS), which mainly comprises the following steps:

[0017] 1. Load the schema triple sets P_j_RDD and O_k_RDD and the join schema triple set Rule_m_linkvar_RDD, and broadcast them.

[0018] 2. Build the rule-flag model Flag_Rule_m and broadcast it.

[0019] 3. Execute the OWL Horst rules marked in Flag_Rule_m in parallel and output the intermediate results.

[0020] 4. Remove duplicate triples.

[0021] 5. If new schema triples were generated, return to step 2; if new instance triples were generated, return to step 3; otherwise, the algorithm terminates.
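The five steps above amount to a fixpoint loop. The following plain-Python sketch illustrates that control flow only and is not the patent's implementation. Assumptions: ordinary Python sets stand in for the broadcast RDDs; only three example rules are implemented (subclass inheritance, symmetric properties, transitive properties); and the schema/instance split in `is_schema`, the flag names, and the functions `build_flags`, `apply_rules`, and `dprs_sketch` are all hypothetical. On Spark, step 3 would run the flagged rules as distributed jobs, and step 4 could be expressed with the RDD operations `distinct()` and `subtract()`.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Shorthand vocabulary (illustrative; real data would use full IRIs).
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
SYMMETRIC, TRANSITIVE = "owl:SymmetricProperty", "owl:TransitiveProperty"


def is_schema(t: Triple) -> bool:
    """Assumed split: subclass axioms and property declarations are schema triples."""
    _, p, o = t
    return p == SUBCLASS or (p == TYPE and o in (SYMMETRIC, TRANSITIVE))


def build_flags(schema: Set[Triple]) -> Dict[str, bool]:
    """Step 2: flag a rule only if some schema triple can match its schema antecedent."""
    declared = {o for _, p, o in schema if p == TYPE}
    return {
        "subclass": any(p == SUBCLASS for _, p, _ in schema),
        "symmetric": SYMMETRIC in declared,
        "transitive": TRANSITIVE in declared,
    }


def apply_rules(flags: Dict[str, bool], schema: Set[Triple],
                instances: Set[Triple]) -> Set[Triple]:
    """Step 3: run only the flagged rules (sequentially here, in parallel on Spark)."""
    out: Set[Triple] = set()
    if flags["subclass"]:
        sub = {(c, d) for c, p, d in schema if p == SUBCLASS}
        # (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
        out |= {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # (x type C), (C subClassOf D) -> (x type D)
        out |= {(x, TYPE, d) for x, p, c in instances if p == TYPE
                for c2, d in sub if c2 == c}
    if flags["symmetric"]:
        sym = {s for s, p, o in schema if p == TYPE and o == SYMMETRIC}
        # (p type SymmetricProperty), (x p y) -> (y p x)
        out |= {(y, p, x) for x, p, y in instances if p in sym}
    if flags["transitive"]:
        trans = {s for s, p, o in schema if p == TYPE and o == TRANSITIVE}
        edges = [(x, p, y) for x, p, y in instances if p in trans]
        # (p type TransitiveProperty), (x p y), (y p z) -> (x p z)
        out |= {(x, p, z) for x, p, y in edges
                for y2, p2, z in edges if p2 == p and y2 == y}
    return out


def dprs_sketch(triples: Set[Triple]) -> Set[Triple]:
    """Hypothetical driver for steps 1-5; returns the closure under the three rules."""
    # Step 1: split the input (on Spark, the schema sets would be broadcast).
    schema = {t for t in triples if is_schema(t)}
    instances = set(triples) - schema
    flags = build_flags(schema)                          # step 2
    while True:
        derived = apply_rules(flags, schema, instances)  # step 3
        new = derived - schema - instances               # step 4: drop duplicates
        if not new:
            return schema | instances                    # step 5: fixpoint reached
        new_schema = {t for t in new if is_schema(t)}
        schema |= new_schema
        instances |= new - new_schema
        if new_schema:
            flags = build_flags(schema)  # step 5: new schema triples -> back to step 2
        # otherwise only instance triples are new -> back to step 3
```

On a small graph with `Student subClassOf Person`, `Person subClassOf Agent`, a symmetric `knows` property, and a transitive `ancestorOf` chain, the first round derives the schema triple `(Student, rdfs:subClassOf, Agent)`, so the flags are rebuilt (step 2); the second round derives only instance triples (back to step 3); the third round derives nothing new and the loop stops.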

[0022] The overall framework of the invention is shown in Figure 1.

[0023] Definition 1. (SchemaTriple) means that the...


Abstract

The invention discloses an OWL Horst rule distributed parallel reasoning algorithm combined with the Spark platform. Based on the characteristics of Spark RDDs and the principle of the TREAT algorithm, it constructs alpha registers O_m_RDD or P_t_RDD corresponding to the schema triples of the RDF ontology data, broadcasts them, and builds a rule-flag model. The schema antecedents of each rule are joined to generate the corresponding join schema triple set Rule_m_linkvar_RDD, which speeds up matching during reasoning. In the OWL Horst reasoning stage, the alpha stage of the TREAT algorithm is implemented with MapReduce, multiple rules are reasoned over in a distributed, parallel fashion, and the results are then deduplicated. The alpha registers and the rule-flag model filter out a large number of instance triples, which reduces the key-value pairs output in the Map stage and therefore reduces unnecessary network transmission.
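The filtering idea in the abstract can be illustrated with a single rule. The sketch below simulates one Map/Reduce round in plain Python for the OWL Horst transitive-property rule. It is an illustration under stated assumptions, not the patent's code: `build_alpha`, `map_phase`, and `reduce_phase` are hypothetical names, a Python set stands in for a broadcast alpha register, and keying on `(p, y)` plays the role the abstract assigns to the join on each rule's shared variable.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[Tuple[str, str], Tuple[str, str]]  # ((predicate, join node), (role, node))

TYPE, TRANSITIVE = "rdf:type", "owl:TransitiveProperty"


def build_alpha(schema: Iterable[Triple]) -> Set[str]:
    """Hypothetical alpha register: the properties declared transitive in the schema."""
    return {s for s, p, o in schema if p == TYPE and o == TRANSITIVE}


def map_phase(instances: Iterable[Triple], alpha: Set[str]) -> List[Pair]:
    """Emit key-value pairs only for triples whose predicate passes the alpha register."""
    pairs: List[Pair] = []
    for s, p, o in instances:
        if p not in alpha:
            continue  # filtered out before the shuffle: no pair is emitted
        # Join variable y of (x p y), (y p z): key each triple on (p, y).
        pairs.append(((p, o), ("left", s)))   # triple plays the role (x p y)
        pairs.append(((p, s), ("right", o)))  # triple plays the role (y p z)
    return pairs


def reduce_phase(pairs: Iterable[Pair]) -> Set[Triple]:
    """Group by (p, y) and join every left x with every right z to produce (x p z)."""
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    derived: Set[Triple] = set()
    for (p, _), values in groups.items():
        lefts = [n for role, n in values if role == "left"]
        rights = [n for role, n in values if role == "right"]
        derived |= {(x, p, z) for x in lefts for z in rights}
    return derived
```

With the instance triples `(a, ancestorOf, b)`, `(b, ancestorOf, c)`, `(a, name, Ann)`, and `(b, name, Bob)`, an alpha register containing only `ancestorOf` keeps the Map output at 4 pairs instead of 8, and the Reduce step derives `(a, ancestorOf, c)`. One round performs a single join step; longer chains rely on the iteration in step 5.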

Description

Technical field [0001] The invention belongs to the technical field of the Semantic Web, and in particular relates to a distributed parallel reasoning algorithm for OWL Horst rules combined with the Spark platform. Background [0002] The RDF and OWL standards of the Semantic Web have been widely adopted in many fields, such as general knowledge (DBpedia), medical life sciences (LODD), bioinformatics (UniProt), geographic information systems (LinkedGeoData), and semantic search engines (Watson). As the Semantic Web has been applied, a large amount of semantic information has been produced. Because this data is complex and large-scale, efficiently discovering the information hidden in it through parallel semantic reasoning is an urgent problem. Because Semantic Web data is growing rapidly, the memory limits of centralized environments make them unsuitable for reasoning over large-scale data. [0...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30; G06F17/27
CPC: G06F16/24564; G06F40/30
Inventors: 汪璟玢 (Wang Jingbin), 叶怡新 (Ye Yixin)
Owner: FUZHOU UNIV